Read Introduction to Teradata RDBMS text version

Introduction to Teradata® RDBMS

V2R4.1

B035-1091-061A June 2001

The product described in this book is a licensed product of NCR Corporation. BYNET is an NCR trademark registered in the U.S. Patent and Trademark Office. CICS, CICS/400, CICS/600, CICS/ESA, CICS/MVS, CICSPLEX, CICSVIEW, CICS/VSE, DB2, DFSMS/MVS, DFSMS/ VM, IBM, NQS/MVS, OPERATING SYSTEM/2, OS/2, PS/2, MVS, QMS, RACF, SQL/400, VM/ESA, and VTAM are trademarks or registered trademarks of International Business Machines Corporation in the U. S. and other countries. DEC, DECNET, MICROVAX, VAX and VMS are registered trademarks of Digital Equipment Corporation. HEWLETT-PACKARD, HP, HP BRIO, HP BRIO PC, and HP-UX are registered trademarks of Hewlett-Packard Co. KBMS is a trademark of Trinzic Corporation. INTERTEST is a registered trademark of Computer Associates International, Inc. MICROSOFT, MS-DOS, MSN, The Microsoft Network, MULTIPLAN, SQLWINDOWS, WIN32, WINDOWS, WINDOWS 2000, and WINDOWS NT are trademarks or registered trademarks of Microsoft Corporation. SAS, SAS/C, SAS/CALC, SAS/CONNECT, and SAS/CPE are registered trademarks of SAS Institute Inc. SOLARIS, SPARC, SUN and SUN OS are trademarks of Sun Microsystems, Inc. TCP/IP protocol is a United States Department of Defense Standard ARPANET protocol. TERADATA and DBC/1012 are registered trademarks of NCR International, Inc. UNICODE is a trademark of Unicode, Inc. UNIX is a registered trademark of The Open Group. X and X/OPEN are registered trademarks of X/Open Company Limited. YNET is a trademark of NCR Corporation. THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED ON AN "AS-IS" BASIS, WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION OF IMPLIED WARRANTIES, SO THE ABOVE EXCLUSION MAY NOT APPLY TO YOU. IN NO EVENT WILL NCR CORPORATION (NCR) BE LIABLE FOR ANY INDIRECT, DIRECT, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS OR LOST SAVINGS, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. The information contained in this document may contain references or cross references to features, functions, products, or services that are not announced or available in your country. Such references do not imply that NCR intends to announce such features, functions, products, or services in your country. Please consult your local NCR representative for those features, functions, products, or services available in your country. Information contained in this document may contain technical inaccuracies or typographical errors. Information may be changed or updated without notice. NCR may also make improvements or changes in the products or services described in this information at any time without notice. To maintain the quality of our products and services, we would like your comments on the accuracy, clarity, organization, and value of this document. Please e-mail: [email protected] or write: Information Engineering NCR Corporation 100 North Sepulveda Boulevard El Segundo, CA 90245-4361 U.S.A. Any comments or materials (collectively referred to as "Feedback") sent to NCR will be deemed non-confidential. NCR will have no obligation of any kind with respect to Feedback and will be free to use, reproduce, disclose, exhibit, display, transform, create derivative works of and distribute the Feedback and derivative works thereof without limitation on a royalty-free basis. Further, NCR will be free to use any ideas, concepts, know-how or techniques contained in such Feedback for any purpose whatsoever, including developing, manufacturing, or marketing products or services incorporating Feedback. Copyright © 1996-2001, NCR Corporation All Rights Reserved

Preface

Supported Software Release

This book supports Teradata RDBMS release V2R4.1.

Changes to This Book

This book includes the following changes to support the current release:

Date Description

June 2001

Added the following features and enhancements: · Single Sign On · Hash Index · UTF-8 character set support · Integrated Database Query Manager · Resource Check Tool · 128 K data block support · Increased number of global and volatile temporary tables Updated the following features: · Stored procedures · Indexes · Backup utilities

September 2000

· Updated glossary. Removed references to UNISYS and KBMS/Intellect. Replaced reference to Nomad2 to NOMAD.

Introduction to Teradata RDBMS

i

Preface

Date

Description

June 2000

· COLLECT STATISTICS: Collected statistics are now stored in a spool table so that you can collect statistics at the same time you execute queries against the table. · Fallback: You can now define join index subtables with fallback. · Online Analytical Processing (OLAP): The operation of OLAP sampling is now optimized in the file system by only accessing the data blocks that contain the target row positions instead of scanning all the data blocks. · PDE Tools Utility (NT Only): A new PDE tools utility allows you to run the ctl, DBS control, DIP, and Vproc Manager utilities from any TPA or non-TPA or from the AWS. You can also start, reset, or stop the PDEs. · EXPLAIN: The size of the EXPLAIN text is now unlimited. · Stored Procedures: NCR has developed the Teradata Stored Procedures (TDSP) feature for the Teradata RDBMS. The feature, enables development of database applications using the Stored Procedure Language (SPL).

April 1999

· A new virtual BYNET driver has eliminated the need for the vnet driver on systems that have no BYNET hardware. · Addition of the use of triggers with SQL statements. · Addition of a join index to improve performance · Internationalization of Kanji characters · Increase in the maximum number of vprocs to 16k · Increase in the Teradata RDBMS row size to 64k · Addition of OLAP features for statistical functions, extended date/calendar capability, and sampling. · Addition of Timestamping to the Data Dictionary.

ii

Introduction to Teradata RDBMS

Preface About This Book

About This Book

Purpose

This book provides an introduction to the Teradata RDBMS covering: · · · · · Teradata RDBMS architecture and the relational model Applications and operations Data definitions and data manipulation of Teradata Structured Query Language (SQL) System administration and security System maximum capacities

Audience

This book is intended for users who interface with the Teradata RDBMS. Such individuals may include database users or administrators.

How This Book Is Organized

This book contains the following chapters and a glossary: Chapter 1: "Overview,"presents an overview of the Teradata RDBMS. This includes the development and design of the RDBMS, an overview of the client and server software, and a description of the process flow of an SQL query to the database. Chapter 2: "Teradata RDBMS Architecture," describes the RDBMS hardware and software architecture. Chapter 3: "The Relational Model," presents an overview of the relational model for database management, including an introduction to normalization. It also describes and explains referential integrity and constraints, and indexes. Chapter 4: "Structured Query Language (SQL)," describes the data definition capabilities of Teradata SQL, the Structured Query Language, including how to create, change, and delete databases, tables, indexes, stored procedures and macros. Chapter 5: "Data Dictionary," describes what the data dictionary is, how it is organized, and how you use it. Chapter 6: "Application Development," introduces application development in the Teradata RDBMS environment, including the use of embedded SQL and CLI calls in client programming languages. Chapter 7: "Fault Tolerance," describes fault tolerance in the Teradata RDBMS, including both hardware and software elements.

Introduction to Teradata RDBMS

iii

Preface About This Book

Chapter 8: "Concurrency Control and Transaction Recovery," describes object locking, serializing transactions, and the two-phase commit protocol for distributed databases. Chapter 9: "Security and Integrity," discusses how the Teradata RDBMS is designed to protect your data from corruption and unauthorized access. Chapter 10: "System Administration," discusses RDBMS space allocation, accounting, monitoring, and server-resident utilities. Chapter 11: "System Maximum Capacities," lists the maximum storage capacities for the system, databases and sessions on Teradata RDBMS.

Prerequisites

To gain an understanding of Teradata RDBMS, you should be familiar with the following: · · · · Basic computer technology NCR system hardware X Window System Windows 2000

iv

Introduction to Teradata RDBMS

Preface List of Acronyms

List of Acronyms

The following acronyms, listed in alphabetical order, are used in this book:

1NF 2NF 2PC 3NF 4NF 5NF AMP ANSI API ASCII ASF2 AWS BCNF BTEQ BYNET CICS CLIv2 CMS CNS DB2 DBC DBQM DBS DBW DD DDE DDL First Normal Form Second Normal Form Two-Phase Commit Third Normal Form Fourth Normal Form Fifth Normal Form Access Module Process American National Standards Institute Application Programming Interface American Standard Code for Information Interchange Archive Storage Facility 2 Administration Workstation Boyce-Codd Normal Form Basic Teradata Query Facility Banyan Network (high-speed interconnect) Customer Information Control System Call-Level Interface, Version 2 Conversational Monitor System Console Subsystem DATABASE 2 Database Computer Database Query Manager Database System or Database Software Database Window Data Dictionary Dynamic Data Exchange Data Definition Language

Introduction to Teradata RDBMS

v

Preface List of Acronyms DIP DML EBCDIC FIPS HI IBM I/O JBOD JI LAN LUN MIPS MOSI MPP MTDP MVS NIST NUPI NUSI ODBC OLAP OLTP OS/VS PDE PE PJ/NF PP2 RAID RCT RDBMS RI Database Initialization Program Data Manipulation Language Extended Binary Coded Decimal Interchange Code Federal Information Processing Standards Hash Index International Business Machines Corporation Input/Output Just a Bunch Of Disks Join Index Local Area Network Logical Unit Millions of Instructions Per Second Micro Operating System Interface Massively Parallel Processing Micro Teradata Director Program Multiple Virtual Storage National Institute of Standards and Technology Non-Unique Primary Index Non-Unique Secondary Index Open Database Connectivity Online Analytical Processing Online Transaction Processing Operating System/Virtual Storage Parallel Database Extensions Parsing Engine Projection-Join Normal Form Preprocessor2 Redundant Array of Independent Disks Resource Check Tools Relational Database Management System Referential Integrity

vi

Introduction to Teradata RDBMS

Preface List of Acronyms SIA SMP SPL SQL SQL/DS SSO TCP/IP TDP TDSP TOS TPA TS/API TSO UPI USI VM/CMS VM/SP VNET VPROC VS Shared Information Architecture Symmetric Multi-Processing Stored Procedure Language Structured Query Language Structured Query Language/Data System Single Sign On Transmission Control Protocol/Internet Protocol Teradata Director Program Teradata Stored Procedures Teradata Operating System Trusted Parallel Application Transparency Series/Application Program Interface Time Sharing Option Unique Primary Index Unique Secondary Index Virtual Machine/Conversational Monitor System Virtual Machine/System Product Virtual Network Virtual Processor Virtual Storage

Introduction to Teradata RDBMS

vii

Preface Technical Information on the Web

Technical Information on the Web

The NCR home page (http://www.ncr.com) provides links to numerous sources of information about Teradata. Among the links provided are sites that deal with the following subjects: · · · · · · · Contacting technical support Enrolling in customer education courses Ordering and downloading product documentation Accessing case studies of customer experiences with Teradata Accessing third party industry analyses of Teradata data warehousing products Accessing white papers Viewing or subscribing to various online periodicals

viii

Introduction to Teradata RDBMS

Contents

Preface Supported Software Release ............................................................................................ i Changes to This Book ....................................................................................................... i About This Book .................................................................................................................iii List of Acronyms ................................................................................................................. v Technical Information on the Web................................................................................ viii

Chapter 1: Overview What is the Teradata RDBMS? ...................................................................................... 1­2 Design Perspectives ........................................................................................................ 1­3 Development ................................................................................................................. 1­3 Design............................................................................................................................. 1­4 Shared Information Architecture ............................................................................... 1­4 Teradata Software ........................................................................................................... 1­5 Client Software.............................................................................................................. 1­5 Server Software............................................................................................................. 1­7 Structured Query Language (SQL) ............................................................................ 1­7 Stored Procedures and SPL....................................................................................... 1­10 Third-Party Software ................................................................................................. 1­10 For More Information ................................................................................................... 1­11

Chapter 2: Teradata RDBMS Architecture SMP and MPP Machines ................................................................................................ 2­2 The BYNET ....................................................................................................................... 2­3 BYNET Functions ......................................................................................................... 2­3 Virtual Processor Connectivity................................................................................... 2­4 Disk Arrays....................................................................................................................... 2­5 Drive Groups and LUNS ............................................................................................. 2­5 Pdisks ............................................................................................................................. 2­5 Virtual Disks (vdisks) .................................................................................................. 2­5 Cliques .............................................................................................................................. 2­6

Introduction to Teradata RDBMS

ix

Contents

Parallel Database Extensions (PDE).............................................................................. 2­7 The Teradata File System ............................................................................................ 2­7 Trusted Parallel Applications ........................................................................................ 2­9 Virtual Processors.......................................................................................................... 2­10 Parsing Engine (PE).................................................................................................... 2­10 AMPs ............................................................................................................................ 2­11 AMP Clusters .............................................................................................................. 2­12 Processing of SQL Requests ...................................................................................... 2­12 The Dispatcher ............................................................................................................ 2­14 SQL Example............................................................................................................... 2­15 Processing the Steps ................................................................................................... 2­16 Data Communications Management.......................................................................... 2­18 Call Level Interface..................................................................................................... 2­18 WinCLI and ODBC..................................................................................................... 2­18 Teradata Director Program (TDP)............................................................................ 2­18 Micro TDP.................................................................................................................... 2­19 Application Programming Facilities........................................................................... 2­20 Stored Procedures....................................................................................................... 2­20 Embedded SQL ........................................................................................................... 2­20 Call-Level Interface .................................................................................................... 2­20 ODBC............................................................................................................................ 2­21 Archiving and Data Loading Utilities ........................................................................ 2­22 Archive/Recovery Products ..................................................................................... 2­22 ARC and ASF2 ............................................................................................................ 2­22 Open Teradata Backup .............................................................................................. 2­22 Data Load Utilities...................................................................................................... 2­23 FastExport.................................................................................................................... 2­23 Database Utilities and Tools ..................................................................................... 2­23 Teradata Manager....................................................................................................... 2­24 Database Query Manager (DBQM).......................................................................... 2­24 Workstation Types and Available Platforms ............................................................ 2­25 Administration Workstation..................................................................................... 2­25 Database Window ...................................................................................................... 2­25 Supervisor Subwindow ............................................................................................. 2­26 RDBMS Gateway ........................................................................................................... 2­28 For More Information ................................................................................................... 2­29

Chapter 3: The Relational Model What is a Relational Model? .......................................................................................... 3­2 What is a Relational Database?...................................................................................... 3­3 Tables................................................................................................................................. 3­4

x

Introduction to Teradata RDBMS

Contents

Permanent and Temporary Tables............................................................................. 3­4 Global Temporary Tables ............................................................................................ 3­4 Volatile Temporary Tables.......................................................................................... 3­4 Maximum Number of Temporary Tables................................................................. 3­5 Normalization .................................................................................................................. 3­6 Relational Database Terminology .............................................................................. 3­7 First, Second, and Third Normal Forms ...................................................................... 3­9 First Normal Form (1NF) ............................................................................................ 3­9 Second Normal Form (2NF)........................................................................................ 3­9 Third Normal Form.................................................................................................... 3­10 Boyce-Codd and Higher Normal Forms.................................................................... 3­11 Boyce-Codd Normal Form........................................................................................ 3­11 Fourth Normal Form.................................................................................................. 3­11 Fifth Normal Form ..................................................................................................... 3­11 Referential Integrity ...................................................................................................... 3­12 Enforcing RI in the Teradata RDBMS...................................................................... 3­12 Referencing (Child) Table ......................................................................................... 3­12 Referenced (Parent) Table ......................................................................................... 3­12 Terms Related to Referential Integrity .................................................................... 3­13 Why Is Referential Integrity Important? ................................................................. 3­13 Referential Constraints .............................................................................................. 3­14 Referential Constraint Checks .................................................................................. 3­15 Indexes ............................................................................................................................ 3­16 Teradata RDBMS Indexes ......................................................................................... 3­16 Primary Indexes.......................................................................................................... 3­16 How Are Primary Keys and Primary Indexes Related? ....................................... 3­17 Secondary Indexes...................................................................................................... 3­17 Join Indexes ................................................................................................................. 3­18 Hash Indexes ............................................................................................................... 3­18 Using Indexes.............................................................................................................. 3­19 For More Information ................................................................................................... 3­20

Chapter 4: Structured Query Language (SQL) Why SQL? ...................................................................................................................... 4­1 What is SQL? ................................................................................................................. 4­2 Teradata SQL Data Types ........................................................................................... 4­2 Data Type Attributes.................................................................................................... 4­3 Statements, Requests, Queries, and Transactions.................................................... 4­4 Table Constraints.......................................................................................................... 4­4 Default Database........................................................................................................... 4­4 Heterogeneous Kanji Client Support......................................................................... 4­4

Introduction to Teradata RDBMS

xi

Contents

SQL Functional Families................................................................................................. 4­5 Data Definition.............................................................................................................. 4­5 Data Control .................................................................................................................. 4­6 Data Manipulation ....................................................................................................... 4­6 The SELECT Statement ................................................................................................... 4­8 Only SELECT Uses Set Operators .............................................................................. 4­8 Joins ................................................................................................................................ 4­9 OLAP Functions............................................................................................................ 4­9 Statement Punctuation................................................................................................. 4­9 Query Facilities .............................................................................................................. 4­11 Basic Teradata Query ................................................................................................. 4­11 Embedded SQL ........................................................................................................... 4­11 Cursors ......................................................................................................................... 4­12 SQL Flagger ................................................................................................................. 4­12 Teradata Stored Procedures......................................................................................... 4­13 Why Use Stored Procedures?.................................................................................... 4­13 Elements of SPL .......................................................................................................... 4­13 Indexes ............................................................................................................................ 4­15 Specifying an Index .................................................................................................... 4­15 Index Access Methods ............................................................................................... 4­15 Triggers ........................................................................................................................... 4­17 Trigger Functions........................................................................................................ 4­17 Restrictions on Using Triggers ................................................................................. 4­17 Views ............................................................................................................................... 4­18 Creating a View .......................................................................................................... 4­18 Restrictions on Using Views ..................................................................................... 4­18 Altering a Teradata RDBMS View ........................................................................... 4­18 For More Information ................................................................................................... 4­19

Chapter 5: Data Dictionary DD Objects ..................................................................................................................... 5­1 DD Users........................................................................................................................... 5­2 Structure of the Data Dictionary ................................................................................... 5­3 What Is a View? ............................................................................................................ 5­3 Why Use Views? ........................................................................................................... 5­3 Summary of DD Views ................................................................................................ 5­4 End User Views............................................................................................................. 5­4 Supervisory User Views .............................................................................................. 5­5 Recovery Control User Views..................................................................................... 5­5 Administrative Views .................................................................................................. 5­5

xii

Introduction to Teradata RDBMS

Contents

Using the Data Dictionary.............................................................................................. 5­6 SQL Access to DD......................................................................................................... 5­6 For More Information ..................................................................................................... 5­7

Chapter 6: Application Development Explicit SQL Development.......................................................................................... 6­1 Implicit SQL Development ......................................................................................... 6­1 Using Embedded SQL Applications............................................................................. 6­2 What Is Embedded SQL? ............................................................................................ 6­2 How Does an Application Program Use Embedded SQL?.................................... 6­2 Supported Languages and Platforms ........................................................................ 6­3 Using Macros as SQL Applications .............................................................................. 6­4 Creating a Macro .......................................................................................................... 6­4 Using a Macro ............................................................................................................... 6­4 Modifying a Macro....................................................................................................... 6­5 Deleting a Macro........................................................................................................... 6­5 Using Stored Procedures as SQL Applications........................................................... 6­6 Creating and Using Stored Procedures..................................................................... 6­6 Executing Stored Procedures ...................................................................................... 6­7 Basic Teradata Query Utility ......................................................................................... 6­9 BTEQ Support ............................................................................................................... 6­9 BTEQ Communication................................................................................................. 6­9 Call-Level Interface ....................................................................................................... 6­10 What Is the CLI? ......................................................................................................... 6­10 Extended CLI .............................................................................................................. 6­10 WinCLI......................................................................................................................... 6­11 ODBC Driver.................................................................................................................. 6­12 Queryman....................................................................................................................... 6­13 Third-Party Products .................................................................................................... 6­14 TS/API Products ........................................................................................................ 6­14 Compatible Third-Party Software Products........................................................... 6­14 The EXPLAIN Statement.............................................................................................. 6­15 How the EXPLAIN Statement Works ..................................................................... 6­15 Example: EXPLAIN With Unique Primary and Non-Unique Secondary Indexes....................................................................... 6­15 Example: EXPLAIN With Where Condition .......................................................... 6­16 For More Information ................................................................................................... 6­17

Introduction to Teradata RDBMS

xiii

Contents

Chapter 7: Fault Tolerance Software Fault Tolerance................................................................................................ 7­2 Vproc Migration............................................................................................................ 7­2 Fallback Tables.............................................................................................................. 7­3 AMP Clusters ................................................................................................................ 7­4 Journaling ...................................................................................................................... 7­5 Archive/Recovery ........................................................................................................ 7­7 Table Rebuild Utility .................................................................................................... 7­7 Hardware Fault Tolerance ............................................................................................. 7­8 Dual BYNETs ................................................................................................................ 7­8 RAID Disk Units ........................................................................................................... 7­8 Multiple Channel and LAN Connections ................................................................. 7­8 Isolation From Client Hardware Defects .................................................................. 7­8 Battery Backup .............................................................................................................. 7­9 Redundant Power Supplies and Fans........................................................................ 7­9 Hot Swap Capability for Node Components ........................................................... 7­9 Cliques............................................................................................................................ 7­9 For More Information ................................................................................................... 7­10

Chapter 8: Concurrency Control and Transaction Recovery Concurrency Control.................................................................................................... 8­1 Recovery......................................................................................................................... 8­1 Concept of a Transaction................................................................................................ 8­2 Definition of a Transaction.......................................................................................... 8­2 Definition of Serializability ......................................................................................... 8­2 Transaction Semantics ................................................................................................. 8­2 ANSI Mode Transactions ............................................................................................ 8­2 Rolling Back an ANSI Transaction............................................................................. 8­3 Teradata Mode Transactions....................................................................................... 8­3 How To Undo an Update ............................................................................................ 8­4 Concept of a Lock ............................................................................................................ 8­5 Overview of Teradata RDBMS Locking.................................................................... 8­5 Why do Database Management Systems Require Locking?.................................. 8­5 User Lock Levels........................................................................................................... 8­6 Levels of Lock Types.................................................................................................... 8­7 Teradata Automatic RDBMS Lock Levels ................................................................ 8­8 Deadlocks and Deadlock Resolution ......................................................................... 8­8

xiv

Introduction to Teradata RDBMS

Contents

Host Utility Locks............................................................................................................ 8­9 HUT Lock Types........................................................................................................... 8­9 HUT Lock Characteristics ........................................................................................... 8­9 System and Media Recovery ....................................................................................... 8­10 System Restarts ........................................................................................................... 8­10 Transaction Recovery................................................................................................. 8­10 Down AMP Recovery ................................................................................................ 8­11 Two-Phase Commit Protocol....................................................................................... 8­12 Definition of Participant ............................................................................................ 8­12 Definition of Coordinator.......................................................................................... 8­12 For More Information ................................................................................................... 8­13

Chapter 9: Security and Integrity Definition of Security ................................................................................................... 9­1 Definition of Integrity .................................................................................................. 9­1 Enforcing System Security .......................................................................................... 9­2 System Integrity............................................................................................................ 9­2 Resource Access Control ................................................................................................ 9­3 User Identifiers ............................................................................................................. 9­3 Client Identifiers ........................................................................................................... 9­3 Logon Policies ............................................................................................................... 9­3 TDP Security.................................................................................................................. 9­4 Single Sign On............................................................................................................... 9­5 Password Security Features ........................................................................................ 9­6 DBC.SysSecDefaults Table .......................................................................................... 9­7 Password Encryption................................................................................................... 9­7 Logon Control Using GRANT and REVOKE LOGON........................................... 9­7 Controlling Data Access by Granting or Revoking Access .................................... 9­7 DD System Views for Access Information................................................................ 9­8 Establishing a Security Policy and Physical Access Control..................................... 9­9 Key Elements of a Security Policy.............................................................................. 9­9 Establishing a Security Policy..................................................................................... 9­9 For More Information ................................................................................................... 9­10

Chapter 10: System Administration Space Allocation ......................................................................................................... 10­1 Accounting .................................................................................................................. 10­1 Monitoring................................................................................................................... 10­1

Introduction to Teradata RDBMS

xv

Contents

Space Allocation: Databases and Users...................................................................... 10­2 Creating Databases and Users .................................................................................. 10­2 Example: Creating a Finance and Administration Database ............................... 10­2 Example: Creating Databases ................................................................................... 10­4 Example: Creating Users ........................................................................................... 10­6 Accounting ..................................................................................................................... 10­7 Session Management.................................................................................................. 10­7 Establishing a Session ................................................................................................ 10­7 Logon Parameters....................................................................................................... 10­7 Session Requests ......................................................................................................... 10­7 Account Usage ............................................................................................................ 10­8 Account String Expansion......................................................................................... 10­8 Account Performance Groups .................................................................................. 10­8 System Monitoring ...................................................................................................... 10­10 Teradata Manager..................................................................................................... 10­10 System and Configuration Status........................................................................... 10­10 Resource Usage ......................................................................................................... 10­11 ResUsage Data .......................................................................................................... 10­11 ResUsage Macros...................................................................................................... 10­11 ResUsage Data Categories....................................................................................... 10­11 ResUsage Data in Summarization Mode .............................................................. 10­12 Controlling Collection and Logging of ResUsage Data...................................... 10­12 Performance Monitoring ............................................................................................ 10­13 The TDPTMON......................................................................................................... 10­13 System Management Facility .................................................................................. 10­13 The PM/API.............................................................................................................. 10­13 Gateway Control Utility .......................................................................................... 10­14 System Utility Software ........................................................................................... 10­14 Resource Check Tools .............................................................................................. 10­15 For More Information ................................................................................................. 10­17

Chapter 11: System Maximum Capacities Database and Data Storage Capacities ....................................................................... 11­2 System Capacities .......................................................................................................... 11­4 Session Capacities.......................................................................................................... 11­5 For More Information ................................................................................................... 11­6

xvi

Introduction to Teradata RDBMS

Contents

Glossary...............................................................................................................Glossary­1

Index ......................................................................................................................... Index­1

Introduction to Teradata RDBMS

xvii

Contents

xviii

Introduction to Teradata RDBMS

Chapter 1:

Overview

This chapter presents an overview of the Teradata Relational Database Management System (RDBMS). This includes: · · · · A definition of Teradata RDBMS A perspectives on system design A description of the software architecture An overview of the process flow of an SQL query to the database

Introduction to Teradata RDBMS

1­1

Chapter 1: Overview What is the Teradata RDBMS?

What is the Teradata RDBMS?

The Teradata RDBMS is a complete relational database management system. The system can use either of two attachment methods as illustrated in the following table:

This attachment method... Allows the system to be attached...

Channel Netword

directly to an I/O channel of a mainframe computer.

to intelligent workstations through a local area network (LAN).

With the Teradata RDBMS, you can access, store, and operate on data using Teradata Structured Query Language (Teradata SQL). It is broadly compatible with IBM and ANSI SQL. Users of the client system send requests to the Teradata RDBMS through the Teradata Director Program (TDP) using the Call-Level Interface (CLI) program (Version 2) or via Open Database Connectivity (ODBC) using the Teradata ODBC Driver.

1­2

Introduction to Teradata RDBMS

Chapter 1: Overview Design Perspectives

Design Perspectives

The Teradata Relational Database Management System (RDMBS) was designed with several objectives in mind, as viewed from the following perspectives: · · · Development Design Architecture

Each perspective is discussed in the following sections.

Development

The Teradata RDBMS was developed to allow users to view and manage large amounts of data as a collection of related tables. The goal of the Teradata RDBMS is to provide the benefits of a flexible relational database to its users. Benefits include:

Teradata RDBMS provides... Which...

capacity

includes: · Terabytes of detailed data stored in billions of rows · Thousands of Microinstructions per Second (MIPS) to process data

parallel processing single data store

makes Teradata RDBMS faster than other relational systems. · can be accessed by network-attached host systems · Supports the requirements of many diverse clients

fault tolerance data integrity

automatically detects and recovers hardware failures. ensures that transactions either complete or rollback to a stable state if a fault occurs. allows expansion without sacrificing performance. serves as a standard access language that permits customers to control data.

scalable growth SQL

Introduction to Teradata RDBMS

1­3

Chapter 1: Overview Design Perspectives

Design

Teradata RDBMS developers wanted to build an inexpensive, high-quality system from mostly off-the-shelf hardware components, which exceeds the performance of conventional relational database management systems. The hardware components of the Teradata RDBMS evolved from those of a database machine into those of a general-purpose, massively parallel machine running the Teradata RDBMS as a Trusted Parallel Application (TPA). The architecture supports both Symmetric Multi-Processing (SMP) and Massively Parallel Processing (MPP) systems where the distributed functions communicate by means of a fast interconnect structure. The interconnect structure in the current architecture is the BYNET for MPP systems and the boardless BYNET for SMP systems.

Shared Information Architecture

One of the principal goals for the design of the Teradata RDBMS was to provide a single data store for any number of client architectures. This Shared Information Architecture (SIA) eliminates the need for maintaining duplicate databases on multiple platforms. With the SIA, most mainframe clients, workstations, and personal computers can access and manipulate the same database simultaneously. The following figure illustrates the principle of the SIA. In this figure the mainframes are attached via channel connections and other systems are attached via network connections.

Bull GCOS mainframe IBM MVS mainframe

Teradata DBS single data store Local Area Network Personal Computer (Windows) 9x/NT

IBM VM mainframe

UNIX workstation

FG01d001

1­4

Introduction to Teradata RDBMS

Chapter 1: Overview Teradata Software

Teradata Software

The Teradata system software is the foundation of the relational database server. The purpose of the software is to support SQL manipulations of the database. Teradata software includes: · · · · · Client software, which includes UNICODE support Server software used to support the Teradata RDBMS Structured Query Language (SQL) and its uses for application programming and interactive database queries Stored procedures Third-party software

Client Software

Teradata RDBMS supports the UCS Transformation Format 8-bit form (UTF-8) UNICODE client character set on Teradata clients on a per session basis. Oneto three-byte UTF-8 UNICODE characters from the client are stored on Teradata RDBMS as two-byte UNICODE characters. By supporting UNICODE, the Teradata RDMBS can: · · Support data in multiple languages in the same session Support non-Latin data, for example, Japanese, Chinese, and so forth, on all systems

Teradata RDBMS supports running with or without a channel- or networkattached client. The following table describes the available client software, recognizing that the "client" may be the system running the Teradata RDBMS itself:

Software Function Supported Access

BTEQ C Preprocessor COBOL Preprocessor PL/I Preprocessor

Interactive and batch query processor/report generator Permits embedding SQL in C programs Permits embedding SQL in COBOL programs Permits embedding SQL in PL/I programs

Channel- and network-attached clients Channel- and network-attached clients Channel-attached clients Some network-attached clients Channel-attached clients

Introduction to Teradata RDBMS

1­5

Chapter 1: Overview Teradata Software

Software Function Supported Access

CLI

Packages and unpackages requests Can be embedded in application programs using function calls

Channel- and network-attached clients

TDP

Manages data communication Handles sessions, logging, recovery, restarts, physical I/O from the PEs, and security

Channel-attached clients

MTDP/MOSI

Manages data communication Handles logging, recovery, restarts, and physical I/O from the PEs Session and security management are handled by the Gateway software on the server.

Network-attached clients

Archive/ Restore ASF2 (MR-RAS products only)

Archives data to tape; restores taped data to Teradata RDBMS Archives data to tape; restores taped data to Teradata RDBMS Note: This software is supported on legacy systems only. Extracts large volumes of data from the Teradata RDBMS Does high-performance data loading from client into empty tables Does high-performance data maintenance, including inserts, updates, and deletions to existing tables Are open architecture products that provide backup and restore functions for Windows clients

Channel-attached clients

Network-attached clients

FastExport FastLoad

Channel- and network-attached clients Channel- and network-attached clients Channel- and network-attached clients

MultiLoad

Open Teradata Backup (OTB) includes the following: · NetBackup · NetVault

Network-attached clients

1­6

Introduction to Teradata RDBMS

Chapter 1: Overview Teradata Software

Software Function Supported Access

TPump

Provides for continuous update of tables; performs insert, update, and delete operations or a combination of these operations on tables using the same source feed Provides for access to the Teradata RDBMS from various tools increasing the portability of data

Channel- and network-attached clients

ODBC

Network-attached clients.

Contact your NCR representative for information on supported platforms for each product and for custom ports to other platforms.

Server Software

The server software includes the following: · · · · Database Window Channel and/or network gateway communications support Teradata Parallel Database Extensions (PDE) Teradata database management software · Request dispatcher · Session controller · Access module process (AMP) · Teradata file system SQL parser (syntaxer, resolver, optimizer, and so forth)

·

A server may also contain data loading utilities such as MultiLoad and FastLoad, data export utilities like FastExport, and the SQL data access utility BTEQ.

Structured Query Language (SQL)

Structured Query Language (SQL) is a data language designed specifically for manipulating data in relational databases. SQL is the only language that the Teradata RDBMS understands. Therefore, all database manipulations, whether embedded in an application program or resulting from an interactive query, must use SQL. The following figure shows the process flow of an SQL statement through the Teradata RDBMS on a channel-attached system. Process flow in a network-attached system is somewhat similar to that of a channel-attached system, except the Micro Operating System Interface (MOSI) and the Micro Teradata Director Program (MTDP) replace the TDP.

Introduction to Teradata RDBMS

1­7

Chapter 1: Overview Teradata Software

1

SQL query

Results table

16

2

CLI

15

3

TDP

14 Client

Parsing Engine

4

Parser and Dispatcher

Server 13

5

Interprocessor Network

12

BYNET

6

Database Management System

11

Database Engine

7

File System

10

8

Disk Subsystem

9

FG01B002

1­8

Introduction to Teradata RDBMS

Chapter 1: Overview Teradata Software

The following table describes the process flow shown in the previous figure:

Stage Process

1

A user generates an SQL query on the channel-attached client. The query can originate from a BTEQ session at an interactive terminal, from a compatible fourth-generation language, or from within an application program coded in the host language. The CLI request handler packages the request and sends it to the Teradata Director Program (TDP) for routing to the server. The TDP establishes a session, then routes the request across the communications channel to the parsing engine (PE). The parser component of the PE opens the request package and parses the SQL code for processing, interprets it, checks the syntax, evaluates the semantics, and optimizes the access plan.

IF the SQL source code parses... THEN the...

2 3 4

without errors

parser decodes the request into a series of work steps and passes them to the dispatcher. dispatcher receives the appropriate error message and returns it to the requester. Processing terminates.

with errors

The dispatcher sequences the steps and passes them to the BYNET with instructions about whether the steps are for one Access Module Process (AMP), an AMP group, or all AMPs.

5 6

The BYNET (or virtual BYNET on a single-node system) distributes the execution steps to the appropriate AMP for processing. The AMPs process the execution steps by performing operations on the database. The AMPs make these operations by making calls to the file system. The file system performs primitive physical data block operations by locating the data blocks to be manipulated and then passing control to the disk subsystem. The disk subsystem retrieves the requested blocks for the file system. The disk manager returns the requested blocks to the file system. The file system returns the requested data to the database manager. The database manager sends a message back to the dispatcher stating that the data is ready to be returned to the requesting user. Then the database manager sorts and transmits the data to the interface engine over the BYNET.

7

8 9 10 11

Introduction to Teradata RDBMS

1­9

Chapter 1: Overview Teradata Software

Stage Process

12 13 14 15 16

The BYNET (or virtual BYNET on a single-node system) merges the sorted response and returns it to the requesting interface engine for packaging. The dispatcher builds the response message and routes it to the communications channel driver for return to the requesting client system. The TDP receives and unpacks the response messages and makes them available to the CLI. The CLI passes the received data back to the requesting application in blocks. The requesting application receives the response data in the form of a relational table.

For ODBC, the process flow is the same as above, except that ODBC replaces the CLI and MTDP.

Stored Procedures and SPL

Stored procedures provide a set of control and condition handling statements that make Teradata SQL a computationally complete programming language. A stored procedure, also known as Persistent Stored Module (PSM), is an application development tool. It provides a procedural interface to the Teradata RDBMS using the Stored Procedural Language (SPL) statements. A stored procedure is a database object and is executed on the server.

Third-Party Software

You can operate on data stored in the Teradata RDBMS using third-party software products such as QMF, CICS, and IMS/DC. These products enable you to access the system without writing an application program. Typically, these software products are interfaces for specific network workstations.

1 ­ 10

Introduction to Teradata RDBMS

Chapter 1: Overview For More Information

For More Information

For more information on the topics presented in this chapter, see the following Teradata RDBMS books:

IF you want to learn more about... THEN see...

Structured Query Language Data flows through the Teradata RDBMS General aspects of the Teradata RDBMS

Teradata RDBMS SQL Reference Teradata RDBMS Database Design Teradata RDBMS Database Design

Introduction to Teradata RDBMS

1 ­ 11

Chapter 1: Overview For More Information

1 ­ 12

Introduction to Teradata RDBMS

Chapter 2:

Teradata RDBMS Architecture

The purpose of this chapter is to briefly describe the Teradata RDBMS hardware components and the software architecture they support. The hardware that supports Teradata software is based on off-the-shelf Symmetric Multiprocessing (SMP) technology combined with a communications network connecting the SMP systems to form Massively Parallel Processing (MPP) systems.

Introduction to Teradata RDBMS

2­1

Chapter 2: Teradata RDBMS Architecture SMP and MPP Machines

SMP and MPP Machines

The components of the (SMP) and Massively Parallel Processing (MPP) hardware platforms are:

Component Description Function

Processor Node

A hardware assembly containing several, tightly coupled central processing units (CPUs). Note: An MPP is a configuration of two or more loosely coupled SMP nodes with shared SCSI access to multiple disk arrays.

Comprises an SMP configuration with a single processor node connected to one or more disk arrays with the following installed on the node: · Database software · Client interface software · Operating system · Multiprocessor sharedmemory processors · Failsafe power provisions

BYNET

Hardware interprocessor network to link nodes on an MPP system. Note: Single-node SMP systems use a softwareconfigured virtual BYNET driver to implement BYNET services.

Implements broadcast, multicast, or point-to-point communication between processors, depending on the situation.

These platforms use virtual processors that run a set of software processes on a node under the Parallel Database Extensions (PDE). Virtual processors (vprocs) provide the parallel environment that enables the Teradata RDBMS to run on SMP and MPP systems.

2­2

Introduction to Teradata RDBMS

Chapter 2: Teradata RDBMS Architecture The BYNET

The BYNET

This section describes the BYNET, which is the interprocessor network technology used by the Teradata RDBMS.

BYNET Functions

At the most elementary level, you can look at the BYNET as a bus that loosely couples all the SMP nodes in a multinode system. However, this view does an injustice to the BYNET, because the capabilities of the network range far beyond those of a simple system bus. The BYNET also possesses high-speed logic arrays that provide bidirectional broadcast, multicast, and point-to-point communication and merge functions. A multinode system has two BYNETs. This creates a fault-tolerant environment and enhances interprocessor communication. When BYNET traffic becomes particularly heavy, the two BYNETs can handle separate (rather than redundant) traffic. Load-balancing software optimizes this process. The total bandwidth for each network link to a processor node is 10 megabytes. The total throughput available for each node is 20 megabytes, since each node has two network links and the bandwidth is linearly scalable. For example, a 16-node 5100M system has 320 megabytes of bandwidth for point-to-point connections. The total, available broadcast bandwidth for any size system is 20 megabytes. The BYNET software also provides a standard TCP/IP interface for communication among the SMP nodes.

Introduction to Teradata RDBMS

2­3

Chapter 2: Teradata RDBMS Architecture The BYNET

The following figure shows how the BYNET connects individual SMP nodes to create an MPP system.

BYNET Interconnect

SMP

SMP

SMP

SMP

SMP Nodes

SCSI Busses

Disk Arrays

GG01B002

Virtual Processor Connectivity

Single-node SMP systems use Boardless BYNET (or virtual BYNET) software to simulate the BYNET hardware driver. Both the SMP and MPP machines run the set of software processes called vprocs on a node under the Parallel Database Extensions (PDE). The table below describes the two types of vprocs:

Type Description

PE AMP

Performs session control and dispatching tasks as well as parsing functions Performs database functions to retrieve and update data on the virtual disks (vdisks)

Later, the chapter explains these virtual processors in more detail.

2­4

Introduction to Teradata RDBMS

Chapter 2: Teradata RDBMS Architecture Disk Arrays

Disk Arrays

A disk array is a matrix of independent but interconnected physical disk storage units. For the Teradata RDBMS, the disks are organized as a Redundant Array of Independent Disks (RAID), as RAID 1 (mirroring), or as RAID 5 or RAIDS (parity) technology using RAID Manager. Each array typically consists of from one to four ranks of disks, with up to five disks per rank.

Drive Groups and LUNS

The RAID Manager uses drive groups. A drive group is a set of drives that have been configured into one or more logical units (LUNs). All the disks in a drive group must be of the same RAID level (1,5, or S). An LUN is a portion of every drive in a drive group. This portion is configured to represent a single UNIX disk. Each LUN is uniquely identified and sliced into one or more UNIX slices.

Pdisks

A pdisk is a slice of LUN that is assigned to an AMP. Each pdisk is uniquely identified and independently addressable.

Virtual Disks (vdisks)

The group of pdisks assigned to an AMP is collectively identified as a vdisk. A vdisk is used to control the assignment of pdisks to an AMP.

Introduction to Teradata RDBMS

2­5

Chapter 2: Teradata RDBMS Architecture Cliques

Cliques

The clique is a feature of multinode systems that physically groups nodes together by multiported access to common disk array units. Inter-node disk array connections are made using SCSI buses. Shared SCSI-II paths enable redundancy to ensure that loss of a processor node or disk controller does not limit data availability. The nodes do not share data. They only share access to the disk arrays. The following figure illustrates a four-node clique.

Node 1

MCA

Q 720

Node 2

MCA

Q 720

Node 3

MCA

Q 720

Node 4

MCA

Q 720

MCA

MCA

MCA

MCA

SCSI

D A C

GG01A003

A clique is the physical medium that supports the migration of vprocs during times of node failure. If a node in a clique fails, then its vprocs migrate to another node in the clique and continue to operate while recovery occurs on their home node. PEs for channel-attached hardware cannot migrate because they are dependent on the hardware that is physically attached to the node to which they are assigned. PEs for LAN-attached connections do migrate when a node failure occurs, as do all AMPs.

2­6

Introduction to Teradata RDBMS

Chapter 2: Teradata RDBMS Architecture Parallel Database Extensions (PDE)

Parallel Database Extensions (PDE)

The PDE software is an interface layer on top of the operating system. The PDE provide the ability to: · · · · Execute vprocs (virtual processors) Run the Teradata RDBMS in a parallel environment Apply a flexible priority scheduler to Teradata RDBMS sessions Debug the operating system kernel and the Teradata RDBMS using resident debugging facilities

The PDE also enable an MPP system to: · · Take advantage of hardware features such as the BYNET and shared/reliable disk arrays Process user applications written for an operating system (on nodes and disks different from those configured for the parallel database)

A PDE Tools utility GUI is now available for NT. You can run the utility from any of the nodes Trusted Parallel Applications (TPA) and non-TPA in the configuration or from the Administrative Workstation (AWS). The following utilities are included in the GUI: · · · · · ctl Database Initialization Program (DIP) dbscontrol Supervisor window Vproc Manager

In addition, you can start, reset, and stop the PDEs using the GUI. This is a Windows NT-only feature.

The Teradata File System

The special-purpose Teradata file system is a layer of software between the Teradata RDBMS layer and the PDE layer. Teradata file system service calls allow the Teradata RDBMS to store and retrieve data efficiently regardless of low-level operating system interfaces. The data block is a disk-resident structure that contains one or more rows from the same table and is the physical I/O unit for the Teradata file system. Block sizes range between 6144 bytes and 128 KB (127.5 KB in reality), or from 12 to 255 sectors.

Introduction to Teradata RDBMS

2­7

Chapter 2: Teradata RDBMS Architecture Parallel Database Extensions (PDE)

You can set the default maximum data block size as follows:

Set this value either... Using...

as a system default individually

DBS Control utility. a table level attribute, such as CREATE or ALTER table.

2­8

Introduction to Teradata RDBMS

Chapter 2: Teradata RDBMS Architecture Trusted Parallel Applications

Trusted Parallel Applications

The PDE provide a series of parallel operating system services to a special class of tasks called a trusted parallel application (TPA). On an SMP or MPP system, the TPA is the Teradata RDBMS. On an MPP system, the TPA services include: · · · · · · Facilities to manage parallel execution of the TPA on multiple nodes Dynamic distribution of execution processes Coordination of all execution threads, whether on the same or on different nodes Balancing of the TPA workload within a clique Resident debugging facilities in addition to kernel and application debuggers A Dump Freeze Elimination feature in which all nodes remain available on the BYNET while the systems captures a PDE dump.

Introduction to Teradata RDBMS

2­9

Chapter 2: Teradata RDBMS Architecture Virtual Processors

Virtual Processors

The versatility of the Teradata RDBMS is based on virtual processors (vprocs) that eliminate dependency on specialized physical processors. Vprocs are a set of software processes that run on a node under the Teradata Parallel Database Extensions (PDE) within the multitasking environment of the operating system. A single system can support a maximum of 16,384 vprocs. The maximum number of vprocs per node can be as high as 128. Each vproc is a separate, independent copy of the processor software, isolated from other vprocs, but sharing some of the physical resources of the node, such as memory and CPUs. Multiple vprocs can run on an SMP platform or a node. Vprocs and the tasks running under them communicate using unique-address messaging, as if they were physically isolated from one another. This message communication is done using the Boardless BYNET Driver software on singlenode platforms or BYNET hardware and BYNET Driver software on multinode platforms. The following sections describe each type of vproc.

Parsing Engine (PE)

The Parsing Engine is the virtual processor that communicates with the client system on one side and with the AMPs (via the BYNET or Boardless BYNET) on the other side. Each PE executes the database software that manages sessions, decomposes SQL statements into parallel steps, and returns the answer rows to the requesting client.

2 ­ 10

Introduction to Teradata RDBMS

Chapter 2: Teradata RDBMS Architecture Virtual Processors

The software, as shown in the following figure consists of the following elements:

Parsing Engine Elements Process

Parser (including the Optimizer) Dispatcher

Decomposes SQL into relational data management processing steps Receives processing steps from the Parser and sends them to the appropriate AMPs Provides user session management such as establishing and terminating sessions

Session Control

Parser (including Optimizer) Dispatcher PDE Operating System

Session Control

GG01B029

AMPs

The Access Module Processor (AMP) is the heart of the Teradata RDBMS. The Access Module Processor is a virtual processor (vproc) that provides a BYNET interface and performs many database and file-management tasks. AMPs control the management of the Teradata RDBMS and the disk subsystem, with each AMP being assigned to a virtual disk. Each AMP controls the following set of functions: · · BYNET (or Boardless BYNET) interfacing Database management · Locking · Joins · Sorting · Aggregation · Output data conversion · Disk space management · Accounting · Journaling

Introduction to Teradata RDBMS

2 ­ 11

Chapter 2: Teradata RDBMS Architecture Virtual Processors

·

File-system and disk management

Each AMP, as shown in the following figure is assigned a portion of the database to control. Each AMP also maintains and stores its portion of the database tables on disks.

Relational Database Management File System/Data Management PDE Operating System

GG01B028

AMP Clusters

AMPs are grouped into logical clusters to enhance the fault- tolerance capabilities of the Teradata RDBMS. For more information on this method of creating additional fault tolerance in a system see Chapter 7: "Fault Tolerance."

Processing of SQL Requests

The SQL parser handles all incoming SQL requests. It processes an incoming request as follows:

Stage Process

1

The Parser looks in the Request cache to determine if the request is already there.

IF the request is... THEN the Parser...

in the Request cache not in the Request cache

generates AMP steps and passes them to the gncApply software begins processing the request with the Syntaxer

2 ­ 12

Introduction to Teradata RDBMS

Chapter 2: Teradata RDBMS Architecture Virtual Processors

Stage Process

2

The Syntaxer checks the syntax of an incoming request.

IF there are... THEN the Syntaxer...

no errors errors

converts the request to a parse tree and passes it to the Resolver passes an error message back to the requestor

3

The Resolver adds information from the Data Dictionary cache to convert database, table, view, stored procedure and macro names to numeric identifiers. Then the resolver produces lists of objects and access rights. The output is a Resolver tree, which the Resolver passes to a security-checking mechanism. The security module checks access rights in the Data Dictionary.

IF the access rights are... THEN the Security module...

4

valid not valid

passes the request to the Optimizer aborts the request

5 6 7

The Optimizer determines the most effective way to access the data needed by a request. The Optimizer scans the request to determine where locks should be placed, then passes the optimized parse tree to the Generator. The Generator transforms the optimized parse tree into plastic steps and passes them to the gncApply software. Plastic steps are directives to the database management system that do not contain data values.

8

gncApply takes the plastic steps produced by the Generator and transforms them into concrete steps. Concrete steps are directives to the database management system that contain user- and session-specific information as well as data parcels.

9

gncApply passes the concrete steps to the Dispatcher.

Introduction to Teradata RDBMS

2 ­ 13

Chapter 2: Teradata RDBMS Architecture Virtual Processors

The Dispatcher

The Dispatcher controls the sequence in which steps are executed. It also passes the steps to the BYNET (or Boardless BYNET on single-node systems) to be distributed to the AMP database management software as follows:

Stage Process

1 2

The Dispatcher receives concrete steps from gncApply. The Dispatcher places the first step on the BYNET (or Boardless BYNET); tells the BYNET whether the step is for one AMP, several AMPS, or all AMPs; and waits for a completion response. Whenever possible, the Teradata RDBMS performs steps in parallel to enhance performance.

3

The Dispatcher receives a completion response from one or several AMPs and places the next step on the BYNET. It continues to do this until all the AMP steps associated with a request are done.

Note that the Dispatcher can send AMP steps in any one of the following ways: · · · Between one PE and one AMP using the hashing algorithm Among a selected group of AMPs (referred to as a dynamic BYNET or Boardless BYNET) Among all AMPs in the system

2 ­ 14

Introduction to Teradata RDBMS

Chapter 2: Teradata RDBMS Architecture Virtual Processors

The BYNET software controls the transmission of messages to and from the AMPs. See the figure where 12 rows of a table are distributed among disks attached to four AMPs. In the following figure, if a request is for a single row, the PE transmits steps to a single AMP, as shown at PE 1. If the request is for many rows (an all-AMP request), the PE makes the BYNET broadcast the steps to all AMPs as shown at PE2. To minimize system overhead, the PE can send a step to a subset of AMPs.

PE 1 BYNET or Boardless BYNET

PE 2

AMP 1

AMP 2

AMP 3

AMP 4

Disk R1, R5, R9

Disk R2, R6, R10

Disk R3, R7, R11

Disk R4, R8, R12

HD14A001

SQL Example

As an example, consider the following Teradata SQL statements from a table used for checking account information:

1. SELECT * FROM Table_01 WHERE AcctNo = 129317 ; 2. SELECT * FROM Table_01 WHERE AcctBal > 1000 ;

In this example: · · · PEs 1 and 2 receive requests 1 and 2. The data for account 129317 is contained in table row R9 and stored on AMP. Information about all account balances is distributed evenly among the disks of all four AMPs

Introduction to Teradata RDBMS

2 ­ 15

Chapter 2: Teradata RDBMS Architecture Virtual Processors

Stage

Process

1 2

The PE 1 Parser determines that the request is a primary index retrieval, which calls for the access and return of one specific row. The Dispatcher in PE 1 issues a message to the BYNET containing an appropriate read step and R9/AMP 1 routing information. Once AMP 1 receives the desired record, PE 1 transmits the data back to the TDP. The PE 2 Parser determines that this is an all-AMPs request, then issues a message to the BYNET containing the appropriate read step to be broadcast to all four AMPs. Once the AMPs receive the results, PE 2 transmits the data back to the TDP.

3

4

To enhance system performance, the RDBMS executes steps in parallel whenever possible. Parallel steps can work with multi-statement requests, macros, and single statements and can provide a significant improvement in response time.

Processing the Steps

The AMPs are responsible for obtaining the rows required to process the request. An AMP step can be sent using one of the following methods: · · · Between one PE and one AMP using the hashing algorithm Among a selected set of AMPs, called a dynamic BYNET group Among all AMPs in the system

An AMP step is broadcast to all AMPs when a full-table scan is requested or when the operation uses Non-Unique Secondary Indexes (NUSIs). When an operation uses a Unique Primary Index (UPI), Non-Unique Primary Index (NUPI), or Unique Secondary Index (USI), the message includes the row hash value, which the BYNET uses to route the message to the correct vproc.

2 ­ 16

Introduction to Teradata RDBMS

Chapter 2: Teradata RDBMS Architecture Virtual Processors

The sequence of AMP step processing is as follows:

Step Step Name Function

1

Lock

Ensures that users who are concurrently trying to update the same rows do not violate the consistency of the data. If the operation uses a UPI, NUPI, or USI, this step is incorporated into step 2.

2

Operation

Performs the actual task required: select, delete, insert, update, or sort. Subsequent processing may include many operation steps.

3

End transaction

Required only for multiple AMP steps If the request is for a UPI, processing requires no end transaction step. The end transaction step tells all AMPs that worked on the request that processing is complete.

Each AMP is associated with disks and uses its file system software to control the reading and writing of data on its disks. The file system controls primitive physical data block reads and translates AMP software row requests into physical data block requests.

Introduction to Teradata RDBMS

2 ­ 17

Chapter 2: Teradata RDBMS Architecture Data Communications Management

Data Communications Management

This section discusses the Teradata RDBMS components that support all data communication management as follows: · · · · Call Level Interface (CLI) WinCLI and ODBC Teradata Director Program (TDP) Micro TDP

Call Level Interface

SQL requests from a client-based user, whether made as an interactive query or sent from an application program, are transmitted in the form of Call Level Interface (CLI) packet messages. These packages are uniform blocks that are routed to the server by the Teradata Director Program in IBM mainframe configurations or by the Micro Teradata Director Program (MTDP) in network configurations. Result tables returned to the requesting terminal or application are similarly routed by the TDP to the appropriate requester where they are unpackaged and presented as a results table.

WinCLI and ODBC

Personal computers running Microsoft Windows® can use the Windows CLI (WinCLI) package to access the Teradata RDBMS. WinCLI uses the Dynamic Data Exchange (DDE) protocol to communicate with application programs. The industry-standard ODBC driver to the Teradata RDBMS is another API used for packaging and unpackaging SQL requests.

Teradata Director Program (TDP)

In the Teradata RDBMS, the data communications manager for channelattached clients is called the TDP. The TDP does the following procedures: · · · · · · Establishes and manages session control Routes requests Routes logons Verifies users Initiates recovery and restart processing Monitors and controls security

The Teradata RDBMS also provides facilities to enable the TDP to communicate with client application services.

2 ­ 18

Introduction to Teradata RDBMS

Chapter 2: Teradata RDBMS Architecture Data Communications Management

Micro TDP

Workstation clients run a version of the TDP called the Micro TDP (MTDP) and an additional component called the Micro Operating System Interface (MOSI). MOSI contains libraries of procedures to handle operating-system-dependent and communications-protocol-dependent services. The MTDP calls MOSI routines for system services like: · · · Interrupt processing I/O processing Network connection and processing

Introduction to Teradata RDBMS

2 ­ 19

Chapter 2: Teradata RDBMS Architecture Application Programming Facilities

Application Programming Facilities

This section discusses the application programming facilities provided by the Teradata RDBMS software.This software falls into several broad categories: · · · · Embedded SQL Stored procedures Call Level Interface ODBC

All application programming facilities ultimately make their queries against the database using the SQL, because it is the only language that the Teradata RDBMS understands.

Stored Procedures

Using stored procedures, you can build large and complex database applications. In addition to a set of Stored Procedure Language (SPL) procedural statements and condition handling statements, a stored procedure can contain the following: · · · · · · Multiple input and output parameters Local variables SQL transaction statements SQL Data Definition Language (DDL), SQL Data Control Language (DCL), and SQL Data Manipulation Language (DML) statements Dynamic SQL statements LOCKING modifier

For more information on stored procedures and the benefits of using applications based on them, see "Teradata Stored Procedures" on page 4-13.

Embedded SQL

Using a C, COBOL, or PL/I preprocessor from certain platforms, you can write an application program that contains embedded Teradata SQL requests and instructions for the Teradata RDBMS in source code.

Call-Level Interface

The Call-Level Interface (CLI) is an application programming interface providing facilities that enable any client application programming language that supports a CALL statement to query the Teradata RDBMS. NCR servers running the Teradata RDBMS also directly support the CLI. A Windows®-based version of CLI, called WinCLI, is also available.

2 ­ 20

Introduction to Teradata RDBMS

Chapter 2: Teradata RDBMS Architecture Application Programming Facilities

ODBC

Open Database Connectivity (ODBC) is an industry standard application programming interface, which can be used with Microsoft Windows®, Windows® 95, Windows® 2000, and UNIX to make SQL queries against a Teradata RDBMS database. The ODBC Driver for the Teradata RDBMS provides Core-level SQL and Extension-level 1 (with some Extension-level 2) function call capability using the Windows® Sockets (WinSock) TCP/IP communications software interface. ODBC operates independently of CLI and WinCLI. An additional software package, the Database Query Manager, permits the Teradata RDBMS to manage queries received from ODBC in addition to other interfaces including: · · · Basic Teradata Query (BTEQ) Call Level Interface (CLI) Java Database Connectivity (JDBC)

For more information, see "Database Query Manager (DBQM)" on page 2-24.

Introduction to Teradata RDBMS

2 ­ 21

Chapter 2: Teradata RDBMS Architecture Archiving and Data Loading Utilities

Archiving and Data Loading Utilities

The Teradata RDBMS offers a wide variety of utilities, tools, and peripherals. With these you can load and backup data, save dumps, and investigate and control the Teradata RDBMS configuration, user sessions, and various aspects of its operation and performance. Data-loading utilities are typically used in a decision support environment where the client machine gathers data during the day and dumps it to the server at night. This topic briefly describes these utilities.

Archive/Recovery Products

Teradata offers the following backup utilities: · · · Archive/Recovery (ARC) (for channel-attached and network attached systems) ASF2 (only exists on legacy network-attached systems) Open Teradata Backup products including: · NetBackup (network-attached systems) · NetVault (network-attached systems)

ARC and ASF2

The Archive/Recovery utility and the Archive Storage Facility (ASF2) tape unit support archiving of databases, individual tables, or permanent journals to any of the following media: · · · 3500/4500/5100 tape (ASF2 only on legacy systems) Client tape Client file

ASF2 runs ARC and connects to the Teradata RDBMS through a network attachment. ASF2 is a standalone unit that backs up data to its own tape unit. These utilities also restore databases from those archival media to the Teradata RDBMS.

Open Teradata Backup

Open Teradata Backup (OTB) supports open architecture products that provide backup and restore functions for Windows clients. Products included in OTB include: · NetBackup NetBackup for Teradata supports parallel backups and restores coordinated across multiple hosts connected to a single Teradata RDBMS. The full functionality of the NetBackup server and the multiple media servers is

2 ­ 22

Introduction to Teradata RDBMS

Chapter 2: Teradata RDBMS Architecture Archiving and Data Loading Utilities

·

realized in this product. In addition, this product uses an Administrative Host, which contains a graphical user interface to provide object browsing and selection, automatic script generation and centralized job monitoring. NetVault The NetVault Teradata Module is a backup system that allows you to graphically select databases and tables and specify the kinds of backups (distributed, online, and so forth) you want to perform.

Data Load Utilities

Data load utilities perform the following functions: · The MultiLoad utility loads large quantities of data into unpopulated tables. MultiLoad also supports bulk inserts, updates, and deletions against populated tables. Both the client and server environments support MultiLoad. The FastLoad utility loads unpopulated tables only. Both the client and server environments support FastLoad. The TPump utility uses standard SQL/DML to maintain data in tables. It also contains a method whereby the user can specify the percentage of system resources to be used for the operations on tables. This allows background maintenance for insert, delete, and update operations to take place at any time of day.

· ·

FastExport

FastExport provides parallel export of data. The FastExport utility exports large quantities of data from the Teradata RDBMS to a client and is the functional complement of the FastLoad and MultiLoad utilities.

Database Utilities and Tools

Database utilities and tools include: · Utilities for investigating active sessions and the state of the Teradata RDBMS configuration, such as Query Session, Query Configuration, Gateway Control, and Ferret The Priority Scheduler Facility, for reviewing in realtime the resources being allocated to each performance group (if you assign your user accounts to performance groups) Tools for monitoring the state of the database and its resources during a Teradata RDBMS reset, including tools for saving dumps, the Vproc Manager utility for administering processor states, and the Recovery Manager utility for monitoring transaction recovery activities

·

·

Introduction to Teradata RDBMS

2 ­ 23

Chapter 2: Teradata RDBMS Architecture Archiving and Data Loading Utilities

Teradata Manager

The Teradata Manager runs on a network-attached PC running WindowsNT. The functions of the Teradata Manager include: · · · · · Sending an alert when CPU usage and parallelism are not optimal Monitoring the resource utilization of active sessions within a hierarchical structure, which you define Listing and, optionally, aborting idle sessions Displaying charts of dynamic utilization statistics Investigating historical resource utilization

Database Query Manager (DBQM)

The Database Query Manager (DBQM) is an application that lets you control the queries entering the Teradata RDBMS system. The DBQM functions for rules processing and SQL validation are integrated in Teradata RDBMS. You can create restrictions based on one or more of the following parameters: · · · Date and time Names, such as user and group logon ids and the names of databases and database objects (tables, views, stored procedures, and macros) The kind of resources involved in the query, such as: · Processing time · Number of rows returned · Joins or full-table scans

SQL queries entering the system, regardless of the source of the request, can be blocked, including queries received from: · · · · Basic Teradata Query (BTEQ) Call Level Interface (CLI) Open Database Connectivity (ODBC) Java Database Connectivity (JDBC)

All database information is maintained in a user database, not within Teradata RDBMS system tables. Note: The preceding description is for the integrated version of DBQM. The non-integrated version, which operates only on ODBC requests, is still available. For performance reasons, Teradata recommends using one version or the other, not both.

2 ­ 24

Introduction to Teradata RDBMS

Chapter 2: Teradata RDBMS Architecture Workstation Types and Available Platforms

Workstation Types and Available Platforms

Some of the workstation types are only available on specific platforms. The following table shows which workstations are appropriate for the different platforms and how workstations are connected to the node.

Type of Workstation Platform Description

System console Administration Workstation Remote connection through LAN: · UNIX workstation · PC with X Windows server

SMP MPP SMP and MPP

Connected directly to the SMP node LAN-connected through an Ethernet card on the node LAN-connected through an Ethernet card on the node

Administration Workstation

The Administration Workstation (AWS) performs many of the functions of a system console for MPP systems. The AWS is an intelligent workstation whose primary roles are to: · · · Provide a single-system view in the multinode environment Monitor system performance Provide an input mechanism for the system administrator

Database Window

The system console provides all of the following functions: · · · · Display of system status Display of current system configuration Display of performance statistics Control of various AMP utilities

The console software for the Teradata RDBMS is called the Database Window (DBW). Use the DBW to access utilities and tools for administering and monitoring the Teradata RDBMS. The Database Window provides an interface to all the following windows: · · · Supervisor Database Message Application Windows (including any currently active support utilities)

Introduction to Teradata RDBMS

2 ­ 25

Chapter 2: Teradata RDBMS Architecture Workstation Types and Available Platforms

The DBW communicates with the Teradata RDBMS through the console subsystem (CNS), which is part of the PDE. You will sometimes see CNS messages in the DBW, because the CNS manages the DBW. The DBW consists of a main window and several subwindows. The principal subwindow, called the Supervisor Subwindow, permits an operator to run utilities and enter various commands.

Supervisor Subwindow

The Supervisor Subwindow provides access to many utilities used to control, monitor, and configure the RDBMS. The following table lists the commands available from the Supervisor Subwindow of the DBW.

Command Function

CNSGET CNSSET DBWTIMEOUT CNSSET LINES CNSSET STATEPOLL CNSSET TIMEOUT

Displays the setting for the CNS variables Sets how often the CNS checks the connection between itself and the DBW Sets the number of lines that are saved and available to you in the output display area after a reconnect to the CNS Sets how often the CNS checks the RDBMS state and substate Sets the interval between the time you type a request and the time the DBW rejects it, should a program fail to solicit the input Prevents new sessions from logging on Restores the ability for new sessions to log on Displays the current system configuration Displays the status of logging to the specified resource usage tables Displays the resource collection and logging rates, and the memory-clearing rate of a vproc or node Displays the current date and time Displays the PDE and RDBMS version numbers Logs the specified text into the error log Displays the current state of the RDBMS Restarts the RDBMS Enables or disables logging to the specified resource usage tables

DISABLE LOGONS ENABLE LOGONS GET CONFIG GET LOGTABLE GET RESOURCE GET TIME GET VERSION LOG QUERY STATE RESTART TPA SET LOGTABLE

2 ­ 26

Introduction to Teradata RDBMS

Chapter 2: Teradata RDBMS Architecture Workstation Types and Available Platforms

Command Function

SET RESOURCE START STOP

Sets the resource collection and logging rates, and the memory buffer clearing rate of a vproc or node Starts a RDBMS utility in a DBW application subwindow Stops a RDBMS utility in a DBW application subwindow

Introduction to Teradata RDBMS

2 ­ 27

Chapter 2: Teradata RDBMS Architecture RDBMS Gateway

RDBMS Gateway

The RDBMS Gateway is a server-resident program that provides a pathway for applications running on a network-connected client to access the Teradata RDBMS. The gateway runs as a separate operating system task and is the interface between the network and Teradata RDBMS. The RDBMS Gateway permits clients running locally to communicate with the Teradata RDBMS. The Gateway software validates messages from clients that generate sessions over the network. The session capabilities for network-attached clients are as follows: · · · · · · The maximum number of gateways per node is one. You can define the maximum number of network sessions (from 1-1200) that can be logged on per node. The maximum per network host group (the set of PEs assigned to one logical network attachment) is 1200 sessions. Each logical network attachment requires at least one PE. Each PE can support up to 128 sessions. Each logical network attachment requires 10 PEs to support the maximum, 1200-session capability of the network host group.

Configure the number of sessions per gateway through the gtwcontrol utility. Theoretically, the number is unlimited, but TCP limits the number of concurrent TCP connections allowed. The INET package configures some, usually 128, of the TCP sessions, but not all of these are available, because the system uses some of the sessions.

2 ­ 28

Introduction to Teradata RDBMS

Chapter 2: Teradata RDBMS Architecture For More Information

For More Information

For more information on the topics presented in this chapter, see the following Teradata RDBMS books.

IF you want to learn more about... THEN see...

System process flows Teradata SQL and Stored Procedures General Teradata software architecture The Teradata Director Program Preprocessor2 Embedded SQL

Teradata RDBMS Database Design Teradata RDBMS SQL Reference Teradata RDBMS Database Design Teradata TDP Reference Teradata Application Programming (PP2) · Teradata RDBMS SQL Reference (volume 6, "Data Manipulation Statements") · Teradata Application Programming (PP2) CLI for Channel Attached Systems (CAS) CLI for Network Attached Systems (NAS) Getting Started with Teradata Manager

Call Level Interface programming Teradata Manager

· · ·

· Teradata Manager Installation Guide

Introduction to Teradata RDBMS

2 ­ 29

Chapter 2: Teradata RDBMS Architecture For More Information

2 ­ 30

Introduction to Teradata RDBMS

Chapter 3:

The Relational Model

This chapter describes the relational model for database management including normalization, referential integrity, and referential constraints. Database management systems based on the hierarchical, network, and objectoriented models use different languages to define and manipulate a database.

Introduction to Teradata RDBMS

3­1

Chapter 3: The Relational Model What is a Relational Model?

What is a Relational Model?

The relational model for database management is based on concepts derived from the mathematical theory of sets. Roughly speaking, set theory defines a table as a relation. Each entity in a relation is called a tuple, and each column is called an attribute. The number of tuples (rows) is the cardinality of the relation, and the number of attributes (columns) is the degree. Any manipulation of a table in a relational database has a consistent, predictable outcome, because the mathematical operations on relations are well defined. By way of comparison, database-management products based on hierarchical, network, or object-oriented architectures are not built on rigorous theoretical foundations. Therefore, the behavior of such products is not as predictable as that of relational products. The SQL optimizer uses relational algebra to build the most efficient access path to requested data. The optimizer can readily adapt to changes in system variables by rebuilding access paths without programmer intervention. This adaptability is necessary because database definitions can change from time to time.

3­2

Introduction to Teradata RDBMS

Chapter 3: The Relational Model What is a Relational Database?

What is a Relational Database?

Users perceive a relational database as a collection of tables, which are easily manipulated. These tables correspond to the following relations:

Set Theory Term Relational Database Term

Relation Tuple Attribute

Table Row (or record) Column

Note: Relational databases are a generalization of the mathematics of set theory relations. Thus, the correspondences between set theory and relational databases are not always direct.

Introduction to Teradata RDBMS

3­3

Chapter 3: The Relational Model Tables

Tables

Two-dimensional tables consisting of rows and columns are the format in which data is organized and presented to users of a relational database. References between tables define the relationships and constraints of data inside the tables themselves. This section describes the two types of tables, permanent and temporary, that users create using SQL data definition language (DDL) capabilities.

Permanent and Temporary Tables

Users can store the results of multiple queries of the Teradata RDBMS in tables, or they can load tables directly with data. Permanent storage of tables is necessary when different sessions and users must share table contents. When tables are required for only a single session, the system creates temporary tables. With this type of table, a user can save query results for use in subsequent queries within the same session. Also, a user can break down complex queries into smaller queries by storing results in a temporary table for use during the same session. When the session ends, the system automatically drops the temporary table.

Global Temporary Tables

Global temporary tables are tables that exist only for the duration of the SQL session in which they are used. The contents of these tables are private to the session, and the system automatically drops the table at the end of that session. However, the system saves the global temporary table definition permanently in the data dictionary. In addition, global temporary tables allow the database administrator to define a template in the schema, which a user can reference for their exclusive use during a session.

Volatile Temporary Tables

A volatile temporary table resides in memory but does not survive across a system restart. If a user needs a temporary table for a single use only, they should define a volatile temporary table. Using volatile temporary tables improves performance even more than using global temporary tables, since the system does not store the definitions of volatile temporary tables in the data dictionary. Moreover, users require no privilege to access volatile temporary tables. For a detailed discussion of the syntax for creating the definition of a temporary table, see to "SQL Data Definition, Control, and Manipulation" in Teradata RDBMS SQL Reference, Vol 1.

3­4

Introduction to Teradata RDBMS

Chapter 3: The Relational Model Tables

Maximum Number of Temporary Tables

The maximum number of temporary tables is provided in the following table:

The maximum number of... Is...

global temporary tables (materialized) volatile tables

2000 per session. 1000 per session.

Introduction to Teradata RDBMS

3­5

Chapter 3: The Relational Model Normalization

Normalization

Normalization is the process of reducing a complex data structure into a simple, stable one. Generally this process involves removing redundant attributes, keys, and relationships from the conceptual data model. Normalization theory is constructed around the concept of normal forms that define a system of constraints. If a relation meets the constraints of a particular normal form, we say that relation is "in normal form." Think of normal forms as an onion, with the outermost layer being the set of all relations, including unnormalized relations. As you work your way to the core of the onion, you must pass through each lower normal form. As a result, a relation that has achieved fifth normal form has also achieved first, second, third, and fourth normal forms. By definition, a relational database is always normalized to some degree, because the field values are always atomic. But to simply leave it at that invites a number of problems including redundancy and potential update anomalies. The higher normal forms were developed to correct those problems.The following figure illustrates the layers of normalization.

All relations 1NF relations 2NF relations 3NF relations

BCNF relations

4NF relations 5NF relations

FG04A001

3­6

Introduction to Teradata RDBMS

Chapter 3: The Relational Model Normalization

Relational Database Terminology

The table below defines some important terms in order to make the following discussion of normal forms easier to understand:

Term Definition

Primary key

A unique identifier for a relation. Set theory (and relational database theory), does not allow duplicate rows in a primary key. However, commercially available relational databases often allow duplicate rows in relations. In those cases, the relation does not have a primary key. Relations with a primary (or candidate) key do not permit duplicate rows. The Teradata RDBMS permits enforcement of the no duplicates rule even when no primary key is specified.

Candidate key

One of multiple unique identifiers for a relation. Any relation might have multiple unique identifiers. A candidate key must satisfy the properties of uniqueness and minimality. That is, for any attribute, no 2 rows of the table may have the same value for that attribute, and if it is composite, no component can be eliminated without destroying the uniqueness property.

Alternate key Foreign key

Any candidate key not chosen as the primary key. A primary key in another relation that is also a column value in the current relation. Foreign keys are used to join tables and may be part of the primary key. Attribute X is functionally dependent on attribute Y if and only if each Y value in the relation has associated with it exactly one X value. Attribute X is fully functionally dependent on attribute Y if and only if it is functionally dependent on Y and not functionally dependent on any proper subset of Y. A state in which an attribute is fully functionally dependent but by means of an intermediate attribute. Transitive dependence is a state that normalization strives to eliminate. Any attribute on which some other attribute is fully functionally dependent. Given a relation with attributes X, Y, and Z, multivalued dependence holds if and only if the set of Y-values matching a given (X-value, Z-value) pair depends only on the X-value and is independent of the Z-value.

Functional dependence Full functional dependence Transitive dependence Determinant Multivalued dependence

Introduction to Teradata RDBMS

3­7

Chapter 3: The Relational Model Normalization

Term Definition

Join Join dependency

An operation in which data is retrieved from more than one table. A relation satisfies join dependency if and only if it is equal to the join of its projections on its component attributes.

3­8

Introduction to Teradata RDBMS

Chapter 3: The Relational Model First, Second, and Third Normal Forms

First, Second, and Third Normal Forms

This section describes the first three normal forms, including what they are, why we need them, and how to achieve them. These first three normal forms are stepping stones to the Boyce-Codd normal form and, when appropriate, the higher normal forms. The next section describes Boyce-Codd (BCNF) and higher normal forms.

First Normal Form (1NF)

First normal form (1NF) is definitive of a relational database. If we are to consider a database relational, then all relations in the database must be in 1NF. We say a relation is in 1NF if all fields within that relation (simple domains in mathematics) are atomic. This means that a field can contain one and only one value. We sometimes refer to this concept as the elimination of repeating groups from a relation. Furthermore, first normal form allows no hierarchies of data values. The formal definition of first normal form is as follows: For a relation to be in 1NF, the relationship between the primary key of the relation and each of the other attributes must be one-to-one (in that direction). In other words, all underlying simple domains of the relation may contain atomic values only. In this way, the nonkey attributes are functionally dependent on the key. Note: A nonkey attribute is any attribute that is not part of the primary key for the relation.

Second Normal Form (2NF)

Second normal form (2NF) deals with the elimination of circular dependencies from a relation. We say a relation is in 2NF if it is in 1NF and if every nonkey attribute is fully dependent on the entire primary key. The formal definition of second normal form is as follows: For a relation to be in 2NF, the relationship between any portion of the primary key of a relation and each of the other columns must not be one-to-one (in that direction). In other words, the nonkey columns are fully functionally dependent on the primary key.

Introduction to Teradata RDBMS

3­9

Chapter 3: The Relational Model First, Second, and Third Normal Forms

Third Normal Form

Third normal form (3NF) deals with the elimination of nonkey attributes that do not describe the primary key. The formal definition of third normal form is as follows: For a relation to be in 3NF, the relationship between any two nonprimary key columns or groups of columns in a relation must not be one-to-one in either direction. In other words, the nonkey columns are nontransitively dependent upon each other and the key. No transitive dependencies implies no mutual dependencies. We say attributes are mutually independent if none of them is functionally dependent on any combination of the others. This mutual independence ensures that we can update individual attributes without any danger of affecting any other attribute in a row.

3 ­ 10

Introduction to Teradata RDBMS

Chapter 3: The Relational Model Boyce-Codd and Higher Normal Forms

Boyce-Codd and Higher Normal Forms

When the relational model of database management was originally proposed, it only addressed the first three normal forms. Later work with the model showed that 3NF required further refinement to ensure that update anomalies would never occur. This section describes Boyce-Codd normal form and briefly mentions fourth and fifth normal forms for completeness.

Boyce-Codd Normal Form

Third normal form does not handle situations in which a relation has multiple composite candidate keys with overlapping attributes. To eliminate these problems, Codd developed the so-called Boyce-Codd normal form (BCNF), which reduces to 3NF whenever the special situation that defines this problem does not apply. A relation is in BCNF if and only if every determinant is a candidate key. This means that only determinants are candidate keys.

Fourth Normal Form

We say a relation is in fourth normal form (4NF) if and only if, whenever a multivalued dependency exists in the relation (for example, say X multiply determines Y), then all attributes of the relation are also functionally dependent on X. In practice, we rarely see the need for 4NF.

Fifth Normal Form

So far it has been possible to normalize a relation by decomposing it into two of its projections. In rare occasions, simple projections are not sufficient to decompose a nonnormal relation into two relations. In these rare instances, we use fifth normal form (5NF) to decompose the unnormalized relation into three or more projections of the original relation. We say a relation is fifth normal form (5NF - sometimes called projection-join normal form, or PJ/NF) if and only if every join dependency in the relation is a consequence of the candidate keys of the relation. This makes 5NF the final possible normal form to be achieved by taking projections and using joins. It is guaranteed to be free of all anomalies that can be removed by taking projections, but not necessarily of all possible anomalies.

Introduction to Teradata RDBMS

3 ­ 11

Chapter 3: The Relational Model Referential Integrity

Referential Integrity

Referential integrity (RI) is the concept of relationships between tables, based on the definition of a primary key and a foreign key. RI allows specification of columns within a referencing table that are foreign keys for columns in some other referenced table. You must define referenced columns as either primary key columns or unique columns. RI is a reliable mechanism, which prevents accidental database corruptions when users perform inserts, updates, and deletes. Referential integrity states that a row cannot exist in a table with a non-null value for a referencing column if an equal value does not exist in a referenced column.

Enforcing RI in the Teradata RDBMS

To implement RI in the Teradata RDBMS, you have three choices: · · · Use the referential constraint checks supplied by the database software. Write your own, site-specific macros. Enforce constraints through application code.

Referencing (Child) Table

We call the referencing table the Child table, and we call the specified Child table columns the referencing columns. Referencing columns must have the same numbers and types of columns, data types, and sensitivity as the referenced table keys.

Referenced (Parent) Table

A Child table must have a parent table, and the referenced table is referred to as the Parent table. The parent key columns that exist in the Parent table are the referenced columns. Since the referenced columns are defined as unique constraints, the referenced columns must be one of the following: · · A unique primary index (UPI), not null A unique secondary index (USI), not null

3 ­ 12

Introduction to Teradata RDBMS

Chapter 3: The Relational Model Referential Integrity

Terms Related to Referential Integrity

We use the following terms to explain the Referential Integrity concept:

Term Definition

Child Table Parent Table Primary Key Foreign Key

A table where the referential constraints are defined. Also called the "referencing table." The table being referenced by a Child table. Also called the "referenced table." Uniquely identifies a row of a table. Refers to column(s) in the Child table. May consist of up to 16 columns.

Why Is Referential Integrity Important?

Referential integrity is important, because it keeps you from accidentally corrupting your database. Suppose you have a table like the following: O RDER PART

Order Number Part Number Quantity

PK FK 1 1 2 FK 1 2 1 Not Null 110 275 152

Part number and order number, each foreign keys in this relation, also form the composite primary key. Suppose you were to go the PART NUMBER table and delete the row defined by the primary key value 1. The foreign key for the first and third rows in the ORDER PART table would now be corrupted, because there would be no row in the PART NUMBER table with a primary key of 1 to support it. Such a situation shows a loss of referential integrity. Now, suppose you had a mechanism to prevent this from happening? If you tried to delete the row with a primary key value of 1 from the PART NUMBER table, the database management system would not allow you to remove the row. In fact, this is the way the Teradata RDBMS maintains referential integrity. If you select a row for deletion, insertion, or updating that is in any way related to rows in another table, those related rows are also deleted, inserted, or updated.

Introduction to Teradata RDBMS

3 ­ 13

Chapter 3: The Relational Model Referential Integrity

Besides data integrity and data consistency, referential integrity provides these benefits:

Benefit Description

Increase in development productivity

You do not need to code SQL statements to enforce referential constraints, because the Teradata RDBMS automatically enforces Referential Integrity. All update activities are programmed to ensure that referential constraints are not violated, because the Teradata RDBMS enforces Referential Integrity in all environments. No additional programs are required. The Teradata RDBMS uses the most efficient method to enforce the referential constraints.

Requirement of fewer written programs

Performance Improvement

Referential Constraints

The combination of the foreign key, the parent key, and the relationship between the two is called the referential constraint. Referential constraints must meet the following criteria: · · · · · · The parent key must exist when the referential constraint is defined. The parent key columns must be either a Unique Primary Index (UPI) or a Unique Secondary Index (USI). The foreign and parent keys must have the same number of columns and their data types must match. The foreign and parent keys cannot exceed 16 columns. Duplicate referential constraints are not allowed. You cannot drop or alter either foreign or parent keys using an ALTER TABLE statement after a referential constraint has been defined. To drop a foreign or parent key after a referential constraint has been defined, you must first drop the constraint and then alter the table. The foreign key must be equal to the parent key or it must be null. Self-reference is allowed, but the foreign and parent keys cannot consist of identical columns. You can define no more than 64 referential constraints per table.

· ·

3 ­ 14

Introduction to Teradata RDBMS

Chapter 3: The Relational Model Referential Integrity

Referential Constraint Checks

The Teradata RDBMS performs referential constraint checks whenever any of the following instances occur: · · · A referential constraint is added to a populated table. A row is inserted, deleted, or updated. A parent or foreign key is modified.

The following table summarizes these actions:

Action on RDBMS Constraint Check Performed

INSERT into parent table INSERT into child table DELETE from parent table DELETE from child table UPDATE parent table UPDATE child table

None Must have matching parent key value if the foreign key is not null Abort the request if the deleted parent key is referenced by any foreign key None Abort the request if the parent key is referenced by any foreign key New value must match the parent key when the foreign key is updated

Introduction to Teradata RDBMS

3 ­ 15

Chapter 3: The Relational Model Indexes

Indexes

Indexes on tables in a relational database function much like indexes in books-- they speed up information retrieval. In general, systems use indexes to: · · · · Distribute data rows (primary index only). Locate data rows. Improve performance (indexed access is usually more efficient than searching all rows of a table). Ensure uniqueness of the index values (only one row of a table can have a particular value in the column(s) defined as a unique index).

The Teradata RDBMS uses indexes to define row uniqueness. This means that either a unique primary index or a unique secondary index must identify each row in a table. A typical index contains two fields: a value and a pointer to instances of that value in a data table. Because the Teradata RDBMS uses hashing to distribute rows across the AMPs, the value is condensed into an entity called a row hash, which the system uses as the pointer. The row hash is not a value, but a mathematically transformed address for a particular row. The Teradata RDBMS uses this transformed address as a retrieval index.

Teradata RDBMS Indexes

The Teradata RDBMS supports the following types of indexes: · · · · Primary Index Secondary Index Join Index Hash Index

Primary Indexes

The Teradata RDBMS requires one and only one primary index for each table. It does not require secondary indexes. Primary indexes: · · · Affect the distribution of rows across AMPs Do not have subtables Can be unique or non-unique

The primary index for a table should represent the data values that SQL queries use most often to access the data for the table.

3 ­ 16

Introduction to Teradata RDBMS

Chapter 3: The Relational Model Indexes

The value(s) chosen for the unique index of a table are frequently the same values identified as the primary key during the data modeling process, but no hard and fast rule makes this so. In fact, physical database design considerations often lead to a choice of values other than those of the primary key for the unique index of a table.

How Are Primary Keys and Primary Indexes Related?

The following table describes some of the conceptual differences between primary keys and primary indexes:

Primary Key Primary Index

Naming convention used to ensure referential integrity Required by the Teradata RDBMS only if referential integrity checks are to be performed

IF the Teradata RDBMS performs... THEN the column limit is...

Physical access mechanism Required by Teradata RDBMS

16-column limit

referential integrity checks no referential integrity checks

16. unlimited.

Defined by CREATE TABLE statement Must be unique Identifies a row uniquely Values cannot be changed May not be null Does not imply access path

Defined by CREATE TABLE statement May be unique or non-unique Distributes rows Values can be changed May be null Defines most common access path

Secondary Indexes

Secondary indexes allow access to information in a table by alternate, less frequently used paths. Secondary indexes require the maintenance of a subtable and the associated overhead (additional storage space and maintenance).

Introduction to Teradata RDBMS

3 ­ 17

Chapter 3: The Relational Model Indexes

Join Indexes

A join index is an indexing structure containing columns from one or more tables. Unlike other indexes, join indexes do not store pointers to their associated base table rows. Instead, they are a final access point that eliminates the need to join the base tables they represent. They substitute for, rather than point to base table rows with the exception of single-table joins. For more information, see "Single-Table Join Indexes". Multi-table and single-table join indexes are discussed briefly in the following paragraphs. For detailed information about the benefits and capabilities of each type of index, see "Join Indexes" in Teradata RDBMS Database Design. For information on the Join Index syntax structure, see "Data Definition Language Statement Syntax" in Teradata RDBMS SQL Reference, Vol 4.

Multi-Table Join Indexes

The multi-table join index allows frequently executed join queries to be processed more efficiently than using the Join Query action alone. The index is useful for queries in which the index structure contains all the columns referenced by one or more joins. In this instance, the index covers the query, so that the requested data can be retrieved from the index rather than from its underlying base tables. For this reason, an index of this kind is called a covering index.

Single-Table Join Indexes

Single-table join indexes are defined to facilitate joins by hashing a subset of base table columns to the same AMP as the table rows to which they are frequently joined. Depending on how the single-table join index is defined, this type of index can be useful for queries where the index structure contains only some of the columns referenced in the statement. This index is referred to as partial covering. If the index contains the rowID (a 32-bit numeric value that uniquely identifies the row) of the base table or columns that comprise the UPI or NUPI plus a USI or rowID of the base table, the system can use the index to join back to the base table for the additional columns needed to cover the query.

Hash Indexes

The hash index provides a space-efficient index structure that can be hash distributed to AMPs in various ways. The index has characteristics similar to a single-table join index with a rowID that provides transparent access to the base table.

3 ­ 18

Introduction to Teradata RDBMS

Chapter 3: The Relational Model Indexes

The hash index has been designed to improve query performance in a manner similar to a single-table join index. In particular, you can specify a hash index to: · · Cover columns in a query so that the base table does not need to be accessed Serve as an alternate access method to the base table in a join or retrieval operation.

Using Indexes

Unlike other database management systems, relational systems typically do not permit explicit use of indexes in application programs or queries. Instead, the optimizer decides at the moment of SQL compilation which index or indexes (if any) to use to optimize the query. You cannot force the optimizer to use any index--it selects whichever index or indexes will return the query result most quickly. In some cases, the optimizer will process the query without using any index. Selection of indexes: · · · Can have a direct impact on overall Teradata performance Is not always a straightforward process Is based partly on usage information

Introduction to Teradata RDBMS

3 ­ 19

Chapter 3: The Relational Model For More Information

For More Information

For more information on the topics presented in this chapter, see the following Teradata RDBMS books:

IF you want to learn more about... THEN see...

The relational model of database management Normalization

Teradata RDBMS Database Design Teradata RDBMS Database Design

3 ­ 20

Introduction to Teradata RDBMS

Chapter 4:

Structured Query Language (SQL)

This chapter describes SQL, which is the ANSI standard language for relational database management. All application programming facilities ultimately make queries against the database using SQL, because it is the only language the Teradata RDBMS understands. The first part of this chapter describes the data definition and manipulation capabilities of Teradata SQL. This includes basic statements used for describing and defining entities and for manipulating and retrieving data. The rest of the chapter discusses query facilities and using SQL to create stored procedures, secondary and join indexes, triggers, and views.

Why SQL?

SQL has the advantage of being the most commonly used language for relational database management systems. Because of this, both the data structures in the database and the commands for manipulating those structures are controlled using SQL. Additionally, all applications, whether written in a client language with embedded SQL, a macro, or an ad-hoc SQL query, are written and executed using the same set of instructions and syntax. Other database management systems use different languages for data definition and data manipulation and may not permit ad-hoc queries of the database. Teradata RDBMS lets you use one language to define, query, and update your data.

Introduction to Teradata RDBMS

4­1

Chapter 4: Structured Query Language (SQL)

What is SQL?

In principle, the SQL language is a combination of at least two subordinate languages and the SELECT statement as shown in the following table:

Teradata SQL includes ... Which...

Data Definition Language (DDL)

provides statements for the definition and description of entities as follows: · · · · CREATE ALTER MODIFY DROP

Data Control Language (DCL)

allows you to review current definitions and control data access and object ownership as follows: · · · · SHOW HELP GRANT REVOKE GIVE

Data Manipulation Language (DML)

supports statements for manipulating and processing database values as follows: · · · · INSERT UPDATE DELETE ROLLBACK

With DML statements, you can insert new rows into a table, update one or more values in stored rows, or delete a row. SELECT statement retrieves data.

Teradata SQL Data Types

You must specify a data type for each column when you use SQL to create a table, because Teradata does not provide a default data type. Include a data type to specify data conversions in expressions. A data type phrase does the following: · · Determines how data is stored on the Teradata RDBMS Specifies how data is presented to the user

4­2

Introduction to Teradata RDBMS

Chapter 4: Structured Query Language (SQL)

The following table contains information about supported Teradata SQL data types:

Teradata supports... Including...

Teradata SQL data types

Numeric. Character. DateTime. Interval. Byte.

various data types that client systems use

Numeric. Character. Byte.

Data Type Attributes

Data type attributes control the import format (internal representation of stored data) and export format (how data is presented for a column or an expression result). The following table contains information about the supported data types:

Teradata supports... Including the following data types...

ANSI-compliant DateTimes and Intervals

DATE. TIME. TIMESTAMP. TIMEZONE. INTERVAL.

At the system, user, and session levels, the user has the option of specifying ANSI or Teradata mode for the date data type import and export format.

Introduction to Teradata RDBMS

4­3

Chapter 4: Structured Query Language (SQL)

Statements, Requests, Queries, and Transactions

You can enter any Teradata SQL statement as illustrated in the following table:

You can enter the... As a...

statement statement, or

single request. solitary statement, or as the last statement in an "explicit transaction as follows: · In Teradata mode: One or more requests enclosed by user-supplied BEGIN TRANSACTION and END TRANSACTION statements In ANSI mode: One or more requests ending with the COMMIT keyword

· solitary statement

macro.

Table Constraints

During table creation and modification, you can specify constraints on single column values as part of a column definition or on multiple columns using the CREATE and ALTER statements.

Default Database

While engaged in a Teradata session, you may repeatedly query, define, and manipulate data from multiple databases. In fact, you can establish a default database by specifying the name in an SQL DATABASE statement in order to avoid having to key in the name of the most frequently used database each time you enter a statement. After you have defined it, the default database remains in effect until the end of a session or until a subsequent DATABASE statement replaces it. Note: To establish a default database, you must have some privilege on a database, macro, table, user, stored procedure or view in that database.

Heterogeneous Kanji Client Support

The internationalization feature of the Teradata RDBMS allows users from different platforms supporting various Kanji character sets to share Kanji data. This support spans across character sets for IBM, mainframe, UNIX and DOS/Windows clients. In addition, Teradata RDBMS supports Japanese Imperial Era data and time formatting.

4­4

Introduction to Teradata RDBMS

Chapter 4: Structured Query Language (SQL) SQL Functional Families

SQL Functional Families

SQL provides facilities for defining database objects, for defining user access to those objects, and for manipulating the data stored within them. The principal functional families of SQL are: · · · SQL Data Definition Language (DDL) SQL Data Control Language (DCL) SQL Data Manipulation Language (DML)

The functional families of Teradata SQL are introduced in the following paragraphs. For more information functional families and binding styles, see "SQL Data Handling Fundamentals" in Teradata RDBMS SQL Reference, Vol. 1.

Data Definition

DDL enables the structure and instances of a database to be defined. This section describes the data definition capabilities of Teradata SQL, emphasizing the basic definition statements and data types. DDL provides statements for the definition and description of entities. The following table explains the basic statements of the SQL data definition:

Statement Action performed

CREATE

Defines a new database, user, table, trigger, index, macro, stored procedure or view, depending on the object of the CREATE statement Removes a table, trigger, index, macro, stored procedure or view definition, depending on the object of the DROP statement Changes a table, trigger or protection definition Changes a database or user definition Changes the names of tables, triggers, views, stored procedures, and macros Replaces macros triggers, stored procedures, or views specifies time zones or a different collation or character set for a session gathers statistics on a column or index

DROP

ALTER MODIFY RENAME REPLACE SET COLLECT

Successful execution of a data definition statement automatically creates, updates, and removes entries in the Data Dictionary (DD).

Introduction to Teradata RDBMS

4­5

Chapter 4: Structured Query Language (SQL) SQL Functional Families

Note: You can collect statistics at the same time that you execute queries against the table. Dictionary locks are acquired only after all the statistics are collected. After the locks are placed, the DD is updated with the statistics information. This allows statistics to be collected in parallel from several sessions against the same table.

Data Control

DCL statements grant and revoke access to database objects and change ownership of those objects from one user or database to another. The results of DCL statement processing also are recorded in the DD. Information in the following table explains DCL statements:

Statement Action

GRANT/REVOKE GRANT LOGON/REVOKE GIVE HELP and SHOW

Controls access rights of the users on an object Controls logon rights to a host (client) or host group (if the special security user is enabled) Gives a database object to another database object provides help about object definitions such as: · · · · · HELP DATABASE HELP TABLE HELP CONSTRAINT HELP PROCEDURE HELP TRIGGER, and so forth

provides help about: · · · Sessions and statistics SQL statement syntax Summaries of database object definitions, such as SHOW TABLE, SHOW JOIN INDEX, SHOW MACRO, SHOW TRIGGER, and so forth

Data Manipulation

DML supports statements for manipulating and processing database values. With DML statements, you can insert new rows into a table, update one or more values in stored rows, or delete a row. The following table lists and describes basic data manipulation statements:

Statement Description

INSERT UPDATE

Inserts new rows into a table Modifies data in one or more rows of a table

4­6

Introduction to Teradata RDBMS

Chapter 4: Structured Query Language (SQL) SQL Functional Families

Statement Description

DELETE COMMENT These statements: · ABORT · ROLLBACK · COMMIT · BEGIN/END · TRANSACTION CHECKPOINT DATABASE ECHO

Removes a lock or a row (or rows) from a table Insert a text comment for a database object Allow you to better manage transactions

Check points a journal Specifies a default database. Echoes a string or command to a client

Introduction to Teradata RDBMS

4­7

Chapter 4: Structured Query Language (SQL) The SELECT Statement

The SELECT Statement

The SELECT statement is probably the most frequently used SQL statement. It specifies the table columns from which to obtain the data you want, the corresponding database (if no default database is established), and the table or tables that you need to reference within that database. The SELECT statement further specifies how, in what format, and in what order the system returns the set of result data. You can use the following variations with the SELECT statement to request data from the Teradata RDBMS: · · · · · · · · DISTINCT option FROM list WHERE clause, including subqueries SAMPLE clause GROUP BY clause HAVING clause QUALIFY clause ORDER BY clause · CASESPECIFIC option · International sort orders WITH clause Query expressions and set operators

· ·

Another variation is the SELECT INTO statement, which is used in embedded SQL and stored procedures. This statement selects at most one row from a table and assigns the values in that row to host variables in embedded SQL, and to local variables or parameters in stored procedures.

Only SELECT Uses Set Operators

The SELECT statement is the only SQL statement that can use the set operators UNION, INTERSECT, and MINUS. These set operators allow the user to manipulate the answers to two or more queries by combining the results of each query into a single result set. Use the above set operators within the following operations: · · · · View definitions Derive tables Subqueries INSERT SELECT clauses

4­8

Introduction to Teradata RDBMS

Chapter 4: Structured Query Language (SQL) The SELECT Statement

Joins

A SELECT operation can reference data in one or more tables and/or views. In this way, the SELECT statement defines a join of specified tables or views to retrieve data more efficiently than without defining a join of tables. You can specify both inner joins and outer joins.

OLAP Functions

The Teradata RDBMS provides a number of On Line Analytical Processing (OLAP) functions as extensions to SQL. These functions let you perform sophisticated data mining of your databases, which enables you to get answers to a class of questions that standard SQL alone cannot provide. The OLAP functions include: · · Operations on grouped rows Filtering of a QUALIFY clause analogous to the HAVING clause of ordinary SQL to eliminate rows on the basis of the function value

Unlike the aggregate functions, which return one value for all qualified rows examined, the OLAP functions return a new value for each of the qualifying rows involved in the query. The following features comprise Teradata OLAP functionality: · · · Statistical Functions Extended Date/Calendar Capability Sampling

When performing OLAP sampling, a disjoint set of records in a table are accessed using a set of generated row positions (15th, 35th, and so forth). This operation is optimized in the file system by only accessing the data blocks that contain the target row positions instead of scanning all the data blocks.

Statement Punctuation

A typical SQL statement consists of a statement keyword, one or more column names, a database name, a table name, and one or more optional clauses introduced by keywords.

Introduction to Teradata RDBMS

4­9

Chapter 4: Structured Query Language (SQL) The SELECT Statement

SQL statements are punctuated using the following syntax elements:

This syntax element... Named... Performs this function in a SQL statement...

.

period

separates database names from table names and table names from a particular column name (for example, personnel.employee.deptno) separates and distinguishes column names in the select list, or column names or parameters in an optional clause delimits the boundaries of character string constants group expressions or define the limits of a phrase separates statements in multi-statement requests and terminates requests submitted via certain utilities such as BTEQ define the extent of nonstandard names prefixes reference parameters or client system variables

,

comma

` ( ) ;

apostrophe left and right parentheses semicolon

" :

quotation marks colon

To include an apostrophe or show possession in a title, double the apostrophes.

4 ­ 10

Introduction to Teradata RDBMS

Chapter 4: Structured Query Language (SQL) Query Facilities

Query Facilities

A request to the Teradata RDBMS consists of one or more SQL statements, and can span any number of input lines. The Teradata RDBMS can receive and execute statements that are: · · · · Embedded in an application program that is written in a procedural language Entered interactively through the Basic Teradata Query interface Submitted in a Basic Teradata Query script as a batch job Submitted through other supported methods (such as ODBC)

Basic Teradata Query

The Basic Teradata Query (BTEQ) facility is an SQL formatter/report generator that lets you create and submit SQL queries either interactively or in batch mode from an interactive terminal. BTEQ supports the following facilities: · · · · Multiple Teradata SQL statements per request Read from and write to client data files Management of multiple sessions per job Format output and write sophisticated reports

The following platforms support BTEQ: · · · Channel-attached client Network-attached client Teradata server

Embedded SQL

The Teradata RDBMS provides a preprocessing facility that lets you include SQL statements in your application programs which use embedded SQL. The SQL preprocessor parses your application code for SQL statements, converts the statements to CLI calls, and then comments out the SQL statements. After the Teradata RDBMS Preprocessor2 processes the application code, you can submit processed code to your client application language compiler.

Introduction to Teradata RDBMS

4 ­ 11

Chapter 4: Structured Query Language (SQL) Query Facilities

Preprocessor2 (PP2) supports the following client programming languages:

This programming language... Is supported on this platform...

PL/I COBOL

IBM mainframe clients · IBM mainframe clients · Selected workstation clients

C

· IBM mainframe clients · UNIX clients

Cursors

Traditional application development languages cannot deal with results tables without some kind of intermediary mechanism, because SQL is a set-oriented language. The intermediary mechanism is the cursor. A cursor is a pointer that the application program uses to move through a results table one row/record in a table at a time. You declare a cursor for a SELECT statement, and then open the named cursor. The act of opening the cursor executes the SQL statement. You use the FETCH ... INTO ... statement to individually fetch and write the rows into host variables. The application can then use the host variables to do computations. Preprocessor2 uses cursors to mark or tag the first row accessed by an SQL query. Preprocessor2 then increments the cursor as needed.

SQL Flagger

The Teradata RDBMS has an optional feature that detects non-ANSI SQL extensions (for entry level ANSI SQL92 only) and returns them to the user (either to an embedded SQL program or to BTEQ) without terminating execution of the query.

4 ­ 12

Introduction to Teradata RDBMS

Chapter 4: Structured Query Language (SQL) Teradata Stored Procedures

Teradata Stored Procedures

A stored procedure is a combination of procedural statements and SQL statements. It provides a procedural interface to the Teradata RDBMS using the Stored Procedure Language (SPL). A stored procedure is a database object and is executed on the Teradata RDMBS server space. Each stored procedure has a corresponding "stored procedure table" in the database, containing the SPL source text specified by the user and the corresponding SPL object code. The parameters and attributes of the procedure object are stored in the Data Dictionary tables.

Why Use Stored Procedures?

Applications based on stored procedures provide the following benefits: · They perform well by reducing network traffic in the client-server environment, because stored procedures reside and execute on the database server. They allow encapsulation and enforcement of business rules on the server, contributing to improved application maintenance. They provide better transaction control. They provide better security by restricting user access to the procedures rather than the data tables. SPL provides an exception handling mechanism to handle the runtime exception conditions generated by the application. All SQL and SPL statements embedded in a stored procedure can be executed on the server by submitting one SQL CALL statement. Nested CALL statements further extend the performance.

· · · · ·

Elements of SPL

The following elements constitute SPL: · SQL DML statements such as SELECT INTO, SELECT (only in cursors), INSERT (including INSERT-SELECT), UPDATE (including UPSERT feature), DELETE, and CALL SQL transaction statements such as ROLLBACK, ABORT, COMMIT, BEGIN TRANSACTION, and END TRANSACTION SQL DDL statements with some exceptions SQL DCL statements Dynamic SQL statements LOCKING modifiers with all supported transaction statements and DML statements except CALL

· · · · ·

Introduction to Teradata RDBMS

4 ­ 13

Chapter 4: Structured Query Language (SQL) Teradata Stored Procedures

· ·

·

·

·

Arithmetic, logic control, and assignment statements Completion condition and exception condition handlers of CONTINUE and EXIT types: · SQLSTATE-based condition handlers · Generic exception condition handler SQLEXCEPTION · Generic completion condition handlers NOT FOUND and SQLWARNING Updatable and read-only cursors. Cursors can be specified in stored procedures only as part of the SPL FOR statement. Local variable declaration statements. Local variables of any Teradata-supported data type can be specified in SPL DECLARE statements within a BEGIN-END block of the stored procedure. Teradata as well as ANSI SQL-99 style comments. Nested Teradata-style comments are not allowed

For more information, see "Using Stored Procedures as SQL Applications" on page 6-6.

4 ­ 14

Introduction to Teradata RDBMS

Chapter 4: Structured Query Language (SQL) Indexes

Indexes

This section discusses using SQL to create indexes on tables.

Specifying an Index

All tables require a primary index. If you do not designate a column or group of columns as a primary index for a table when you create it, the system chooses a default primary index for you. To specify a primary index, use the UNIQUE PRIMARY INDEX clause of the CREATE TABLE statement. Create a non-unique primary index in the same way, but omit the keyword UNIQUE. Create optional secondary indexes the same way, but omit the keyword PRIMARY. A join index provides an index across multiple tables.

Index Access Methods

Choose a primary index for your table carefully, since the choice of a primary index can affect system performance. The following table explains the strengths and weaknesses of the various indexing methods:

Access Method Comments

Unique Primary Index (UPI)

· · ·

Is very efficient Involves 1 AMP and 1 row Requires no spool file Is efficient when the number of rows per value is less than 100 Involves 1 AMP and multiple rows May require spool file Is very efficient Involves 2 AMPs and 1 row Requires no spool file Is efficient only when the number of rows accessed is less than the number of data blocks in the table Involves all AMPS and multiple rows May require spool file

Nonunique Primary Index (NUPI)

· · ·

Unique Secondary Index (USI)

· · ·

Nonunique Secondary Index (NUSI)

·

· ·

Introduction to Teradata RDBMS

4 ­ 15

Chapter 4: Structured Query Language (SQL) Indexes

Access Method Comments

Full table scan

· · ·

Is efficient because each data block is touched only once Involves all AMPS and all rows Requires spool file possibly as large as the base table Is efficient because it eliminates the process of joining the tables May use nonunique primary or secondary access into the join index without accesses to the user tables May use full table scan of the join index Is efficient in the same manner as single-table join index Provides a transparent access path to base table Requires execution plans to access index using a direct probe, full table scan, or range scan

Join Index

· ·

· Hash Index · · ·

4 ­ 16

Introduction to Teradata RDBMS

Chapter 4: Structured Query Language (SQL) Triggers

Triggers

A trigger (essentially, a stored SQL statement associated with a table) is a database object that defines events that happen when some other event, called a triggering event, occurs. Create a trigger by using the CREATE TRIGGER statement. Triggers execute when an INSERT, UPDATE, or DELETE modifies a specified column or columns in the subject table. Typically, the stored SQL statements perform an UPDATE, INSERT, or DELETE on a table different from the subject table. Sometimes a statement fires a trigger, which in turn, fires another trigger. Thus the outcome of one triggering event can itself become another trigger. The Teradata RDBMS processes and optimizes the triggered and triggering statements in parallel to maximize system performance.

Trigger Functions

Use triggers to perform various functions: · · Define a trigger on the parent table to ensure that UPDATEs and DELETEs performed to the parent table are propagated to the child table. Use triggers for auditing. For example, you can define a trigger which causes INSERTs in a log record when an employee receives a raise higher than 10%. Use a trigger to disallow massive UPDATEs, INSERTs, or DELETEs during business hours.

·

For example, you can use triggers to set thresholds for inventory of each item by store, to create a purchase order when the inventory drops below a threshold, or to change a price if the daily volume does not meet expectations.

Restrictions on Using Triggers

Teradata triggers do not support FastLoad and MultiLoad utilities and, and you must disable triggers before you run load utilities. In addition, a positioned (updatable cursor) UPDATE or DELETE is not allowed to fire a trigger and generates an error. Note: You cannot define a join index on a table with a trigger.

Introduction to Teradata RDBMS

4 ­ 17

Chapter 4: Structured Query Language (SQL) Views

Views

Views are actually virtual tables that you can use (as if they were physical tables) to retrieve data from the database by defining columns from underlying views and/or tables. The Data Dictionary stores view definitions.

Creating a View

A view does not contain data and is not materialized until an SQL statement references it. Views are useful, because they can simplify access to information in the Teradata database. Use the CREATE VIEW statement to define a view. This statement names the view and columns of the view and defines a SELECT on one or more columns from other tables and/or views.

Restrictions on Using Views

You can only use Views if they were tables in SELECT statements. In addition, views are subject to some restrictions regarding the INSERT, UPDATE, and DELETE statements.

Altering a Teradata RDBMS View

Use the REPLACE VIEW statement to alter the characteristics of an existing view.

4 ­ 18

Introduction to Teradata RDBMS

Chapter 4: Structured Query Language (SQL) For More Information

For More Information

For more information on the topics presented in this chapter, see the following Teradata RDBMS books:

If you want to learn more about... THEN see...

Teradata SQL

· ·

Teradata RDBMS Database Design Teradata RDBMS SQL Reference Teradata BTEQ Reference Teradata RDBMS SQL Vol 6 Teradata Application Programming (PP2) Teradata RDBMS Database Design Teradata RDBMS SQL Reference Teradata RDBMS Security Administration Teradata RDBMS SQL Reference

BTEQ Embedded SQL

· · ·

Triggers

· · ·

Stored Procedures and Stored Procedure Language

·

Introduction to Teradata RDBMS

4 ­ 19

Chapter 4: Structured Query Language (SQL) For More Information

4 ­ 20

Introduction to Teradata RDBMS

Chapter 5:

Data Dictionary

The Data Dictionary (DD) is the system catalog for the Teradata RDBMS. It contains metadata: table, view and index definitions, parameters and attributes of macros and stored procedures, resource usage statistics, and much more. The DD is a system database--a repository containing data about user databases and properties of those databases. The DD also contains a good deal of administrative information about the Teradata RDBMS. Unlike the system catalogs of nonrelational systems, the Teradata Data Dictionary is a fully relational database that uses SQL as its data sublanguage just like the user databases.

DD Objects

Among the objects which the DD defines and administers are the following: · · · · · · · · · · Database and user profiles System journals Security audit and logon information Error and message logs Archive information Lock journals Session status information Space allocation information Accounting information Database, table, view, index, stored procedure, and macro definitions

Introduction to Teradata RDBMS

5­1

Chapter 5: Data Dictionary DD Users

DD Users

The DD is useful to all of the following user categories: · · · · · System administrator Database administrator Supervisory users Operations control personnel End users

This chapter describes the properties and capabilities of the Data Dictionary (DD).

5­2

Introduction to Teradata RDBMS

Chapter 5: Data Dictionary Structure of the Data Dictionary

Structure of the Data Dictionary

This section introduces the components of the DD. In particular, it discusses the various supplied views for the DD, which are loaded from DIPviews.script.

What Is a View?

A view is a virtual table which the user sees as a base table. Think of a view as a dynamic window to the underlying database. A view is constructed from one or more base tables (or views). However, a view usually presents only a subset of the columns and rows in the base table or tables that comprise the view. Some view columns do not exist in the underlying base tables. For example, it is possible to present data summaries in a view (for example, an average), which you cannot maintain in a base table. You can create hierarchies of views in which views can be created on views. This can be useful, but you should be aware that deleting any of the lower-level views invalidates dependencies of higher-level views in the hierarchy.

Why Use Views?

There are at least four reasons to use views. Views provide all of the following: · · · · A simplified user perception of the database Security for restricting table access and updates Well-defined, well-tested, high-performance access to data Logical data independence, which minimizes application modification if base tables require restructuring

A user may perceive one of roughly 50 different views of DD tables. The following table describes the four main types of user audience:

User Audience Description

End

Responsible for personal databases. Needs to know what information is available, in what form it is, how to get it, and what access rights have been granted to others

Supervisory

Responsible for databases and users Creates and organizes databases, monitors space usage, defines new users, allocates control privileges, creates indexes, and performs archives

Introduction to Teradata RDBMS

5­3

Chapter 5: Data Dictionary Structure of the Data Dictionary

User Audience Description

Administrative

Responsible for operation and administration of the system Needs to know about system performance, status and statistics, errors, and accounting

Recovery Control

Responsible for archive and recovery activities Needs to know about transaction journaling, down AMP recovery, and two-phase commit

Summary of DD Views

The following sections discuss the more important types of DD views, classifying them in accordance with who uses them. Some of these views exist in one of two forms. One form provides information about the entire group of objects controlled by the view, and the other form provides information only about those objects that the user executing the view owns, has created, or has privileges on. For a detailed discussion of these views, including the view names and specification of which forms pertain to which views, see "System Views" in Teradata RDBMS Data Dictionary.

End User Views

End-user views enable end users to show: · · · · · · · · The names assigned to and information about user-defined character sets Attributes of columns, tables and views, and parameters of stored procedures and macros. Characteristics of the database Information about journal-to-table mapping Information about client systems that are a part of the Teradata RDBMS configuration Information about tables, views, stored procedures, and macros that have been created in a database Privileges the user has granted to other users and privileges the user has been granted on databases, tables, views, stored procedures and macros The time an object was created, who created it, the last update time the object was altered, and who altered it

5­4

Introduction to Teradata RDBMS

Chapter 5: Data Dictionary Structure of the Data Dictionary

Supervisory User Views

Supervisory views enable supervisory users to show: · · · Accounts that are available to a given user The kinds of indexes defined for a given table Information about a user which they own or have created

Recovery Control User Views

Recovery control user views enable recovery control users such as the system administrator to show: · · · · Information about journal-to-table mapping An audit trail of all archive and recovery activity Detail of all archive and recovery activity that did not affect all AMPs Information about archive and recovery events that involve removable media

Administrative Views

Administrative views enable the database administrator to show: · · · · · · · · · · · Information about the access log record Information about access logging rules that are entered by the BEGIN/END LOGGING statements Information about rights of users AMP-by-AMP information about disk space usage for any database, table, or account. The displayed information includes spool space usage. AMP-by-AMP information about AMP, CPU, and I/O usage for each user base and account Names of databases and users that a user owns Logged information (over 30 days old) removed from the Access Log table AMP-by-AMP information about disk space usage for each database or account. The displayed information includes spool space usage Information about the logon rules entered by the GRANT/REVOKE LOGON statements Information about users who are currently logged on AMP-by-AMP information about disk space usage (not including spool space) for any database, table, or account

In addition, administrative views let the database administrator: · · · Maintain a log of system errors and a record of all logon and logoff activity Summarize information about processor utilization that is useful for capacity planning Log all statements entered by all users that affect access rights

Introduction to Teradata RDBMS

5­5

Chapter 5: Data Dictionary Using the Data Dictionary

Using the Data Dictionary

Every time a user logs onto the system, performs an SQL query, or types a password, they are using the DD.

SQL Access to DD

For security and data integrity reasons, the only SQL DML command you can use on the DD is the SELECT statement. You cannot use the INSERT, UPDATE, or DELETE SQL commands to alter the DD in any way. You can use SELECT to examine any view in the DD to which your administrator has granted you access. For example, if you need to access information in the Personnel database, then you can query the DBC.DataBases view as shown:

SELECT Databasename, Creatorname, Ownername, Permspace FROM DBC.DataBases WHERE Databasename='Personnel' ;

The query above produces a report like this:

Databasename Personnel Creatorname Jones Ownername Jones Permspace 1,000,000

5­6

Introduction to Teradata RDBMS

Chapter 5: Data Dictionary For More Information

For More Information

For more information on the topics presented in this chapter, see the following Teradata RDBMS books:

IF you want to learn more about... THEN see...

Types of Data Dictionary views Performing join operations on views

Teradata RDBMS Data Dictionary · · Teradata RDBMS Database Design Teradata RDBMS SQL Reference

Introduction to Teradata RDBMS

5­7

Chapter 5: Data Dictionary For More Information

5­8

Introduction to Teradata RDBMS

Chapter 6:

Application Development

Application development for the Teradata RDBMS falls into one of two categories: · · Explicit SQL Implicit SQL

Explicit SQL Development

Under explicit SQL application development you have the following tools: · · · · · · · · · Embedded SQL Macros Stored Procedures BTEQ CLI ODBC Queryman Third-party products that package and submit SQL EXPLAIN statement

More information about each tool is provided in following sections of this chapter.

Implicit SQL Development

Under implicit SQL application development, you have tools such as Teradata and third-party products that permit various fourth- generation languages and application generators to be translated into SQL. This chapter describes the various facilities and tools used to develop applications for the Teradata RDBMS in these environments.

Introduction to Teradata RDBMS

6­1

Chapter 6: Application Development Using Embedded SQL Applications

Using Embedded SQL Applications

This section describes writing applications using embedded SQL.

What Is Embedded SQL?

When you write applications using embedded SQL, you insert SQL statements into your native language application program. Because third-generation application development languages do not have facilities for dealing with results sets, embedded SQL contains extensions to executable SQL that permit declarations. Embedded SQL declarations include: · · Code to encapsulate the SQL from the native application language Cursor definition and manipulation

A cursor is a pointer device you use to read through a results table one record/row at a time.

How Does an Application Program Use Embedded SQL?

The client application languages that support embedded SQL are all compiled languages. SQL is not defined for any of them. For this reason, you must precompile your embedded SQL code to translate the SQL into native code before you can compile the source using a native compiler. The precompiler tool is called Preprocessor2, and you use it to: · · · Read your application source code to look for the defined SQL code fragments Interpret the intent of the code after it isolates all the SQL code in the application and translates it into CLI calls Comment out all the SQL source

The output of the precompiler is native language source code with CLI calls substituting for the SQL source. After the precompiler generates the output, you can process the converted source code with the native language compiler.

6­2

Introduction to Teradata RDBMS

Chapter 6: Application Development Using Embedded SQL Applications

Supported Languages and Platforms

Preprocessor2 supports the following application development languages on the specified platforms:

Application Development Language Platform

C COBOL

· · · ·

IBM mainframe clients UNIX clients IBM mainframe clients Some workstation clients

PL/I

IBM mainframes

Introduction to Teradata RDBMS

6­3

Chapter 6: Application Development Using Macros as SQL Applications

Using Macros as SQL Applications

Teradata macros are SQL statements which the server stores and executes. The advantages of using macros include the generation of less channel traffic and easy execution of frequently used SQL operations. Macros are particularly useful for enforcing data integrity rules, providing data security, and improving performance.

Creating a Macro

You use the CREATE MACRO statement to create Teradata macros. The format of CREATE MACRO is similar to CREATE VIEW. For example, suppose you want to define a macro for adding new employees to the Employee table and incrementing the EmpCount field in the Department table. The CREATE MACRO statement looks like this:

CREATE MACRO NewEmp (name (VARCHAR(12)), number (INTEGER, NOT NULL), dept (INTEGER, DEFAULT 100) ) AS (INSERT INTO Employee (Name, EmpNo, DeptNo ) VALUES (:name, :number, :dept ) UPDATE Department SET EmpCount=EmpCount+1 WHERE DeptNo=:dept ; ) ;

This macro defines parameters which a user must fill in each time they execute the macro. A leading colon (:) indicates a required parameter.

Using a Macro

The following example shows how to use the NewEmp macro to insert data into the Employee and Department tables.

6­4

Introduction to Teradata RDBMS

Chapter 6: Application Development Using Macros as SQL Applications

The information to be inserted is the name, employee number, and department number for employee H. Goldsmith. The EXECUTE macro statement looks like this:

EXECUTE NewEmp (`Goldsmith H', 10015, 600);

Modifying a Macro

The following example shows how to modify a macro. Suppose you want to change the NewEmp macro so that the default department number is 300 instead of 100. The REPLACE MACRO statement looks like this:

REPLACE MACRO NewEmp (name (VARCHAR(12)), number (INTEGER, NOT NULL), dept (INTEGER, DEFAULT 300) ) AS (INSERT INTO Employee (Name, EmpNo, DeptNo ) VALUES (:name, :number, :dept ) UPDATE Department SET EmpCount=EmpCount+1 WHERE DeptNo=:dept ; ) ;

Deleting a Macro

The example which follows shows how to delete a macro. Suppose you want to drop the NewEmp macro from the database. The DROP MACRO statement looks like this:

DROP MACRO NewEmp;

Introduction to Teradata RDBMS

6­5

Chapter 6: Application Development Using Stored Procedures as SQL Applications

Using Stored Procedures as SQL Applications

Introduction

The Stored Procedure Language (SPL) is the programming language used to create stored procedure applications. SPL conforms to the ANSI SQL-99 (SQL3) with some exceptions. Stored procedures consist of a set of SPL control and condition handling statements that provide a procedural interface to the Teradata RDBMS. The following can be specified in stored procedures: · · · · · Multiple SQL statements Input and output parameter Local variables Procedural constructs Condition handlers

Stored procedures also provide many benefits that embedded SQL does. See SQL Reference, Volume 4 for more information about stored procedures.

Creating and Using Stored Procedures

Teradata SQL supports creating, modifying, dropping, renaming and access rights control of stored procedures through DDL and DCL statements. You can create or modify a stored procedure through the COMPILE command in BTEQ and TeqTalk. See Teradata BTEQ Reference for details. You can also create or modify a stored procedure using the SQL CREATE PROCEDURE or REPLACE PROCEDURE statement from CLIv2, ODBC, and JDBC applications, and the QueryMan utility.

6­6

Introduction to Teradata RDBMS

Chapter 6: Application Development Using Stored Procedures as SQL Applications

The following example shows the CREATE PROCEDURE statement for creating a stored procedure spSample with one output parameter, one local variable declaration and multiple condition handler declarations:

CREATE PROCEDURE spGetEmpNo(OUT pErrorText VARCHAR(50), OUT pEmpNo INTEGER) BEGIN DECLARE vEmpNo INTEGER DEFAULT 0; DECLARE EXIT HANDLER FOR SQLSTATE '42000' BEGIN SET pErrorText = 'NextEmployee table does not exist.'; SET pEmpNo = 0; END; DECLARE EXIT HANDLER FOR SQLSTATE '21000' BEGIN SET pErrorText = 'Multiple rows in NextEmployee table.'; SET pEmpNo = 0; END; DECLARE CONTINUE HANDLER FOR SQLEXCEPTION SET vEmpNo = 0; SELECT NextEmpNo INTO :vEmpNo FROM NextEmployee; UPDATE NextEmployee SET NextEmpNo = :vEmpNo + 1; SET pEmpNo = vEmpNo; SET pErrorText = 'Employee number is: '; END;

Executing Stored Procedures

You can execute stored procedures from Teradata client utilities using the SQL CALL statement. Arguments for all IN, INOUT or OUT parameters of a stored procedure must be submitted with the SQL CALL statement. Stored procedure execution and DDL operations are supported by BTEQ as well as other Teradata client utilities. These include: · · · · CLIv2 JDBC ODBC PP2 Note: DDL statements are not supported from PP2; that is, you cannot create or modify stored procedures from PP2. · · QueryMan TeqTalk (DMTEQ)

Introduction to Teradata RDBMS

6­7

Chapter 6: Application Development Using Stored Procedures as SQL Applications

You can drop or rename a stored procedure, using the following SQL statements: · · DROP PROCEDURE RENAME PROCEDURE

You can view all the parameters and their attributes of a stored procedure, the attributes of the procedure, or the current definition (SPL source text) of the procedure, using the following SQL statements: · · · HELP PROCEDURE ... [ATTRIBUTES] HELP `SPL...' SHOW PROCEDURE

For details of stored procedure execution and parameter usage, see "SQL Stored Procedure Language" in Teradata RDBMS SQL Reference, Vol 6.

6­8

Introduction to Teradata RDBMS

Chapter 6: Application Development Basic Teradata Query Utility

Basic Teradata Query Utility

BTEQ is an SQL front-end utility that runs on all client platforms. It resides on the client portion of either a channel-attached or network-attached system and communicates with one or more Teradata RDBMS systems residing on the server. BTEQ allows you to create and submit SQL queries either interactively or in batch mode from an interactive terminal.

BTEQ Support

BTEQ supports the following facilities: · · · · · Multiple Teradata SQL statements per request Read from and write to client data files Management of multiple sessions per job Format output and write sophisticated reports Create stored procedure objects in the Teradata RDBMS

BTEQ Communication

The client system communicates with the RDBMS as described in the following table:

IF your client system is... THEN communication occurs over a...

channel attached network attached

high-speed I/O channel Local Area Network (LAN)

Introduction to Teradata RDBMS

6­9

Chapter 6: Application Development Call-Level Interface

Call-Level Interface

The Teradata RDBMS uses the Call Level Interface (CLI) or ODBC for all communication between a user terminal and the Teradata RDBMS. Whether used explicitly or implicitly, CLI and ODBC are the basis for all communication between users and the Teradata RDBMS.

What Is the CLI?

The CLI is a library of routines that resides in the user address space and provides the interface between the application program and the TDP or Gateway. The CLI packages SQL requests on a client for routing to the Teradata server. When a results set is returned to the client, the CLI unpackages the results for the system to display to the user or write in a report. The CLI can be used directly in application programs written in any language that supports a CALL statement. The exact implementation of the CLI differs slightly between channel-attached and network-attached clients, but the basic functions of the CLI are to: · · · Block and unblock messages Log sessions on and off the server Provide an interface between users and the TDP (or MTDP)

Like all other messages sent to the Teradata RDBMS, the parser on the server processes all CLI messages as SQL. The CLI makes calls to system subroutines in order to package and unpackage this SQL. The CLI is a very flexible tool for developing applications, because any client language that supports a call statement can use the CLI to communicate with a Teradata RDBMS.

Extended CLI

Extended CLI is a programming interface to the CLI that permits fewer higherlevel calls, thus reducing some of the complexity of the programming. Extended CLI is designed for large-volume, multi-session applications.

6 ­ 10

Introduction to Teradata RDBMS

Chapter 6: Application Development Call-Level Interface

WinCLI

WinCLI is a call level interface for MS-DOS and Windows-based applications. CLI routines are provided as object modules that have been compiled or assembled according to standard linkage conventions. CLI routines are available for many client operating environments including: · · · · · · MVS CICS IMS VM/SP UNIX DOS

Custom ports to other platforms are also available.

Introduction to Teradata RDBMS

6 ­ 11

Chapter 6: Application Development ODBC Driver

ODBC Driver

The ODBC Driver for the Teradata RDBMS provides an alternate, CLIindependent interface to Teradata databases using the industry standard ODBC application programming interface. Windows, Windows NT, and Windows 95 environments support ODBC, which provides import/export of data between the database and a PC.

6 ­ 12

Introduction to Teradata RDBMS

Chapter 6: Application Development Queryman

Queryman

Queryman is a front-end SQL query tool for users and database administrators. You implement Queryman in Visual Basic using ODBC. You can use Queryman to retrieve data from various ODBC databases such as Teradata, Oracle, Sybase, and MS Access, and combine all the retrieved data on your desktop. You can then use this data to produce consolidated results or perform analyses on the data using tools such as MS Excel. Queryman electronically records your SQL activities with data source identification, timings, row counts, and notes. Having this historical data allows you to build a script of the SQL that produced the data. The script is useful for data mining. Other features of Queryman include: · · · · · · · Direct export of data from the DBMS to a file on a PC Direct import of data from the PC into the DBMS Display of selected data in grid format Random sampling Ability to limit the amount of data returned Provision of syntax help for some DBMSs (Teradata, Oracle, Sybase, SQL, Server) Bypassing of GUIs and direct typing of actual SQL queries or retrieving them from the Queryman history window

Introduction to Teradata RDBMS

6 ­ 13

Chapter 6: Application Development Third-Party Products

Third-Party Products

The Teradata RDBMS supports many third party software products. The two general components of supported products include those of the transparency series and the native interface products.

TS/API Products

The Transparency Series/Application Program Interface (TS/API) product provides a gateway between the IBM mainframe relational database products DB2 (MVS/TSO) and SQL/DS (VM/CMS) and the Teradata RDBMS. TS/API permits an SQL statement formulated for either DB2 or SQL/DS to be translated into Teradata SQL to allow DB2 or SQL/DS applications to access data stored in a Teradata RDBMS.

Compatible Third-Party Software Products

Many third-party, interactive query products operate in conjunction with the Teradata RDBMS, permitting queries formulated in a native query language to access a Teradata RDBMS. The list of supported third party products changes continuously. For a current list, contact your NCR sales office.

6 ­ 14

Introduction to Teradata RDBMS

Chapter 6: Application Development The EXPLAIN Statement

The EXPLAIN Statement

Teradata SQL supplies a very powerful EXPLAIN statement that allows you to preview various results of the same query. The EXPLAIN modifier in front of any SQL statement generates a fully parsed and optimized request that is not executed. The EXPLAIN statement not only explains how a statement will be processed, but gives an estimate of how many rows will be involved and the performance impact of the request by providing information about the relative time it will take to execute the query. The size of the EXPLAIN text is unlimited.

How the EXPLAIN Statement Works

Although the optimizer uses indexes to maximize query performance, it does not reformulate a query with which it is presented to make it more efficient. The optimizer only performs that particular query in the most efficient way it knows. The power of EXPLAIN is that it allows you to experiment with different approaches to an answer. Then you can select the approach that performs best. EXPLAIN details what indexes (if any) the optimizer would use to process the request, identifies any temporary files that would be generated, shows whether the transactions for the statement would be dispatched in parallel, and so forth.

Example: EXPLAIN With Unique Primary and Non-Unique Secondary Indexes

The Personnel.Employee table has a unique primary index defined on the EmpNo column and a non-unique secondary index defined on the Name column. The EXPLAIN statement to examine this query looks like this:

EXPLAIN SELECT Name, DeptNo FROM Employee WHERE EmpNo = 10009 ;

The output of the query looks like this:

Explanation ---------------------------------------------------1) First, we do a single-AMP RETRIEVE step from Personnel.Employee by way of the unique primary index "PERSONNEL.Employee.EmpNo = 10009" with no residual conditions. The input table will not be cached in memory. The result pool will not be cached in memory. The estimated time for this step is 0.03 seconds -> The row is sent directly back to the user as the result of statement 1. The total estimated time is 0.03 seconds.

Introduction to Teradata RDBMS

6 ­ 15

Chapter 6: Application Development The EXPLAIN Statement

Example: EXPLAIN With Where Condition

The employee table request EXPLAIN statement below includes a WHERE condition that is based on a column defined as a non-unique index. The Teradata RDBMS places a READ lock on the table. The EXPLAIN statement to examine the query looks like this:

EXPLAIN SELECT EmpNo, DeptNo FROM Employee WHERE Name = `Smith T' ;

The output of the query looks like this:

Explanation ---------------------------------------------------1) First, we lock PERSONNEL.Employee for read. 2) Next, we do an all-AMPS RETRIEVE step from PERSONNEL.Employee by way of an all-rows scan with a condition of ("PERSONNEL.Employee.Name = `Smith T'") into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated to be 2rows. The estimated time for this step is 0.03 seconds. 3) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0 hours and 0.03 seconds.

6 ­ 16

Introduction to Teradata RDBMS

Chapter 6: Application Development For More Information

For More Information

For more information on the topics presented in this chapter, see the following Teradata RDBMS books:

IF you want to learn more about... THEN see...

Teradata SQL data manipulation statements Embedded SQL

Teradata RDBMS SQL Reference · ·

Teradata RDBMS SQL Reference, Vol 6

Teradata Application Programming (PP2)

BTEQ Using the Teradata SQL preprocessor Stored Procedures Call Level Interface programming ODBC TS/API products

BTEQ Reference Teradata Application Programming (PP2) · · · · · Teradata RDBMS SQL Reference CLI for Channel Attached Systems (CAS) CLI for Network Attached Systems (NAS) Teradata TS/API User's Guide Teradata TS/API Installation Guide

Introduction to Teradata RDBMS

6 ­ 17

Chapter 6: Application Development For More Information

6 ­ 18

Introduction to Teradata RDBMS

Chapter 7:

Fault Tolerance

This chapter provides an overview of the fault tolerance capabilities of the Teradata RDBMS. The Teradata RDBMS addresses the critical requirements of reliability, availability, serviceability, usability, and installability (RASUI) by combining multiple microprocessors (in so-called symmetric multiprocessing, or SMP), parity or mirrored disk storage, and protection of the database from operating anomalies of the client platform. Both hardware and software provide fault tolerance, some of which is mandatory and some of which is optional.

Introduction to Teradata RDBMS

7­1

Chapter 7: Fault Tolerance Software Fault Tolerance

Software Fault Tolerance

This section explains the following Teradata RDBMS facilities for software fault tolerance: · · · · · · Vproc migration Fallback tables AMP clusters Journaling Archive/Recovery Table Rebuild Utility

Vproc Migration

Because the Parsing Engine (PE) and Access Module Process (AMP) are software, they can migrate from their home node to another node within the same hardware clique if the home node fails for any reason. Although the system normally determines which vprocs migrate to which nodes, a user can configure preferred migratory destinations. Vproc migration permits complete operation of the affected "processors" during any node failure.

7­2

Introduction to Teradata RDBMS

Chapter 7: Fault Tolerance Software Fault Tolerance

The following figure illustrates vproc migration, where the large X indicates a failed node, and arrows pointing to nodes still running indicate the migration of AMP3, AMP4, and PE2.

PE1

AMP1

AMP2

AMP3

PE2

AMP4

PE3

AMP5

AMP6

Normal

ARRAY

PE1 AMP2

AMP1 AMP3

AMP4 PE3

AMP6

PE2 AMP5

Recovery

ARRAY

GG01A027

Note: PEs for channel-attached connections cannot migrate during a node failure, because they depend on the channel hardware physically attached to their node.

Fallback Tables

A fallback table is a duplicate copy of a primary table. Each row in a fallback table is stored on an AMP different from the one to which the primary row hashes. This reduces the likelihood of loss of data due to simultaneous losses of the 2 AMPs or their associated disk storage. Note that only business-critical tables need to be fallback-protected and anything else is up to the discretion of the DBA. The disadvantage of this method is that it requires twice the storage space and twice the I/O (on inserts, update, and deletes) of tables maintained without fallback. The advantage is that data is almost never lost because of a down

Introduction to Teradata RDBMS

7­3

Chapter 7: Fault Tolerance Software Fault Tolerance

AMP. Data is fully available during an AMP or disk outage, and recovery is automatic after repairs have been made. The Teradata RDBMS permits the definition of fallback for individual tables. As a general rule, it is wise to run all tables critical to your enterprise in fallback mode. You can run other, non-critical tables in non-fallback mode in order to maximize resource usage. The vproc migration feature offered by systems using disk array technology can safely recover this storage capacity by running without fallback. RAID1 or RAID5 will not provide the same level of protection as fallback does. Moreover, running without fallback does not provide availability of data during an AMP outage. You specify whether a table is fallback or not using the CREATE TABLE (or ALTER TABLE) statement. The default is not to create tables with fallback.

AMP Clusters

Clustering is a means of logically grouping AMPs to minimize (or eliminate) data loss that might occur from losing an AMP. Note that AMP clusters are used only for fallback data. Pictures best explain AMP clustering. The following figure illustrates a situation in which fallback is present without AMP clustering.

AMP1 Primary copy area Fallback copy area 1,9,17 21,22,15 AMP5 Primary copy area Fallback copy area 5,13,21 18,11,4

AMP2 2,10,18 1,23,8 AMP6 6,14,22 19,12,24

AMP3 3,11,19 9,2,16 AMP7 7,15,23 20,5,6

AMP4 4,12,20 17,10,3 AMP8 8,16,24 13,14,7

FG10A001

Note that the fallback copy of any row is always located on an AMP different from the AMP which holds the primary copy. This is an entry-level fault tolerance strategy. The data on AMP3 is fallback protected on AMPs 4, 5, and 6. If AMP3 fails, then the data is still available on the other AMPs. However, if AMPs 3 and 6 fail at the same time, then the system becomes unavailable.

7­4

Introduction to Teradata RDBMS

Chapter 7: Fault Tolerance Software Fault Tolerance

The following figure illustrates a way around this problem: clustering.

AMP1 Primary copy area Fallback copy area 1,9,17 2,3,4

AMP2 2,10,18 1,11,12 Cluster A

AMP3 3,11,19 9,10,20

AMP4 4,12,20 17,18,19

AMP5 Primary copy area Fallback copy area 5,13,21 6,7,8

Cluster B AMP6 6,14,22 5,15,16

AMP7 7,15,23 13,14,24

AMP8 8,16,24 21,22,23

FG10A002

The illustration shows the same 8-AMP configuration now partitioned into 2 AMP clusters of 4 AMPs each. A cluster is a group of from 2 to 16 AMPs in which each primary row in the cluster is fallback protected on another AMP in the same cluster. Compare this clustered configuration with the earlier illustration of an unclustered AMP configuration. In the example, the (primary) data on AMP3 is backed up on AMPs 1, 2, and 4 and the data on AMP6 is backed up on AMPs 5, 7, and 8. If AMPs 3 and 6 fail at the same time, the system continues to function normally. Only if two failures occur within the same cluster does the system halt. Performance is the primary factor that determines cluster size. While 2-AMP clusters provide maximum protection against system loss (because the likelihood of both AMPs in a cluster going down simultaneously is very small), this configuration also suffers from a higher workload per AMP in the event of a failure. It is recommended that each cluster contain 4 AMPs. This configuration maximizes the tradeoff between the probability of a failure and performance degradation.

Journaling

The Teradata RDBMS permits several different kinds of journaling. The system does some journaling, while a user specifies other journaling.

Introduction to Teradata RDBMS

7­5

Chapter 7: Fault Tolerance Software Fault Tolerance

The following table explains the different journaling capabilities of the Teradata RDBMS:

Journal Type Description Maintained By

Down AMP recovery journal Transient journal

· · · · ·

Is active during an AMP failure only Includes journals fallback tables only Is discarded after the down AMP recovers Logs BEFORE images for all transactions Is used by system to roll back failed transactions aborted either by the user or by the system Captures: BT/ET images for all transactions Before images for updates and deletes Row IDs for inserts Control records for creates and drops Keeps each image on the same AMP as the row it describes Discards images when the transaction or rollback completes Is active continuously Is available for tables or databases Provides rollforward for hardware failure recovery Provides rollback for software failure recovery Provides full recovery of nonfallback tables Reduces need for frequent, full-table archives

System (automatic)

System (automatic)

· ­ ­ ­ ­ · · Permanent journal · · · · · ·

User (optional)

7­6

Introduction to Teradata RDBMS

Chapter 7: Fault Tolerance Software Fault Tolerance

Archive/Recovery

This section describes using the Archive/Recovery and Archive Storage Facility 2 (ASF2) utilities to copy and restore or recover a table or database. The Archive/Recovery utility backs up data to an IBM channel-attached client, while the ASF2 utility performs a local backup or a backup to a networkattached client.

If you want to... Then...

archive data

copy selected or all tables or databases from your Teradata RDBMS, and archive your Data Dictionary immediately after you archive your data. Note: If your system is used only for decision support and is updated regularly with data loads, you may not want to archive the data. Consult your DBA or system programmer about the advisability of not making regular archives of your databases.

restore data

copy an archive from the client or server back to the database, and restore data to all AMPs, to clusters of AMPs, or to a specific AMP (as long as the data dictionary contains the definitions of the table or database you want to restore). Note: If the table does not have a definition in the data dictionary because of a DROP or RENAME statement, you can still restore data using the COPY statement.

recover data

use a ROLLBACK or ROLLFORWARD statement to apply before or after journal images.

Table Rebuild Utility

Use the Table Rebuild utility to recreate a table, database, or entire disk on a single AMP under the following conditions: · · The table structure or data is damaged because of a software problem, head crash, power failure, or other malfunction. The affected tables are enabled for fallback protection.

Table rebuild can create all of the following on an AMP-by-AMP basis: · · · · Primary or fallback portions of a table An entire table (both primary and fallback portions) All tables in a database All tables on an individual AMP

The Table Rebuild utility can also remove inconsistencies in stored procedure tables in a database. A System Engineer, Field Engineer, or System Support Representative usually runs the Table Rebuild utility.

Introduction to Teradata RDBMS

7­7

Chapter 7: Fault Tolerance Hardware Fault Tolerance

Hardware Fault Tolerance

The Teradata RDBMS provides the following facilities for hardware fault tolerance: · · · · · · · · Dual BYNETs RAID disk units Multiple channel and LAN connections Isolation from client hardware defects Battery backup Redundant power supplies and fans Hot swap capability for node components Cliques

Dual BYNETs

Multinode Teradata RDBMS servers are equipped with two BYNETs. Interprocessor traffic is never stopped unless both BYNETs fail. Within a BYNET, traffic can often be rerouted around failures.

RAID Disk Units

Teradata RDBMS servers use Redundant Array of Independent Disks (RAID) configured for use as RAID1, RAID5, or RAIDS. JBOD is not a supported option. RAID1 arrays offer mirroring (identical copies of data are maintained). RAID5 or RAIDS protect data from single-disk failures with a 25 percent increase in disk storage to provide parity.

Multiple Channel and LAN Connections

In a client-server environment, multiple channel connections between mainframe and network-based clients ensure that most processing continues even if 1 or several connections between the clients and server are not working. The migrating vproc feature is a software feature supporting this hardware issue.

Isolation From Client Hardware Defects

In a client-server environment, a server is isolated from many client hardware defects and can continue processing in spite of such defects.

7­8

Introduction to Teradata RDBMS

Chapter 7: Fault Tolerance Hardware Fault Tolerance

Battery Backup

All cabinets have battery backup in case of building power failures.

Redundant Power Supplies and Fans

Each cabinet in a configuration has redundant power supplies and fans to ensure fail-safe operation.

Hot Swap Capability for Node Components

Teradata RDBMS offers the following components with hot swap capability: · · · RAID arrays Fans Power supplies

Cliques

A clique is a group of nodes sharing access to the same disk arrays. The nodes and disks are interconnected via shared SCSI buses and each node can communicate directly to all disks. This architecture provides and balances data availability in the case of a node failure. Cliques are the physical medium that supports the migration of vprocs during when nodes fail. If a node in a clique fails, then the vprocs in that node migrate to other nodes in the clique and continue to operate while their home node recovers. Migration minimizes the performance impact on the system. PEs for channel-attached hardware cannot migrate, because they depend on the hardware that is physically attached to the assigned node. PEs for LAN-attached connections do migrate when a node failure occurs, as do all AMP vprocs. Note: To ensure maximum fault tolerance, do not place any of the nodes in a clique within the same cabinet. Usually the battery backup feature makes this precaution is unnecessary, but if you want maximum fault tolerance, then plan your cliques so the nodes are never in the same cabinet.

Introduction to Teradata RDBMS

7­9

Chapter 7: Fault Tolerance For More Information

For More Information

For more information on the topics presented in this chapter, see the following Teradata RDBMS books.

IF you want to learn more about... THEN see...

Physical database design Restore/Recovery utilities

Teradata RDBMS Database Design · · Teradata Archive/Recovery Reference Teradata Archive Storage Facility 2 (ASF2) Administration and Operations Guide

Table Rebuild utility

Teradata RDBMS Utilities Vol 2, L-Z, "Table Rebuild Utility")

7 ­ 10

Introduction to Teradata RDBMS

Chapter 8:

Concurrency Control and Transaction Recovery

This chapter describes the concurrency control in relational database management systems and how to use transaction journaling to recover lost data or restore an inconsistent database to a consistent state. The initial sections of this chapter deal with the concepts of transactions and locks. The latter sections describe the closely related topics of concurrency control and recovery.

Concurrency Control

Concurrency control involves preventing concurrently running processes from improperly inserting, deleting, or updating the same data. A system maintains concurrency control through two mechanisms: · · Transactions Locks

Recovery

Recovery is a process by which an inconsistent database is brought back to a consistent state. Transactions play the critical role in this process, because they are used to "play back" a series of updates (using the term in its most general sense) to the database, either taking it back to some earlier state or bringing it forward to a current state.

Introduction to Teradata RDBMS

8­1

Chapter 8: Concurrency Control and Transaction Recovery Concept of a Transaction

Concept of a Transaction

This section describes the concept of a transaction. Transactions are a mandatory facility for maintaining the integrity of a database while running multiple, concurrent operations.

Definition of a Transaction

A transaction is a logical unit of work. It is both the unit of work and the unit of recovery. The statements nested within a transaction must either all happen or none happen. Transactions are atomic: a partial transaction cannot exist.

Definition of Serializability

A set of transactions is said to be serializable if and only if it produces the same result as some arbitrary serial execution of those same transactions for arbitrary input. A set of transactions is correct only if it is serializable. Use of a Two-Phase Locking (2PL) protocol may serialize transactions. The two phases are the growing phase and the shrinking phase. In the growing phase, for any object, a transaction must first acquire a lock on that object before operating on it. In the shrinking phase, a transaction must never acquire any more locks once it has released a lock. Lock release is an all-or-none operation.

Transaction Semantics

The Teradata RDBMS supports both ANSI transaction semantics and Teradata transaction semantics. A system parameter specifies the default transaction mode for a site. However, you can override the default for a session. The Teradata RDBMS returns an error when a transaction operating in Teradata semantics mode issues a COMMIT statement. The Teradata RDBMS supports the ANSI COMMIT statement in ANSI transaction mode.

ANSI Mode Transactions

All ANSI transactions are implicit. Either of the following events opens an ANSI transaction: · · Execution of the first SQL statement in a session Execution of the first statement following the close of a previous transaction

8­2

Introduction to Teradata RDBMS

Chapter 8: Concurrency Control and Transaction Recovery Concept of a Transaction

Transactions close when the application performs a COMMIT, ROLLBACK, or ABORT statement. The last statement in a transaction must be a data definition statement (including DATABASE and SET SESSION, which are considered DDL statements in this context). A session executing under ANSI transaction semantics allows neither the BEGIN TRANSACTION statement, the END TRANSACTION statement, nor the two-phase commit protocol. When an application submits these statements in an ANSI situation, the database software generates an error.

Rolling Back an ANSI Transaction

In ANSI mode, the system rolls back the entire transaction if the current request: · · · Results in a deadlock Performs a DDL statement that aborts Executes an explicit ROLLBACK or ABORT statement

Teradata accepts the ABORT and ROLLBACK statements in ANSI mode, including conditional forms of those statements. If the system detects an error for either a single or multistatement request, it only rolls back that request, and the transaction remains open, except in special circumstances. Application-initiated, asynchronous aborts also cause full- transaction rollback in the ANSI environment.

Teradata Mode Transactions

Teradata mode transactions can be either implicit or explicit. Multistatement requests and macros are examples of implicit transactions. The types of transactions, which embedded SQL applications pass are examples of explicit transactions.

Introduction to Teradata RDBMS

8­3

Chapter 8: Concurrency Control and Transaction Recovery Concept of a Transaction

Consider the following COBOL program with embedded SQL and transactions (the program shows no COBOL code, but the embedded SQL code is of the type which COBOL SQL programs require).

EXEC SQL BEGIN TRANSACTION END-EXEC EXEC SQL DELETE FROM Employee WHERE Name = `Smith T' END-EXEC EXEC SQL UPDATE Department SET EmpCount=EmpCount-1 WHERE DeptNo=500 END-EXEC EXEC SQL END TRANSACTION END-EXEC

If an error occurs during the processing of either the DELETE or UPDATE statement within the BEGIN TRANSACTION and END TRANSACTION statements, the system restores both Employee and Department tables to the states at which they were before the transaction began. If an error occurs during a Teradata transaction, then the system rolls back the entire transaction.

How To Undo an Update

You can undo an update by applying a transaction log or journal to the database to write it back to the state at which it was before the transaction began. The journal contains before images of the database, with which you can undo a transaction. You can also redo a transaction by using a transaction log of after images of the database. Transactions begin and end at a checkpoint or synchronization point. The transaction recovery system uses these checkpoints to apply the data to exactly the right time to recover the database to an earlier or later state.

8­4

Introduction to Teradata RDBMS

Chapter 8: Concurrency Control and Transaction Recovery Concept of a Lock

Concept of a Lock

A lock is a means of claiming usage rights to some resource. You can lock several different types of resources in several different ways.

Overview of Teradata RDBMS Locking

Most locks used on Teradata resources are locked automatically by default. Users can override some locks by making certain lock specifications, but the system only allows overrides when it can assure data integrity. The data integrity requirement of a request decides the type of lock that the system uses. A request for a locked resource by another user is queued until the process using the resource releases its lock on that resource. The Teradata lock manager implicitly locks the following objects:

Object Locked Description

Database Table View Row hash

Locks rows of all tables in the database Locks all rows in the table and any index and fallback subtables Locks all underlying tables in the view Locks the primary copy of a row (all rows that share the same hash code)

Why do Database Management Systems Require Locking?

The lost update anomaly best explains why database management systems, in which multiple processes are accessing the same database, require locks.

Introduction to Teradata RDBMS

8­5

Chapter 8: Concurrency Control and Transaction Recovery Concept of a Lock

The following figure provides an example of this anomaly.

Execution of transaction T1

Database $500.00

Execution of transaction T2

READ Balance

$500.00

$500.00

READ Balance

Add $1,000.00

$1,500.00

$2,500.00

Add $2,000.00

$1,500.00 WRITE result to database $2,500.00 WRITE result to database

FG11A001

This example shows a nonserialized set of transactions. If locking had been in effect, the database would not have been able to add $3000.00 to $500.00 and get two different and wrong results. This example demonstrates the most common problem encountered in a transaction processing system without locks. Although several other problems arise when locking is not in effect, the lost update problem sufficiently illustrates the need for locking.

User Lock Levels

A user can lock the following resource types in a Teradata database: · · · · Database Table View Row Hash

8­6

Introduction to Teradata RDBMS

Chapter 8: Concurrency Control and Transaction Recovery Concept of a Lock

Levels of Lock Types

Users can apply four different levels of locking on Teradata resources. The following table explains these levels.

Lock Type Description

Exclusive

The requester has exclusive rights to the locked resource. No other process can read from, write to, or access the locked resource in any way. Exclusive locks are generally only necessary while a database undergoes structural changes.

Write Read

The requester has exclusive rights to the locked resource except for readers not concerned with data consistency. The requestor has exclusive rights to the locked resource while reading that resource. Read locks ensure consistency during read operations such as those that occur during a SELECT statement. Several users can hold Read locks on a resource, during which the system permits no modification of that resource.

Access

The requestor does not care about the consistency of the data while accessing the database. An access lock permits modifications on the underlying data while the SELECT operation is in progress.

This same information is illustrated in the following table:

Lock Type Held None Access Read Write Exclusive

Lock Request

Access Read Write Exclusive

Granted Granted Granted Granted

Granted Granted Granted Queued

Granted Granted Queued Queued

Granted Queued Queued Queued

Queued Queued Queued Queued

Introduction to Teradata RDBMS

8­7

Chapter 8: Concurrency Control and Transaction Recovery Concept of a Lock

Teradata Automatic RDBMS Lock Levels

The Teradata RDBMS applies most of its locks automatically. The following table illustrates how Teradata applies different locks for various types of SQL statements:

Locking Level by Access Type Type of SQL Statement UPI/NUPI/USI NUSI/Full Table Scan Locking Mode

SELECT UPDATE DELETE INSERT CREATE DATABASE DROP DATABASE MODIFY DATABASE CREATE TABLE DROP TABLE ALTER TABLE

Row Hash Row Hash Row Hash Row Hash Not applicable

Table Table Table Not applicable Database

Read Write Write Write Exclusive

Not applicable

Table

Exclusive

Deadlocks and Deadlock Resolution

A deadlock occurs when transaction 1 places a lock on resource A, and then needs to lock resource B. But resource B has already been locked by transaction 2, which in turn needs to place a lock on resource A. This state of affairs is called a deadlock or a deadly embrace. The Teradata RDBMS resolves deadlocks by aborting one of the transactions. If the transaction originated from BTEQ, then BTEQ resubmits it. Any other client software may or may not resubmit the transaction.

8­8

Introduction to Teradata RDBMS

Chapter 8: Concurrency Control and Transaction Recovery Host Utility Locks

Host Utility Locks

The locking operation that the Archive/Storage Facility 2 (ASF2) and clientresident Archive/Recovery utilities use is very different from the locking operation which the Teradata RDBMS performs. The Teradata RDBMS documentation and utilities frequently refer to archive locks HUT (Host Utility) locks.

HUT Lock Types

Teradata RDBMS places HUT locks as follows:

Lock Type Object Locked

Read Group Read

Any object being dumped Rows of a table being dumped if and only if the table is defined for an after-image permanent journal and if you selected the appropriate option on the DUMP command Permanent journal table being restored All tables in a ROLLFORWARD or ROLLBACKWARD during recovery operations Journal table being deleted. Any object being restored

Write Write Write Exclusive

HUT Lock Characteristics

HUT locks have the following characteristics: · · · · · Associated with the currently logged-on user who entered the statement rather than with a job or transaction Placed only on objects on the AMPs that are participating in a utility operation Placed at the cluster level during a CLUSTER dump Never conflict with a utility lock at another level that was placed on the same object for the same user Remain active until they are released either by the RELEASE LOCK option of the utility command or by the execution of a Teradata SQL RELEASE LOCK statement after a utility operation completes Automatically reinstated following a Teradata RDBMS restart if they had not been released

·

Introduction to Teradata RDBMS

8­9

Chapter 8: Concurrency Control and Transaction Recovery System and Media Recovery

System and Media Recovery

This section describes how the Teradata RDBMS restarts itself after a system or media failure.

System Restarts

Unscheduled restarts occur for one of the following reasons: · · · AMP or disk failure Software failure Parity error

Failures and errors affect all software recovery in the same way. Hardware failures take the affected component offline and it remains offline until repaired or replaced.

Transaction Recovery

Two types of automatic recovery of transactions can occur when an unscheduled restart occurs: · · Single transaction recovery RDBMS recovery

The following table details what happens when these two automatic recovery mechanisms take place:

This recovery type... Happens when the RDBMS...

single transaction

aborts a single transaction because of: · Transaction deadlock timeout · User error · User-initiated abort command · An inconsistent data table · Unavailable resources for parsing Single transaction recovery uses the transient journal to effect its data restoration.

RDBMS

RDBMS restart is caused by: · Hardware failure · Software failure · User command

8 ­ 10

Introduction to Teradata RDBMS

Chapter 8: Concurrency Control and Transaction Recovery System and Media Recovery

Down AMP Recovery

When an AMP fails to come online during system recovery, the RDBMS continues to process transactions using fallback data. When the down AMP comes back online, down AMP recovery procedures begin to bring the data for the AMP up to date as follows:

IF there are... THEN the AMP recovers...

a large number of rows to be processed only a few rows to be processed

offline online

Once all updates are made, we consider the AMP to be fully online.

Introduction to Teradata RDBMS

8 ­ 11

Chapter 8: Concurrency Control and Transaction Recovery Two-Phase Commit Protocol

Two-Phase Commit Protocol

Two-phase commit (2PC) is a protocol for assuring concurrency of data in multiple databases in which each participant database manager votes to either commit or abort the changes. The participants wait before committing the change until they know that all participants can commit. By voting to commit, the participant guarantees that it can either commit or rollback its part of the transaction, even if it crashes before receiving the result of the vote. The 2PC protocol allows the development of CICS and IMS applications that can update one or more Teradata RDBMS databases and/or databases under some other DBMS in a synchronized manner. The result is that all updates requested in a defined unit of work will either succeed or fail.

Definition of Participant

A participant is a database manager that performs some work on behalf of the transaction, and that commits or aborts changes to the database. A participant can also be a coordinator of participants at a lower level. In such cases, the coordinator/participant relays a vote request to its participants, and sends its vote to the coordinator only after determining the outcome of its participants. Any number of participants can engage in a two-phase commit operation. A participant is defined as being in doubt from the time it votes to commit or abort until the time it receives a commit or abort instruction from the coordinator, which is the controlling database manager with respect to the distributed situation. A transaction is in doubt if any of the participants are in doubt.

Definition of Coordinator

The coordinator is never in doubt. Selection of the coordinator is arbitrary. However, with respect to the Teradata RDBMS, it is always either IMS or CICS. There can be only one coordinator per transaction at any given time.

8 ­ 12

Introduction to Teradata RDBMS

Chapter 8: Concurrency Control and Transaction Recovery For More Information

For More Information

For more information on the topics presented in this chapter, see the following Teradata books.

IF you want to learn more about... THEN see...

Specifying transactions in an embedded SQL program Two-phase commit

· · · ·

Teradata RDBMS SQL Reference, Vol.6 Application Programming (PP2) Teradata TDP Reference Teradata Client for MVS Installation Guide

Transaction processing in general

Teradata RDBMS SQL Reference

Introduction to Teradata RDBMS

8 ­ 13

Chapter 8: Concurrency Control and Transaction Recovery For More Information

8 ­ 14

Introduction to Teradata RDBMS

Chapter 9:

Security and Integrity

This chapter describes security and integrity for the Teradata RDBMS. Among the topics described are: · · · · · Establishing a security policy Client password security Server password security Teradata SQL Data Control Language commands for granting and revoking privileges Single sign on for Windows 2000 platforms

The descriptions include both client and server security and Teradata RDBMS user privileges. The Teradata RDBMS provides C2 level or equivalent security. For details, see "Running a Secure Teradata RDBMS" in Teradata RDBMS Security Administration.

Definition of Security

Security is the protection of data against unauthorized access. You can secure programs and data by issuing identification numbers and passwords to authorized users of a computer. The operating system can check passwords to prevent users from logging onto the system in the first place, or the system can check passwords in software, such as in an RDBMS, where each user is assigned an individual view of the database. Although you can take precautions to detect an unauthorized user, determining if a valid user is performing unauthorized tasks is extremely difficult.

Definition of Integrity

Integrity is the process of preventing accidental erasure or corruption of data in a database.

Introduction to Teradata RDBMS

9­1

Chapter 9: Security and Integrity

Enforcing System Security

The four categories of solutions for system security are:

Category Description

Resource access control Physical access control Auditing and accountability Policy

software-enforced access restrictions physical access restrictions system auditing of security-related user actions a sound, well-enforced data center security policy

System Integrity

The Teradata RDBMS provides support for referential integrity to ensure that every foreign key matches a primary key. Users may also provide their own facilities for monitoring referential integrity in the Teradata RDBMS. You can also write macros containing fields that act as foreign keys to enforce the referential integrity of each table in your system. For more information, see Chapter 3: "The Relational Model."

9­2

Introduction to Teradata RDBMS

Chapter 9: Security and Integrity Resource Access Control

Resource Access Control

This section introduces the Teradata software tools you can use to enforce access restrictions. These tools include: · · · · · · User identifiers (user names) Channel or LAN identifiers (host, or client identifiers) Logon policies TDP user security interface Client security Single Sign On

User Identifiers

Teradata access control is based on a user identifier. The security administrator can optionally enforce a channel- or LAN-client identifier as well. A user name is the name defined in a CREATE USER statement. The security administrator must perform one CREATE USER statement for each authorized user in order to establish the user name, define its password, and allocate user disk space. The DBase table stores user names and database names and resides in the space allocated to a system user named DBC. You can retrieve information about user names from the DBC.DBase table by querying the system view named DBC.Users.

Client Identifiers

Any number of different client types can connect to the Teradata RDBMS server. Each connection must have its own unique client identifier. You use the Configuration utility to assign each connection a unique value and define the value to the Teradata RDBMS. Each defined value functions as a client identifier or hostid.

Logon Policies

Users must issue a logon request so the Teradata RDBMS can identify the user and establish a session. The logon string must include a user name that is already established in the system DBase table.

Introduction to Teradata RDBMS

9­3

Chapter 9: Security and Integrity Resource Access Control

The logon string may also include any combination of the following operands:

Operand Definition

tdpid

Each copy of the TDP on a given channel-attached client is assigned a unique tdpid to identify it. The tdpid is a client-based operand and is not transmitted to the Teradata RDBMS. A password authenticates a user request to initiate a Teradata session under the supplied user name. To create a password: The security administrator can use the CREATE USER statement to establish a password for a user. The default is that the password must appear in the user logon string. To logon without a password: If you enable the security administrator user, the security administrator can create a current GRANT LOGON statement containing WITH NULL PASSWORD option for the user.

On IBM mainframe clients... On Windows 2000 clients...

password

TDP security user exit TDPLGUX must acknowledge that the logon string is valid without a password.

Single Sign On provides the ability to use industry standard network authentication to identify users. See "Single Sign On" on page 9-5 for information about this feature.

Note: Because the null password applies only to logging onto the Teradata RDBMS, all other system security measures continue to be enforced. acctid The account id can be used for resource accounting. Each user name may have one or more acctids. The logon processor assigns a default value for the acctid if it detects none in the logon string for a user. The acctid can also contain a priority-level prefix that can be used when interactive users are competing for system resources with longrunning batch jobs. For more information see "Priority Scheduler" in Teradata RDBMS Utilities, Vol 2.

TDP Security

IBM mainframe clients running either MVS or VM have the option of enforcing security at the TDP level using tdpids. The TDP provides a user logon exit called TDPLGUX which you can embed in a user-written routine to process logon requests. Using TDPLGUX, you can reject, accept, provide, or modify any logon request to the Teradata RDBMS.

9­4

Introduction to Teradata RDBMS

Chapter 9: Security and Integrity Resource Access Control

TDPLGUX also permits users to set any of the following options: · · · No logon string (implicit logon) A user id for which the user routine provides a password A user id that can be validated to require no password

You can use TDPGLUX alone or in conjunction with any security package such as: · · · RACF CA-ACF2 CA-TOP SECRET

Single Sign On

When available, the Single Sign On feature allows users of the Teradata RDBMS on Windows 2000 systems to access Teradata RDBMS based on their authorized network usernames and passwords. This feature simplifies the log-on procedure that requires users to enter an additional username and password when running client applications that accessed the database. For the Single Sign On feature to work, it must be enabled on the Teradata RDBMS server as well as on the Gateway. The DBA can turn Single Sign On OFF or ON for the RDBMS using one of the following: · · Teradata RDBMS Database Window (DBW) (DBW is the preferred way.) DBS Control

To turn the feature On or Off on the Gateway, the DBA can use the Gateway Global utility. Note: Single Sign On the Teradata RDBMS is not available on MP-RAS systems. Authentication can be accomplished in several ways. A new field has been added to the Gateway Global utility to indicate the authentication method the client used to log on to the database. The Authentication field has four values:

IF the field contains... THEN authentication was provided by...

DATABASE NEGOTIATE NTLM KERBEROS

the database. This was the method used before Single Sign On was implemented. Windows Negotiate. Windows NTLM. Windows kerberos.

For information about DBS Control, see "DBS Control Utility" in Teradata RDBMS Utilities, Vol. 1 A-K and for information about Gateway Global, see

Introduction to Teradata RDBMS

9­5

Chapter 9: Security and Integrity Resource Access Control

"Gateway Global Utility" in Teradata RDBMS Utilities, Vol 1 A-K. For more information about DBW see Teradata RDBMS Database Window. Single Sign On provides the following benefits: · · Enhances site security because authentication mechanisms do not send passwords across the network Supports the use of alternative security mechanisms that automate logon by eliminating the need for an application to declare or store a password on your client system Saves time

·

Note: The Single Sign On feature requires implementation not only in Teradata RDBMS server software but also in Teradata Utilities Foundation (TUF) client software. As of the publication date for the Release Summary for V2R4.1, Single Sign On is scheduled for implementation in TUF 6.1. Pre-TUF 6.1 applications co-exist with V2R4.1 with Single Sign On unavailable. TUF 6.1 and later applications co-exist with pre-V2R4.1 with the feature disabled.

Password Security Features

You can use a number of add-on features to enhance Teradata RDBMS password security. The following table lists and describes these features:

Password Feature Description

Expiration

Defines a time span during which the password is valid. After that duration, the user must change the password. Restricts the number of characters, digits, or special characters permitted in a password. Defines the sequential number of erroneous logon attempts permitted before locking the user from further attempts. Sets the time duration of the user lock after the user has exceeded the maximum number of erroneous logon attempts. Defines the time span that must elapse before you can reassign a previously used password.

Number of characters/ digits/special characters Maximum logon attempts

Lockout time

Reuse

9­6

Introduction to Teradata RDBMS

Chapter 9: Security and Integrity Resource Access Control

DBC.SysSecDefaults Table

The DBC.SysSecDefaults table stores password information for the Teradata RDBMS. For details on the columns in DBC.SysSecDefaults, see "System Tables" in Teradata RDBMS Data Dictionary.

Password Encryption

Teradata passwords are encrypted. The PasswordString field of the DBC.DBase table stores the encrypted information.

Logon Control Using GRANT and REVOKE LOGON

The Teradata Shared Information Architecture allows multiple clients to connect to the RDBMS simultaneously. By default, the system grants logon permission to all users from all connections. However, the RDBMS provides tools for restricting logons from specific clients. Use the statements GRANT LOGON and REVOKE LOGON to associate specific user names with specific client (host) ids. You can only grant logons using GRANT LOGON if the user is already created in the Teradata RDBMS and if the client (host) id corresponds to a value assigned to a LAN or channel connection by the Teradata RDBMS. You can retract the privileges granted by a GRANT LOGON statement by using the REVOKE LOGON statement. For details on the four permissions available via GRANT LOGON, see "Controlling Acces to Teradata" in Teradata RDBMS Security Administration.

Controlling Data Access by Granting or Revoking Access

The first level of access to the Teradata RDBMS is at the level of the user and the database. This section discusses explicit access rights as controlled by the GRANT and REVOKE statements. These statements grant or remove from a user or group of users one or more privileges on a database, user, table, view, stored procedure, or macro. You must be an owner of the object being controlled or must have GRANT/REVOKE privileges to the object before you can submit GRANT or REVOKE statements. If the object is a view, stored procedure, or macro, then the owner must also have the GRANT privilege and any other applicable privileges on the object or objects referenced by the view, stored procedure, or macro.

Introduction to Teradata RDBMS

9­7

Chapter 9: Security and Integrity Resource Access Control

You cannot grant more privileges on an object than you have yourself on that object. When a user explicitly grants privileges to another user or database, certain rules determine whether, how, and on what object the requested privilege is implemented.

DD System Views for Access Information

The Teradata RDBMS supplies numerous system views for accessing information in the Data Dictionary. These views provide information about users and access rights and grant, logon, and access activities. For details, see "System Views" in Teradata RDBMS Data Dictionary.

9­8

Introduction to Teradata RDBMS

Chapter 9: Security and Integrity Establishing a Security Policy and Physical Access Control

Establishing a Security Policy and Physical Access Control

This section introduces the methods you can use to ensure physical access to your Teradata RDBMS and the hardware on which it runs.

Key Elements of a Security Policy

The security policy for your Teradata RDBMS should include two essential implementation elements: · · System-enforced security features Personnel-enforced security features

You should write a set of security policies and procedures to be distributed to all users of the system. Among the topics you should cover in this document are: · · · · Why security is needed Benefits of the security policy for the users and for the company Suggested security actions for users to follow Required security actions for users to follow

Establishing a Security Policy

The primary consideration for physical access control is establishing a security policy. The security policy is based on two principal elements: · · Identification of security needs Identification of policies and procedures to meet those needs

For establishing a security priority for your Teradata RDBMS, see "Introduction to Teradata RDBMS Security" in Teradata RDBMS Security Administration.

Introduction to Teradata RDBMS

9­9

Chapter 9: Security and Integrity For More Information

For More Information

For more information on the topics presented in this chapter, see the following Teradata RDBMS books:

IF you want to learn more about... THEN see...

Security administration Auditing Client (TDP) security System Views related to security

Teradata RDBMS Security Administration Teradata RDBMS Security Administration Teradata TDP Reference Teradata RDBMS Data Dictionary

9 ­ 10

Introduction to Teradata RDBMS

Chapter 10:

System Administration

This chapter discusses space allocation, accounting, diagnostics and maintenance, and monitoring on the Teradata RDBMS as they relate to system administration. Previous chapters discuss security and the Data Dictionary.

Space Allocation

Space allocation for the Teradata RDBMS relates not only to the disk space which databases require, but to the space required to define users.

Accounting

Accounting is quite broad and includes many subtopics. Those discussed in this chapter pertain to: · · Session management Account usage and security violation logging

Monitoring

This section discusses various aspects of monitoring the Teradata RDBMS, including the facilities for performance monitoring.

Introduction to Teradata RDBMS

10 ­ 1

Chapter 10: System Administration Space Allocation: Databases and Users

Space Allocation: Databases and Users

In the Teradata RDBMS, a database is a collection of related tables, views, stored procedures, and macros. A database also contains an allotment of space from which users can create and maintain their own tables, views, macros, stored procedures, or other users or databases. A database and a user are almost the same thing in the Teradata RDBMS. The difference is that a user can log on to the system whereas the database cannot. A user identifies someone who can log on to both the system and a database.

Creating Databases and Users

When the Teradata RDBMS is first installed on a server, only one user exits on the system: the Database Computer or DBC. The database administrator typically manages this user and assigns space from the user DBC to all other organizations. The user DBC owns all other databases and users in the system. To protect the security of system tables within the Teradata RDBMS, the database administrator typically creates an RDBMS administrator user from DBC. The usual procedure is to assign all RDBMS disk space that system tables do not require to the new administrator database. The database administrator then uses this database as a resource from which to allocate space to the databases and users of the system.

Example: Creating a Finance and Administration Database

Consider the following scenario: the database administrator needs to create a Finance and Administration (F&A) department database with user Jones as a supervisory user, or database administrator, (DBA) within the F&A department. The Teradata RDBMS database administrator (DBA) first creates the F&A database, then allocates space from it to Jones to act as the F&A DBA. The Teradata RDBMS DBA also allocates space from F&A to Jones for his personal use and for creating a Personnel database, other databases, and other user space allocations. When you create a new database or allot space to a user, the system assigns disk space from the space belonging to an existing database or user. The creating database (or user) is the owner of the new database (or user space). The owner permanently grants a specified amount of space to the new database or user, which is then subtracted from the total unused space available to the user.

10 ­ 2

Introduction to Teradata RDBMS

Chapter 10: System Administration Space Allocation: Databases and Users

The following figure shows the hierarchy of this relationship.

DBC User/ Database

System Administrator User/Database

F&A Database

· · ·

Other Department Database

Personnel Database

User Jones

Other Users and Databases for the Department

HD08B001

The F&A Database owns Personnel and all the other department databases. F&A also owns user Jones and all other users within the department. Because the user DBC ultimately owns all other databases and users, it is the final owner of all the databases and user space belonging to the organization. This hierarchical ownership structure provides the owner of a database or user space with complete control over the security of owned data. The owner can archive the database or control access to it by granting or revoking privileges on it. For more information on granting and revoking access privileges, see Chapter 9: "Security and Integrity."

Introduction to Teradata RDBMS

10 ­ 3

Chapter 10: System Administration Space Allocation: Databases and Users

Example: Creating Databases

The previous section explains the concept of databases and users in the Teradata RDBMS environment. This section explains the mechanics of how to create a database from DBC. Before you can create tables, views, users, stored procedures, or macros, you must first create a database. Use the SQL statement CREATE DATABASE to create a database. The following example shows the SQL statement used to create the Personnel database for the Employee and Department tables: CREATE DATABASE Personnel

FROM F&A AS PERMANENT = 10000000 BYTES, SPOOL = 100000000 BYTES FALLBACK, ACCOUNT = `Administration' ;

10 ­ 4

Introduction to Teradata RDBMS

Chapter 10: System Administration Space Allocation: Databases and Users

The following table explains the components of the previous CREATE DATABASE SQL statement:

Component Description

AS FROM

Introduces a clause that specifies one or more database parameters. Introduces an owner database whose disk space is allocated to create the new database.

IF... THEN...

you do not specify an owner database you specify an owner database

disk space allocation comes from the database of the user entering the statement. you must either have the CREATE DATABASE privilege on that database or be its owner.

PERMANENT SPOOL

Specifies the allocation of disk space for permanent tables to the new database. Specifies a space limit for spool files, which are temporary files used to store the interim results of a Teradata SQL statement until they can be examined by a user or application program.

IF... THEN...

you do not specify the SPOOL parameter FALLBACK

the system allocates spool space from the owner database.

Specifies that, in addition to the primary copy of the rows of all database tables, a secondary (fallback) copy is to be distributed among the disks of all AMPs in the RDBMS. FALLBACK is the default value. Note that only business-critical tables need to be fallbackprotected and anything else is up to the discretion of the DBA.

ACCOUNT

Identifies the department or budget responsible for accumulated disk space used by the new database. This information is used for accounting purposes only.

IF... THEN...

you do not specify an account name

the system uses the account name of the owner database.

Introduction to Teradata RDBMS

10 ­ 5

Chapter 10: System Administration Space Allocation: Databases and Users

Example: Creating Users

This section explains the mechanics of how to create a user. The SQL statement for creating a user is CREATE USER. The statement authorizes a new user identification (user name) for the RDBMS and specifies a password for user authentication. Because the system creates a database for each user, the CREATE USER statement is very similar to the CREATE DATABASE statement. The following example shows the SQL statement used to create user Jones in the F&A database:

CREATE USER Jones FROM F&A AS PERMANENT = 1000000 BYTES, SPOOL = 1000000 BYTES PASSWORD = Jan, FALLBACK, ACCOUNT = `Administration' STARTUP = `DATABASE F&A;' ;

The optional STARTUP clause specifies one or more Teradata SQL statements that the system can execute automatically when the user establishes a session. Any user who performs this statement must have a CREATE USER privilege on the owner database or be its owner. The system automatically grants the new user all privileges on tables, views, and macros created in this space. The new user gets only DROP PROCEDURE privilege on the stored procedure objects created in this space. Note: The single sign on feature negates the need for users to enter user names, passwords, and account ids. For more information about this feature see "Logon Control Using GRANT and REVOKE LOGON" on page 9-7.

10 ­ 6

Introduction to Teradata RDBMS

Chapter 10: System Administration Accounting

Accounting

This section describes the accounting options available for the Teradata RDBMS. Among the areas covered are: · · Session management Account usage and security violation logging

Session Management

A user must log on to the Teradata RDBMS and establish a session before they can do any accounting.

Establishing a Session

To establish a session, the user logs on to the RDBMS. The procedure varies depending on the client system, the operating system, and whether the user is an application programmer or a user in an interactive terminal session using BTEQ or a third party query processing product.

Logon Parameters

Logon parameters can include any of the following: · · · · Optional identifier for the RDBMS, called a tdpid User name Password Optional account number

Note: The single sign on feature negates the need for users to enter usernames, passwords, and account ids after they have logged on to the using their authorized user names and passwords. For more information about this feature, see "Logon Control Using GRANT and REVOKE LOGON" on page 9-7.

Session Requests

A session is established once the RDBMS accepts the user name, password, and account number and returns a session number to the process. Subsequent Teradata SQL requests generated by the user and responses returned from the RDBMS are identified by: · · · Host id Session number Request number

Introduction to Teradata RDBMS

10 ­ 7

Chapter 10: System Administration Accounting

The context for the session also includes a default database name which is the same as the user name. When the session ends, the system discards the context and accepts no further Teradata SQL statements from the user.

Account Usage

Most account usage information is best gathered by the client system. You can use client-provided facilities to gather information about RDBMS use by accounts by recording logon/logoff activity. You can also use client-provided solutions to capture attempted security violations. The principal Teradata RDBMS feature for accounting is the optional Account String Expansion (ASE) capability.

Account String Expansion

ASE permits you to use substitution variables to include date and time information in the account ID portion of a user logon string. The system inserts actual values for the variables at Teradata SQL execution time. ASE permits more accurate measurement of individual Teradata SQL statement execution, more precise statistics, more accurate capacity planning, and more information for chargeback and accounting software.

Account Performance Groups

Resource partitions divide system resources for allocation to major user groups. Each session is assigned, either explicitly or implicitly, to a performance group, and each performance group is assigned a proportional resource weight. This allows system administrators to control resources of the group rather than individual users based either on time of day or resource consumption. A Priority Scheduler Facility is used to manage the workload based on the relative priority of each group's resource weight. This weight does not guarantee system responsiveness in a corresponding proportion because responsiveness is a function of overall system activity. Certain agent processes that can be tied to a parent process of a performance group can be marked as distributed performance groups for that performance group. This allows for time used by the distributed process group to be charged to the parent performance group and for scheduler control of session resources that can span multiple tasks.

10 ­ 8

Introduction to Teradata RDBMS

Chapter 10: System Administration Accounting

When an account id prefixed with a group code is provided in a LOGON string, the session is assigned to the associated performance group when the logon is successful. If this form of account id is not present, the session is assigned a default value that corresponds to a medium priority for the default performance group.

Introduction to Teradata RDBMS

10 ­ 9

Chapter 10: System Administration System Monitoring

System Monitoring

This section discusses how the Teradata RDBMS monitors aspects of the system such as: · · · · Teradata Manager System and configuration status Resource usage Performance

Teradata Manager

The Teradata Manager is a production and performance monitoring system used to monitor, control, and administer one or more Teradata servers. The Teradata Manager provides a variety of tools and applications to gather, manipulate, and analyze information about the RDBMS on which you are working. From a single platform, you can query, manipulate, and analyze the information you need.

System and Configuration Status

The Teradata RDBMS is always defined to be in one of several potential states. You can monitor these states from the Database Window. The following table lists and describes the states:

Status Description

Offline

Either the processor to which the RDBMS console is attached or the entire RDBMS has been started offline. The RDBMS cannot be accessed from a client or used for processing.

Startup Logoff Logoff/Quiet Logon Logon/Quiet Reconfig

The system is starting up but is not ready to process requests. No new sessions may log on (logons are disabled), but one or more sessions remain logged on. No new sessions may log on and no sessions are logged on. The system is quiescent. New sessions may log on (logons are enabled) and one or more sessions are logged on. New sessions may log on (logons are enabled), but no sessions are logged on. The reconfiguration program is running.

10 ­ 10

Introduction to Teradata RDBMS

Chapter 10: System Administration System Monitoring

Resource Usage

The Teradata RDBMS has facilities that permit you to monitor the use of resources such as: · · · · CPUs AMPs Disk activity BYNET/Boardless BYNET activity

ResUsage Data

Resource usage (ResUsage) data handling for the Teradata RDBMS is divided into two phases:

Step Action

1 2

Various subsystems gather ResUsage data and the Resource Sampling Subsystem (RSS) collects the data into collect buffers. The collected data is logged to ResUsage tables periodically (as determined by user-defined logging intervals).

The logged ResUsage data is then available for analysis by the various ResUsage macros.

ResUsage Macros

The facilities for analyzing resource usage data are provided by means of a set of ResUsage macros tailored to retrieving information from a set of system views designed to collect and present resource usage information. These views use a base table called DBC.ResUsage for their information source.

ResUsage Data Categories

Each row of ResUsage data contains two broad categories of information: · · Housekeeping, containing identifying information Statistical

Each item of statistical data falls into a defined kind and class. Each kind corresponds to one (or several) different things that may be measured about a resource.

Introduction to Teradata RDBMS

10 ­ 11

Chapter 10: System Administration System Monitoring

ResUsage Data in Summarization Mode

You can activate summarization mode for many ResUsage tables independently. Summarization mode reduces database I/O by summarizing data from multiple vprocs and other objects on each node in one representative row. This summarization reduces detail, but is very useful for exploratory data analysis of performance problems and general resource usage issues.

Controlling Collection and Logging of ResUsage Data

Several mechanisms exist within the Teradata RDBMS for setting the collection and logging rates of ResUsage data. The control sets allow users to do any of the following: · · · Enable or disable ResUsage table on a table by table basis Specify collection and logging periods Summarize the data or not

You can use the statistics collected in DBC.ResUsage to analyze system bottlenecks, determining excessive swapping, and detecting system load imbalances Collection rates control the frequency that ResUsage data is made available to applications. Logging rates control the frequency that ResUsage data is logged to the ResUsage tables. Use the DBW command SET LOG to establish the logging of resource utilization information. The system inserts one row of data into DBC.ResUsage for each sampling interval.

10 ­ 12

Introduction to Teradata RDBMS

Chapter 10: System Administration Performance Monitoring

Performance Monitoring

Several facilities exist for monitoring and controlling system performance. This section briefly discusses many of these facilities.

The TDPTMON

The TDP User Transaction Monitor (TDPTMON) is a client routine that enables a system programmer to write code to track TDP elapsed time statistics.

System Management Facility

The System Management Facility (SMF) is available in the MVS environment only. This facility collects data about Teradata performance, accounting, and usage. Data is grouped into the following categories: · · · Session information Security violations PE stops

The PM/API

The Performance Monitor/Application Programming Interface (PM/API) provides hooks into server performance monitoring routines. The Performance Monitor (PM) provides support for user-developed and NCR-developed applications to monitor Teradata activity within production environments. PM commands permit you to do the following: · Examine: · Current system configuration · Global summary of system status · Activity and current status on a session-by-session basis · Resource usage on a processor-by-processor basis Set the system-wide rate for updating: · Session level statistics · Resource usage statistics Obtain the name corresponding to a given session id, user/database id, or table id Abort session activity and (optionally) log the specified session or sessions off

·

· ·

Introduction to Teradata RDBMS

10 ­ 13

Chapter 10: System Administration Performance Monitoring

Gateway Control Utility

The Gateway Control Utility is a package that supports management of Teradata servers connected to a local area network. The utility provides controls for ensuring that network-attached clients do not damage or in any way modify the operating parameters of other network-attached clients. Services offered by the utility include: · · · · · Displaying configuration information Displaying session status information Displaying session debugging information Enabling/disabling logons Enabling/disabling session event tracing

System Utility Software

System utilities are used to perform maintenance functions on the Teradata RDBMS. They are typically invoked from the Database Window with some exceptions. The system utilities available perform the following functions: · · · · Aborting outstanding transactions Checking for table inconsistencies Specifying logical database configuration (AMPs and PEs) Displaying and modifying the fields of the Control Parameters Globally Distributed Objects (GDO) of the Parallel Database Extension (PDE) software Specifying global runtime flags for database software Providing status for vprocs and permitting manipulation of their attributes Managing LAN connections Displaying and setting various disk space utilization attributes without destroying the data for which the File System is responsible Displaying information used to correct problems within the File System Allocating virtual processors to physical resources Displaying the current database software logical configuration Displaying session status information Displaying recovery status Reconstructing tables from fallback copies (when fallback and clustering are used) Redistributing disk data automatically whenever AMP vprocs are added or removed Displaying special utility locks on databases and tables Initializing the Teradata system tables and all user tables Setting up and updating configurations

· · · · · · · · · · · · · ·

10 ­ 14

Introduction to Teradata RDBMS

Chapter 10: System Administration Performance Monitoring

· · ·

Executing one or more of the standard DIP (Database Initialization Program) SQL scripts packaged with the RDBMS Installing multiple copies of the same software across all nodes of an MPP system Providing real time display of PDE system performance, including systemwide CPU utilization and disk utilization

Resource Check Tools

Resource Check Tools (RCT) is a monitoring tool for Teradata RDBMS. RCT provide ways to: · · Identify when the Teradata RDBMS is hung or significantly congested Simplify system debugging to determine the cause of the hang or congestion

RCT comprises two tools: · · Dbschk Syscheck

Dbschk

Dbschk measures Teradata RDBMS response time. This information can be used to monitor the average or sampling response time of Teradata RDBMS. Dbschk operates as follows:

Stage Process

1 2 3 4

Runs at start up after the system is in a ready (Logons Enables) state. Issues an all-AMP SELECT. Waits for the specified timeout interval then checks on the status of the SELECT. Logs events after a user-defined number of trials, then waits.

Introduction to Teradata RDBMS

10 ­ 15

Chapter 10: System Administration Performance Monitoring

Syscheck

Syscheck automates the task of collecting data across the nodes when Teradata RDBMS hangs by doing the following:

Stage Process

1 2 3

Collects PDE data across all nodes. Summarizes the data. Returns only PDE attributes that meet or fall below the specified WARNING or ALERT threshold. Note: The threshold level is defined in a resource file called syscheckrc.

10 ­ 16

Introduction to Teradata RDBMS

Chapter 10: System Administration For More Information

For More Information

For more information on the topics presented in this chapter, see the following Teradata RDBMS books.

IF you want to learn more about... THEN see...

Performance monitoring

· ·

Teradata RDBMS Resource Usage Macros and Tables Teradata RDBMS PM/API Reference Teradata Manager Installation Guide Getting Started with Teradata Manager

The Teradata Manager

· ·

Performance tuning System utilities System views Archiving data Loading tables Resource Monitoring Tool

Teradata RDBMS Performance Optimization Teradata RDBMS Utilities Teradata RDBMS Data Dictionary Teradata Archive/Recovery Reference Teradata RDBMS Utilities Teradata RDBMS Utilities Teradata RDBMS Database Administration

Introduction to Teradata RDBMS

10 ­ 17

Chapter 10: System Administration For More Information

10 ­ 18

Introduction to Teradata RDBMS

Chapter 11:

System Maximum Capacities

This chapter provides tables that list maximum storage capacities for Teradata RDBMS. Topics in this chapter include: · · · Database and data storage capacities System capacities Session capacities

Note: The information in this chapter is subject to change. Check with your NCR representative for the latest information.

Introduction to Teradata RDBMS

11 ­ 1

Chapter 11: System Maximum Capacities Database and Data Storage Capacities

Database and Data Storage Capacities

The following table lists the database capacities of the Teradata RDBMS servers.The database capacities apply to a single database.

Parameter Value

Maximum number of journal tables per database Maximum number of data tables per database Maximum number of columns per table Maximum number of rows per table Maximum row size Maximum field size Maximum database, user, table, view, macro, index, constraint or column name size Maximum number of columns per index Maximum number of columns per DML statement ORDER BY clause Maximum number of primary indexes per table Maximum number of secondary and join indexes, in any combination, per table Maximum table-level constraints per table Maximum referential constraints per table Maximum number of tables that can be joined Maximum number of columns in foreign and parent keys Maximum number of columns referenced per single table in a hash or join index Maximum number of columns per referenced base table in the fixed part of a join index (when there is a repeating part) Maximum number of columns per referenced base table in a repeating part of a join index Maximum number of columns in a join index (when there is no repeating part)

1 4.2 x 109 256 Limited by disk capacity Approximately 64256 bytes Approximately 64000 bytes 30 bytes 16 16 1 32 100 64 64 16 16 16 16 256

11 ­ 2

Introduction to Teradata RDBMS

Chapter 11: System Maximum Capacities Database and Data Storage Capacities

Parameter Value

Maximum number of materialized global temporary tables per session Maximum number of volatile tables per session

2 000 1 000

Introduction to Teradata RDBMS

11 ­ 3

Chapter 11: System Maximum Capacities System Capacities

System Capacities

The system capacities in the following table apply to an entire Teradata RDBMS configuration.

Parameter Value

Maximum number of databases/users Total data capacity Maximum SQL request size

4.2 x 109 43 GB per AMP or optional 120 GB per AMP 64 KB (Includes SQL statement text, USING data, and parcel overhead)

Maximum number of active transactions Maximum data format descriptor size Maximum SQL title size Maximum error message text size in failure parcel Maximum number of characters in a string constant Maximum number of sectors per data block Maximum number of sessions per PE Maximum number of sessions per Gateway (Maximum Gateways per node is 1) Maximum number of parcels in one message Maximum message size

2048 30 characters 60 characters 255 bytes 32000 255 120 Tunable 1200 maximum certified 256 Approximately 65000 bytes Note: This limit applies to messages to/from host systems and to some internal RDBMS messages.

Maximum number of PEs per system Maximum number of AMP and PE vprocs, in any combination, per system

1024 16384

11 ­ 4

Introduction to Teradata RDBMS

Chapter 11: System Maximum Capacities Session Capacities

Session Capacities

The session specifications in the following table apply to a single session.

Parameter Value

Active request result spool files Parallel steps Parallel steps can be used to process a request submitted within a transaction (which may be either explicit or implicit). The maximum number of steps generated per request is determined as follows: · Per request, if no channels Note: Channels are not required for a prime key request with an equality constraint. · A request that involves redistribution of rows to other AMPs, such as a join or an INSERT-SELECT · A request that does not involve row distribution

16

20 steps

Requires 4 channels.

Requires 2 channels.

Introduction to Teradata RDBMS

11 ­ 5

Chapter 11: System Maximum Capacities For More Information

For More Information

For more information on the topics presented in this chapter, see the following Teradata books.

IF you want to learn more about... THEN see...

designing databases optimizing performance troubleshooting

Teradata RDBMS Database Design Teradata RDBMS Performance Optimization Guide Teradata RDBMS Database Administration

11 ­ 6

Introduction to Teradata RDBMS

Glossary

This glossary defines terms specific to the Teradata RDBMS. It also defines some general-use computer terms critical to an understanding of the Teradata RDBMS, as well as some general-use terms whose definition has been enhanced for the Teradata RDBMS. Words that are italicized in the explanations are themselves defined in the glossary. For your convenience, frequently misunderstood terms are flagged with "cf" (compare) citations to alert you to similar terms for comparison; parenthetical notes to similar or contrasting terms. All command or statement keywords, modifiers, and other reserved words are fully capitalized. In alphabetizing entries in this glossary, a blank character sorts before any letter, and a dash character sorts after any letter.

A

Access Module Process (AMP) An instance (virtual processor) of database management data (tables, rows, indices) with their associated data manipulation processes and their data context (Transaction In Progress table, lock information, disk access information). administrator A special user responsible for allocating resources to a community of users. Administration Workstation The system console of an MPP system. American National Standards Institute (ANSI) An institute that administrates and coordinates the private sector voluntary standardization system for the United States. Teradata SQL conforms to the ANSI standard for SQL usage but adds its own extensions. AMP See Access Module Process. application program A host program that performs a particular function or set of functions that the user desires to perform. array controller A printed circuit board containing a microprocessor and the electronics for operating an array. AWS See Administration Workstation

Introduction to Teradata RDBMS

Glossary ­ 1

Glossary

B

Bandwidth A measure of how fast data can transfer, in bits per second or bytes per second, Megabytes per second, etc. Basic Teradata Query (BTEQ) A host-resident application program that enables a user to execute a series of Teradata SQL requests in either batch or interactive mode. BTEQ can read from or write to host data sets and use more than one Teradata session. BTEQ See Basic Teradata Query. buffer A segment of memory that is set aside to hold data while it is being processed. BYNET The dual interconnection network that allows high-speed communications within a node of a single-node system or between the nodes of an MPP System. byte A common unit for data storage in which information is stored as a string of 8-bit elements.

C

cache High-speed buffer storage that contains frequently accessed instructions and data. It is used to reduce access time. Call-Level Interface (CLI) A library of service routines that reside on the client side. It provides the interface between the client application program and the Teradata Director Program. It also allows SQL statements to be passed directly to the server without being recompiled. Cf: TDP. channel The hardware means by which a central processor is attached to peripheral devices. The path by which data is transferred between a mainframe host and the Teradata RDBMS. channel connection software A program on the server that provides a pathway for applications running on a channel-connected mainframe (host) to access the Teradata RDBMS. Cf: Gateway. Clique A logical group of nodes on the NCR System 5100M that shares access to disk storage. The vprocs on a failed node of a clique can migrate to other nodes within the clique during the recovery process, allowing uninterrupted use of the Teradata RDBMS. cluster A collection of AMPs that handles the fallback data for a given AMP. CPU Central Processor Unit, or Central Processing Unit. In NCR systems, a CPU is one large integrated circuit that performs all arithmetic and logical operations and controls data flows among registers and memory. The Intel Pentium® chip is a CPU.

Glossary ­ 2

Introduction to Teradata RDBMS

Glossary

D

data dictionary A set of relational tables that contains information about the RDBMS and the database objects within it. It is "data about the data" that includes information about disk space, access authorization, ownership and data definitions. database a collection of permanently stored data that is logically related, shared by many users, protected to control user access and managed in such a way to maintain value and integrity. Database Management System (DBMS) Computer procedures that permit the database to be maintained independently of application programs. A database management system provides services for data definition, data manipulation, and data integrity. Data Control Language (DCL) A subset of SQL statements that define user access. It can selectively limit a user's ability to retrieve, add, or modify data. It is also used to grant and revoke access privileges on tables and views. Data Definition Language (DDL) A subset of SQL statements that allows the user to define the database objects and the relationships that exist among them. Examples of DDL are creating or modifying tables and views. Data Manipulation Language (DML) A subset of SQL statements that manipulate, change, or retrieve the data rows of the database. DML lets the user change the information contained in the database and is the most commonly used subset of SQL. DML statements are used to select, update, delete, and insert rows. data integrity The condition where accidental or intentional destruction, alteration, or loss of data does not occur. DBS See Teradata Relational Database Management System. DBS/SQL See Teradata SQL. DBW See Database Window. diagnostic Software routines that test hardware components. If the hardware components are not operating correctly, diagnostic reports are generated to isolate the failures. disk Primary form of data storage. Multiple disks can be connected to a system. disk arrays A configuration of disk drives that combine disks in order to increase capacity, increase speed, and provide fault tolerance operation by utilizing specialized array controllers. These controllers manage and distribute data and parity across the disks. dispatcher The component of the Parsing Engine that controls the sequence in which the steps of the client request are executed and passes them onto the BYNET. It is composed of execution-control and response-control tasks.

Introduction to Teradata RDBMS

Glossary ­ 3

Glossary

Destination Selection Word (DSW) The first sixteen bits of the row hash that results after an index value is passed through the hashing algorithm. Each DSW is mapped to a particular hash map entry which in turn is mapped to the AMP containing the requested data row(s). dump An operational function provided to create an archival copy of a data base, part of a data base, or a collection of databases stored by the Teradata RDBMS. Cf: restore.

E

end user An ordinary user of the Teradata RDBMS, as opposed to a supervisory user or an administrator. An end user cannot create a subordinate user or database, except within its own space.

F

fallback The ability of the Teradata RDBMS to maintain an extra copy of every row of a table in different AMPs. During normal processing, reads are performed using only the primary copy, while updates are performed on both copies. If a failure renders the primary copy inaccessible, the fallback copy is used for reads and updates. The database maintains a log of changes missed by inaccessible copies and automatically applies the missed changes to synchronize the table copy when it becomes accessible. field The basic unit of information stored in the Teradata RDBMS. A field is either null, or has a single numeric or string value. See also column, database, row, table. firmware Programming that is permanently fixed into a subsystem, as opposed to a software system, which is replaced without altering a hardware configuration. foreign key A logical construct in database theory that identifies relationships between tables. The foreign key must exist somewhere as a primary key value. Unlike primary keys, foreign keys permit duplicate values, permit missing or null values, and can be changed. Cf: primary key.

G

Gateway A program on the server that provides a pathway for applications running on a network-connected client to access the Teradata RDBMS. Cf: channel connection software.

Glossary ­ 4

Introduction to Teradata RDBMS

Glossary

H

hash map A one-dimensional array that associates DSW values with specific AMPs. hash synonym When two different index values hash identically. It is sometimes referred to as a hash collision. hashing A way of mapping data records to various physical storage areas. In the Teradata RDBMS, hashing is used to determine at which AMP a given row will be stored. host A general-purpose computer that can execute application programs that access and manipulate information within the Teradata RDBMS. Host System Communication Interface. See HSCI. host utility program One of several programs that executes on the mainframe host computer to archive data from the Teradata RDBMS to tape and/or to restore archived data to the Teradata RDBMS, and to load data to the Teradata RDBMS from a host input file. HSCI The Host System Communication Interface, which consists of the Teradata Director Program, Call-Level Interface, and user-to-TDP communication techniques. These HSCI components enable communication between the userwritten and Teradata application programs and the Teradata RDBMS.

I

I/O Input/Output. A combined term to simplify talking about mixed input and output operations. Input generally means data coming into a computer. Output generally means data going out from a computer.

J

join In Teradata SQL, a select operation that combines information from two or more tables to produce a result. Also the addition of disk storage and nodes to an existing system.

L

LAN See Local Area Network. Local Area Network (LAN) A means of connecting workstations that allows them to communicate with one another. The LAN is usually confined to a limited area, such as a building. lock A method of controlling the type of access a user is allowed to the data. Locks prevent multiple users who are trying to change the same data at the

Introduction to Teradata RDBMS

Glossary ­ 5

Glossary

same time from violating data integrity. Locks are automatically acquired during the processing of a request and released at the termination of the request. logical Pertaining to an entity, record, or grouping of data that is treated as a unit by a software program, as opposed to an entity that is treated as a unit by hardware. Logical Unit Number (LUN) A collection of user slices (UNIX) or partitions (NT), also known as pdisks. A LUN is recognized only by the OS.

M

macros A predefined, stored set of one or more SQL commands, and optionally, report-formatting commands. Macros are used to simplify the execution of frequently used SQL commands. Massively Parallel Processing System Multiple SMP nodes that are connected by the BYNET to form a larger configuration. Mirror In data storage, the secondary copy of a primary group of data. The mirror copy is an exact duplicate (mirror) of the primary data. Usually the mirror copy is stored on a different physical disk than the primary copy, for protection against failure of the primary disk. module A unit of software that typically performs one function or a set of closely related functions and is the smallest unit of software that can be replaced. It may also refer to a unit of hardware that performs a specific function. MPP System See Massively Parallel Processing System. Micro Operating System Interface (MOSI) A library of routines providing OS independence for clients accessing the RDBMS. It allows network-attached platforms to use only one version of the MTDP. multitasking The ability to share the resources of a computer, operating system, or address space among several tasks, or "threads" of execution, where the state of each task is, in general, independent of the state of other tasks.

N

node A processor node consisting of one or more processors that share memory. Peripheral devices such as disk storage may be connected to a node. An interconnect between the nodes facilitates message communication between nodes. Network and channel connections provide for communication to clients and those between interconnected nodes. Non Unique Primary Index (NUPI) A primary index based on a nonunique column set. A NUPI results in even distribution of table rows proportional to the degree of its uniqueness. Cf: UPI, PI.

Glossary ­ 6

Introduction to Teradata RDBMS

Glossary

NUPI see Non Unique Primary Index. Non Unique Secondary Index (NUSI) A secondary index based on a nonunique column set. It is used to prevent a full table scan. Cf: USI, NUPI. NUSI see Non Unique Secondary Index.

O

ODBC See Open Database Connectivity. offline A state of a system or component in which it is logically disconnected from its normal operating environment so that special functions such as service, maintenance, and/or diagnostics can be performed on the component or system in isolation from the rest of the environment. online Any state in which a system is available for users to enter ordinary requests, and in which all normal functions are provided. Open Database Connectivity (ODBC) Microsoft's database programming interface that provides a common language for Windows applications to access databases on a network. Optimizer A component of the Parsing Engine. The optimizer develops the least expensive plan to return the requested response set. Processing alternatives are evaluated and the fastest alternative is chosen. This alternative is converted to executable steps, to be performed by the AMPs, which are then passed to the dispatcher. owner In Teradata SQL, the user who has the ability to grant or revoke all access rights on a database to and from other users. By default, the creator of the database is the owner, but ownership can be transferred from one user to another by the GIVE statement.

P

Parallel Database Extensions (PDE) Message-based kernel service extensions to UNIX System V, Release 4, suitable for a parallel distributed-memory machine. The extensions include network services and message services, memory management services, and file system access services. These services provide parallelism and are available only to a TPA, that is, the Teradata RDBMS. parallel processing Simultaneous or concurrent execution of two or more processes in a single functional unit. parallelism The foundation of Teradata RDBMS in which processing efficiency of large amounts of data is made possible through evenly dividing tasks among available vprocs in order to process an answer set as quickly as possible.

Introduction to Teradata RDBMS

Glossary ­ 7

Glossary

parity an encoded format of the actual data that could be used to detect errors and/or repair the original data. Depending on the RAID format, the pattern of the parity laid out across the disks will vary. Parsing Engine (PE) An instance (virtual processor) of the database management Session Control, Parsing, and Dispatching processes and their data context (caches). PDE See Parallel Database Extensions. PE See Parsing Engine. permanent journal An optional, user-specified, system-maintained journal used for database recovery to a specified point in time. It is used for recovery from unexpected hardware or software disasters by maintaining before and after images of the database. pdisks User slices (UNIX) or partitions (NT) of different disks on one or more ranks that make up a vdisk. primary index A mechanism for assigning a row to or retrieving a row from an AMP. The primary index is made up of either one or more column and is processed by a hashing algorithm that computes a value that correlates to a particular AMP. See NUPI, UPI. primary key A logical construct in database theory that uniquely identifies each row within a table. A primary key must be unique, non-null, non-changing, and cannot have duplicate values. privilege The right of a specified user to enter a specified RDBMS/SQL statement (such as CREATE, SELECT, GRANT, etc.) against a specified database, macro, stored procedure, table, user, or view. program A unit of software that performs a set of operations to satisfy the needs of users or other programs. A program consists of one or more modules. protocol A set of rules that govern the communication between two or more entities, such as processors, programs, or systems, including the formats of messages that flow among the entities.

R

RAID Redundant Array of Independent Disks. A technology using specialized disk controllers and software to distribute data across a set of multiple disk drives (arrays). The data is segmented across the different disks in the array, a method known as striping. RAID 1 An implementation of RAID technology in which striped data is stored as identical (mirrored) copies on separate disk media, so the data is both striped and mirrored.

Glossary ­ 8

Introduction to Teradata RDBMS

Glossary

RAID 5 An implementation of RAID technology used on Symbios disk arrays in which data and parity segments are stored by means of striping across multiple disks. Failures in any one component can be compensated for by reconstructing the data from the parity information and the surviving data. RAID5 data is not mirrored. RAID S An implementation of RAID technology used in EMC disk arrays, similar to RAID 5 on Symbios disk arrays. See RAID 5. RAM See random access memory. random access memory A quality of a memory device that writes or reads data from the memory through direct locating, rather than locating through references to other data in the memory. RDBMS Relational Database Management System. A database management system in which complex data structures are represented as simple, twodimensional tables consisting of columns and rows. recovery journal A journal that, at the loss of an AMP, automatically begins logging any changes to rows that reside on the down AMP. Logging continues until the down AMP is brought back on-line. The changes recorded on the recovery journal are applied to the recovered AMP and the recovery journal is discarded once the AMP is back online and fully recovered. Cf: permanent journal, transient journal. resource monitoring Performance monitoring features to help users gather performance data in real time that may help tune the system. Resource Usage (ResUsage) data Resource usage and performance data that is stored in database tables for subsequent analysis, which may help to identify trends and help tune the system. Resource Sampling Subsystem A PDE software component that provides the ability to gather resource statistics that may be presented to real-time resource monitoring or saved as ResUsage data or both. restart The process by which online operation of the Teradata RDBMS resumes after a system error, such as a hardware failure, a software protocol failure, or loss and restoration of AC power. restore A function provided by a host utility program that re-creates a database from archived dump tapes. Cf: dump. ResUsage data See Resource Usage. row hash The 32-bit output that results when an index value is processed by the hashing algorithm. Cf: row-id. row-id A combination of the row hash value plus a uniqueness value. Each stored row has a row-id as a prefix that differentiates each row in the table. Rows are logically maintained in row-id sequence. Cf: row hash.

Introduction to Teradata RDBMS

Glossary ­ 9

Glossary

S

SCSI See Small Computer Systems Interface. secondary index An alternate path to the data stored that improves performance by providing faster set selection and allows the user to avoid scanning the entire table during a query. It is like a primary index in that it allows the user to locate rows. Unlike the primary index, however, it has no influence on the way the rows are distributed among the AMPs.

Glossary ­ 10

Introduction to Teradata RDBMS

Index

Symbols " (quotation marks), punctuation in SQL statement 4­10 ( ) (left/right parentheses), punctuation in SQL statement 4­10 , (comma), punctuation in SQL statement 4­10 . (period), punctuation in SQL statement 4­10 : (colon), punctuation in SQL statement 4­10 ; (semicolon), punctuation in SQL statement 4­10 ' (apostrophe), punctuation in SQL statement 4­10 single sign on 10­7 acctid 9­4 Administration Workstation. See AWS ALTER TABLE statement 7­4 Alternate key 3­7 AMPs clusters 2­12, 7­4 described 2­4, 2­11 multiple requests 2­17 node failure behavior 2­6 step processing 2­17 steps 2­12 Application development application generators 6­1 CLI 6­9 embedded SQL 6­2 fourth generation languages 6­1 implicit 6­1 macros 6­4 Preprocessor2 6­3 TDSP 6­6 third party software 6­14 Application programming facilities 2­20 Archive/Recovery utility 2­22 ASE 10­8 ASF2 2­22 Attributes column 3­2, 3­3 columns, tables, and views 5­4 data type 4­3 functional dependence of 3­7, 3­9, 3­11 intermediate 3­7 join dependence of 3­8 macro and stored procedure 5­1 multivalued dependence of 3­7 mutual independence of 3­10 nonkey 3­9, 3­10 overlapping 3­11 redundant 3­6 space utilization 10­14 values 3­7 vproc 10­14 AWS 2­25

Numerics 2PC CICS and 8­12 data integrity and 8­12 described 8­12 IMS and 8­12 participant 8­12 2PL 8­2

A Access control acctid 9­4 client identifiers 9­3 GRANT LOGON statement 9­7 logon policies 9­3 parameters 2­24 password encryption 9­7 password security 9­6 REVOKE LOGON statement 9­7 single sign on 9­5 TDP security 9­4 user identifiers 9­3 Access Module Process. See AMPs Account String Expansion. See ASE Account usage account performance groups 10­8 gathering usage information 10­8 string expansion 10­8 Accounting logon parameters 10­7 session management 10­7

Introduction to Teradata RDBMS

Index ­1

Index

B Bandwidth 2­3 Basic Teradata Query. See BTEQ BCNF 3­11 Bidirectional broadcast 2­3 Boyce-Codd normal form. See BCNF BTEQ 1­9, 4­11 BYNET as interconnect 1­4, 2­4, 2­10 as processor 1­9, 1­10, 2­11, 2­14, 2­16, 2­24 defined 2­2 dual 7­8 dump freeze 2­9 functions of 2­3 virtual (boardless) BYNET 2­2, 2­4

D Data archiving 7­7 controlling access 9­7 restoring 7­7 Data blocks and OLAP functions 4­9 changing size of 2­7 maximum size 2­7 Data Control Language. See DCL Data Definition Language. See DDL Data Dictionary. See DD Data integrity 2PC and 8­12 ensuring 1­3 See also Referential integrity Data Manipulation Language. See DML Database Computer. See DBC Database Initialization Program. See DIP Database management hierarchical model 3­1 network model 3­1 object-oriented model 3­1 relational model 3­1 Database Query Manager. See DBQM Database utilities 2­23 Database Window. See DWS Databases creating 10­2 default 4­4 names 9­3 DBC 10­2 DBC.SysSec table defaults 9­7 DBQM blocking queries 2­24 ODBC 2­21 DBS database manager 2­11 views 5­3 dbscontrol 2­7 DBW and the console subsystem 2­26 console software 2­25 See also AWS Supervisor Subwindow 2­26 DCL 4­2, 4­5 DD administrator views 5­5 archiving 7­7 end user views 5­4 objects 5­1

C CA-ACF2 9­5 Candidate key 3­7 Capacities, maximum. See System capacities CA-TOP SECRET 9­5 CICS, 2PC and 8­12 Circular dependencies, elimination of 3­9 CLI handling SQL requests 2­18 processing embedded SQL 4­11 programming facilities 2­20 request handler 1­9 what is CLI? 6­10 Clients channel attached 1­5 network attached 1­5 SIA 1­4 software 1­5 Cliques 2­6 described 2­6 sharing vprocs 7­9 Clusters, AMP 2­12, 7­4 Columns 3­3 Communications management 2­18 Concurrency control 8­1 Configuration status 10­10 CREATE MACRO statement 6­4 CREATE PROCEDURE statement 6­6 CREATE TABLE statement 7­4 ctl 2­7 Cursors, described 4­12

Index ­2

Introduction to Teradata RDBMS

Index

Recovery control user views 5­5 structure 5­3 supervisory views 5­5 users 5­2 views 5­4 DDL 4­2, 4­5 Determinant 3­7 DIP 2­7, 5­3, 10­15 Directory cache 2­13 Disk arrays 2­5 Dispatcher element of PE 2­11 executing steps 2­14 DML 4­2, 4­5 Drive groups and LUNs 2­5 DROP MACRO statement 6­5 dual BYNETs 7­8 Dumps 2­9, 2­22, 2­23, 8­9 Dynamic data exchange 2­18

gncApply 2­12, 2­13, 2­14 GRANT LOGON statement restrictions 9­8 using 9­7 GUI. See PDE Tools utility

H Hash indexes 3­18 Hash value deleting 8­8 functionality 8­5 inserting 8­8 processing 2­14, 2­16, 3­16 selecting 8­8 updating 8­8 Host Utility. See HUT HUT lock characteristics 8­9 type of locks 8­9

E Encryption 9­7 EXECUTE statement 6­5 EXPLAIN statement 6­15 I IMS, 2PC and 8­12 Indexes AMP usage 2­16 creating primary 4­15 difference, primary key and primary index 3­17 hash 3­18 join 3­18 multi-table join 3­18 primary 3­16 processing 2­17 secondary 3­17 single-table join 3­18 types used by Teradata RDBMS 3­16 Internationalization feature 4­4 Inter-Node disk array connections 2­6

F Fallback 7­3 FastExport utility 2­23 FastLoad utility 2­23 Fault tolerance hardware 7­8 hardware and software 7­1 software 7­1, 7­2 Fifth normal form. See Normal form, 5NF File system. See Teradata file system. First normal form. See Normal form, 1NF Foreign keys 3­7, 3­13 Fourth normal form. See Normal form, 4NF Full functional dependence 3­7 Functional dependence 3­7

J Join indexes as access method 4­16 defined 4­15 functionality 3­18 restrictions 4­17

G Gateway Control utility 10­14 Global temporary tables, maximum number of 3­5

Introduction to Teradata RDBMS

Index ­3

Index

Joins defined 3­8 dependency 3­8 Journaling down AMP recovery 7­6 permanent 7­6 transaction 8­1 transient 7­6 types of 7­5

LUNs and drive groups 2­5 described 2­5

M Macros application development 6­4 creating 6­4 deleting 6­5 executing 6­5 modifying 6­5 Massively Parallel Processing. See MPP Maximum capacities databases 11­2 session 11­5 system 11­4 Micro 1­7 Micro Operating System Interface. See MOSI Monitoring 10­9 MOSI contents of 2­19 routines 2­19 MPP and PDE 2­7 distributed functions 1­4 system utility software 10­15 Multicast 2­3 MultiLoad utility 2­23 Multinode system 2­6 Multi-table join index 3­18 Multivalued dependence 3­7

K Kanji characters 4­4

L LAN migrating PEs 2­6 remote connections 2­25 Limitations. See System capacities Lock manager 8­5 Locks access 8­7 characteristics of HUT 8­9 concept 8­5 exclusive 8­7 HUT 8­9 levels of user 8­6 levels, automatic 8­8 lost updates 8­5 read 8­7 types of HUT 8­9 why required? 8­5 write 8­7 Logical units. See LUNs Logon parameters 10­7 policies 9­3 Lost update anomaly 8­6

N Network links 2­3 Node failure 2­6 Nonkey attributes 3­9 Non-Unique Primary Index. See NUPI Non-Unique Secondary Index. See NUSI Normal forms 1NF 3­9 2NF 3­9 3NF 3­10 4NF 3­11 5NF 3­11 BCNF 3­11 fifth 3­11 first 3­9

Index ­4

Introduction to Teradata RDBMS

Index

fourth 3­11 PJ/NF 3­11 projection-join 3­11 second 3­9 third 3­10 Normalization defined 3­6 layers of 3­6 NUPI 4­15 NUSI 4­15

O ODBC 2­21 OLAP extensions of SQL 4­9 features 4­9 sampling data blocks 4­9 Open Database Connectivity. See ODBC Optimizer, SQL 3­2 OTB backup for windows 2­22 NetBackup 2­22 NetVault 2­22

P Parallel Data Extensions. See PDE Parallel operating system services 2­9 Parallel steps, executing 2­16 Parse tree 2­13 Parser, element of PE 2­11 Parsing Engine. See PEs Passwords 9­5, 9­6, 10­7 PDE interface layer 2­7 services 2­9 PDE Tools utility 2­7 Pdisks 2­5 PE dispatcher element 2­11 optimizer element 2­11 parser element 2­11 parser processing example 2­15 session, control element 2­11 software components 2­11 type of vproc 2­10

Performance account performance groups 10­8 PM/API 10­13 Performance Monitor utility. See PM/API Performance Monitor/Application Programming Interface. See PM/API PEs channel attached 2­6 described 2­4 PL/I preprocessor 2­20 Plastic steps 2­13 PM/API features 10­13 monitoring performance 10­13 Point-to-point communication 2­3 Preprocessor2 4­11, 4­19 Primary indexes attributes 3­16 creating 4­15 requirements 3­17 Primary keys defined 3­7 requirements 3­17 used with referential integrity 3­13 Priority Scheduler 2­23 Privileges control 5­3 CREATE DATABASE statement 10­5 CREATE USER statement 10­6 databases 10­3 default databases 4­4 DROP PROCEDURE statement 10­6 GRANT LOGON statement 9­7 granting and revoking 9­7 REVOKE LOGON statement 9­7 tables, views, macros, and stored procedures 10­6 views 5­4 volatile temporary tables 3­4 Punctuation, SQL syntax statement 4­9

Q Queries, blocking using DBQM 2­24

R RACF 9­5

Introduction to Teradata RDBMS

Index ­5

Index

RAIDs RAID1 7­8 RAIDs Fault tolerance 7­8 RDBMS Gateway 2­28 Recovery defined 8­1 down AMP 8­11 RDBMS 8­10 single transaction 8­10 system and media 8­10 transaction 8­10 Redundant Array of Inexpensive Disks. See RAIDs Referential constraints 3­14, 3­15 Referential integrity described 3­12 referenced (parent) table 3­12 referencing (child) table 3­12 Relational algebra 3­2 Relational model 3­1 Relations cardinality 3­2 degree 3­2 REPLACE MACRO statement 6­5 REPLACE PROCEDURE statement 6­6 REPLACE VIEW statement 4­18 Requests cache 2­12 dispatcher 1­7 multiple 2­17 Resolver tree 2­13 Restarts, system 8­10 REVOKE LOGON statement 9­7 ROLLBACK statement 7­7 ROLLFORWARD statement 7­7 RowID 3­18 Rows cardinality 3­2 defined 3­3 hash value 2­14, 2­16, 3­16, 7­3, 8­5, 8­8 row positions 4­9 rowID 3­18

S Second normal form. See Normal form, 2NF Secondary indexes 3­17 Security 9­1 access 9­9 C2 9­7

policy 9­1, 9­9 Single sign on 9­1 TDP 9­4 Server, software 1­7 Session management account numbers 10­7 passwords 10­7 tdpid 10­7 usernames 10­7 Sessions controlling 2­11, 2­18 establishing 10­7 maximum capacities 11­5 Shared information architecture. See SIA SIA 1­4 Single sign on 9­1, 9­5, 10­6, 10­7 Single-table join index 3­18 SMP nodes 2­3 virtual (boardless) BYNET 1­4 Space allocation 10­2 SPL 6­6 SQL 1­5, 4­1 ANSI standard language 4­1 cursor 4­12 data definition statements 4­1 data manipulation statements 4­5 described 1­7 dispatcher 1­9, 2­13, 2­14 embedded SQL 2­20 FETCH INTO statement 4­12 generator 2­13 INSERT statement 4­6 optimizer 3­2 parser 1­7, 1­9 preprocessor 4­11 request packaging 2­18 SELECT statement 4­2 statement flow 1­9 syntaxer 1­7, 2­13 UPDATE statement 4­6 Statements ALTER 4­5 ALTER TABLE 7­4 CREATE 4­5 CREATE DATABASE 10­4, 10­5 CREATE MACRO 6­4 CREATE PROCEDURE 6­7 CREATE TABLE 7­4 CREATE TRIGGER 4­17 CREATE USER 10­6 DROP 4­5

Index ­6

Introduction to Teradata RDBMS

Index

DROP MACRO 6­5 DROP PROCEDURE 10­6 EXECUTE 6­5 EXPLAIN 6­15 GRANT LOGON 9­7 punctuation 4­9 RENAME 4­5 REPLACE 4­5 REPLACE MACRO 6­5 REPLACE VIEW 4­18 REVOKE LOGON 9­7 ROLLBACK 7­7 ROLLFORWARD 7­7 Statistics 4­5 Steps AMP 2­12 concrete 2­13 parallel execution 2­16 sending to AMPs 2­16 Stored procedure language defined 6­6 Stored Procedure Language. See TDSP Structured Query Language. See SQL Supervisor Subwindow. See DBW Supervisor window 2­7 Symmetric Multi-Processing. See SMP System capacities, maximums 11­4 System console 2­25 System Management Facility 10­13 System status 10­10

T Table Rebuild utility 7­7, 7­10 Tables and set theory 3­3 child 3­13 creating index 3­19 maximum number of temporary 3­5 parent 3­13 primary indexes for 3­16 rebuilding 7­7 types of 3­4 TDP functions 2­18 managing data communications 1­6 MTDP 2­19 tracking elapsed time 10­13 TDP User Transaction Monitor. See TDPTMON TDPLGUX logon exit 9­4

TDPTMON 10­13 TDSP as SQL application 6­6 as SQL tool 6­1 attributes 5­4 controlling data access using 9­7 converting 2­13 CREATE statement 4­5 creating 6­6 DROP statement 4­5 privileges 5­4, 10­6 removing inconsistencies in TDSP tables 7­7 RENAME statement 4­5 REPLACE statement 4­5 space allocation 10­2 Temporary tables global 3­4, 3­5 volatile 3­4, 3­5 Teradata Director Program. See TDP Teradata file system AMP function 2­12 disk management 2­12 maximum data block size 2­7 Teradata Manager 2­24, 10­10 Teradata RDBMS 2PC 8­12 client utilities 2­22 database maximum capacities 11­2 database software 1­5 database utilities 2­24 DBQM 2­24 design 1­3 parallel processing 1­3 running as tpa 1­4 security 9­1 session maximum capacities 11­5 single data store 1­3 system maximum capacities 11­4 Teradata Manager 2­24, 10­10 third-party software and 1­10 Teradata SQL. See SQL Third normal form. See Normal form, 3NF tpa, described 2­9 Tpid 10­7 Transactions concept 8­2 explicit 8­3 implicit 8­3 serializability 8­2 Transitive dependence 3­7 Transparency Series/Application Program Interface. See TS/API

Introduction to Teradata RDBMS

Index ­7

Index

Triggers additional resources 4­19 ALTER statement 4­5 CREATE statement 4­5 CREATE TRIGGER statement 4­17 defined 4­17 defining 4­17 DROP statement 4­5 functionality 4­17 RENAME statement 4­5 REPLACE statement 4­5 restrictions on using 4­17 triggering events 4­17 with SQL statements 4­17 Trusted Parallel Application. See tpa TS/API 6­14 Tuples 3­3 Two Phase Commit protocol. See 2PC Two-Phase Locking Protocol. See 2PL

end user 5­4 Recovery Control User 5­5 supervisory 5­5 why use 5­3 Virtual disks. See Vdisks Volatile tables, maximum number of 3­5, 11­3 Vproc Manager 2­7, 2­23 Vprocs AMP vproc 2­11 AMPs 2­4 cliques 7­9 functionality 2­2, 2­10 maximum supported 2­10 migration of 2­6 PEs 2­4 types of 2­4 Vproc Manager 2­7, 2­23

W U Unicode character set UTF-8 1­5 Unique Primary Index. See UPI Unique Secondary Index. See USI UPI 3­14, 4­15 Users initial 10­2 usernames 10­7 USI 3­14, 4­15 UTF-8 1­5 Utilities Archive/Recovery 2­22 controlling and monitoring 2­26 database 2­23 FastExport 2­23 FastLoad 2­23 Gateway Control 10­14 Multiload 2­23 WinCLI 2­18, 2­20, 2­21 Workstations, types of 2­25

V Vdisks 2­5 Views 5­3 administrative 5­5 altering 4­18 DD 5­4 defined 5­3

Index ­8

Introduction to Teradata RDBMS

Information

Introduction to Teradata RDBMS

204 pages

Report File (DMCA)

Our content is added by our users. We aim to remove reported files within 1 working day. Please use this link to notify us:

Report this file as copyright or inappropriate

7104