Starburst Teradata connectors#
The Teradata connectors allow querying and creating tables in external Teradata databases. There are two connectors available:
The Starburst Teradata connector is a standard connector with a rich feature set, which is easy to install.
The Starburst Teradata Direct connector has the same feature set as the Starburst Teradata connector. It combines the connector with a table operator installed on the Teradata servers. This makes it more complex to install and update, but can produce higher query performance.
Starburst Teradata connector#
The Starburst Teradata connector is a feature-rich, easy-to-install connector.
Requirements#
To connect to Teradata, you need:
Teradata 16 or higher.
Network access from the coordinator and workers to Teradata.
Teradata’s JDBC driver, downloaded from Teradata.
A valid Starburst Enterprise license.
Configuration#
To configure the Teradata connector as a teradatadb catalog, create a file named teradatadb.properties in etc/catalog. Use the connector name teradata:
connector.name=teradata
connection-url=jdbc:teradata://HOST
connection-user=USERNAME
connection-password=PASSWORD
Note
As a new user, consider setting the Teradata system default to the ANSI session mode. For more information, read Transaction Semantics Differences in ANSI and Teradata Session Modes.
More information about the supported JDBC URL format and parameters of the Teradata JDBC driver is available in the Teradata documentation.
Installation#
To install the Teradata connector, complete the following steps:
1. Obtain the Teradata JDBC driver files (terajdbc4.jar and tdgssconfig.jar) from the Teradata website.
2. Add the Teradata JDBC JAR files to the SEP plugin/teradata directory.
3. Add a SEP Teradata catalog properties file (such as teradata.properties for a SEP catalog named teradata).
4. Perform the above steps on every SEP cluster node.
5. Restart SEP on every node.
Parallelism#
SEP reads from Teradata using multiple parallel JDBC connections. By default, the number of parallel reads is determined automatically based on the number of AMPs available in Teradata. The parallel reads can be configured using the following properties:
| Property name | Description | Default |
|---|---|---|
| | Determines the parallelism method. Possible values are: | |
| | Number of parallel connections, to be used when | Number of AMPs in the Teradata database |
Mapping data types between SEP and Teradata#
SEP and Teradata each support different data types for table columns and use different names for some of them. Some data types are not supported equally by both systems. The following tables display the mapping used by SEP when working with existing columns, and when creating tables in Teradata.
Mappings for working with existing fields in Teradata:
| Teradata data type | SEP data type |
|---|---|
| BYTEINT | TINYINT |
| SMALLINT | SMALLINT |
| INTEGER | INTEGER |
| BIGINT | BIGINT |
| REAL | REAL |
| FLOAT | REAL |
| DOUBLE | DOUBLE |
| NUMERIC | DECIMAL |
| DECIMAL | DECIMAL |
| CHAR | CHAR |
| NCHAR | CHAR |
| VARCHAR | VARCHAR |
| NVARCHAR | VARCHAR |
| LONGVARCHAR | VARCHAR |
| LONGNVARCHAR | VARCHAR |
| DATE | DATE |
| TIMESTAMP | TIMESTAMP |
| TIMESTAMP WITH TIME ZONE | not supported |
| JSON | not supported |
| XML | not supported |
Any Teradata data type not listed in the above table, such as JSON or XML, is not supported. If a Teradata column uses an unsupported type, any query accessing such a column fails, unless unsupported-type.handling-strategy is set to CONVERT_TO_VARCHAR as detailed in Mapping configuration.
Mappings for creating tables with specific fields in Teradata:
| SEP data type | Teradata data type |
|---|---|
| TINYINT | BYTEINT |
| SMALLINT | SMALLINT |
| INTEGER | INTEGER |
| BIGINT | BIGINT |
| REAL | REAL |
| DOUBLE | DOUBLE PRECISION |
| DATE | DATE |
| TIMESTAMP | TIMESTAMP |
| CHAR | CHAR |
| VARCHAR | VARCHAR |
The mapping above applies to creating a table with CREATE TABLE and CREATE TABLE ... AS [SELECT] statements. Any SEP data type not listed in the above table cannot be created as a column in a new Teradata table.
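For example, the mapping applies when copying an existing table with CREATE TABLE ... AS SELECT; the catalog, schema, and table names below are hypothetical:

```sql
CREATE TABLE teradatadb.example_schema.nation_copy AS
SELECT * FROM teradatadb.example_schema.nation;
```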
General configuration properties#
The following properties can be used to configure how data types from the connected data source are mapped to Trino data types and how the metadata is cached in Trino.
| Property name | Description | Default value |
|---|---|---|
| unsupported-type.handling-strategy | Configure how unsupported column data types are handled. A corresponding catalog session property is available. | |
| | Allow forced mapping of comma-separated lists of data types to convert to unbounded VARCHAR. | |
| | Support case insensitive database and collection names. | False |
| | | 1 minute |
| metadata.cache-ttl | Duration for which metadata, including table and column statistics, is cached. | 0 (caching disabled) |
| metadata.cache-missing | Cache the fact that metadata, including table and column statistics, is not available. | False |
Mapping configuration#
| Property name | Session property name | Description | Default |
|---|---|---|---|
| | | Default SEP | |
| | | Rounding mode for the Teradata | |
Lower case of Teradata NOT CASESPECIFIC char types#
SEP does not support case-insensitive comparison of values with char-based types. However, it is possible to force the connector to convert values to lower case for these comparisons. You can activate this behavior with a catalog property:
teradata.type.not-case-specific.to-lower-case=true
Alternatively you can use a catalog session property:
SET SESSION teradata.not_casespecific_to_lower_case = true;
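The effect of the conversion can be illustrated in Python. This is only a sketch of the comparison semantics, not connector code; the function name is made up for the example:

```python
def not_casespecific_equals(a: str, b: str) -> bool:
    """Emulate comparing Teradata NOT CASESPECIFIC values when the
    connector converts both sides to lower case before comparing."""
    return a.lower() == b.lower()

# With the lower-case conversion enabled, 'Nation' and 'NATION'
# compare as equal; without it, SEP compares char values case sensitively.
print(not_casespecific_equals("Nation", "NATION"))
```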
Pass-through queries#
The Teradata connector allows you to embed any valid Teradata query. You need to enable this feature in the catalog properties file with the following configuration:
teradata.query-pass-through.enabled=true
The Teradata query string you want to pass through has to be base32-encoded to avoid having to deal with escaping quotes and case sensitivity issues in table identifiers.
For example, you can encode the following simple Teradata query. It uses the TOP statement, which is not supported in ANSI SQL or by SEP:
SELECT TOP 1 * FROM nation ORDER BY nationkey;
Copy the statement into a file query.txt, and encode the text on the command line with base32:
$ base32 query.txt
KNCUYRKDKQQFIT2QEAYSAKRAIZJE6TJANZQXI2LPNYQE6USEIVJCAQSZEBXGC5DJN5XGWZLZHMFA
The command is typically available in a coreutils package.
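If the base32 command is not available, the same encoding can be produced with the Python standard library. This sketch assumes the query file content matches the statement above exactly, including the trailing newline:

```python
import base64

# The Teradata query to pass through, with the trailing newline a
# text file would normally contain.
query = b"SELECT TOP 1 * FROM nation ORDER BY nationkey;\n"

# base32-encode and strip the trailing "=" padding characters, which
# do not appear in the $query table identifier.
encoded = base64.b32encode(query).decode("ascii").rstrip("=")
print(encoded)
```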
Now it can be used in a query. The following example returns all resulting rows (*) from the query that runs in the any schema of the td catalog:
SELECT *
FROM td.any."$query:KNCUYRKDKQQFIT2QEAYSAKRAIZJE6TJANZQXI2LPNYQE6USEIVJCAQSZEBXGC5DJN5XGWZLZHMFA"
CREATE TABLE WITH syntax#
The Teradata connector supports the CREATE TABLE [ WITH ( property_name = expression [, ...] ) ] Trino syntax.
| Property name | Description | Default |
|---|---|---|
| | Configures the primary index for the table. The value is an array of columns used for the index. NOTE: If the provided array is empty, the Teradata behaviour is equal to | |
| | Specifies if the created primary index is unique. | |
Performance#
The connector includes a number of performance improvements, detailed in the following sections.
Table statistics#
The Teradata connector supports table and column statistics to improve query processing performance based on the actual data in the data source.
The statistics are collected by Teradata and retrieved by the connector.
To collect statistics for a table, execute the following statements in Teradata. The second statement needs to be repeated for every column you want to collect statistics for.
COLLECT SUMMARY STATISTICS ON table_schema.table_name;
COLLECT STATISTICS COLUMN(column_name) ON table_schema.table_name;
Pushdown#
The connector supports pushdown for a number of operations:
Aggregate pushdown for the following functions:
Additionally, pushdown is only supported for DOUBLE type columns with the following functions:
Additionally, pushdown is only supported for REAL or DOUBLE type columns with the following functions:
Caution
If the transaction mode connection parameter is set to TERA, the connector doesn't push down filters on VARCHAR columns, or joins between tables that have filters on VARCHAR columns.
To support pushdown of these operations, you can set the connector's transaction mode to ANSI by appending /TMODE=ANSI to the connection-url parameter in your catalog properties file:
connection-url=jdbc:teradata://HOST/TMODE=ANSI
Dynamic filtering#
Dynamic filtering is enabled by default. It causes the connector to wait for dynamic filtering to complete before starting a JDBC query.
You can disable dynamic filtering by setting the property dynamic-filtering.enabled in your catalog properties file to false.
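For example, to disable dynamic filtering, add the following line to the catalog properties file:

```properties
dynamic-filtering.enabled=false
```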
JDBC connection pooling#
You can improve performance by enabling JDBC connection pooling, which is disabled by default.
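Assuming the standard JDBC connector property name (verify the exact name against your SEP version), pooling is enabled in the catalog properties file:

```properties
connection-pool.enabled=true
```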
Starburst Cached Views#
The connector supports table scan redirection to improve performance and reduce load on the data source.
Lock manager#
The Lock Manager imposes concurrency control on the Teradata database by managing locks on the database objects accessed by each transaction, and by releasing those locks when the transaction either commits or rolls back its work. This control ensures that the data remains consistent for all users. However, for analytic queries, the default Teradata Lock Manager can be too restrictive.
| Property name | Description | Default |
|---|---|---|
| | | |
Security#
The connector includes a number of security-related features, detailed in the following sections.
User impersonation#
The Teradata connector supports user impersonation.
User impersonation can be enabled in the catalog file:
teradata.impersonation.enabled=true
User impersonation in the Teradata connector is based on GRANT CONNECT THROUGH. For more details, search for GRANT CONNECT THROUGH in the Teradata documentation.
Note
The WITH TRUST_ONLY option in GRANT CONNECT THROUGH is not yet supported.
Kerberos authentication#
The connector supports Kerberos-based authentication with the following configuration:
teradata.authentication.type=KERBEROS
kerberos.client.principal=example@example.com
kerberos.client.keytab=etc/kerberos/example.keytab
kerberos.config=etc/kerberos/krb5.conf
With this configuration, the user example@example.com, defined in the principal property, is used to connect to the database, and the related Kerberos service ticket is located in the example.keytab file. The Kerberos configuration specified with kerberos.config is used.
Kerberos credential pass-through#
The connector can be configured to pass through Kerberos credentials, received by SEP, to the Teradata database.
Configure Kerberos and SEP, following the instructions in Kerberos credential pass-through.
Then configure the connector to pass through the credentials from the server to the database in your catalog properties file and ensure the Kerberos client configuration properties are in place:
teradata.authentication.type=KERBEROS_PASS_THROUGH
http.authentication.krb5.config=/etc/krb5.conf
http-server.authentication.krb5.service-name=exampleServiceName
http-server.authentication.krb5.keytab=/path/to/Keytab/File
Now any database access via SEP is subject to the data access restrictions and permissions of the user supplied via Kerberos.
Password credential pass-through#
The connector supports password credential pass-through. To enable it, edit the catalog properties file to include the authentication type:
teradata.authentication.type=PASSWORD_PASS_THROUGH
For more information about configurations and limitations, see Password credential pass-through.
Audit#
The connector utilizes the QUERY_ID property of Teradata's Query Band to annotate any query from SEP in Teradata with the SEP query identifier. If teradata.query-band.query-id.enabled=true, each Teradata query has QUERY_ID=<query_id> with the current identifier added. This allows you to determine which queries originated from SEP, including details such as the user submitting the query, for audit purposes.
| Property name | Description | Default |
|---|---|---|
| teradata.query-band.query-id.enabled | Add a SEP query identifier to the Teradata QUERY_BAND. | |
| | Name of a SEP property for the query identifier in the Teradata QUERY_BAND. | |
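For example, to annotate every Teradata query with the SEP query identifier, set the property named above in the catalog properties file:

```properties
teradata.query-band.query-id.enabled=true
```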
SQL support#
The connector provides read and write access to data and metadata in Teradata. In addition to the globally available and read operation statements, the connector supports the following features:
SQL DELETE#
If a WHERE clause is specified, the DELETE operation only works if the predicate in the clause can be fully pushed down to the data source.
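For example, a DELETE with a simple equality predicate on a numeric column can typically be pushed down; the catalog, schema, and table names below are hypothetical:

```sql
DELETE FROM teradatadb.example_schema.nation WHERE nationkey = 1;
```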
Starburst Teradata Direct connector#
The Teradata Direct connector is an alternative connector for Teradata databases.
Compared to the Starburst Teradata connector, it requires more advanced configuration and is capable of achieving higher performance. It supports all use cases of the Teradata connector, and adds greatly improved performance for SELECT queries.
The connector is composed of components running on SEP, and others running on Teradata. All components participate in every transmission. A transmission includes all data transfer related to a single table scan.
The connector uses JDBC as a control channel to schedule the query in Teradata. Any SELECT statement run with JDBC is passed to a Teradata table operator, which then uses the custom transmitters to pass the data over HTTP to the SEP controller and receivers:
Starburst Enterprise
- Controller
The controller starts and manages all transmissions. It runs on the SEP coordinator and distributes work to the receivers, and provides status information to the transmitters.
- Receivers
The receivers connect to the transmitters and receive the Teradata data sent by the transmitters. They run on the SEP workers and are managed by the controller.
Teradata
- Table operator and JDBC driver
The table operator and JDBC driver read the data from the Teradata storage.
- Transmitters
The transmitters send the data for each table operator invocation on Teradata to the receivers on SEP.
The following features from the Teradata connector are also supported by the Teradata Direct connector:
SEP controller and receiver installation#
The Teradata Direct connector needs to be configured in a catalog properties file, e.g. myteradatadb.properties, using the connector name teradata-direct:
connector.name=teradata-direct
connection-url=jdbc:teradata://HOST
connection-user=USERNAME
connection-password=PASSWORD
teradata-direct.table-operator.name=some_schema.table_operator
teradata-direct.http.port=9000
teradata-direct.http.host=hostname.accessible.from.teradata.instance
Besides the connector name, you need to provide the JDBC connection parameters and the relevant JAR files in the teradata plugin directory, as discussed in Starburst Teradata connector.
Additionally you need to install the native table operator on Teradata in the queried schema and configure the parameters for the connector specifying the table operator name, and the HTTP port and host of the SEP coordinator node.
| Property name | Description |
|---|---|
| teradata-direct.table-operator.name | Native table operator function name. |
| teradata-direct.http.port | Sets the receiver port, the same on each node. If this is not specified, ports are set randomly. |
| teradata-direct.http.host | We strongly recommend that you set this to a host name of the SEP node that is accessible from the Teradata servers. |
Additional configuration properties#
Further configuration can be done with the following parameters:

teradata-direct.receiver.buffer.size in the catalog properties file, or receiver_buffer_size in catalog session properties: Determines the buffer size for each SEP worker. This buffer is available per table scan, so a single query joining three tables uses three buffers. The default value is 32MB.

teradata-direct.parcel.size in the catalog properties file, or parcel_size in catalog session properties: The size of the data packets sent to the SEP receivers. The last packet may be smaller. Defaults to 2MB. The parcel size should not be larger than the buffer size.

teradata-direct.compression.enabled in the catalog properties file, or compression_enabled in catalog session properties: Determines if transmission data is compressed. Defaults to false.

teradata-direct.splits-per-worker in the catalog properties file, or splits_per_worker in catalog session properties: Determines how many splits each SEP worker receives. This determines the maximum concurrency level on SEP. Defaults to 16, identical to the task.concurrency of the SEP server.

teradata-direct.table-operator.logging-path in the catalog properties file: Sets the path for log files of the native table operator on the Teradata nodes. The default behavior is to log to files in the /tmp folder.
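As a summary, the defaults described above correspond to the following catalog properties file entries; the value syntax for sizes is assumed to follow the usual SEP data size notation:

```properties
teradata-direct.receiver.buffer.size=32MB
teradata-direct.parcel.size=2MB
teradata-direct.compression.enabled=false
teradata-direct.splits-per-worker=16
```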
Aside from Teradata-specific options, it is recommended to use JDBC connection configuration, set in the catalog properties file, to cache the Teradata metadata on SEP and improve query performance:
metadata.cache-ttl=10m
metadata.cache-missing=true
Configuring in Kubernetes deployments#
The Starburst Teradata Direct connector is supported for Kubernetes deployments in AWS EKS and in Azure AKS meeting certain requirements.
Warning
The configuration to use the connector on Kubernetes is complex and requires significant networking and Kubernetes expertise. Contact Starburst Support for assistance and help.
Encryption#
The connector can encrypt all data transmissions and use HTTPS. Add the details for the Java keystore file to use in the SEP server config:
teradata-direct.https.keystore.path=mykeystore.jks
teradata-direct.https.keystore.password=123456
SEP automatically distributes the keystore to the controller, the receivers, the transmitters and the table operator.
Native table operator#
Install the native table operator with the following steps:
1. To obtain access to the Teradata native table operator file, contact Starburst Support.
2. Copy the provided file onto the Teradata servers into a known location.
3. Allow Teradata processes to access the file by either adding read permission with a command such as chmod +r <.so file>, or allowing access by the tdtrusted group with chown :tdtrusted <.so file>; chmod g+r <.so file>.
4. Create the table operator function in the desired schema. We recommend a dedicated database for the current operator version.
CREATE FUNCTION some_database.table_operator() RETURNS TABLE VARYING USING FUNCTION starburst_teradata_contract LANGUAGE C NO SQL PARAMETER STYLE SQLTABLE EXTERNAL NAME 'SP!/opt/starburst/libstarburst_teradata.so!F!starburst_teradata';
Warning
Do not install the native table operator in the syslib
database.
No Teradata restart is needed for the changes to take effect.
The path and access for the .so file, for example /opt/starburst, have to be adjusted to a suitable location on your Teradata servers.
Each user that connects to Teradata from SEP needs to be granted sufficient access:
GRANT EXECUTE ON SPECIFIC FUNCTION some_database.table_operator TO your_user;
Alternatively, access to the table operator UDF can be granted via a role:
CREATE ROLE table_operator_users;
GRANT EXECUTE ON SPECIFIC FUNCTION some_database.table_operator
TO table_operator_users;
GRANT table_operator_users TO your_user;
This also requires setting the role for any impersonated user, which can be achieved with the following in the catalog properties file:
teradata.impersonation-role=table_operator_users
Updates and removal#
You need to update the native table operator with each version change of SEP. It is important to remove the function before any updates:
DROP FUNCTION some_database.table_operator;
Additional recommendations for successful upgrades:
- Drop the database used for the function.
- Remove all old .so files from the Teradata servers.
- Use a different filename for the .so file for the new version.
- Use a new, exclusive database for the function with the new version.
- Restart the Teradata servers, if possible.