Starburst Teradata connectors#

The Teradata connectors allow querying and creating tables in external Teradata databases. There are two connectors available:

The Starburst Teradata connector is a standard connector with a rich feature set, which is easy to install.

The Starburst Teradata Direct connector has the same feature set as the Starburst Teradata connector. It combines the connector with a table operator installed on the Teradata servers. This makes it more complex to install and update, but can produce higher query performance.

Note

SEP release 477-e includes copies of the connectors for compatibility with previous behavior. They are available under the names teradata_direct_legacy and teradata_legacy. To use them, update the connector.name property to use one of these values. These connectors are intended only as a temporary fallback and will be removed in the next LTS release.

Starburst Teradata connector#

The Starburst Teradata connector is a feature-rich, easy-to-install connector.

Requirements#

To connect to Teradata, you need:

Teradata 17 or higher.
Network access from the coordinator and workers to Teradata.
Teradata’s JDBC driver, downloaded from Teradata.
A valid Starburst Enterprise license.

Configuration#

To configure the Teradata connector as the example catalog, create a file named example.properties in etc/catalog (replace example with your database name or some other descriptive name of the catalog) with the following contents:

connector.name=teradata
connection-url=jdbc:teradata://HOST
connection-user=USERNAME
connection-password=PASSWORD

More information about the supported JDBC URL format and parameters of the Teradata JDBC driver is available in the Teradata documentation.

General configuration properties#

The following table describes general catalog configuration properties for the connector:

Property name	Description
`case-insensitive-name-matching`	Support case insensitive schema and table names. Defaults to `false`.
`case-insensitive-name-matching.cache-ttl`	Duration for which case insensitive schema and table names are cached. Defaults to `1m`.
`case-insensitive-name-matching.config-file`	Path to a name mapping configuration file in JSON format that allows Trino to disambiguate between schemas and tables with similar names in different cases. Defaults to `null`.
`case-insensitive-name-matching.config-file.refresh-period`	Frequency with which Trino checks the name matching configuration file for changes. The duration value defaults to `0s` (refresh disabled).
`metadata.cache-ttl`	Duration for which metadata, including table and column statistics, is cached. Defaults to `0s` (caching disabled).
`metadata.cache-missing`	Cache the fact that metadata, including table and column statistics, is not available. Defaults to `false`.
`metadata.schemas.cache-ttl`	Duration for which schema metadata is cached. Defaults to the value of `metadata.cache-ttl`.
`metadata.tables.cache-ttl`	Duration for which table metadata is cached. Defaults to the value of `metadata.cache-ttl`.
`metadata.statistics.cache-ttl`	Duration for which table statistics are cached. Defaults to the value of `metadata.cache-ttl`.
`metadata.cache-maximum-size`	Maximum number of objects stored in the metadata cache. Defaults to `10000`.
`sql-based-metadata.enabled`	Controls how the connector retrieves metadata. When `true`, the connector uses `SELECT` queries for faster performance. When `false`, it uses the standard JDBC metadata API, which avoids `SELECT` but can be slower. Defaults to `true`.
`write.batch-size`	Maximum number of statements in a batched execution. Do not change this setting from the default. Non-default values may negatively impact performance. Defaults to `1000`.
`dynamic-filtering.enabled`	Push down dynamic filters into JDBC queries. Defaults to `true`.
`dynamic-filtering.wait-timeout`	Maximum duration for which Trino waits for dynamic filters to be collected from the build side of joins before starting a JDBC query. Using a large timeout can potentially result in more detailed dynamic filters. However, it can also increase latency for some queries. Defaults to `20s`.

Transaction mode#

As a new user, consider setting the Teradata system default to the ANSI session mode. For more information, read Transaction Semantics Differences in ANSI and Teradata Session Modes.

Unless the Teradata system default is changed, the connector defaults to the TERA transaction mode. To set the connector’s transaction mode to ANSI without changing the mode system-wide on Teradata, append /TMODE=ANSI to the connection-url parameter in your catalog properties file:

connection-url=jdbc:teradata://HOST/TMODE=ANSI

Installation#

To install the Teradata connector, use the following directions:

Obtain the Teradata JDBC driver file (terajdbc4.jar) from the Teradata website.
Add the Teradata JDBC JAR file to the SEP plugin/teradata directory.
Add the SEP Teradata catalog properties file (such as example.properties for a SEP catalog named example).
Perform the above steps on every SEP cluster node.
Restart SEP on every node.

Parallelism#

SEP can read from Teradata tables using multiple parallel JDBC connections. By default, parallel reads are disabled. To enable parallel reads, set the following catalog configuration property to a value greater than 1:

Teradata parallelism configuration properties#
Property name	Description	Default
`teradata.connections-count`	Number of parallel JDBC connections to use when reading from Teradata. The corresponding catalog session property is `connections_count`.	1

Note

Parallelism is not supported for views in the Teradata JDBC connector. The Teradata Direct connector supports parallelism for both tables and views.
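
As a minimal sketch, parallel reads can also be enabled for a single session through the `connections_count` catalog session property. The catalog name example follows the configuration above. The schema and table names and the value 4 are illustrative assumptions, not recommendations:

```sql
-- Use four parallel JDBC connections for reads in this session only.
-- The value 4 is an arbitrary illustration; tune it for your system.
SET SESSION example.connections_count = 4;

-- Hypothetical schema and table names.
SELECT count(*) FROM example.sales.orders;

-- Return to the value configured in the catalog properties file.
RESET SESSION example.connections_count;
```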

Type mapping#

Because Trino and Teradata each support types that the other does not, this connector modifies some types when reading or writing data. Data types may not map the same way in both directions between Trino and the data source. Refer to the following sections for type mapping in each direction.

Teradata to Trino type mapping#

The connector maps Teradata types to the corresponding Trino types according to the following table:

Teradata to Trino type mapping#
Teradata type	Trino type
BYTEINT	TINYINT
SMALLINT	SMALLINT
INTEGER	INTEGER
BIGINT	BIGINT
REAL	REAL
FLOAT	DOUBLE
DOUBLE	DOUBLE
NUMERIC	DECIMAL
DECIMAL	DECIMAL
CHAR	CHAR
NCHAR	CHAR
VARCHAR	VARCHAR
NVARCHAR	VARCHAR
LONGVARCHAR	VARCHAR
DATE	DATE
TIMESTAMP	TIMESTAMP

No other types are supported.

Trino to Teradata type mapping#

The connector maps Trino types to the corresponding Teradata types according to the following table:

Trino to Teradata type mapping#
Trino type	Teradata type
TINYINT	BYTEINT
SMALLINT	SMALLINT
INTEGER	INTEGER
BIGINT	BIGINT
REAL	REAL
DOUBLE	DOUBLE PRECISION
DATE	DATE
TIMESTAMP	TIMESTAMP
CHAR	CHAR
VARCHAR	VARCHAR

No other types are supported.

The mapping above applies to creating a table with CREATE TABLE and CREATE TABLE ... AS [SELECT] statements.

Any Trino data type not listed in the above table cannot be created as a column in a new Teradata table.

Type mapping configuration properties#

The following properties can be used to configure how data types from the connected data source are mapped to Trino data types and how the metadata is cached in Trino.

Property name	Description	Default value
`unsupported-type-handling`	Configure how unsupported column data types are handled: `IGNORE`, column is not accessible. `CONVERT_TO_VARCHAR`, column is converted to unbounded `VARCHAR`. The respective catalog session property is `unsupported_type_handling`.	`IGNORE`
`jdbc-types-mapped-to-varchar`	Allow forced mapping of comma separated lists of data types to convert to unbounded `VARCHAR`.	
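
The following sketch shows how the `unsupported_type_handling` catalog session property might be used to expose columns of unsupported types for one session. The catalog name example matches the rest of this page. The schema and table names are hypothetical:

```sql
-- Read columns of unsupported Teradata types as unbounded VARCHAR
-- for the current session only.
SET SESSION example.unsupported_type_handling = 'CONVERT_TO_VARCHAR';

-- Hypothetical table. Columns that were previously hidden should now
-- be listed with the VARCHAR type.
DESCRIBE example.some_schema.some_table;
```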

Teradata type mapping configuration#

The following additional properties can be used to configure the type mapping.

Mapping configuration properties#
Property name	Session property name	Description	Default
`teradata.number.default-scale`	`number_default_scale`	Default SEP `DECIMAL` scale for the Teradata `NUMBER` (without precision and scale) data type. When not set, such a column is treated as unsupported.	`0`
`teradata.number.rounding-mode`	`number_rounding_mode`	Rounding mode for the Teradata `NUMBER` data type. Possible values are: `UNNECESSARY` - Rounding mode to assert that the requested operation has an exact result, hence no rounding is necessary. `CEILING` - Rounding mode to round towards positive infinity. `FLOOR` - Rounding mode to round towards negative infinity. `HALF_DOWN` - Rounding mode to round towards the nearest neighbor unless both neighbors are equidistant, in which case rounding down is used. `HALF_EVEN` - Rounding mode to round towards the nearest neighbor unless both neighbors are equidistant, in which case rounding towards the even neighbor is used. `HALF_UP` - Rounding mode to round towards the nearest neighbor unless both neighbors are equidistant, in which case rounding up is used. `UP` - Rounding mode to round away from zero. `DOWN` - Rounding mode to round towards zero.	`UNNECESSARY`
`teradata.force-integer-dateformat.enabled`		If enabled, the connector reads date values using the DateForm setting configured on the Teradata database, which typically defaults to `IntegerDate`. The connector does not set the `DateForm` session property itself.	`false`
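
As an illustration, the two `NUMBER` settings can be overridden per session with the session properties listed in the table above. The scale of 2 and the `HALF_UP` mode are arbitrary example values, and the catalog name example is the one assumed throughout this page:

```sql
-- Read Teradata NUMBER columns declared without precision and scale
-- as DECIMAL values with two fractional digits.
SET SESSION example.number_default_scale = 2;

-- Round values that have more than two fractional digits to the
-- nearest neighbor, rounding up on ties.
SET SESSION example.number_rounding_mode = 'HALF_UP';
```

With the default `UNNECESSARY` mode, a value that cannot be represented exactly at the configured scale is expected to raise an error rather than be rounded silently.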

Lower case of Teradata `NOT CASESPECIFIC` char types#

SEP does not support case insensitive comparison of values of char-based types. However, it is possible to force the connector to convert values to lower case for these comparisons. You can activate this behavior with a catalog property:

teradata.type.not-case-specific.to-lower-case=true

Alternatively, you can use a catalog session property:

SET SESSION example.not_casespecific_to_lower_case = true;

`CREATE TABLE WITH` syntax#

The Teradata connector supports the Trino CREATE TABLE [ WITH ( property_name = expression [, ...] ) ] syntax.

Create table properties#
Property name	Description
`primary_index`	Configures the primary index for the table. The value is an array of the columns used for the index. If the provided array is empty, Teradata behaves as with `NO PRIMARY INDEX` and no default index is created. Defaults to `ARRAY[]`.
`primary_index_unique`	Specifies whether the created primary index is unique. Defaults to `false`.
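
A minimal sketch of both table properties in use follows. The schema, table, and column names are hypothetical, and the column types are taken from the Trino to Teradata type mapping table:

```sql
-- Create a Teradata table with a unique primary index on order_id.
CREATE TABLE example.sales.orders (
    order_id BIGINT,
    customer_name VARCHAR(100),
    order_date DATE
)
WITH (
    primary_index = ARRAY['order_id'],
    primary_index_unique = true
);
```

Omitting both properties leaves `primary_index` at its default of `ARRAY[]`, which corresponds to the `NO PRIMARY INDEX` behavior described in the table.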

Table functions#

The connector provides specific table functions to access Teradata.

`query(VARCHAR) -> table`#

The query function allows you to query the underlying database directly. It requires syntax native to the data source, because the full query is pushed down and processed in the data source. This can be useful for accessing native features or for improving query performance in situations where running a query natively may be faster.

The query table function is available in the system schema of any catalog that uses the Teradata connector, such as example. The following example passes myQuery to the data source. myQuery has to be a valid query for the data source, and is required to return a table as a result:

SELECT
  *
FROM
  TABLE(
    example.system.query(
      query => 'myQuery'
    )
  );
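
For a more concrete sketch, the following passes a query that uses the Teradata-specific `TOP` clause, which Trino itself does not accept. The schema, table, and column names are hypothetical:

```sql
-- The inner query text is sent to Teradata unchanged,
-- so it must be valid Teradata SQL.
SELECT
  *
FROM
  TABLE(
    example.system.query(
      query => 'SELECT TOP 5 order_id, order_date FROM sales.orders'
    )
  );
```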

Performance#

The connector includes a number of performance improvements, detailed in the following sections.

Table statistics#

The Teradata connector can use table and column statistics for cost based optimizations, to improve query processing performance based on the actual data in the data source.

The statistics are collected by Teradata and retrieved by the connector.

To collect statistics for a table, execute the following statements in Teradata. The second statement needs to be repeated for every column you want to collect statistics for.

COLLECT SUMMARY STATISTICS ON table_schema.table_name;
COLLECT STATISTICS COLUMN(column_name) ON table_schema.table_name;

Managed statistics#

The connector supports Managed statistics, allowing SEP to collect and store its own table and column statistics that can then be used for performance optimizations in query planning.

Statistics must be collected manually using the built-in collect_statistics command. See collect_statistics for details and examples.

Note

If managed statistics is enabled, automatic internal TLS cannot secure communication between the Teradata Direct connector and the cluster. To use the feature, you must add the http-server.https.port configuration property to your teradata_direct catalog configuration and set it to an available HTTPS port. Additionally, ensure that the entire cluster uses an identical Java keystore configuration.

Pushdown#

The connector supports pushdown for a number of operations:

Aggregate pushdown for the following functions:

avg()
count(), also count(distinct x)
max()
min()
sum()

Additionally, pushdown is only supported for DOUBLE type columns with the following functions:

Additionally, pushdown is only supported for REAL or DOUBLE type columns with the following functions:

Projection pushdown for the following functions:

cast()
upper() and lower()
trim(), ltrim() and rtrim()
substring()

Cost-based join pushdown#

The connector supports cost-based Join pushdown to make intelligent decisions about whether to push down a join operation to the data source.

When cost-based join pushdown is enabled, the connector only pushes down join operations if the available Table statistics suggest that doing so improves performance. Note that if no table statistics are available, join operation pushdown does not occur to avoid a potential decrease in query performance.

The following table describes catalog configuration properties for join pushdown:

Property name	Description	Default value
`join-pushdown.enabled`	Enable join pushdown. Equivalent catalog session property is `join_pushdown_enabled`.	`true`
`join-pushdown.strategy`	Strategy used to evaluate whether join operations are pushed down. Set to `AUTOMATIC` to enable cost-based join pushdown, or `EAGER` to push down joins whenever possible. Note that `EAGER` can push down joins even when table statistics are unavailable, which may result in degraded query performance. Because of this, `EAGER` is only recommended for testing and troubleshooting purposes.	`AUTOMATIC`

Warning

The Teradata connector only pushes down join operations where the join condition uses columns of type CHAR or VARCHAR with = and <> operators.
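
When troubleshooting, it can help to compare query plans with join pushdown switched off for the session through the `join_pushdown_enabled` catalog session property. The following is a sketch under assumed names. The tables and columns are hypothetical, and the join condition uses an equality on VARCHAR columns to match the restriction above:

```sql
-- Disable join pushdown for the current session only.
SET SESSION example.join_pushdown_enabled = false;

-- Inspect the plan: with pushdown disabled, the join is expected
-- to be executed by SEP rather than inside Teradata.
EXPLAIN
SELECT c.name, o.order_id
FROM example.sales.customers c
JOIN example.sales.orders o
  ON c.name = o.customer_name;

-- Restore the catalog default.
RESET SESSION example.join_pushdown_enabled;
```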

Predicate pushdown support#

The connector’s support of predicate pushdown is partially dependent on the configured transaction mode. You can modify the connector’s transaction mode in the catalog configuration file.

The predicate pushdown supports statements with the following arithmetic, boolean, comparison operators, and functions:

+
-
/
*
%
OR
NOT
LIKE
LIKE with ESCAPE character
IS NULL
IS NOT NULL
NULLIF
IN
=
<>

If the transaction mode is set to TERA, the support for pushdown of any predicates on columns of character string types like CHAR or VARCHAR is limited.

If the transaction mode is set to ANSI, the connector does not support pushdown of inequality predicates such as !=, or range predicates such as > and BETWEEN, on columns with character string types. Equality predicates, such as IN or =, on columns with character string types are pushed down. This ensures correctness of results since the remote data source may sort strings differently than SEP.

In the following example, the predicate of the first query is not pushed down since name is a column of type VARCHAR, and > is an inequality predicate. The second query is pushed down.

SELECT * FROM nation WHERE name > 'CANADA';
SELECT * FROM nation WHERE name = 'CANADA';
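
One way to check whether a predicate is pushed down is to inspect the query plan with `EXPLAIN`. This is a sketch rather than a guaranteed diagnostic, because the exact plan output varies by release:

```sql
-- Not pushed down: the plan is expected to show a filter on name
-- that SEP applies on top of the table scan.
EXPLAIN SELECT * FROM nation WHERE name > 'CANADA';

-- Pushed down: the condition is expected to appear as part of the
-- table scan itself, with no separate filter in SEP.
EXPLAIN SELECT * FROM nation WHERE name = 'CANADA';
```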

Dynamic filtering#

Dynamic filtering is enabled by default. It causes the connector to wait for dynamic filtering to complete before starting a JDBC query.

You can disable dynamic filtering by setting the dynamic-filtering.enabled property in your catalog configuration file to false.

Wait timeout#

By default, table scans on the connector are delayed up to 20 seconds until dynamic filters are collected from the build side of joins. Using a large timeout can potentially result in more detailed dynamic filters. However, it can also increase latency for some queries.

You can configure the dynamic-filtering.wait-timeout property in your catalog properties file:

dynamic-filtering.wait-timeout=1m

You can use the dynamic_filtering_wait_timeout catalog session property in a specific session:

SET SESSION example.dynamic_filtering_wait_timeout = '1s';

Compaction#

The maximum size of the dynamic filter predicate that is pushed down to the connector during a table scan for a column is configured using the domain-compaction-threshold property in the catalog properties file:

domain-compaction-threshold=100

You can use the domain_compaction_threshold catalog session property:

SET SESSION example.domain_compaction_threshold = 10;

By default, domain-compaction-threshold is set to 32. When the dynamic predicate for a column exceeds this threshold, it is compacted into a single range predicate.

For example, if the dynamic filter collected for a date column dt on the fact table selects more than 32 days, the filtering condition is simplified from dt IN ('2020-01-10', '2020-01-12',..., '2020-05-30') to dt BETWEEN '2020-01-10' AND '2020-05-30'. Using a large threshold can result in increased table scan overhead due to a large IN list getting pushed down to the data source.

Metrics#

Metrics about dynamic filtering are reported in a JMX table for each catalog:

jmx.current."com.starburstdata.presto.plugin.jdbc.dynamicfiltering:name=example,type=dynamicfilteringstats"

Metrics include information about the total number of dynamic filters, the number of completed dynamic filters, the number of available dynamic filters and the time spent waiting for dynamic filters.
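
Assuming a catalog named jmx that uses the JMX connector is configured on the cluster, the metrics for the example catalog can be read with a query such as the following sketch:

```sql
-- Returns one row per node with the dynamic filtering counters
-- for the catalog named example.
SELECT
  *
FROM
  jmx.current."com.starburstdata.presto.plugin.jdbc.dynamicfiltering:name=example,type=dynamicfilteringstats";
```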

JDBC connection pooling#

When JDBC connection pooling is enabled, each node creates and maintains a connection pool instead of opening and closing separate connections to the data source. Each connection is available to connect to the data source and retrieve data. After completion of an operation, the connection is returned to the pool and can be reused. This improves performance by a small amount, reduces the load on any required authentication system used for establishing the connection, and helps avoid running into connection limits on data sources.

JDBC connection pooling is disabled by default. You can enable JDBC connection pooling by setting the connection-pool.enabled property to true in your catalog configuration file:

connection-pool.enabled=true

The following catalog configuration properties can be used to tune connection pooling:

JDBC connection pooling catalog configuration properties#
Property name	Description	Default value
`connection-pool.enabled`	Enable connection pooling for the catalog.	`false`
`connection-pool.max-size`	The maximum number of idle and active connections in the pool.	`10`
`connection-pool.max-connection-lifetime`	The maximum lifetime of a connection. When a connection reaches this lifetime it is removed, regardless of how recently it has been active.	`30m`
`connection-pool.connection-timeout`	The maximum amount of time to wait for a connection from the JDBC connection pool before the request times out.	`30s`
`connection-pool.pool-cache-max-size`	The maximum size of the JDBC data source cache.	`1000`
`connection-pool.pool-cache-ttl`	The expiration time of a cached data source when it is no longer accessed.	`30m`

Starburst Cached Views#

The connectors support table scan redirection to improve performance and reduce load on the data source.

Lock manager#

The Lock Manager imposes concurrency control on Teradata Database by managing the locks on the database objects being accessed by each transaction and releasing those locks when the transaction either commits or rolls back its work. This control ensures that the data remains consistent for all users. However, for analytic queries, the default Teradata Lock Manager behavior can be too restrictive.

Locking configuration properties#
Property name	Description
`teradata.lock.type`	Lock type used in the `LOCKING ROW FOR [ACCESS | READ]` request modifier that is applied to outer `SELECT` requests. Defaults to `ACCESS`.

Security#

The connector includes a number of security-related features, detailed in the following sections.

User impersonation#

Warning

User impersonation may reduce the effectiveness of caching in SEP, potentially leading to degraded performance.

The Teradata connector supports user impersonation.

User impersonation can be enabled in the catalog file:

teradata.impersonation.enabled=true

User impersonation in the Teradata connector is based on GRANT CONNECT THROUGH. For more details, search for GRANT CONNECT THROUGH in the Teradata documentation.

Note

The WITH TRUST_ONLY option in GRANT CONNECT THROUGH is not yet supported.
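
As a rough sketch of the Teradata side of this setup, a database administrator grants the service user configured in connection-user the right to act on behalf of end users. The user names sep_service and alice are hypothetical, and the exact clauses should be verified against the Teradata documentation for your release:

```sql
-- Run in Teradata, not in SEP.
-- Allow the SEP service user to connect on behalf of the
-- permanent Teradata user alice, without switching roles.
GRANT CONNECT THROUGH sep_service TO PERMANENT alice WITHOUT ROLE;
```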

DEFINER authorization for views#

For views with DEFINER security mode, the Teradata connector can use the permissions of the view owner rather than the permissions of the user executing the query to access tables referenced in a view.

Set authorization.enabled to true to use DEFINER authorization for views during query execution. User impersonation must be enabled for the connector to use this feature.

Kerberos authentication#

The connector supports Kerberos authentication. Use the following properties in the catalog properties file to configure it.

teradata.authentication.type=KERBEROS
kerberos.client.principal=example@example.com
kerberos.client.keytab=<PATH_TO_KEYTAB_FILE>
kerberos.config=etc/kerberos/krb5.conf

With this configuration the user example@example.com, defined in the principal property, is used to connect to the database, and the related Kerberos service ticket is located in the keytab file. The Kerberos configuration specified with kerberos.config is used.

Kerberos credential pass-through#

The connector can be configured to pass through Kerberos credentials, received by SEP, to the Teradata database.

Configure Kerberos and SEP, following the instructions in Kerberos credential pass-through.

Then configure the connector to pass through the credentials from the server to the database in your catalog properties file and ensure the Kerberos client configuration properties are in place:

teradata.authentication.type=KERBEROS_PASS_THROUGH
http.authentication.krb5.config=/etc/krb5.conf
http-server.authentication.krb5.service-name=exampleServiceName
http-server.authentication.krb5.keytab=<PATH_TO_KEYTAB_FILE>

Note

When delegated Kerberos authentication is configured for the Starburst Enterprise web UI, make sure the http-server.authentication.krb5.service-name value is set to HTTP to match the configured Kerberos service name.

Now any database access via SEP is subject to the data access restrictions and permissions of the user supplied via Kerberos.

Password credential pass-through#

The connector supports password credential pass-through. To enable it, edit the catalog properties file to include the authentication type:

teradata.authentication.type=PASSWORD_PASS_THROUGH

For more information about configurations and limitations, see Password credential pass-through.

Audit#

The connector uses the QUERY_ID property of Teradata’s Query Band to annotate any query from SEP in Teradata with the SEP query identifier. If teradata.query-band.query-id.enabled=true, each Teradata query has QUERY_ID=<query_id> with the current identifier added to its query band. This allows you to determine which queries originated from SEP, along with details such as the user who submitted the query, for audit purposes.

The following example shows a QUERY_ID value for a query issued from SEP:

QUERY_ID=20220324_142042_00000_zkr9n;

If User impersonation is enabled, the impersonated username is appended to the QUERY_ID value as an additional PROXYROLE property:

QUERY_ID=20220324_142042_00000_zkr9n;PROXYROLE=user_name;

Audit configuration properties#
Property name	Description
`teradata.query-band.query-id.enabled`	Add a SEP query identifier to the Teradata QUERY_BAND. Defaults to `false`.
`teradata.query-band.query-id.key-name`	Name of a SEP property for query identifier in the Teradata QUERY_BAND. Defaults to `QUERY_ID`.
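
To locate a specific SEP query on the Teradata side, you can search the query log for its identifier. The following sketch assumes that Teradata database query logging (DBQL) is enabled and that you can read the DBC.QryLogV view. Neither is configured by the connector:

```sql
-- Run in Teradata, not in SEP.
-- Find the Teradata requests issued for one SEP query by matching
-- the QUERY_ID value stored in the query band.
SELECT QueryID, UserName, StartTime, QueryBand
FROM DBC.QryLogV
WHERE QueryBand LIKE '%QUERY_ID=20220324_142042_00000_zkr9n%';
```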

SQL support#

The connector provides read and write access to data and metadata in Teradata. In addition to the globally available and read operation statements, the connector supports the following features:

UPDATE limitation#

Only UPDATE statements with constant assignments and predicates are supported. For example, the following statement is supported because the values assigned are constants:

UPDATE table SET col1 = 1 WHERE col3 = 1

Arithmetic expressions, function calls, and other non-constant UPDATE statements are not supported. For example, the following statement is not supported because arithmetic expressions cannot be used with the SET command:

UPDATE table SET col1 = col2 + 2 WHERE col3 = 1

A single UPDATE statement cannot set all column values of a table row at once. For a three column table, the following statement is not supported:

UPDATE table SET col1 = 1, col2 = 2, col3 = 3 WHERE col3 = 1

DELETE limitation#

If a WHERE clause is specified, the DELETE operation only works if the predicate in the clause can be fully pushed down to the data source.
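
The following sketch contrasts two cases, based on the rules in Predicate pushdown support. The table and column names are hypothetical, and whether the second statement fails depends on the configured transaction mode:

```sql
-- Equality predicate on an integer column: expected to be pushed
-- down completely, so the DELETE can run.
DELETE FROM example.sales.orders WHERE order_id = 42;

-- Range predicate on a character column: not pushed down in ANSI
-- mode, so this DELETE is expected to be rejected.
DELETE FROM example.sales.orders WHERE customer_name > 'M';
```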

ALTER TABLE RENAME TO limitation#

The connector does not support renaming tables across multiple schemas. For example, the following statement is supported:

ALTER TABLE example.schema_one.table_one RENAME TO example.schema_one.table_two

The following statement attempts to rename a table across schemas, and therefore is not supported:

ALTER TABLE example.schema_one.table_one RENAME TO example.schema_two.table_two

ALTER TABLE EXECUTE#

The connector supports the following commands for use with ALTER TABLE EXECUTE:

collect_statistics#

The collect_statistics command is used with Managed statistics to collect statistics for a table and its columns.

The following statement collects statistics for the example_table table and all of its columns:

ALTER TABLE example_table EXECUTE collect_statistics;

Collecting statistics for all columns in a table may be unnecessarily performance-intensive, especially for wide tables. To only collect statistics for a subset of columns, you can include the columns parameter with an array of column names. For example:

ALTER TABLE example_table
    EXECUTE collect_statistics(columns => ARRAY['customer','line_item']);

Procedures#

`system.flush_metadata_cache()`#

Flush JDBC metadata caches. For example, the following system call flushes the metadata caches for all schemas in the example catalog:

USE example.example_schema;
CALL system.flush_metadata_cache();

`system.execute('query')`#

The execute procedure allows you to execute a query in the underlying data source directly. The query must use supported syntax of the connected data source. Use the procedure to access features which are not available in Trino or to execute queries that return no result set and therefore cannot be used with the query or raw_query pass-through table function. Typical use cases are statements that create or alter objects and require native features such as constraints, default values, automatic identifier creation, or indexes. Queries can also invoke statements that insert, update, or delete data, and do not return any data as a result.

The query text is not parsed by Trino, only passed through, and therefore only subject to any security or access control of the underlying data source.

The following example sets the current database to the example_schema of the example catalog. Then it calls the procedure in that schema to drop the default value from your_column on your_table table using the standard SQL syntax in the parameter value assigned for query:

USE example.example_schema;
CALL system.execute(query => 'ALTER TABLE your_table ALTER COLUMN your_column DROP DEFAULT');

Verify that the specific database supports this syntax, and adapt as necessary based on the documentation for the specific connected database and database version.

Starburst Teradata Direct connector#

The Teradata Direct connector is an alternative to the standard Starburst Teradata connector.

Compared to the Starburst Teradata connector, it requires more advanced configuration and is designed for improved performance. It supports all use cases of the Teradata connector, and adds greatly improved performance for SELECT queries.

The connector is composed of components running on SEP, and others running on Teradata. All components participate in every transmission. A transmission includes all data transfer related to a single table scan.

The connector uses JDBC as a control channel to schedule the query in Teradata. Any SELECT statement run with JDBC is passed to a Teradata table operator, which then uses the custom transmitters to pass the data over HTTP to the SEP controller and receivers:

Starburst Enterprise

Controller
The controller starts and manages all transmissions. It runs on the SEP coordinator and distributes work to the receivers, and provides status information to the transmitters.

Receivers
The receivers connect to the transmitters and receive data from Teradata through them. They run on the SEP workers and are managed by the controller.

Teradata

Table operator and JDBC driver
The table operator and JDBC driver read the data from the Teradata storage.

Transmitters
The transmitters send the data for each table operator invocation on Teradata to the receivers on SEP.

The following features from the Teradata connector are also supported by the Teradata Direct connector:

Kerberos authentication
Kerberos credential pass-through
User impersonation

Unlike the standard Teradata connector, the Teradata Direct connector supports parallelism for both tables and views.

SEP controller and receiver installation#

The Teradata Direct connector must be configured in a catalog properties file, such as example.properties, using the connector name teradata_direct:

connector.name=teradata_direct
connection-url=jdbc:teradata://HOST
connection-user=USERNAME
connection-password=PASSWORD
teradata-direct.table-operator.name=some_schema.table_operator
teradata-direct.http.port=9000
teradata-direct.http.host=hostname.accessible.from.teradata.instance

Warning

The port used for the teradata-direct.http.port catalog configuration property must be unique across all catalogs that use the Teradata Direct connector. If multiple catalogs use the same port value for this property, the cluster fails to start.

Besides the connector name, you need to provide the JDBC connection parameters and the relevant JAR files in the teradata plugin directory, as discussed in Starburst Teradata connector.

You must install the native table operator on Teradata in the queried schema. Specify the required table operator name. SEP automatically calculates values for the optional teradata-direct.http.port and teradata-direct.http.host properties, but in more complex network environments we recommend that you manually specify the port and host used to connect with a particular node.

Required Teradata Direct connector catalog properties#
Property name	Description
`teradata-direct.table-operator.name`	Native Table Operator function name.
`teradata-direct.table-operator.logging-level`	We strongly recommend that you set this to `DEBUG` until you have validated that all networking and access works as expected. You can safely remove this setting once you have verified all connections. Leaving it set to DEBUG can affect performance.

General configuration properties#

The following table describes general catalog configuration properties for the connector:

Property name	Description
`case-insensitive-name-matching`	Support case insensitive schema and table names. Defaults to `false`.
`case-insensitive-name-matching.cache-ttl`	Duration for which case insensitive schema and table names are cached. Defaults to `1m`.
`case-insensitive-name-matching.config-file`	Path to a name mapping configuration file in JSON format that allows Trino to disambiguate between schemas and tables with similar names in different cases. Defaults to `null`.
`case-insensitive-name-matching.config-file.refresh-period`	Frequency with which Trino checks the name matching configuration file for changes. The duration value defaults to `0s` (refresh disabled).
`metadata.cache-ttl`	Duration for which metadata, including table and column statistics, is cached. Defaults to `0s` (caching disabled).
`metadata.cache-missing`	Cache the fact that metadata, including table and column statistics, is not available. Defaults to `false`.
`metadata.schemas.cache-ttl`	Duration for which schema metadata is cached. Defaults to the value of `metadata.cache-ttl`.
`metadata.tables.cache-ttl`	Duration for which table metadata is cached. Defaults to the value of `metadata.cache-ttl`.
`metadata.statistics.cache-ttl`	Duration for which table statistics are cached. Defaults to the value of `metadata.cache-ttl`.
`metadata.cache-maximum-size`	Maximum number of objects stored in the metadata cache. Defaults to `10000`.
`write.batch-size`	Maximum number of statements in a batched execution. Do not change this setting from the default. Non-default values may negatively impact performance. Defaults to `1000`.
`dynamic-filtering.enabled`	Push down dynamic filters into JDBC queries. Defaults to `true`.
`dynamic-filtering.wait-timeout`	Maximum duration for which Trino waits for dynamic filters to be collected from the build side of joins before starting a JDBC query. Using a large timeout can potentially result in more detailed dynamic filters. However, it can also increase latency for some queries. Defaults to `20s`.

Additional Teradata configuration properties#

Further configuration can be done with the following parameters:

teradata-direct.receiver.buffer.size in the catalog properties file:

Determines the buffer size for each SEP worker. A buffer is allocated per table scan, so a single query joining three tables uses three buffers. Defaults to 32MB.

teradata-direct.parcel.size in the catalog properties file:

The size of the data packets sent to the SEP receivers. The last packet may be smaller. Defaults to 2MB. The parcel size should not exceed the buffer size.

teradata-direct.compression.enabled in the catalog properties file:

Determines if transmission data is compressed. Defaults to false.

teradata-direct.splits-per-worker in the catalog properties file:

Determines how many splits each SEP worker receives. This determines the maximum concurrency level on SEP. Defaults to 16, identical to the task.concurrency of the SEP server.

teradata-direct.table-operator.logging-path in the catalog properties file:

Set the path for log files of the native table operator on the Teradata nodes. Default behavior is to log to files in the /tmp folder.
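
The relationship between the buffer size, the parcel size, and the number of table scans can be checked with a little arithmetic. The following is a minimal sketch under stated assumptions: the helper names `parse_size` and `check_transfer_settings` are hypothetical, and the size units are assumed to be binary (1MB = 1024 × 1024 bytes). It verifies that the parcel size does not exceed the buffer size and estimates the receiver buffer memory one worker uses for a query, based on one buffer per table scan.

```python
import re

# Assumption: binary units, so 1MB is 1024 * 1024 bytes.
UNITS = {"B": 1, "kB": 2**10, "MB": 2**20, "GB": 2**30}


def parse_size(text):
    """Convert a size such as '32MB' to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(B|kB|MB|GB)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(float(match.group(1)) * UNITS[match.group(2)])


def check_transfer_settings(buffer_size="32MB", parcel_size="2MB", table_scans=1):
    """Validate parcel <= buffer and return receiver buffer bytes per worker."""
    buffer_bytes = parse_size(buffer_size)
    if parse_size(parcel_size) > buffer_bytes:
        raise ValueError(
            "teradata-direct.parcel.size exceeds "
            "teradata-direct.receiver.buffer.size"
        )
    # One buffer is used per table scan on each worker.
    return buffer_bytes * table_scans
```

With the default values, `check_transfer_settings(table_scans=3)` returns 100663296 bytes, which is three 32MB buffers per worker for a query that joins three tables.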

Configuring in Kubernetes deployments#

The Starburst Teradata Direct connector is supported for Kubernetes deployments in AWS EKS and in Azure AKS meeting certain requirements.

Warning

The configuration to use the connector on Kubernetes is complex and requires significant networking and Kubernetes expertise. Contact Starburst Support for assistance and help.

HTTPS communication#

The Starburst Teradata Direct connector supports secure HTTPS communication between Teradata and SEP nodes. Configuration for HTTPS changed in SEP release 477-e.

Previously, HTTPS settings were configured directly on the connector itself. HTTPS configuration is now inherited from SEP coordinator and worker properties.

A new mandatory property, teradata-direct.root-certificate.path, specifies the path to the root certificate on the Teradata instance. This root certificate must be used to sign all SEP coordinator and worker certificates (for example, presto-master.jks and presto-worker.jks). Copy the certificate to the Teradata instance in the same way as the native table operator .so file. Without it, Teradata cannot communicate with SEP nodes over TLS.

If SEP node communication is non-HTTPS, Teradata Direct communication is also non-HTTPS. If SEP node communication is HTTPS, Teradata Direct communication uses HTTPS.

The only available option for encryption is via the SEP keystore. Automatic TLS configuration is not supported.

Note

These HTTPS configuration changes apply to the Teradata Direct connector (teradata_direct). The temporary fallback connector (teradata_direct_legacy) continues to use the previous behavior. Additionally, the .so files for the updated Teradata Direct connector and the teradata_direct_legacy fallback connector are not compatible. You must install the .so file that matches the connector version in use.

Native table operator#

Install the native table operator with the following steps:

To obtain access to the Teradata native table operator file, contact Starburst Support.
Copy the provided file onto all Teradata servers into an identical location of your choosing. The .so file you copy must correspond to the SEP version of the cluster that will connect to Teradata. Note the path, as you will need it in the CREATE FUNCTION statement below.
Allow Teradata processes to access the file by either adding read permission with a command such as chmod +r <.so file>, or allowing access by the tdtrusted group with chown :tdtrusted <.so file>; chmod g+r <.so file>.
Create the table operator function in the desired schema. We recommend a dedicated database for the current operator version. The version-specific function and symbol names allow multiple native table operator versions to coexist in a single Teradata database.
CREATE FUNCTION some_database.table_operator() RETURNS TABLE VARYING USING FUNCTION starburst_td_contract_<trino_version> LANGUAGE C NO SQL PARAMETER STYLE SQLTABLE EXTERNAL NAME 'SP!<path>/starburst-direct-<version>.so!F!starburst_teradata_<trino_version>';
Replace <trino_version> with the Trino version number (for example, 481 for a 481-e.X release), and <version> with the full SEP release version (for example, 481-e.12). Replace <path> with the location where you copied the .so file in the previous step. The .so file and the version tokens in USING FUNCTION and EXTERNAL NAME must all correspond to the same Trino version. You cannot use a .so file built for a different Trino version than the one referenced in the function name.
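
Because the same version tokens appear in three places in the statement, generating it from a single input helps to keep them consistent. The following is a minimal sketch, not a Starburst tool: the function name `render_create_function` is hypothetical, and it assumes that the Trino version is the numeric part of the SEP release before the first hyphen, as in the 481-e.12 example above.

```python
def render_create_function(sep_version, so_dir,
                           function_name="some_database.table_operator"):
    """Build the CREATE FUNCTION statement for one SEP release.

    sep_version is the full release, for example '481-e.12'. The Trino
    version is assumed to be the part before the first '-'.
    """
    trino_version = sep_version.split("-", 1)[0]
    if not trino_version.isdigit():
        raise ValueError(f"unexpected SEP version: {sep_version!r}")
    # The .so file name carries the full SEP release version.
    so_path = f"{so_dir.rstrip('/')}/starburst-direct-{sep_version}.so"
    return (
        f"CREATE FUNCTION {function_name}() "
        "RETURNS TABLE VARYING USING FUNCTION "
        f"starburst_td_contract_{trino_version} "
        "LANGUAGE C NO SQL PARAMETER STYLE SQLTABLE "
        f"EXTERNAL NAME 'SP!{so_path}!F!starburst_teradata_{trino_version}';"
    )
```

For example, `print(render_create_function("481-e.12", "/opt/starburst"))` produces a statement in which the USING FUNCTION name, the .so file name, and the symbol name all refer to version 481.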

Warning

Do not install the native table operator in the syslib database.

No Teradata restart is needed for the changes to take effect.

By default, the UDF is created to run in protected mode. It allocates memory and opens sockets, and therefore cannot be used in unprotected mode.

The path and access permissions for the .so file, for example /opt/starburst, must be adjusted to a suitable location on your Teradata servers.

Each user that connects to Teradata from SEP needs to be granted sufficient access:

GRANT EXECUTE ON SPECIFIC FUNCTION some_database.table_operator TO your_user;

Alternatively, access to the table operator UDF can be granted through a role:

CREATE ROLE table_operator_users;
GRANT EXECUTE ON SPECIFIC FUNCTION some_database.table_operator
  TO table_operator_users;
GRANT table_operator_users TO your_user;

This approach also requires setting the role for any impersonated user, which you can do with the following property in the catalog properties file:

teradata.impersonation-role=table_operator_users

Updates

Update the native table operator only when the release notes require it. The requirement can be in the specific release you are installing, or in a release that you are skipping over. For example, the requirement can result from an STS release between two LTS releases. An update from one LTS release to the next, skipping the STS release, still requires an update of the native table operator. Contact Starburst Support if you require specific guidance for your update.

The following list details the necessary steps:

Download the new native table operator starburst-direct-<version>.so. The version needs to correspond to the version of the SEP cluster.
Remove the function in Teradata before any updates:
```
DROP FUNCTION some_database.table_operator;
```
Drop the database used for the function in Teradata with DROP DATABASE.
Remove all old .so files from all Teradata server nodes.
Restart Teradata with tpareset -force to remove the native table operator from memory. This step may not be necessary, depending on the memory management of Teradata. If you skip this step and run into an error like [Error 7559] [SQLState HY000] Memory (via malloc call) not freed before exiting UDF/XSP/UDM, you must restart the upgrade process.
Copy the new native table operator file to all Teradata server nodes. Use the original filename from the download. The filename must be different from that of each prior install.
Proceed with the same steps as performed during the initial installation. Make sure you update the CREATE FUNCTION statement to use the new database and .so file path.
Update the catalog properties files to point to the new database on the SEP coordinator and all workers.

teradata-direct.table-operator.name=new_database.table_operator

Restart the SEP cluster.
