Starburst Generic JDBC connector#
The Generic JDBC connector allows querying and creating tables in an external database that SEP does not have a dedicated connector for. This can be used for proof-of-concept deployments of federation use-cases, enabling joining multiple data sources. For production deployments a dedicated supported connector should be used.
Warning
Due to its generic nature, this connector may or may not work with a particular database or deployment scenario. In particular, you should expect the connector to handle at most only a subset of data types available in the external database.
Requirements#
To connect to a data source with the Generic JDBC connector, you need:
A JDBC driver for the database or system you want to connect.
Network access from the coordinator and workers to the data source. The used port varies for different databases and configurations.
A valid Starburst Enterprise license.
Configuration#
The Generic JDBC connector requires a JDBC 4.2 driver dedicated for the external database. No JDBC drivers are shipped with the connector. The installation steps assume you have the external database already running and that you have the appropriate JDBC driver jar.
Locate
generic-jdbc
directory under SEP’s plugin directory. When SEP is installed from RPM, this is/usr/lib/starburst/plugin/generic-jdbc
.Add the appropriate JDBC driver jar file to the
generic-jdbc
directory.Locate
etc/catalog
directory containing configuration for SEP catalogs. When SEP is installed from RPM, this is/etc/starburst/catalog
.Create
example.properties
file in the catalogs configuration directory (replaceexample
with your database name or some other descriptive name of the catalog) with the following contents:
connector.name=generic-jdbc # the name of the Driver class. E.g. for Postgres this would be org.postgresql.Driver generic-jdbc.driver-class=FULLY_QUALIFIED_NAME_OF_THE_DRIVER_CLASS # JDBC connection URL. E.g. for Postgres this could be jdbc:postgresql://example.net:5432/database?defaultRowFetchSize=10000 connection-url=JDBC_CONNECTION_URL_AS_APPROPRIATE_FOR_THE_EXTERNAL_DATABASE_DRIVER # connection credentials connection-user=USERNAME connection-password=PASSWORD # additional configuration case-insensitive-name-matching=true
Perform the above steps on every SEP cluster node.
Restart SEP on every node.
General configuration properties#
The following table describes general catalog configuration properties for the connector:
Property name |
Description |
---|---|
|
Support case insensitive schema and table names. Defaults to |
|
Duration for which case insensitive schema and table
names are cached. Defaults to |
|
Path to a name mapping configuration file in JSON format that allows
Trino to disambiguate between schemas and tables with similar names in
different cases. Defaults to |
|
Frequency with which Trino checks the name matching configuration file
for changes. The duration value defaults to |
|
Duration for which metadata, including table and
column statistics, is cached. Defaults to |
|
Cache the fact that metadata, including table and column statistics, is
not available. Defaults to |
|
Duration for which schema metadata is cached.
Defaults to the value of |
|
Duration for which table metadata is cached.
Defaults to the value of |
|
Duration for which tables statistics are cached.
Defaults to the value of |
|
Maximum number of objects stored in the metadata cache. Defaults to |
|
Maximum number of statements in a batched execution. Do not change
this setting from the default. Non-default values may negatively
impact performance. Defaults to |
|
Push down dynamic filters into JDBC queries. Defaults to |
|
Maximum duration for which Trino waits for dynamic
filters to be collected from the build side of joins before starting a
JDBC query. Using a large timeout can potentially result in more detailed
dynamic filters. However, it can also increase latency for some queries.
Defaults to |
Caution
Because the connector is generic, while you can configure these properties there is no guarantee that the configured feature works as expected.
Multiple external databases#
If you want to connect to multiple external databases using the Generic JDBC connector, follow these steps:
Add appropriate JDBC driver jars to the
generic-jdbc
directory under SEP’s plugin directory for all database servers you are going to connect to. If two external databases use the same driver jar, you do not need to add the driver jar twice.Define additional catalogs in SEP catalogs configuration directory, by creating
additional_example.properties
file (replacingadditional_example
with some descriptive name of the catalog).
JDBC compatibility#
Generic JDBC connector provides best-effort compatibility with older JDBC drivers, including drivers implementing version 3 of the JDBC specification (JDBC 3).
The best-effort compatibility can be turned on with:
generic-jdbc.experimental.jdbc-version-compatibility.enabled=true
In rare cases, JDBC drivers fail to implement search string escaping as documented in the JDBC specification. In those cases, you can force the driver to ignore escaping by setting the following property:
generic-jdbc.ignore-search-string-escape=true
Table functions#
The connector provides specific table functions to access a data source with the generic JDBC connector.
query(VARCHAR) -> table
#
The query
function allows you to query the underlying database directly. It
requires syntax native to the data source, because the full query is pushed down
and processed in the data source. This can be useful for accessing native
features or for improving query performance in situations where running a query
natively may be faster.
The query
table function is available in the system
schema of any
catalog that uses the generic JDBC connector, such as example
. The
following example passes myQuery
to the data source. myQuery
has to be a
valid query for the data source, and is required to return a table as a result:
SELECT
*
FROM
TABLE(
example.system.query(
query => 'myQuery'
)
);
Performance#
The connector includes a number of performance improvements, detailed in the following sections.
Dynamic filtering#
Dynamic filtering is enabled by default. It causes the connector to wait for dynamic filtering to complete before starting a JDBC query.
You can disable dynamic filtering by setting the dynamic-filtering.enabled
property in your catalog configuration file to false
.
Wait timeout#
By default, table scans on the connector are delayed up to 20 seconds until dynamic filters are collected from the build side of joins. Using a large timeout can potentially result in more detailed dynamic filters. However, it can also increase latency for some queries.
You can configure the dynamic-filtering.wait-timeout
property in your
catalog properties file:
dynamic-filtering.wait-timeout=1m
You can use the dynamic_filtering_wait_timeout
catalog session property in a specific session:
SET SESSION example.dynamic_filtering_wait_timeout = 1s;
Compaction#
The maximum size of dynamic filter predicate, that is pushed down to the
connector during table scan for a column, is configured using the
domain-compaction-threshold
property in the catalog
properties file:
domain-compaction-threshold=100
You can use the domain_compaction_threshold
catalog
session property:
SET SESSION domain_compaction_threshold = 10;
By default, domain-compaction-threshold
is set to 32
.
When the dynamic predicate for a column exceeds this threshold, it is compacted
into a single range predicate.
For example, if the dynamic filter collected for a date column dt
on the
fact table selects more than 32 days, the filtering condition is simplified from
dt IN ('2020-01-10', '2020-01-12',..., '2020-05-30')
to dt BETWEEN '2020-01-10' AND '2020-05-30'
. Using a large threshold can result in increased
table scan overhead due to a large IN
list getting pushed down to the data
source.
Metrics#
Metrics about dynamic filtering are reported in a JMX table for each catalog:
jmx.current."io.trino.plugin.jdbc:name=example,type=dynamicfilteringstats"
Metrics include information about the total number of dynamic filters, the number of completed dynamic filters, the number of available dynamic filters and the time spent waiting for dynamic filters.
Table scan redirections#
The connector supports table scan redirection to improve performance and reduce load on the data source.
SQL support#
The connector provides read and write access to data and metadata in a database using a JDBC connection. In addition to the globally available and read operation statements, the connector supports the following features:
Some SQL statements may not work depending on the database and JDBC driver.
Type mapping#
Type mapping configuration properties#
The following properties can be used to configure how data types from the connected data source are mapped to Trino data types and how the metadata is cached in Trino.
Property name |
Description |
Default value |
---|---|---|
|
Configure how unsupported column data types are handled:
The respective catalog session property is |
|
|
Allow forced mapping of comma separated lists of data types to convert to
unbounded |
Security#
The connector includes a number of security-related features, detailed in the following sections.
User impersonation#
Generic JDBC connector supports user impersonation.
User impersonation can be enabled in the catalog file:
generic-jdbc.impersonation.enabled=true
User impersonation in Generic JDBC connector is based on SET ROLE
.
Limitations#
Data types#
Support for data types depends on the data types in the external database and how the data types are reported by the JDBC driver of the external database. Actual support may vary from database to database. The Generic JDBC connector will hide columns which data type it considered unsupported. In the extreme case, this may mean that no columns are visible, if all data types being used are unsupported.
If you want to query data with unsupported types, you can define a view in your
external database, converting unsupported data types to supported ones
(converting to VARCHAR
is usually a good choice). Then you should query this
view from SEP, instead of the underlying table.