Starburst Generic JDBC connector#
The Generic JDBC connector allows querying and creating tables in an external database that SEP does not have a dedicated connector for. This can be used for proof-of-concept deployments of federation use-cases, enabling joining multiple data sources. For production deployments a dedicated supported connector should be used.
Warning
Due to its generic nature, this connector may or may not work with a particular database or deployment scenario. In particular, you should expect the connector to handle at most only a subset of data types available in the external database.
Requirements#
To connect to a data source with the Generic JDBC connector, you need:
A JDBC driver for the database or system you want to connect.
Network access from the coordinator and workers to the data source. The used port varies for different databases and configurations.
A valid Starburst Enterprise license.
Configuration#
The Generic JDBC connector requires a JDBC 4.2 driver dedicated for the external database. No JDBC drivers are shipped with the connector. The installation steps assume you have the external database already running and that you have the appropriate JDBC driver jar.
Locate
generic-jdbc
directory under SEP’s plugin directory. When SEP is installed from RPM, this is/usr/lib/starburst/lib/plugin/generic-jdbc
.Add the appropriate JDBC driver jar file to the
generic-jdbc
directory.Locate
etc/catalog
directory containing configuration for SEP catalogs. When SEP is installed from RPM, this is/etc/starburst/catalog
.Create
some_catalog.properties
file in the catalogs configuration directory (replacesome_catalog
with your database name or some other descriptive name of the catalog) with the following contents:
connector.name=generic-jdbc # the name of the Driver class. E.g. for Postgres this would be org.postgresql.Driver generic-jdbc.driver-class=FULLY_QUALIFIED_NAME_OF_THE_DRIVER_CLASS # JDBC connection URL. E.g. for Postgres this could be jdbc:postgresql://example.net:5432/database?defaultRowFetchSize=10000 connection-url=JDBC_CONNECTION_URL_AS_APPROPRIATE_FOR_THE_EXTERNAL_DATABASE_DRIVER # connection credentials connection-user=USERNAME connection-password=PASSWORD # additional configuration case-insensitive-name-matching=true
Perform the above steps on every SEP cluster node.
Restart SEP on every node.
Multiple external databases#
If you want to connect to multiple external databases using the Generic JDBC connector, follow these steps:
Add appropriate JDBC driver jars to the
generic-jdbc
directory under SEP’s plugin directory for all database servers you are going to connect to. If two external databases use the same driver jar, you do not need to add the driver jar twice.Define additional catalogs in SEP catalogs configuration directory, by creating
some_additional_catalog.properties
file (replacingsome_additional_catalog
with some descriptive name of the catalog).
JDBC compatibility#
Generic JDBC connector provides best-effort compatibility with older JDBC drivers, including drivers implementing version 3 of the JDBC specification (JDBC 3).
The best-effort compatibility can be turned on with:
generic-jdbc.experimental.jdbc-version-compatibility.enabled=true
Performance#
The connector includes a number of performance improvements, detailed in the following sections.
Dynamic filtering#
Dynamic filtering is enabled by default. It causes the connector to wait for dynamic filtering to complete before starting a JDBC query.
You can disable dynamic filtering by setting the property
dynamic-filtering.enabled
in your catalog properties file to false
.
Table scan redirections#
The connector supports table scan redirection to improve performance and reduce load on the data source.
SQL support#
The connector provides read and write access to data and metadata in a database using a JDBC connection. In addition to the globally available and read operation statements, the connector supports the following features:
Some SQL statements may not work depending on the database and JDBC driver.
Type mapping#
General configuration properties#
The following properties can be used to configure how data types from the connected data source are mapped to Trino data types and how the metadata is cached in Trino.
Property name |
Description |
Default value |
---|---|---|
|
Configure how unsupported column data types are handled:
The respective catalog session property is |
|
|
Allow forced mapping of comma separated lists of data types to convert to
unbounded |
|
|
Support case insensitive database and collection names |
False |
|
1 minute |
|
|
Duration for which metadata, including table and column statistics, is cached |
0 (disabled caching) |
|
Cache the fact that metadata, including table and column statistics, is not available |
False |
Security#
The connector includes a number of security-related features, detailed in the following sections.
User impersonation#
Generic JDBC connector supports user impersonation.
User impersonation can be enabled in the catalog file:
generic-jdbc.impersonation.enabled=true
User impersonation in Generic JDBC connector is based on SET ROLE
.
Limitations#
Data types#
Support for data types depends on the data types in the external database and how the data types are reported by the JDBC driver of the external database. Actual support may vary from database to database. The Generic JDBC connector will hide columns which data type it considered unsupported. In the extreme case, this may mean that no columns are visible, if all data types being used are unsupported.
If you want to query data with unsupported types, you can define a view in your
external database, converting unsupported data types to supported ones
(converting to varchar
is usually a good choice). Then you should query this
view from SEP, instead of the underlying table.