Starburst Generic JDBC connector#

The Generic JDBC connector allows querying and creating tables in an external database that SEP does not have a dedicated connector for. This can be used for proof-of-concept deployments of federation use-cases, enabling joining multiple data sources. For production deployments a dedicated supported connector should be used.

Warning

Due to its generic nature, this connector may or may not work with a particular database or deployment scenario. In particular, you should expect the connector to handle at most only a subset of data types available in the external database.

Requirements#

To connect to a data source with the Generic JDBC connector, you need:

  • A JDBC driver for the database or system you want to connect.

  • Network access from the coordinator and workers to the data source. The used port varies for different databases and configurations.

  • A valid Starburst Enterprise license.

Configuration#

The Generic JDBC connector requires a JDBC 4.2 driver dedicated for the external database. No JDBC drivers are shipped with the connector. The installation steps assume you have the external database already running and that you have the appropriate JDBC driver jar.

  1. Locate generic-jdbc directory under SEP’s plugin directory. When SEP is installed from RPM, this is /usr/lib/starburst/lib/plugin/generic-jdbc.

  2. Add the appropriate JDBC driver jar file to the generic-jdbc directory.

  3. Locate etc/catalog directory containing configuration for SEP catalogs. When SEP is installed from RPM, this is /etc/starburst/catalog.

  4. Create some_catalog.properties file in the catalogs configuration directory (replace some_catalog with your database name or some other descriptive name of the catalog) with the following contents:

connector.name=generic-jdbc

# the name of the Driver class. E.g. for Postgres this would be org.postgresql.Driver
generic-jdbc.driver-class=FULLY_QUALIFIED_NAME_OF_THE_DRIVER_CLASS

# JDBC connection URL. E.g. for Postgres this could be jdbc:postgresql://example.net:5432/database?defaultRowFetchSize=10000
connection-url=JDBC_CONNECTION_URL_AS_APPROPRIATE_FOR_THE_EXTERNAL_DATABASE_DRIVER

# connection credentials
connection-user=USERNAME
connection-password=PASSWORD

# additional configuration
case-insensitive-name-matching=true
  1. Perform the above steps on every SEP cluster node.

  2. Restart SEP on every node.

Multiple external databases#

If you want to connect to multiple external databases using the Generic JDBC connector, follow these steps:

  1. Add appropriate JDBC driver jars to the generic-jdbc directory under SEP’s plugin directory for all database servers you are going to connect to. If two external databases use the same driver jar, you do not need to add the driver jar twice.

  2. Define additional catalogs in SEP catalogs configuration directory, by creating some_additional_catalog.properties file (replacing some_additional_catalog with some descriptive name of the catalog).

JDBC compatibility#

Generic JDBC connector provides best-effort compatibility with older JDBC drivers, including drivers implementing version 3 of the JDBC specification (JDBC 3).

The best-effort compatibility can be turned on with:

generic-jdbc.experimental.jdbc-version-compatibility.enabled=true

Performance#

The connector includes a number of performance improvements, detailed in the following sections.

Dynamic filtering#

Dynamic filtering is enabled by default. It causes the connector to wait for dynamic filtering to complete before starting a JDBC query.

You can disable dynamic filtering by setting the property dynamic-filtering.enabled in your catalog properties file to false.

Table scan redirections#

The connector supports table scan redirection to improve performance and reduce load on the data source.

SQL support#

The connector provides read and write access to data and metadata in a database using a JDBC connection. In addition to the globally available and read operation statements, the connector supports the following features:

Some SQL statements may not work depending on the database and JDBC driver.

Type mapping#

General configuration properties#

The following properties can be used to configure how data types from the connected data source are mapped to Trino data types and how the metadata is cached in Trino.

Property name

Description

Default value

unsupported-type-handling

Configure how unsupported column data types are handled:

  • IGNORE, column is not accessible.

  • CONVERT_TO_VARCHAR, column is converted to unbounded VARCHAR.

The respective catalog session property is unsupported_type_handling.

IGNORE

jdbc-types-mapped-to-varchar

Allow forced mapping of comma separated lists of data types to convert to unbounded VARCHAR

case-insensitive-name-matching

Support case insensitive database and collection names

False

case-insensitive-name-matching.cache-ttl

1 minute

metadata.cache-ttl

Duration for which metadata, including table and column statistics, is cached

0 (disabled caching)

metadata.cache-missing

Cache the fact that metadata, including table and column statistics, is not available

False

Security#

The connector includes a number of security-related features, detailed in the following sections.

User impersonation#

Generic JDBC connector supports user impersonation.

User impersonation can be enabled in the catalog file:

generic-jdbc.impersonation.enabled=true

User impersonation in Generic JDBC connector is based on SET ROLE.

Limitations#

Data types#

Support for data types depends on the data types in the external database and how the data types are reported by the JDBC driver of the external database. Actual support may vary from database to database. The Generic JDBC connector will hide columns which data type it considered unsupported. In the extreme case, this may mean that no columns are visible, if all data types being used are unsupported.

If you want to query data with unsupported types, you can define a view in your external database, converting unsupported data types to supported ones (converting to varchar is usually a good choice). Then you should query this view from SEP, instead of the underlying table.