Starburst MaxCompute connector

The MaxCompute connector allows users to query data in MaxCompute databases.

Requirements

To use the MaxCompute connector, you need:

Configuration

Create a catalog properties file named example.properties in etc/catalog to access the configured MaxCompute database through the example catalog. Replace example with your database name or another descriptive name for the catalog. Set connector.name to maxcompute, and replace the connection properties as appropriate for your setup:

connector.name=maxcompute
maxcompute.project.name=max_compute
maxcompute.access.id=access id
maxcompute.access.key=access key
maxcompute.endpoint=http://service.cn-example.maxcompute.aliyun.com/api
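If you manage many catalogs, generating this file from a script can keep credentials out of hand-edited files. The following is a minimal sketch that renders the same five properties shown above and writes them to <catalog_name>.properties. The function names render_catalog and write_catalog are hypothetical helpers, not part of SEP.

```python
from pathlib import Path


def render_catalog(project: str, access_id: str, access_key: str, endpoint: str) -> str:
    """Return the text of a MaxCompute catalog properties file."""
    props = {
        "connector.name": "maxcompute",
        "maxcompute.project.name": project,
        "maxcompute.access.id": access_id,
        "maxcompute.access.key": access_key,
        "maxcompute.endpoint": endpoint,
    }
    # Emit one key=value pair per line, in the order defined above.
    return "".join(f"{key}={value}\n" for key, value in props.items())


def write_catalog(catalog_dir: Path, catalog_name: str, text: str) -> Path:
    """Write <catalog_name>.properties into catalog_dir, e.g. etc/catalog."""
    catalog_dir.mkdir(parents=True, exist_ok=True)
    path = catalog_dir / f"{catalog_name}.properties"
    path.write_text(text)
    return path
```

For example, you might call render_catalog with values read through os.environ from variables such as MAXCOMPUTE_ACCESS_ID and MAXCOMPUTE_ACCESS_KEY (hypothetical names), then pass the result to write_catalog(Path("etc/catalog"), "example", text).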

General configuration properties

The following table describes catalog configuration properties for the connector:

| Property name                  | Description                                                                          |
|--------------------------------|--------------------------------------------------------------------------------------|
| maxcompute.project.name        | Name of the MaxCompute project. Required.                                            |
| maxcompute.access.id           | Access ID used to authenticate to MaxCompute resources. Required.                   |
| maxcompute.access.key          | Access key paired with the access ID to authenticate to MaxCompute. Required.        |
| maxcompute.endpoint            | Endpoint used to communicate with MaxCompute. Required.                              |
| maxcompute.tunnel.endpoint     | Endpoint of the MaxCompute tunnel service. Optional; setting it can improve performance. |
| maxcompute.additional-projects | Comma-separated list of additional MaxCompute projects to expose as SEP schemas.     |
| maxcompute.input.split.size    | Maximum size of each split of the input data.                                        |

Optionally, configure maxcompute.tunnel.endpoint to improve performance:

maxcompute.tunnel.endpoint=http://dt.cn-example.maxcompute.aliyun.com
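Before restarting SEP, it can help to check that a catalog file sets every required property from the table above. The following is a minimal sketch under the assumption that the file uses simple key=value lines; it is not a full Java .properties parser, and the helper names are hypothetical.

```python
# Required properties, as listed in the table above.
REQUIRED = (
    "maxcompute.project.name",
    "maxcompute.access.id",
    "maxcompute.access.key",
    "maxcompute.endpoint",
)


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # or ! comments.

    Not a full Java .properties parser: line continuations, escape
    sequences, and ':' separators are not handled.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {raw!r}")
        props[key.strip()] = value.strip()
    return props


def missing_required(props: dict[str, str]) -> list[str]:
    """Return required property names that are absent or empty."""
    return [name for name in REQUIRED if not props.get(name)]


def additional_projects(props: dict[str, str]) -> list[str]:
    """Split maxcompute.additional-projects on commas, dropping blank entries."""
    value = props.get("maxcompute.additional-projects", "")
    return [project.strip() for project in value.split(",") if project.strip()]
```

A non-empty result from missing_required points at the properties to add before the catalog can connect.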

Type mapping

Because Trino and MaxCompute each support types that the other does not, the connector modifies some types when reading data, so a type does not always map one-to-one between SEP and MaxCompute. The following section describes the type mapping.

MaxCompute to Trino type mapping

The connector maps MaxCompute types to the corresponding Trino types as shown in the following table:

| MaxCompute type | Trino type   | Notes                                              |
|-----------------|--------------|----------------------------------------------------|
| BOOLEAN         | BOOLEAN      |                                                    |
| TINYINT         | TINYINT      |                                                    |
| SMALLINT        | SMALLINT     |                                                    |
| INT             | INT          |                                                    |
| BIGINT          | BIGINT       |                                                    |
| BINARY          | VARBINARY    |                                                    |
| FLOAT           | REAL         |                                                    |
| DOUBLE          | DOUBLE       |                                                    |
| DECIMAL         | DECIMAL      |                                                    |
| VARCHAR         | VARCHAR      |                                                    |
| CHAR            | CHAR         | Special characters in CHAR columns cannot be read. |
| STRING          | VARCHAR      |                                                    |
| DATE            | DATE         |                                                    |
| DATETIME        | TIMESTAMP(3) |                                                    |
| TIMESTAMP       | TIMESTAMP    |                                                    |

No other types are supported.
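The mapping above can be expressed as a lookup table, which is handy for checking ahead of time whether a MaxCompute schema contains unsupported columns. The following is a sketch only, not the connector's code: it assumes that DECIMAL, VARCHAR, and CHAR keep their precision, scale, or length parameters unchanged, which the table does not state explicitly.

```python
import re

# MaxCompute base type -> Trino type, mirroring the type mapping table.
_TYPE_MAP = {
    "BOOLEAN": "BOOLEAN",
    "TINYINT": "TINYINT",
    "SMALLINT": "SMALLINT",
    "INT": "INT",
    "BIGINT": "BIGINT",
    "BINARY": "VARBINARY",
    "FLOAT": "REAL",
    "DOUBLE": "DOUBLE",
    "DECIMAL": "DECIMAL",
    "VARCHAR": "VARCHAR",
    "CHAR": "CHAR",
    "STRING": "VARCHAR",
    "DATE": "DATE",
    "DATETIME": "TIMESTAMP(3)",
    "TIMESTAMP": "TIMESTAMP",
}

# Assumption: these types carry their parameters over unchanged.
_PARAMETERIZED = {"DECIMAL", "VARCHAR", "CHAR"}

_TYPE_PATTERN = re.compile(r"\s*([A-Za-z]+)\s*(\([0-9,\s]*\))?\s*")


def to_trino_type(maxcompute_type: str) -> str:
    """Translate a MaxCompute type name to its Trino equivalent or raise ValueError."""
    match = _TYPE_PATTERN.fullmatch(maxcompute_type)
    if match is None:
        # Complex types such as ARRAY<INT> do not match the pattern.
        raise ValueError(f"unsupported MaxCompute type: {maxcompute_type!r}")
    base = match.group(1).upper()
    params = (match.group(2) or "").replace(" ", "")
    if base not in _TYPE_MAP:
        raise ValueError(f"unsupported MaxCompute type: {maxcompute_type!r}")
    if params and base not in _PARAMETERIZED:
        raise ValueError(f"unexpected parameters for {base}: {params}")
    return _TYPE_MAP[base] + params
```

Raising on anything outside the table reflects the statement that no other types are supported, rather than guessing a fallback type.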

SQL support

The connector provides globally available and read operation statements to access data and metadata in MaxCompute:

View management

The connector supports read-only access to views defined in MaxCompute, including their data and metadata.

Note

When you query a view, MaxCompute runs the SELECT statement that defines the view and returns the result set. Because this computation runs in MaxCompute, querying a view may incur MaxCompute costs.

Materialized views

The connector supports materialized view management. In MaxCompute, each materialized view consists of a query statement and a MaxCompute virtual table. When you query a materialized view, the query is translated into the SQL statement that defines the view.

External tables

The connector lets you query external tables, which provide access to unstructured data stored outside MaxCompute.

Managed tables

Managed tables are fully managed by MaxCompute, including their physical storage. For these tables, SEP uses the MaxCompute tunnel API to retrieve data. This lets SEP determine the data size, split the data by records and partitions, and read it in parallel through multiple InputSplits.
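The splitting idea can be illustrated by cutting each partition's record range into fixed-size chunks that independent workers can read in parallel. This is a simplified sketch, not the connector's algorithm: it assumes the record count of each partition is already known and measures split size in records, whereas the unit used by maxcompute.input.split.size is not stated here. The Split and plan_splits names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    partition: str  # hypothetical partition identifier, e.g. "dt=20240101"
    start: int      # index of the first record in the split (inclusive)
    count: int      # number of records in the split


def plan_splits(record_counts: dict[str, int], max_records: int) -> list[Split]:
    """Cut each partition's record range into chunks of at most max_records."""
    if max_records <= 0:
        raise ValueError("max_records must be positive")
    splits = []
    for partition, total in record_counts.items():
        # Empty partitions produce no splits because range(0, 0, n) is empty.
        for start in range(0, total, max_records):
            splits.append(Split(partition, start, min(max_records, total - start)))
    return splits
```

With a limit of 100 records, a partition of 250 records yields three splits covering records 0 to 99, 100 to 199, and 200 to 249.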

Performance

The connector includes a number of performance improvements, detailed in the following sections.

Pushdown

The connector supports partition and projection pushdown.
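Partition pushdown means that predicates on partition columns are used to skip whole partitions instead of reading and then filtering their rows. The following sketch shows the idea for equality predicates only. It assumes a hypothetical "col=value,col=value" partition spec string; it is not the connector's implementation.

```python
def parse_partition_spec(spec: str) -> dict[str, str]:
    """Parse a hypothetical "col=value,col=value" partition spec into a dict."""
    values = {}
    for part in spec.split(","):
        column, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed partition spec: {spec!r}")
        values[column.strip()] = value.strip()
    return values


def prune_partitions(specs: list[str], equals: dict[str, str]) -> list[str]:
    """Keep only partitions that satisfy every equality predicate.

    `equals` must contain predicates on partition columns only; a column
    missing from a spec never matches.
    """
    kept = []
    for spec in specs:
        values = parse_partition_spec(spec)
        if all(values.get(column) == value for column, value in equals.items()):
            kept.append(spec)
    return kept
```

For a filter such as dt = '20240101', only the matching partitions are handed on for split planning, so the others are never read.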

Projection pushdown

The connector supports projection pushdown for views, materialized views, and external tables.
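Projection pushdown means the query sent to MaxCompute requests only the columns the engine actually needs, rather than selecting every column and discarding most of them. The following sketch builds such a projected SELECT. It is illustrative only: the backtick identifier quoting and the helper names are assumptions, not the SQL the connector actually generates.

```python
def quote_identifier(name: str) -> str:
    """Backtick-quote an identifier (quoting style is an assumption)."""
    if "`" in name:
        raise ValueError(f"identifier contains a backtick: {name!r}")
    return f"`{name}`"


def build_projected_query(table: str, table_columns: list[str], needed: list[str]) -> str:
    """Build a SELECT that reads only the columns the engine needs."""
    unknown = sorted(set(needed) - set(table_columns))
    if unknown:
        raise ValueError(f"unknown columns: {unknown}")
    wanted = set(needed)
    # Keep the table's column order and drop duplicate requests.
    selected = [quote_identifier(c) for c in table_columns if c in wanted]
    # A query such as count(*) needs no columns; select a constant instead.
    select_list = ", ".join(selected) if selected else "1"
    qualified = ".".join(quote_identifier(part) for part in table.split("."))
    return f"SELECT {select_list} FROM {qualified}"
```

For a view with columns id, amount, and region, a query that touches only region and id would read two columns instead of three.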