Starburst DynamoDB connector#
The Starburst DynamoDB connector allows querying data stored in Amazon DynamoDB.
Requirements#
To connect to DynamoDB, you need:
Amazon Web Services (AWS) access credentials with an attached policy to be able to read from DynamoDB.
Network access from the coordinator and workers to the AWS region hosting your DynamoDB tables.
A valid Starburst Enterprise license.
Configuration#
Create the mydynamodb
catalog with a catalog properties file in
etc/catalog
named mydynamodb.properties
:
connector.name=dynamodb
dynamodb.aws-access-key=AKIAIOSFODNN7EXAMPLE
dynamodb.aws-secret-key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
dynamodb.aws-region=us-east-1
dynamodb.generate-schema-files=NEVER
dynamodb.schema-directory=/path/to/schema/directory
Specify the connector.name
property as dynamodb
. Configure the
catalog using your AWS access key, secret key, and region for your account.
The connector can only access DynamoDB with the configured access credentials in the catalog. If you need to access DynamoDB with different credentials or a different region, configure a separate catalog. See more information about Type mapping and schemas session and catalog session properties.
Generate a schema file#
The connector exposes the data in your DynamoDB instance in the
amazondynamodb
schema of the catalog.
A schema file for a DynamoDB details the SEP table name, columns, and data types. It additionally species which DynamoDB attribute (or nested attribute) the connector uses to populate data for the column.
If you do not specify a schema file, the connector generates a schema file. The connector samples up to 50 rows from the DynamoDB table and uses those to generate a schema file. The schema files are in RealSQLDatabase (RSD) format and accessible using SQL commands.
The value dictates the behavior:
NEVER
(default): A schema file is never generated.ON_USE
: A schema file is generated the first time a table is referenced, provided the schema file for the table does not already exist.ON_START
: A schema file is generated at connection time for any tables that do not currently have a schema file.
Several attributes detail the mapping from the SQL type to DynamoDB:
Schema file column definitions# Attribute name
Description
Default Value
name
The name of the column in the SEP table
xs:type
The SQL type of the SEP table. Supported types are below
columnsize
Length of a SQL column that supports variable widths like
VARCHAR
255
key
true
if this column is part of the DynamoDB primary key
false
other:keytype
The type of the DynamoDB key. Valid values are either
HASH
orRANGE
other:path
The name of the attribute in DynamoDB
name
other::internaltype
The DynamoDB Type. Valid values are
N
,S
,SS
, orM
Example#
This is an example of a schema file for the TPC-H orders
table. The name
of the file is orders.rsd
and goes in the directory specified by the
dynamodb.schema-directory
configuration property on every SEP host.
The api:info
section has the name of the table, orders
, and then
lists all columns in the SEP table definition as child elements.
The additional api:script
definitions for the GET
, POST
, MERGE
,
and DELETE
API methods are required in every schema file. These definitions
are standard, you can copy them as they are written below.
<api:script xmlns:api="http://apiscript.com/ns?v1">
<api:info title="orders">
<attr name="orderkey" xs:type="bigint" key="true" other:keytype="HASH" other:path="orderkey" other:internaltype="N" />
<attr name="custkey" xs:type="bigint" other:path="custkey" other:internaltype="N" />
<attr name="orderstatus" xs:type="string" columnsize="1" other:path="orderstatus" other:internaltype="S" />
<attr name="totalprice" xs:type="double" other:path="totalprice" other:internaltype="N" />
<attr name="orderdate" xs:type="date" other:path="orderdate" other:internaltype="S" />
<attr name="orderpriority" xs:type="string" columnsize="15" other:path="orderpriority" other:internaltype="S" />
<attr name="clerk" xs:type="string" columnsize="15" other:path="clerk" other:internaltype="S" />
<attr name="shippriority" xs:type="integer" other:path="shippriority" other:internaltype="N" />
<attr name="comment" xs:type="string" columnsize="79" other:path="comment" other:internaltype="S" />
</api:info>
<api:script method="GET">
<api:call op="dynamodbadoMultiThreadSelect">
<api:push/>
</api:call>
</api:script>
<api:script method="POST">
<api:call op="dynamodbadoAdd">
<api:push/>
</api:call>
</api:script>
<api:script method="MERGE">
<api:call op="dynamodbadoUpdate">
<api:push/>
</api:call>
</api:script>
<api:script method="DELETE">
<api:call op="dynamodbadoDelete">
<api:push/>
</api:call>
</api:script>
</api:script>
Type mapping#
The supported SQL types include:
BOOLEAN
TINYINT
SMALLINT
BIGINT
REAL
DOUBLE
VARCHAR(N)
VARBINARY
DATE
All other SQL types are not supported.
Type mapping and schemas#
Review the table to learn more about mapping the relational values of nested attributes and arrays. This is used when generating a schema file and dictates how the table displays.
Property name |
Description |
Default value |
---|---|---|
|
The relative or absolute path to a directory that contains schema files. The directory must exist on every host. More information on generating schema files. |
|
|
Set to |
|
|
The default returns the array as a single column of JSON. Set the integer
value to |
|
Maps and string sets#
DynamoDB supports nested attributes of maps and string sets, and the connector
supports flattening these elements to relational columns. By default, any
map or string set is returned as a single JSON column of type VARCHAR
.
Take the below table foo
which has a single key row_id
and one column
bar
which is a DynamoDB type M
(only the api:info
section is shown
below in the example section).
<api:info title="foo">
<attr name="row_id" xs:type="string" columnsize="255" other:path="row_id" other:internaltype="S" key="true" other:keytype="HASH" />
<attr name="bar" xs:type="string" columnsize="2000" other:path="bar" other:internaltype="M" />
</api:info>
The following exmaple shows a value of bar
is below. It contains three keys
a
, a string, b
, a number, and c
, a string set.
{
"a": "string value",
"b": 123,
"c": ["foo", "bar"]
}
To flatten this value into a map, change the M
column in our schema file
to four columns named a
, b
, c_0
, and c_1
, setting the SQL
types and DynamoDB types via the xs:type
and other:internaltype
attributes, respectively. Then set other:path
to bar___cdata___a
,
where ___cdata___
is a path separator to project into the map to select the
a
attribute in the top-level bar
attribute.
The string set uses the same separator as the map. Instead of specifying an
attribute, use 0
and 1
to map the first and second elements of the
string set to c_0
and c_1
SQL columns.
Set
<api:info title="foo">
<attr name="row_id" xs:type="string" columnsize="255" other:path="row_id" other:internaltype="S" key="true" other:keytype="HASH" />
<attr name="a" xs:type="string" columnsize="255" other:path="bar___cdata___a" other:internaltype="S" />
<attr name="b" xs:type="bigint" other:path="bar___cdata___b" other:internaltype="N" />
<attr name="c_0" xs:type="string" columnsize="255" other:path="bar___cdata___c___cdata___0" other:internaltype="S" />
<attr name="c_1" xs:type="string" columnsize="255" other:path="bar___cdata___c___cdata___1" other:internaltype="S" />
</api:info>
For string set types, it uses the same method of projection. Below, baz
is
of type SS
.
<api:info title="foo">
<attr name="row_id" xs:type="string" columnsize="255" other:path="row_id" other:internaltype="S" key="true" other:keytype="HASH" />
<attr name="bar" xs:type="string" columnsize="2000" other:path="bar" other:internaltype="SS" />
</api:info>
It splits it into three columns:
<api:info title="foo">
<attr name="row_id" xs:type="string" columnsize="255" other:path="row_id" other:internaltype="S" key="true" other:keytype="HASH" />
<attr name="baz_0" xs:type="string" columnsize="2000" other:path="baz___cdata___0" other:internaltype="S" />
<attr name="baz_1" xs:type="string" columnsize="2000" other:path="baz___cdata___1" other:internaltype="S" />
<attr name="baz_2" xs:type="string" columnsize="2000" other:path="baz___cdata___2" other:internaltype="S" />
</api:info>
If baz
contains more than three elements, it is omitted from the result set.
If baz
contains fewer than three elements, the value of the column is
NULL
where an element is not present.
Using this method, you can project into many nested attributes in DynamoDB for both maps and string sets.
SQL support#
The connector supports globally available and read operation statements to access data and metadata in DynamoDB.
Performance#
The connector includes a number of performance improvements, detailed in the following sections.
Dynamic filtering#
Dynamic filtering is enabled by default. It causes the connector to wait for dynamic filtering to complete before starting a query.
You can disable dynamic filtering by setting the property
dynamic-filtering.enabled
in your catalog properties file to false
.
Starburst Cached Views#
The connector supports table scan redirection to improve performance and reduce load on the data source.