Starburst Cosmos DB connector
The Starburst Cosmos DB connector uses the API for NoSQL to read data stored in Azure Cosmos DB for NoSQL.
The Starburst Cosmos DB connector only supports connecting to Azure Cosmos DB for NoSQL. If you are using Azure Cosmos DB for PostgreSQL, MongoDB, or Apache Cassandra, use the native PostgreSQL, MongoDB, or Cassandra connectors instead.
Note
The Starburst Cosmos DB connector is in public preview. Contact Starburst Support with questions or feedback.
Requirements
To connect to Azure Cosmos DB for NoSQL, you need:
Azure access credentials with an attached policy that allows reading from Cosmos DB.
Network access from the coordinator and workers to the Cosmos DB instance. By default this connection uses HTTPS over port 443.
A valid Starburst Enterprise license.
Data in Cosmos DB must be stored in Azure Cosmos DB for NoSQL.
Configuration
Create the example catalog with a catalog properties file named example.properties in etc/catalog (replace example with your database name or another descriptive name for the catalog) with the following contents:
connector.name=cosmosdb
cosmosdb.connection-url=https://ACCOUNT_NAME.documents.azure.com:443/
cosmosdb.connection-key=sample-key
Set the connector.name property to cosmosdb. Configure the catalog with your Azure Cosmos DB connection URL in cosmosdb.connection-url and your access key in cosmosdb.connection-key. Your connection URL may be formatted differently from the example provided here.
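If you manage several catalogs, generating the properties file can help keep the entries consistent. The following Python sketch renders the three properties shown above. The helper render_catalog_properties and the environment variable names COSMOS_ACCOUNT_NAME and COSMOS_KEY are hypothetical, and the URL pattern only copies the example, so adjust it to match your account's actual connection URL.

```python
import os


def render_catalog_properties(account_name: str, connection_key: str) -> str:
    """Return catalog properties file contents for a Cosmos DB account.

    Hypothetical helper. The URL pattern copies the example in this section;
    the connection URL for your account may be formatted differently.
    """
    lines = [
        "connector.name=cosmosdb",
        f"cosmosdb.connection-url=https://{account_name}.documents.azure.com:443/",
        f"cosmosdb.connection-key={connection_key}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical variable names; reading the key from the environment
    # keeps it out of scripts and shell history.
    account = os.environ.get("COSMOS_ACCOUNT_NAME", "ACCOUNT_NAME")
    key = os.environ.get("COSMOS_KEY", "sample-key")
    # Redirect this output to etc/catalog/example.properties.
    print(render_catalog_properties(account, key), end="")
```

Printing rather than writing the file directly lets you review the output before placing it in etc/catalog.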
Case insensitive matching
When case-insensitive-name-matching is set to true, Trino is able to query non-lowercase schemas and tables by maintaining a mapping of the lowercase name to the actual name in the remote system. However, if two schemas or tables have names that differ only in case (such as “customers” and “Customers”), Trino fails to query them due to ambiguity.
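The lowercase lookup and the ambiguity it runs into can be sketched in a few lines of Python. This is a conceptual illustration that assumes a simple lowercase index, not Trino's implementation; build_name_index and resolve_remote_name are hypothetical names.

```python
from collections import defaultdict


def build_name_index(remote_names):
    """Group remote names by their lowercase form."""
    index = defaultdict(list)
    for name in remote_names:
        index[name.lower()].append(name)
    return dict(index)


def resolve_remote_name(index, trino_name):
    """Return the remote name for a lowercase Trino name.

    Raises KeyError if nothing matches and ValueError if two or more remote
    names differ only in case, which is the ambiguous situation.
    """
    candidates = index.get(trino_name.lower())
    if not candidates:
        raise KeyError(trino_name)
    if len(candidates) > 1:
        raise ValueError(f"ambiguous name {trino_name!r}: {sorted(candidates)}")
    return candidates[0]


index = build_name_index(["Orders", "customers", "Customers"])
print(resolve_remote_name(index, "orders"))  # Orders
# resolve_remote_name(index, "customers") raises ValueError.
```

A single lowercase name can point to only one remote name, which is why an explicit mapping is needed when two names collide.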
In these cases, use the case-insensitive-name-matching.config-file catalog configuration property to specify a configuration file that maps these remote schemas and tables to their respective Trino schemas and tables:
{
  "schemas": [
    {
      "remoteSchema": "CaseSensitiveName",
      "mapping": "case_insensitive_1"
    },
    {
      "remoteSchema": "cASEsENSITIVEnAME",
      "mapping": "case_insensitive_2"
    }
  ],
  "tables": [
    {
      "remoteSchema": "CaseSensitiveName",
      "remoteTable": "tablex",
      "mapping": "table_1"
    },
    {
      "remoteSchema": "CaseSensitiveName",
      "remoteTable": "TABLEX",
      "mapping": "table_2"
    }
  ]
}
Queries against one of the tables or schemas defined in the mapping attributes are run against the corresponding remote entity. For example, a query against tables in the case_insensitive_1 schema is forwarded to the CaseSensitiveName schema, and a query against case_insensitive_2 is forwarded to the cASEsENSITIVEnAME schema.
At the table mapping level, a query on case_insensitive_1.table_1 as configured above is forwarded to CaseSensitiveName.tablex, and a query on case_insensitive_1.table_2 is forwarded to CaseSensitiveName.TABLEX.
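The forwarding rules for the example mapping file can be illustrated with a short Python sketch that loads the file contents and translates a Trino schema and table name into the remote names. The helpers load_mapping and to_remote are hypothetical, and this sketch simply returns names that have no mapping entry unchanged.

```python
import json

# The example mapping file from this section, compacted.
MAPPING_JSON = """
{
  "schemas": [
    {"remoteSchema": "CaseSensitiveName", "mapping": "case_insensitive_1"},
    {"remoteSchema": "cASEsENSITIVEnAME", "mapping": "case_insensitive_2"}
  ],
  "tables": [
    {"remoteSchema": "CaseSensitiveName", "remoteTable": "tablex", "mapping": "table_1"},
    {"remoteSchema": "CaseSensitiveName", "remoteTable": "TABLEX", "mapping": "table_2"}
  ]
}
"""


def load_mapping(text):
    """Build lookup dictionaries from the mapping file contents."""
    config = json.loads(text)
    # Trino schema name -> remote schema name
    schemas = {s["mapping"]: s["remoteSchema"] for s in config.get("schemas", [])}
    # (remote schema, Trino table name) -> remote table name
    tables = {
        (t["remoteSchema"], t["mapping"]): t["remoteTable"]
        for t in config.get("tables", [])
    }
    return schemas, tables


def to_remote(schemas, tables, schema, table):
    """Translate a Trino schema.table into the remote schema and table.

    Names without a mapping entry are returned unchanged in this sketch.
    """
    remote_schema = schemas.get(schema, schema)
    remote_table = tables.get((remote_schema, table), table)
    return remote_schema, remote_table


schemas, tables = load_mapping(MAPPING_JSON)
print(to_remote(schemas, tables, "case_insensitive_1", "table_1"))
# ('CaseSensitiveName', 'tablex')
print(to_remote(schemas, tables, "case_insensitive_1", "table_2"))
# ('CaseSensitiveName', 'TABLEX')
```

Table entries are keyed by the remote schema, so the schema is translated first and the result is used to look up the table.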
By default, Trino must be restarted to load changes made to the mapping configuration file. Optionally, you can set the case-insensitive-name-mapping.refresh-period property to have Trino reload the mapping periodically without requiring a restart:
case-insensitive-name-mapping.refresh-period=30s
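A refresh period behaves like a time-based cache: the mapping is reread only after the configured period has elapsed. The Python sketch below illustrates that idea; the class RefreshingMapping is hypothetical and is not how Trino implements the refresh. The clock and loader parameters exist only so the behavior can be tested without waiting or reading a real file.

```python
import json
import time


class RefreshingMapping:
    """Cache a JSON mapping file and reload it once the refresh period passes."""

    def __init__(self, path, refresh_seconds, clock=time.monotonic, loader=None):
        self._path = path
        self._refresh_seconds = refresh_seconds  # 30 for a 30s refresh period
        self._clock = clock
        self._loader = loader or self._read_json
        self._data = None
        self._loaded_at = 0.0

    @staticmethod
    def _read_json(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)

    def get(self):
        """Return the cached mapping, reloading it first if it is stale."""
        now = self._clock()
        if self._data is None or now - self._loaded_at >= self._refresh_seconds:
            self._data = self._loader(self._path)
            self._loaded_at = now
        return self._data
```

For example, RefreshingMapping("mapping.json", refresh_seconds=30).get() returns the parsed file and rereads it at most once every 30 seconds, which corresponds to a refresh period of 30s. The file name is a placeholder.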
SQL support
The connector provides globally available and read operation statements to access data and metadata in Cosmos DB databases.
Type mapping
Because Trino and Cosmos DB each support types that the other does not, this connector modifies some types when reading data. Data types may not map the same way in both directions between SEP and the data source. Refer to the following section for the Cosmos DB to Trino type mapping.
Cosmos DB to Trino type mapping
The connector maps Cosmos DB types to the corresponding Trino types according to the following table:
| Cosmos DB type | Trino type | Notes |
|---|---|---|
| | | |
| | | Cosmos DB uses IEEE 754 double precision for its number type. All numeric types in Cosmos DB are mapped to |
| | | |
| | | |
| | | Mapped instead to |
No other types are supported.
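Because the number type uses IEEE 754 double precision, integers whose magnitude exceeds 2^53 cannot all be represented exactly. The following Python sketch demonstrates the effect with Python's float, which is an IEEE 754 double on common platforms; survives_double_round_trip is a hypothetical helper used only for illustration.

```python
def survives_double_round_trip(value: int) -> bool:
    """Return True if an integer is unchanged after conversion to a double.

    Python floats are IEEE 754 double precision on common platforms, the
    same precision as the Cosmos DB number type.
    """
    return int(float(value)) == value


print(survives_double_round_trip(2**53))      # True
print(survives_double_round_trip(2**53 + 1))  # False: rounds to 2**53
print(survives_double_round_trip(2**53 + 2))  # True: spacing is 2 above 2**53
```

If your documents store large integer identifiers as numbers, check them against this limit before relying on exact values.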
Performance
The connector includes a number of performance improvements, detailed in the following sections.
Pushdown
The connector supports limit pushdown and pushdown of some predicates. Predicate pushdown is supported only for equality (=) and range (<, >) expressions on columns of type VARCHAR and BOOLEAN.
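The following Python sketch encodes the predicate pushdown rule as stated in this section, which can help when checking which filters in a WHERE clause qualify. Predicate and is_pushdown_candidate are hypothetical names; the connector makes the actual pushdown decision, and filters that are not pushed down are still evaluated by Trino after the data is read.

```python
from dataclasses import dataclass

# Operators and column types listed in this section for predicate pushdown.
PUSHDOWN_OPERATORS = {"=", "<", ">"}
PUSHDOWN_TYPES = {"VARCHAR", "BOOLEAN"}


@dataclass(frozen=True)
class Predicate:
    column: str
    column_type: str  # Trino type name, for example "VARCHAR(100)"
    operator: str


def is_pushdown_candidate(predicate: Predicate) -> bool:
    """Apply the stated pushdown rule to a single predicate."""
    # Strip any length parameter, so VARCHAR(100) is treated as VARCHAR.
    base_type = predicate.column_type.split("(", 1)[0].strip().upper()
    return predicate.operator in PUSHDOWN_OPERATORS and base_type in PUSHDOWN_TYPES


examples = [
    Predicate("name", "VARCHAR", "="),
    Predicate("name", "VARCHAR(100)", "<"),
    Predicate("active", "BOOLEAN", "="),
    Predicate("price", "DOUBLE", ">"),
]
for p in examples:
    print(p.column, p.column_type, p.operator, is_pushdown_candidate(p))
```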