MongoDB connector#
The MongoDB connector allows the use of MongoDB collections as tables in Starburst Enterprise.
SEP includes additional enterprise features that are built on top of the existing Trino connector functionality. For more information on connector key feature differences between Trino and SEP, see the connectors feature matrix.
Requirements#
To connect to MongoDB, you need:
MongoDB 4.2 or higher.
Network access from the Trino coordinator and workers to MongoDB. Port 27017 is the default port.
Write access to the schema information collection in MongoDB.
A valid Starburst Enterprise license.
Configuration#
To configure the MongoDB connector, create a catalog properties file that specifies the MongoDB connector by setting the connector.name property to mongodb.
For example, to access a database as the example catalog, create the file etc/catalog/example.properties. Replace the connection properties as appropriate for your setup:
connector.name=mongodb
mongodb.connection-url=mongodb://user:pass@sample.host:27017/
The mongodb.connection-url property defines the connection information and parameters to pass to the MongoDB driver. The supported parameters for the URL are available in the MongoDB documentation.
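Driver parameters can be appended to the URL query string. For example (the host, credentials, and parameter values here are illustrative; tls, replicaSet, and maxPoolSize are standard MongoDB connection string options):

```text
mongodb.connection-url=mongodb://user:pass@sample.host:27017/?tls=true&replicaSet=rs0&maxPoolSize=50
```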
Multiple MongoDB servers#
You can have as many catalogs as you need. If you have additional MongoDB servers, configure another catalog.
To add another catalog, add a new properties file to etc/catalog. For example, if you name the property file sales.properties, SEP creates a catalog named sales.
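For example, a minimal sales.properties might look like this (the host and credentials are placeholders):

```text
connector.name=mongodb
mongodb.connection-url=mongodb://user:pass@sales.host:27017/
```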
Configuration properties#
The following table describes catalog configuration properties for the connector:
| Property name | Description | Default |
| --- | --- | --- |
| mongodb.allow-local-scheduling | Assign MongoDB splits to a specific worker. Set the value of this property to true to enable local scheduling. | false |
| mongodb.case-insensitive-name-matching | Match database and collection names case insensitively. | false |
| mongodb.connection-timeout | The socket connection timeout in milliseconds. A value of 0 means no timeout. | 10000 |
| mongodb.connection-url | The connection URL string containing the protocol, credentials, and host information for use in connection to your MongoDB deployment. For example, the connection string may use the format mongodb://user:pass@sample.host:27017/. | |
| mongodb.connections-per-host | The maximum number of connections allowed per host for this MongoClient instance. Those connections are kept in a pool when idle. Once the pool is exhausted, any operation requiring a connection blocks waiting for an available connection. | 100 |
| mongodb.cursor-batch-size | Limits the number of elements returned in one batch. A cursor typically fetches a batch of result objects and stores them locally. A value of 0 uses the driver default batch size. | 0 |
| mongodb.dynamic-filtering.wait-timeout | Duration to wait for completion of dynamic filters during split generation. | 5s |
| mongodb.max-connection-idle-time | The maximum idle time of a pooled connection in milliseconds. A value of 0 indicates no limit. | 0 |
| mongodb.max-wait-time | The maximum wait time in milliseconds that a thread may wait for a connection to become available. | 120000 |
| mongodb.min-connections-per-host | The minimum number of connections per host for this MongoClient instance. Those connections are kept in a pool when idle, and the pool ensures over time that it contains at least this minimum number. | 0 |
| mongodb.read-preference | The read preference to use for queries, map-reduce, aggregation, and count. The available values are PRIMARY, PRIMARY_PREFERRED, SECONDARY, SECONDARY_PREFERRED, and NEAREST. | PRIMARY |
| mongodb.required-replica-set | The required replica set name. With this option set, the MongoClient instance connects in replica set mode and discovers all members of the set based on the given servers. | |
| mongodb.schema-collection | A collection that contains schema information in each MongoDB database and defines the schema of all tables. At startup, the connector tries to guess the data type of fields based on the type mapping. The initial guess can be incorrect for your specific collection; in that case, you need to modify it manually. Refer to the Table definition section for details. | _schema |
| mongodb.socket-timeout | The socket timeout in milliseconds. It is used for I/O socket read and write operations. | 0 |
| mongodb.tls.enabled | Enables TLS/SSL connections to MongoDB servers. | false |
| mongodb.tls.keystore-password | The key password for the key store specified by mongodb.tls.keystore-path. | |
| mongodb.tls.keystore-path | The path to the key store. | |
| mongodb.tls.truststore-password | The key password for the trust store specified by mongodb.tls.truststore-path. | |
| mongodb.tls.truststore-path | The path to the trust store. | |
| mongodb.write-concern | The write concern to use. The available values are ACKNOWLEDGED, JOURNALED, MAJORITY, and UNACKNOWLEDGED. | ACKNOWLEDGED |
Table definition#
The connector maintains table definitions in the special collection specified by the mongodb.schema-collection configuration property.
Note
The plugin cannot detect that a collection has been deleted. You must delete the
entry by executing db.getCollection("_schema").remove( { table: deleted_table_name })
in the MongoDB Shell. You can also drop a collection in
SEP by running DROP TABLE table_name
.
A schema collection consists of a MongoDB document for a table.
{
    "table": ...,
    "fields": [
        {
            "name": ...,
            "type": "varchar|bigint|boolean|double|date|array(bigint)|...",
            "hidden": false
        },
        ...
    ]
}
The connector quotes the fields for a row type when auto-generating the schema; however, the auto-generated schema must be corrected manually in the collection to match the information in the tables. Manually altered fields must be explicitly quoted, for example, row("UpperCase" varchar).
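As a sketch, a schema entry for the orders table shown later on this page could be added from the MongoDB shell as follows. The collection name _schema assumes the default mongodb.schema-collection value:

```javascript
db.getCollection("_schema").insertOne({
  table: "orders",
  fields: [
    { name: "orderkey",    type: "bigint",  hidden: false },
    { name: "orderstatus", type: "varchar", hidden: false },
    { name: "totalprice",  type: "double",  hidden: false },
    { name: "orderdate",   type: "date",    hidden: false }
  ]
})
```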
| Field | Required | Type | Description |
| --- | --- | --- | --- |
| table | required | string | Trino table name. |
| fields | required | array | A list of field definitions. Each field definition creates a new column in the Trino table. |
Each field definition:
{
"name": ...,
"type": ...,
"hidden": ...
}
| Field | Required | Type | Description |
| --- | --- | --- | --- |
| name | required | string | Name of the column in the Trino table. |
| type | required | string | Trino type of the column. |
| hidden | required | boolean | Hides the column from DESCRIBE table_name and SELECT * statements. |
There is no limit on field descriptions for either key or message.
ObjectId#
Every MongoDB collection has the special field _id. The connector follows the same rules for this special field, exposing it as a hidden _id column.
CREATE TABLE IF NOT EXISTS orders (
orderkey BIGINT,
orderstatus VARCHAR,
totalprice DOUBLE,
orderdate DATE
);
INSERT INTO orders VALUES(1, 'bad', 50.0, current_date);
INSERT INTO orders VALUES(2, 'good', 100.0, current_date);
SELECT _id, * FROM orders;
_id | orderkey | orderstatus | totalprice | orderdate
-------------------------------------+----------+-------------+------------+------------
55 b1 51 63 38 64 d6 43 8c 61 a9 ce | 1 | bad | 50.0 | 2015-07-23
55 b1 51 67 38 64 d6 43 8c 61 a9 cf | 2 | good | 100.0 | 2015-07-23
(2 rows)
SELECT _id, * FROM orders WHERE _id = ObjectId('55b151633864d6438c61a9ce');
_id | orderkey | orderstatus | totalprice | orderdate
-------------------------------------+----------+-------------+------------+------------
55 b1 51 63 38 64 d6 43 8c 61 a9 ce | 1 | bad | 50.0 | 2015-07-23
(1 row)
You can render the _id field to readable values with a cast to VARCHAR:
SELECT CAST(_id AS VARCHAR), * FROM orders WHERE _id = ObjectId('55b151633864d6438c61a9ce');
_id | orderkey | orderstatus | totalprice | orderdate
---------------------------+----------+-------------+------------+------------
55b151633864d6438c61a9ce | 1 | bad | 50.0 | 2015-07-23
(1 row)
ObjectId timestamp functions#
The first four bytes of each ObjectId represent an embedded timestamp of its creation time. Trino provides a couple of functions to take advantage of this MongoDB feature.
- objectid_timestamp(ObjectId) timestamp #
Extracts the TIMESTAMP WITH TIME ZONE from a given ObjectId:
SELECT objectid_timestamp(ObjectId('507f191e810c19729de860ea')); -- 2012-10-17 20:46:22.000 UTC
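The embedded timestamp can also be decoded outside of SEP. The following is a minimal Python sketch (not part of the connector) that mirrors what objectid_timestamp computes:

```python
from datetime import datetime, timezone

def objectid_timestamp(oid_hex: str) -> datetime:
    # The first 4 bytes (8 hex characters) of an ObjectId are a
    # big-endian Unix timestamp in seconds.
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, timezone.utc)

print(objectid_timestamp("507f191e810c19729de860ea"))  # 2012-10-17 20:46:22+00:00
```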
- timestamp_objectid(timestamp) ObjectId #
Creates an ObjectId from a TIMESTAMP WITH TIME ZONE:
SELECT timestamp_objectid(TIMESTAMP '2021-08-07 17:51:36 +00:00'); -- 61 0e c8 28 00 00 00 00 00 00 00 00
In MongoDB, you can filter all the documents created after 2021-08-07 17:51:36 with a query like this:
db.collection.find({"_id": {"$gt": ObjectId("610ec8280000000000000000")}})
In SEP, the same can be achieved with this query:
SELECT *
FROM collection
WHERE _id > timestamp_objectid(TIMESTAMP '2021-08-07 17:51:36 +00:00');
Fault-tolerant execution support#
The connector supports Fault-tolerant execution of query processing. Read and write operations are both supported with any retry policy.
Type mapping#
Because Trino and MongoDB each support types that the other does not, this connector modifies some types when reading or writing data. Data types may not map the same way in both directions between Trino and the data source. Refer to the following sections for type mapping in each direction.
MongoDB to Trino type mapping#
The connector maps MongoDB types to the corresponding Trino types following this table:
| MongoDB type | Trino type | Notes |
| --- | --- | --- |
| Boolean | BOOLEAN | |
| Int32 | BIGINT | |
| Int64 | BIGINT | |
| Double | DOUBLE | |
| Decimal128 | DECIMAL(p, s) | |
| Date | TIMESTAMP(3) | |
| String | VARCHAR | |
| Binary | VARBINARY | |
| ObjectId | ObjectId | |
| Object | ROW | |
| Array | ARRAY | Map to ROW if the element type is not unique. |
| DBRef | ROW | |
No other types are supported.
Trino to MongoDB type mapping#
The connector maps Trino types to the corresponding MongoDB types following this table:
| Trino type | MongoDB type |
| --- | --- |
| BOOLEAN | Boolean |
| BIGINT | Int64 |
| DOUBLE | Double |
| DECIMAL(p, s) | Decimal128 |
| VARCHAR | String |
| VARBINARY | Binary |
| DATE | Date |
| TIME(3) | Date |
| TIMESTAMP(3) | Date |
| TIMESTAMP(3) WITH TIME ZONE | Date |
No other types are supported.
SQL support#
The connector provides read and write access to data and metadata in MongoDB. In addition to the globally available and read operation statements, the connector supports the following features:
ALTER TABLE#
The connector supports the following ALTER TABLE operations:
ALTER TABLE RENAME TO
ALTER TABLE ADD COLUMN
ALTER TABLE DROP COLUMN
Other uses of ALTER TABLE
are not supported.
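For example (the catalog, schema, and column names below are illustrative):

```sql
ALTER TABLE example.tpch.orders ADD COLUMN comment VARCHAR;
ALTER TABLE example.tpch.orders DROP COLUMN comment;
ALTER TABLE example.tpch.orders RENAME TO example.tpch.orders_archive;
```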
Table functions#
The connector provides specific table functions to access MongoDB.
query(database, collection, filter) -> table#
The query function lets you query the underlying MongoDB database directly. It requires syntax native to MongoDB, because the full query is pushed down and processed by MongoDB. This can be useful for accessing native features that are not available in SEP, or for improving query performance in situations where running a query natively may be faster.
For example, get all rows where the regionkey field is 0:
SELECT
*
FROM
TABLE(
example.system.query(
database => 'tpch',
collection => 'region',
filter => '{ regionkey: 0 }'
)
);
Performance#
The connector includes a number of performance features, detailed in the following sections.
Predicate pushdown support#
The connector supports predicate pushdown, including complex expression predicate pushdown using MongoDB indexing. It also supports predicate pushdown for the following functions:
- json_extract_scalar() with the = and != operators, on columns of type STRING, INTEGER, LONG, and BOOLEAN.
- from_iso8601_timestamp() on columns of type DATE and when used in conjunction with json_extract_scalar().
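As an illustrative sketch (the catalog, table, and JSON-bearing metadata column below are hypothetical), a predicate of this shape is eligible for pushdown:

```sql
SELECT *
FROM example.tpch.orders_metadata
WHERE json_extract_scalar(metadata, '$.status') = 'shipped';
```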
Complex expression predicate pushdown is supported on columns of following types:
SMALLINT
TINYINT
INTEGER
BIGINT
VARCHAR
TIMESTAMP(3)
TIMESTAMP(3) WITH TIME ZONE
You can configure complex expression pushdown by setting the mongodb.complex-expression-pushdown.enabled property in your catalog configuration file. The equivalent session property is complex_expression_pushdown_enabled. By default, this feature is set to true.
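For example, to disable the feature for the current session in a catalog named example:

```sql
SET SESSION example.complex_expression_pushdown_enabled = false;
```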
Security#
The connector includes security-related features detailed in the following section.
Kerberos authentication#
The connector supports Kerberos authentication using a keytab.
To configure Kerberos authentication with a keytab, add the following catalog configuration properties to the catalog properties file:
mongo.authentication.type=KERBEROS
kerberos.client.principal=example@example.com
kerberos.client.keytab=etc/kerberos/example.keytab
kerberos.config=etc/kerberos/krb5.conf
With this configuration, the user example@example.com, defined in the principal property, is used to connect to the database, and the related Kerberos service ticket is located in the example.keytab file.
Kerberos credential pass-through#
The connector can be configured to pass through Kerberos credentials received by SEP to the MongoDB database. Configure Kerberos and SEP, following the instructions in Kerberos credential pass-through.
Configure the connector to pass through the credentials from the server to the database in your catalog properties file, and ensure the Kerberos client configuration properties are in place on all nodes.
mongo.authentication.type=KERBEROS_PASS_THROUGH
http.authentication.krb5.config=/etc/krb5.conf
http-server.authentication.krb5.service-name=exampleServiceName
http-server.authentication.krb5.keytab=/path/to/Keytab/File
Note
When delegated Kerberos authentication is configured
for the Starburst Enterprise web UI, ensure the http-server.authentication.krb5.service-name
property value is set to HTTP
to match the configured Kerberos service name.
Now, any MongoDB database accessed using SEP is subject to the Kerberos-defined data access restrictions and permissions.
Password credential pass-through#
The connector supports password credential pass-through. To enable it, edit the catalog properties file to include the authentication type:
mongo.authentication.type=PASSWORD_PASS_THROUGH
mongo.authentication.source=
mongo.authentication.mechanism=
The mongo.authentication.source property is the name of the database where the user’s data is stored. If the user’s data is stored outside of MongoDB, set the property value to $external.
The mongo.authentication.mechanism property is modeled after MongoDB’s authentication mechanisms and accepts a single value from the following set defined by MongoDB:
SCRAM-SHA-1
SCRAM-SHA-256
PLAIN
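A filled-in sketch, assuming users stored in the admin database and SCRAM-SHA-256 authentication (adjust both values for your deployment):

```text
mongo.authentication.type=PASSWORD_PASS_THROUGH
mongo.authentication.source=admin
mongo.authentication.mechanism=SCRAM-SHA-256
```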
For more information about configurations and limitations, see Password credential pass-through.