OpenLineage event listener
The OpenLineage event listener plugin lets you stream lineage information to an
external API. The plugin encodes data in JSON format according to the
OpenLineage specification and sends it via POST requests to your specified
URI.
Overview
The OpenLineage event listener captures queries that create or modify Trino tables or columns, then transforms the queries into lineage information.
Lineage shows how data flows between tables and columns. OpenLineage provides an open-source standard for capturing this information across systems including Spark, Airflow, and Flink.
| Trino | OpenLineage |
|---|---|
| | Run ID |
| | Run Event Time |
| Query Id | Job Facet Name |
| | Job Facet Namespace (default, can be overridden) |
| | Dataset Name |
| | Dataset Namespace |
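
To make the mapping more concrete, the following Python sketch assembles a minimal OpenLineage run event and marks where each mapped value lands. The top-level field names (eventType, eventTime, run.runId, job.namespace, job.name, inputs, outputs, producer, schemaURL) follow the OpenLineage specification. Every value, including the query ID, the namespace shape, the table names, and the producer and schema URLs, is a hypothetical placeholder and not output captured from Trino.

```python
import json
import uuid

# Hypothetical values; a real event carries whatever Trino reports.
query_id = "20240101_000000_00001_abcde"
namespace = "trino://trino.example.com:8080"  # assumed namespace shape

event = {
    "eventType": "COMPLETE",
    "eventTime": "2024-01-01T00:00:05Z",          # Run Event Time
    "run": {"runId": str(uuid.uuid4())},          # Run ID (placeholder UUID)
    "job": {"namespace": namespace, "name": query_id},  # Job Facet Namespace and Name
    "inputs": [{"namespace": namespace, "name": "sales.orders"}],
    "outputs": [{"namespace": namespace, "name": "sales.orders_summary"}],
    "producer": "https://example.com/hypothetical-producer",   # placeholder
    "schemaURL": "https://example.com/hypothetical-schema",    # placeholder
}

# The listener sends events like this as the JSON body of a POST request.
payload = json.dumps(event)
```

In this sketch the job name is the query ID, and the datasets share the job namespace. That is only an illustration of the mapping, so check the events your deployment emits for the exact values.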
Available Trino facets
Use the following facets to capture information about your Trino queries.
Trino metadata
The Trino metadata facet contains the following properties:
- query_plan: The execution plan for the query
- transaction_id: The transaction ID used for query processing
- query_id: The unique identifier assigned to each query
These properties describe the query that triggered the OpenLineage run event.
This facet appears in both Start and Complete/Fail OpenLineage events.
To disable this facet, add trino_metadata to
openlineage-event-listener.disabled-facets.
Trino query context
The Trino query context facet contains the following properties:
- server_version: The Trino server version that processed the query
- environment: Inherited from node.environment in Node properties
- query_type: A query type you configure with openlineage-event-listener.trino.include-query-types
These properties describe the query that triggered the OpenLineage run event.
This facet appears in both Start and Complete/Fail OpenLineage events.
Use the following properties to organize your lineage events:
| Property | Description |
|---|---|
| | The user that runs the query |
| | The original user who runs the query. Differs from |
| | The authenticated entity from your external security system |
| | The name of the client that submits the query |
| | Additional client information |
| | The IP address of the remote client |
| | The User-Agent header value |
| | The token for query tracing |
To disable this facet, add trino_query_context to
openlineage-event-listener.disabled-facets.
Trino query statistics
The Trino query statistics facet contains the query statistics of finished
queries. The facet is available only in Complete/Fail events.
To disable this facet, add trino_query_statistics to
openlineage-event-listener.disabled-facets.
Requirements
Before you configure the OpenLineage event listener, complete the following steps:
- Provide an HTTP/S service that accepts POST events with a JSON body and is compatible with the OpenLineage API format.
- Configure openlineage-event-listener.transport.url in the event listener properties file with the URI of the service.
- Configure openlineage-event-listener.trino.uri so Trino renders the proper OpenLineage job namespace within events. You must provide a proper URI with scheme, host, and port or the plugin fails to start.
- Configure what events to send. For more information, see Configuration.
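
For local testing, the first requirement can be met with a small stand-in service. The following is a minimal sketch using only the Python standard library: it accepts POST requests with a JSON body and keeps the decoded events in memory. It does not validate events against the OpenLineage schema, and the names LineageHandler, start_server, and post_event are hypothetical helpers, not part of the plugin.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # events accepted so far, kept in memory for the demo


class LineageHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            event = json.loads(self.rfile.read(length))
        except json.JSONDecodeError:
            self.send_response(400)  # body was not valid JSON
            self.end_headers()
            return
        received.append(event)
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def start_server():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), LineageHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_event(url, event):
    # Mimics a client sending one JSON event via POST.
    request = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

After calling start_server(), the port is available as server.server_address[1]. Setting openlineage-event-listener.transport.url to that address lets you inspect the events the listener sends before you point it at a real lineage backend.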
Configuration
To configure the OpenLineage event listener, create the file
etc/starburst-open-lineage-event-listener.properties with the following
properties:
event-listener.name=starburst-open-lineage
openlineage-event-listener.trino.uri=<Address of your Trino coordinator>
Add etc/starburst-open-lineage-event-listener.properties to
event-listener.config-files in Config properties:
event-listener.config-files=etc/starburst-open-lineage-event-listener.properties,...
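
Because the plugin fails to start when openlineage-event-listener.trino.uri lacks a scheme, host, or port, it can help to check the properties file before restarting Trino. The following Python sketch is a hypothetical pre-flight check that mirrors the stated requirement. It is not the plugin's own validation logic, and read_properties handles only simple key=value lines.

```python
from urllib.parse import urlsplit


def read_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_trino_uri(props):
    """Return a list of problems with openlineage-event-listener.trino.uri."""
    uri = props.get("openlineage-event-listener.trino.uri")
    if not uri:
        return ["openlineage-event-listener.trino.uri is missing"]
    parts = urlsplit(uri)
    problems = []
    if not parts.scheme:
        problems.append("missing scheme")
    if not parts.hostname:
        problems.append("missing host")
    try:
        port = parts.port  # raises ValueError for a non-numeric port
    except ValueError:
        port = None
    if port is None:
        problems.append("missing or invalid port")
    return problems
```

An empty list means the URI has all three parts. For example, a value such as http://trino.example.com:8080 passes, while http://trino.example.com is reported as missing a port.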
| Property name | Description | Default |
|---|---|---|
| | The type of transport to use when emitting lineage information. For a list of options, see Supported Transport Types. | NOOP |
| openlineage-event-listener.trino.uri | The URI of the Trino coordinator, including scheme, host, and port. Trino uses this to render the job namespace in OpenLineage. This is a required property. | None |
| openlineage-event-listener.trino.include-query-types | The comma-separated list of query types to include when emitting lineage information. Each value must match | |
| openlineage-event-listener.disabled-facets | The Available Trino facets you want to exclude from the final OpenLineage event. The allowed values are | None |
| | The custom namespace you use for the job | None |
| openlineage-event-listener.job.name-format | The custom format to use for the job name | |
Supported Transport Types
- NOOP: The default transport type. It does not perform work or transfer data.
- CONSOLE: Sends OpenLineage JSON events to the Trino coordinator's standard output.
- HTTP: Sends OpenLineage JSON events to an OpenLineage-compatible HTTP endpoint.
| Property name | Description | Default |
|---|---|---|
| openlineage-event-listener.transport.url | The OpenLineage URL. Required if you use | None |
| | The custom path for an OpenLineage-compatible endpoint. If you configure this property, you cannot include any custom path in | |
| | The API key (string value) for authenticating with the OpenLineage endpoint at | None |
| | The timeout for HTTP requests. | |
| openlineage-event-listener.transport.headers | A list of custom HTTP headers to send along with the events. For more information, see Custom HTTP headers. | Empty |
| openlineage-event-listener.transport.url-params | A list of custom URL parameters to append to HTTP requests. For more information, see Custom URL parameters. | Empty |
| | The compression codec used to reduce the size of the HTTP body. Allowed values: | |
Custom HTTP headers
Use custom HTTP headers to send metadata with your event.
Format your headers as comma-separated key:value pairs:
openlineage-event-listener.transport.headers="Header-Name-1:header value 1,Header-Name-2:header value 2,..."
To use a comma (,) or colon (:) in a header name or value, escape it with a
backslash (\).
These headers cannot extract dynamic information from the event itself.
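
The escaping rule is easier to see with a worked example. The Python sketch below splits a header string into name and value pairs while honoring backslash escapes. It illustrates the format as described here and is not the plugin's actual parser. The function name parse_pairs is hypothetical, the input is assumed to have its surrounding quotes removed, and keeping a second unescaped colon as part of the value is an assumption of this sketch.

```python
def parse_pairs(raw):
    """Split 'k1:v1,k2:v2' into a dict, honoring backslash escapes."""
    pairs = {}
    fields = [[]]          # characters of the key, then of the value
    escaped = False

    def flush():
        key = "".join(fields[0])
        value = "".join(fields[1]) if len(fields) > 1 else ""
        if key:
            pairs[key] = value

    for ch in raw:
        if escaped:
            fields[-1].append(ch)      # escaped character is kept literally
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == ":" and len(fields) == 1:
            fields.append([])          # first unescaped colon ends the key
        elif ch == ",":
            flush()                    # unescaped comma ends the pair
            fields = [[]]
        else:
            fields[-1].append(ch)
    flush()
    return pairs
```

With this sketch, the input X-Note:a\,b\:c yields a single header named X-Note whose value is a,b:c, because both the comma and the colon are escaped.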
Custom URL parameters
Include custom URL parameters in your HTTP requests.
Format your parameters as comma-separated key:value pairs:
openlineage-event-listener.transport.url-params="Param-Name-1:param value 1,Param-Name-2:param value 2,..."
These parameters cannot extract dynamic information from the event itself.
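
As a rough illustration of the effect, the following Python sketch appends key:value parameters to a URL's query string. How the plugin encodes parameter values is not stated here, so the standard form encoding used below is an assumption, and append_params and the example URL are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def append_params(url, raw_params):
    """Append 'k1:v1,k2:v2' style parameters to a URL's query string."""
    pairs = []
    for item in raw_params.split(","):
        if not item:
            continue
        key, _, value = item.partition(":")
        pairs.append((key, value))
    parts = urlsplit(url)
    # Keep any query string already present on the URL.
    query = "&".join(q for q in (parts.query, urlencode(pairs)) if q)
    return urlunsplit(parts._replace(query=query))
```

Because the parameters are fixed strings, every request carries the same query string regardless of which query produced the event.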