OpenLineage event listener#

The OpenLineage event listener plugin lets you stream lineage information to an external API. The plugin encodes data in JSON format according to the OpenLineage specification and sends it via POST requests to your specified URI.

Overview#

The OpenLineage event listener captures queries that create or modify Trino tables or columns, then transforms the queries into lineage information.

Lineage shows how data flows between tables and columns. OpenLineage provides an open-source standard for capturing this information across systems including Spark, Airflow, and Flink.

Trino Query attributes mapping to OpenLineage attributes#

| Trino | OpenLineage |
| --- | --- |
| {UUID(Query Id)} | Run ID |
| {queryCreatedEvent.getCreateTime()} or {queryCompletedEvent.getEndTime()} | Run Event Time |
| Query Id | Job Facet Name |
| trino:// + {openlineage-event-listener.trino.uri.getHost()} + ":" + {openlineage-event-listener.trino.uri.getPort()} | Job Facet Namespace (default, can be overridden) |
| {schema}.{table} | Dataset Name |
| trino:// + {openlineage-event-listener.trino.uri.getHost()} + ":" + {openlineage-event-listener.trino.uri.getPort()} | Dataset Namespace |
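
As a worked illustration of the mapping above, assume a hypothetical coordinator at trino.example.com on port 8080 and a query that writes to a table orders in a schema sales. These names are invented for the example; the comments show the attribute values that the mapping would produce.

```properties
# Hypothetical coordinator URI (assumption for illustration only)
openlineage-event-listener.trino.uri=http://trino.example.com:8080

# Attribute values derived from the mapping:
#   Job namespace (default):  trino://trino.example.com:8080
#   Dataset namespace:        trino://trino.example.com:8080
#   Dataset name:             sales.orders
```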

Available Trino facets#

Use the following facets to capture information about your Trino queries.

Trino metadata#

The Trino metadata facet contains the following properties:

  • query_plan: The execution plan for the query

  • transaction_id: The transaction id used for query processing

  • query_id: The unique identifier assigned to each query

These properties describe the query that triggered the OpenLineage run event.

This facet appears in both Start and Complete/Fail OpenLineage events.

To disable this facet, add trino_metadata to openlineage-event-listener.disabled-facets.

Trino query context#

The Trino query context facet contains the following properties:

  • server_version: The Trino server version that processed the query

  • environment: Inherited from node.environment in Node properties

  • query_type: The type of the query, which is one of the types you include with openlineage-event-listener.trino.include-query-types

These properties describe the query that triggered the OpenLineage run event.

This facet appears in both Start and Complete/Fail OpenLineage events.

Use the following properties to organize your lineage events:

Trino query context properties#

| Property | Description |
| --- | --- |
| user | The user that runs the query |
| original_user | The original user who runs the query. Differs from user when you use impersonation |
| principal | The authenticated entity from your external security system |
| source | The name of the client that submits the query |
| client_info | Additional client information |
| remote_client_address | The IP address of the remote client |
| user_agent | The User-Agent header value |
| trace_token | The token for query tracing |

To disable this facet, add trino_query_context to openlineage-event-listener.disabled-facets.

Trino query statistics#

The Trino query statistics facet contains the query statistics of finished queries. The facet is available only in Complete/Fail events.

To disable this facet, add trino_query_statistics to openlineage-event-listener.disabled-facets.
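
For example, a configuration that keeps only the query statistics facet might disable the other two. This sketch assumes that the property accepts a comma-separated list, in the same way as openlineage-event-listener.trino.include-query-types.

```properties
# Exclude the metadata and query context facets from emitted events.
# Assumes a comma-separated list of facet names.
openlineage-event-listener.disabled-facets=trino_metadata,trino_query_context
```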

Requirements#

Before you configure the OpenLineage event listener, complete the following steps:

  • Provide an HTTP/S service that accepts POST events with a JSON body and is compatible with the OpenLineage API format.

  • Configure openlineage-event-listener.transport.url in the event listener properties file with the URI of the service.

  • Configure openlineage-event-listener.trino.uri so Trino renders the proper OpenLineage job namespace within events. The value must be a complete URI with scheme, host, and port; otherwise, the plugin fails to start.

  • Configure what events to send. For more information, see Configuration.

Configuration#

To configure the OpenLineage event listener, create the file etc/starburst-open-lineage-event-listener.properties with the following properties:

event-listener.name=starburst-open-lineage
openlineage-event-listener.trino.uri=<URI of your Trino coordinator>

Add etc/starburst-open-lineage-event-listener.properties to event-listener.config-files in Config properties:

event-listener.config-files=etc/starburst-open-lineage-event-listener.properties,...

OpenLineage event listener configuration properties#

| Property name | Description | Default |
| --- | --- | --- |
| openlineage-event-listener.transport.type | The type of transport to use when emitting lineage information. For a list of options, see Supported Transport Types. | NOOP |
| openlineage-event-listener.trino.uri | The URI of the Trino coordinator, including scheme, host, and port. Trino uses this to render the job namespace in OpenLineage. This is a required property. | None |
| openlineage-event-listener.trino.include-query-types | The comma-separated list of query types to include when emitting lineage information. Each value must match a value of the io.trino.spi.resourcegroups.QueryType enum. Trino filters out query types you don’t include. | DELETE,INSERT,MERGE,UPDATE,ALTER_TABLE_EXECUTE |
| openlineage-event-listener.disabled-facets | The facets to exclude from the final OpenLineage event. For details, see Available Trino facets. The allowed values are trino_metadata, trino_query_context, and trino_query_statistics. | None |
| openlineage-event-listener.namespace | The custom namespace to use for the job namespace attribute. If not set, defaults to the dataset namespace. | None |
| openlineage-event-listener.job.name-format | The custom format to use for the job name attribute. Use any string, with optional substitution variables: $QUERY_ID, $USER, $SOURCE, $CLIENT_IP. For example: As $USER from $CLIENT_IP via $SOURCE. | $QUERY_ID |
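
The following fragment sketches how some of these properties might be combined. The namespace value analytics-cluster is a hypothetical name chosen for illustration; the job name format is the example from the property description above.

```properties
# Emit lineage only for INSERT and MERGE queries
openlineage-event-listener.trino.include-query-types=INSERT,MERGE

# Hypothetical custom job namespace, used instead of the default
openlineage-event-listener.namespace=analytics-cluster

# Job name built from substitution variables
openlineage-event-listener.job.name-format=As $USER from $CLIENT_IP via $SOURCE
```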

Supported Transport Types#

  • NOOP: The default transport type. It does not perform work or transfer data.

  • CONSOLE: Sends each OpenLineage JSON event to the standard output of the Trino coordinator.

  • HTTP: Sends each OpenLineage JSON event to an OpenLineage-compatible HTTP endpoint.
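
When you first set up the listener, the CONSOLE transport can be a convenient way to inspect events before you point Trino at a real endpoint. A minimal sketch, assuming a hypothetical coordinator URI:

```properties
event-listener.name=starburst-open-lineage
# Hypothetical coordinator URI; replace with your own
openlineage-event-listener.trino.uri=http://trino.example.com:8080
# Print OpenLineage JSON events to the coordinator standard output
openlineage-event-listener.transport.type=CONSOLE
```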

OpenLineage HTTP Transport Configuration properties#

| Property name | Description | Default |
| --- | --- | --- |
| openlineage-event-listener.transport.url | The OpenLineage URL. Required if you use HTTP transport. | None |
| openlineage-event-listener.transport.endpoint | The custom path for an OpenLineage compatible endpoint. If you configure this property, you cannot include any custom path in openlineage-event-listener.transport.url. | /api/v1 |
| openlineage-event-listener.transport.api-key | The API key (string value) for authenticating with the OpenLineage endpoint at openlineage-event-listener.transport.url. | None |
| openlineage-event-listener.transport.timeout | The timeout for HTTP requests. | 5000ms |
| openlineage-event-listener.transport.headers | A list of custom HTTP headers to send along with the events. For more information, see Custom HTTP headers. | Empty |
| openlineage-event-listener.transport.url-params | A list of custom URL parameters to append to HTTP requests. For more information, see Custom URL parameters. | Empty |
| openlineage-event-listener.transport.compression | The compression codec used to reduce the size of the HTTP body. Allowed values: none, gzip. | none |
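
Putting these properties together, an HTTP transport configuration might look like the following sketch. The host names, port numbers, and timeout value are placeholders, not recommendations; properties that are left out keep their defaults.

```properties
event-listener.name=starburst-open-lineage
# Hypothetical coordinator URI
openlineage-event-listener.trino.uri=http://trino.example.com:8080

openlineage-event-listener.transport.type=HTTP
# Hypothetical OpenLineage-compatible service
openlineage-event-listener.transport.url=http://lineage.example.com:5000
openlineage-event-listener.transport.api-key=<your API key>
# Allow slower responses than the 5000ms default
openlineage-event-listener.transport.timeout=10000ms
# Compress the request body
openlineage-event-listener.transport.compression=gzip
```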

Custom HTTP headers#

Use custom HTTP headers to send metadata with your event.

Format your headers as comma-separated key:value pairs:

openlineage-event-listener.transport.headers="Header-Name-1:header value 1,Header-Name-2:header value 2,..."

To use a comma (,) or colon (:) in a header name or value, escape it with a backslash (\).

Header values are static. They cannot reference information from the event itself.
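
As an illustration of the escaping rule, the sketch below sends two hypothetical headers, X-Team and X-Note, where the second value contains both a colon and a comma. The header names and values are invented for the example. Java-style properties files also treat the backslash as an escape character, so you may need to double each backslash in the file; verify the resulting header against your own deployment.

```properties
# Intended header values (hypothetical):
#   X-Team: data-platform
#   X-Note: window 10:30, daily
openlineage-event-listener.transport.headers="X-Team:data-platform,X-Note:window 10\:30\, daily"
```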

Custom URL parameters#

Include custom URL parameters in your HTTP requests.

Format your parameters as comma-separated key:value pairs:

openlineage-event-listener.transport.url-params="Param-Name-1:param value 1,Param-Name-2:param value 2,..."

Parameter values are static. They cannot reference information from the event itself.