Event logger#
The event logger allows query completion events and all related Starburst Enterprise platform (SEP) information to be logged into a database. Once the event logger is running, use the Insights user interface to explore query history.
Requirements#
To use the event logger, you need:
Externally managed database. The following RDBMS are supported:
MySQL 8.0.12+
PostgreSQL 9.6+
OracleDB 12.2.0.1+
Network access from the Trino coordinator to the external database.
A valid Starburst Enterprise license.
Installation and configuration#
The event logger is an event listener implementation. It can be enabled by creating a configuration file
called etc/event-listener.properties
on the coordinator. Alternatively, you
can use a file name such as etc/event-logger.properties
and configure
the config file name.
The event-listener.name
needs to be set to event-logger
. The
configuration includes the JDBC connection URL with hostname, port and database
name as well as username and password for accessing the database.
You must restart the cluster after creating the configuration files. At first
usage, the table completed_queries
is automatically created in the
database. Once the table is created, logging is activated and running.
Instructions for configuring event logger for Kubernetes deployments are available in the Kubernetes documentation.
Example configuration for PostgreSQL
event-listener.name=event-logger
jdbc.url=jdbc:postgresql://postgresql.example.com:5432/event_logger
jdbc.user=test_psql
jdbc.password=test12
It is not good practice to use the public
schema. Configure the user to use
a different schema by default. See the PostgreSQL documentation
for details.
Example configuration for Oracle
event-listener.name=event-logger
jdbc.url=jdbc:oracle:thin:@oracle.example.com:1521/event_logger
jdbc.user=test_ora
jdbc.password=test12
Example configuration for MySQL
event-listener.name=event-logger
jdbc.url=jdbc:mysql://mysql.example.com:3306/event_logger?sessionVariables=sql_mode=ANSI
jdbc.user=test_mysql
jdbc.password=test12
You must specify sql_mode=ANSI
in the jdbc.url
configuration. The
configured user must have sufficient rights to create tables and insert data in
the configured schema.
Logged information#
Information from processing queries is logged in multiple tables and the contained fields. Details are documented in the following sections.
Completed queries#
A row is created in the completed_queries
table for each submitted query. It
captures everything that SEP emits – query, user, metadata, stats, performance
related attributes, resource consumption, start time, end time, and much more:
Column |
Description |
---|---|
|
Unique identifier of the query |
|
Session catalog |
|
Session schema |
|
Identity used to authenticate to SEP |
|
User agent that submitted the query |
|
Information about the client submitting the query |
|
Client tool or driver used to execute query |
|
Name of the SEP environment name |
|
Address of the client that submitted the query |
|
Version of SEP that executed the query |
|
Identity used for authorization and access control |
|
The groups the user executing the query belongs to, as an array of strings |
|
State of the query when logged |
|
Full SQL statement of the submitted query |
|
Full explain plan of the submitted query, including costs for each stage.
Identical to the output from |
|
Sum of all rows used in the query input |
|
Sum of all bytes used in the query input |
|
Number of filtered rows on query output |
|
Number of filtered bytes on query output |
|
Number of inserted rows |
|
Number of bytes of inserted rows |
|
Total accumulated CPU time across threads and workers |
|
Elapsed time for query pressing as measured by a wall clock in ms |
|
Time spent between query submission and the beginning of query planning in ms |
|
Maximum amount of memory directly tied to query resources used by a query at one time |
|
Maximum amount of memory used at one time which is not eligible to be spilled to disk |
|
Maximum amount of user memory reserved at one time by one task |
|
Maximum amount of user and system memory reserved at one time by one task, values are impacted by the memory management configurations |
|
Number of uncompressed bytes read from source |
|
Number of rows read from source |
|
Number of bytes of data exchanged between nodes during query execution |
|
Number of rows of data exchanged between nodes during query execution |
|
Integral of memory reservation with respect to time, giving units of
|
|
Total number of splits completed by the query |
|
Table statistics for each source table, as well as estimates for the CPU,
memory, and network I/O for each stage of the query plan and each operator
in each plan node. Operator costs are denoted by their plan node ID, as
indicated from the |
|
Information about JVM garbage collection tasks and time spent in GC for each query stage |
|
Min, max, average, and percentiles for reserved CPU time per task for each query stage. Can be useful for identifying skew in a query. |
|
Information about every operator, including operator type (TableScan, Aggregation, Join, etc.) and resources used (CPU time, network I/O, memory reserved, etc.) invoked at each stage of the query. |
|
Time spent waiting for sufficient resources before query execution in ms |
|
Time spent reading the metadata and checking the query for semantic errors in ms |
|
Total time a query spent in the execution phase in ms |
|
Timestamp from when query was received |
|
Timestamp when results were finished being consumed by the client |
|
List of catalogs, schemas, tables, and columns read. Query must be a
|
|
Time spent creating and optimizing the query plan in ms |
|
Total time a query’s tasks were scheduled on a thread to run in ms |
|
Information about a query’s failure reason (if applicable), as a JSON formatted string. |
|
The type of query. Possible values are the ones enumerated for the
|
|
The type of update the query performed (e.g. |
Query tables#
For every table referenced by a query, zero, one or more rows with information
about the accessed table is created in the query_tables
table. The
query_id
defines the relationship to the query in the completed_queries
table.
Column |
Description |
---|---|
|
Unique identifier of the query |
|
Name of the catalog containing the queried table |
|
Name of the schema containing the queried table |
|
Name of the queried table |
|
Number of bytes read or written |
|
Number of rows read or written |
|
Boolean flag indicating whether the table was written to ( |
Analyzing the event log#
To analyze and query the event log, you can create a catalog with the PostgreSQL connector using the same JDBC connection parameters. This allows you to use the Trino CLI, or any other application connected to the catalog to create queries, dashboards, perform ad hoc analysis and more.
Example use cases:
measure query performance numbers and trends
understand impact of different cluster configurations
enable cluster workload management and resource consumption
Example#
Generate event logger data by running a query.
trino> CREATE TABLE memory.default.country AS
SELECT n.name nation_name, r.name region_name
FROM delta.sf1.nation n JOIN delta.sf1.region r ON n.regionkey = r.regionkey;
CREATE TABLE: 25 rows
Query 20201209_021306_00003_qsyab, FINISHED, 1 node
Splits: 68 total, 68 done (100.00%)
3.34 [35 rows, 5.35KB] [10 rows/s, 1.6KB/s]
Event logger data is written to the completed_queries
table in the
PostgreSQL database.
In the following example, the catalog file is named event_logger.properties
and the public
schema contains the completed_queries
table. The
query_id
from the execution is used to locate the correct record. The output
in the following example has been trimmed.
trino> SELECT *
FROM event_logger.public.completed_queries
WHERE query_id = '20201209_021306_00003_qsyab';
-[ RECORD 1 ]-------------------------+---------------------------------------
query_id | 20201209_021306_00003_qsyab
catalog | NULL
schema | NULL
principal | demo
user_agent | StatementClientV1/345
client_info | NULL
source | trino-cli
environment | demo
remote_client_address | 127.0.0.1
server_version | 350-e
usr | demo
query_state | FINISHED
query | CREATE TABLE memory.default.country AS SELECT n.name nation_name, r.name region_name FROM delta.sf1.nation n JOIN delta.sf1.region r ON n.regionkey = r.regionkey
query_plan | Fragment 0 [COORDINATOR_ONLY]
| CPU: 43.24ms, Scheduled: 60.81ms, Input: 2 rows (81B); per task: avg.: 2.00 std.dev.: 0.00, Output: 1 row (9B)
| Output layout: [rows]
| ...
total_rows | 35
total_bytes | 5477
output_rows | 1
output_bytes | 9
written_rows | 25
written_bytes | 597
cpu_time_ms | 392
wall_time_ms | 3342
queued_time_ms | 10
peak_user_memory_bytes | 387318
peak_total_non_revocable_memory_bytes | 387318
peak_task_user_memory | 387318
peak_task_total_memory | 387318
physical_input_bytes | 5274
physical_input_rows | 30
internal_network_bytes | 933
internal_network_rows | 32
cumulative_memory | 7.7086698E7
completed_splits | 68
plan_node_stats_and_costs | {"stats":{},"costs":{}}
stage_gc_statistics | [{"stageId":0,"tasks":1,"fullGcTasks":0,...
cpu_time_distribution | [{"stageId":0,"tasks":1,"p25":25,"p50":25,...
operator_summaries | [{"stageId":0,"pipelineId":0,"operatorId":0,..
resource_waiting_time | 783
analysis_time | 783
execution_time | 2559
create_time | 2020-12-09 02:13:06.140000 UTC
end_time | 2020-12-09 02:13:09.482000 UTC
accessed_metadata | [{"catalogName":"delta","schema":"sf1","table":"nation","columns":["name","regionkey"],...
failure_info | NULL
update_type | CREATE TABLE
planning_time | 1711
scheduled_time | 1263
The tables accessed by the query are recorded in the query_tables
table and
can be identified using the same query_id
.
trino> SELECT *
FROM event_logger.public.query_tables
WHERE query_id = '20201209_021306_00003_qsyab';
query_id | catalog_name | schema_name | table_name | physical_bytes | physical_rows | is_output
-----------------------------+--------------+-------------+------------+----------------+---------------+-----------
20201209_021306_00003_qsyab | memory | default | country | 597 | 25 | true
20201209_021306_00003_qsyab | delta | sf1 | region | 1691 | 5 | false
20201209_021306_00003_qsyab | delta | sf1 | nation | 3583 | 25 | false
(3 rows)