Query logger#

The Starburst Enterprise platform (SEP) query logger is the backend service that stores information for:

  • Query completion details and events

  • Cluster metrics

  • Cache service

  • Data products

  • Built-in role-based access control


The query logger is required but disabled by default. You must provide and configure a suitable database, and enable the service as described in the requirements and installation sections that follow.


Requirements#

To use the query logger, you need:

  • An externally managed database for use by the query logger. The following RDBMSs are supported:

    • MySQL 8.0.12+

    • PostgreSQL 9.6+

    • OracleDB

  • Network access from the coordinator to the external database.

  • A valid Starburst Enterprise license.

The requirements for the external database vary based on a number of factors, including the following:

  • Specific RDBMS in use

  • Size of the cluster

  • Number of users

  • Number of executed queries

  • Number of Insights users accessing metrics

  • Reports and other queries accessing the query logger database

Start with 64GB RAM and 8 cores for the RDBMS server, monitor the server and performance based on your actual load, and adjust accordingly.

In addition, the connection to the database must provide low latency and high throughput to support your specific usage load.

Installation and configuration#

To enable the query logger, configure it on the SEP coordinator only, pointing it at an existing database with the following configuration properties:

Query logger configuration properties#

  • insights.jdbc.url: JDBC connection string for the query logger database.

  • insights.jdbc.user: Username of a user with full read and write access to the query logger database.

  • insights.jdbc.password: Password for the database user.

Instructions for configuring the query logger for Kubernetes deployments are available in the Kubernetes documentation. For Starburst Admin deployments, read about the config.properties file.

The following examples show the query logger configuration in the coordinator’s config.properties for the starburstenterprise database in PostgreSQL, Oracle, and MySQL:

Example query logger configuration with a PostgreSQL database
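A minimal sketch of this configuration, assuming a PostgreSQL server at the placeholder host postgresql.example.com and placeholder credentials:

```properties
insights.jdbc.url=jdbc:postgresql://postgresql.example.com:5432/starburstenterprise
insights.jdbc.user=sep_user
insights.jdbc.password=secret
```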


It is not good practice to use the PostgreSQL public schema. Configure the user to use a different schema by default. See the PostgreSQL documentation for details.

Example query logger configuration with an Oracle database
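A minimal sketch, assuming an Oracle listener at the placeholder host oracle.example.com:1521 and a placeholder service name; adjust the thin-driver URL to match your listener and service configuration:

```properties
insights.jdbc.url=jdbc:oracle:thin:@oracle.example.com:1521/starburstenterprise
insights.jdbc.user=sep_user
insights.jdbc.password=secret
```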


Example query configuration with a MySQL database
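A minimal sketch, assuming a MySQL server at the placeholder host mysql.example.com. The required sql_mode=ANSI setting is passed here via the Connector/J sessionVariables URL parameter; verify the exact parameter syntax against your JDBC driver version:

```properties
insights.jdbc.url=jdbc:mysql://mysql.example.com:3306/starburstenterprise?sessionVariables=sql_mode=ANSI
insights.jdbc.user=sep_user
insights.jdbc.password=secret
```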


You must specify sql_mode=ANSI in the insights.jdbc.url configuration for a MySQL database. The configured user must have sufficient rights to create tables and insert data in the configured schema.

You must explicitly enable persistence for Insights to use the persisted data in the query logger database.

Backup before upgrading#

We recommend creating a backup of your query logger database before upgrading to a new version of SEP. Refer to your database documentation for platform- and version-specific instructions.

Logged information#

Information from query processing is logged in multiple tables and their fields. Details are documented in the following sections.

Completed queries#

A row is created in the completed_queries table for each submitted query. It captures everything that SEP emits: query, user, metadata, statistics, performance-related attributes, resource consumption, start time, end time, and much more:

Columns of the completed_queries table#

  • query_id: Unique identifier of the query

  • catalog: Session catalog

  • schema: Session schema

  • principal: Identity used to authenticate to SEP

  • user_agent: User agent that submitted the query

  • client_info: Information about the client submitting the query

  • source: Client tool or driver used to execute the query

  • environment: Name of the SEP environment

  • remote_client_address: Address of the client that submitted the query

  • server_version: Version of SEP that executed the query

  • usr: Identity used for authorization and access control

  • groups: The groups the user executing the query belongs to, as an array of strings

  • query_state: State of the query when logged

  • query: Full SQL statement of the submitted query

  • query_plan: Full explain plan of the submitted query, including costs for each stage. Identical to the output from EXPLAIN ANALYZE <query>.

  • total_rows: Sum of all rows used in the query input

  • total_bytes: Sum of all bytes used in the query input

  • output_rows: Number of filtered rows on query output

  • output_bytes: Number of filtered bytes on query output

  • written_rows: Number of inserted rows

  • written_bytes: Number of bytes of inserted rows

  • cpu_time_ms: Total accumulated CPU time across threads and workers

  • wall_time_ms: Elapsed time for query processing as measured by a wall clock, in ms

  • queued_time_ms: Time spent between query submission and the beginning of query planning, in ms

  • peak_user_memory_bytes: Maximum amount of memory directly tied to query resources used by a query at one time

  • peak_total_non_revocable_memory_bytes: Maximum amount of memory used at one time which is not eligible to be spilled to disk

  • peak_task_user_memory: Maximum amount of user memory reserved at one time by one task

  • peak_task_total_memory: Maximum amount of user and system memory reserved at one time by one task; values are impacted by the memory management configuration

  • physical_input_bytes: Number of uncompressed bytes read from source

  • physical_input_rows: Number of rows read from source

  • internal_network_bytes: Number of bytes of data exchanged between nodes during query execution

  • internal_network_rows: Number of rows of data exchanged between nodes during query execution

  • cumulative_memory: Integral of memory reservation with respect to time, giving units of memory x time. Dividing this figure by the execution time approximates the average memory reservation during query execution.

  • completed_splits: Total number of splits completed by the query

  • plan_node_stats_and_costs: Table statistics for each source table, as well as estimates for the CPU, memory, and network I/O for each stage of the query plan and each operator in each plan node. Operator costs are denoted by their plan node ID, as indicated in the operator_summaries column. Cost values do not represent any actual unit, but are numbers used to compare the relative costs between plan nodes, allowing the optimizer to choose the best plan for executing a query.

  • stage_gc_statistics: Information about JVM garbage collection tasks and time spent in GC for each query stage

  • cpu_time_distribution: Min, max, average, and percentiles for reserved CPU time per task for each query stage. Can be useful for identifying skew in a query.

  • operator_summaries: Information about every operator invoked at each stage of the query, including operator type (TableScan, Aggregation, Join, etc.) and resources used (CPU time, network I/O, memory reserved, etc.)

  • resource_waiting_time: Time spent waiting for sufficient resources before query execution, in ms

  • analysis_time: Time spent reading the metadata and checking the query for semantic errors, in ms

  • execution_time: Total time a query spent in the execution phase, in ms

  • create_time: Timestamp from when the query was received

  • end_time: Timestamp when results were finished being consumed by the client

  • accessed_metadata: List of catalogs, schemas, tables, and columns read. The query must be a SELECT query that returned more than 0 rows.

  • planning_time: Time spent creating and optimizing the query plan, in ms

  • scheduled_time: Total time a query's tasks were scheduled on a thread to run, in ms

  • failure_info: Information about a query's failure reason (if applicable), as a JSON formatted string

  • query_type: The type of query. Possible values are the ones enumerated for the queryType selector rule for Resource groups.

  • update_type: The type of update the query performed (e.g. CREATE TABLE); this column is null for SELECT queries

Query tables#

For every table referenced by a query, a row with information about the accessed table is created in the query_tables table. The query_id defines the relationship to the query in the completed_queries table.

Columns of the query_tables table#

  • query_id: Unique identifier of the query

  • catalog_name: Name of the catalog containing the queried table

  • schema_name: Name of the schema containing the queried table

  • table_name: Name of the queried table

  • physical_bytes: Number of bytes read or written

  • physical_rows: Number of rows read or written

  • is_output: Boolean flag indicating whether the table was written to (true) or read from (false)

Analyzing the query log#

Query log data is available for analysis and inspection in Insights query overview.

To create custom analysis and visualizations of the query log data, you can create a catalog with the PostgreSQL connector using the same JDBC connection parameters. This allows you to use the Trino CLI, or any other application connected to the catalog, to run queries, build dashboards, perform ad hoc analysis, and more.
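For example, a catalog file such as sepquerylogger.properties might look like the following sketch, assuming the same placeholder PostgreSQL host and credentials used for the query logger configuration:

```properties
connector.name=postgresql
connection-url=jdbc:postgresql://postgresql.example.com:5432/starburstenterprise
connection-user=sep_user
connection-password=secret
```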

Example use cases:

  • measure query performance numbers and trends

  • understand impact of different cluster configurations

  • manage cluster workload and resource consumption

The following section describes an example analysis.

Generate query logger data by running a query.

trino> CREATE TABLE memory.default.country AS
        SELECT n.name nation_name, r.name region_name
        FROM delta.sf1.nation n JOIN delta.sf1.region r ON n.regionkey = r.regionkey;

Query 20201209_021306_00003_qsyab, FINISHED, 1 node
Splits: 68 total, 68 done (100.00%)
3.34 [35 rows, 5.35KB] [10 rows/s, 1.6KB/s]

Query logger data is written to the completed_queries table in the PostgreSQL database.

In the following example, the catalog file is named sepquerylogger.properties and the public schema contains the completed_queries table. The query_id from the execution is used to locate the correct record. The output in the following example has been trimmed.

trino> SELECT *
        FROM sepquerylogger.public.completed_queries
        WHERE query_id = '20201209_021306_00003_qsyab';
-[ RECORD 1 ]-------------------------+---------------------------------------
query_id                              | 20201209_021306_00003_qsyab
catalog                               | NULL
schema                                | NULL
principal                             | demo
user_agent                            | StatementClientV1/345
client_info                           | NULL
source                                | trino-cli
environment                           | demo
remote_client_address                 |
server_version                        | 350-e
usr                                   | demo
query_state                           | FINISHED
query                                 | CREATE TABLE memory.default.country AS SELECT n.name nation_name, r.name region_name FROM delta.sf1.nation n JOIN delta.sf1.region r ON n.regionkey = r.regionkey
query_plan                            | Fragment 0 [COORDINATOR_ONLY]
                                      |     CPU: 43.24ms, Scheduled: 60.81ms, Input: 2 rows (81B); per task: avg.: 2.00 std.dev.: 0.00, Output: 1 row (9B)
                                      |     Output layout: [rows]
                                      | ...
total_rows                            | 35
total_bytes                           | 5477
output_rows                           | 1
output_bytes                          | 9
written_rows                          | 25
written_bytes                         | 597
cpu_time_ms                           | 392
wall_time_ms                          | 3342
queued_time_ms                        | 10
peak_user_memory_bytes                | 387318
peak_total_non_revocable_memory_bytes | 387318
peak_task_user_memory                 | 387318
peak_task_total_memory                | 387318
physical_input_bytes                  | 5274
physical_input_rows                   | 30
internal_network_bytes                | 933
internal_network_rows                 | 32
cumulative_memory                     | 7.7086698E7
completed_splits                      | 68
plan_node_stats_and_costs             | {"stats":{},"costs":{}}
stage_gc_statistics                   | [{"stageId":0,"tasks":1,"fullGcTasks":0,...
cpu_time_distribution                 | [{"stageId":0,"tasks":1,"p25":25,"p50":25,...
operator_summaries                    | [{"stageId":0,"pipelineId":0,"operatorId":0,..
resource_waiting_time                 | 783
analysis_time                         | 783
execution_time                        | 2559
create_time                           | 2020-12-09 02:13:06.140000 UTC
end_time                              | 2020-12-09 02:13:09.482000 UTC
accessed_metadata                     | [{"catalogName":"delta","schema":"sf1","table":"nation","columns":["name","regionkey"],...
failure_info                          | NULL
update_type                           | CREATE TABLE
planning_time                         | 1711
scheduled_time                        | 1263

The tables accessed by the query are recorded in the query_tables table and can be identified using the same query_id.

trino> SELECT *
        FROM sepquerylogger.public.query_tables
        WHERE query_id = '20201209_021306_00003_qsyab';
          query_id           | catalog_name | schema_name | table_name | physical_bytes | physical_rows | is_output
 20201209_021306_00003_qsyab | memory       | default     | country    |            597 |            25 | true
 20201209_021306_00003_qsyab | delta        | sf1         | region     |           1691 |             5 | false
 20201209_021306_00003_qsyab | delta        | sf1         | nation     |           3583 |            25 | false
(3 rows)
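Beyond per-query lookups, the same tables support aggregate analysis across many queries. As a hypothetical example, the following query ranks users by total CPU time and average wall time over all logged queries:

```sql
SELECT usr,
       count(*) AS query_count,
       sum(cpu_time_ms) AS total_cpu_ms,
       avg(wall_time_ms) AS avg_wall_ms
FROM sepquerylogger.public.completed_queries
GROUP BY usr
ORDER BY total_cpu_ms DESC;
```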