Starburst MCP server#

Starburst Enterprise includes an integrated Model Context Protocol (MCP) server that lets AI agents and large language model (LLM) applications securely query your Starburst cluster. The MCP server exposes a stateless, authenticated HTTP endpoint for programmatic data access.

Requirements#

To use the MCP server, you need:

  • A valid MCP server license.

Configuration#

Add the following property to your coordinator configuration file to enable the Starburst MCP server:

starburst.mcp.enabled=true

The MCP server is then available at the https://<coordinator-host>:<port>/mcp endpoint URL.
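For example, with a password-based authentication method configured, you can verify that the endpoint responds by sending an MCP initialize request over streamable HTTP. This is an illustrative sketch; the host, port, user, and client name are placeholders:

curl -u admin \
  -X POST https://coordinator.example.com:8443/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
      "protocolVersion": "2025-06-18",
      "capabilities": {},
      "clientInfo": {"name": "example-client", "version": "1.0.0"}
    }
  }'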

General configuration properties#

The following general configuration properties can be set in the coordinator configuration file.

| Property name | Description | Default |
| --- | --- | --- |
| starburst.mcp.enabled | Enable the MCP server on the coordinator. | false |
| mcp.query.max-query-execution-time | Sets the maximum duration that a query can run when executed through the MCP queryReadOnly endpoint. The minimum duration is 1s. | 5m |
| mcp.query.max-result-size | Specifies the maximum size of result data returned to clients. You can set this value between 64kB and 8GB. Queries that return results larger than the configured limit fail. Keep this value low for agent use cases. | 1MB |

Fault-tolerant execution support#

The MCP server does not support custom retry policy configurations on clusters that use fault-tolerant execution. Any existing retry policy configuration is overridden, and the override only affects queries that are run through the MCP server.

Querying data#

The server provides a query tool that lets AI agents and LLM applications run read-only queries against your cluster and receive structured results in JSON format. Queries run synchronously. The agent waits for the query to finish before returning a response. Response times may be longer for large or complex queries.

Note

The result set size must be smaller than the value of mcp.query.max-result-size. If the result exceeds this limit, the query fails with an error.

Input format#

The following example shows a request used by an agent that retrieves five rows from the tpch.tiny.nation table:

{
  "query": "SELECT name, regionkey FROM tpch.tiny.nation ORDER BY name LIMIT 5"
}

The tool accepts the following input parameter:

Input parameter#

| Parameter | Type | Description |
| --- | --- | --- |
| query | string | The SQL query to execute. This is a required property. |
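On the wire, an MCP client wraps these arguments in a standard tools/call request. The following sketch assumes the query tool is exposed under the name queryReadOnly, based on the property description in the general configuration properties:

{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "queryReadOnly",
    "arguments": {
      "query": "SELECT name, regionkey FROM tpch.tiny.nation ORDER BY name LIMIT 5"
    }
  }
}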

Response format#

The MCP server returns structured JSON similar to the following example, continuing the query of the tpch.tiny.nation table.

{
  "queryId": "1234abcdefg",
  "columns": [
    {"columnName": "name", "columnType": "varchar"},
    {"columnName": "regionkey", "columnType": "bigint"}
  ],
  "rows": [
    ["ALGERIA", "0"],
    ["ARGENTINA", "1"],
    ["BRAZIL", "1"]
  ]
}
Query returned attributes#

| Attribute | Type | Description |
| --- | --- | --- |
| queryId | string | Unique identifier for the executed query. |
| columns | array | List of column metadata objects. |
| columnName | string | Name of the column. |
| columnType | string | Data type of the column. For example, varchar or bigint. |
| rows | array | Array of result rows. Each row is an array of stringified values. Null values are represented as "null". |

Unsupported SQL operations#

The query tool supports only read-only SQL queries. It rejects any query that contains the following operations:

  • INSERT

  • UPDATE

  • DELETE

  • MERGE

  • TRUNCATE

  • GRANT

  • REVOKE

  • CREATE, ALTER, or DROP statements for tables, schemas, or catalogs.

Limitations#

  1. Result size limit: Query results are capped by the mcp.query.max-result-size configuration property.

    • Queries that return results larger than this limit fail.

    • Instruct agents to use LIMIT clauses or filters to reduce result sizes for large datasets, as shown in the example after this list.

  2. Queries follow standard Starburst query timeout configurations.

  3. Queries are subject to resource management rules and terminate if they exceed configured limits.
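For example, instead of selecting every row from a large table, an agent can narrow the result set with a filter and a LIMIT clause. This is an illustrative query against the tpch catalog used elsewhere in this topic:

SELECT orderkey, totalprice
FROM tpch.tiny.orders
WHERE orderdate >= DATE '1996-01-01'
ORDER BY totalprice DESC
LIMIT 100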

Searching and retrieving data products#

The server provides data product search and retrieval tools that let AI agents and LLM applications search and investigate data products configured in your cluster. The server’s instructions also encourage LLMs to look for relevant data products if a user’s request is ambiguous or could be answered by a data product. These tools return structured results in JSON format.

Search Data Products#

The search tool lets agents perform keyword searches for data products based on a search term. It can search for terms in the data product name, summary, and description, as well as any views or materialized views included in the data product. The tool’s name in the MCP server is searchDataProducts.

Input format#

The tool accepts the following input parameter:

Input parameter#

| Parameter | Type | Description |
| --- | --- | --- |
| query | string | The search term. Use * to return all data products. This is a required property. |

The following example shows a request to perform a fuzzy search for data products related to the term “finance”:

{
  "query": "finance"
}

Response format#

The tool returns the following output parameters:

Returned attributes#

| Attribute | Type | Description |
| --- | --- | --- |
| dataProductName | string | Name of the data product. |
| dataProductId | string | The data product ID. |
| catalogName | string | Name of the catalog. Omitted if not set in the data product. |
| schemaName | string | Name of the schema. Omitted if not set in the data product. |
| summary | string | Summary of the data product. |
| description | string | Description of the data product. |
| createdBy | string | The user who created this data product. |
| createdAt | string | A timestamp of when the data product was created. |
| updatedAt | string | A timestamp of when the data product was last updated. |
| status | string | The status of the data product. May be one of these values: DRAFT, PUBLISHED, or PENDING_CHANGES. |

The MCP server returns structured JSON similar to the following example.

{
  "dataProductName": "1234abcdefg",
  "dataProductId": "550e8400-e29b-41d4-a716-446655440000",
  "catalogName": "sample",
  "schemaName": "mybankschema",
  "summary": "Data product containing financial data for analysis",
  "description": "This data product supports analytics related to finance.",
  "createdBy": "dataproductadmin@example.com",
  "createdAt": "2024-01-01T12:00:00Z",
  "updatedAt": "2024-01-15T15:30:00Z",
  "status": "PUBLISHED"
}

Retrieve Data Product Details#

The MCP server exposes a tool for retrieving all relevant details about a specific data product, including all of its views and materialized views. This tool lets agents inspect schemas, column information, and other metadata about the data product so they can generate accurate SQL queries. The tool’s name in the MCP server is getDataProductDetails.

Input format#

The tool accepts the following input parameter:

Input parameter#

| Parameter | Type | Description |
| --- | --- | --- |
| dataProductId | string | The data product ID to retrieve. This is a required property. |

To retrieve the data product details, the agent sends a request with the data product identifier. The following example shows a request submitted by an agent to retrieve details for a data product:

{
  "dataProductId": "550e8400-e29b-41d4-a716-446655440000"
}

Response format#

The MCP server returns structured JSON similar to the following example:

{
  "dataProductName": "1234abcdefg",
  "dataProductId": "550e8400-e29b-41d4-a716-446655440000",
  "catalogName": "sample",
  "schemaName": "mybankschema",
  "summary": "Data product containing financial data for analysis",
  "description": "This data product supports analytics related to finance.",
  "createdBy": "dataproductadmin@example.com",
  "createdAt": "2024-01-01T12:00:00Z",
  "updatedAt": "2024-01-15T15:30:00Z",
  "synchronizedAt": "2024-01-20T10:00:00Z",
  "status": "PUBLISHED",
  "views": [
    {
      "name": "finance_view",
      "description": "A view over the finance_table",
      "createdBy": "dataproductadmin@example.com",
      "status": "PUBLISHED",
      "columns": [
        {
          "name": "id",
          "type": "bigint",
          "description": "Unique identifier for each record"
        },
        {
          "name": "amount",
          "type": "double",
          "description": "Financial amount associated with the record"
        },
        {
          "name": "date",
          "type": "date",
          "description": "Date of the financial record"
        }
      ],
      "markedForDeletion": "false",
      "createdAt": "2024-01-10T09:00:00Z",
      "updatedAt": "2024-01-15T15:30:00Z",
      "updatedBy": "dataproductadmin@example.com",
      "publishedAt": "2024-01-20T10:00:00Z",
      "publishedBy": "dataproductadmin@example.com",
      "matchesTrinoDefinition": true,
      "definitionQuery": "SELECT * FROM sample.mybankschema.finance_table",
      "viewSecurityModel": "INVOKER"
    }
  ],
  "materializedViews": [
    {
      "name": "finance_materialized_view",
      "description": "A materialized view over the finance_table",
      "createdBy": "dataproductadmin@example.com",
      "status": "PUBLISHED",
      "columns": [
        {
          "name": "id",
          "type": "bigint",
          "description": "Unique identifier for each record"
        },
        {
          "name": "amount",
          "type": "double",
          "description": "Financial amount associated with the record"
        },
        {
          "name": "date",
          "type": "date",
          "description": "Date of the financial record"
        }
      ],
      "markedForDeletion": "false",
      "createdAt": "2024-01-10T09:00:00Z",
      "updatedAt": "2024-01-15T15:30:00Z",
      "updatedBy": "dataproductadmin@example.com",
      "publishedAt": "2024-01-20T10:00:00Z",
      "publishedBy": "dataproductadmin@example.com",
      "matchesTrinoDefinition": true,
      "definitionQuery": "SELECT * FROM sample.mybankschema.finance_table",
      "definitionProperties": {
        "refreshInterval": "24h"
      }
    }
  ]
}
Returned attributes#

| Attribute | Type | Description |
| --- | --- | --- |
| dataProductName | string | Name of the data product. |
| dataProductId | string | The data product ID. |
| catalogName | string | Name of the catalog. Omitted if not set in the data product. |
| schemaName | string | Name of the schema. Omitted if not set in the data product. |
| summary | string | Summary of the data product. |
| description | string | Description of the data product. |
| createdBy | string | The user who created the data product. |
| createdAt | string | A timestamp of when the data product was created. |
| updatedAt | string | A timestamp of when the data product was last updated. |
| status | string | The status of the data product. May be one of these values: DRAFT, PUBLISHED, or PENDING_CHANGES. |
| synchronizedAt | string | An optional timestamp of when the data product was last synchronized with the underlying data sources. |
| views | array | A list of the views included in this data product, with details about each view such as name, description, columns, and status. |
| materializedViews | array | A list of the materialized views included in this data product, with details about each materialized view such as name, description, columns, and status. |

Parameterized queries#

The server provides two parameterized query tools: listParametrizedQueryTools, which checks for existing predefined templates, and parametrizedQuery, which executes them. These tools let administrators define pre-approved SQL templates in YAML and allow AI agents to safely execute them by supplying only parameter values. They provide a controlled way to expose business logic while preventing SQL injection and ensuring parameter validation, making them useful for governed, reusable analytics workflows rather than arbitrary ad-hoc SQL written by the AI agent.

Configuration#

To enable the parameterized query tools, add the following properties to your coordinator configuration file:

mcp.parametrized-tools.enabled=true
mcp.parametrized-tools.definition-paths=/etc/trino/parametrized-tools/

Multiple tool definition paths can be specified as a comma-separated list of paths to tool definition files and/or directories. Files must use the .yaml or .yml extension. All tools are loaded and validated at server startup. Any change to an existing definition, or any new definition, requires a restart.
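For example, a configuration that loads definitions from a directory as well as one additional file might look like the following; the paths are placeholders:

mcp.parametrized-tools.enabled=true
mcp.parametrized-tools.definition-paths=/etc/trino/parametrized-tools/,/etc/trino/extra-tools/customer_report.yaml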

A parameterized tool is defined by a YAML file that specifies the following properties:

  • name – Unique identifier of the tool. This name is used by AI agents to select and invoke the tool.

  • description – A natural-language description of the tool for AI agents.

  • tables (optional) – Static table resources used by the tool’s queries. Each entry defines a logical table name. These resources are referenced from templateLogic, for example {{ customers }}, instead of hard-coding physical table names.

  • parameters – Typed input parameters that control query behavior. Each parameter declares a logical type, SQL type, optional default value, and validation constraints.

  • templateLogic – A Pebble SQL template that produces the final rendered parameterized SQL statement at runtime. See Pebble Template syntax for more information.

name: tool_name                    # Unique identifier for the tool
description: Tool description      # Human-readable description for AI agents
tables:                            # Optional: static table references
  - name: alias_name
    tableName: catalog.schema.table
parameters:                        # List of parameters
  - name: param_name
    type: CONSTANT                 # Parameter type (see below)
    description: Parameter description
    required: true                 # Whether the parameter is required
    sqlType: VARCHAR               # Required for CONSTANT and CONSTANT_LIST
    defaultValue: default          # Required for optional parameters
    constraints:                   # Optional: validation constraints
      min: 0
      max: 100
templateLogic: |                   # Pebble template for the SQL query
  SELECT * FROM {{ alias_name }}
  WHERE column = {{ param_name }}

At runtime, an AI agent calls a parameterized tool by name and supplies parameters as JSON. The server validates the input, resolves schemas, safely renders SQL, enforces that the final statement is read‑only, and then executes it.

For complete usage examples, see Parameterized query tools usage examples.

Static table resources#

Static table resources configure the tables and relationships that the tool’s queries use. They are defined once in the YAML definition and reused by templateLogic, instead of being rebuilt inside each template. The tableName property value must be fully qualified as catalog.schema.table. Quoted identifiers are supported for names containing special characters. For example:

tables:
  - name: customers
    tableName: tpch.tiny.customer
  - name: orders
    tableName: tpch.tiny.orders

templateLogic: |
  SELECT c.*, o.totalprice
  FROM {{ customers }} c
  JOIN {{ orders }} o ON c.custkey = o.custkey

Table resource names must not conflict with parameter names. This is validated at startup.

Parameter types#

Each parameter has a parameter type (type:) that defines how it is rendered into SQL and what validations apply.

Parameter types#

| Parameter type | Description |
| --- | --- |
| CONSTANT | A single scalar value (string, number, boolean) injected as a bind variable (?). Requires the sqlType and required fields, and is validated according to sqlType and value constraints such as min, max, pattern, and allowedValues. |
| CONSTANT_LIST | List of scalar values of the same type, each becoming its own bind variable (?, ?, ?). Typically used in IN clauses. Requires the sqlType and required fields, and supports the collection constraints minItems and maxItems, as well as value constraints on each element. |
| COLUMN_REF | Reference to a column name, auto-quoted for safety. Can optionally specify a parent table or query to validate that the column exists. Validated against resolved schema metadata. |
| TABLE_REF | Reference to a fully qualified table name, auto-quoted as "catalog"."schema"."table". Input must be a 3-part name catalog.schema.table. Validated via information_schema. |
| QUERY | Sub-query that must be a valid SELECT statement wrapped in parentheses. Validated by the SQL parser and can be constrained with the schema constraint requiredColumns. |
| FUNCTION_REF | SQL function name rendered as-is. Only allows valid unquoted SQL identifiers (alphanumeric characters and underscores). Can be constrained with allowedValues and the function signature constraints returnType, argumentTypes, and functionType, validated via SHOW FUNCTIONS LIKE 'name'. These verify that the function exists in the catalog and matches the expected signature at runtime. For more information, see Function signature constraints. |
| ALIAS | SQL alias for columns, CTEs, or other identifiers in output position. Must be a valid unquoted SQL identifier (alphanumeric characters and underscores). Rendered as-is, not quoted. Used to control output column names safely. |
| EXPRESSION | Raw SQL expression rendered as-is after parser validation. Must parse as a valid SQL expression, for example CASE expressions, function calls, or arithmetic. For simple values, prefer CONSTANT or CONSTANT_LIST. See also Considerations. |
| OPERATOR | A composite parameter containing named, typed fields. Rendered as a map in the Pebble context, accessed as {{ param.field }}. Requires a fields definition specifying the struct schema. |
| STRUCT | Composite parameter containing named, typed fields; rendered as a map ({{ struct.field }}) in the Pebble template context. Requires a fields definition. Each field uses the field types system. All fields are implicitly required. Useful for grouping related parameters such as table, column, and cutoff date. For a usage example, see Parameterized query tools usage examples. |
| STRUCT_LIST | List of composite parameters with the same schema; iterable with {% for item in param %}. Primary mechanism for building repeated SQL patterns, for example UNION ALL of multiple populations or repeated metric blocks. Each item is a STRUCT; constraints such as minItems or maxItems can be applied. Requires a fields definition. See Field types, and for a usage example, see Parameterized query tools usage examples. |
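To illustrate the reference types, a hypothetical tool definition might accept a table and a column to group by. The parameter and table names below are placeholders, and the parent property for COLUMN_REF is an assumption based on the field types table later in this section:

parameters:
  - name: source_table
    type: TABLE_REF
    description: Fully qualified table to analyze
    required: true

  - name: group_column
    type: COLUMN_REF
    description: Column to group results by
    required: true
    parent: source_table

templateLogic: |
  SELECT {{ group_column }}, count(*) AS row_count
  FROM {{ source_table }}
  GROUP BY {{ group_column }}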

The following parameters show how CONSTANT and CONSTANT_LIST values are used in a template and rendered to bind variables:

parameters:
  - name: min_balance
    type: CONSTANT
    description: Minimum account balance filter
    sqlType: DOUBLE
    required: true
    constraints:
      min: 0

  - name: segments
    type: CONSTANT_LIST
    description: Market segments to include in analysis
    sqlType: VARCHAR
    required: true
    constraints:
      minItems: 1
      maxItems: 5
      allowedValues: ["BUILDING", "AUTOMOBILE", "MACHINERY", "HOUSEHOLD", "FURNITURE"]

templateLogic: |
  SELECT *
  FROM tpch.tiny.customer
  WHERE mktsegment IN ({{ segments }})
    AND acctbal >= {{ min_balance }}

In the example, min_balance is a CONSTANT with sqlType: DOUBLE and a minimum value of 0. segments is a CONSTANT_LIST of VARCHAR values, with constraints on the number of items and allowedValues.

Invoked input:

{
  "min_balance": 1000,
  "segments": ["BUILDING", "AUTOMOBILE"]
}

Rendered SQL:

EXECUTE IMMEDIATE '
  SELECT *
  FROM tpch.tiny.customer
  WHERE mktsegment IN (?, ?)
    AND acctbal >= ?
' USING 'BUILDING', 'AUTOMOBILE', 1000

The following table details the supported sqlType values for the CONSTANT and CONSTANT_LIST parameter types.

Supported sqlType values for CONSTANT and CONSTANT_LIST#

| sqlType | Java type | SQL literal | Notes |
| --- | --- | --- | --- |
| VARCHAR, CHAR | String | 'value' | |
| INTEGER, BIGINT, SMALLINT, TINYINT | Long | 123 | Any type containing INT |
| DOUBLE, REAL, FLOAT | Double | 1.5 | Binary floating point |
| DECIMAL, NUMERIC | BigDecimal | DECIMAL '1.5' | Exact decimal precision |
| BOOLEAN | Boolean | TRUE / FALSE | |
| DATE | DateValue | DATE '2026-01-15' | Format: YYYY-MM-DD |
| TIMESTAMP | TimestampValue | TIMESTAMP '2026-01-15 10:30:00' | Format: YYYY-MM-DD HH:MM:SS[.fraction] |

Note

Use DECIMAL instead of DOUBLE when exact precision is required, for example for monetary values. DOUBLE uses binary floating point, which cannot exactly represent values like 0.1. Decimal values are compiled to DECIMAL 'value' literals, so do not add the DECIMAL keyword in your template; doing so produces the invalid SQL DECIMAL DECIMAL 'value'.
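For example, a monetary threshold can be declared as DECIMAL and referenced without the keyword. This is an illustrative sketch; the parameter name and table are placeholders:

parameters:
  - name: min_total
    type: CONSTANT
    description: Minimum order total
    sqlType: DECIMAL
    required: true

templateLogic: |
  SELECT * FROM tpch.tiny.orders
  -- correct: {{ min_total }} already renders with DECIMAL typing
  -- wrong: DECIMAL {{ min_total }} would produce DECIMAL DECIMAL '...'
  WHERE totalprice >= {{ min_total }}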

Optional parameters#

Optional parameters (required: false) must have a defaultValue. This requirement is validated at startup and keeps templates simpler and more predictable.

Instead of using complex template conditionals, implement optional filtering using the following SQL patterns:

  1. LIKE with wildcard default. Use % as a wildcard to disable filtering when no specific value is provided:

parameters:
  - name: region_filter
    type: CONSTANT
    description: Filter by region (use % for all regions)
    sqlType: VARCHAR
    required: false
    defaultValue: "%"
    constraints:
      allowedValues: ["AMERICA", "EUROPE", "ASIA", "AFRICA", "%"]

templateLogic: |
  SELECT * FROM customers
  WHERE region LIKE {{ region_filter }}

  2. Numeric range with extreme defaults. Use extreme minimum and maximum values to disable range filtering when the default is not overridden:

parameters:
  - name: min_balance
    type: CONSTANT
    description: Minimum balance filter (0 for no minimum)
    sqlType: DOUBLE
    required: false
    defaultValue: 0
    constraints:
      min: 0

  - name: max_balance
    type: CONSTANT
    description: Maximum balance filter (use large value for no maximum)
    sqlType: DOUBLE
    required: false
    defaultValue: 999999999999
    constraints:
      min: 0

templateLogic: |
  SELECT * FROM accounts
  WHERE balance >= {{ min_balance }}
    AND balance <= {{ max_balance }}

  3. IN clause with “all” option. Use a default that includes all allowed values until it’s manually narrowed:

parameters:
  - name: status_filter
    type: CONSTANT_LIST
    description: Filter by status (include all statuses for no filter)
    sqlType: VARCHAR
    required: false
    defaultValue: ["PENDING", "ACTIVE", "COMPLETED", "CANCELLED"]
    constraints:
      minItems: 1
      allowedValues: ["PENDING", "ACTIVE", "COMPLETED", "CANCELLED"]

templateLogic: |
  SELECT * FROM orders
  WHERE status IN ({{ status_filter }})

Field types#

Field types describe the types of the named fields inside STRUCT and STRUCT_LIST parameters. They are used to define and validate each field within these composite parameters, and they mirror the parameter type system so that fields use the same kinds of types as top‑level parameters.

Field types do not support nesting or subqueries: STRUCT, STRUCT_LIST, and QUERY are not available as field types, so fields cannot contain nested structs or subquery fields.

Supported field types#

| Field type | Description | Requires |
| --- | --- | --- |
| CONSTANT | Bind variable placeholder. | sqlType |
| CONSTANT_LIST | Comma-separated bind variable placeholders. | sqlType |
| COLUMN_REF | Auto-quoted identifier, validated against the parent schema. | parent |
| TABLE_REF | Auto-quoted qualified name, validated via information_schema. | |
| FUNCTION_REF | Valid SQL identifier (alphanumeric and underscores). | |
| ALIAS | Valid SQL identifier (alphanumeric and underscores); rendered as-is. | |
| EXPRESSION | SQL-parser-validated expression, rendered as-is. | |
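To illustrate, a STRUCT parameter groups related fields under one name and exposes them to the template as a map. The exact shape of the fields list below is an assumption based on the descriptions in this section, and the names are placeholders:

parameters:
  - name: cutoff
    type: STRUCT
    description: Table, date column, and cutoff date used for filtering
    required: true
    fields:
      - name: source_table
        type: TABLE_REF
      - name: date_column
        type: COLUMN_REF
        parent: source_table
      - name: cutoff_date
        type: CONSTANT
        sqlType: DATE

templateLogic: |
  SELECT count(*) AS rows_before_cutoff
  FROM {{ cutoff.source_table }}
  WHERE {{ cutoff.date_column }} <= {{ cutoff.cutoff_date }}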

Constraints#

Constraints validate parameter values before query execution. The system enforces that constraints are only applied to compatible parameter types. Using a constraint on the wrong type causes a startup error. Define constraints in an optional constraints block under parameters. The tools offer the following types of constraints.

Value constraints#

Value constraints restrict what values are allowed for a parameter or field. They apply to many parameter and field types, including scalar constants, list elements, identifiers, and some reference types.

| Constraint | Applies to parameter type | Description |
| --- | --- | --- |
| allowedValues | CONSTANT, CONSTANT_LIST, COLUMN_REF, TABLE_REF, FUNCTION_REF, ALIAS, EXPRESSION, OPERATOR | List of allowed values. The value must exactly match one of the listed values; for example, 'Retail' and 'retail' are different values. |
| min | CONSTANT, CONSTANT_LIST | Minimum allowed numeric value. |
| max | CONSTANT, CONSTANT_LIST | Maximum allowed numeric value. |
| pattern | CONSTANT, CONSTANT_LIST, COLUMN_REF, TABLE_REF, FUNCTION_REF, ALIAS, EXPRESSION | Regular expression that the value must match. For TABLE_REF, the pattern matches the raw input string before identifier quoting, for example the dotted catalog.schema.table format. |
| maxLength | CONSTANT, CONSTANT_LIST, ALIAS, EXPRESSION, QUERY | Maximum number of characters allowed in the value. For CONSTANT_LIST, maxLength applies per element. |

Collection constraints#

Collection constraints apply to list types and control the size of the collection, regardless of the element type.

| Constraint | Applies to parameter type | Description |
| --- | --- | --- |
| minItems | CONSTANT_LIST, STRUCT_LIST | Minimum number of items required in the list. |
| maxItems | CONSTANT_LIST, STRUCT_LIST | Maximum number of items allowed in the list. |

Schema constraints#

The requiredColumns constraint applies to table and query types and ensures that required columns are available. This is useful when your template uses SELECT * or references specific columns from a dynamic source.

If the provided table or query doesn’t include the required columns, validation fails with an error message that lists the missing columns.

| Constraint | Applies to parameter type | Description |
| --- | --- | --- |
| requiredColumns | TABLE_REF, QUERY | List of column names that must exist in the referenced table or query result. |
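For example, a TABLE_REF parameter that is queried for specific columns can require those columns to exist. This is an illustrative sketch; the list form of requiredColumns and the parameter name are assumptions:

parameters:
  - name: orders_table
    type: TABLE_REF
    description: Any table that contains order data
    required: true
    constraints:
      requiredColumns: ["orderkey", "totalprice"]

templateLogic: |
  SELECT orderkey, totalprice
  FROM {{ orders_table }}
  ORDER BY totalprice DESC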

Function signature constraints#

Function signature constraints apply to function reference types and validate the function’s shape against the database’s function catalog.

| Constraint | Applies to parameter type | Description |
| --- | --- | --- |
| returnType | FUNCTION_REF | Expected SQL return type for the function. |
| argumentTypes | FUNCTION_REF | Expected SQL types for the function’s arguments as a list. For example, [bigint] or [array(t), t]. |
| functionType | FUNCTION_REF | Expected function category. Must be scalar or aggregate. |

Function signature constraints are validated at runtime using SHOW FUNCTIONS LIKE 'name'. If the function has multiple overloads, validation passes if at least one overload matches all specified constraints. Matching is case-insensitive.

When validation fails, the error message includes all available overloads so that you can see what signatures exist, for example:

Function 'sum' for parameter 'agg_func' does not match constraints: returnType=varchar.
Available overloads: (bigint) -> bigint [aggregate], (double) -> double [aggregate], ...
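For example, a FUNCTION_REF parameter restricted to numeric aggregate functions might be declared as follows. The YAML shape of the constraints block is an assumption based on the table above, and the parameter name is a placeholder:

parameters:
  - name: agg_func
    type: FUNCTION_REF
    description: Aggregate function applied to account balances
    required: true
    constraints:
      allowedValues: ["sum", "avg", "max"]
      functionType: aggregate
      argumentTypes: ["double"]
      returnType: double

templateLogic: |
  SELECT {{ agg_func }}(acctbal) AS balance_metric
  FROM tpch.tiny.customer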

Pebble Template syntax#

The templateLogic property uses the Pebble template engine for SQL generation. Pebble supports variables, conditionals, loops, macros, and filters.

Variables#

Use double braces {{ ... }} to insert parameter values and static table resources into the SQL template:

{{ param_name }}           -- top-level parameter
{{ struct.field_name }}    -- struct field
{{ item.field_name }}      -- inside a for loop

Conditionals#

Use {% if %} blocks to include or omit parts of the query based on parameter values:

{% if sample_limit > 0 %}
LIMIT {{ sample_limit }}
{% endif %}

Loops#

Use {% for %} loops to iterate over STRUCT_LIST parameters:

{% for item in populations %}
SELECT {{ item.label }} AS population FROM {{ item.source_table }}
{% if not loop.last %}UNION ALL{% endif %}
{% endfor %}

The loop variable provides loop.index, a 0-based iteration index; loop.first, which is true on the first iteration; loop.last, which is true on the last iteration; and loop.length, the total number of items.

Filters#

Pebble supports filters (value | filter) for simple, inline transformations.

{{ func_name | lower }}    -- lowercase a string
{{ func_name | upper }}    -- uppercase a string

Macros#

Use {% macro %} to define reusable SQL fragments and call them from templateLogic:

{% macro pct(condition, alias) %}
CAST(SUM(CASE WHEN {{ condition }} THEN 1 ELSE 0 END) AS DECIMAL(38,5))
  / COUNT(id) * 100.0 AS {{ alias }}
{% endmacro %}

SELECT
  population,
  {{ pct("status = 'active'", "pct_active") }},
  {{ pct("status = 'inactive'", "pct_inactive") }}
FROM base
GROUP BY population

Macros help keep complex expressions in one place and avoid repeating long CASE or aggregate expressions across the template. See Customer profile analysis with STRUCT_LIST and Pebble macros.

Troubleshooting#

Pebble resolves dotted attribute access left-to-right. When evaluating {{ pop.source_table }}, it first resolves pop to a map, then looks up source_table as an attribute of that map. If the lookup fails, the error message only reports the immediate attribute name and the object type, not the full expression path. For example:

Attribute [source_table] of [com.google.common.collect.RegularImmutableMap] does not exist

This means source_table was not found as a key in the struct map. The RegularImmutableMap type indicates a struct field lookup; a miss on a top-level variable would reference the root scope instead.
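For example, the error above appears if the template references a field that the parameter’s fields definition does not declare, such as a populations STRUCT_LIST whose items define table_name while the template reads source_table. This is an illustrative sketch:

{% for pop in populations %}
SELECT count(*) FROM {{ pop.source_table }}   {# fails: the struct map has no source_table key #}
{% endfor %}

Referencing {{ pop.table_name }} instead, or declaring a source_table field, resolves the lookup.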

Considerations#

The parameterized query tools are designed to make tool definitions safe by default. However, keep the following considerations in mind:

  • EXPRESSION parameter type values are validated by the Trino SQL parser to ensure they are syntactically valid SQL expressions, but they are not restricted to a whitelist. Use CONSTANT, FUNCTION_REF, or ALIAS types where possible for tighter control.

  • Query-Only Enforcement: The final rendered SQL must parse as a valid SELECT/query statement. DDL, DML, and other non-query statements are rejected by the QueryVerifier stage.

  • Bind Marker Protection: The rendering pipeline uses internal marker tokens that are replaced with ? after template rendering. If a parameter value contains the marker pattern, it is rejected to prevent bind variable confusion.

  • Schema Validation: Column and table references are validated against actual database schemas. Function references can be validated against the catalog’s function registry.

Starburst AI Agent via MCP#

This tool exposes the Starburst AI Agent through MCP. It lets an authenticated MCP client open a stateful conversational AI agent session within a specific data product and interact with that agent using a configured model alias.

To use the AI Agent over MCP, you need:

  • Data products enabled.

  • Access to at least one configured AI language model. The agent uses the configured model alias when invoked through MCP.

Configuration#

Add the following properties to your coordinator configuration file.

| Property name | Description | Default |
| --- | --- | --- |
| mcp.tool.data-product-agent.concurrent-request-timeout | Sets the maximum duration a request waits to start if another request from the same client session is already running for the same data product. If the request does not start within this time, it fails. | 45s |
| mcp.tool.data-product-agent.model-id | Specifies the AI model alias used when invoking the data product AI agent through MCP. For example, claude-sonnet-4.5. | |
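For example, to route the agent to a specific model alias and shorten the queueing window, you might add the following; the alias must match a model configured in your deployment, and the timeout value is illustrative:

mcp.tool.data-product-agent.model-id=claude-sonnet-4.5
mcp.tool.data-product-agent.concurrent-request-timeout=30s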

The agent runs with the identity configured in the MCP connection. All actions the agent performs are subject to the privileges and access controls of that identity.

Session history#

A data product agent session through MCP is created the first time a client sends a request during an MCP session for a specific data product. The session remains associated with that MCP session and data product for the lifetime of the MCP connection. To start a new session for the same client and data product, the MCP client must reconnect and establish a new MCP session.

Agent sessions initiated through MCP appear in the data product agent session history. However, MCP sessions are read-only in the UI.

Concurrent requests that target the same agent session are serialized so that only one request runs at a time. Additional requests are queued and wait up to the duration configured by mcp.tool.data-product-agent.concurrent-request-timeout. If a request does not start within this time, it fails.

Authentication and authorization types#

The MCP server requires authentication for all requests and supports the same authentication methods used for running queries. Authentication uses your existing SEP configuration. For more information about authentication types, see authentication types.

Clients must provide authentication credentials in standard HTTP headers when making requests to the MCP server. The MCP server supports all headers defined in the Starburst Client Protocol specification.

Authentication configuration properties#

The following authentication and authorization configuration properties should be added to the coordinator configuration file.

| Property name | Description | Default |
| --- | --- | --- |
| mcp.auth.scopes-supported | Comma-separated list of OAuth scopes to advertise to MCP clients. For example: mcp.auth.scopes-supported=openid,email,profile. This property is optional. | The scopes configured in http-server.authentication.oauth2.scopes. |
| mcp.auth.servers.override | Override the authorization server URLs advertised in the OAuth metadata resource endpoint. This property is optional. | |
| mcp.auth.custom.metadata | Add additional custom metadata entries to the OAuth Protected Resource Metadata response. For example: mcp.auth.custom.metadata=custom_property:value1,another_claim:value2. | |
| mcp.cors.allowed-origins | Configures Cross-Origin Resource Sharing (CORS) for the OAuth metadata endpoint. For example: mcp.cors.allowed-origins=https://app.example.com,https://dashboard.example.com. | |

OAuth scopes#

Specify the OAuth and OpenID Connect scopes supported by the MCP server. This must be a subset of the supported scopes defined in the http-server.authentication.oauth2.scopes parameter.

mcp.auth.scopes-supported=openid,email,profile

This configuration appears in the OAuth Protected Resource Metadata endpoint, https://<coordinator-host>:<port>/.well-known/oauth-protected-resource, and informs MCP clients which scopes to request when obtaining access tokens.

OAuth metadata endpoint#

When OAuth 2.0 (http-server.authentication.type=oauth2) or delegated OAuth (http-server.authentication.type=delegated-oauth2) is enabled, the coordinator exposes an RFC 8414-compliant OAuth Protected Resource Metadata endpoint:

GET https://<coordinator-host>:<port>/.well-known/oauth-protected-resource

This endpoint provides the following metadata required by OAuth clients for automatic configuration with the MCP server:

  • resource: The base URL of the Starburst MCP Server

  • resource_name: “Starburst MCP”

  • authorization_servers: List of authorization server URLs from either OAuth configuration or override.

  • jwks_uri: JSON Web Key Set endpoint for token verification

  • scopes_supported: List of supported OAuth scopes
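A metadata response is shaped similar to the following sketch; the host names, URLs, and scopes are placeholders:

{
  "resource": "https://coordinator.example.com:8443/mcp",
  "resource_name": "Starburst MCP",
  "authorization_servers": ["https://auth.example.com"],
  "jwks_uri": "https://auth.example.com/.well-known/jwks.json",
  "scopes_supported": ["openid", "email", "profile"]
}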

Custom OAuth metadata#

You can add custom metadata entries to the OAuth Protected Resource Metadata response.

The following example defines two additional metadata fields, registration_endpoint and okta_api_key:

mcp.auth.custom.metadata=registration_endpoint:https://<auth_server>/register,okta_api_key:<api_key>

Both additional metadata fields are included in the protected resource metadata response. The protected resource metadata endpoint is publicly available. Users who have read access to your SEP cluster can view this metadata.

Authorization servers override#

Override the list of authorization servers advertised in the OAuth metadata resource endpoint.

mcp.auth.servers.override=https://auth.example.com,https://backup-auth.example.com

If unset, the authorization server displayed in the protected resource metadata endpoint defaults to the issuer URL from http-server.authentication.oauth2.issuer.

Dynamic client registration#

MCP can interoperate with authorization servers that support the OAuth 2.1 Dynamic Client Registration (DCR) protocol, allowing MCP clients to register automatically.

The MCP specification on authorization recommends supporting DCR to dynamically generate client IDs and secrets for MCP sessions, so that MCP clients can register with your OAuth authorization server without manual configuration. DCR support depends on your authorization server, not on Starburst Enterprise. Starburst does not support management of dynamic clients with any authorization servers.

Configuration#

If your authorization server supports DCR, follow these steps to enable it:

  1. Enable DCR on your authorization server, if not enabled by default.

  2. Configure registration endpoint: Ensure your OAuth metadata includes the registration_endpoint field.

  3. Set appropriate scopes: Configure which scopes are available for dynamically registered clients.

  4. MCP client configuration: MCP clients discover the registration endpoint from the authorization server metadata and register automatically.