Starburst MCP server#
Starburst Enterprise includes an integrated Model Context Protocol (MCP) server that lets AI agents and large language model (LLM) applications securely query your Starburst cluster. The MCP server exposes a stateless, authenticated HTTP endpoint for programmatic data access.
Requirements#
To use the MCP server, you need:

- A valid MCP server license.
Configuration#
Add the following property to your coordinator configuration file to enable the Starburst MCP server:
```text
starburst.mcp.enabled=true
```
The MCP server is then available at the `https://<coordinator-host>:<port>/mcp` endpoint URL.
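MCP clients interact with this endpoint using JSON-RPC 2.0 messages as defined by the Model Context Protocol. As a sketch, the following Python snippet builds a `tools/call` request body; the tool name `query` and the transport details are illustrative assumptions, not documented values:

```python
import json

def build_tool_call(tool_name: str, arguments: dict, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 tools/call request body, per the MCP specification."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

# Hypothetical request against https://<coordinator-host>:<port>/mcp;
# the tool name "query" is an assumption for illustration.
body = build_tool_call("query", {"query": "SELECT 1"})
```

A client would POST this body to the endpoint with its configured authentication headers.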
General configuration properties#
The following general configuration properties can be set in the coordinator configuration file.

| Property name | Description | Default |
|---|---|---|
| `starburst.mcp.enabled` | Enable the MCP server on the coordinator. | `false` |
| | Sets the maximum duration that a query can run when executed through the MCP server. | |
| `mcp.query.max-result-size` | Specifies the maximum size of result data returned to clients. | |
Fault-tolerant execution support#
The MCP server does not support custom retry-policy configurations for clusters using fault-tolerant execution; any existing retry policy configuration is overridden. This override only affects queries that are run through the MCP server.
Querying data#
The server provides a query tool that lets AI agents and LLM applications run read-only queries against your cluster and receive structured results in JSON format. Queries run synchronously. The agent waits for the query to finish before returning a response. Response times may be longer for large or complex queries.
Note
The result set size must be smaller than the value of `mcp.query.max-result-size`. If the result exceeds this limit, the query fails with an error.
Input format#
The following example shows a request used by an agent that retrieves five rows from the `tpch.tiny.nation` table:

```json
{
  "query": "SELECT name, regionkey FROM tpch.tiny.nation ORDER BY name LIMIT 5"
}
```
The tool accepts the following input parameter:

| Parameter | Type | Description |
|---|---|---|
| `query` | `string` | The SQL query to execute. This is a required property. |
Response format#
The MCP server returns structured JSON similar to the following example, continuing the query of the `tpch.tiny.nation` table.

```json
{
  "queryId": "1234abcdefg",
  "columns": [
    {"columnName": "name", "columnType": "varchar"},
    {"columnName": "regionkey", "columnType": "bigint"}
  ],
  "rows": [
    ["ALGERIA", "0"],
    ["ARGENTINA", "1"],
    ["BRAZIL", "1"]
  ]
}
```
| Attribute | Type | Description |
|---|---|---|
| `queryId` | `string` | Unique identifier for the executed query. |
| `columns` | `array` | List of column metadata objects. |
| `columnName` | `string` | Name of the column. |
| `columnType` | `string` | Data type of the column. For example, `varchar`. |
| `rows` | `array` | Array of result rows. Each row is an array of stringified values. Null values are represented as `null`. |
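A client can combine `columns` and `rows` into per-row records. A minimal sketch, assuming the response shape shown above:

```python
import json

# Sample response in the documented shape (values are stringified).
response = json.loads("""
{
  "queryId": "1234abcdefg",
  "columns": [
    {"columnName": "name", "columnType": "varchar"},
    {"columnName": "regionkey", "columnType": "bigint"}
  ],
  "rows": [["ALGERIA", "0"], ["ARGENTINA", "1"]]
}
""")

# Pair each row with the column names to get dictionaries per record.
names = [c["columnName"] for c in response["columns"]]
records = [dict(zip(names, row)) for row in response["rows"]]
```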
Unsupported SQL operations#
The query tool supports only read-only SQL queries. It rejects any query that contains the following operations:
- `INSERT`, `UPDATE`, `DELETE`, `MERGE`, and `TRUNCATE` statements
- `GRANT` and `REVOKE` statements
- `CREATE`, `ALTER`, or `DROP` statements for tables, schemas, or catalogs
Limitations#
- Result size limit: Query results are capped by the `mcp.query.max-result-size` configuration property. Queries returning results larger than this limit fail. Instruct agents to use `LIMIT` clauses or filters to reduce result sizes for large datasets.
- Queries follow standard Starburst query timeout configurations.
- Queries are subject to resource management rules and terminate if they exceed configured limits.
Searching and retrieving data products#
The server provides data product search and retrieval tools that let AI agents and LLM applications search and investigate data products configured in your cluster. The server’s instructions also encourage LLMs to look for relevant data products if a user’s request is ambiguous or could be answered by a data product. These tools return structured results in JSON format.
Search data products#
The search tool lets agents perform keyword searches for data products based on
a search term. It can search for terms in the data product name, summary, and
description, as well as any views or materialized views included in the data
product. The tool’s name in the MCP server is `searchDataProducts`.
Input format#
The tool accepts the following input parameter:
| Parameter | Type | Description |
|---|---|---|
| `query` | `string` | The search term. |
The following example shows a request to perform a fuzzy search for data products related to the term “finance”:
```json
{
  "query": "finance"
}
```
Response format#
The tool returns the following output parameters:
| Attribute | Type | Description |
|---|---|---|
| `dataProductName` | `string` | Name of the data product. |
| `dataProductId` | `string` | The data product ID. |
| `catalogName` | `string` | Name of the catalog. Omitted if not set in the data product. |
| `schemaName` | `string` | Name of the schema. Omitted if not set in the data product. |
| `summary` | `string` | Summary of the data product. |
| `description` | `string` | Description of the data product. |
| `createdBy` | `string` | The user who created this data product. |
| `createdAt` | `string` | A timestamp of when the data product was created. |
| `updatedAt` | `string` | A timestamp of when the data product was last updated. |
| `status` | `string` | The status of the data product. For example, `PUBLISHED`. |
The MCP server returns structured JSON similar to the following example.
```json
{
  "dataProductName": "1234abcdefg",
  "dataProductId": "550e8400-e29b-41d4-a716-446655440000",
  "catalogName": "sample",
  "schemaName": "mybankschema",
  "summary": "Data product containing financial data for analysis",
  "description": "This data product supports analytics related to finance.",
  "createdBy": "dataproductadmin@example.com",
  "createdAt": "2024-01-01T12:00:00Z",
  "updatedAt": "2024-01-15T15:30:00Z",
  "status": "PUBLISHED"
}
```
Retrieve data product details#
The MCP server exposes a tool for retrieving all relevant details about a specific data product, including all its views and materialized views. This tool lets agents see schemas, column information, and other metadata about the data product so they can generate accurate SQL queries. The tool’s name in the MCP server is `getDataProductDetails`.
Input format#
The tool accepts the following input parameter:
| Parameter | Type | Description |
|---|---|---|
| `dataProductId` | `string` | The data product ID to retrieve. This is a required property. |
To retrieve the data product details, the agent sends a request with the data product identifier. The following example shows a request submitted by an agent to retrieve details for a data product:
```json
{
  "dataProductId": "550e8400-e29b-41d4-a716-446655440000"
}
```
Response format#
The MCP server returns structured JSON similar to the following example:
```json
{
  "dataProductName": "1234abcdefg",
  "dataProductId": "550e8400-e29b-41d4-a716-446655440000",
  "catalogName": "sample",
  "schemaName": "mybankschema",
  "summary": "Data product containing financial data for analysis",
  "description": "This data product supports analytics related to finance.",
  "createdBy": "dataproductadmin@example.com",
  "createdAt": "2024-01-01T12:00:00Z",
  "updatedAt": "2024-01-15T15:30:00Z",
  "synchronizedAt": "2024-01-20T10:00:00Z",
  "status": "PUBLISHED",
  "views": [
    {
      "name": "finance_view",
      "description": "A view over the finance_table",
      "createdBy": "dataproductadmin@example.com",
      "status": "PUBLISHED",
      "columns": [
        {
          "name": "id",
          "type": "bigint",
          "description": "Unique identifier for each record"
        },
        {
          "name": "amount",
          "type": "double",
          "description": "Financial amount associated with the record"
        },
        {
          "name": "date",
          "type": "date",
          "description": "Date of the financial record"
        }
      ],
      "markedForDeletion": "false",
      "createdAt": "2024-01-10T09:00:00Z",
      "updatedAt": "2024-01-15T15:30:00Z",
      "updatedBy": "dataproductadmin@example.com",
      "publishedAt": "2024-01-20T10:00:00Z",
      "publishedBy": "dataproductadmin@example.com",
      "matchesTrinoDefinition": true,
      "definitionQuery": "SELECT * FROM sample.mybankschema.finance_table",
      "viewSecurityModel": "INVOKER"
    }
  ],
  "materializedViews": [
    {
      "name": "finance_materialized_view",
      "description": "A materialized view over the finance_table",
      "createdBy": "dataproductadmin@example.com",
      "status": "PUBLISHED",
      "columns": [
        {
          "name": "id",
          "type": "bigint",
          "description": "Unique identifier for each record"
        },
        {
          "name": "amount",
          "type": "double",
          "description": "Financial amount associated with the record"
        },
        {
          "name": "date",
          "type": "date",
          "description": "Date of the financial record"
        }
      ],
      "markedForDeletion": "false",
      "createdAt": "2024-01-10T09:00:00Z",
      "updatedAt": "2024-01-15T15:30:00Z",
      "updatedBy": "dataproductadmin@example.com",
      "publishedAt": "2024-01-20T10:00:00Z",
      "publishedBy": "dataproductadmin@example.com",
      "matchesTrinoDefinition": true,
      "definitionQuery": "SELECT * FROM sample.mybankschema.finance_table",
      "definitionProperties": {
        "refreshInterval": "24h"
      }
    }
  ]
}
```
| Attribute | Type | Description |
|---|---|---|
| `dataProductName` | `string` | Name of the data product. |
| `dataProductId` | `string` | The data product ID. |
| `catalogName` | `string` | Name of the catalog. Omitted if not set in the data product. |
| `schemaName` | `string` | Name of the schema. Omitted if not set in the data product. |
| `summary` | `string` | Summary of the data product. |
| `description` | `string` | Description of the data product. |
| `createdBy` | `string` | The user who created the data product. |
| `createdAt` | `string` | A timestamp of when the data product was created. |
| `updatedAt` | `string` | A timestamp of when the data product was last updated. |
| `status` | `string` | The status of the data product. For example, `PUBLISHED`. |
| `synchronizedAt` | `string` | An optional timestamp of when the data product was last synchronized with the underlying data sources. |
| `views` | `array` | A list of the views included in this data product, with details about each view such as name, description, columns, and status. |
| `materializedViews` | `array` | A list of the materialized views included in this data product, with details about each materialized view such as name, description, columns, and status. |
Parameterized queries#
The server provides two parameterized query tools: `listParametrizedQueryTools`, to check for existing pre-defined templates, and `parametrizedQuery`, to execute the queries. These tools let administrators define pre-approved SQL templates in YAML and allow AI agents to safely execute them by supplying only parameter values. This provides a controlled way to expose business logic while preventing SQL injection and ensuring parameter validation, making the tools useful for governed, reusable analytics workflows rather than arbitrary ad-hoc SQL written by the AI agent.
Configuration#
To enable the parameterized query tools, add the following properties to your coordinator configuration file:
```text
mcp.parametrized-tools.enabled=true
mcp.parametrized-tools.definition-paths=/etc/trino/parametrized-tools/
```
Multiple tool definition paths can be specified as a comma-separated list of paths to tool definition files and/or directories. Files must use the `.yaml` or `.yml` extension. All tools are loaded and validated at server startup. Any change to or addition of a definition requires a restart.
A parameterized tool is defined by a YAML file that specifies the following properties:
- `name` – Unique identifier of the tool. This name is used by AI agents to select and invoke the tool.
- `description` – A natural-language description for AI agents.
- `tables` (optional) – Static table resources used by the tool's queries. Each entry defines a logical table name. These resources are referenced from `templateLogic`, for example `{{ customers }}` instead of hard-coding physical table names.
- `parameters` – Typed input parameters for query behavior. Each parameter declares a logical type, SQL type, optional default value, and validation constraints.
- `templateLogic` – A Pebble SQL template that produces the final rendered parameterized SQL statement at runtime. See Pebble Template syntax for more information and syntax.
```yaml
name: tool_name                  # Unique identifier for the tool
description: Tool description    # Human-readable description for AI agents
tables:                          # Optional: static table references
  - name: alias_name
    tableName: catalog.schema.table
parameters:                      # List of parameters
  - name: param_name
    type: CONSTANT               # Parameter type (see below)
    description: Parameter description
    required: true               # Whether the parameter is required
    sqlType: VARCHAR             # Required for CONSTANT and CONSTANT_LIST
    defaultValue: default        # Required for optional parameters
    constraints:                 # Optional: validation constraints
      min: 0
      max: 100
templateLogic: |                 # Pebble template for the SQL query
  SELECT * FROM {{ alias_name }}
  WHERE column = {{ param_name }}
```
At runtime, an AI agent calls a parameterized tool by name and supplies parameters as JSON. The server validates the input, resolves schemas, safely renders SQL, enforces that the final statement is read‑only, and then executes it.
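The flow just described can be sketched in miniature. This Python model is illustrative only, not Starburst's implementation; it simplifies rendering to plain placeholder substitution with `?` bind markers:

```python
def render(template: str, declarations: dict, supplied: dict):
    """Validate supplied parameters and render a parameterized SQL string.

    declarations maps parameter name -> {"required": bool, "defaultValue": ...};
    supplied maps parameter name -> value. Returns (sql, bind_values).
    """
    values = {}
    for name, decl in declarations.items():
        if name in supplied:
            values[name] = supplied[name]
        elif decl.get("required", False):
            raise ValueError(f"missing required parameter: {name}")
        else:
            values[name] = decl["defaultValue"]

    binds = []
    sql = template
    for name, value in values.items():
        # Each placeholder becomes a bind marker; the value travels separately,
        # which is what prevents SQL injection.
        sql = sql.replace("{{ " + name + " }}", "?")
        binds.append(value)
    return sql, binds

sql, binds = render(
    "SELECT * FROM t WHERE x = {{ x }}",
    {"x": {"required": True}},
    {"x": 42},
)
```

The real server additionally resolves schemas, applies constraints, and verifies that the rendered statement is read-only before execution.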
For complete usage examples, see Parameterized query tools usage examples.
Static table resources#
Static table resources configure the tables and relationships that the tool's queries use. They are defined once in the YAML definition and reused by `templateLogic`, instead of being rebuilt inside each template. The `tableName` property value must be fully qualified as `catalog.schema.table`. Quoted identifiers are supported for names containing special characters. For example:
```yaml
tables:
  - name: customers
    tableName: tpch.tiny.customer
  - name: orders
    tableName: tpch.tiny.orders
templateLogic: |
  SELECT c.*, o.totalprice
  FROM {{ customers }} c
  JOIN {{ orders }} o ON c.custkey = o.custkey
```
Table resource names must not conflict with parameter names. This is validated at startup.
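As an illustrative sketch, that startup conflict check amounts to a set intersection:

```python
def check_no_conflicts(table_names, parameter_names):
    """Fail startup-style validation if a table resource shares a name with a parameter."""
    conflicts = set(table_names) & set(parameter_names)
    if conflicts:
        raise ValueError(f"table resources conflict with parameters: {sorted(conflicts)}")

check_no_conflicts({"customers", "orders"}, {"min_balance"})  # passes silently
```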
Parameter types#
Each parameter has a parameter type (`type`) that defines how it is rendered into SQL and which validations apply.
| Parameter type | Description |
|---|---|
| `CONSTANT` | A single scalar value (string, number, or boolean) injected as a bind variable (`?`). |
| `CONSTANT_LIST` | List of scalar values of the same type, each becoming its own bind variable (`?, ?, ...`). |
| `COLUMN_REF` | Reference to a column name, auto-quoted for safety. |
| `TABLE_REF` | Reference to a fully qualified table name, auto-quoted. |
| `QUERY` | Sub-query that must be a valid `SELECT` statement. |
| `FUNCTION_REF` | SQL function name rendered as-is. Only allows valid unquoted SQL identifiers (alphanumeric characters and underscores). Can be constrained with function signature constraints. |
| `ALIAS` | SQL alias for columns, CTEs, or other identifiers in output position. Must be a valid unquoted SQL identifier (alphanumeric characters and underscores). Rendered as-is, not quoted. Used to control output column names safely. |
| `EXPRESSION` | Raw SQL expression rendered as-is after parser validation. Must parse as a valid SQL expression. |
| `STRUCT` | A composite parameter containing named, typed fields. Rendered as a map in the Pebble context and accessed as `{{ param.field_name }}`. |
| `STRUCT_LIST` | List of composite parameters with the same schema; iterable with `{% for %}`. |
The following parameters show how `CONSTANT` and `CONSTANT_LIST` values are used in a template and rendered to bind variables:
```yaml
parameters:
  - name: min_balance
    type: CONSTANT
    description: Minimum account balance filter
    sqlType: DOUBLE
    required: true
    constraints:
      min: 0
  - name: segments
    type: CONSTANT_LIST
    description: Market segments to include in analysis
    sqlType: VARCHAR
    required: true
    constraints:
      minItems: 1
      maxItems: 5
      allowedValues: ["BUILDING", "AUTOMOBILE", "MACHINERY", "HOUSEHOLD", "FURNITURE"]
templateLogic: |
  SELECT *
  FROM tpch.tiny.customer
  WHERE mktsegment IN ({{ segments }})
    AND acctbal >= {{ min_balance }}
```
In the example, `min_balance` is a `CONSTANT` with `sqlType: DOUBLE` and a minimum value of 0. `segments` is a `CONSTANT_LIST` of `VARCHAR` values, with constraints on the number of items and `allowedValues`.
Invoked input:
```json
{
  "min_balance": 1000,
  "segments": ["BUILDING", "AUTOMOBILE"]
}
```
Rendered SQL:
```sql
EXECUTE IMMEDIATE '
SELECT *
FROM tpch.tiny.customer
WHERE mktsegment IN (?, ?)
  AND acctbal >= ?
' USING 'BUILDING', 'AUTOMOBILE', 1000
```
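The `CONSTANT_LIST` expansion shown above can be modeled with a small helper; this is a hypothetical sketch, not the server's code:

```python
def expand_list(values):
    """Render a CONSTANT_LIST as comma-separated bind markers plus bind values."""
    markers = ", ".join("?" for _ in values)
    return markers, list(values)

# Two segments expand to two markers; the values are passed separately.
markers, binds = expand_list(["BUILDING", "AUTOMOBILE"])
```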
The following table details supported `sqlType` values for the `CONSTANT` and `CONSTANT_LIST` parameter types.

| `sqlType` | Java type | SQL literal | Notes |
|---|---|---|---|
| `VARCHAR` | | | |
| `BIGINT` | | | |
| `DOUBLE` | | | Binary floating point. |
| `DECIMAL` | | `DECIMAL 'value'` | Exact decimal precision. |
| `BOOLEAN` | | | |
| `DATE` | | | |
| `TIMESTAMP` | | | |
Note
Use `DECIMAL` instead of `DOUBLE` when exact precision is required, for example for monetary values. `DOUBLE` uses binary floating point, which cannot exactly represent values like 0.1. Decimals are compiled to `DECIMAL 'value'` literals. Do not explicitly add the `DECIMAL` keyword in your template, to avoid producing invalid SQL such as `DECIMAL DECIMAL 'value'`.
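The same pitfall is easy to demonstrate in any language with binary floating point; for example, in Python:

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 exactly, so sums drift:
print(0.1 + 0.2 == 0.3)                                    # False
# Exact decimal arithmetic behaves as monetary math expects:
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))   # True
```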
Optional parameters#
Optional parameters (`required: false`) must have a `defaultValue`. This requirement is validated at startup and keeps templates simpler and more predictable.
Instead of using complex template conditionals, implement optional filtering using the following SQL patterns:
`LIKE` with wildcard default. Use `%` as a wildcard to disable filtering when no specific value is provided:
```yaml
parameters:
  - name: region_filter
    type: CONSTANT
    description: Filter by region (use % for all regions)
    sqlType: VARCHAR
    required: false
    defaultValue: "%"
    constraints:
      allowedValues: ["AMERICA", "EUROPE", "ASIA", "AFRICA", "%"]
templateLogic: |
  SELECT * FROM customers
  WHERE region LIKE {{ region_filter }}
```
Numeric range with extreme defaults. Use extreme minimum and maximum values to disable range filtering when the default is not overridden:
```yaml
parameters:
  - name: min_balance
    type: CONSTANT
    description: Minimum balance filter (0 for no minimum)
    sqlType: DOUBLE
    required: false
    defaultValue: 0
    constraints:
      min: 0
  - name: max_balance
    type: CONSTANT
    description: Maximum balance filter (use large value for no maximum)
    sqlType: DOUBLE
    required: false
    defaultValue: 999999999999
    constraints:
      min: 0
templateLogic: |
  SELECT * FROM accounts
  WHERE balance >= {{ min_balance }}
    AND balance <= {{ max_balance }}
```
`IN` clause with “all” option. Use a default that includes all allowed values until it is manually narrowed:
```yaml
parameters:
  - name: status_filter
    type: CONSTANT_LIST
    description: Filter by status (include all statuses for no filter)
    sqlType: VARCHAR
    required: false
    defaultValue: ["PENDING", "ACTIVE", "COMPLETED", "CANCELLED"]
    constraints:
      minItems: 1
      allowedValues: ["PENDING", "ACTIVE", "COMPLETED", "CANCELLED"]
templateLogic: |
  SELECT * FROM orders
  WHERE status IN ({{ status_filter }})
```
Field types#
Field types describe the types of the named fields inside `STRUCT` and `STRUCT_LIST` parameters. They are used to define and validate each field within these composite parameters, and they mirror the parameter type system so that fields use the same kinds of types as top-level parameters.

Field types do not support nesting or subqueries: `STRUCT`, `STRUCT_LIST`, and `QUERY` are not available as field types, so fields cannot contain nested structs or subquery fields.
| Field type | Description | Requires |
|---|---|---|
| `CONSTANT` | Bind variable placeholder. | `sqlType` |
| `CONSTANT_LIST` | Comma-separated bind variable placeholders. | `sqlType` |
| `COLUMN_REF` | Auto-quoted identifier, validated against the parent schema. | |
| `TABLE_REF` | Auto-quoted qualified name, validated via `information_schema`. | |
| `FUNCTION_REF` | Valid SQL identifier (alphanumeric and underscores). | |
| `ALIAS` | Valid SQL identifier (alphanumeric and underscores); rendered as-is. | |
| `EXPRESSION` | SQL-parser-validated expression, rendered as-is. | |
Constraints#
Constraints validate parameter values before query execution. The system
enforces that constraints are only applied to compatible parameter types. Using
a constraint on the wrong type causes a startup error. Define constraints in an optional `constraints` block under `parameters`. The tools offer the following types of constraints.
Value constraints#
Value constraints restrict what values are allowed for a parameter or field. They apply to many parameter and field types, including scalar constants, list elements, identifiers, and some reference types.
| Constraint | Applies to parameter type | Description |
|---|---|---|
| `allowedValues` | | List of allowed values. The value must match one of the listed values. |
| `min` | | Minimum allowed numeric value. |
| `max` | | Maximum allowed numeric value. |
| `pattern` | | Regular expression that the value must match. |
| `maxLength` | | Maximum number of characters allowed in the value. |
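As a sketch of how such value constraints compose (illustrative only; the constraint names follow the table above and are not the server's implementation):

```python
import re

def check_value(value, constraints: dict):
    """Apply value constraints; raise ValueError on the first violation."""
    allowed = constraints.get("allowedValues")
    if allowed is not None and value not in allowed:
        raise ValueError(f"{value!r} not in allowedValues")
    if "min" in constraints and value < constraints["min"]:
        raise ValueError(f"{value!r} below min")
    if "max" in constraints and value > constraints["max"]:
        raise ValueError(f"{value!r} above max")
    pattern = constraints.get("pattern")
    if pattern is not None and not re.fullmatch(pattern, value):
        raise ValueError(f"{value!r} does not match pattern")
    return value

check_value(50, {"min": 0, "max": 100})                       # passes
check_value("EUROPE", {"allowedValues": ["EUROPE", "ASIA"]})  # passes
```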
Collection constraints#
Collection constraints apply to list types and control the size of the collection, regardless of the element type.
| Constraint | Applies to parameter type | Description |
|---|---|---|
| `minItems` | `CONSTANT_LIST`, `STRUCT_LIST` | Minimum number of items required in the list. |
| `maxItems` | `CONSTANT_LIST`, `STRUCT_LIST` | Maximum number of items allowed in the list. |
Schema constraints#
The `requiredColumns` constraint applies to table and query types and ensures that required columns are available. This is useful when your template uses `SELECT *` or references specific columns from a dynamic source.
If the provided table or query doesn’t include the required columns, validation fails with an error message that lists the missing columns.
| Constraint | Applies to parameter type | Description |
|---|---|---|
| `requiredColumns` | Table and query references | List of column names that must exist in the referenced table or query result. |
Function signature constraints#
Function signature constraints apply to function reference types and validate the function’s shape against the database’s function catalog.
| Constraint | Applies to parameter type | Description |
|---|---|---|
| `returnType` | `FUNCTION_REF` | Expected SQL return type for the function. |
| | `FUNCTION_REF` | Expected SQL types for the function's arguments, as a list. |
| | `FUNCTION_REF` | Expected function category. For example, `aggregate`. |
Function signature constraints are validated at runtime using `SHOW FUNCTIONS LIKE 'name'`. If the function has multiple overloads, validation passes if at least one overload matches all specified constraints. Matching is case-insensitive.
When validation fails, the error message includes all available overloads so that you can see which signatures exist, for example:

```text
Function 'sum' for parameter 'agg_func' does not match constraints: returnType=varchar.
Available overloads: (bigint) -> bigint [aggregate], (double) -> double [aggregate], ...
```
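Overload matching can be pictured as follows; this Python sketch is illustrative, with case-insensitive comparison and pass-if-any-overload-matches semantics as described above:

```python
def matches(overloads, return_type=None, argument_types=None):
    """Return True if at least one (return, args) overload satisfies all constraints."""
    for ret, args in overloads:
        if return_type is not None and ret.lower() != return_type.lower():
            continue
        if argument_types is not None and \
                [a.lower() for a in args] != [a.lower() for a in argument_types]:
            continue
        return True  # this overload satisfies every specified constraint
    return False

# Overloads for sum, as shown in the error message above.
sum_overloads = [("bigint", ["bigint"]), ("double", ["double"])]
print(matches(sum_overloads, return_type="DOUBLE"))   # True
print(matches(sum_overloads, return_type="varchar"))  # False
```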
Pebble Template syntax#
These tools use `templateLogic`, which relies on the Pebble template engine for SQL generation. Pebble supports variables, conditionals, loops, macros, and filters.
Variables#
Use double braces `{{ ... }}` to insert parameter values and static table resources into the SQL template:

```sql
{{ param_name }}        -- top-level parameter
{{ struct.field_name }} -- struct field
{{ item.field_name }}   -- inside a for loop
```
Conditionals#
Use `{% if %}` blocks to include or omit parts of the query based on parameter values:

```sql
{% if sample_limit > 0 %}
LIMIT {{ sample_limit }}
{% endif %}
```
Loops#
Use `{% for %}` loops to iterate over `STRUCT_LIST` parameters:

```sql
{% for item in populations %}
SELECT {{ item.label }} AS population FROM {{ item.source_table }}
{% if not loop.last %}UNION ALL{% endif %}
{% endfor %}
```
The loop variable provides `loop.index`, a 0-based iteration index; `loop.first`, which is true on the first iteration; `loop.last`, which is true on the last iteration; and `loop.length`, the total number of items.
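For intuition, the `UNION ALL` pattern above behaves like this plain Python loop; the labels and table names are hypothetical:

```python
# Hypothetical STRUCT_LIST input, mirroring the Pebble example.
populations = [
    {"label": "'adults'", "source_table": "demo.people.adults"},
    {"label": "'children'", "source_table": "demo.people.children"},
]

parts = []
for i, item in enumerate(populations):
    parts.append(f"SELECT {item['label']} AS population FROM {item['source_table']}")
    if i != len(populations) - 1:  # Pebble: {% if not loop.last %}
        parts.append("UNION ALL")
sql = "\n".join(parts)
```

The connector is emitted between elements but never after the last one, which is exactly what `loop.last` guards against.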
Filters#
Pebble supports filters (`value | filter`) for simple, inline transformations.

```sql
{{ func_name | lower }} -- lowercase a string
{{ func_name | upper }} -- uppercase a string
```
Macros#
Use `{% macro %}` to define reusable SQL fragments and call them from `templateLogic`:

```sql
{% macro pct(condition, alias) %}
CAST(SUM(CASE WHEN {{ condition }} THEN 1 ELSE 0 END) AS DECIMAL(38,5))
  / COUNT(id) * 100.0 AS {{ alias }}
{% endmacro %}

SELECT
  population,
  {{ pct("status = 'active'", "pct_active") }},
  {{ pct("status = 'inactive'", "pct_inactive") }}
FROM base
GROUP BY population
```
Macros help keep complex expressions in one place and avoid repeating long `CASE` or aggregate expressions across the template. See Customer profile analysis with `STRUCT_LIST` and Pebble macros.
Troubleshooting#
Pebble resolves dotted attribute access left-to-right. When evaluating `{{ pop.source_table }}`, it first resolves `pop` to a map, then looks up `source_table` as an attribute of that map. If the lookup fails, the error message only reports the immediate attribute name and the object type, not the full expression path. For example:

```text
Attribute [source_table] of [com.google.common.collect.RegularImmutableMap] does not exist
```

This means `source_table` was not found as a key in the struct map. The `RegularImmutableMap` type indicates a struct field lookup; a top-level variable miss would reference the root scope instead.
Considerations#
The parameterized query tools are designed to make tool definitions safe by default. However, keep the following considerations in mind:

- `EXPRESSION` parameter type values are validated by the Trino SQL parser to ensure they are syntactically valid SQL expressions, but they are not restricted to a whitelist. Use `CONSTANT`, `FUNCTION_REF`, or `ALIAS` types where possible for tighter control.
- Query-only enforcement: The final rendered SQL must parse as a valid SELECT/query statement. DDL, DML, and other non-query statements are rejected by the QueryVerifier stage.
- Bind marker protection: The rendering pipeline uses internal marker tokens that are replaced with `?` after template rendering. If a parameter value contains the marker pattern, it is rejected to prevent bind variable confusion.
- Schema validation: Column and table references are validated against actual database schemas. Function references can be validated against the catalog's function registry.
Starburst AI Agent via MCP#
This tool exposes the Starburst AI Agent through MCP. It lets an authenticated MCP client open a stateful conversational AI agent session within a specific data product and interact with that agent using a configured model alias.
To use the AI Agent over MCP, you need:

- Data products enabled.
- Access to at least one configured AI language model. The agent uses the model alias when invoked through MCP.
Configuration#
Add the following properties to your coordinator configuration file.
| Property name | Description | Default |
|---|---|---|
| `mcp.tool.data-product-agent.concurrent-request-timeout` | Sets the maximum duration a request waits to start if another request from the same client session is already running for the same data product. If the request does not start within this time, it fails. | |
| | Specifies the AI model alias used when invoking the data product AI agent through MCP. | |
The agent runs with the identity configured in the MCP connection. All actions the agent performs are subject to the privileges and access controls of that identity.
Session history#
A data product agent session through MCP is created the first time a client sends a request during an MCP session for a specific data product. The session remains associated with that MCP session and data product for the lifetime of the MCP connection. To start a new session for the same client and data product, the MCP client must reconnect and establish a new MCP session.
Agent sessions initiated through MCP appear in the data product agent session history. However, MCP sessions are read-only in the UI.
Concurrent requests that target the same agent session are serialized so that only one request runs at a time. Additional requests are queued and wait up to the duration configured by `mcp.tool.data-product-agent.concurrent-request-timeout`. If a request does not start within this time, it fails.
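This behavior resembles a per-session lock with a bounded wait. A minimal Python sketch, with an explicit timeout argument standing in for the configured property:

```python
import threading

session_lock = threading.Lock()  # one lock per agent session

def run_request(work, timeout_seconds: float):
    """Run work() under the session lock, failing if the lock is not acquired in time."""
    if not session_lock.acquire(timeout=timeout_seconds):
        raise TimeoutError("request did not start within the configured timeout")
    try:
        return work()
    finally:
        session_lock.release()

result = run_request(lambda: "agent response", timeout_seconds=1.0)
```

A second request issued while the first holds the lock waits up to its timeout, then fails, mirroring the serialization described above.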