Starburst AI Agent#

The Starburst AI Agent is a chatbot that helps you analyze data by converting natural language questions into SQL queries and analyzing their results. AI Agent can generate and execute SQL queries and provide metadata about available datasets.

Requirements#

To use AI Agent, you need:

  • A valid AI workflows license.

  • A valid Agentic layer license.

  • Access to at least one configured AI language model.

Considerations#

  • AI Agent sessions can be stored in coordinator memory or the Insights database. Each session supports one SQL query. The agent can answer as many questions as you need about the data produced by the query. The agent can respond to both data-related and metadata-related questions. For example, you can ask questions such as, Which columns are related to customer behavior?

  • AI Agent is conversational. You can reference earlier questions and answers in the same session. For example, after asking which columns are related to customer behavior, you may follow up with, Show trends in customer spend over the last two years, grouping by the customer behavior dimensions identified in your previous answer.

  • Data product enrichment affects AI Agent’s response quality. For example, the more detailed a data product’s metadata is, the higher the quality of the agent’s responses is likely to be.

  • AI Agent can now use tools and run multiple queries as needed to analyze data. The agent executes these steps automatically and does not request prior approval. You cannot edit a query produced by AI Agent.

Configuration#

To configure AI Agent, add the following property to your coordinator configuration file:

starburst.agent.enabled=true

General configuration properties#

The following table contains general configuration properties for the AI agent. Add relevant properties to the coordinator configuration file.

Property name

Description

Default

ai.agent.allowed-models-regex

Specifies which model IDs the AI agent can use, defined as regular expression patterns. By default, the AI agent allows all Claude 4, GPT-5, Gemini 3, Gemini 2.5, Opus, and Qwen 3 model groups:

  • (.*claude.*4.*)|(.*gpt-5.*)|(.*gemini-3.*)|(.*gemini-2.5.*)|(.*opus.*)|(.*qwen3.*)

Caution

This property bypasses model validation. Use it only to test custom or on-premises models when you understand and accept the risks of potentially unreliable agent behavior. Starburst values your feedback on any issues encountered with unvalidated models. For more information, see Model validation.

ai.agent.compaction.batch.max-messages

Specifies the maximum number of messages included in a single session compaction batch. The minimum value is 2.

20

ai.agent.compaction.batch.min-messages

Specifies the minimum number of messages required before session compaction is triggered. The minimum value is 1.

4

ai.agent.compaction.batch.threshold-chars

Specifies the character count threshold that triggers session compaction. The minimum value is 1000.

10000

ai.agent.compaction.min-unsummarized-recent-messages

Specifies the number of most recent messages at the end of the conversation that are not summarized during session compaction. The minimum value is 1.

8

ai.agent.data-profile.enabled

Enables data profiling. When set to true, the agent collects sample rows and column statistics from data product datasets and embeds them into language model prompts during SQL generation. For more information, see Data profiling.

false

ai.agent.max-query-run-time

Sets the maximum duration a query generated by the AI agent is allowed to run. If the runtime exceeds the set limit, the query is canceled.

Warning

When this property is set, users must have permission to configure the query_max_run_time session property or the query fails.

ai.agent.max-result-set-size

Sets the maximum size of a result set that the agent can return. The agent truncates results that exceed the set limit. Valid values range from 1KB to 2MB.

128KB

ai.agent.persona-directory-path

Specifies the path to a directory containing custom persona prompt JSON files. Each file must contain name, description, and prompt keys. For more information, read Custom persona prompts. This property is optional.

ai.agent.schema-link.candidate-timeout

Sets the maximum wait time per SQL generation attempt. This applies to each model call individually. Set it high enough to accommodate your slowest configured model. The minimum value is 1s.

1m

ai.agent.schema-link.max-request-concurrency

Sets the maximum number of concurrent LLM requests allowed for SQL generation across all users and sessions. Set this high enough to avoid request queueing under expected load. When not set, concurrency is unbounded. The minimum value is 1.

ai.agent.schema-link.max-sample-queries

Specifies the maximum number of sample queries included in the SQL generation prompt. The minimum value is 1.

5

ai.agent.schema-link.model-ids

Comma-separated list of model IDs used for SQL generation. For each model, the agent generates the number of SQL expressions set by ai.agent.schema-link.samples-per-model. A diverse set of models can improve accuracy and reduce bias. When not set, the agent uses the session model.

ai.agent.schema-link.samples-per-model

Specifies the number of SQL expressions generated per model. The minimum value is 1.

3

ai.agent.session-compaction-enabled

Enables AI-based session compaction. When enabled, older segments of the conversation history are summarized to control LLM context size while preserving key information needed for context.

true

ai.agent.use-extracted-tool-call

Some LLMs do not format tool calls properly and return them inline as part of the assistant response. When this property is enabled and the model returns inline tool calls in its response, the agent attempts to extract the tool calls from the response and execute them. When this property is set to false and an inline tool call is detected, the agent requests a new response from the model.

true

starburst.agent.enabled

Enables or disables the AI Agent. Disabling the AI Agent also disables AI features that rely on it.

false
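
Several of the integer-valued properties above document a minimum value. As a rough illustration only, the following Python sketch checks a coordinator configuration fragment against those documented minimums before you deploy it. The helper names parse_properties and check_minimums are hypothetical and are not part of any Starburst tooling, and the parsing handles only simple key=value lines.

```python
# Documented minimums for integer-valued properties, copied from the table above.
MINIMUMS = {
    "ai.agent.compaction.batch.max-messages": 2,
    "ai.agent.compaction.batch.min-messages": 1,
    "ai.agent.compaction.batch.threshold-chars": 1000,
    "ai.agent.compaction.min-unsummarized-recent-messages": 1,
    "ai.agent.schema-link.max-request-concurrency": 1,
    "ai.agent.schema-link.max-sample-queries": 1,
    "ai.agent.schema-link.samples-per-model": 1,
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_minimums(props):
    """Return a list of problems; an empty list means no violations were found."""
    problems = []
    for key, minimum in MINIMUMS.items():
        if key not in props:
            continue  # unset properties fall back to their defaults
        try:
            value = int(props[key])
        except ValueError:
            problems.append(f"{key}: '{props[key]}' is not an integer")
            continue
        if value < minimum:
            problems.append(f"{key}: {value} is below the minimum of {minimum}")
    return problems


# Example fragment; the max-messages value is deliberately too low.
example = """
starburst.agent.enabled=true
ai.agent.compaction.batch.max-messages=1
ai.agent.compaction.batch.threshold-chars=10000
ai.agent.schema-link.samples-per-model=3
"""

problems = check_minimums(parse_properties(example))
for problem in problems:
    print(problem)
```

Run against the example fragment, the sketch reports that ai.agent.compaction.batch.max-messages is below its minimum of 2. Duration and data size properties, such as ai.agent.schema-link.candidate-timeout and ai.agent.max-result-set-size, are intentionally left out to keep the parsing simple.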

Model validation#

Before deployment, all models are tested for their ability to follow complex instructions and use tools. This process has three possible outcomes:

  • Fully approved: Models that can follow complex instructions requiring tool use.

  • Conditionally available: Models that use tools with simple instructions but fail complex instruction tests. These appear with a yellow indicator in the agent model selector.

  • Unavailable: Models that fail both tests and cannot be used with the agent.

By default, the ai.agent.allowed-models-regex property uses patterns that match only fully approved and conditionally available models. You can configure this property to test custom or on-premises models that have not been validated. Unvalidated models may produce unreliable results. Use this property only when necessary and with the understanding that agent behavior may be unreliable. Starburst encourages you to provide feedback about any issues you encounter when using unvalidated models.
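
Before changing ai.agent.allowed-models-regex, it can help to check which model IDs a pattern would admit. The following Python sketch does this for the documented default pattern. It assumes the whole model ID must match the pattern, which this page does not confirm, and every model ID in it is a hypothetical example rather than a list of supported models.

```python
import re

# Default value of ai.agent.allowed-models-regex, copied from the table above.
DEFAULT_PATTERN = (
    r"(.*claude.*4.*)|(.*gpt-5.*)|(.*gemini-3.*)"
    r"|(.*gemini-2.5.*)|(.*opus.*)|(.*qwen3.*)"
)

# Hypothetical extension that also admits one custom model ID for testing.
EXTENDED_PATTERN = DEFAULT_PATTERN + r"|(my-onprem-model)"


def is_allowed(model_id, pattern=DEFAULT_PATTERN):
    """Return True if the entire model ID matches the pattern.

    Assumption: the agent applies the pattern to the whole ID (full match).
    """
    return re.fullmatch(pattern, model_id) is not None


# Hypothetical model IDs, for illustration only.
for model_id in ["claude-sonnet-4", "gpt-5-mini", "gemini-2.5-pro",
                 "llama-3-70b", "my-onprem-model"]:
    print(model_id, is_allowed(model_id), is_allowed(model_id, EXTENDED_PATTERN))
```

Because the default patterns are broad, they can match more IDs than the group names suggest. For example, any ID that contains claude followed later by the digit 4 matches the first alternative. Testing a pattern against your actual configured model IDs is a cheap way to catch this.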

Data profiling#

Data profiling can improve natural-language-to-SQL accuracy. When enabled, the agent collects sample rows and column statistics from data product datasets and embeds that information into the language model prompts it uses to generate SQL. Sample rows are collected for both views and materialized views. Column statistics are collected for materialized views only.

To enable data profiling, set the following property in your coordinator configuration file:

ai.agent.data-profile.enabled=true

The following table details additional configuration properties for data profiling.

Data profiling configuration properties#

Property name

Description

Default

ai.agent.data-profile.enabled

Enables data profiling. When set to true, the agent collects sample rows and column statistics from data product datasets and embeds them into language model prompts during SQL generation.

false

ai.agent.data-profile.sample-row-limit

Specifies the maximum number of rows fetched per dataset when collecting sample data. The minimum value is 1.

10

ai.agent.data-profile.query-timeout

Sets the maximum duration allowed for each data profiling query. When a profiling query exceeds this limit, the agent cancels it and excludes that dataset’s data from the profile. The minimum value is 1s.

30s

ai.agent.data-profile.max-tables

Specifies the maximum number of datasets included in the data profile for a given data product. When not set, all datasets in the data product are profiled. The minimum value is 1.

ai.agent.data-profile.cache-expiry

Sets the duration a data profile is retained in memory after it was last accessed. The minimum value is 1s.

1h

ai.agent.data-profile.max-cached-sessions

Sets the maximum number of data profiles held in memory at one time. When the cache reaches this limit, the least recently accessed entries are removed. The minimum value is 100.

1000
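
The cache-expiry and max-cached-sessions properties describe a cache that expires entries some time after their last access and evicts the least recently accessed entries when full. The following Python sketch illustrates that general behavior only. It is not Starburst's implementation, the ProfileCache class is hypothetical, and the tiny capacity is for demonstration (the documented minimum for max-cached-sessions is 100).

```python
import time
from collections import OrderedDict


class ProfileCache:
    """Illustrative cache with expire-after-access and least-recently-accessed eviction."""

    def __init__(self, max_entries, expiry_seconds, clock=time.monotonic):
        self._max_entries = max_entries
        self._expiry_seconds = expiry_seconds
        self._clock = clock
        self._entries = OrderedDict()  # key -> (value, last_access_time)

    def get(self, key):
        item = self._entries.get(key)
        if item is None:
            return None
        value, last_access = item
        now = self._clock()
        if now - last_access >= self._expiry_seconds:
            del self._entries[key]  # expired since its last access
            return None
        # Refresh the access time and mark the entry as most recently used.
        self._entries[key] = (value, now)
        self._entries.move_to_end(key)
        return value

    def put(self, key, value):
        self._entries[key] = (value, self._clock())
        self._entries.move_to_end(key)
        # Evict least recently accessed entries while over capacity.
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)


# Demonstration with a controllable clock instead of real time.
now = [0.0]
cache = ProfileCache(max_entries=2, expiry_seconds=3600, clock=lambda: now[0])
cache.put("sales", "profile-a")
cache.put("customers", "profile-b")
cache.get("sales")                # "customers" is now the least recently accessed
cache.put("orders", "profile-c")  # over capacity, so "customers" is evicted
print(cache.get("customers"))
```

In this sketch, reading an entry resets its expiry timer, which matches the wording that a profile is retained for a duration after it was last accessed.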

Resource group governance#

All data profiling queries are tagged with the Trino client tag agent-data-profiler. You can use this tag in resource group rules to route and limit profiling queries separately from user queries.

The following example shows a selector that routes profiling queries into a dedicated resource group:

{
  "selectors": [
    {
      "clientTags": ["agent-data-profiler"],
      "group": "profiling"
    }
  ]
}
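
A selector only routes queries. The target group also needs to be defined with its own limits. The following Python sketch builds a fuller resource group configuration around the selector above and prints it as JSON. It uses the standard Trino file-based resource group fields (rootGroups, softMemoryLimit, hardConcurrencyLimit, maxQueued). All limit values and the global catch-all group are placeholder assumptions to adapt to your cluster. The select_group helper is a simplified, hypothetical stand-in for selector matching that considers client tags only.

```python
import json

PROFILER_TAG = "agent-data-profiler"  # client tag named in the text above

config = {
    "rootGroups": [
        {
            # Dedicated group for profiling queries; limits are placeholders.
            "name": "profiling",
            "softMemoryLimit": "10%",
            "hardConcurrencyLimit": 2,
            "maxQueued": 20,
        },
        {
            # Assumed catch-all group for everything else; limits are placeholders.
            "name": "global",
            "softMemoryLimit": "80%",
            "hardConcurrencyLimit": 100,
            "maxQueued": 1000,
        },
    ],
    "selectors": [
        {"clientTags": [PROFILER_TAG], "group": "profiling"},
        {"group": "global"},
    ],
}


def select_group(client_tags, selectors):
    """Simplified matching: the first selector whose clientTags are all present wins."""
    tags = set(client_tags)
    for selector in selectors:
        required = set(selector.get("clientTags", []))
        if required <= tags:
            return selector["group"]
    return None


print(json.dumps(config, indent=2))
print(select_group([PROFILER_TAG], config["selectors"]))
print(select_group([], config["selectors"]))
```

Selector order matters because the first matching selector is used. Placing the profiling selector before the catch-all selector keeps profiling queries out of the general group.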

Using AI Agent#

To open the chat dialog and begin a session with AI Agent:

  1. Navigate to the Data products tab in the Starburst Enterprise web UI.

  2. Select an existing data product.

  3. Click the AI Agent icon (sparkle) at the bottom-right of the screen.

  4. In the chat interface:

    • Use the left drop-down menu to select a persona.

    • Use the right drop-down menu to select an AI model. If only one model is configured, it is preselected.

  5. Enter a question or prompt in the text area.

  6. Press Enter or click the send icon to submit.

Session history#

Use session history to review past AI Agent responses, including how different personas and AI models affected the answers. Session history shows the steps the AI Agent performed, including any tool calls and SQL queries executed during the conversation.

The Chats list is located on the left side of the AI Agent chat dialog. If chats are stored in coordinator memory, previous chats are not displayed.

Manage chats#

Each conversation appears in the Chats list.

  • To rename a chat, hover over it and click the options menu (more_vert), then select Rename.

  • To delete a chat, hover over it and click the options menu (more_vert), then select Delete.

  • Use the search bar to find previous chats by name.

The following table describes the icons used in the AI Agent chat dialog:

AI Agent chat dialog icons#

Icon

Description

content_copy

Copy the agent’s response to clipboard.

download_2

Download the agent’s response.

send

Submit a question.

close

Minimize the AI Agent chat dialog.

more_vert

Open the options menu.

add

Start a new chat.

search

Search bar.

edit

Rename a chat.

delete

Delete a chat.

Personas#

AI Agent supports three personas. Each persona tailors its responses to suit different user roles and goals. The following sections describe the Executive, Analyst, and Data engineer personas.

Executive#

Provides high-level summaries tailored to executives and decision-makers.

  • Focuses on business insights and trends

  • Omits technical detail unless explicitly requested

  • Presents concise bullet points for quick understanding

Analyst#

Offers detailed analytical summaries suitable for analysts and data scientists.

  • Includes statistical analysis and relationships in the data

  • Adds contextual information and potential implications

  • May include suggestions for further exploration

Data engineer#

Provides technical summaries tailored to engineers.

  • Focuses on structure, data quality, and metadata

  • Includes schema details, cardinality, and patterns

  • Highlights potential data issues or anomalies

Custom persona prompts#

You can customize the prompt that AI Agent uses for each persona by providing your own persona prompt files. This lets you tailor how each persona behaves, responds, and frames its output.

To configure custom persona prompts, set the following property in your coordinator configuration file:

ai.agent.persona-directory-path=/PATH/TO/PERSONA_PROMPTS

When this property is configured, the AI Agent loads persona definitions from the specified directory. If a persona file is not found in the directory, the AI Agent defaults to the corresponding built-in persona prompt.

The specified directory must contain JSON files for the personas you want to customize. The following file names are supported:

  • analyst.json

  • data_engineer.json

  • executive.json

Each JSON file requires the name, description, and prompt keys, and must use the following structure:

{
  "name": "Analyst",
  "description": "A description of the persona's role and purpose.",
  "prompt": "The system prompt that defines the persona's behavior."
}
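
A malformed persona file is easy to miss until the agent loads it. As a minimal sketch, the following Python code checks persona files against the rules stated above: a supported file name, valid JSON, and the name, description, and prompt keys. The function names are hypothetical, and the sketch makes no claim about how AI Agent itself validates or rejects these files.

```python
import json
from pathlib import Path

# File names and required keys as documented above.
SUPPORTED_FILES = {"analyst.json", "data_engineer.json", "executive.json"}
REQUIRED_KEYS = {"name", "description", "prompt"}


def validate_persona_text(file_name, text):
    """Return a list of problems for one persona file; empty means it looks fine."""
    problems = []
    if file_name not in SUPPORTED_FILES:
        problems.append(f"{file_name}: unsupported file name")
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return problems + [f"{file_name}: invalid JSON ({exc.msg})"]
    if not isinstance(data, dict):
        return problems + [f"{file_name}: top-level value must be an object"]
    missing = sorted(REQUIRED_KEYS - set(data))
    if missing:
        problems.append(f"{file_name}: missing keys {missing}")
    return problems


def validate_persona_directory(directory):
    """Check every .json file in the persona directory."""
    problems = []
    for path in sorted(Path(directory).glob("*.json")):
        text = path.read_text(encoding="utf-8")
        problems.extend(validate_persona_text(path.name, text))
    return problems


# Example: a persona file that is missing the prompt key.
incomplete = '{"name": "Analyst", "description": "Detailed analysis."}'
print(validate_persona_text("analyst.json", incomplete))
```

To check a real directory, call validate_persona_directory with the same path you set in ai.agent.persona-directory-path.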

To view the current prompt for the selected persona, including any custom overrides, click the rule icon next to the Persona drop-down menu in the AI Agent chat dialog.