AI functions#

AI functions let you invoke large language models (LLMs) to perform a variety of text-based tasks or construct retrieval-augmented generation (RAG) workflows in your data applications. You can perform vector searches using embeddings generated from data stored in your data lake, and then use those search results as input to prompt an LLM.

Note

Starburst Enterprise AI functions are available as a private preview. Contact your Starburst account team for further information.

The following functions are available in the ai schema of the catalog configured with the starburst_ai connector (named starburst in the following examples), and use a configured provider. Starburst Enterprise supports a number of task-specific, prompt, and embedding functions. Before using these functions, ensure that you have configured an AI model and provider.

Embedding functions#

The following function generates embeddings for input text using the given embedding model. Embeddings are numerical vector representations of text that can be used for semantic similarity and clustering tasks.

ai.generate_embedding(text, model)#

Generates the embedding for the given text.

SELECT ai.generate_embedding('Which chapter should I read to understand how to balance the weight of a Boeing 747?', 'bedrock_titan');
-- [ 0.0061195422895252705, 0.013783863745629787, ...]
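
Embeddings are typically generated once and stored alongside the source data so that they can be searched later. The following sketch stores an embedding for each chapter introduction; the table and column names are hypothetical, chosen to mirror the RAG example later on this page:

```sql
-- Precompute an embedding for each chapter introduction
-- (table and column names are hypothetical)
CREATE TABLE iceberg.example_schema.faa_book_chapters_embedded AS
SELECT
    book_title,
    chapter_number,
    chapter_title,
    chapter_intro,
    ai.generate_embedding(chapter_intro, 'bedrock_titan') AS chapter_intro_embeddings
FROM iceberg.example_schema.faa_book_chapters;
```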

Prompt functions#

The following function generates text or responses based on the input prompt using the given language model.

ai.prompt(text, model) varchar#

Generates text based on the input prompt.

SELECT ai.prompt('What is the capital of the USA? Only provide the name of the capital.', 'bedrock_claude35');
-- Washington, D.C.

Task functions#

The following functions perform specific tasks such as sentiment analysis, classification, grammar correction, masking, and translation using the given language model.

ai.analyze_sentiment(text, model) varchar#

Analyzes the sentiment of the input text.

The sentiment result is positive, negative, neutral, or mixed.

SELECT ai.analyze_sentiment('I love Starburst', 'bedrock_claude35');
-- positive

ai.classify(text, labels, model) varchar#

Classifies the input text according to the provided labels.

SELECT ai.classify('Buy now!', ARRAY['spam', 'not spam'], 'bedrock_claude35');
-- spam

ai.fix_grammar(text, model) varchar#

Corrects grammatical errors in the input text.

SELECT ai.fix_grammar('I are happy? What you doing.', 'bedrock_llama32-3b');
-- I am happy. What are you doing?

ai.mask(text, labels, model) varchar#

Masks the values for the provided labels in the input text by replacing them with the text [MASKED].

SELECT ai.mask(
    'Contact me at 555-1234 or visit us at 123 Main St.',
    ARRAY['phone', 'address'], 'openai_gpt-4o-mini');
-- Contact me at [MASKED] or visit us at [MASKED].

ai.translate(text, language, model) varchar#

Translates the input text to the specified language.

SELECT ai.translate('I like coffee', 'es', 'openai_gpt-4o-mini');
-- Me gusta el café

SELECT ai.translate('I like coffee', 'zh-TW', 'bedrock_claude35');
-- 我喜歡咖啡
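
Like the other AI functions, ai.translate can be applied to an entire column rather than a single literal. The following sketch assumes a hypothetical iceberg.example_schema.product_reviews table:

```sql
-- Translate stored review text to English
-- (table and column names are hypothetical)
SELECT
    review_id,
    ai.translate(review_text, 'en', 'openai_gpt-4o-mini') AS review_in_english
FROM iceberg.example_schema.product_reviews
LIMIT 10;
```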

Use cases#

You can combine vector search functions, Common Table Expressions (CTEs), and AI functions to build a RAG workflow that enhances your results.

The following example uses a CTE named vector_search that calls the vector search function cosine_similarity() to retrieve the five most similar chapters, and then asks the LLM to select the best one.

The map_agg function aggregates the results from the vector_search CTE into a map keyed by chapter number, which is then explicitly cast to JSON. The prompt function sends the question together with the JSON-formatted results to the LLM.

-- Retrieve the top 5 most relevant chapters based on semantic similarity
WITH vector_search AS (
    SELECT
        book_title,
        chapter_number,
        chapter_title,
        chapter_intro,
        cosine_similarity(
            ai.generate_embedding('Which chapter should I read to understand how to balance the weight of a Boeing 747?', 'bedrock_titan'),
            chapter_intro_embeddings) AS similarity_score
    FROM iceberg."example-schema".faa_book_chapters
    ORDER BY similarity_score DESC
    LIMIT 5
),

-- Augment the results by converting them into a JSON object
json_results AS (
    SELECT CAST(map_agg(chapter_number, json_object(
        KEY 'book title' VALUE book_title,
        KEY 'chapter title' VALUE chapter_title,
        KEY 'chapter intro' VALUE chapter_intro)) AS JSON) AS json_data
    FROM vector_search
)

-- Generate an augmented response using the LLM
SELECT
    ai.prompt(concat(
        'Which chapter should I read to understand how to balance the weight of a Boeing 747? Explain why. The list of chapters is provided in JSON:',
        json_format(json_data)), 'bedrock_claude35')
FROM json_results;

The output of the query:

Based on the provided JSON, I recommend reading Chapter 9: "Weight and Balance
Control - Commuter Category and Large Aircraft" for understanding how to balance
the weight of a Boeing 747.

Here's why:

1. The Boeing 747 is a large aircraft that exceeds 12,500 pounds in takeoff
   weight, which is specifically addressed in Chapter 9's introduction.

...

Vector search functions are often used for comparing the similarity or distance between embedding vectors, which is common for search and retrieval tasks. These functions can assist in determining how closely related two pieces of data are.
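
For example, you can compare two embeddings directly; scores closer to 1 indicate greater semantic similarity. The following sketch reuses the bedrock_titan embedding model from the earlier examples:

```sql
-- Compare the semantic similarity of two phrases
SELECT cosine_similarity(
    ai.generate_embedding('aircraft weight and balance', 'bedrock_titan'),
    ai.generate_embedding('how to load a cargo plane safely', 'bedrock_titan'))
    AS similarity_score;
```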


Prompt overrides#

Prompt overrides let you modify the system or function prompt for task-based functions, giving you control over how the language model behaves and responds. Instead of adjusting user messages, Starburst recommends that you define the desired behavior in a system prompt, also known as a developer prompt, rather than in the user message.

Example#

Suppose the default prompt for a classification function is verbose, and you want to classify an email as either "spam" or "not spam". In this example, the input value is the email content, such as "Buy now and save 50%!", and the classifiers refer to the available classifications, which are ["spam", "not spam"].

If the model outputs both a classification and an explanation, you can use a prompt override to refine the response.

Default function prompt:

Classify "Buy now and save 50%!" as one of these: ["spam", "not spam"]

Override system prompt:

You are a text classification expert. Respond with exactly one of the provided labels and nothing else. The user provides the text to be classified, and you return only the matching label.

Add to new model#

You can add a prompt override for a supported function in the Advanced configuration options section of the Connect to an external LLM or embedding model configuration dialog.

Edit existing model#

In the Starburst Enterprise web UI, navigate to the AI models tab. There is a list or grid view of the available AI models. Follow these steps to edit an existing model and add a prompt override:

  • Click the options menu of the desired AI model.

  • Click Edit connection.

  • In the Advanced configuration options section, click Add prompt override.

  • Select the function you would like to add the override to, and specify the parameters. You can add overrides to multiple functions at the same time.

  • Click Save.

General system prompts#

General system prompts let you customize and improve the output of your language model by providing instructions or messages that define its behavior, role, tone, or context before any interaction takes place. These prompts act as a guiding framework for the system, shaping how it responds to your queries and conversations.

Example#

You can define multiple system prompts, each tailored to specific organizational requirements. For example, different departments can establish their own guardrails:

  • HR could add a prompt to ensure the model does not use or generate offensive language.

  • Security could add a prompt to prevent the model from revealing any personally identifiable information (PII).

  • Legal might define prompts to avoid generating content that could be interpreted as legal advice.

# HR prompt1
Never use or generate offensive or discriminatory language.

# Security prompt2
Do not reveal any personally identifiable information (PII), such as names, email addresses, or account numbers.

# Legal prompt3
Do not provide legal advice or interpret laws.

Add to new model#

You can add a general system prompt to a supported or compatible LLM in the Advanced configuration options section of the Connect to an external LLM or embedding model configuration dialog.

Edit existing model#

In the Starburst Enterprise web UI, navigate to the AI models tab. There is a list or grid view of the available AI models. Follow these steps to edit an existing model and add a general system prompt:

  • Click the options menu of the desired AI model.

  • Click Edit connection.

  • In the Advanced configuration options section, click Add general system prompts.

  • Select the function you would like to add the prompt to, and specify the parameters. You can add prompts to multiple functions at the same time.

  • Click Save.