AI functions#
AI functions let you invoke large language models (LLMs) to perform a variety of text-based tasks or construct retrieval-augmented generation (RAG) workflows in your data applications. You can perform vector searches using embeddings generated from data stored in your data lake, and then use those search results as input to prompt an LLM.
Note
Starburst Enterprise AI functions are available as a private preview. Contact your Starburst account team for further information.
The following functions are available in the catalog configured for the starburst_ai connector (starburst) under the ai schema, and use a configured provider. Starburst Enterprise supports a number of task, prompt, and embedding-specific functions. To use these functions, ensure that you have configured an AI model and provider.
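For example, with a catalog named starburst for the starburst_ai connector, you can invoke a function with its fully qualified name. The following sketch assumes the bedrock_claude35 model alias used in the examples later in this section; your configured model names may differ.
-- Fully qualified invocation: <catalog>.<schema>.<function>
SELECT starburst.ai.prompt('Reply with the single word hello.', 'bedrock_claude35');
The shorter ai. prefix used in the remaining examples works when the starburst catalog is part of your session's function resolution path.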
Embedding functions#
The following function generates embeddings for input text using the given embedding model. Embeddings are numerical vector representations of text that can be used for semantic similarity and clustering tasks.
- ai.generate_embedding(text, model)#
Generates the embedding for the given text.
SELECT ai.generate_embedding('Which chapter should I read to understand how to balance the weight of a Boeing 747?', 'bedrock_titan'); -- [ 0.0061195422895252705, 0.013783863745629787, ...]
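Embeddings generated by the same model can be compared directly. The following illustrative sketch uses the cosine_similarity function, which also appears in the Use cases section below, and assumes the bedrock_titan model alias from the example above:
SELECT cosine_similarity(
    ai.generate_embedding('weight and balance procedures for large aircraft', 'bedrock_titan'),
    ai.generate_embedding('how to balance the weight of a Boeing 747', 'bedrock_titan')) AS similarity_score;
-- Higher values indicate more semantically similar text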
Prompt functions#
The following function generates text or responses based on the input prompt using the given language model.
- ai.prompt(text, model) varchar #
Generates text based on the input prompt.
SELECT ai.prompt('What is the capital of the USA? Only provide the name of the capital.', 'bedrock_claude35'); -- Washington, D.C.
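Prompt functions can also be applied row by row to data in your catalogs. The following is an illustrative sketch only; the iceberg.example_schema.product_reviews table and review_text column are hypothetical, and the bedrock_claude35 model alias is reused from the example above:
-- Summarize a free-text column for each row (hypothetical table and column)
SELECT
    review_id,
    ai.prompt(concat('Summarize this product review in one sentence: ', review_text), 'bedrock_claude35') AS summary
FROM iceberg.example_schema.product_reviews
LIMIT 10;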
Task functions#
The following functions perform specific tasks such as sentiment analysis, classification, grammar correction, masking, and translation using the given language model.
- ai.analyze_sentiment(text, model) varchar #
Analyzes the sentiment of the input text.
The sentiment result is positive, negative, neutral, or mixed.
SELECT ai.analyze_sentiment('I love Starburst', 'bedrock_claude35'); -- positive
- ai.classify(text, labels, model) varchar #
Classifies the input text according to the provided labels.
SELECT ai.classify('Buy now!', ARRAY['spam', 'not spam'], 'bedrock_claude35'); -- spam
- ai.fix_grammar(text, model) varchar #
Corrects grammatical errors in the input text.
SELECT ai.fix_grammar('I are happy? What you doing.', 'bedrock_llama32-3b'); -- I am happy. What are you doing?
- ai.mask(text, labels, model) varchar #
Masks the values for the provided labels in the input text by replacing them with the text [MASKED].
SELECT ai.mask('Contact me at 555-1234 or visit us at 123 Main St.', ARRAY['phone', 'address'], 'openai_gpt-4o-mini'); -- Contact me at [MASKED] or visit us at [MASKED].
- ai.translate(text, language, model) varchar #
Translates the input text to the specified language.
SELECT ai.translate('I like coffee', 'es', 'openai_gpt-4o-mini'); -- Me gusta el café
SELECT ai.translate('I like coffee', 'zh-TW', 'bedrock_claude35'); -- 我喜歡咖啡
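Task functions compose with ordinary SQL, so you can use them to filter or transform table data. The following is an illustrative sketch only; the iceberg.example_schema.messages table and body column are hypothetical, and the model aliases are reused from the examples above:
-- Mask contact details in messages that are not classified as spam (hypothetical table and column)
SELECT ai.mask(body, ARRAY['phone', 'address'], 'openai_gpt-4o-mini') AS masked_body
FROM iceberg.example_schema.messages
WHERE ai.classify(body, ARRAY['spam', 'not spam'], 'bedrock_claude35') = 'not spam';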
Use cases#
You can combine vector search functions, common table expressions (CTEs), and AI functions to build a RAG workflow that enhances your results.
The following example defines a CTE named vector_search that uses the cosine_similarity() function to retrieve the five most similar chapters, and then asks the LLM to select the best one. The map_agg function aggregates the results of the vector_search CTE into a key-value map, which is explicitly cast to JSON. The prompt function then sends the question and the JSON-formatted chapter data to the LLM.
-- Retrieve the top 5 most relevant chapters based on semantic similarity
WITH vector_search AS (
    SELECT
        book_title,
        chapter_number,
        chapter_title,
        chapter_intro,
        cosine_similarity(
            ai.generate_embedding('Which chapter should I read to understand how to balance the weight of a Boeing 747?', 'bedrock_titan'),
            chapter_intro_embeddings) AS similarity_score
    FROM iceberg."example-schema".faa_book_chapters
    ORDER BY similarity_score DESC
    LIMIT 5
),
-- Augment the results by converting them into a JSON object
json_results AS (
    SELECT CAST(map_agg(chapter_number, json_object(
        KEY 'book title' VALUE book_title,
        KEY 'chapter title' VALUE chapter_title,
        KEY 'chapter intro' VALUE chapter_intro)) AS JSON) AS json_data
    FROM vector_search
)
-- Generate an augmented response using the LLM
SELECT
    ai.prompt(concat(
        'Which chapter should I read to understand how to balance the weight of a Boeing 747? Explain why. The list of chapters is provided in JSON:',
        json_format(json_data)), 'bedrock_claude35')
FROM json_results;
The output of the query:
Based on the provided JSON, I recommend reading Chapter 9: "Weight and Balance
Control - Commuter Category and Large Aircraft" for understanding how to balance
the weight of a Boeing 747.
Here's why:
1. The Boeing 747 is a large aircraft that exceeds 12,500 pounds in takeoff
weight, which is specifically addressed in Chapter 9's introduction.
...