Configure Starburst AI#

To use the Starburst AI connector and the supported models and functions, configure external large language models (LLMs) or embedding models with a supported provider.

Note

Starburst Enterprise AI is available as a private preview. Contact your Starburst account team for further information.

Configuration#

To use the AI functions, create a JSON file that registers the connection information for specific models and providers, and reference it in the catalog configuration file. The JSON file must contain the provider and model configuration properties detailed in the following sections.

For example, create a catalog named starburst that references the starburst_ai connector. Then create a JSON file, etc/ai-providers.json, to configure the providers and models:

connector.name=starburst_ai
ai.models-file=etc/ai-providers.json

You can use an embedding or language model. The AI functions are available with the ai schema name. For the preceding example, the functions use the starburst.ai catalog and schema prefix.

To avoid needing to reference the functions with their fully qualified name, for example, starburst.ai.prompt(), configure the sql.path SQL environment property in the config.properties file to include the catalog prefix:

sql.path=starburst

After configuring the sql.path, you can simplify the reference to the function as ai.prompt().
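As a sketch of the difference, with arguments omitted because they depend on the specific function:

-- without sql.path configured
SELECT starburst.ai.prompt(...);

-- with sql.path=starburst in config.properties
SELECT ai.prompt(...);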

Because a catalog can reference multiple models in the same JSON file, a single catalog is sufficient for all AI functions.

The ai schema contains two tables that list the available models: embedding_models for embedding models and language_models for language models.
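For example, with the starburst catalog from the preceding example, list the registered models with:

SELECT * FROM starburst.ai.language_models;
SELECT * FROM starburst.ai.embedding_models;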

Providers#

AI functions invoke an external LLM. Access to the LLM API must be configured for the catalog in a JSON file. Performance, output, and cost of all AI function invocations depend on the LLM provider and the model used, so choose a model that aligns with your specific use case and performance requirements. Starburst supports AWS Bedrock and OpenAI-compatible APIs.

Model configuration properties#

id
  The identifier used to reference the model in an AI function. For example, "id": "bedrock_claude35".

kind
  The type of model to use. Possible values are GENERATE or EMBED.

connectionInfo
  The JSON object that contains the specific provider's model connection properties. See the AWS Bedrock or OpenAI sections for specific connection properties.

modelName
  The identifier used by the specific provider to identify the model. For example, "modelName": "anthropic.claude-3-5-haiku-20241022-v1:0".
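The four properties above can be checked before deployment. The following is a minimal sketch of such a validation, not part of Starburst itself; the helper name check_model_entry and the exact checks are illustrative assumptions based on the documented properties:

```python
import json

# The four documented model configuration properties.
REQUIRED_KEYS = {"id", "kind", "connectionInfo", "modelName"}
VALID_KINDS = {"GENERATE", "EMBED"}

def check_model_entry(entry: dict) -> list:
    """Return a list of problems found in one model configuration object.

    An empty list means the entry has the documented shape. This helper is
    a hypothetical pre-deployment check, not a Starburst API.
    """
    problems = []
    missing = REQUIRED_KEYS - entry.keys()
    if missing:
        problems.append(f"missing properties: {sorted(missing)}")
    if entry.get("kind") not in VALID_KINDS:
        problems.append(f"kind must be one of {sorted(VALID_KINDS)}")
    if not isinstance(entry.get("connectionInfo"), dict):
        problems.append("connectionInfo must be a JSON object")
    elif "provider" not in entry["connectionInfo"]:
        problems.append("connectionInfo requires a provider")
    return problems

entry = json.loads("""
{
  "id": "bedrock_claude35",
  "kind": "GENERATE",
  "connectionInfo": {"provider": "AWS_BEDROCK"},
  "modelName": "anthropic.claude-3-5-haiku-20241022-v1:0"
}
""")
print(check_model_entry(entry))  # → []
```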

AWS Bedrock#

The AWS Bedrock provider offers access to a suite of foundation models hosted on AWS. Integrating AWS Bedrock models requires configuring AWS credentials and specifying the models you wish to use for different AI functions. To use AWS Bedrock models, configure the necessary model and provider connection configuration properties.

Configuration#

The AWS Bedrock provider has the following connection configuration properties:

Connection configuration properties#

provider
  Required name of the provider. Must be AWS_BEDROCK when using AWS Bedrock.

awsAccessKey
  AWS access key to use for authentication. Required when provider is set to AWS_BEDROCK and IAM authentication is not used.

awsSecretKey
  AWS secret key to use for authentication. Required when provider is set to AWS_BEDROCK and IAM authentication is not used.

iamRole
  The ARN of an IAM role to assume when connecting to AWS. If set, externalId must also be configured.

externalId
  External ID for the IAM role trust policy when connecting to AWS.

region
  Optional. The AWS region to use.

To use IAM authentication instead of an access and secret key pair, add the following properties to the connectionInfo object in the JSON file:

"iamRole": "<role-arn>",
"externalId": "<external-id>"

Use secrets to avoid exposing API or secret key values in the JSON file.

To use a language model such as anthropic.claude-3-5-haiku-20241022-v1:0, add the following configuration properties to your JSON file. Replace the properties as appropriate for your setup:

{
  "id": "bedrock_claude35",
  "kind": "GENERATE",
  "connectionInfo": {
    "provider": "AWS_BEDROCK",
    "awsAccessKey": "AWS_ACCESS_KEY_ID",
    "awsSecretKey": "ENV:AWS_SECRET_ACCESS_KEY"
  },
  "modelName": "arn:aws:bedrock:us-east-2:123456789012:inference-profile/us.anthropic.claude-3-5-haiku-20241022-v1:0"
}

To use an embedding model such as amazon.titan-embed-text-v2:0, add the following configuration properties to your JSON file. Replace the properties as appropriate for your setup:

{
  "id": "aws_bedrock_titan",
  "kind": "EMBED",
  "connectionInfo": {
    "provider": "AWS_BEDROCK",
    "awsAccessKey": "AWS_ACCESS_KEY",
    "awsSecretKey": "AWS_SECRET_ACCESS_KEY",
    "region": "us-east-1"
  },
  "modelName": "amazon.titan-embed-text-v2:0"
}

The modelName property value can be a foundation model name or an inference profile.

AWS Bedrock supported models#

amazon.titan-embed-text-v2:0
  The AWS Bedrock Titan Text Embeddings v2 embedding model.

cohere.embed-multilingual-v3
  The AWS Bedrock Cohere Embed Multilingual model.

anthropic.claude-3-5-haiku-20241022-v1:0
  The AWS Bedrock provided Claude 3.5 Haiku LLM.

meta.llama3-2-3b-instruct-v1:0
  The AWS Bedrock provided Llama 3.2 3B Instruct LLM.

For more information about AWS Bedrock providers and models, see the AWS Bedrock documentation.

OpenAI#

The OpenAI provider offers access to a number of language and embedding models. To use OpenAI models, configure the necessary API keys and select appropriate models for your tasks.

Configuration#

The OpenAI provider has the following connection configuration properties:

Connection configuration properties#

provider
  Required name of the provider. Must be OPENAI for OpenAI and for providers offering an OpenAI-compatible API.

endpoint
  URL for the API endpoint. Optional; defaults to https://api.openai.com/v1. Specify the endpoint when using another provider's OpenAI-compatible API.

apiKey
  API key value for API access. Required when provider is set to OPENAI, unless the compatible API does not require a key.

Use secrets to avoid exposing API or secret key values in the JSON file.
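Because apiKey can be omitted for a compatible API that does not require a key, a locally hosted OpenAI-compatible server can be registered as follows. This is a sketch: the id, endpoint, and modelName values are hypothetical and depend on your server:

{
  "id": "local_llm",
  "kind": "GENERATE",
  "modelName": "my-local-model",
  "connectionInfo": {
    "provider": "OPENAI",
    "endpoint": "http://localhost:8000/v1"
  }
}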

To use the gpt-4o-mini LLM, add the following to your JSON file:

{
  "id": "openai_small",
  "kind": "GENERATE",
  "modelName": "gpt-4o-mini",
  "connectionInfo": {
    "provider": "OPENAI",
    "endpoint": "https://api.openai.com/v1",
    "apiKey": "OPEN_AI_API_KEY"
  }
}

OpenAI supported models#

text-embedding-3-small
  The OpenAI text embedding 3 small model.

text-embedding-3-large
  The OpenAI text embedding 3 large model.

gpt-4o-mini
  The OpenAI GPT-4o mini, a compact version of the standard GPT-4o LLM.

Read more about Starburst AI functions to see a list of the supported functions and use cases.