Enable Starburst Warp Speed for your cluster#

Starburst Warp Speed transparently adds an indexing and caching layer to enable higher performance. You can take advantage of the performance improvements by updating your cluster to suitable hardware and configuring the Starburst Warp Speed utility connector for any catalog accessing object storage with the Hive, Iceberg, or Delta Lake connector. A cluster deployment on Amazon Elastic Kubernetes Service (EKS), Microsoft Azure Kubernetes Service (AKS), or Google Kubernetes Engine (GKE) is required.

Requirements#

To use Starburst Warp Speed, you need:

One or more catalogs that use the Hive, Iceberg, or Delta Lake connectors.
A cluster deployment on Amazon Elastic Kubernetes Service (EKS), Microsoft Azure Kubernetes Service (AKS), or Google Kubernetes Engine (GKE) as detailed in Cluster configuration.
A valid Starburst Enterprise license.

Supported object storage#

Starburst Warp Speed only supports the following object storage:

Amazon S3
Google Cloud Storage (GCS)
Microsoft ADLS Gen 2

No other object storage systems are supported, including on-premises storage and S3-compatible storage such as MinIO.

Note

Starburst strongly recommends using a native file system implementation rather than the default Hadoop file system implementation for any catalogs with Starburst Warp Speed.

Platform-specific requirements#

Starburst Warp Speed requires your cluster to operate on a Kubernetes-based platform. Specifically, the requirements in Plan your Kubernetes deployment apply.

In addition, Starburst Warp Speed requires specific nodes in terms of CPU and memory. The most important additional requirement is that sufficiently performant and sized Non-Volatile Memory Express (NVMe) solid-state drive (SSD) storage is available on all nodes, and exclusively used by SEP.

The following specific details apply for the supported platforms:

EKS#

Required node sizes:

m7gd.4xlarge or larger
r7gd.4xlarge or larger
m6id.4xlarge or larger
m6idn.4xlarge or larger
m6gd.4xlarge or larger
r5d.4xlarge or larger
r5dn.4xlarge or larger
r6gd.4xlarge or larger
i3.4xlarge or larger

Create an EKS Managed Node Group with the desired size. Use this node group for all nodes in the cluster. For Kubernetes version 1.30, use the latest release of eksctl.

For information on specific deployment scenarios based on your privileges and node types, see the Cluster configuration section.

AKS#

Required node sizes:

Standard_L16s_v2 or larger Lsv2 series Azure VMs like Standard_L32s_v2 with SSDs attached. Lsv2-series VM SSDs are not encrypted by Azure Storage encryption. We strongly recommend Standard_L16s_v3 or larger Lsv3 series VMs.
Standard_Dpdsv6 (ARM-based) with premium SSD v2. Starburst recommends these ARM-based VMs for the best price-performance ratio for running Starburst Warp Speed in Azure environments. Premium SSD v2 is the only storage type that supports both encryption at rest and ephemeral storage.
In the Starburst Warp Speed section of values.yaml, set the following configuration:

warpSpeed:
  image:
    tag: "a.b.c-azure"

GKE#

Required node sizes:

n2-highmem-16 or larger with a minimum of two local NVMe SSDs attached
n2d-standard-16 or larger
C4A Axion (ARM-based) with Titanium SSD. Starburst recommends these ARM-based VMs for the best price-performance ratio for running Starburst Warp Speed in Google Cloud environments.
GKE version 1.25.3-gke.1800 or higher

Attach the SSDs during node pool creation, and use the pool for the cluster creation.

Caution

Use the gcloud CLI, not Google Cloud console, for node pool creation. Using the Google console UI creates incompatible disk types.

Use the --ephemeral-storage-local-ssd option of the gcloud CLI to provision local SSDs on the cluster. Select an even number of workers.

All platforms#

Configuration considerations:

The task.max-worker-threads task property cannot be changed when using Starburst Warp Speed, so it must be left at the default value.
SEP clusters with a vCPU count of 16 or fewer per worker node support a maximum of four Starburst Warp Speed catalogs. Clusters with larger worker nodes support a maximum of seven Starburst Warp Speed catalogs.
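
The catalog limit above can be restated as a small check, for example in a deployment script. This is a minimal Python sketch; the function name max_warp_speed_catalogs is hypothetical and encodes only the two limits listed in this section.

```python
def max_warp_speed_catalogs(vcpus_per_worker: int) -> int:
    """Return the documented maximum number of Starburst Warp Speed catalogs.

    Hypothetical helper: it only restates the limits from this section.
    """
    if vcpus_per_worker <= 0:
        raise ValueError("vcpus_per_worker must be positive")
    # Workers with 16 or fewer vCPUs support up to four catalogs;
    # larger workers support up to seven.
    return 4 if vcpus_per_worker <= 16 else 7
```

For example, a 16 vCPU worker size yields a limit of four catalogs, and a 32 vCPU worker size yields seven.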

Warning

Starburst Warp Speed is not supported on a cluster running in Fault-tolerant execution mode.

Deployment and management is performed with the SEP Helm charts detailed in Deploy with Kubernetes and Configuring Starburst Enterprise in Kubernetes.

Add the following section in your values file for the specific cluster to enable Starburst Warp Speed:

warpSpeed:
  enabled: true

Starburst Warp Speed uses a filesystem on top of the underlying storage in order to not require privileged mode. Add the following filesystem and image configuration to your values file:

warpSpeed:
  enabled: true
    # Physical drive configuration requires the Warp Speed init container within
    # a worker pod to be privileged.
    # When configuration is done, the container's lifecycle ends.
    # If autoConfigure is not enabled, devices must be manually configured
    # before SEP pods are started. For example, during the node machine
    # bootstrap process (preBootstrapCommands script in EKS deployments).
    # This functionality is available for AKS, EKS, and GKE only.
    # For AWS, use default setting.
    # For Azure and GKE deployments, or S3 deployments using the native
    # filesystem implementation, set autoConfigure to true.
  autoConfigure: true
    # Additional percentage of container memory reduced from heap size assigned to Java, must be less than 100
  additionalHeapSizePercentage: 15
  fileSystem:
    # The path for the mount point used to mount the local SSDs. Value differs
    # between clouds:
    # AWS, Azure - Defined by the user
    # Google Cloud - Must be set to /mnt/stateful_partition/kube-ephemeral-ssd/data
    localStorageMountPath: /opt/data/<subdirectory>
  image:
    # Image that prepares the filesystem for Starburst Warp Speed.
    # Due to system limitations, this must be done by an init container running
    # in privileged mode.
    repository: "harbor.starburstdata.net/starburstdata/starburst-warpspeed-init"
    # Tag value differs between clouds:
    # AWS, Google Cloud - Use default tag
    # Azure - Append the "azure" suffix, for example "1.0.0-azure"
    # Update this tag to the latest available version of the Starburst Warp
    # Speed init container
    tag: "1.0.16"
    pullPolicy: "IfNotPresent"

Use a dedicated coordinator that is not scheduled for query processing, and adjust the query processing configuration to allow for more splits:

coordinator:
  additionalProperties: |
    ...
    node-scheduler.include-coordinator=false
    node-scheduler.max-splits-per-node=4096

Use Helm to update the values and restart the cluster nodes. Confirm the cluster is operating correctly with the new configuration, but without any adjusted catalogs, and then proceed to configure catalogs.

Related to the catalog usage, the cluster needs to allow internal communication between all workers, as well as with the coordinator on all the HTTP ports configured by the different values for http-rest-port in all catalogs.

When starting the cluster, Starburst Warp Speed parses all configuration parameters and can log spurious warnings such as Configuration property 'cache-service.password' was not used. You can safely ignore these warnings.

Cluster configuration#

The following sections outline the step-by-step process for configuring and deploying Starburst Warp Speed on your Kubernetes cluster.

AWS deployment scenarios#

You can deploy Starburst Warp Speed on AWS using three different approaches, depending on your privileges and node configuration.

Scenario 1: Full privileges deployment#

If your privileges let you execute pre-boot scripts, configure the NVMe SSDs before you start the cluster. This allows optimal performance and control over your storage configuration.

Pre-deployment configuration#

Add the following preBootstrapCommands section to your EKS managed node group configuration:

preBootstrapCommands:
  - "yum install -y mdadm"
  - "sysctl -w fs.aio-max-nr=8388608 >> /etc/sysctl.conf"
  - 'devices=""; for device in $(ls /sys/block/); do if [[ $(grep -e "Amazon EC2 NVMe Instance Storage" -e "ec2-nvme-instance" /sys/block/${device}/device/subsysnqn -c 2> /dev/null) -gt 0 ]]; then devices="${devices} /dev/${device}"; fi; done; echo ${devices} > /tmp/devices'
  - "mdadm --create /dev/md0 $(cat /tmp/devices) --level=0 --force --raid-devices=$(cat /tmp/devices | wc -w)"
  - "mkfs.ext4 /dev/md0 -O ^has_journal"
  - "mkdir -p /opt/data"
  - "mount /dev/md0 /opt/data"
  - "chmod 777 -R /opt/data"

Note

The fs.aio-max-nr kernel parameter specifies the maximum number of asynchronous I/O (AIO) events the system allows. Set this value to 8388608 to support high-performance workloads.

Helm configuration#

In your values.yaml file, set the following values:

warpSpeed:
  enabled: true
  autoConfigure: false  # Pre-boot script handles configuration
  fileSystem:
    localStorageMountPath: /opt/data
  image:
    repository: "harbor.starburstdata.net/starburstdata/starburst-warpspeed-init"
    tag: "1.0.17"
    pullPolicy: "IfNotPresent"

Deployment#

Use the following command to deploy Starburst Warp Speed using Helm:

helm upgrade --install sep ./helm/sep --values values.yaml

Scenario 2: Limited privileges deployment#

If your privileges do not let you execute pre-boot scripts, configure NVMe storage for Starburst Warp Speed using a specialized container image. This lets you manage storage configuration from within the Kubernetes environment.

This deployment scenario uses a container image with the -awsprivileged suffix that has the permissions to automatically configure the NVMe drives.

Helm configuration#

In your values.yaml file, set the following values:

warpSpeed:
  image:
    tag: "1.0.17-awsprivileged"
  autoConfigure: true

Deployment#

Use the following command to deploy Starburst Warp Speed using Helm:

helm upgrade --install sep ./helm/sep --values values.yaml

Scenario 3: Bottlerocket OS deployment#

If you are using AWS Bottlerocket nodes, configure the disk setup through the Bottlerocket bootstrap container system. This lets you prepare NVMe storage for Starburst Warp Speed in Bottlerocket’s container-optimized environment.

EC2 user data configuration#

Add the following to your Amazon EC2 user data:

[settings.bootstrap-containers.disk-setup]
essential = true
mode = "once"
source = "harbor.starburstdata.net/starburstdata/starburst-warpspeed-init:1.0.17-bottlerocket"

If necessary, include your Harbor credentials:

[[settings.container-registry.credentials]]
registry = "harbor.starburstdata.net"
username = "some_user"
password = "a_password"

Note

You must have Harbor registry credentials to access the Starburst Warp Speed init container images.

Helm configuration#

In your values.yaml file, set the following values:

warpSpeed:
  autoConfigure: false
  fileSystem:
    localStorageMountPath: /mnt/opt/data

Deployment#

Use the following command to deploy Starburst Warp Speed using Helm:

helm upgrade --install sep ./helm/sep --values values.yaml

Verification#

Verify your configuration with the following steps.

NVMe drives#

Verify that Starburst Warp Speed properly detects your NVMe drives:

ls /sys/block/
cat /sys/block/<device>/device/subsysnqn

If the output does not contain “Amazon EC2 NVMe Instance Storage” or “ec2-nvme-instance”, update the disk filter list in your Helm configuration:

warpSpeed:
  fileSystem:
    diskFilterStringList:
      - "custom-nvme-identifier"
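
The manual check above can also be scripted. The following is a minimal Python sketch that mirrors the filter used by the pre-boot script in this guide: it reads each device's subsysnqn file and keeps the devices whose content contains one of the identifier strings. The function name find_nvme_devices and its parameters are hypothetical and not part of Starburst Warp Speed.

```python
from pathlib import Path

# Identifier strings used by the pre-boot script in this guide.
DEFAULT_FILTERS = ("Amazon EC2 NVMe Instance Storage", "ec2-nvme-instance")


def find_nvme_devices(sys_block="/sys/block", filters=DEFAULT_FILTERS):
    """Return /dev paths of block devices whose subsysnqn matches a filter."""
    matches = []
    for entry in sorted(Path(sys_block).iterdir()):
        nqn_file = entry / "device" / "subsysnqn"
        try:
            nqn = nqn_file.read_text(errors="replace")
        except OSError:
            # The device has no subsysnqn file or it cannot be read.
            continue
        if any(text in nqn for text in filters):
            matches.append(f"/dev/{entry.name}")
    return matches
```

Run it on a worker node, for example with print(find_nvme_devices()). An empty result suggests that the identifier strings do not match your hardware and that a custom entry in diskFilterStringList is needed.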

Storage mount#

Check the storage:

kubectl exec -it <worker-pod-name> -- df -ah

Look for a mount at /opt/data (or /mnt/opt/data for Bottlerocket) with significant storage space available.

Starburst Warp Speed status#

Query the Starburst Warp Speed warming status:

curl -X GET 'https://sep.example.com/ext/<catalog-name>/warming/status' \
  -H 'Accept: application/json'

A successful response shows warming status across nodes.
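
The same request can be issued from a script. This is a minimal Python sketch using only the standard library; the endpoint path comes from the curl example above, while the function names are hypothetical. Authentication is deliberately left out, so add whatever headers your cluster requires.

```python
import json
import urllib.request


def warming_status_request(base_url: str, catalog: str) -> urllib.request.Request:
    """Build the GET request for the warming status endpoint of one catalog."""
    url = f"{base_url.rstrip('/')}/ext/{catalog}/warming/status"
    return urllib.request.Request(
        url, headers={"Accept": "application/json"}, method="GET"
    )


def fetch_warming_status(base_url: str, catalog: str, timeout: float = 10.0):
    """Send the request and decode the JSON body.

    Authentication is omitted in this sketch.
    """
    request = warming_status_request(base_url, catalog)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Building the request separately from sending it makes it easy to inspect the URL before calling a real cluster.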

Common warnings#

Safely ignore warnings like Configuration property 'cache-service.password' was not used during startup.

Catalog configuration#

After a successful Cluster configuration, you can configure the desired catalogs to use Starburst Warp Speed.

Only catalogs using the Hive, Iceberg, or Delta Lake connectors can be accelerated:

connector.name=hive
connector.name=iceberg
connector.name=delta_lake

For more details, see Delta Lake considerations, Iceberg considerations, and Hive considerations.

Only catalogs backed by S3, GCS, and ADLS object storage are supported. For more details, see S3 considerations, GCS considerations, and ADLS considerations.

Update the example catalog that uses the Hive connector with AWS Glue in the values file.

catalogs:
  example: |
    connector.name=hive
    hive.metastore=glue
    ...

Enable Starburst Warp Speed on the catalog by updating the connector name to warp_speed and adding the required configuration properties:

catalogs:
  example: |
    connector.name=warp_speed
    warp-speed.proxied-connector=hive
    warp-speed.cluster-uuid=example-cluster-567891234567
    # Do not configure the following property if you are using a native
    # filesystem implementation.
    warp-speed.config.internal-communication.shared-secret=aLongSecretString
    hive.metastore=glue
    ...

The properties setting the connector name, the proxied connector and the cluster identifier are required.

Set the shared secret to the same value as the secret of the cluster itself, which is configured in sharedSecret:. This property is required unless the REST API is disabled or you are using a native filesystem implementation.
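
The rewrite of a catalog from its original connector to warp_speed is mechanical, so it can be sketched in code. The following Python function is a hypothetical helper, not a Starburst tool: it replaces the connector.name line and adds the proxied connector and cluster identifier. It passes the original connector name through unchanged, so check the result against the valid values listed for warp-speed.proxied-connector, and add the shared secret separately if your deployment needs it.

```python
def enable_warp_speed(properties: str, cluster_uuid: str) -> str:
    """Rewrite catalog properties so the catalog uses the warp_speed connector."""
    output = []
    replaced = False
    for line in properties.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "connector.name":
            # The original connector becomes the proxied connector.
            output.append("connector.name=warp_speed")
            output.append(f"warp-speed.proxied-connector={value.strip()}")
            output.append(f"warp-speed.cluster-uuid={cluster_uuid}")
            replaced = True
        else:
            output.append(line)
    if not replaced:
        raise ValueError("no connector.name property found")
    return "\n".join(output)
```

All other lines, including metastore and file format properties, are kept as they are.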

For testing purposes, or alternatively for permanent usage of a new catalog name, such as faster, in parallel to the existing catalog, you can copy the configuration of a catalog and update it:

catalogs:
  example: |
    connector.name=hive
    hive.metastore=glue
    ...
  faster: |
    connector.name=warp_speed
    warp-speed.proxied-connector=hive
    warp-speed.cluster-uuid=example-cluster-567891234567
    # Do not configure the following property if you are using a native
    # filesystem implementation.
    warp-speed.config.internal-communication.shared-secret=aLongSecretString
    hive.metastore=glue
    ...

This allows you to query the same data with or without Starburst Warp Speed using different catalog names. However, existing scripts and statements that include the old catalog name example are not accelerated.

Catalog configuration properties#

The following table provides more information about the available catalog configuration properties:

Property name	Description
`connector.name`	Required. Must be set to `warp_speed`.
`warp-speed.proxied-connector`	Required. The type of embedded connector that is used for accessing cold data through Starburst Warp Speed. Defaults to `hive`. Valid values are `hive`, `iceberg`, or `delta-lake`. All properties supported by these connectors, including metastore, file format, and connector properties, can be used to configure the catalog.
`warp-speed.cluster-uuid`	Required. Unique identifier of the cluster. Used as the folder name in the store path. Use the same value for all catalogs. When creating a new cluster and the same `warp-speed.store.path` and `warp-speed.cluster-uuid` are used, then the cluster warmup rules for index and cache creation are imported into the newly created cluster.
`warp-speed.config.internal-communication.shared-secret`	Required, unless the REST API is disabled or you are using a native filesystem implementation. The shared secret value of the cluster, which is configured for secure internal communication in `sharedSecret:` for the Kubernetes deployment. Use the identical value for this property in each catalog. Do not configure if you are using a native filesystem implementation.
`warp-speed.store.path`	The path to a bucket in object storage where SSD data is backed up for fast cache rehydration. Requires `warp-speed.enable.import-export` be set to true.
`warp-speed.objectstore.store.path`	The path to a bucket in object storage that stores custom warmup rules. See also, Warmup rule persistency.
`warp-speed.enable.import-export`	Enable or disable index and cache resiliency. Defaults to `false`. Requires an object storage configured with `warp-speed.store.path`.
`warp-speed.use-http-server-port`	Specifies to run the REST API on the coordinator in the same server as SEP. Defaults to `true`. Do not configure if you are using a native filesystem implementation. For details, see REST API access.
`warp-speed.config.http-rest-port-enabled`	Optional parameter to enable the REST API server on the coordinator and each worker for each catalog. Defaults to `false`. This REST API is only suitable for testing and debugging purposes, since it is not secured. For production, use the default configuration. Set the property to `true` and add a unique `http-rest-port` value for each catalog. Endpoints use the specified port for a catalog, such as `8088`, instead of the `/ext/{catalogName}` section of the context, and are otherwise identical to the secured REST API. Typically the port is not exposed outside the cluster, so users need to forward the port or call the API from inside the cluster. Do not configure if you are using a native filesystem implementation.
`warp-speed.config.extensions.enabled`	Enable or disable Starburst Warp Speed extensions. Defaults to `true`.
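
A few of the rules in this table can be checked before deployment. The following Python sketch is a hypothetical validator, not part of SEP: it verifies the three required properties and the mutual dependency between warp-speed.enable.import-export and warp-speed.store.path. It assumes simple key=value lines and ignores anything else.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks, comments, and other lines."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_catalog(text: str) -> list:
    """Return a list of problems found in Warp Speed catalog properties."""
    props = parse_properties(text)
    problems = []
    if props.get("connector.name") != "warp_speed":
        problems.append("connector.name must be warp_speed")
    for key in ("warp-speed.proxied-connector", "warp-speed.cluster-uuid"):
        if key not in props:
            problems.append(f"{key} is required")
    # The table states that import-export and store.path require each other.
    import_export = props.get("warp-speed.enable.import-export", "false") == "true"
    has_store_path = "warp-speed.store.path" in props
    if import_export != has_store_path:
        problems.append(
            "warp-speed.enable.import-export=true and warp-speed.store.path "
            "must be configured together"
        )
    return problems
```

An empty list means that none of these particular checks failed; it does not prove that the catalog is otherwise valid.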

Warmup rules storage configuration properties#

The following table includes the warmup rules storage configuration properties:

Property name	Description
`warp-speed.objectstore.s3.endpoint`	The endpoint URL for an S3 bucket that stores warmup rules.
`warp-speed.objectstore.s3.path-style-access`	Enable or disable path-style access to an S3 bucket that stores warmup rules. Defaults to `false`. Set to `true` when using S3-compatible storage that requires path-style URLs.
`warp-speed.objectstore.s3.region`	The AWS region for an S3 bucket that stores warmup rules.
`warp-speed.objectstore.fs.native-s3.enabled`	Enable or disable native filesystem implementation for an S3 bucket that stores warmup rules. Defaults to `false`.
`warp-speed.objectstore.s3.aws-access-key`	The AWS access key for authentication to an S3 bucket that stores warmup rules.
`warp-speed.objectstore.s3.aws-secret-key`	The AWS secret key for authentication to an S3 bucket that stores warmup rules.

Hive considerations#

Most configurations of the Hive connector are supported. Additionally, the following considerations apply when using the Hive connector as the proxied connector for Starburst Warp Speed:

Materialized views are supported.
S3 proxy is not supported.
Server-side encryption with S3 managed keys and KMS managed keys is supported. S3 client-side encryption is not supported.
ORC ACID transactional tables are not supported.

For optimal performance, add the following properties to your catalog configuration:

catalogs:
  example: |
    ...
    hive.max-outstanding-splits-size=512MB
    hive.max-initial-splits=0
    hive.max-outstanding-splits=3000
    ...

Iceberg considerations#

All configurations of the Iceberg connector are supported. Additionally, the following considerations apply when using the Iceberg connector as the proxied connector for Starburst Warp Speed:

Materialized views are supported.
Server-side encryption with S3 managed keys is supported. S3 client-side encryption is not supported.
The associated split is served from object storage, and no acceleration occurs, after either of the following operations:
- A row-level update or delete operation.
- A merge operation that causes a record update.

Delta Lake considerations#

All configurations of the Delta Lake connector are supported. Additionally, the following considerations apply when using the Delta Lake connector as the proxied connector for Starburst Warp Speed:

Materialized views are not supported.
Server-side encryption with S3 managed keys is supported. S3 client-side encryption is not supported.
The associated split is served from object storage, and no acceleration occurs, after either of the following operations:
- A row-level update or delete operation.
- A merge operation that causes a record update.

For optimal performance, add the following properties to your catalog configuration:

catalogs:
  example: |
    ...
    delta.max-outstanding-splits=3000
    ...

S3 considerations#

Starburst Warp Speed supports Amazon S3 with catalogs using the Hive, Iceberg, and Delta Lake connectors.

Using the s3:// protocol is required.

GCS considerations#

Starburst Warp Speed supports Google Cloud Storage (GCS).

Authentication to GCS can use a JSON key file or an OAuth 2.0 access token configured identically for the Hive, Delta Lake, or Iceberg connector in the catalog properties:

hive.gcs.json-key-file-path=/path/to/gcs_keyfile.json
hive.gcs.use-access-token=false

The following OAuth 2.0 access scopes for Google APIs must be attached during GKE node creation to enable the index and cache resiliency:

For more information about authorization, refer to Google Cloud Service accounts documentation.

ADLS considerations#

Starburst Warp Speed supports Microsoft ADLS Gen 2. ADLS Gen1 is not supported.

Using the abfs:// or abfss:// protocol is required.

ADLS can be used with catalogs using the Hive and Delta Lake connectors with the following configuration properties to connect to Azure storage:

catalogs:
  faster: |
    ...
    warp-speed.store.path=abfs://<container_name>@<account_name>.dfs.core.windows.net/folder
    hive.azure.abfs-storage-account=<storage_account_name>
    hive.azure.abfs-access-key=xxx
    ...

To secure the connection with TLS, use the abfss:// protocol in the URI instead of abfs://.

Cluster management#

Starburst Warp Speed accommodates cluster expansion and contraction. Be aware of the following when scaling up or down:

When scaling a cluster horizontally (adding or removing worker nodes), Starburst Warp Speed continues operating, assuming that requirements are properly fulfilled. A cluster restart is not required when adding or removing nodes.
Scaling a cluster vertically to use larger nodes requires a cluster restart, which facilitates the replacement of all worker nodes to the larger node size.
After restarting the cluster, the default acceleration becomes active. New caches and indexes get created and populated based on the query workload.

Default acceleration#

When a query accesses a column that is not accelerated, the system performs data and index materialization on the cluster to accelerate future access to the data in the column. This process of creating the indexes and caches is also called warmup. Warmup is performed individually by each worker based on the processed splits and uses the local high performance storage of the worker. Typically, these are SSD NVMe drives.

When new data is added to a table or the index and cache creation are in progress, the new portions of the table that are not accelerated are served from the object storage. After the asynchronous indexing and caching is complete, query processing accessing that data is accelerated, because the data is available directly in the cluster from the indexes and caches, and no longer has to be retrieved from the remote object storage.

This results in immediately improved performance for recently used datasets. In addition to the automatic default acceleration, advanced users can create specific warmup rules. The default acceleration has a lower priority than a user-created warmup rule.

Default acceleration is performed for SELECT * FROM <table_name> queries that are commonly used to explore a table rather than to retrieve specific data. The maximum number of accelerated columns in SELECT * clauses is dynamic and depends on the column type.

Acceleration types#

Starburst Warp Speed uses different types of acceleration to improve query processing performance:

Data cache acceleration
Index acceleration
Text search acceleration

These acceleration types are used automatically by default acceleration, and can also be configured manually with warmup rules defined with the REST API.

Data cache acceleration#

Data cache acceleration is the system that caches the raw data objects from the object storage directly on the high-performance storage attached to the workers in the cluster. The data from one or more objects is processed in the cluster as splits. The data from the splits and associated metadata are managed as a row group. These row groups are used to accelerate any queries that access the contained data. The row groups are stored in a proprietary columnar block caching format.

Use the WARM_UP_TYPE_DATA value in the warmUpType property to configure data cache acceleration for a specific column with the REST API.

Index acceleration#

Index acceleration uses the data in a specific column in a table to create an index. This index is added to the row group and used when queries access a column to filter rows. It accelerates queries that use predicates, joins, filters, and searches, and minimizes data scanning.

The index types, such as bitmap and tree, are determined automatically based on the column data types and on data patterns and characteristics.

Use the WARM_UP_TYPE_BASIC value in the warmUpType property to configure index acceleration for a specific column with the REST API.

Text search acceleration#

Text search acceleration creates an index of the content of text columns using Apache Lucene. This index is used in query predicates. It accelerates queries that filter or search on text columns.

Starburst Warp Speed automatically enables text search acceleration, and maintains the indexes.

Text search acceleration uses Apache Lucene indexing to accelerate text analytics and provide fast text filters, particularly with LIKE predicates. The KeywordAnalyzer provides full support for LIKE semantics to search for the exact appearance of a value in a filtered column.

A use case is a search for a specific short string in a larger column, such as a description. For example, consider a table with a column named city and a value New York, United States. The index is case-sensitive. When indexing is applied to the column, the following query returns that record because the LIKE predicate is an exact match:

SELECT *
FROM tbl
WHERE city LIKE '%New York%'

The following queries do not return the record because the LIKE predicates are not an exact match. The first query is missing a space in the pattern:

SELECT *
FROM tbl
WHERE city LIKE '%NewYork%'

The second query uses lowercase:

SELECT *
FROM tbl
WHERE city LIKE '%new york%'
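
The matching behavior of these three queries can be reproduced outside SQL. For a pattern of the form '%text%' without further wildcards, LIKE is a case-sensitive substring test. The following Python sketch illustrates only that semantics; the function name like_contains is hypothetical and it does not model how the Lucene index works internally.

```python
def like_contains(value: str, fragment: str) -> bool:
    """Emulate value LIKE '%fragment%' for a fragment without wildcards."""
    return fragment in value


city = "New York, United States"
print(like_contains(city, "New York"))  # True: exact, case-sensitive match
print(like_contains(city, "NewYork"))   # False: the space is missing
print(like_contains(city, "new york"))  # False: the case differs
```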

Text search acceleration indexing is recommended for:

Queries with LIKE predicates, prefix or suffix queries, or queries that use the starts_with function.
Range queries on string columns. A common use is dates that are stored as strings that have range predicates. For example, date_string>='yyyy-mm-dd'.
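
Range predicates on date strings give chronologically correct results only when the strings sort lexicographically in date order, which is the case for zero-padded yyyy-mm-dd values. The following Python sketch shows this general property of string comparison; the function name in_range is hypothetical.

```python
def in_range(date_string: str, lower: str) -> bool:
    """Emulate the predicate date_string >= lower using string comparison."""
    return date_string >= lower


# Zero-padded values compare in chronological order.
print(in_range("2024-03-01", "2024-02-15"))  # True
print(in_range("2024-01-31", "2024-02-15"))  # False

# Without zero padding the comparison is no longer chronological:
# September sorts after October because '9' > '1'.
print(in_range("2024-9-01", "2024-10-01"))   # True
```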

Text search acceleration indexing supports the following data types:

CHAR
VARCHAR
CHAR ARRAY
VARCHAR ARRAY

Use the WARM_UP_TYPE_LUCENE value in the warmUpType property to configure text search acceleration for a specific column with the REST API.
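
The three warmUpType values named in this section, together with the data types supported for text search, can be collected in one lookup. This Python sketch is illustrative only: the dictionary keys and the function name warm_up_type are hypothetical, and it does not show the REST API payload, which is not described here.

```python
# warmUpType values as named in this section, keyed by hypothetical labels.
WARM_UP_TYPES = {
    "data_cache": "WARM_UP_TYPE_DATA",
    "index": "WARM_UP_TYPE_BASIC",
    "text_search": "WARM_UP_TYPE_LUCENE",
}

# Data types listed as supported for text search acceleration.
TEXT_SEARCH_COLUMN_TYPES = {"CHAR", "VARCHAR", "CHAR ARRAY", "VARCHAR ARRAY"}


def warm_up_type(acceleration: str, column_type: str) -> str:
    """Return the warmUpType value for an acceleration type and column type."""
    if acceleration not in WARM_UP_TYPES:
        raise ValueError(f"unknown acceleration type: {acceleration}")
    if (
        acceleration == "text_search"
        and column_type.upper() not in TEXT_SEARCH_COLUMN_TYPES
    ):
        raise ValueError(f"text search acceleration does not support {column_type}")
    return WARM_UP_TYPES[acceleration]
```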

Limitations:

The maximum supported string length is 33k characters.
Queries with nested expressions, such as starts_with(some_nested_method(col1), 'aaa'), are not accelerated.
Query predicates can contain a maximum of 128 unique columns.

Index and cache usage#

Once you have configured Starburst Warp Speed, you can view acceleration details and other summary statistics on the Index and cache usage tab in the Starburst Enterprise web UI.

For more information, see the reference documentation.

Fast warmup#

Starburst Warp Speed includes a “fast warmup” feature, formerly known as index and cache resiliency. Fast warmup is optional but highly recommended. When a new index is created or data is cached, it is stored on the SSD NVMe (nonvolatile memory express) drives attached to each worker by default, and in addition on a dedicated, shared bucket in your object storage. When you scale the cluster, the indexes and data cache remain available in the shared storage.

The fast warmup feature eliminates the need to keep instances idling when a cluster is not in use. Instead, Starburst Warp Speed checks the designated bucket in object storage for indexes and cache that are ready, and selectively loads them. If the indexes or cache are not available in the shared object storage or cannot be loaded for any reason, the data is warmed as usual.

There are three storage tiers accessed for queries:

Hot data and index: Uses SSD NVMe attached to the workers in a cluster to process queries and store hot data and a cache for optimal performance. This layer is enabled by default with Starburst Warp Speed.
Warm data and index: Indexes and cached data are stored in a designated object storage bucket. This layer is shared amongst all of your workers to ensure minimal resources are allocated to indexing when scaling up a cluster and adds resiliency. When a cluster is scaled down or eliminated, and nodes are shut down, the indexes remain available as warm data. The warm layer is disabled by default. Enabling index and cache resiliency makes the warm tier available for fast warmup of indexes and caches.
Cold data: Direct access of your object storage.
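
The lookup order across the three tiers can be pictured with a small model. The following Python sketch is a conceptual illustration under simplifying assumptions, not the product's implementation: splits are represented by plain identifiers, and the function name serving_tier and its parameters are hypothetical.

```python
def serving_tier(split_id, hot, warm, fast_warmup_enabled):
    """Return which tier would serve a split in this simplified model.

    hot:  identifiers available on the local NVMe SSDs of the workers
    warm: identifiers available in the shared object storage bucket
    """
    if split_id in hot:
        return "hot"
    # The warm tier is only consulted when fast warmup is enabled.
    if fast_warmup_enabled and split_id in warm:
        return "warm"
    # Otherwise the data is read directly from object storage.
    return "cold"
```

In this model, a freshly scaled-up worker with an empty hot tier is served from the warm tier when fast warmup is enabled, and from cold data otherwise.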

With this tiered approach, you can continue using your existing scaling and auto scaling policies.

As a best practice, allocate at least the average available SSD storage of your cluster.

Enable the feature#

To enable fast warmup, use the following property:

warp-speed.enable.import-export=true

For a complete list of properties, see Catalog configuration properties.

Set up a backup location#

To set a backup location in your object storage for index or data caches, use the following property:

warp-speed.store.path=<backup-location-bucket>

In most use cases, this location can reside in the catalog’s bucket. For more complex, cross-region deployments this bucket should be located in the same region as the compute account.

Edit AWS privileges#

To use fast warmup, you must include read/write permissions to a backup location in object storage. The following shows a privilege example for read/write access to S3:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "s3ReadWrite",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketPolicy",
                "s3:GetObject",
                "s3:GetObjectAttributes",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:AbortMultipartUpload"
            ],
            "Resource": [
                "arn:aws:s3:::<backup-location-bucket>/*",
                "arn:aws:s3:::<backup-location-bucket>",
                "arn:aws:s3:::<data-bucket>/*",
                "arn:aws:s3:::<data-bucket>"
            ]
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "glue:*"
            ],
            "Resource": "*"
        }
    ]
}
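
If you manage several clusters, the read/write statement can be generated per bucket pair. The following Python sketch is a hypothetical helper that builds only the first statement of the example, using the same actions. It lists both the bucket ARN and the object ARN for each bucket, because s3:ListBucket applies to the bucket itself while the object actions apply to the keys inside it.

```python
import json

# Actions taken from the s3ReadWrite statement in the example policy.
S3_ACTIONS = [
    "s3:ListBucket",
    "s3:GetBucketPolicy",
    "s3:GetObject",
    "s3:GetObjectAttributes",
    "s3:PutObject",
    "s3:DeleteObject",
    "s3:AbortMultipartUpload",
]


def fast_warmup_policy(backup_bucket: str, data_bucket: str) -> str:
    """Return a JSON policy document granting read/write access to both buckets."""
    resources = []
    for bucket in (backup_bucket, data_bucket):
        resources.append(f"arn:aws:s3:::{bucket}")
        resources.append(f"arn:aws:s3:::{bucket}/*")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "s3ReadWrite",
                "Effect": "Allow",
                "Action": S3_ACTIONS,
                "Resource": resources,
            }
        ],
    }
    return json.dumps(policy, indent=4)
```

Review the generated document against your own security requirements before attaching it to a role.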

Lifecycle policies#

Storing the indexes and data caches can fill up your object storage. Setting a lifecycle policy to control the associated cost is highly recommended when using fast warmup. Refer to the following lifecycle documentation for your cloud provider:

Automated clean up#

When the storage on the cluster is about to run out of space, index and cache elements are automatically deleted. As a user or administrator, you don’t need to manage index and cache allocation. When the storage capacity threshold is exceeded, the system deletes the following content, in this order, until the clean up threshold is reached:

All expired content based on the TTL value.
Content with the lowest priority values that was created as a result of the default acceleration.
Content related to custom warmup rules for indexing and caching.

After a clean up, new data is indexed and cached as needed based on the data access by the processed queries.
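The deletion order above can be modeled as a sort. The following Python sketch is purely illustrative and is not the internal implementation: it assumes that the three categories are applied from top to bottom, that lower priority values are removed first within a category, and it uses hypothetical field names (`expired`, `origin`, `priority`) that do not come from the product.

```python
def eviction_order(elements):
    """Sort elements so that the first entries are deleted first.

    Each element is a dict with hypothetical keys:
      expired  - True if the TTL has passed
      origin   - "default" for default acceleration, "custom" for warmup rules
      priority - numeric priority, lower values are removed earlier
    """
    def sort_key(element):
        if element["expired"]:
            category = 0          # expired content goes first
        elif element["origin"] == "default":
            category = 1          # then default acceleration content
        else:
            category = 2          # custom warmup rule content goes last
        return (category, element["priority"])

    return sorted(elements, key=sort_key)


if __name__ == "__main__":
    sample = [
        {"name": "custom_rule", "expired": False, "origin": "custom", "priority": 8},
        {"name": "default_high", "expired": False, "origin": "default", "priority": 2},
        {"name": "old_entry", "expired": True, "origin": "custom", "priority": 9},
        {"name": "default_low", "expired": False, "origin": "default", "priority": 0},
    ]
    for element in eviction_order(sample):
        print(element["name"])
```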

Warmup rule persistency#

Advanced users, who configure custom warmup rules with the REST API, must use object storage to ensure user-defined warmup rules are not lost after a cluster restart.

Specify an object store path where custom warmup rules are stored. This path should point to a location in your object storage bucket:

warp-speed.objectstore.store.path=protocol://account_name.path.to.bucket/storage

Specify a separate path for each catalog. Multiple catalogs can use a single object storage bucket with a separate file for each catalog.

Storing custom warmup rules can fill up your object storage. Set a lifecycle configuration to manage warmup rules stored in your bucket and exclude them from deletion. Refer to the lifecycle documentation for your cloud provider.

Starburst Warp Speed management#

Starburst Warp Speed automatically creates and manages its data based on processed queries. This behavior is called default acceleration.

Additional custom configuration can be applied with the REST API.

The Index and cache usage tab provides summary statistics that indicate performance gains from using Starburst Warp Speed.

REST API access#

The Starburst Warp Speed REST API is available on the coordinator, with a separate context for each catalog, on the same port and domain as the SEP web UI, the Starburst Enterprise REST API, and the Trino REST API.

Access to the REST API is controlled by the same authentication and authorization as the Starburst Enterprise REST API.

The REST API is not enabled by default. To enable the Warp Speed REST API, set warp-speed.config.extensions.enabled=true. The shared secret value of the cluster, configured for secure internal communication, must be set in warp-speed.config.internal-communication.shared-secret for each catalog.

Note

If you are using a native file system implementation rather than the default Hadoop implementation, do not add warp-speed.config.internal-communication.shared-secret to your configuration, and set warp-speed.use-http-server-port=false.

The endpoint path structure includes the name of the catalog: /ext/{catalogName}/{paths: .+}. The following example shows the /ext path to the warmup/warmup-rule-get endpoint for the catalog named faster on a cluster without authentication, exposed via HTTP:

curl -X GET 'http://sep.example.com:8080/ext/faster/warmup/warmup-rule-get' \
  -H 'Accept: application/json'

A secured server needs to be accessed via HTTPS and potentially include authentication information:

curl -X GET 'https://sep.example.com/ext/faster/warmup/warmup-rule-get' \
  -H 'Accept: application/json'
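When scripting against several catalogs, it can help to assemble the endpoint URL in one place. The following Python sketch follows the /ext/{catalogName}/{paths} structure described above. The function name `warp_speed_url` is hypothetical, and the sketch does not send a request or handle authentication.

```python
from urllib.parse import quote


def warp_speed_url(base_url, catalog, path):
    """Build a Warp Speed REST API URL (hypothetical helper).

    base_url - scheme, host, and optional port of the coordinator
    catalog  - name of the accelerated catalog
    path     - endpoint path, for example "warmup/warmup-rule-get"
    """
    # Percent-encode the catalog name so unusual characters stay in one segment.
    catalog_segment = quote(catalog, safe="")
    return f"{base_url.rstrip('/')}/ext/{catalog_segment}/{path.lstrip('/')}"


if __name__ == "__main__":
    print(warp_speed_url("https://sep.example.com", "faster", "warmup/warmup-rule-get"))
```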

REST API overview#

The following sections detail the REST API and available endpoints. The example calls use plain curl calls to the endpoints for the faster catalog on the cluster at sep.example.com using HTTPS and omitting any authentication.

Warming status#

You can determine the status of the warmup for Starburst Warp Speed with a GET request to the /warming/status endpoint. The response reports the warmup progress for splits across workers and whether warming is currently taking place.

curl -X GET 'https://sep.example.com/ext/faster/warming/status' \
  -H 'Accept: application/json'

Example response:

{"nodesStatus":
  {"172.31.16.98": {"started":22136,"finished":22136},
   "172.31.25.207":{"started":20702,"finished":20702},
   "172.31.19.167":{"started":21116,"finished":21116},
   "172.31.22.28":{"started":20678,"finished":20678}},
   "warming":false}

The response shows that warmup started and finished on four workers, and is currently not in progress.
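For monitoring scripts, the response can be reduced to a few totals. The following Python sketch summarizes a response shaped like the example above. The function name `summarize_warming` is hypothetical, and the sketch assumes that a node is still working whenever its `finished` counter is lower than its `started` counter.

```python
import json


def summarize_warming(status):
    """Summarize a /warming/status response dict (hypothetical helper)."""
    nodes = status.get("nodesStatus", {})
    # Nodes whose finished counter lags behind the started counter.
    pending = {
        node: counters["started"] - counters["finished"]
        for node, counters in nodes.items()
        if counters["finished"] < counters["started"]
    }
    return {
        "workers": len(nodes),
        "started": sum(c["started"] for c in nodes.values()),
        "finished": sum(c["finished"] for c in nodes.values()),
        "pending_by_node": pending,
        "warming": bool(status.get("warming", False)),
    }


if __name__ == "__main__":
    response_text = """
    {"nodesStatus":
      {"172.31.16.98": {"started":22136,"finished":22136},
       "172.31.25.207":{"started":20702,"finished":20702},
       "172.31.19.167":{"started":21116,"finished":21116},
       "172.31.22.28":{"started":20678,"finished":20678}},
     "warming":false}
    """
    print(summarize_warming(json.loads(response_text)))
```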

Debug tools#

The debug-tools endpoint requires an HTTP POST to specify the detailed command with a JSON payload to retrieve the desired data. You can use it to return the storage utilization:

curl -X POST "https://sep.example.com/ext/faster/debug-tools"  \
  -d '{"commandName" : "all","@class" : "io.trino.plugin.warp.execution.debugtools.DebugToolData"}' \
  -H 'Content-Type: application/json'

Example response:

{"coordinator-container":
  {"result":
    {"Storage_capacity":15000000,
     "Allocated 8k pages":1000000,
     "Num used stripes":0
    }
  }
}

Calculate the storage utilization percentage with (Allocated 8k pages / Storage_capacity) * 100.

Debug tools are blocked and cannot be used during warming.
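The utilization formula can be applied directly to the parsed response. The following Python sketch computes the percentage for each top-level entry in a response shaped like the example. The function name `storage_utilization` is hypothetical, and the sketch assumes that every top-level entry contains a `result` object with both fields.

```python
def storage_utilization(response):
    """Return {entry_name: utilization_percent} for a debug-tools response.

    Implements (Allocated 8k pages / Storage_capacity) * 100 per entry.
    """
    utilization = {}
    for name, entry in response.items():
        result = entry["result"]
        utilization[name] = result["Allocated 8k pages"] / result["Storage_capacity"] * 100
    return utilization


if __name__ == "__main__":
    example = {
        "coordinator-container": {
            "result": {
                "Storage_capacity": 15000000,
                "Allocated 8k pages": 1000000,
                "Num used stripes": 0,
            }
        }
    }
    for name, percent in storage_utilization(example).items():
        print(f"{name}: {percent:.2f}%")
```

For the example response, 1000000 allocated pages out of a capacity of 15000000 corresponds to a utilization of roughly 6.67 percent.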

Row group count#

A row group in Starburst Warp Speed is a collection of index and cache elements that are used to accelerate processing of Trino splits from the SSD storage.

Note

A row group in Starburst Warp Speed is not equivalent to a Parquet row group or an ORC stripe, but a higher level artifact specific to Starburst Warp Speed. It can be related to a specific Parquet row group or ORC stripe but can also represent data from a whole file or more.

The row-group/row-group-count endpoint exposes all currently warmed up columns via an HTTP GET:

curl -X GET "https://sep.example.com/ext/faster/row-group/row-group-count" \
  -H "accept: application/json"

The result maps each key of the form schema.table.column.warmuptype to the corresponding count of accelerated row groups. The warmup types are:

WARM_UP_TYPE_DATA represents data cache acceleration.
WARM_UP_TYPE_BASIC represents index acceleration.
WARM_UP_TYPE_LUCENE represents text search acceleration.

In the following example, 20 row groups of the tripid column of the trips_data table in the trips schema are accelerated with a data cache and an index.

{
  "trips.trips_data.tripid.WARM_UP_TYPE_DATA": 20,
  "trips.trips_data.tripid.WARM_UP_TYPE_BASIC": 20
}
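To see which acceleration types exist for each column, the keys can be split at the last dot. The following Python sketch groups a row group count response by column. The function name `group_by_column` is hypothetical, and the sketch assumes that the warmup type is always the final dot-separated part of the key.

```python
from collections import defaultdict


def group_by_column(counts):
    """Group row group counts by column (hypothetical helper).

    counts maps "schema.table.column.warmuptype" to a row group count.
    Returns {"schema.table.column": {warmuptype: count}}.
    """
    grouped = defaultdict(dict)
    for key, count in counts.items():
        # Split only at the last dot, so the column path stays intact.
        column_path, warmup_type = key.rsplit(".", 1)
        grouped[column_path][warmup_type] = count
    return dict(grouped)


if __name__ == "__main__":
    example = {
        "trips.trips_data.tripid.WARM_UP_TYPE_DATA": 20,
        "trips.trips_data.tripid.WARM_UP_TYPE_BASIC": 20,
    }
    for column, types in group_by_column(example).items():
        print(column, types)
```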

Create a warmup rule#

Use the warmup/warmup-rule-set endpoint with an HTTP POST and a JSON payload to create a warmup rule. You can create a warmup rule at the column or table level. Access to a table or column initiates the creation of index and caching data. Warmup rules can prevent index and cache creation or change the order in which the index and cache data is removed when storage limits are reached.

If there are warmup rules defined for both a column and its table, the column rule takes precedence, unless the priority of the column rule is lower than the priority of the table rule.
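The precedence between a column rule and a table rule can be expressed as a small decision function. The following Python sketch models the statement above and is not the product's implementation. The function name `effective_rule` is hypothetical, and rules are represented as plain dicts with a `priority` key.

```python
def effective_rule(column_rule, table_rule):
    """Pick the rule that applies to a column (illustrative model only).

    The column rule takes precedence unless its priority is lower than
    the priority of the table rule. Either argument may be None.
    """
    if column_rule is None:
        return table_rule
    if table_rule is None:
        return column_rule
    if column_rule["priority"] < table_rule["priority"]:
        return table_rule
    return column_rule


if __name__ == "__main__":
    column_rule = {"scope": "column", "priority": 8}
    table_rule = {"scope": "table", "priority": -1}
    print(effective_rule(column_rule, table_rule)["scope"])
```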

Column-level warmup rule#

The following example creates a column-level warmup rule for the int_1 column in the aaa table of the tmp schema:

curl -X POST 'https://sep.example.com/ext/faster/warmup/warmup-rule-set' \
  -d '[ { "column":{"classType":"RegularColumn", "key":"int_1"}, "schema": "tmp", "table": "aaa", "warmUpType": "WARM_UP_TYPE_BASIC", "priority": 8, "ttl": "PT720H", "predicates": [ ] } ]' \
  -H 'Content-Type: application/json'

Find more details about the JSON payload in the table Warmup rule properties.

Table-level warmup rule#

The following example creates a table-level warmup rule for the aaa table of the tmp schema:

curl -X POST 'https://sep.example.com/ext/faster/warmup/warmup-rule-set' \
  -d '[ {"column":{"classType":"WildcardColumn","key":"*"},"schema": "tmp","table": "aaa","warmUpType": "WARM_UP_TYPE_BASIC","priority": -1,"ttl": "PT720H","predicates": [ ]} ]' \
  -H 'Content-Type: application/json'

Find more details about the JSON payload in the table Warmup rule properties.

Warmup rule properties#

Property name	Description
`columnid`	Name of the column to which a warmup rule is attached.
`column`	Defines the columns to accelerate in a table-level rule. Specify all columns by using `classType` set to `WildcardColumn` and `key` to `*` as used in Table-level warmup rule.
`schema`	Name of the schema that contains the specified table.
`table`	Name of the table that contains the specified column.
`warmUpType`	The materialization type performed on the specified column in the specified table. Valid values are `WARM_UP_TYPE_DATA` for data cache acceleration, `WARM_UP_TYPE_BASIC` for index acceleration, and `WARM_UP_TYPE_LUCENE` for text search acceleration.
`priority`	Priority for the warmup rule. To ensure a column is accelerated even if storage capacity is exceeded, set the `priority` as high as `10`. Rules with a higher priority take precedence. To ensure a column is never accelerated and prevent data cache or index creation, set to `-10`. Valid range of values: `-10` to `10`.
`ttl`	Duration for which the warmup rule remains active, specified in ISO-8601 duration format (PnDTnHnMn). Use `PT0M` to prevent expiration of the rule.
`predicates`	Defaults to all partitions. Use the JSON array syntax `["example1", "example2"]` to limit to specific partitions.
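Validating a rule before sending it can catch typos early. The following Python sketch assembles a payload for the warmup/warmup-rule-set endpoint from the properties in the table above and checks the documented warmup types and priority range. The function name `build_warmup_rule` is hypothetical, and treating a column name of `*` as a table-level rule is a convenience chosen for this sketch.

```python
import json

VALID_WARMUP_TYPES = {
    "WARM_UP_TYPE_DATA",
    "WARM_UP_TYPE_BASIC",
    "WARM_UP_TYPE_LUCENE",
}


def build_warmup_rule(schema, table, column, warmup_type, priority, ttl, predicates=None):
    """Return one warmup rule as a dict (hypothetical helper)."""
    if warmup_type not in VALID_WARMUP_TYPES:
        raise ValueError(f"unknown warmUpType: {warmup_type}")
    if not -10 <= priority <= 10:
        raise ValueError("priority must be between -10 and 10")
    if column == "*":
        # Table-level rule covering all columns.
        column_spec = {"classType": "WildcardColumn", "key": "*"}
    else:
        column_spec = {"classType": "RegularColumn", "key": column}
    return {
        "column": column_spec,
        "schema": schema,
        "table": table,
        "warmUpType": warmup_type,
        "priority": priority,
        "ttl": ttl,
        "predicates": list(predicates or []),
    }


if __name__ == "__main__":
    rule = build_warmup_rule("tmp", "aaa", "int_1", "WARM_UP_TYPE_BASIC", 8, "PT720H")
    # The endpoint expects a JSON array of rules.
    print(json.dumps([rule]))
```

The printed JSON array can be passed as the request body, for example with the `-d` option of curl.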

Get all warmup rules#

The warmup/warmup-rule-get endpoint exposes all defined warmup rules via an HTTP GET:

curl -X GET 'https://sep.example.com/ext/faster/warmup/warmup-rule-get' \
  -H 'Accept: application/json'

Response:

{
  "id":186229827,
  "schema":"ride_sharing_dataset",
  "table":"trips_data_big",
  "columnid":"d_date",
  "column":
    {
      "classType":"RegularColumn",
      "key":"d_date"
     },
  "warmUpType":"WARM_UP_TYPE_BASIC",
  "priority":8.0,
  "ttl":2592000.000000000,
  "predicates":[]
}
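The `ttl` value in the response is numeric, while rules are created with an ISO-8601 duration. The value 2592000 equals the number of seconds in 720 hours, which suggests that the response reports seconds, although the unit is not stated here. The following Python sketch converts durations of the form PnDTnHnMnS to seconds for comparison. The function name `iso_duration_to_seconds` is hypothetical, and the parser deliberately ignores year, month, and week designators.

```python
import re
from datetime import timedelta

_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def iso_duration_to_seconds(value):
    """Convert a duration such as PT720H or P30D to seconds."""
    match = _DURATION.match(value)
    parts = {}
    if match:
        # Keep only the components that are present in the input.
        parts = {k: float(v) for k, v in match.groupdict().items() if v is not None}
    if not parts:
        raise ValueError(f"unsupported duration: {value!r}")
    return timedelta(**parts).total_seconds()


if __name__ == "__main__":
    print(iso_duration_to_seconds("PT720H"))
```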

Delete a warmup rule#

The warmup/warmup-rule-delete endpoint allows you to delete a warmup rule via an HTTP DELETE. The identifier for the rule is a required parameter and can be seen from the result of warmup/warmup-rule-get in the id value.

curl -X DELETE 'https://sep.example.com/ext/faster/warmup/warmup-rule-delete' \
  -d '[186229827]' -H 'Accept: application/json'

You can delete multiple rules by using a comma-separated list such as [186229827,186229827] as parameter.

When you delete a warmup rule, the index and cache data for the column is de-prioritized to the level of data from default acceleration, and is therefore subject to earlier deletion.
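The DELETE call can also be prepared from a script. The following Python sketch builds the request with the standard library without sending it. The function name `build_delete_request` is hypothetical, authentication is omitted, and the `Content-Type` header is an assumption that the curl example does not set.

```python
import json
import urllib.request


def build_delete_request(base_url, catalog, rule_ids):
    """Prepare a DELETE request for warmup/warmup-rule-delete (not sent)."""
    url = f"{base_url.rstrip('/')}/ext/{catalog}/warmup/warmup-rule-delete"
    # The body is a JSON array of rule identifiers, as in the curl example.
    body = json.dumps(list(rule_ids)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="DELETE",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",  # assumption, see lead-in
        },
    )


if __name__ == "__main__":
    request = build_delete_request("https://sep.example.com", "faster", [186229827])
    print(request.get_method(), request.full_url, request.data.decode("utf-8"))
    # To send the request: urllib.request.urlopen(request)
```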

SQL support#

All SQL statements and functions supported by the connector used in the accelerated catalog are supported by Starburst Warp Speed.

Starburst Warp Speed supports all data types, including structural data types. All structural data types are accessible, but indexing is only applicable to fields within ROW data types.

For some functions, Starburst Warp Speed does not accelerate filtering operations on columns. For example, the following filtering operation is not accelerated:

SELECT count(*)
FROM catalog.schema.table
WHERE lower(company) = 'starburst';

Starburst Warp Speed indexing accelerates the following functions when used on the left or the right side of the predicate:

ceil(x) with REAL and DOUBLE data types
is_nan(x) with REAL and DOUBLE data types
cast(x as type) with DOUBLE cast to REAL, or any type cast to VARCHAR
cast(x as type) with DOUBLE and DECIMAL data types
day(d) and day_of_month(d) with DATE and TIMESTAMP data types
day_of_year(d) and doy(d) with DATE and TIMESTAMP data types
day_of_week(d) and dow(d) with DATE and TIMESTAMP data types
year(d) with DATE and TIMESTAMP data types
year_of_week(d) and yow(d) with DATE and TIMESTAMP data types
week(d) and week_of_year(d) with DATE and TIMESTAMP data types
LIKE and NOT LIKE with VARCHAR data type
contains(arr_varchar, value) with array of VARCHAR data type
substring and substr with VARCHAR data type
strpos with BIGINT data type

The maximum supported string length for any cached data type is 48000 characters.

FAQ#

What happens if data is not cached and indexed? Do I get partial results?

No. If a split can be served from SSD, it is. If not, Starburst Warp Speed reads the data for the split from object storage, completes the query, and returns the results. The index and cache are then created asynchronously, based on priority and available SSD storage, so that future queries can use them.

Is there a chance a user can get stale results?

No. Starburst Warp Speed uses a mapping between the generated splits and index and cache data on SSDs during query processing. If a split can be served from SSD, it is; but if not, Starburst Warp Speed gets the data for this split from the object storage and then asynchronously indexes and caches it as appropriate.

What is the caching and indexing speed?

Performance depends on many different factors. For example, indexing and caching the entire TPC-DS SF1000 dataset takes about 20 minutes on a cluster with two workers with the machine size r5d.8xlarge.