Using the cache service #

Starburst Enterprise platform (SEP) offers caching options through its cache service to help you reduce query time and costs.

The cache service can automatically manage refreshes of materialized views in Hive catalogs.

With table scan redirections, you can redirect data requests for a table in one catalog to a cached version of it in another catalog on a more performant system, reducing the query load on the original data source. This redirection is transparent to the user, and therefore provides performance improvements without the need to modify user queries.

This document provides an overview of the cache service, including links to relevant sections of our reference documentation.

Select the right cache strategy #

Both materialized views and table scan redirections improve query performance. Find out which one is right for your organization in this guide.

Materialized views #

Materialized views created in Hive catalogs are backed by the cache service and the Hive Metastore and can be automatically refreshed. If you use SEP with Hive and the cache service, you already have the necessary prerequisites to use automatically refreshed materialized views.

Any query that runs successfully in SEP can be used to create a materialized view. Users access the materialized view through the catalog it is made available from, just as they would any other data source.

Materialized views in Hive allow you to access the results of a query from any data catalog you have defined in SEP in real time without re-processing the query, and with a refresh scheme that works best for your organization.

Table scan redirections #

Table scan redirections allow you to transparently redirect queries to a cached version in a location that incurs lower egress costs, is more performant, or both. Your users get even faster performance without changing the vetted queries that they depend upon.

Table scan redirection is available for nearly all Starburst connectors.

Get started #

The information in this section helps you get Starburst Cached Views up and running using the Starburst cache service. It includes links to more detailed information in our reference documentation. As with any configuration change in SEP, you must restart the server to apply the changes.

Enable the cache service #

To use table scan redirections or Hive materialized views, the cache service must first be installed and enabled. Our reference documentation has requirements and instructions for doing so. Take the time to read about the two types of deployments, standalone and embedded, to see which is better suited to your cluster. If your organization uses our Helm charts to deploy SEP, you can use the Helm chart for the cache service to deploy it as standalone service.

Before you begin, ensure that you have an external database instance available to store information about materialized views and table scan redirections.

When you are ready, there are a number of configuration properties that must be set, shown in the following example:

service-database.user=alice
service-database.password=test123
service-database.jdbc-url=jdbc:mysql://mysql-server:3306/<service-database-name>
starburst.user=bob
starburst.jdbc-url=jdbc:trino://coordinator:8080
rules.file=etc/rules.json

Where these get set depends on whether or not you are using a standalone or embedded cache service, and whether you are using Helm charts or not:

  • Embedded -
    • via the SEP Helm-chart in the cache.properties node nested under the top level coordinator node in the config section node.
    • Other deployments in the etc/cache.properties file on the coordinator server.
  • Standalone -
    • via the cache service Helm chart in the config.properties node nested under the top level config node.
    • Other deployments in the etc/config.properties file on the cache service server.

Enable materialized views #

Materialized views must be enabled in the Hive catalog that they are accessed from. In addition, you must specify a schema to contain materialized view storage. We strongly recommend that you create a schema dedicated to materialized views for a given catalog. A namespace must also be specified; it is used internally by the cache service to avoid inadvertent name collisions. To create a new schema with a specified location, use a command similar to the following, which creates the views_cache_storage schema in the myhive catalog:

CREATE SCHEMA myhive.views_cache_storage WITH (location = 's3a://<s3-bucket-name>/hivepostgres_views/views_cache_storage/');

In addition, the schema for the materialized view itself must exist. If it does not exist, you must create it:

CREATE SCHEMA myhive.views_schema WITH (location = 's3a://<s3-bucket-name>/hivepostgres_views/views_schemas');

Each Hive catalog must be configured to allow materialized views and use the schema you created for materialized views:

materialized-views.enabled=true
materialized-views.namespace=mymvnamespace
materialized-views.storage-schema=mymvstorage
cache-service.uri=http://<my-cache-service-hostname>:8180

Users with the necessary access privileges in a schema in the configured catalog can create materialized views in the usual manner.

Enable table scan redirections #

Like materialized views, table scan redirections are also managed by the cache service. The redirections are transparent to users, so no additional training, modifications of queries or new commands are needed to use them.

Table scan redirection refreshes are configured in a JSON-formatted rules file. There are numerous properties to govern both global defaults for all rules and rule- specific behaviors.

Table scan redirections offer even finer-grained control for refreshes than do materialized views, including cleanup options.