Configure the cache service in Kubernetes#
The starburst-cache-service
Helm chart configures a standalone
Cache service to use with Starburst Cached Views.
We strongly suggest that you follow best practices to customize your cluster. Creating small, targeted files to override any defaults and add any configuration properties. There is an example file set in our installation guide that describes the recommended way to manage your customizations.
There are two things you must configure to use the Starburst cache service:
The cache service
Starburst Enterprise platform (SEP), to use the cache service
Make sure that the service is available before configuring and restarting SEP to use it.
Requirements#
Externally managed, compatible relational database.
Full access to a dedicated schema on the database with username and password credentials.
Network access on the configured port between the database and the cache service container in the Kubernetes cluster.
Configure and start the cache service#
To update the cache service with any configuration changes, run the helm
update
command with the updated YAML files and the --install
switch. For
example:
$ helm upgrade my-caching-service starburstdata/starburst-cache-service \
--install \
--version 360.15.0 \
--values ./registry-access.yaml
--values ./cache-service-prod.yaml
When you update the cache service, you can use the same command that you use to
upgrade to a new release. Helm compares all --values
files and the version,
and safely ignores any that are unchanged.
Top level nodes#
The top level nodes included in the values.yaml
file are described in the
following table. Default values and examples follow this table:
Node name |
Description |
---|---|
|
Contains the details for the cache service Docker image. Review our
best practices for managing registry
access across all Starburst products. NOTE: The |
|
Contains the secret to be mounted under the specified
|
|
Specifies the configuration properties for the cache service
under |
|
Defines authentication details for Docker registry access. Typically, you
need to use your username and
password for the Starburst Harbor instance. Cannot
be used if using |
|
Alternative authentication for selected Docker registry using secrets.
Cannot be used if using |
|
The CPU and memory resources to use for the cache service. Request and limit values should be identical. These settings can be adjusted to match your workload and available node sizes in the cluster. |
|
Defines the mechanisms and their options that expose the cache service to
an outside network. |
|
Configuration to allow Kubernetes to determine the node and pod to use. These nodes are left empty by default. |
|
Defines common labels to identify all cache service objects in a KRM to
use with the |
image:
#
The following are the default values for the cache service
image:
top level node in the values.yaml
file:
image:
repository: "starburstdata/starburst-cache-service"
tag: "360-e.15"
pullPolicy: "IfNotPresent"
config:
#
The defaults for nodes nested under config:
are described below.
config.properties:
#
This node specifies configuration properties and their values for the cache service. The following are the
default values for the cache service config.properties:
node in the
values.yaml
file:
config:
config.properties: |
service-database.user=alice
service-database.password=test123
service-database.jdbc-url=jdbc:mysql://mysql-server:13306/redirections
starburst.user=bob
starburst.jdbc-url=jdbc:trino://coordinator:8080
rules.file=etc/rules.json
config:jvm.config:
#
This node specifies the command line configuration options for starting
the Java Virtual Machine (JVM) used by the cache
service. The following are the default values for the cache service
config:jvm.config:
node in the values.yaml
file:
config:
jvm.config: |
-server
-Xmx2G
-XX:-UseBiasedLocking
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+UseGCOverheadLimit
-XX:+ExitOnOutOfMemoryError
-XX:-OmitStackTraceInFastThrow
-XX:ReservedCodeCacheSize=512M
-XX:PerMethodRecompilationCutoff=10000
-XX:PerBytecodeRecompilationCutoff=10000
-Djdk.nio.maxCachedBufferSize=2000000
log.properties:
#
This optional node specifies the logging configuration for the cache service. The following are the
default values for the cache service log.properties:
node in the
values.yaml
file:
config:
jvm.config: |
io.airlift=INFO
rules.json:
#
The rules.json
node specifies the source tables and target connector for
the cache, along with the schedule for refreshing them. The following are the
default values for the cache service rules.json:
node in the
values.yaml
file:
config:
rules.json: |
{
"rules": []
}
The following example demonstrates how to implement cache service refresh rules in this node by adding the JSON file content in a multi-line segment:
config:
rules.json: |
{
"defaultGracePeriod": "42m",
"defaultMaxImportDuration": "1m",
"defaultCacheCatalog": "default_cache_catalog",
"defaultCacheSchema": "default_cache_schema",
"defaultUnpartitionedImportConfig": {
"usePreferredWritePartitioning": false,
"preferredWritePartitioningMinNumberOfPartitions": 1,
"writerCount": 128,
"scaleWriters": false,
"writerMinSize": "110MB"
},
"defaultPartitionedImportConfig": {
"usePreferredWritePartitioning": true,
"preferredWritePartitioningMinNumberOfPartitions": 40,
"writerCount": 256,
"scaleWriters": false,
"writerMinSize": "52MB"
},
"rules": [
{
"catalogName": "mysql",
"schemaName": "marketing",
"tableName": "events",
"refreshInterval": "2m",
"gracePeriod": "15m",
"incrementalColumn": "event_id",
"deletePredicate": "event_date < date_add('day', -31, CURRENT_DATE)"
}
]
}
type-mapping.json:
#
The type-mapping.json
node specifies type mapping rules between source and target catalogs. This
node is not included in the values.yaml
file by default. The following
example maps three different timestamp types to timestamp(3)
in the target:
config:
type-mapping.json: |
{
"rules": {
"tpch": {
"integer": "long"
},
"hive": {
"timestamp(0)": "timestamp(3)",
"timestamp(1)": "timestamp(3)",
"timestamp(2)": "timestamp(3)"
}
}
}
registryCredentials:
#
The following are the default values for the cache service
registryCredentials:
top level node in the values.yaml
file:
registryCredentials:
enabled: false
registry:
username:
password:
imagePullSecrets:
#
Instead of setting registryCredentials you can pass a list of secrets in the following format. This feature is disabled by default:
# imagePullSecrets:
# - name: secret1
# - name: secret2
imagePullSecrets:
resources:
#
The following are the default values for the cache service resource:
top
level node in the values.yaml
file:
resources:
requests:
memory: 2Gi
cpu: 0.5
limits:
memory: 2Gi
cpu: 4
expose:
#
You must expose the service to allow it to connect to the SEP coordinator, and
to reach it with tools such as the cache service CLI. This service-type configuration is defined by the
expose:
top level node. You can choose from four different mechanisms by
setting the type:
value to the common configurations in k8s.
Depending on your choice, you only have to configure the identically-named sections.
Type |
Description |
---|---|
|
Default value. Only exposes the service internally within the k8s cluster using an IP address internal to the cluster. Use this in the early stages of configuration. |
|
Configures the internal port number of the server for requests from
outside the cluster on the |
|
Used with platforms that provide a load balancer. This
option automatically creates |
|
This option provides a production-level, securable configuration. It
allows a load balancer to route to multiple apps in the cluster, and
may provide load balancing, SSL termination, and name-based virtual
hosting. For example, the SEP coordinator and Ranger server can be
in the same cluster, and can be exposed via |
The following are the default values for the cache service expose:
top level
node in the values.yaml
file:
expose:
port: 8180
# one of: nodePort, clusterIp, loadBalancer, ingress
type: "clusterIp"
clusterIp:
name: "cache-service"
ports:
http:
port: 8180
nodePort:
name: "cache-service"
ports:
http:
port: 8180
nodePort: 30180
loadBalancer:
name: "cache-service"
IP: ""
ports:
http:
port: 8180
annotations: {}
sourceRanges: []
ingress:
ingressName: "cache-service-ingress"
serviceName: "cache-service"
servicePort: 8180
ingressClassName:
tls:
enabled: true
secretName:
host:
path: "/"
annotations: {}
commonLabels:
#
The following are the default values for the commonLabels:
top level node in
the cache service values.yaml
file:
commonLabels: {}
# environment: dev
# myLabel: labelValue
Configure SEP to use the cache service#
The cache service requires a database schema to store configuration data. Ensure that you have created the schema, and note the connection information. Once the cache service and the external RDBMS is configured and running, SEP must be configured as in the following example, which shows a PostgreSQL database providing the backing schema:
coordinator:
etcFiles:
properties:
cache.properties: |
service-database.user=postgres
service-database.password=StarburstR0cks!
service-database.jdbc-url=jdbc:postgresql://<your_rds_endpoint>:5432/redirections
starburst.user=starburst_service
starburst.password=
starburst.jdbc-url=jdbc:trino://coordinator:8080
rules.file=secretRef:cache-rules:cache-rules.json
rules.refresh-period=1m
refresh-initial-delay=1m
refresh-interval=24h
Many connectors support the use of the cache service. For each supported catalog that you wish to use with the cache service, two lines must be added to the catalog properties configuration:
redirection.config-source=SERVICE
cache-service.uri=http://cache-service:8180
In the following example, the mysalesdata
catalog is configured to use the
cache service:
catalogs:
mysalesdata: |
connector.name=postgresql
connection-url=jdbc:postgresql://<mydbhost>:5432/bootcamp
connection-user=postgres
connection-password=StarburstR0cks!
statistics.enabled=true
allow-drop-table=true
redirection.config-source=SERVICE
cache-service.uri=http://cache-service:8180