Starburst Enterprise deployment basics#
This topic provides a high-level overview of Starburst Enterprise platform (SEP) requirements, configuration, and cluster startup.
This foundational topic helps you understand the following:
Basic cluster requirements
Deployment options
SEP configuration basics
Basic startup commands
After you have read and understood this topic, you are ready for the next step: preparing your deployment using your selected option.
Requirements#
Your cluster must meet certain requirements to run SEP.
Network bandwidth#
To ensure optimal performance, SEP requires a minimum of 10Gbps of network bandwidth, both within the cluster and between cluster nodes and data sources. Starburst recommends 25Gbps or more of bandwidth between nodes and object storage to take full advantage of parallelism.
Linux operating system#
Red Hat Enterprise Linux (RHEL); other Linux distributions are not officially supported by Starburst.
64-bit required
Newer releases are preferred, especially when running in containers
Adequate ulimits for the user that runs the Trino process. These limits may depend on the specific Linux distribution you are using. The number of open file descriptors needed for a particular SEP instance scales roughly as the number of machines in the cluster, times some factor depending on the workload. The nofile limit sets the maximum number of file descriptors that a process can have, while the nproc limit restricts the number of processes, and therefore threads on the JVM, that a user can create. We recommend setting the limits to at least the following values. Typically, this configuration is located in /etc/security/limits.conf:
trino soft nofile 131072
trino hard nofile 131072
trino soft nproc 128000
trino hard nproc 128000
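As a minimal sketch of checking these values, the following Python reads limits.conf-style text and reports the soft and hard nofile and nproc entries for the trino user that are missing or below the recommended minimums. The function names are hypothetical. The parser is deliberately simplified: it ignores group (@name) and wildcard (*) entries, and it treats the "-" type as setting both the soft and the hard limit.

```python
import math

# Minimums recommended above for the user that runs SEP.
RECOMMENDED = {"nofile": 131072, "nproc": 128000}


def parse_limits(text, user="trino"):
    """Return {(type, item): value} for one user from limits.conf-style text."""
    limits = {}
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) != 4 or fields[0] != user:
            continue  # skip blank lines, other users, groups, and wildcards
        _, kind, item, value = fields
        if value in ("unlimited", "infinity"):
            number = math.inf
        elif value.isdigit():
            number = int(value)
        else:
            continue
        # A "-" type sets both the soft and the hard limit.
        for limit_type in (("soft", "hard") if kind == "-" else (kind,)):
            limits[(limit_type, item)] = number
    return limits


def missing_limits(text, user="trino"):
    """List soft/hard nofile and nproc limits that are absent or too low."""
    limits = parse_limits(text, user)
    problems = []
    for item, minimum in RECOMMENDED.items():
        for kind in ("soft", "hard"):
            value = limits.get((kind, item))
            if value is None or value < minimum:
                problems.append(f"{user} {kind} {item} should be >= {minimum}")
    return problems
```

On a node, you could pass the contents of /etc/security/limits.conf to missing_limits. An empty list means the four recommended entries are present. Drop-in files under /etc/security/limits.d are not covered by this sketch.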
Java runtime environment#
SEP requires a 64-bit version of Java 22, with a minimum required version of 22.0.1; using the latest patch version is recommended. Earlier major versions such as Java 8, Java 11, Java 17, or Java 21 do not work. Newer versions such as Java 23 are not supported; they may work, but are not tested.
We recommend using the Eclipse Temurin OpenJDK distribution from Adoptium as the JDK for Trino, because Trino is tested against that distribution. Eclipse Temurin is also the JDK used by the Trino Docker image.
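The version rules above can be expressed as a small check. This is a sketch under assumptions: it takes a version string you have already obtained, for example from the java.version system property, and the helper names are hypothetical. It does not verify that the runtime is 64-bit.

```python
import re


def parse_java_version(version):
    """Parse strings such as "22", "22.0.1", or "22.0.2+9" into (major, minor, patch)."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized Java version: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def check_java_version(version):
    """Classify a Java version against the SEP requirement described above."""
    major, minor, patch = parse_java_version(version)
    if major < 22:
        return "unsupported: too old"
    if major > 22:
        return "untested: newer than Java 22"
    if (minor, patch) < (0, 1):
        return "unsupported: below 22.0.1"
    return "supported"
```

Legacy version strings such as 1.8.0_392 parse with a major version of 1 and are therefore reported as too old.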
Processor architectures#
You must use one of the following processor architectures for your SEP deployment:
x86_64 (AMD64)
AArch64 (ARM64)
Backend service database#
The SEP backend service is required. It manages and stores information for a number of features in the product. The backend service requires an existing, external database, and must be configured and running before many other features can be configured.
Deploy SEP#
To deploy SEP, you must use the Starburst Kubernetes Helm charts or Starburst Admin. Starburst Admin is required for bare metal and virtual machines.
Configure SEP#
Starburst provides two deployment options for SEP:
Kubernetes-based deployments - All properties are defined in YAML files, which are used to create the configuration files that SEP expects.
Starburst Admin-based deployments - Properties are defined in Jinja2 *.j2 template files, which are used to create the configuration files that SEP expects. Starburst Admin is a collection of Ansible playbooks for installing and managing SEP.
There are four main categories of configuration properties that SEP requires in specifically named locations. These categories describe the top-level YAML nodes for Helm-based deployments, or the Jinja2 files for Starburst Admin-based deployments:
Node properties - Environmental configuration specific to each node.
JVM config - Command line options for the Java Virtual Machine.
Config properties - Configuration for the coordinator and workers. See the Properties reference for available configuration properties.
Catalog properties - Configuration for data sources. The available catalog configuration properties for a connector are described in the respective connector documentation.
The node properties, JVM config, and config properties must be specified for both the coordinator and workers in the values.yaml file for Kubernetes-based deployments, and in the Jinja2 files located in the files/coordinator and files/worker directories for Starburst Admin deployments. Catalogs are defined in the top-level catalogs node of the values.yaml file for Kubernetes, or in the files/catalog directory of Starburst Admin.
In addition to the above categories, other features such as access control and cluster security require additional customization in specific sections of the values.yaml file for Kubernetes deployments, or the creation of additional config files for Starburst Admin deployments.
In most cases, the cluster must be restarted for configuration changes to take effect.
Managing configuration in Kubernetes deployments#
In Kubernetes-based deployments, we strongly suggest that you leave the contents of the etcFiles.properties sections in the top-level coordinator and worker nodes untouched at their default values, and make any customizations using the additionalProperties nodes instead.
Additionally, we suggest following the recommended guidelines for creating a customization file set, as described in the Kubernetes installation guide linked in the Next steps section of this topic.
Managing the default values in the vars.yml file#
In Starburst Admin-based deployments, we strongly suggest that you leave the contents of the vars.yml file at their default values unless there is a compelling technical reason to change them.
Some configuration properties, if customized in Jinja2 files, may conflict with the values in vars.yml. In these cases, you must set them to the same value in both files.
Node properties#
All properties described in this section are defined as follows, depending on the deployment type:
Kubernetes: In the additionalProperties section of the top-level coordinator and worker nodes in the values.yaml file.
Starburst Admin: In the files/coordinator/config.properties.j2 and files/worker/config.properties.j2 files.
These configuration properties are applied to each specific node in the cluster. A node is a single installed instance of SEP. The node properties are typically created by the deployment system when SEP is first installed. The following is a minimal configuration:
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/var/trino/data
The above properties are described below:
node.environment: The name of the environment. All SEP nodes in a cluster must have the same environment name. The name must start with a lowercase alphanumeric character and only contain lowercase alphanumeric or underscore (_) characters.
node.id: The unique identifier for this installation of SEP. This must be unique for every node. This identifier should remain consistent across reboots or upgrades of SEP. If running multiple installations of SEP on a single machine (i.e. multiple nodes on the same machine), each installation must have a unique identifier. The identifier must start with an alphanumeric character and only contain alphanumeric, -, or _ characters.
node.data-dir: The location (filesystem path) of the data directory. SEP stores logs and other data here.
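The naming rules above translate directly into regular expressions. The following sketch validates a node-properties dictionary and renders a new configuration with a random UUID as node.id. It assumes ASCII-only names, and the helper names are hypothetical. Because node.id must stay the same across reboots and upgrades, generate it once per node and persist it rather than regenerating it at every start.

```python
import re
import uuid

# Naming rules described above, assuming ASCII letters and digits.
ENVIRONMENT_PATTERN = re.compile(r"[a-z0-9][a-z0-9_]*")
NODE_ID_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]*")


def validate_node_properties(props):
    """Return a list of problems found in a {property: value} mapping."""
    errors = []
    env = props.get("node.environment", "")
    if not ENVIRONMENT_PATTERN.fullmatch(env):
        errors.append(f"invalid node.environment: {env!r}")
    node_id = props.get("node.id", "")
    if not NODE_ID_PATTERN.fullmatch(node_id):
        errors.append(f"invalid node.id: {node_id!r}")
    if not props.get("node.data-dir"):
        errors.append("node.data-dir is not set")
    return errors


def new_node_properties(environment, data_dir):
    """Render node properties with a freshly generated node.id."""
    return "\n".join([
        f"node.environment={environment}",
        f"node.id={uuid.uuid4()}",  # lowercase hex digits and hyphens
        f"node.data-dir={data_dir}",
    ])
```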
JVM config#
All properties described in this section are defined as follows, depending on the deployment type:
Kubernetes: In the additionalProperties section of the top-level coordinator and worker nodes in the values.yaml file.
Starburst Admin: In the files/coordinator/jvm.config.j2 and files/worker/jvm.config.j2 files.
These configuration properties comprise a list of command line options used for launching the Java Virtual Machine. The list of options must be written one per line. These options are not interpreted by the shell, so options containing spaces or other special characters should not be quoted.
The following provides a good starting point for creating a performant JVM configuration:
-server
-Xmx16G
-XX:InitialRAMPercentage=80
-XX:MaxRAMPercentage=80
-XX:G1HeapRegionSize=32M
-XX:+ExplicitGCInvokesConcurrent
-XX:+ExitOnOutOfMemoryError
-XX:+HeapDumpOnOutOfMemoryError
-XX:-OmitStackTraceInFastThrow
-XX:ReservedCodeCacheSize=512M
-XX:PerMethodRecompilationCutoff=10000
-XX:PerBytecodeRecompilationCutoff=10000
-Djdk.attach.allowAttachSelf=true
-Djdk.nio.maxCachedBufferSize=2000000
-Dfile.encoding=UTF-8
# Allow loading dynamic agent used by JOL
-XX:+EnableDynamicAgentLoading
Note
If your environment is using a Java 22 version lower than 22.0.2, add the following line to your JVM config:
-XX:G1NumCollectionsKeepPinned=10000000
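To illustrate the one-option-per-line format, the following sketch turns jvm.config text into an argument list without any shell parsing. An option containing spaces therefore stays a single argument. Treating lines that start with # as comments is an assumption based on the comment line in the example configuration above. The second helper encodes the Java 22 note. Both function names are hypothetical.

```python
def parse_jvm_config(text):
    """Turn jvm.config text into a list of JVM arguments, one per line."""
    options = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines such as the JOL comment above.
        if line and not line.startswith("#"):
            # No shell parsing: spaces and quote characters are kept literally.
            options.append(line)
    return options


def version_specific_flags(version):
    """Extra options for a (major, minor, patch) Java version, per the note above."""
    if version[0] == 22 and version < (22, 0, 2):
        return ["-XX:G1NumCollectionsKeepPinned=10000000"]
    return []
```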
You must adjust the value for the memory used by SEP, specified with -Xmx, to the available memory on your nodes. Typically, values representing 70 to 85 percent of the total available memory are recommended. For example, if all workers and the coordinator use nodes with 64GB of RAM, you can use -Xmx54G.
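The sizing arithmetic can be sketched as follows. This assumes node memory given in whole gigabytes, and the helper name is hypothetical. The default of 85 percent is taken from the upper end of the recommended range. Integer arithmetic rounds down, so the remainder stays available for the operating system, other services, and off-heap memory.

```python
def xmx_flag(total_memory_gb, percent=85):
    """Return an -Xmx option sized to a percentage of the node's memory."""
    if not 0 < percent < 100:
        # Allocating all memory to the JVM is not supported.
        raise ValueError("percent must be between 0 and 100, exclusive")
    heap_gb = total_memory_gb * percent // 100  # round down to whole GB
    return f"-Xmx{heap_gb}G"
```

For the 64GB example above, xmx_flag(64) returns -Xmx54G. On a larger 256GB node, a lower percentage such as xmx_flag(256, 80) returns -Xmx204G.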
SEP uses most of the allocated memory for processing, with a small percentage used by JVM-internal processes such as garbage collection.
The rest of the available node memory must be sufficient for the operating system and other running services, as well as off-heap memory used for native code initiated by the JVM process.
On larger nodes, the percentage value can be lower. Allocating all memory to the JVM or using swap space is not supported, and disabling swap space at the operating system level is recommended.
Large memory allocation beyond 32GB is recommended for production clusters.
Because an OutOfMemoryError typically leaves the JVM in an inconsistent state, the configuration above writes a heap dump for debugging (-XX:+HeapDumpOnOutOfMemoryError) and forcibly terminates the process (-XX:+ExitOnOutOfMemoryError) when this occurs.
The temporary directory used by the JVM must allow execution of code. Specifically, the mount must not have the noexec flag set. The default /tmp directory is mounted with this flag in some installations, which prevents SEP from starting. You can work around this by overriding the temporary directory: add -Djava.io.tmpdir=/path/to/other/tmpdir to the list of JVM options.
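One way to check for this condition is to find the mount that contains the temporary directory and inspect its options. The sketch below assumes a Linux node and text in the /proc/mounts format, for example open("/proc/mounts").read(). The function names are hypothetical. Escaped characters in mount point names are not decoded.

```python
from pathlib import PurePosixPath


def mount_options_for(path, mounts_text):
    """Return the options of the mount that contains path, from /proc/mounts text."""
    target = PurePosixPath(path)
    best_point, best_options = None, set()
    for line in mounts_text.splitlines():
        # Fields: device, mount point, filesystem type, options, dump, pass.
        fields = line.split()
        if len(fields) < 4:
            continue
        point = PurePosixPath(fields[1])
        if point != target and point not in target.parents:
            continue  # this mount does not contain path
        # Prefer the deepest mount point; ">=" lets a later entry for the same
        # mount point win, because later mounts shadow earlier ones.
        if best_point is None or len(point.parts) >= len(best_point.parts):
            best_point, best_options = point, set(fields[3].split(","))
    return best_options


def tmpdir_allows_exec(path, mounts_text):
    """True if the mount holding path is not flagged noexec."""
    return "noexec" not in mount_options_for(path, mounts_text)
```

Matching on path components rather than string prefixes means that a directory such as /tmpfoo is not mistaken for part of a /tmp mount.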
Config properties#
SEP provides general configuration properties for many aspects of the cluster. The administration section of this documentation provides a comprehensive list of the supported properties for topics such as General properties, Resource management properties, Query management properties, Web UI properties, and others. The examples in this section show how a small number of the available configuration properties are used.
General configuration properties are defined as follows, depending on the deployment type:
Kubernetes: In the additionalProperties section of the top-level coordinator and worker nodes in the values.yaml file.
Starburst Admin: In the files/coordinator/config.properties.j2 and files/worker/config.properties.j2 files.
A cluster must include one coordinator. Dedicating the coordinator machine exclusively to coordination work provides the best performance on larger clusters. Scaling and parallelization are achieved by using many workers.
The following is a minimal configuration for the coordinator:
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
discovery.uri=http://example.net:8080
And this is a minimal configuration for the workers:
coordinator=false
http-server.http.port=8080
discovery.uri=http://example.net:8080
Alternatively, if you are setting up a single machine that functions as both a coordinator and a worker for very limited testing purposes, use this configuration:
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
discovery.uri=http://example.net:8080
These properties require some explanation:
coordinator: Allow this SEP instance to function as a coordinator, to accept queries from clients and manage query execution.
node-scheduler.include-coordinator: Allow scheduling work on the coordinator. For larger clusters, processing work on the coordinator can impact query performance because the machine’s resources are not available for the critical task of scheduling, managing, and monitoring query execution.
http-server.http.port: Specifies the port for the HTTP server. SEP uses HTTP for all communication, internal and external.
discovery.uri: The SEP coordinator has a discovery service that is used by all the nodes to find each other. Every SEP instance registers itself with the discovery service on startup and continuously heartbeats to keep its registration active. The discovery service shares the HTTP server with SEP and thus uses the same port. Replace example.net:8080 to match the host and port of the SEP coordinator. If you have disabled HTTP on the coordinator, the URI scheme must be https, not http.
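The three minimal configurations above differ only in a few values. The following sketch renders them from a role name so that the relationship between the properties is explicit. The function and the role names "coordinator", "worker", and "single" are hypothetical. Only plain HTTP is modeled: if HTTP is disabled on the coordinator, discovery.uri must use https instead, and the TLS settings are outside the scope of this sketch.

```python
def render_config(role, coordinator_host, port=8080):
    """Render minimal config properties for "coordinator", "worker", or "single"."""
    if role not in ("coordinator", "worker", "single"):
        raise ValueError(f"unknown role: {role!r}")
    lines = [f"coordinator={'false' if role == 'worker' else 'true'}"]
    if role != "worker":
        # Only a combined single-machine setup schedules work on the coordinator.
        include = "true" if role == "single" else "false"
        lines.append(f"node-scheduler.include-coordinator={include}")
    lines.append(f"http-server.http.port={port}")
    # Every node points at the coordinator's discovery service, which shares
    # the coordinator's HTTP server and therefore its port.
    lines.append(f"discovery.uri=http://{coordinator_host}:{port}")
    return "\n".join(lines)
```

For example, render_config("worker", "example.net") produces exactly the minimal worker configuration shown earlier.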
The above configuration properties are a minimal set to help you get started. All additional configuration is optional and varies widely based on the specific cluster and supported use cases.
Log levels#
All properties described in this section are optional, and allow setting the minimum log level for named logger hierarchies. Every logger has a name, which is typically the fully qualified name of the class that uses the logger. Loggers have a hierarchy based on the dots in the name, like Java packages. Logging properties are defined as follows, depending on the deployment type:
Kubernetes: In the additionalProperties section of the top-level coordinator and worker nodes in the values.yaml file.
Starburst Admin: In the files/coordinator/log.properties and files/worker/log.properties files. These are plain text files, not Jinja2 templates.
For example, consider the following log levels setting:
io.trino=INFO
This sets the minimum level to INFO for both io.trino.server and io.trino.plugin.hive. The default minimum level is INFO, so the above example does not actually change anything. There are four levels: DEBUG, INFO, WARN, and ERROR.
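The hierarchy works like dotted package names: a logger inherits the setting of its most specific configured ancestor. The following sketch models that resolution to show how a setting for io.trino applies to io.trino.server. It is an illustration with hypothetical function names, not the logging implementation used by SEP.

```python
LEVELS = ("DEBUG", "INFO", "WARN", "ERROR")
DEFAULT_LEVEL = "INFO"


def effective_level(logger, settings):
    """Resolve the minimum level for a logger from dotted-name settings.

    For "io.trino.plugin.hive", the names checked are io.trino.plugin.hive,
    io.trino.plugin, io.trino, and io, in that order.
    """
    parts = logger.split(".")
    for end in range(len(parts), 0, -1):
        name = ".".join(parts[:end])
        if name in settings:
            level = settings[name]
            if level not in LEVELS:
                raise ValueError(f"unknown level {level!r} for {name}")
            return level
    return DEFAULT_LEVEL


def is_logged(message_level, logger, settings):
    """True if a message at message_level passes the logger's minimum level."""
    return LEVELS.index(message_level) >= LEVELS.index(effective_level(logger, settings))
```

With io.trino=INFO and io.trino.plugin.hive=DEBUG, DEBUG messages from the Hive plugin are logged, while DEBUG messages from io.trino.server are not.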
Catalog properties#
SEP accesses data via connectors, which are mounted in catalogs. The connector provides all of the schemas and tables inside of the catalog. For example, the Hive connector maps each Hive database to a schema. If the Hive connector is mounted as the hivewebevents catalog, and the data source contains a table clicks in database web, that table can be accessed in SEP as hivewebevents.web.clicks.
Catalogs are defined as follows, depending on the deployment type. Each catalog has a separate named entry:
Kubernetes: In the top-level catalogs node in the values.yaml file.
Starburst Admin: Separate files for each catalog are added to the files/catalog directory.
For example, to create the hivewebevents catalog in Kubernetes deployments, add the following map to the top-level catalogs node:
hivewebevents: |-
  connector.name=hive
For Starburst Admin deployments, create files/catalog/hivewebevents.properties with the following contents to mount the hive connector as the hivewebevents catalog:
connector.name=hive
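For Starburst Admin deployments, each catalog corresponds to one properties file whose name, without the .properties extension, is the catalog name. The following sketch renders such files from a dictionary and builds the fully qualified table name described above. The helper names are hypothetical. The only check performed is that each catalog sets connector.name.

```python
def catalog_files(catalogs, directory="files/catalog"):
    """Map {catalog name: {property: value}} to {file path: file contents}."""
    files = {}
    for name, properties in catalogs.items():
        if "connector.name" not in properties:
            raise ValueError(f"catalog {name!r} must set connector.name")
        body = "".join(f"{key}={value}\n" for key, value in properties.items())
        # The file name, without .properties, becomes the catalog name.
        files[f"{directory.rstrip('/')}/{name}.properties"] = body
    return files


def qualified_name(catalog, schema, table):
    """Fully qualified table name in the form catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"
```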
See Connector overview for more information about configuring connectors.
Run SEP#
The launcher script can be used manually or as a daemon startup script.
The location of the script is defined as follows, depending on the deployment type:
Kubernetes: In the top-level initFile node in the values.yaml file.
Starburst Admin: A default script is included in the bin directory; however, we recommend that you use the Start playbook included with Starburst Admin.
The default script accepts the following commands:
Command | Action
---|---
run | Starts the server in the foreground and leaves it running. To shut down the server, use Ctrl+C in this terminal or the stop command from another terminal.
start | Starts the server as a daemon and returns its process ID.
stop | Shuts down a server started with either start or run.
restart | Stops then restarts a running server, or starts a stopped server, assigning a new process ID.
kill | Shuts down a possibly hung server by sending the SIGKILL signal.
status | Prints a status line, either Stopped pid or Running as pid.
A number of additional options allow you to specify configuration file and directory locations, as well as Java options. Run the launcher with --help to see the supported commands, command line options, and default values.
The -v or --verbose option for each command prepends the server’s current settings before the command’s usual output.
SEP can be started as a daemon by running the following:
bin/launcher start
Use the status command with the verbose option for the pid and a list of configuration settings:
bin/launcher -v status
Alternatively, it can be run in the foreground, with the logs and other output written to stdout/stderr. Both streams should be captured if using a supervision system like daemontools:
bin/launcher run
The launcher configures default values for the configuration directory etc, configuration files in etc, the data directory identical to the installation directory, the pid file as var/run/launcher.pid, and log files in the var/log directory.
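The default layout can be sketched as a function of the installation directory. This sketch assumes that the pid file and log directory are resolved relative to the data directory, and that setting node.data-dir (passed here as data_dir) moves them accordingly. The function name is hypothetical. Run the launcher with --help to confirm the actual defaults for your installation.

```python
from pathlib import PurePosixPath
from typing import Optional


def launcher_defaults(install_dir: str, data_dir: Optional[str] = None) -> dict:
    """Default launcher locations, assuming paths relative to the data directory."""
    install = PurePosixPath(install_dir)
    # The data directory defaults to the installation directory.
    data = PurePosixPath(data_dir) if data_dir else install
    return {
        "etc_dir": install / "etc",
        "data_dir": data,
        "pid_file": data / "var" / "run" / "launcher.pid",
        "log_dir": data / "var" / "log",
    }
```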
You can change these values to adjust your SEP usage to any requirements, such as using a directory outside the installation directory, specific mount points or locations, and even using other file names. For example, the SEP RPM adjusts the used directories to better follow the Linux Filesystem Hierarchy Standard (FHS).
After starting SEP, you can find log files in the var/log directory inside the data directory:
launcher.log: This log is created by the launcher and is connected to the stdout and stderr streams of the server. It contains a few log messages that occur while the server logging is being initialized, and any errors or diagnostics produced by the JVM.
server.log: This is the main log file used by SEP. It typically contains the relevant information if the server fails during initialization. It is automatically rotated and compressed.
http-request.log: This is the HTTP request log, which contains every HTTP request received by the server. It is automatically rotated and compressed.
Next steps#
Your next steps depend on your selected deployment option:
If you have selected the Kubernetes deployment option, start with the Kubernetes installation guide. It explains best practices for customization, and directs you to instructions for your specific cloud provider.
For Starburst Admin deployments, start with the Starburst Admin get started guide.