Configuring Starburst Enterprise with CFT#
Starburst Enterprise platform (SEP) has an extensive set of configuration switches that allow it to be tuned for certain specific requirements. Default values are chosen for the best “out of the box” experience. However, if you need to fine-tune SEP behavior, you can do so when using Starburst’s CloudFormation template.
Default configuration#
The following configuration changes are applied automatically for you:
Java heap maximum memory (-Xmx) is set appropriately for the selected EC2 instance type
JVM’s JIT caches are set to 512 MiB
Java is configured to use the G1 garbage collector, which is the recommended garbage collector for running SEP
If Hive Metastore is configured (refer to Configuring the Hive Metastore Service with CFT), the hive catalog is configured with the connector configuration left at default values.
A query audit event listener is configured in etc/event-listener-audit-log.properties. If you have configured another event listener, add the event-listener.config-files property to the config properties file, and ensure that both files are in the comma-separated list.
The query.max-memory property is set to 1PB. This setting overrides the low default value.
Note
All configuration changes generated by the CFT are stored in the etc directory of the SEP installation directory. Because the installation directory itself is mounted as a RAM disk, files generated by the CFT configuration are also stored in memory only.
No secrets in any files, such as usernames or passwords in catalog files, are stored on disk at any time, and the files cannot be accessed from outside the running EC2 instances.
Custom configuration#
When using Starburst’s CloudFormation template, configuration packages for the coordinator, workers and catalogs are used to customize SEP. These configuration packages are used to append or override the default SEP configuration.
The CloudFormation template provides the AdditionalCoordinatorConfigurationURI and AdditionalWorkersConfigurationURI parameters, which specify the locations of the configuration packages for the coordinator and workers respectively. See the following sections for how to create, upload, and use configuration packages for SEP.
Note
All configuration changes made to your SEP cluster must be performed via the CloudFormation Template. If you manually change the configurations on the instances running SEP, the changes are not persisted.
Creating a configuration package#
A configuration package is a ZIP file with the structure shown below. All files are optional except for the top-level etc/ directory entry.
etc/
  config.properties
  jvm.config
  catalog/
    hive.properties
    <catalog-name>.properties
Warning
You must use this exact directory structure or SEP is unable to start correctly.
Node name |
Description |
---|---|
|
This global configuration file is optional. Refer to the properties reference documentation for details. |
|
This Java Virtual Machine configuration file is
optional. Certain options, including |
|
If the configuration package contains this file and the Hive Metastore is not configured (refer to Configuring the Hive Metastore Service with CFT) when launching Starburst’s CloudFormation template, then the file must contain the following: connector.name=hive
hive.metastore.uri=thrift://example.net:9083
If the |
|
When such a file is placed in the configuration package, a catalog called
connector.name=<connector_name>
Where Refer to Auxiliary Files in this table for instructions on how to configure properties that refer to additional files. |
Auxiliary files |
If a configuration property in any of the configuration files accepts a
path to an additional file (e.g., Hive’s For example, if you are configuring Hive connector to use
hive.security=file
security.config-file=etc/catalog/hive-security.json
|
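The package layout described above can be assembled with a few lines of Python. This is only a minimal sketch using the standard zipfile module, not part of Starburst's tooling: the helper name build_config_package and the sample property values are illustrative assumptions.

```python
import io
import zipfile


def build_config_package(files):
    """Return the bytes of a ZIP archive laid out under a top-level etc/ directory.

    `files` maps paths relative to etc/ (for example "catalog/hive.properties")
    to their text content. The helper name is hypothetical.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Explicit directory entry: the top-level etc/ entry is the one required node.
        archive.writestr("etc/", "")
        for relative_path, content in sorted(files.items()):
            if relative_path.startswith("/") or ".." in relative_path.split("/"):
                raise ValueError(f"path must stay inside etc/: {relative_path}")
            archive.writestr(f"etc/{relative_path}", content)
    return buffer.getvalue()


# Illustrative content only; adjust the properties to your own cluster.
package = build_config_package({
    "config.properties": "query.max-memory=1PB\n",
    "catalog/hive.properties": (
        "connector.name=hive\n"
        "hive.metastore.uri=thrift://example.net:9083\n"
    ),
})
names = zipfile.ZipFile(io.BytesIO(package)).namelist()
```

Writing the returned bytes to a file such as package-1.0.zip produces an archive that can then be uploaded to S3.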
Uploading a configuration package to S3#
To use a configuration package ZIP when launching Starburst’s CloudFormation template, you must first upload it to an S3 location of your choice.
Warning
If the configuration package contains sensitive information such as passwords, AWS access keys or Kerberos keytab files, make sure to use an S3 location that is not publicly accessible.
Using a configuration package#
When launching Starburst’s CloudFormation template, you can use the AdditionalCoordinatorConfigurationURI and AdditionalWorkersConfigurationURI parameters to refer to the configuration package that should be applied on top of the default configuration done by the template. The URI should be of the form s3://my_bucket/path/to/configuration/package.zip. You may use a single configuration package for both the SEP coordinator and workers, or use different packages for each. Additionally, you may provide a configuration package only for the coordinator or only for the workers.
If you upload to a location that is not publicly accessible, you must use the IamInstanceProfile parameter when launching the cluster, and the selected instance profile must allow read access to the selected S3 location.
Updating a configuration package#
Instead of deleting a CloudFormation stack and creating a new one, you can use the AWS stack update feature to update the SEP configuration package. You must first create a new configuration package with the necessary changes, and then upload it to S3 as described in the previous sections. Then, when updating the CloudFormation stack, enter the new S3 location as the value of the AdditionalCoordinatorConfigurationURI and AdditionalWorkersConfigurationURI parameters. When CloudFormation applies the updates, it uses the new configuration package to configure SEP.
AWS CloudFormation does not update the CloudFormation stack if the parameter values have not changed. Therefore, you must create a new configuration package ZIP file with a different name. We recommend including a version in the file name to avoid any confusion when updating your configurations.
For example, if the original configuration package was located at s3://my_bucket/path/to/configuration/package-1.0.zip, then create a new configuration package with a location such as s3://my_bucket/path/to/configuration/package-2.0.zip. Even if you change the contents of s3://my_bucket/path/to/configuration/package-1.0.zip and keep the name, CloudFormation is not able to update the configuration.
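Because the parameter value has to change for every update, it can help to derive the next package location mechanically. The following is a rough sketch that assumes the name-MAJOR.MINOR.zip naming used in the example above; the function next_package_uri is hypothetical and only computes the new URI, it does not upload anything.

```python
import re


def next_package_uri(current_uri):
    """Return the URI with the major version in 'name-MAJOR.MINOR.zip' incremented."""
    match = re.fullmatch(r"(s3://.+-)(\d+)\.(\d+)(\.zip)", current_uri)
    if match is None:
        raise ValueError(f"expected s3://.../name-MAJOR.MINOR.zip, got: {current_uri}")
    prefix, major, _minor, suffix = match.groups()
    # Bump the major version and reset the minor version to 0.
    return f"{prefix}{int(major) + 1}.0{suffix}"


new_uri = next_package_uri("s3://my_bucket/path/to/configuration/package-1.0.zip")
```

Any scheme that yields a different file name works equally well, for example a date stamp.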
Interactions between default and custom configurations#
It is important to note that default values are overridden only for keys where a customization exists. If no customizations are made, the default value remains. However, in the case of jvm.config, additional configuration entries are appended to the default configuration.
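The two behaviors can be modeled in a few lines. This sketch only illustrates the semantics described above and is not the template's actual implementation; the function names, the example.key property, and the -Dexample.flag entry are made up for the illustration.

```python
def merge_properties(default, custom):
    """Key-by-key override: custom values win, untouched defaults remain."""
    merged = dict(default)
    merged.update(custom)
    return merged


def merge_jvm_config(default_lines, custom_lines):
    """jvm.config entries are appended after the defaults instead of replacing them."""
    return list(default_lines) + list(custom_lines)


# Hypothetical values for illustration.
defaults = {"query.max-memory": "1PB", "example.key": "default"}
custom = {"example.key": "custom"}
merged = merge_properties(defaults, custom)
jvm = merge_jvm_config(["-XX:+UseG1GC"], ["-Dexample.flag=true"])
```

Here merged keeps query.max-memory at its default and replaces only example.key, while jvm contains both the default and the custom entry.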
CFT configuration parameters#
The CFT includes numerous configuration parameters that are grouped in different sections. All listed parameters have a description in the AWS console.
Network configuration#
| Parameter key | Description | Example |
|---|---|---|
| | Virtual Private Cloud ID | vpc-4bd6ca11 |
| | Subnet to use for SEP nodes (must belong to the selected VPC) | subnet-123abc2b |
| | Set to | yes |
| | Additional Security Groups for SEP nodes (e.g., allowing SSH access). Must select at least one. | sg-12e34aeb |
EC2 configuration#
The EC2 configuration details the infrastructure used for your SEP cluster. Choose a CoordinatorInstanceType and WorkerInstanceType suitable for your workload. The r4.4xlarge instance types are chosen by default and work well for most workloads. See our CFT deployment guide for information about what instance types may be best for you.
| Parameter key | Description | Default | Example |
|---|---|---|---|
| | EC2 instance type of the coordinator. | r4.xlarge | r5.12xlarge |
| | EC2 instance type of the workers. | r4.xlarge | m5.4xlarge |
| | Name of an EC2 KeyPair to enable SSH access to the instance. See SSH keys for more details. | john.smith | |
| | Number of dedicated worker nodes (apart from coordinator) to instantiate. Worker nodes are added to an AWS AutoScaling Group. See Auto scaling for more details. | 10 | |
| | Number of coordinator nodes to instantiate. If there is more than one, the coordinator offers HA capabilities. This number represents one active coordinator plus the number of optional hot-standby coordinators. For example, if you specify 3, then there is 1 active coordinator and 2 standby coordinators that can take over if the active one fails. See Coordinator high availability for more details. | 1 | 3 |
| | Mount an additional EBS volume on each worker at | no | yes |
| | Type of the additional EBS volume mounted on the workers. | io1 | gp2 |
| | Size of the additional EBS volume mounted on the workers, in GiB. Use at least 10 GiB with the io1 volume type. Value must be in the range of 4 to 16384. | 4 | 100 |
| | The number of possible I/O operations per second for the additional volume. Used only with the io1 volume type. Each 5000 I/O ops require at least 100 GiB storage size on the volume. Value must be in the range of 100 to 20000. | 100 | 2000 |
| | (Debug only) Keep coordinator node running after the coordinator service fails. | no | yes |
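The constraints on the additional EBS volume interact, so it can be useful to check a combination of values before launching a stack. The sketch below encodes only the limits stated in the table; it reads "each 5000 I/O ops require at least 100 GiB" as a linear ratio of 50 IOPS per GiB, which is an assumption, and the function name check_ebs_settings is hypothetical.

```python
def check_ebs_settings(volume_type, size_gib, iops=None):
    """Return a list of problems found; an empty list means the values pass."""
    problems = []
    if not 4 <= size_gib <= 16384:
        problems.append("size must be in the range 4 to 16384 GiB")
    if volume_type == "io1":
        if size_gib < 10:
            problems.append("io1 volumes need at least 10 GiB")
        if iops is None or not 100 <= iops <= 20000:
            problems.append("IOPS must be in the range 100 to 20000")
        elif iops > size_gib * 50:
            # Assumption: 5000 IOPS per 100 GiB is treated as 50 IOPS per GiB.
            problems.append("each 5000 IOPS needs at least 100 GiB of storage")
    return problems


ok = check_ebs_settings("io1", 100, 2000)
too_small = check_ebs_settings("io1", 4, 100)
```

With these inputs, ok is an empty list, and too_small reports that the default size of 4 GiB is below the io1 minimum.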
SEP configuration#
The SEP configuration parameters allow you to configure all SEP-specific aspects of your coordinators and workers in the cluster.
| Parameter key | Description |
|---|---|
| | (Optional) URI of S3 zip file with additional configuration for the coordinator. This zip file must contain the required directory structure. Example |
| | (Optional) URI of S3 zip file with additional configuration for the workers. This zip file must contain the required directory structure. Example |
| | (Optional) URI of a shell script stored on S3 to execute on all nodes. The script runs after SEP is configured, but before it is started. For example, a bash script can be used to create directories, install additional software, deploy UDFs, or deploy other plugins. When the script is executed, a string argument value of |
| | Port to use for SEP coordinator and therefore the Starburst Enterprise web UI as well as JDBC and other client connections. Example |
| | URI of the SEP license in S3. This is only needed when deploying the CFT (using a privately shared SEP AMI) without subscribing to the AWS Marketplace. Example |
Hive connector options#
The Hive connector is required if you plan to access data in HDFS or S3. It requires a Hive Metastore so SEP knows where data lives. Refer to the dedicated documentation Configuring the Hive Metastore Service with CFT to determine your configuration.
| Parameter key | Description |
|---|---|
| | Determines what metastore is used by the Hive connector. Defaults to |
| | When external Metastore is used (see |
| | When external Metastore is used (see When set to Cannot be empty when Example |
| | When external Metastore is used (see Example |
| | When external Metastore is used (see Example |
| | When external Metastore is used (see Example |
Ranger and LDAP user synchronization#
The following parameters are related to the global access control with Apache Ranger and the related synchronization of Ranger with an LDAP backend for user and group information.
| Parameter key | Description |
|---|---|
| | When enabled, Apache Ranger for global access control is added. Defaults to no. Note that all other settings in this section are ignored if Ranger is disabled. Example |
| | Administrator password for Ranger. At least 8 characters, including lowercase, uppercase and digit, are required. When reusing an existing external database for Ranger in your CFT stack, you must provide the same password as the initial one, to ensure access remains functional. |
| | Type of database backend used for Apache Ranger. The default |
| | Hostname of the external PostgreSQL RDBMS server. |
| | Port of the external PostgreSQL RDBMS server. Defaults to 5432. |
| | Name of the database on the external PostgreSQL RDBMS server to use as Ranger database backend. The database must already exist. Defaults to |
| | Name of the database user that Ranger uses to manage the database on the external PostgreSQL RDBMS. The user must exist, have full permissions to the database and must have CREATEROLE permissions granted. An additional user ‘ranger’ is created for non-admin database access. If you specify ‘ranger’, the single user is used for all operations. Defaults to |
| | Password for the database user. |
| | URL to an optional additional Ranger config file in an S3 bucket. A template is available to download. Modify the template and upload it to an S3 bucket. The config file is required for using Solr Audit with Ranger and other customizations. Example: |
| | URL to an optional bootstrap script in an S3 bucket. The script is run before Ranger starts. For example, a bootstrap script can be used to provide truststore files. Example: |
| | When enabled, Apache Ranger synchronizes users from an external LDAP directory. Requires Ranger to be enabled, disabled by default. The RangerUserSyncConfigFile setting is ignored if Ranger user sync is disabled. |
| | URL to Ranger user synchronization configuration file in S3 bucket. A user sync template is available to download. Create a modified copy of the template and upload it to an S3 bucket. Required if Ranger user sync is enabled. Example: s3://my-bucket/my-config_file.properties |
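The Ranger administrator password rule stated in the table (at least 8 characters, including a lowercase letter, an uppercase letter, and a digit) is easy to check before launching the stack. This is a minimal sketch of that stated rule only; the function name is hypothetical, and Ranger itself may enforce further restrictions not covered here.

```python
def is_valid_ranger_password(password):
    """Check the documented minimum: 8+ characters with lowercase, uppercase and digit."""
    return (
        len(password) >= 8
        and any(c.islower() for c in password)
        and any(c.isupper() for c in password)
        and any(c.isdigit() for c in password)
    )


accepted = is_valid_ranger_password("Passw0rdX")
rejected = is_valid_ranger_password("password1")
```

The second call is rejected because the candidate contains no uppercase letter.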
Advanced AWS S3 configuration#
The advanced AWS S3 configuration parameters only affect the configuration of provisioned Hive catalogs in order to:
configure custom access credentials for AWS S3
access a third-party S3-compatible storage system
In both of these cases, you must set all three of the parameters listed in the following table:
| Parameter key | Description | Example |
|---|---|---|
| | URI to AWS S3-compatible endpoint. Your choice of endpoint affects your ability to write to buckets. Specifying https://s3.us-east-2.amazonaws.com allows you to write to any bucket in that region, whereas specifying https://mybucket.s3-us-west-2.amazonaws.com restricts the metastore to reading and writing from a single bucket. | https://s3.us-east-2.amazonaws.com |
| | Access key to AWS S3-compatible storage | AKIAIOSFODNN7EXAMPLE |
| | Access secret to AWS S3-compatible storage | wJarXUI/PiYEXAMPLEKEY |
Warning
Failure to set the S3Endpoint results in an empty value for both S3AccessKey and S3SecretKey in the hive-site.xml file generated for the CFT deployment, resulting in Access Denied exceptions at runtime.
Monitoring#
| Parameter key | Description | Example |
|---|---|---|
| | Enable integration with CloudWatch metrics. When enabled, OS and SEP metrics are reported for each cluster node and a CloudWatch Dashboard with cluster overview is created. Additional CloudWatch fees are charged. Refer to Configuring Starburst Enterprise with CloudWatch in CFT for more details. | no |
IAM instance#
| Parameter key | Description | Example |
|---|---|---|
| | Optional name of an IAM instance profile to attach to SEP nodes. See Instance profiles for more detail. If you do not specify the InstanceProfile, the CloudFormation Template creates the necessary IAM role privileges. | my-ec2-instance-profile |
Other parameters#
| Parameter key | Description | Example |
|---|---|---|
| | When enabled, Superset is deployed and started on an EC2 instance | yes |