Deploy Hive Metastore with Kubernetes #
This topic covers deploying a Hive metastore service (HMS) with the HMS Helm chart. This deployment is required if you use object storage connectors such as Hive or Iceberg and you are not using an alternate metastore such as AWS Glue.
This topic assumes you are familiar with the HMS and how it is used, as well as with Helm charts and Kubernetes (k8s) tools such as kubectl. Before configuring and deploying the HMS, ensure that you are familiar with the following Starburst Enterprise Kubernetes topics:
- Kubernetes best practices
- Kubernetes requirements
You can deploy HMS using the Starburst Kubernetes (K8s) Helm chart for use with SEP on supported Kubernetes services.
This section describes the process for deploying HMS into your environment. Our reference documentation contains a complete listing of configuration properties and additional customization options in the Helm chart.
Configure the Hive Metastore #
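You must modify several top-level nodes in the HMS Helm chart for a minimum HMS configuration:
- Resources and service account (`serviceAccountName:`, `resources:`)
- PostgreSQL backing database (`database:`)
- Storage location and account, and object storage authentication (`hdfs:`, `objectStorage:`)
If you are using TLS, you must also configure it. This section covers getting started with these four configuration steps. Our reference documentation describes the full content of the HMS Helm chart, including YAML sections not discussed here.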
As with SEP, we strongly suggest that you initially deploy HMS with the minimum configuration described in this topic, and ensure that it deploys and is accessible before making any additional customizations described in our reference documentation.
Before you begin #
Create an `hms-values.yaml` file to hold your customizations. You pass this file to the Helm command when you deploy the HMS.
Configure resources and service account #
Ensure that the following top-level nodes of the Helm chart have the correct values to reflect your environment:
- `serviceAccountName:` - We strongly recommend using a service account for the pod.
- `resources:` - Ensure that the CPU and memory sizes are appropriate for your instance type.
- `heapSizePercentage:` - We strongly suggest leaving this at the default value.
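The sketch below fills in these nodes for a hypothetical instance type. The service account name `hms-service-account` and the CPU and memory figures are placeholders, not sizing recommendations. The `requests`/`limits` layout under `resources:` is the standard Kubernetes container resources format. We assume the chart passes it through to the HMS pod, so confirm this against the chart's default values. The sketch omits `heapSizePercentage:` so that the chart default applies.

```yaml
# Hypothetical name; assumes this service account already exists in the namespace.
serviceAccountName: "hms-service-account"

# Standard Kubernetes requests/limits for the HMS server pod (placeholder sizes).
resources:
  requests:
    memory: "2Gi"
    cpu: 1
  limits:
    memory: "2Gi"
    cpu: 1

# heapSizePercentage is deliberately omitted so the chart default is used.
```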
Configure the PostgreSQL backing database #
The configuration properties for the PostgreSQL database are found in the `database:` top-level node. As a minimal customization, you must ensure that the following are set correctly for your environment:
```yaml
database:
  type: "internal"
  internal:
    port: 5432
    databaseName: "hive"
    databaseUser: "Hive"
    databasePassword: "HivePassw0rd1234"
```
You must also configure `volume:` persistence and resources, as well as the `resources:` for the backing database itself, in the `database:` node. For a complete list of available backing database properties, see our reference documentation.

The `database.resources:` node is separate from the top-level `resources:` node. It defines the resources available to the backing database itself, not to the HMS server.
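The sketch below shows what the additional backing database settings might look like. Several parts of it are assumptions about the chart layout: placing `volume:` and `resources:` under `database.internal:` next to the connection fields, and the `persistentVolumeClaim:` key beneath `volume:`. The fields inside the claim are the standard Kubernetes PersistentVolumeClaim spec. Check all of this against the chart's default values before use. The storage size and resource figures are placeholders.

```yaml
database:
  type: "internal"
  internal:
    # Assumed location of persistence settings for the internal PostgreSQL pod.
    volume:
      # Assumed key: spec for a new PersistentVolumeClaim (standard PVC fields).
      persistentVolumeClaim:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: "10Gi"   # placeholder size
    # Resources for the backing database pod only; the top-level
    # resources: node sizes the HMS server instead.
    resources:
      requests:
        memory: "1Gi"
        cpu: 1
      limits:
        memory: "1Gi"
        cpu: 1
    # port, databaseName, databaseUser, and databasePassword as shown above.
```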
Configure storage location and account, and object storage authentication #
The `hdfs:` and `objectStorage:` top-level nodes are empty by default. In the `hdfs:` top-level node of the Helm chart, add the … to connect to the Hive site defined in the … top-level node to query and create objects.
There are several templates for configuring object storage in the `objectStorage:` node. For example, you can define how to connect to S3:

```yaml
objectStorage:
  awsS3:
    region:
    endpoint:
    accessKey:
    secretKey:
    pathStyleAccess: false
```
There are also templates for Azure, Azure Data Lake, and Google Cloud Storage. Secrets are specified directly in the HMS chart.
For a complete list of storage-related configuration options, see our reference documentation.
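As an illustration, the S3 template might be filled in as follows. The region, endpoint, and credentials are placeholders. The chart holds these secrets in plain values, so treat the file as sensitive.

```yaml
objectStorage:
  awsS3:
    region: "us-east-2"                               # placeholder AWS region
    endpoint: "https://s3.us-east-2.amazonaws.com"    # placeholder regional endpoint
    accessKey: "<your-access-key>"                    # placeholder credential
    secretKey: "<your-secret-key>"                    # placeholder credential
    pathStyleAccess: false                            # keep virtual-hosted-style addressing
```

In general S3 usage, path-style access is mainly needed for S3-compatible stores that do not support virtual-hosted-style bucket addressing. That is general S3 behavior, not guidance from this topic.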
Configure TLS (optional) #
If your organization uses TLS, you can enable and configure your HMS to work with it. The most straightforward way to handle TLS is to terminate TLS at the load balancer or ingress, using a signed certificate. We strongly suggest this method, which requires no additional configuration in the HMS.
If you choose not to handle TLS using that method, you can instead configure it in the `expose:` top-level node of the HMS Helm chart:

```yaml
expose:
  type: "[clusterIp|nodePort|loadBalancer|ingress]"
```

You must refer to our reference documentation for full details on configuring each of these types. The default `expose:` type is `clusterIp`. However, this is not suitable for production environments. If you need help choosing which type is best, refer to …
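For example, to move away from the default `clusterIp` type, you might select `loadBalancer`. Only `expose:` and `type:` come from this topic. The sub-node named after the selected type, its `name:`, and its `ports:` layout are assumptions about the chart. So are the service name `hive` and port 9083, the conventional HMS Thrift port. Confirm every field in the reference documentation.

```yaml
expose:
  # Replace the default clusterIp with a type suitable for production.
  type: "loadBalancer"
  # Assumed: settings for the selected type live in a sub-node of the same name.
  loadBalancer:
    name: "hive"        # hypothetical service name
    ports:
      http:
        port: 9083      # assumed HMS Thrift port
```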
Deploy the HMS #
When the HMS is configured, run the following command to deploy it. In this example, the minimal values YAML file with the registry credentials, `registry-access.yaml`, is used along with the `hms-values.yaml` file containing the HMS customizations:
```shell
$ helm upgrade hms starburst/starburst-hive \
    --install \
    --values ./registry-access.yaml \
    --values ./hms-values.yaml
```
Once the pod is deployed, other services can use this HMS, if needed.
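The `registry-access.yaml` file passed to `helm upgrade` holds the credentials for pulling the Starburst images. A minimal sketch follows. The `registryCredentials:` node and its field names reflect our understanding of the convention used by the Starburst Helm charts and should be treated as an assumption. The registry host, user name, and password are placeholders for the values Starburst provides to you.

```yaml
# registry-access.yaml: image registry credentials (all values are placeholders).
registryCredentials:
  enabled: true
  registry: "<registry-host>"   # placeholder: registry provided by Starburst
  username: "<your-username>"
  password: "<your-password>"
```

Helm merges `--values` files in order, with later files taking precedence. Keeping credentials separate from `hms-values.yaml` therefore lets you reuse them across charts.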
Next steps #
- Complete your HMS configuration
- Add the metastore configuration property to any Hive, Iceberg or Delta Lake catalogs you create. Refer to the specific connector documentation.
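For example, a Hive catalog defined in the SEP Helm chart values might point at this HMS as shown below. `connector.name` and `hive.metastore.uri` are standard connector properties. The `catalogs:` node of the SEP chart, the catalog name `datalake`, and the address `thrift://hive:9083` are assumptions. The host and port must match the service that your HMS `expose:` settings actually create.

```yaml
# SEP values file (not hms-values.yaml): a Hive catalog that uses this HMS.
catalogs:
  datalake: |
    connector.name=hive
    hive.metastore.uri=thrift://hive:9083
```

Iceberg and Delta Lake catalogs also use `hive.metastore.uri`. Check their connector documentation for any other required properties.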