Starburst Enterprise in Google Cloud Marketplace #
Starburst Enterprise platform (SEP) is available directly through the Google Cloud Marketplace and runs on a variety of instance types. Through the Marketplace offering, you can set up a monthly contract and then deploy SEP using the command line or Google’s Click to Deploy on Google Kubernetes Engine (GKE).
Deployment options #
We strongly recommend that you deploy using the command line. Google’s Click to Deploy option is best suited for small proofs-of-concept and limits your customization options.
After you have deployed using the command line, you can customize SEP, including adding catalogs and services such as:
- Hive Metastore (HMS)
- Apache Ranger
- Starburst Cache Service
Marketplace support #
Starburst Enterprise offers the following support for our marketplace subscribers:
- Email-only support
- Five email issues to gcpsupport@starburstdata.com per month
- First response SLA of one business day
- Support hours from 9 AM to 6 PM US Eastern Time
Set up your subscription #
Before you begin, you must have a Google Cloud login with the ability to subscribe to services.
To subscribe to SEP through the Google Cloud Marketplace:
- Log in with your billable subscriber account and access the Starburst Enterprise offering directly, or enter “Starburst Enterprise” in the marketplace search field and select Starburst Enterprise - Distributed SQL Query Engine.
- Click Configure.
- On the resulting screen, select either Deploy via command line (recommended), or Click to Deploy on GKE.
Set up the GKE cluster #
- Reach out to Starburst Support to have your service account added to the Starburst Google Container Registry (GCR) with the Storage Object Viewer role.
- Create a GKE Standard cluster with two nodepools. One nodepool is for SEP while the other is for Ranger and HMS. The following are the recommended minimal specifications to use for a proof-of-concept deployment:
Cluster name: my-sep-cluster
Location type: zonal (lower latency)
K8s version: 1.20.9-gke.1001
Primary nodepool name: default-nodepool
  Number of nodes: 3
  Machine configuration: e2-standard-16 (16 CPU and 64 GB RAM)
Supplementary nodepool name: nonsep
  Number of nodes: 1
  Machine configuration: e2-standard-8 (8 CPU and 32 GB RAM)
Deploy from Google Cloud Marketplace #
This CLI deployment method covers all Google Cloud listings.
Get Marketplace license file #
- Navigate to the Google Cloud Marketplace SEP offering and select Configure.
- Set the App instance name and switch to the Deploy via command line tab.
- Select the appropriate reporting/cluster service account and click Generate license key.
- Apply the downloaded license file with the following command:
$ kubectl apply -f license.yaml
- Confirm the secret starburst-enterprise-license has been created:
$ kubectl describe secret starburst-enterprise-license
- Record the license name that was loaded to your cluster. You need it later for the deployment. Example:
Name: starburst-enterprise-license-121212
Namespace: default
Labels: <none>
Annotations: <none>
Type: Opaque
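If you prefer to capture the license name in a script rather than copy it by hand, the following is a minimal sketch. The helper name license_name is hypothetical and not part of any Starburst or Google tooling. It only parses the Name: field from the kubectl describe output shown in the example above.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl describe secret` output on stdin
# and print the value of the first "Name:" field.
license_name() {
  awk '$1 == "Name:" { print $2; exit }'
}

# Usage (assumes kubectl is configured for the target cluster):
#   kubectl describe secret starburst-enterprise-license | license_name
```

The printed value is what the ENTERPRISE_LICENSE_NAME placeholder in the values file expects.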
Get the Helm chart #
- Download and extract the chart:
$ wget https://storage.googleapis.com/starburst-enterprise/helmCharts/sep-gcp/starburst-enterprise-platform-charts-2.3.0.tgz
$ tar -zxvf starburst-enterprise-platform-charts-2.3.0.tgz
- Delete the existing values.yaml file bundled with the chart. This deployment uses a custom values file:
$ rm starburst-enterprise-platform-charts/values.yaml
- Apply the Application CRD to avoid errors:
$ kubectl apply -f "https://raw.githubusercontent.com/GoogleCloudPlatform/marketplace-k8s-app-tools/master/crd/app-crd.yaml"
Build the values.yaml file #
Create a values.yaml file in the current working directory, not the chart directory, with the configuration you wish to deploy. Include the catalog configuration for any data sources that the cluster needs to access in this YAML file.
Example template
Copy the following example template and overwrite the defaults with values specific to Google Cloud Marketplace.
# Top level values for starburst-enterprise-platform
# Overwrite defaults with values specific to Google Cloud marketplace
deployerHelm:
  image: "gcr.io/starburst-public/starburstdata/deployer:2.3.0"
reportingSecret: ENTERPRISE_LICENSE_NAME
metricsReporter:
  image: "gcr.io/starburst-public/starburstdata/metrics_reporter:2.3.0"
  imageUbbagent: "gcr.io/cloud-marketplace-tools/metering/ubbagent:latest"
starburst-enterprise:
  image:
    repository: "gcr.io/starburst-public/starburstdata"
    tag: 2.3.0
  initImage:
    repository: "gcr.io/starburst-public/starburstdata/starburst-enterprise-init"
    tag: 2.3.0
  prometheus:
    enabled: false
  catalogs:
    bigquery: |
      connector.name=bigquery
      bigquery.project-id=GOOGLE_PROJECT_ID
    hive: |
      connector.name=hive-hadoop2
      hive.allow-drop-table=true
      hive.metastore.uri=thrift://hive:9083
    starburst-insights: |
      connector.name=postgresql
      connection-url=jdbc:postgresql://INSIGHTS_DATABASE_INSTANCE:5432/insights
      connection-user=postgres
      connection-password=INSIGHTS_DATABASE_PASSWORD
  coordinator:
    additionalProperties: |
      insights.persistence-enabled=true
      insights.metrics-persistence-enabled=true
      insights.jdbc.url=jdbc:postgresql://INSIGHTS_DATABASE_INSTANCE:5432/insights
      insights.jdbc.user=postgres
      insights.jdbc.password=INSIGHTS_DATABASE_PASSWORD
      insights.authorized-users=.*
    etcFiles:
      properties:
        config.properties: |
          coordinator=true
          node-scheduler.include-coordinator=false
          http-server.http.port=8080
          discovery-server.enabled=true
          discovery.uri=http://localhost:8080
          usage-metrics.cluster-usage-resource.enabled=true
          http-server.authentication.allow-insecure-over-http=true
          web-ui.enabled=true
          http-server.process-forwarded=true
        event-listener.properties: |
          event-listener.name=event-logger
          jdbc.url=jdbc:postgresql://INSIGHTS_DATABASE_INSTANCE:5432/insights
          jdbc.user=postgres
          jdbc.password=INSIGHTS_DATABASE_PASSWORD
        password-authenticator.properties: |
          password-authenticator.name=file
    nodeSelector:
      starburstpool: STARBURST_COORDINATOR_NODE_POOL
    resources:
      limits:
        cpu: 15
        memory: 56Gi
      requests:
        cpu: 15
        memory: 56Gi
  expose:
    type: clusterIp
    ingress:
      serviceName: starburst
      servicePort: 8080
      host: STARBURST_URL
      path: "/"
      pathType: Prefix
      tls:
        enabled: true
        secretName: tls-secret-starburst
      annotations:
        kubernetes.io/ingress.class: nginx
        cert-manager.io/cluster-issuer: letsencrypt
  starburstPlatformLicense: sep-license
  userDatabase:
    enabled: true
    users:
      - password: ADMIN_PASSWORD
        username: ADMIN_USERNAME
  worker:
    autoscaling:
      enabled: true
      maxReplicas: 10
      minReplicas: 1
      targetCPUUtilizationPercentage: 80
    deploymentTerminationGracePeriodSeconds: 30
    etcFiles:
      properties:
        event-listener.properties: |
          event-listener.name=event-logger
          jdbc.url=jdbc:postgresql://INSIGHTS_DATABASE_INSTANCE:5432/insights
          jdbc.user=postgres
          jdbc.password=INSIGHTS_DATABASE_PASSWORD
    nodeSelector:
      starburstpool: STARBURST_WORKER_NODE_POOL
    resources:
      limits:
        cpu: 15
        memory: 56Gi
      requests:
        cpu: 15
        memory: 56Gi
    starburstWorkerShutdownGracePeriodSeconds: 120
# Hive Chart
starburst-hive:
  enabled: true
  image:
    repository: "gcr.io/starburst-public/starburstdata/hive"
    tag: 2.3.0
  gcpExtraNodePool: EXTRA_NODE_POOL
  database:
    external:
      driver: org.postgresql.Driver
      jdbcUrl: jdbc:postgresql://HIVE_DATABASE_INSTANCE:5432/hive
      user: postgres
      password: HIVE_DATABASE_PASSWORD
    type: external
  objectStorage:
    gs:
      cloudKeyFileSecret: service-account-key
  expose:
    type: clusterIp
# Ranger Chart
starburst-ranger:
  enabled: true
  admin:
    image:
      repository: "gcr.io/starburst-public/starburstdata/starburst-ranger-admin"
      tag: 2.3.0
    resources:
      limits:
        cpu: 1
        memory: 1Gi
      requests:
        cpu: 1
        memory: 1Gi
    serviceUser: ADMIN_USERNAME
  gcpExtraNodePool: EXTRA_NODE_POOL
  usersync:
    image:
      repository: "gcr.io/starburst-public/starburstdata/ranger-usersync"
      tag: 2.3.0
  database:
    external:
      databaseName: ranger
      databasePassword: RANGER_DATABASE_PASSWORD
      databaseRootPassword: RANGER_DATABASE_ROOT_PASSWORD
      databaseRootUser: postgres
      databaseUser: ranger
      host: RANGER_DATABASE_INSTANCE
      port: 5432
    type: external
  datasources:
    - host: coordinator
      name: starburst-enterprise
      password: ADMIN_PASSWORD
      port: 8080
      username: ADMIN_USERNAME
  expose:
    type: clusterIp
    loadBalancer:
      name: ranger
      ports:
        http:
          port: 6080
    ingress:
      serviceName: ranger
      servicePort: 6080
      host: RANGER_URL
      path: "/"
      pathType: Prefix
      tls:
        enabled: true
        secretName: tls-secret-ranger
      annotations:
        kubernetes.io/ingress.class: nginx
        cert-manager.io/cluster-issuer: letsencrypt
  initFile: files/initFile.sh
You can also add any other required configuration, such as SSO, LDAP, ingress, and custom catalogs, to this file. Confirm that you have replaced the following placeholder entries in the YAML above with values that match your environment and configuration:
- ENTERPRISE_LICENSE_NAME - The name of the license in license.yaml that was uploaded using kubectl.
- GOOGLE_PROJECT_ID - The Google project you are deploying to.
- INSIGHTS_DATABASE_INSTANCE - Hostname/IP for the query logger database.
- INSIGHTS_DATABASE_PASSWORD - postgres user password for the Insights database.
- STARBURST_COORDINATOR_NODE_POOL - Node pool for the coordinator.
- STARBURST_WORKER_NODE_POOL - Node pool for the worker nodes. Can be the same as the coordinator node pool.
- EXTRA_NODE_POOL - Node pool for Ranger and Hive.
- ADMIN_USERNAME - Starburst Insights login user.
- ADMIN_PASSWORD - Starburst Insights login password.
- HIVE_DATABASE_INSTANCE - Hostname/IP for the Hive database instance.
- HIVE_DATABASE_PASSWORD - postgres user password for the Hive database.
- RANGER_DATABASE_INSTANCE - Hostname/IP for the Ranger database instance.
- RANGER_DATABASE_ROOT_PASSWORD - postgres user password for the Ranger database.
- RANGER_DATABASE_PASSWORD - ranger user password for the Ranger database.
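Before running Helm, it can help to check that no placeholder was left behind. The following is a minimal sketch, not an official tool. The function name remaining_placeholders is hypothetical. The name list is copied from the template above and also includes STARBURST_URL and RANGER_URL, which appear there as ingress hosts.

```shell
#!/bin/sh
# Placeholder names as they appear in the example template.
PLACEHOLDERS='ENTERPRISE_LICENSE_NAME|GOOGLE_PROJECT_ID|INSIGHTS_DATABASE_INSTANCE|INSIGHTS_DATABASE_PASSWORD|STARBURST_COORDINATOR_NODE_POOL|STARBURST_WORKER_NODE_POOL|EXTRA_NODE_POOL|ADMIN_USERNAME|ADMIN_PASSWORD|HIVE_DATABASE_INSTANCE|HIVE_DATABASE_PASSWORD|RANGER_DATABASE_INSTANCE|RANGER_DATABASE_ROOT_PASSWORD|RANGER_DATABASE_PASSWORD|STARBURST_URL|RANGER_URL'

# Hypothetical helper: print each placeholder that still occurs in the
# given file, one per line, sorted and de-duplicated.
# Prints nothing when every placeholder has been replaced.
remaining_placeholders() {
  grep -o -w -E "$PLACEHOLDERS" "$1" | sort -u
}

# Usage:
#   remaining_placeholders values.yaml
```

An empty result means every listed placeholder has been replaced. It does not validate the values themselves.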
Run the Helm deployment #
After you have configured the values file for your environment, run the following:
$ helm upgrade starburst-enterprise ./starburst-enterprise-platform-charts --install --values values.yaml
Validate your deployment #
- Verify that all pods are in a Running or Completed state:
$ kubectl get pods -o wide
NAME                                                     READY   STATUS      RESTARTS   AGE    IP           NODE                                             NOMINATED NODE   READINESS GATES
coordinator-64cfdb94fd-v6bxv                             2/2     Running     0          4h8m   10.28.3.8    gke-test-mp-cluster-default-pool-5b89684f-6g3b   <none>           <none>
hive-7c8b5b5495-v9gwz                                    2/2     Running     0          4h8m   10.28.0.27   gke-test-mp-cluster-nonsep-0be06378-b718         <none>           <none>
ranger-7c6b59bdd5-b9v8s                                  2/2     Running     0          4h8m   10.28.0.28   gke-test-mp-cluster-nonsep-0be06378-b718         <none>           <none>
starburst-enterprise-1-lic-secret-job-nfmqg              0/1     Completed   0          4h8m   10.28.2.31   gke-test-mp-cluster-default-pool-5b89684f-qfvp   <none>           <none>
starburst-enterprise-1-metrics-reporter-9f9f5f77-7nvd2   2/2     Running     0          4h8m   10.28.2.30   gke-test-mp-cluster-default-pool-5b89684f-qfvp   <none>           <none>
worker-76ff548b96-c2rwz                                  1/1     Running     0          4h8m   10.28.2.32   gke-test-mp-cluster-default-pool-5b89684f-qfvp   <none>           <none>
worker-76ff548b96-wj996                                  1/1     Running     0          4h8m   10.28.1.12   gke-test-mp-cluster-default-pool-5b89684f-kfz2   <none>           <none>
- After deployment, confirm that the metrics reporter is able to submit metrics to the Google Cloud metering service:
$ kubectl logs deployment/starburst-enterprise-metrics-reporter -c metrics-reporter
2021-09-21 09:03:02 INFO Trying to find a coordinator service
2021-09-21 09:03:02 INFO Trying to find Starburst Enterprise Coordinator Deployments...
2021-09-21 09:03:02 INFO Trying to find Starburst Enterprise Worker Deployments...
2021-09-21 09:03:02 INFO Trying to get usage metrics from starburst
2021-09-21 09:03:02 INFO Number of cores in this 60 second cycle: 45
2021-09-21 09:03:02 INFO Report submission status: 200 : OK :
2021-09-21 09:04:01 INFO Trying to find a coordinator service
2021-09-21 09:04:01 INFO Trying to find Starburst Enterprise Coordinator Deployments...
2021-09-21 09:04:01 INFO Trying to find Starburst Enterprise Worker Deployments...
2021-09-21 09:04:01 INFO Trying to get usage metrics from starburst
2021-09-21 09:04:01 INFO Number of cores in this 60 second cycle: 45
2021-09-21 09:04:01 INFO Report submission status: 200 : OK :
Every minute you should see:
Report submission status: 200 : OK
Number of cores in this 60 second cycle: 45
The core count in this example is calculated as follows:
- 1 coordinator * 15 vCPUs + 2 workers * 15 vCPUs = 45 vCPUs in total
- You can also verify that there are no errors reported by ubbagent:
$ kubectl logs deployment/$APP_INSTANCE_NAME-metrics-reporter -c ubbagent
Listening locally on port 6080
I0921 08:58:36.823772 1 main.go:104] Listening locally on port 6080
I0921 09:00:02.498347 1 aggregator.go:88] aggregator: received report: cpu_usage_in_seconds_pricing
I0921 09:00:36.823907 1 aggregator.go:197] aggregator: sending 1 report
I0921 09:00:36.825296 1 servicecontrol.go:88] ServiceControlEndpoint:Send(): serviceName: starburst-presto.mp-starburst-public.appspot.com body: {"operations":[{"consumerId":"project:pr-a19b90ff70ab335","endTime":"2021-09-21T09:00:02Z","metricValueSets":[{"metricName":"starburst-presto.mp-starburst-public.appspot.com/cpu_usage_in_seconds_pricing","metricValues":[{"endTime":"2021-09-21T09:00:02Z","int64Value":"2700","startTime":"2021-09-21T09:00:02Z"}]}],"operationId":"44a27787-b61b-4b63-80ce-75efee538f98","operationName":"starburst-presto.mp-starburst-public.appspot.com/report","startTime":"2021-09-21T09:00:02Z","userLabels":{"goog-ubb-agent-id":"28c57363-9d08-4a1e-bdbd-48a6ff2e907c"}}]}
I0921 09:00:37.176186 1 servicecontrol.go:112] ServiceControlEndpoint:Send(): success
I0921 09:01:02.081081 1 aggregator.go:88] aggregator: received report: cpu_usage_in_seconds_pricing
- If you need to redeploy the entire application, consider first deleting the secret that holds the SEP license to avoid harmless crashes of the license job:
$ kubectl delete -f sep_manifest.yaml
$ kubectl delete secret sep-license
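To sanity-check the numbers in the two logs above against your own cluster size, the arithmetic can be sketched as follows. The variable names are illustrative only. The pod counts and the 15 vCPU figure come from the example deployment. The 2700 value reported by ubbagent for cpu_usage_in_seconds_pricing matches 45 cores over a 60 second cycle, though reading it as CPU-seconds is an assumption drawn from the metric name rather than a documented formula.

```shell
#!/bin/sh
# Values from the example deployment; adjust to match your cluster.
coordinators=1
workers=2
cpus_per_pod=15      # CPU request per coordinator/worker pod in the template
cycle_seconds=60     # metrics reporter cycle length shown in its log

# Cores the metrics reporter should log for each cycle.
cores=$(( (coordinators + workers) * cpus_per_pod ))

# Assumed interpretation: ubbagent's int64Value is cores * cycle length.
cpu_seconds=$(( cores * cycle_seconds ))

echo "Cores per cycle: $cores"
echo "CPU-seconds per cycle: $cpu_seconds"
```

With worker autoscaling enabled, the reported core count changes as workers are added or removed, so compare against the number of worker pods running at the time of the log entry.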
Next steps #
Review our Kubernetes configuration documentation:
The following pages introduce key concepts and features in SEP: