# Usage metrics
Starburst Enterprise platform (SEP) collects and logs information about all nodes in your cluster. This feature is used primarily by Mission Control and other management solutions, but you can also use the resulting log information for your own monitoring and operational purposes.
## Configuration
Usage metrics collection is configured by a number of properties that you can add to `etc/config.properties`:
| Property name | Description | Default value |
| --- | --- | --- |
| `usage-metrics.log.path` | Relative path of the usage log files in the data directory. | `var/log/starburst/usage-metrics.log` |
|  | Maximum size of a single usage log file. |  |
|  | Maximum number of usage log files. |  |
|  | Path to the directory on the node in which log files are stored. |  |
|  | The initial delay before tracking usage, allowing the cluster to start up before metrics gathering starts. |  |
|  | Length of the interval between usage metric log entry creation. Ideally, this is set to a small value, such as the default of 1 min. |  |
|  | Number of threads used to gather and write all usage information. |  |
|  | Expose the REST API endpoint for usage metrics, aggregated for all nodes since cluster start. Using the endpoint is a convenient alternative to the usage metrics parser command line tool; the coordinator exposes the result at a dedicated URL. |  |
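For example, the log location can be set explicitly in `etc/config.properties`. The following excerpt shows the log path property with its default value; the other properties in the table above follow the same `key=value` form:

```
# etc/config.properties - excerpt, shown with the default value
usage-metrics.log.path=var/log/starburst/usage-metrics.log
```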
|
## Logged metrics
For each collection interval, a JSON record is created for the entire cluster. It includes the following information:

- `nodeEnvironment` - value of the `node.environment` property
- `instanceId` - a unique, random instance identifier
- `startTime` - the epoch time of the cluster start time for the cumulative period
- `nodeVersion` - version of the cluster
- `time` - the timestamp of the record
- `cumulativeCpuTime` - total cumulative CPU time used by the JVM processes since the last cluster restart
- `cumulativeAvailableCpuTime` - total CPU time available since the last cluster restart
- `cores` - total cores within the cluster, as the sum of all cores in the coordinator and in all registered workers
- `activeNodes` - total number of the registered worker nodes and the coordinator
- `signature` - the signature for the data record
- `licenseOwner` - value of the `owner` field from the license file
- `licenseType` - license type, if a license file is present
- `licenseHash` - hash of the whole license file, if present
The following is an excerpt of a JSON log file for a cluster, as stored in a log file at `usage-metrics.log.path`, showing the collected data for six collection intervals:
```
{"nodeEnvironment":"test","instanceId":"03afa560-c38d-43e1-8800-fff8d807a9b0","startTime":1612411299895,"time":"2021-02-05T16:28:37.640Z","cumulativeCpuTime":"1141.53s","cumulativeAvailableCpuTime":"1105507.84s","cores":16,"activeNodes":1,"signature":"13794b5225a5f79a59ad5bd6ecb92b5a93ffc06521a0ac8fc21b50dffb56cfd8","licenseType":"UNKNOWN"}
{"nodeEnvironment":"test","instanceId":"03afa560-c38d-43e1-8800-fff8d807a9b0","startTime":1612411299895,"time":"2021-02-05T16:29:37.658Z","cumulativeCpuTime":"1142.41s","cumulativeAvailableCpuTime":"1106468.06s","cores":16,"activeNodes":1,"signature":"d2d1ecce8ac95701bd4b67caf5c69273d17107dbe93f70a2662967060587d63d","licenseType":"UNKNOWN"}
{"nodeEnvironment":"test","instanceId":"03afa560-c38d-43e1-8800-fff8d807a9b0","startTime":1612411299895,"time":"2021-02-05T16:30:37.668Z","cumulativeCpuTime":"1143.19s","cumulativeAvailableCpuTime":"1107428.22s","cores":16,"activeNodes":1,"signature":"c5d1f73648e575036cc6ac50cd630c1e2ee355b4d76bdc756b06c26bfdd022f3","licenseType":"UNKNOWN"}
{"nodeEnvironment":"test","instanceId":"03afa560-c38d-43e1-8800-fff8d807a9b0","startTime":1612543186896,"time":"2021-02-05T16:40:52.519Z","cumulativeCpuTime":"0.00s","cumulativeAvailableCpuTime":"0.00s","cores":16,"activeNodes":1,"signature":"431c9b3ef438e78ae2d5ac772461701c14470b24e4bf05638c845dce5ed6d376","licenseType":"UNKNOWN"}
{"nodeEnvironment":"test","instanceId":"03afa560-c38d-43e1-8800-fff8d807a9b0","startTime":1612543186896,"time":"2021-02-05T16:41:52.532Z","cumulativeCpuTime":"1.97s","cumulativeAvailableCpuTime":"960.23s","cores":16,"activeNodes":1,"signature":"700540d92bc76a2b4eb900650e1129815ce9811e7f15dfed4041119c15c84d91","licenseType":"UNKNOWN"}
{"nodeEnvironment":"test","instanceId":"03afa560-c38d-43e1-8800-fff8d807a9b0","startTime":1612543186896,"time":"2021-02-05T16:42:52.544Z","cumulativeCpuTime":"4.10s","cumulativeAvailableCpuTime":"1920.41s","cores":16,"activeNodes":1,"signature":"37762b6799f8afeb4eae285391e76e57110c7ad1220108dda5112b43f13ad353","licenseType":"UNKNOWN"}
```
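Each line of the log is a self-contained JSON object, so the file can be processed with any JSON-lines tooling. As an illustrative sketch (not an SEP component), the records can be loaded and inspected in Python; the sample records below are abbreviated copies of the excerpt above:

```python
import json

# Abbreviated sample records from the excerpt above (signature omitted).
log_lines = [
    '{"nodeEnvironment":"test","instanceId":"03afa560-c38d-43e1-8800-fff8d807a9b0",'
    '"startTime":1612411299895,"time":"2021-02-05T16:28:37.640Z",'
    '"cumulativeCpuTime":"1141.53s","cumulativeAvailableCpuTime":"1105507.84s",'
    '"cores":16,"activeNodes":1,"licenseType":"UNKNOWN"}',
    '{"nodeEnvironment":"test","instanceId":"03afa560-c38d-43e1-8800-fff8d807a9b0",'
    '"startTime":1612411299895,"time":"2021-02-05T16:29:37.658Z",'
    '"cumulativeCpuTime":"1142.41s","cumulativeAvailableCpuTime":"1106468.06s",'
    '"cores":16,"activeNodes":1,"licenseType":"UNKNOWN"}',
]

# One record per line: parse each line independently.
records = [json.loads(line) for line in log_lines]

def cpu_seconds(value: str) -> float:
    """Strip the trailing 's' unit from a cumulative CPU time value."""
    return float(value.rstrip("s"))

for r in records:
    used = cpu_seconds(r["cumulativeCpuTime"])
    available = cpu_seconds(r["cumulativeAvailableCpuTime"])
    print(r["time"], r["cores"], f"{100 * used / available:.2f}%")
```

In a real deployment you would read the lines from the file configured by `usage-metrics.log.path` instead of hard-coding them.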
## Accessing logged metrics
You can access usage metrics data in several ways: with Starburst visualization tools, or through more manual methods.
### Starburst Insights
We highly recommend enabling and using the Starburst Insights interface for the best and most comprehensive user experience. Insights is accessed in the same way as the Starburst Web UI, and has built-in aggregations and visualizations. Insights accesses a much richer, more comprehensive data set that includes event logger data as well as usage metrics data.
### Usage metrics parser
The usage metrics parser is a command line application, available separately from Starburst Support, that aggregates and returns usage metrics from log files. To install the tool, place the JAR file in a convenient directory and add it to your `PATH`. Then change the filename and permissions as follows:

```
$ mv starburst-usage-metrics-parser-*-executable.jar usage-metrics-parser
$ chmod a+x usage-metrics-parser
```
Because the usage-metrics-parser is installed separately, it is not available on the nodes where the logs are saved. After the tool is installed locally, copy the log files from `usage-metrics.log.path` to a local empty folder, and use the parser with the appropriate arguments:

```
$ usage-metrics-parser [--from <from>] [(-h | --help)] [--to <to>] [--] [<path>]
```
- `<from>` - `DateTime` value for the beginning of the range, inclusive
- `<to>` - `DateTime` value for the end of the range, exclusive
- `<path>` - the fully-qualified path to the folder containing the log files
The usage metrics parser can be run directly, or with the `java` command:

```
# run directly:
$ ./usage-metrics-parser

# run using the java command:
$ java -jar usage-metrics-parser
```
The begin and end `DateTime` options accept the following formats:
| Description | Syntax |
| --- | --- |
| datetime | time \| date-opt-time |
| time | 'T' time-element [offset] |
| date-opt-time | date-element ['T' [time-element] [offset]] |
| date-element | std-date-element \| ord-date-element \| week-date-element |
| std-date-element | yyyy ['-' MM ['-' dd]] |
| ord-date-element | yyyy ['-' DDD] |
| week-date-element | xxxx '-W' ww ['-' e] |
| time-element | HH [minute-element] \| [fraction] |
| minute-element | ':' mm [second-element] \| [fraction] |
| second-element | ':' ss [fraction] |
| fraction | ('.' \| ',') digit+ |
| offset | 'Z' \| (('+' \| '-') HH [':' mm [':' ss [('.' \| ',') SSS]]]) |
If you do not specify either of the `--from` or `--to` options, the metrics are computed using all available log file entries, as in the following example:

```
$ usage-metrics-parser /Users/jsmith/tmp
```
Specifying both `--from` and `--to` computes metrics inclusive of the specified start, and exclusive of the specified end. In the following example, data for 2021-02-10T16:59:59Z is included, and 2021-02-10T17:00:00Z is excluded:

```
$ usage-metrics-parser --from 2021-02-05T05:00:00Z --to 2021-02-10T17:00:00Z /Users/jsmith/tmp
```
Specifying `--from` with no end time computes metrics from that time up to the last log entry, inclusive:

```
$ usage-metrics-parser --from 2021-02-05T05:00:00Z /Users/jsmith/tmp
```
Specifying `--to` with no start time computes metrics from the earliest available log entry until the specified end time, exclusive:

```
$ usage-metrics-parser --to 2021-02-10T17:00:00Z /Users/jsmith/tmp
```
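The inclusive-start, exclusive-end semantics described above can be mirrored in a short Python sketch. This is illustrative only, not the parser's actual implementation, and it assumes timestamps in the simple `2021-02-10T17:00:00Z` form:

```python
from datetime import datetime, timezone

def parse_utc(s: str) -> datetime:
    """Parse a timestamp of the form 2021-02-10T17:00:00Z as UTC."""
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

def in_range(ts: datetime, start=None, end=None) -> bool:
    """Return True if start <= ts < end; a missing bound is unbounded."""
    if start is not None and ts < start:
        return False
    if end is not None and ts >= end:
        return False
    return True

end = parse_utc("2021-02-10T17:00:00Z")
print(in_range(parse_utc("2021-02-10T16:59:59Z"), end=end))  # True: before the end
print(in_range(parse_utc("2021-02-10T17:00:00Z"), end=end))  # False: the end is exclusive
```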
No matter what the specified date range is, `usage-metrics-parser` displays the results in the following format:

```
cluster restarts: 1, cpu time: 265.55s, available cpu time: 93130.57s, cpu utilization: 0.29%, min cores: 16, max cores: 16
```
- `cluster restarts` - count of unique `startTime` values in the date range
- `cpu time` - aggregated `cumulativeCpuTime` in the specified date range
- `available cpu time` - aggregated `cumulativeAvailableCpuTime` in the specified date range
- `cpu utilization` - (`cpu time` / `available cpu time`) * 100
- `min cores` - the smallest value of `cores` in the records in the specified time period
- `max cores` - the largest value of `cores` in the records in the specified time period
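The aggregation behind this output can be approximated as follows. This is a sketch of the documented semantics, not the tool's source; in particular, it assumes that because the CPU counters are cumulative per restart, the right way to aggregate is to take the largest cumulative value within each restart period and sum those across restarts:

```python
def summarize(records):
    """Approximate the parser's aggregation over parsed log records."""
    def seconds(value):
        # Strip the trailing 's' unit, for example "265.55s" -> 265.55.
        return float(value.rstrip("s"))

    # Group records by restart: counters reset at each cluster start.
    by_restart = {}
    for r in records:
        by_restart.setdefault(r["startTime"], []).append(r)

    cpu = sum(max(seconds(r["cumulativeCpuTime"]) for r in rs)
              for rs in by_restart.values())
    available = sum(max(seconds(r["cumulativeAvailableCpuTime"]) for r in rs)
                    for rs in by_restart.values())
    cores = [r["cores"] for r in records]
    return {
        "cluster restarts": len(by_restart),
        "cpu time": cpu,
        "available cpu time": available,
        "cpu utilization": round(100 * cpu / available, 2) if available else 0.0,
        "min cores": min(cores),
        "max cores": max(cores),
    }

# Hypothetical records spanning two restarts.
sample = [
    {"startTime": 1, "cumulativeCpuTime": "10.00s",
     "cumulativeAvailableCpuTime": "100.00s", "cores": 16},
    {"startTime": 1, "cumulativeCpuTime": "20.00s",
     "cumulativeAvailableCpuTime": "200.00s", "cores": 16},
    {"startTime": 2, "cumulativeCpuTime": "5.00s",
     "cumulativeAvailableCpuTime": "50.00s", "cores": 8},
]
print(summarize(sample))
```

With the hypothetical records above, this yields 2 restarts, 25.0s of CPU time against 250.0s available, and 10.0% utilization.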
## Persisting metrics
Understanding the usage of your cluster requires persisting the usage metrics. By default the metrics are not persisted.
> **Note:** No license information is persisted with usage metrics.
### Persisting metrics with Starburst Insights

The easiest way to persist and view comprehensive metrics is to enable the Starburst Insights interface and ensure that the `insights.persistence-enabled` configuration property is set to `true` in the config properties file. See the Starburst Insights section for complete configuration requirements.
### Persisting using Kubernetes with Helm
While usage metrics are enabled by default when deploying SEP using Kubernetes with Helm, you must enable a persistent volume to persist your usage metrics.
### Persisting with Amazon CloudWatch
The CFT setup automatically configures persisting the metrics in Amazon CloudWatch. Users of Amazon EKS can also use CloudWatch. Logs exported from CloudWatch are decorated with additional information, and must be pre-processed back to the raw format shown in the preceding section before the metrics parser can process them.
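A pre-processing step can be as simple as extracting the JSON object from each exported line. The following sketch assumes the CloudWatch decoration is a prefix that precedes the JSON payload; verify this against your actual export format:

```python
import json

def strip_decoration(line: str) -> str:
    """Return the raw JSON record embedded in a decorated log line.

    Assumes the decoration (timestamp, stream name, and so on) precedes
    the JSON object and contains no '{' of its own.
    """
    start = line.find("{")
    if start == -1:
        raise ValueError(f"no JSON object found in line: {line!r}")
    return line[start:]

# Hypothetical decorated line with a timestamp prefix.
decorated = '2021-02-05T16:28:37.640Z {"nodeEnvironment":"test","cores":16}'
record = json.loads(strip_decoration(decorated))
print(record["cores"])  # 16
```

After stripping the decoration, write the raw lines to a local folder and run the metrics parser on it as described above.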
Log management solutions integrated with other cloud platforms, or available separately, can persist the metrics by capturing the log file or by regularly polling the REST endpoint.
You can read more about how Starburst integrates with CloudWatch metrics in our AWS documentation.