Cloudera Data Platform support#
Use the Starburst Hive connector to query Cloudera Data Platform (CDP) version 7.1 or higher.
Note
Cloudera Data Platform support requires a valid Starburst Enterprise license.
Requirements#
The Starburst Hive connector can query the Cloudera Data Platform (CDP), available as version 7.x. It also supports the predecessor Cloudera Distributed Hadoop (CDH) platform, available in versions 5.x and 6.x. Support and compatibility vary based on the version you use, and are detailed in the following table:
Cloudera version | 345-e and higher |
---|---|
CDP 7.x | Yes, see details in following sections |
CDH 6.x | Yes |
CDH 5.13+ | Yes |
CDH 5.12 and lower | No |
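The compatibility table above can be encoded as a small pre-flight check, for example in a deployment script. This is a minimal sketch, not part of SEP: the function name `is_supported` and the version-string parsing are assumptions made for illustration. It follows the table rows only, so it treats every CDP 7.x release as supported, although the introduction names version 7.1 or higher.

```python
import re


def is_supported(platform: str, version: str) -> bool:
    """Return True if the compatibility table lists this release as supported.

    Hypothetical helper: `platform` is "CDP" or "CDH", and `version` is a
    string such as "7.1.7" or "5.13".
    """
    match = re.match(r"(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    major = int(match.group(1))
    minor = int(match.group(2)) if match.group(2) else None

    name = platform.strip().upper()
    if name == "CDP":
        # The table lists CDP 7.x only.
        return major == 7
    if name == "CDH":
        if major == 6:
            return True
        if major == 5:
            # CDH 5.13+ is supported, 5.12 and lower is not. Without a
            # minor version the answer cannot be confirmed, so report False.
            return minor is not None and minor >= 13
        return False
    raise ValueError(f"unknown platform: {platform!r}")
```

For example, `is_supported("CDH", "5.12")` returns `False`, while `is_supported("CDH", "5.13")` returns `True`.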
The following details apply for CDH 6.x users:
reading tables and data files created by CDH 6.x is supported
transactional table usage is not supported
CDH 6.x Hive cannot read ORC files created by SEP, due to the behavior of the included Hive version
using the included Apache Sentry is not supported
The following details apply for CDH 5.x users:
reading tables and data files created by CDH 5.x is supported
transactional table usage is not supported
Configuration#
Edit your catalog properties file using the Hive connector
Set the metastore to use thrift-cdp7 when using CDP 7, and thrift for older versions.
Configure the URI to point to your Hive metastore Thrift service
connector.name=hive
hive.metastore=thrift-cdp7
hive.metastore.uri=thrift://cdp-master:9083
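If you generate catalog files from a script, the metastore selection rule above can be sketched as follows. The function `catalog_properties` and its parameters are hypothetical names used for illustration. The default port 9083 is taken from the example above and may differ in your cluster.

```python
def catalog_properties(platform: str, major: int,
                       metastore_host: str, port: int = 9083) -> str:
    """Build the contents of a Hive catalog properties file.

    Hypothetical helper: uses thrift-cdp7 for CDP 7 and thrift for
    older versions, as described in the configuration steps.
    """
    is_cdp7 = platform.strip().upper() == "CDP" and major == 7
    metastore = "thrift-cdp7" if is_cdp7 else "thrift"
    lines = [
        "connector.name=hive",
        f"hive.metastore={metastore}",
        f"hive.metastore.uri=thrift://{metastore_host}:{port}",
    ]
    return "\n".join(lines) + "\n"


# Reproduces the three properties shown in the example above.
print(catalog_properties("CDP", 7, "cdp-master"), end="")
```

Calling the function with `"CDH"` and `6` instead produces the same file with `hive.metastore=thrift`.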
SQL support#
Reading data#
CDP support includes read operations on the following tables:
compacted tables
bucketed tables
partitioned tables
unpartitioned tables
The following file formats can be read:
Avro
CSV
ORC ACID
Parquet
RCFile
Writing data#
Write operations, such as CREATE TABLE AS and CREATE VIEW, are generally supported.
Write operations, such as INSERT, DELETE and UPDATE, on ORC ACID tables are not supported.
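A client application can apply the ORC ACID restriction above before submitting a statement. The following is a minimal sketch under stated assumptions: the function name `is_blocked_on_orc_acid` is hypothetical, the caller already knows whether the target table is an ORC ACID table, and the check looks only at the first keyword of the statement. It covers only the three operations named above and makes no claim about other statements.

```python
# Write operations named above as unsupported on ORC ACID tables.
UNSUPPORTED_ON_ORC_ACID = {"INSERT", "DELETE", "UPDATE"}


def is_blocked_on_orc_acid(statement: str, table_is_orc_acid: bool) -> bool:
    """Return True if the statement is one of the unsupported writes.

    Hypothetical helper: inspects only the leading SQL keyword and does
    not parse the statement.
    """
    words = statement.split()
    if not words:
        raise ValueError("empty statement")
    keyword = words[0].upper()
    return table_is_orc_acid and keyword in UNSUPPORTED_ON_ORC_ACID
```

For example, `is_blocked_on_orc_acid("INSERT INTO t VALUES (1)", True)` returns `True`, and the same statement against a table that is not ORC ACID returns `False`.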
Performance#
Hive metastore and statistics#
CDP support includes the improved thrift-cdp7 Hive metastore implementation. It supports the metastore Thrift communication protocol that CDP implements for table statistics management.
This allows SEP to handle the following kinds of statistics separately:
Column statistics
Partition statistics
Table statistics
All statistics handling, when using CDP, is performed by the Hive connector and the thrift-cdp7 Hive metastore, and is therefore identical to standard Hive connector usage.