Hive and Delta Lake access control with Apache Ranger#

You can use the Apache Ranger integration with SEP to control access to Hive, Delta Lake, and Iceberg data sources configured in any catalog using the SEP Hive, Delta Lake, or Iceberg connectors.

This is specifically useful in the following scenarios:

  • You already use Apache Ranger to control access for these data sources

  • You use SQL standard-based authorization

If you want to control access for other catalogs and connectors with Ranger, you need to adopt global access control.

Note

Hive and Delta Lake access control with Apache Ranger requires a valid Starburst license.

Installation#

Verify you fulfill the Apache Ranger requirements.

If you do not yet have Ranger running and configured, you can take advantage of the supported installation methods for global access control with Apache Ranger.

Configuration#

To use Ranger with Hive, Delta Lake, and Iceberg catalogs, you must set the appropriate property for your setup in the catalog configuration file:

For Hive catalogs:

hive.security=system

For Delta Lake catalogs:

delta.security=system

For Iceberg catalogs:

iceberg.security=system

Additionally, you must have the access-control.properties file as well as the hive.security file for Hive catalogs. The access-control.properties and hive.security files are configured to use Apache Ranger for authorization and Kerberos for authentication.

An example access-control.properties file:

access-control.name=hive-based-ranger

ranger.hive-based.catalogs=hive
ranger.hive-based.fallback-access-control=ALLOW_ALL

ranger.policy-rest-url=https://ranger-host:6182
ranger.service-name=hive

ranger.authentication-type=KERBEROS
ranger.kerberos-principal=sep-server/sep-server-node@EXAMPLE.COM
ranger.kerberos-keytab=/etc/sep/conf/sep-server.keytab
ranger.plugin-policy-ssl-config-file=/etc/hive/conf/ranger-policymgr-ssl.xml

An example hive.properties file:

connector.name=hive
hive.metastore.uri=thrift://hive-metastore-node:9083

hive.metastore.authentication.type=KERBEROS
hive.metastore.service.principal=hive/hive-metastore-node@EXAMPLE.COM
hive.metastore.client.principal=hive/sep-server-node@EXAMPLE.COM
hive.metastore.client.keytab=/etc/hive/conf/hive.keytab

hive.hdfs.authentication.type=KERBEROS
hive.hdfs.impersonation.enabled=false
hive.hdfs.trino.principal=hdfs/sep-server-node@EXAMPLE.COM
hive.hdfs.trino.keytab=/etc/hadoop/conf/hdfs.keytab

hive.security=system
System access control configuration properties#

Property name

Description

access-control.name

Set to hive-based-ranger to use Apache Ranger with your catalogs.

ranger.hive-based.catalogs

Specifies the comma-separated names of catalogs to be secured by Hive Ranger policies.

ranger.hive-based.fallback-access-control

Fallback access control to control resources that are out of scope of Hive Ranger policies. There are three options:

  • DENY-ALL: all resources that are not related to controlled catalogs are denied.

  • ALLOW_ALL: all resources that are not related to controlled catalogs are allowed.

  • PRESTO_DEFAULT: Similar to ALLOW_ALL, however it also forbids user impersonating and writing system information such as marking nodes offline, or changing runtime flags.

ranger.hive-based.query-access-control.enabled

Determines if access to query execution and killing should be controlled. Default is true.

ranger.hive-based.udf-access-control.enabled

Determines if access to user defined functions should be controlled. Default is false.

materialized-views.run-as-invoker-default-value

Specifies default value for the run-as-invoker property.

See additional configuration properties available in Ranger.

Resource and privilege relationship limitations#

The integration of Ranger with the Hive, Delta Lake, and Iceberg connectors does not support any kind of relation between a resource, such as a table, and the granted access. This is particularly evident when newly created resources are treated with the access rights of the existing policies.

For example, as a user without full administrative access (ALL PRIVILEGES) you can end up in a situation where you create a new table, but can not grant access to that table to other users. As a workaround you can create global policies that use matching patterns and define the desired default access.

If you want all tables ending with -reporting to be accessible by everyone, you can create a policy that uses the value *-reporting for the table. If you use * for database and * for column, you can grant a specific privilege such as reporting a table and all of its columns in any database.

Alternatively, you can grant all privileges to anyone creating new objects and require privileges to grant access to other users. This policy might be too open for your use case, depending on your security policies, the data, the number of users, and other aspects.

Role management limitations#

SQL supports the SET ROLE [role] statement to enable or disable a role. This feature is not supported in Ranger. All user roles are always enabled. Querying information_schema.applicable_roles and information_schema.enabled_roles yields the same results.