Apache Ranger overview#

Apache Ranger is a tool to manage access control policies for Hadoop/Hive and related object storage systems such as Delta Lake. It provides a simple and intuitive web-based console for creating and managing policies controlling access to the data.

The Privacera Platform, powered by Apache Ranger, is an extended commercial distribution of Apache Ranger and can also be used.

Starburst Enterprise platform (SEP) can be integrated with Ranger as an access control system. When a query is submitted to SEP, SEP parses and analyzes the query to understand the privileges required by the user to access objects such as schemas and tables. Once a list of these objects is created, SEP communicates with the Ranger service to determine if the request is valid. If the request is valid, the query continues to execute. If the request is invalid, because the user does not have the necessary privileges to query an object, an error is returned. Ranger policies are cached in SEP to improve performance.

Authentication is handled outside of Ranger, for example using LDAP, and Ranger uses the authenticated user and user groups to associate with the policy definition.

Note

SEP integration with Ranger requires a valid Starburst Enterprise license.

For more information on integrating Apache Ranger with SEP, see the following pages:

Requirements#

Before you configure SEP for any integration with Apache Ranger or Privacera Platform, verify the following prerequisites:

The SEP coordinator and workers have the appropriate network access to communicate with the Ranger service. Typically this is port 6080, or port 6182 if SSL is used.
Apache Ranger 2.0.0 or higher must be used.
Privacera Platform version 4.7.0.3 is recommended.
A policy covering all users that provides read access to system.metadata, system.jdbc, and system.runtime. Access to the system.jdbc schema is granted automatically.

Ranger usage options#

SEP offers the following different integrations with Ranger:

Global access control with Apache Ranger, which works for any catalog using any connector and requires the SEP Ranger plugin to be installed with Ranger.
Hive and Delta Lake access control with Apache Ranger, which works for any catalog using the Hive connector and can use an unmodified Apache Ranger.

We highly recommend implementing Ranger for global access control. This allows you to use Ranger policies for all configured catalogs.

Note

When used for global access control, the Starburst Ranger integration extends the basic functionality of Ranger with the Starburst Ranger plugin. It allows Ranger to provide access control for all data sources defined by a catalog in Starburst Enterprise, and all other data sources supported by SEP.

Key concepts#

The concepts and features described in the following section apply to all Ranger usage.

Policies#

A policy is a combination of a set of resources and the associated privileges. Ranger provides a user interface, or optionally a REST API, to create and manage these access control policies.

Resource sets#

A resource set includes one or more resources of different resource types. Wildcard characters are supported to select a number of resources based on a pattern.

catalog
catalog - schema
catalog - schema - table
catalog - schema - table - column
catalog - schema - function-kind
catalog - schema - function-kind - catalog-schema-function
catalog - schema - procedure
catalog - session property
data-product
data-product - data-product-domain
function
system session property
query
user

As the list above shows, some resources are organized hierarchically within a catalog and below. This allows you, for example, to restrict access to a complete catalog, a specific schema or table, or even a single column, procedure, or function within a schema.

For example, you can define a set of resources that allows you to restrict access to two tables credit-info and cards-info in all schemas in the hdfs catalog.

Catalog: hdfs
Schema: *
Table: credit-info, cards-info

A set of resources works as a primary key for a policy and must be unique. However, multiple policies may cover a single resource when wildcards are used.

It is best to create fine-grained resource sets, especially when using column masking and row filtering. Policies with wildcards can lead to hard-to-understand, or even unpredictable, behavior when multiple policies apply to the same resource. For example, both *-schema-table-column and catalog-*-table-column apply to column in table in catalog. The second definition is more specific and therefore preferred to keep your configuration easier to understand.
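The following minimal sketch illustrates how a resource set with wildcards can select resources. It is not Ranger's actual matching logic: the dictionary layout and the `matches` helper are hypothetical, and shell-style matching from Python's `fnmatch` module is assumed as a stand-in for Ranger wildcards.

```python
from fnmatch import fnmatchcase

# Hypothetical representation of the example resource set:
# one list of patterns per level of the hierarchy.
resource_set = {
    "catalog": ["hdfs"],
    "schema": ["*"],
    "table": ["credit-info", "cards-info"],
}


def matches(resource_set, catalog, schema, table):
    """Return True if every level of the resource matches one of its patterns."""
    resource = {"catalog": catalog, "schema": schema, "table": table}
    return all(
        any(fnmatchcase(resource[level], pattern) for pattern in patterns)
        for level, patterns in resource_set.items()
    )


print(matches(resource_set, "hdfs", "finance", "credit-info"))      # True
print(matches(resource_set, "hdfs", "finance", "orders"))           # False
print(matches(resource_set, "postgres", "finance", "credit-info"))  # False
```

Because the schema level is a wildcard, the two tables are covered in every schema of the hdfs catalog, which is also why a second policy with a different wildcard pattern can end up covering the same table.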

Privilege sets#

A set of privileges consists of one or more user groups, roles and users, and a set of access types for the specified resource set. Privileges can allow or deny operations.

The catalog, schema, table and column resources, which grant access to resources for queries, have the following access types.

SELECT to read data from the resource
INSERT to add data to the resource
UPDATE to change data in the resource
DELETE to remove data from the resource
CREATE to create a resource
ALTER to alter a resource
DROP to remove a resource
SHOW to show information about a resource
PUBLISH to publish a data product
OWNERSHIP to claim ownership of the resource, which provides complete access
IMPERSONATE to impersonate another user, and therefore use the privileges of that user

In addition there are privileges that determine access to queries and their usage, and are therefore of a more general nature.

SELECT to list queries.
EXECUTE to initiate processing of any query. Without this privilege user action is extremely limited.
KILL to stop processing of any query.

Users, groups, and roles#

Users, groups, and roles are sourced from your configured authentication system, ideally a connected LDAP directory, and are used as the target users for each policy.

Column-level authorization#

SEP enforces column-level privileges granted to roles. For example, if a user is only granted access to a subset of table columns, they are only able to query from these columns. If they execute an SQL statement that refers to other columns, the query fails with an error.
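As a rough illustration of this behavior, the sketch below compares the columns a statement refers to with the columns a user is granted. The `check_column_access` function and the error text are hypothetical and only model the outcome described above, not how SEP implements the check.

```python
def check_column_access(granted_columns, referenced_columns):
    """Raise PermissionError if a referenced column is not granted."""
    denied = sorted(set(referenced_columns) - set(granted_columns))
    if denied:
        # The message text is illustrative, not the error SEP actually returns.
        raise PermissionError(f"access denied to columns: {', '.join(denied)}")


granted = {"id", "name"}

# A statement that only touches granted columns passes silently.
check_column_access(granted, ["id", "name"])

# A statement that also refers to another column fails.
try:
    check_column_access(granted, ["id", "salary"])
except PermissionError as error:
    print(error)
```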

Column masking#

SEP’s Apache Ranger integration supports most of the column masking methods that are supported in Hive with Ranger. SEP does not distinguish between upper case characters, lower case characters, and digits when masking. x is used for all of these character types.

Note

If an unsupported column masking method is used, MASK_NULL is applied instead.
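The effect of using x for every character type can be pictured with the following sketch. It is an assumption-laden illustration: the `mask_value` function is hypothetical, and leaving other characters such as punctuation unchanged is an assumption, not documented SEP behavior.

```python
def mask_value(value):
    """Replace every letter and digit with 'x'.

    Other characters are kept as they are, which is an assumption
    made for this sketch.
    """
    return "".join("x" if ch.isalnum() else ch for ch in value)


print(mask_value("Card-1234"))  # xxxx-xxxx
```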

Service and catalog integrations#

In addition to enforcing the policies in Apache Ranger, SEP integrates with the Apache Ranger Key Management Service, and has support for AWS Glue Data Catalog, row level filtering and tag-based policies.

Location privileges#

The SEP integration with Ranger allows you to set location privileges to ensure the correct users have access to create objects in specific object storage locations. Location privileges support CREATE TABLE and CREATE SCHEMA operations, as well as CALL system.register_partition for Hive catalogs.

To enable Ranger location privileges, create a location-access-control.properties file in your etc directory with the following attributes, replacing the example text with your own:

location-access-control.name=ranger
ranger.policy-rest-url=some_url
ranger.service-name=service_name
ranger.username=name
ranger.password=pass

For Kubernetes deployments, define a new etcFiles.properties.location-access-control.properties section of the top-level coordinator node in the values.yaml file:

coordinator:
  etcFiles:
    properties:
      location-access-control.properties: |
        location-access-control.name=ranger
        ranger.policy-rest-url=some_url
        ranger.service-name=service_name
        ranger.username=name
        ranger.password=pass

In Ranger, you must create the appropriate policies as locations are denied by default. Location privileges support recursive or non-recursive policies. For example, if you have a recursive policy with the location /tmp/allow then /tmp/allow/nested is valid.

Additionally, policies can contain wildcards, such as /tmp/*/my_table.
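The sketch below models how recursive, non-recursive, and wildcard location policies could be evaluated. It only illustrates the rules described above and is not Ranger's implementation: the `location_allowed` helper is hypothetical, and the wildcard is assumed to behave like a shell-style `*` that may also span path separators.

```python
from fnmatch import fnmatchcase


def location_allowed(location, policy_path, recursive):
    """Check a location against one policy path (illustrative semantics only)."""
    location = location.rstrip("/")
    policy_path = policy_path.rstrip("/")
    if fnmatchcase(location, policy_path):
        return True
    # A recursive policy also covers everything nested below the policy path.
    return recursive and fnmatchcase(location, policy_path + "/*")


print(location_allowed("/tmp/allow/nested", "/tmp/allow", recursive=True))    # True
print(location_allowed("/tmp/allow/nested", "/tmp/allow", recursive=False))   # False
print(location_allowed("/tmp/x/my_table", "/tmp/*/my_table", recursive=False))  # True
```

A location that matches no policy is rejected, which mirrors the deny-by-default behavior.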

Features and use cases#

The following features and use cases are applicable with all Ranger usage.

Hive and other catalog authorization set up#

The Ranger integrations replace any other authorization setup for the data source.

For example, you have to treat it as a replacement for authorization by the user configured for the connection to the data source, or any restrictions in the data source utilized by user impersonation or credential pass-through. It is important to avoid these other configurations and let Ranger manage all access, to keep the overall setup simple and manageable.

When catalogs use the Hive connector, make sure authorization checks are disabled in each catalog properties file. Edit the catalog properties file with the following configuration:

hive.security=allow-all

Controlling access to User Defined Functions with Ranger#

You can use the Ranger system access control to enforce User Defined Function (UDF) policies. A UDF in SEP is deployed as a plugin (Functions) and stored in the SEP global namespace. This global namespace is managed at the system access control level.

This is independent of the global and Hive access control with Ranger and the Privacera Platform.

The Ranger resource hierarchy for all UDF policies requires an associated database (or schema) namespace when creating the policy. Because the global namespace is independent of any connector namespace, this poses a slight challenge to control access to UDFs using Ranger. To overcome this you must specify $sep as the database name in Ranger. This keeps all SEP functions under the $sep database in Ranger resource hierarchy.

To configure Ranger system access control for UDFs, add the following to a system access control properties file, for example etc/access-control-ranger-udf.properties:

access-control.name=ranger-system-access-control
ranger.policy-rest-url=https://ranger-host:6182
ranger.service-name=hive

ranger.authentication-type=KERBEROS
ranger.kerberos-principal=sep-server/sep-server-node@EXAMPLE.COM
ranger.kerberos-keytab=/etc/sep/conf/sep-server.keytab
ranger.plugin-policy-ssl-config-file=/etc/hive/conf/ranger-policymgr-ssl.xml

This additional configuration is needed because the Ranger system access control uses a Ranger client that is independent from the Hive access control. Only one Ranger system access control can be defined, while Hive access control can be configured separately for each Hive catalog. In the scenario where there are multiple Hive catalogs and multiple Ranger services, only one of those Ranger services can be used to manage the UDF policies.

Note

All Ranger configuration properties supported for global access control with Ranger are supported for Hive access control. Ranger properties related to row filtering or column masking are unsupported in global access control.

Audit#

When Ranger audit is implemented, an audit event is logged whenever access is granted or denied through Ranger, provided that auditing is enabled in the relevant resource policy.

Ranger audit is configured in the Ranger-specific file /etc/hive/conf/ranger-hive-audit.xml. Configuring Ranger audit is complex, and outside the scope of Starburst documentation; please refer to your Ranger documentation to learn how to set up audit optimally for your environment.

For audit to work with SEP, the location of the file must be specified in your catalog properties file:

ranger.config-resources=/etc/hive/conf/ranger-hive-audit.xml

Caveat regarding performance

Ranger audits are performed by accessing the internal table system.runtime.queries. Any access to the table is logged.

The Trino web UI makes heavy use of the queries table. The property ranger.audit.system-runtime-queries.enabled is set to true by default and controls this logging behavior. Using the web interface causes a flood of audit events. Setting the property to false disables this audit logging.

Caching#

Caching is used to improve performance and reduce the number of requests to the Ranger service. Caching is enabled through configuration properties, which can be found in the Ranger installation and configuration page.

Authorization limitations#

Authorization information cannot be accessed by querying tables such as information_schema.roles, information_schema.applicable_roles, information_schema.enabled_roles, and information_schema.table_privileges.

Configuration properties#

The properties listed in this table apply to Ranger-related configurations in system access control properties files as well as catalog files using the Hive connector for Hive access control with Apache Ranger or the Privacera Platform.

Note

Some properties, such as ranger.row-filtering.enabled, are unsupported when Ranger is configured for global access control.

Ranger properties#
Property name	Description
`ranger.policy-rest-url`	URL address of the Ranger REST service. The URL is required to use HTTPS with Kerberos authentication.
`ranger.service-name`	SEP Ranger plugin service name.
`ranger.authentication-type`	Authentication type for SEP connecting to Ranger, `BASIC` (default) or `KERBEROS`.
`ranger.username`	SEP Ranger plugin user name. This property is used when `ranger.authentication-type=BASIC` is set.
`ranger.password`	SEP Ranger plugin user password. This property is used when `ranger.authentication-type=BASIC` is set.
`ranger.kerberos-principal`	Ranger service kerberos principal.
`ranger.kerberos-keytab`	Path to the Ranger service kerberos keytab file.
`ranger.plugin-policy-ssl-config-file`	Path to Ranger plugin SSL configuration.
`ranger.policy-cache-dir`	Path to the Ranger cache directory for policies. This allows loading policies from the cache on startup, even if Ranger Policy Admin is not available at that moment.
`ranger.policy-refresh-interval`	Interval determining how often authorization policies are refreshed. This is the maximum delay before changes in Ranger authorization policies become visible in SEP. Default is `30s`.
`ranger.policy-connection-timeout`	Ranger service connection timeout. Default is `120s`.
`ranger.policy-read-timeout`	Ranger service read timeout. Default is `30s`.
`ranger.user-group-source`	Source of user/group information, `RANGER` (default) or `STARBURST`. Only supports LDAP group provider and File group provider for group information with `STARBURST`, see details in SEP user synchronization.
`ranger.cache-ttl`	Period for how long group mapping information is cached in SEP. `0ms` disables the cache. If `ranger.user-group-source` is `STARBURST`, controls the period between user sync operations for a single user. Default is `30s`.
`ranger.cache-refresh-interval`	Interval at which group mapping information is refreshed in SEP. Any value greater than `ranger.cache-ttl` disables it. Default is disabled, `0ms`.
`ranger.row-filtering.enabled`	To enable row filtering, set this flag to `true`. This setting is not supported when Ranger is configured for global access control (where row filtering is always enabled), and causes cluster startup to fail if set. Note that there are semantic differences between the SEP and HiveQL SQL variants. Default is `false`.
`ranger.wild-card-resource-matching-for-row-filtering`	To enable resource wildcard matching for row filtering, set this flag to `true`. When two policies match a single resource, the one without wildcards is used. When multiple wildcard policies match, it is undetermined which one is used. This property is ignored when Ranger is configured for global access control. Default is `false`.
`ranger.wild-card-resource-matching-for-column-masking`	To enable resource wildcard matching for column masking, set this flag to `true`. When two policies match a single resource, the one without wildcards is used. When multiple wildcard policies match, it is undetermined which one is used. This property is ignored when Ranger is configured for global access control. Default is `false`.
`ranger.config-resources`	Additional XML configuration files which are read before applying your SEP Ranger configuration. Useful for reusing existing Hive-level Ranger configuration, such as the Ranger audit configuration.
`ranger.sql.enabled`	Enable Ranger policy management with SQL as supported for Hive access control only. Default is `true`.
`ranger.catalog-session-properties.skip-authorization-check`	Skip authorization check when setting catalog session properties. This property can only be used in Hive and Delta Lake catalog access control, not global access control. Defaults to `false`.
`ranger.auth-to-local.config-file`	Path to an auth-to-local translations file to configure username translation for Ranger.
`ranger.access-check-on-external-location.enabled`	To enable URL policies on external table location, set this flag to `true`. This setting is not supported when Ranger is configured for global access control (where URL policies are not supported). Note that URL policies are enforced only during CREATE TABLE operation. Defaults to `false`.
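Since some of these properties are described as unsupported for global access control, a small pre-flight check of a properties file can catch them before a cluster start. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the parser only handles simple key=value lines, and the set of flagged properties is taken from the descriptions in the table above.

```python
# Properties that the table describes as unsupported or unusable
# when Ranger is configured for global access control.
UNSUPPORTED_FOR_GLOBAL = {
    "ranger.row-filtering.enabled",
    "ranger.access-check-on-external-location.enabled",
    "ranger.catalog-session-properties.skip-authorization-check",
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def unsupported_for_global(props):
    """Return the configured properties that global access control rejects."""
    return sorted(UNSUPPORTED_FOR_GLOBAL.intersection(props))


# Placeholder values, reusing the example values from this page.
example = """
ranger.policy-rest-url=https://ranger-host:6182
ranger.service-name=service_name
ranger.row-filtering.enabled=true
"""

print(unsupported_for_global(parse_properties(example)))
# ['ranger.row-filtering.enabled']
```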

Ensure Ranger works with TLS#

If your organization implements TLS for network traffic between SEP and Ranger, you must ensure that both are correctly configured. You must add the SEP certificate to a JKS keystore and configure it in the Ranger SSL configuration file.

All catalogs accessing Ranger must define the ranger.plugin-policy-ssl-config-file property and point to the XML configuration file:

ranger.plugin-policy-ssl-config-file=/etc/starburst/ranger-policymgr-ssl.xml

If Ranger and SEP use globally trusted certificates, you can use the following Ranger SSL configuration file:

<configuration>
<!--  The following properties are used for 2-way SSL client server validation -->
  <property>
    <name>xasecure.policymgr.clientssl.keystore</name>
    <value>/etc/starburst/sb-admin-keystore.jks</value><!--coordinator's cert goes here-->
  </property>
  <property>
    <name>xasecure.policymgr.clientssl.keystore.credential.file</name>
    <value>jceks://file/etc/starburst/sb-admin-keystore.jceks</value><!--coordinators jks file password store-->
  </property>
</configuration>

Without globally trusted certificates, you need to add Ranger’s certificate to a JKS truststore and link it in the XML file:

<configuration>
    <!--  The following properties are used for 2-way SSL client server validation -->
    <property>
        <name>xasecure.policymgr.clientssl.keystore</name>
        <!-- Keystore file that holds the coordinator certificate and private key. -->
        <value>/etc/hive/conf/ranger-plugin-keystore.jks</value>
    </property>
    <property>
        <name>xasecure.policymgr.clientssl.truststore</name>
        <!-- Truststore file that holds the Ranger certificate. -->
        <value>/etc/hive/conf/ranger-plugin-truststore.jks</value>
    </property>
    <property>
        <name>xasecure.policymgr.clientssl.keystore.credential.file</name>
        <!-- This file holds the password from Starburst keystore -->
        <value>jceks://file/etc/ranger/hivedev/cred.jceks</value>
    </property>
    <property>
        <name>xasecure.policymgr.clientssl.truststore.credential.file</name>
        <value>jceks://file/etc/ranger/hivedev/cred.jceks</value>
    </property>
</configuration>

More information about working with certificates is available in JKS files and PEM files. Avoid renaming JCEKS files after generating them, since that invalidates them.

When using the SEP Helm charts, you have to configure the XML file and the access control file inline in the YAML files.

When using the Starburst Ranger plugin, you also need to configure TLS for the connection between Ranger and SEP.

Ensuring Ranger works with your authentication service#

You need to configure SEP to work with the authentication service used by Ranger. Ranger needs to have information about your users, groups, and roles in your authentication system. There are two ways of getting that information into Ranger:

SEP user sync - SEP pushes authenticated user data to Ranger directly. This happens whenever SEP needs to check user permissions with Ranger, but is cached per user so it does not happen too often. This is simpler to set up, as it only requires setting the ranger.user-group-source configuration property to STARBURST. In addition to authentication, you need to have a user group provider configured in SEP using group-provider.properties, for example: LDAP group provider, to get a list of groups for each logged in user. It only works if user groups in Ranger policies have the same names that SEP user groups have. If it is not configured when a user is created, they are not assigned to any groups in Ranger.
Ranger user sync - a separate ETL process from authentication service to Ranger. Requires separate setup of the user sync process.

While SEP does offer Kerberos support, Starburst encourages the use of LDAP. The following configuration property is provided:

LDAP#

If your organization uses an LDAP system for user and group information, Ranger can use that information to define role-based access to catalogs using any connector, as well as a number of other system resources. Policies in Ranger define access and authorization, and are created with the Ranger user interface. Users, groups, and roles are sourced from your connected LDAP directory and are used to target users for a Ranger policy. Each policy combines user and group information with a resource and access rights to the resource.

With the K8s and AWS installation methods, all details are already configured. For existing Ranger usage or manual installation, you must ensure that Ranger has data from your LDAP directory provider, and that a synchronization process (either SEP or Ranger user sync) is in place.

The process of connecting your existing Ranger installation depends on your particular LDAP implementation as well as your Ranger configuration. Learn more about that in the LDAP Authentication page.

SEP user synchronization#

SEP can sync user names and user groups to Ranger when using LDAP, LDAPS or OAuth2 for authentication. This requires setting ranger.user-group-source in Configuration properties to STARBURST.

Note

When using sync mode with ranger.user-group-source, you do not need to configure Ranger user synchronization.

This mode requires a configured LDAP group provider or File group provider for groups information in SEP using group-provider.properties. No other methods of resolving group membership are supported. It only works if user groups in Ranger policies have the same names that SEP user groups have. If it is not configured when a user is created, they are not assigned to any groups in Ranger.

Kubernetes users can instead use the alternative LDAP and Ranger user sync support.

Kerberos#

SEP can use Kerberos authentication, and the Ranger integration also supports Kerberos.

Warning

Most organizations that use Kerberos also use LDAP. We strongly encourage you to use LDAP instead of Kerberos, due to the relative unreliability of Kerberos servers, their lack of clear error messaging, and their rigid OS and JVM dependencies.

A Ranger sys admin user, meaning a user with the role ROLE_SYS_ADMIN, must exist. When Kerberos authentication is used, this user must match the SEP Kerberos principal set in ranger.kerberos-principal. When BASIC authentication is used, it must match the SEP Ranger plugin user name and password set in ranger.username and ranger.password.

The SEP Kerberos principal is translated to a Ranger user name via the auth-to-local Hadoop rules from core-site.xml.

Note

Ranger version 2.1.0 removes the possibility to connect to a Kerberized Ranger using basic user and password authentication. To restore this possibility, add the following configuration to your Ranger core-site.xml file, which allows unauthenticated access:

<property>
  <name>ranger.admin.allow.unauthenticated.access</name>
  <value>true</value>
</property>

Alternatively, you can configure SEP to authenticate to Ranger using Kerberos.

Starburst Ranger CLI#

You can use the Starburst Ranger CLI to manage integration of SEP with Apache Ranger or the Privacera Platform for the following tasks:

Service definition setup for initial configuration
Service definition setup for plugin upgrade
Ranger user group management
Ranger user management

The command line application is an executable Java archive that requires Java 17 or higher to be available on the system path. You can download it from Starburst and install it with the following steps on Linux or macOS.

Ensure the computer is able to reach the Ranger server via HTTP, since the CLI interacts with the REST API. This can be the coordinator or a worker in the cluster, or any other computer.
Verify Java with java -version
Move the binary to a directory in your path, such as ~/bin and rename it.
```
mv starburst-ranger-cli-*-executable.jar ~/bin/starburst-ranger-cli
```
Verify the folder is on the path.
```
echo $PATH
```
If necessary, add the folder.
```
export PATH=~/bin:$PATH
```
Now you can run the help command to verify the CLI works.
```
starburst-ranger-cli help
```

The resulting output is similar to the following:

Starburst Ranger command line interface
USAGE:
starburst-ranger-cli [--properties=<configFile>] [-p=<String=String>]... [COMMAND]
...

Help commands#

The help command provides details about the other commands and their specific options when you append help to the desired command. A few examples are shown in the following block:

starburst-ranger-cli help
starburst-ranger-cli user help
starburst-ranger-cli service-definition help
starburst-ranger-cli group create help
starburst-ranger-cli user create help

Windows installation is supported as well and requires similar commands. You can also run the application directly with Java on Linux, macOS or Windows.

java -jar starburst-ranger-cli-*-executable.jar

You have to supply the connection details from SEP to Ranger in a properties file. Typically you can simply use the Ranger access control properties file by copying it to the computer running the CLI. Alternatively you can use individual properties as command line options.

Use the --properties option to specify the full path to a .properties file that contains one key=value pair on each line.
Use the -p option for each property separately with the format -p=key=value.
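When the connection details live in a script rather than a file, the repeated -p options can be assembled programmatically. The sketch below is only an illustration: the `cli_command` helper is hypothetical, the property values are placeholders, and the options are placed after the subcommand in the same way as the --properties examples on this page.

```python
import shlex


def cli_command(subcommand, properties):
    """Build a starburst-ranger-cli command line with one -p option per property."""
    args = ["starburst-ranger-cli", *subcommand]
    args += [f"-p={key}={value}" for key, value in properties.items()]
    # shlex.join quotes arguments where the shell requires it.
    return shlex.join(args)


# Placeholder connection details, reusing example values from this page.
properties = {
    "ranger.policy-rest-url": "https://ranger-host:6182",
    "ranger.service-name": "hive",
}

print(cli_command(["group", "get"], properties))
```

Avoid passing ranger.password this way on shared systems, since command line arguments are typically visible to other local users; a properties file is the safer choice there.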

In the following examples these properties are usually omitted, but they are necessary to find the Ranger endpoint.

Ranger user group management#

You can manage user groups in Ranger with the CLI. Properties are used to provide the details for Ranger access.

The following operations are available:

create a group
get a list of all groups
get a list of all groups a certain user belongs to
delete a group

The CLI uses access control properties and positional parameters to pass group names using the following syntax:

starburst-ranger-cli group get [username]
starburst-ranger-cli group create group1 [group2] ...
starburst-ranger-cli group delete group1 [group2] ...

The following complete example gets all groups in Ranger, as specified by the properties file, and displays them:

starburst-ranger-cli group get --properties=ranger-access-control.properties

If a username is specified, only the groups of the user are displayed:

starburst-ranger-cli group get --properties=ranger-access-control.properties myusername

You can create one or multiple groups, and the identifier of each created group is displayed as confirmation:

starburst-ranger-cli group create group1 [group2] ...

Deleting groups is similar:

starburst-ranger-cli group delete group1 [group2] ...

Ranger user management#

You can manage users in Ranger with the CLI. Properties are used to provide the details for Ranger access.

The following operations are available:

create a user
get user details
delete a user

The CLI uses access control properties and a mixture of positional parameters and options to pass user information using the following syntax:

starburst-ranger-cli user get
starburst-ranger-cli user create
starburst-ranger-cli user delete

A full example to get a user can look like this:

starburst-ranger-cli user get --properties=ranger-access-control.properties username

Creating a user can be done in two ways:

a basic user created from a default template:

starburst-ranger-cli user create [--groups=group1,group2,...] -- user1 [user2] ...

using a JSON file, such as alice.json, with the following syntax:

{
  "name": "alice",
  "firstName": "Alice",
  "lastName": "Wonderland",
  "emailAddress": "alice@example.com",
  "password": "not@trivialP225w0rd",
  "description": "She went down the rabbit hole.",
  "groups": ["admin", "finance"],
  "roles": ["user", "account_owner"]
}

The file is passed with the -f or --from-file option:

starburst-ranger-cli user create -f=alice.json

If any group from groups doesn’t exist, it is automatically created.

It’s also possible to create multiple users using a file with a list of user definitions:

{
  "users": [{
    "name": "alice",
    "firstName": "Alice",
    "lastName": "Wonderland",
    "emailAddress": "alice@example.com",
    "password": "not@trivialP225w0rd",
    "description": "She went down the rabbit hole.",
    "groups": ["admin", "finance"],
    "roles": ["user", "account_owner"]
  }, {
    "name": "bob",
    "firstName": "Bob's firstName",
    "lastName": "Bob's lastName",
    "emailAddress": "bob@bobiverse.com",
    "password": "ForW3AreM@ny"
  }]
}
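For larger numbers of users, such a file can be generated rather than written by hand. The following minimal sketch builds the same structure with Python's json module. The `make_user` helper and the placeholder passwords are hypothetical, and the field names are simply copied from the examples above.

```python
import json


def make_user(name, password, **optional_fields):
    """Build one user definition using the field names from the examples above."""
    user = {"name": name, "password": password}
    user.update(optional_fields)
    return user


users = [
    make_user(
        "alice",
        "replace-with-a-strong-password",  # placeholder value
        firstName="Alice",
        lastName="Wonderland",
        emailAddress="alice@example.com",
        groups=["admin", "finance"],
        roles=["user", "account_owner"],
    ),
    make_user("bob", "replace-with-another-password"),  # placeholder value
]

# Wrap the list in the {"users": [...]} structure used for multiple users.
document = json.dumps({"users": users}, indent=2)
print(document)
```

The resulting text can be saved to a file and used in the same way as the hand-written file shown above.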

Service definition management#

You can find information about creating and overriding the service definition in the sections about installing and upgrading the SEP Ranger plugin.

Ranger REST API#

Apache Ranger includes a REST API that can be used for automating and troubleshooting your configuration and setup. Use it with caution and reference the API documentation as needed.
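As a hedged starting point, the sketch below prepares a request that lists the policies of one Ranger service using basic authentication. The endpoint path is assumed from the Apache Ranger public v2 API and should be verified against the API documentation for your Ranger version. The host, service name, and credentials are placeholders, and the `build_policy_request` helper is hypothetical.

```python
import base64
import urllib.request
from urllib.parse import quote


def build_policy_request(base_url, service_name, username, password):
    """Prepare a GET request for the policies of one Ranger service."""
    # Assumed endpoint of the Ranger public v2 API; verify for your version.
    url = (
        f"{base_url.rstrip('/')}/service/public/v2/api/service/"
        f"{quote(service_name, safe='')}/policy"
    )
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    headers = {"Authorization": f"Basic {token}", "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers)


request = build_policy_request("https://ranger-host:6182", "hive", "name", "pass")
print(request.full_url)

# Sending the request requires network access to Ranger, for example:
# with urllib.request.urlopen(request) as response:
#     policies = json.load(response)  # requires "import json"
```

Building the request separately from sending it makes it easy to inspect the URL and headers while troubleshooting.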