Apache Ranger overview#
Apache Ranger is a tool to manage access control policies for Hadoop/Hive and related object storage systems such as Delta Lake. It provides a simple and intuitive web-based console for creating and managing policies controlling access to the data.
The Privacera Platform, powered by Apache Ranger, is an extended commercial distribution of Apache Ranger that can also be used.
Starburst Enterprise platform (SEP) can be integrated with Ranger as an access control system. When a query is submitted to SEP, SEP parses and analyzes the query to understand the privileges required by the user to access objects such as schemas and tables. Once a list of these objects is created, SEP communicates with the Ranger service to determine if the request is valid. If the request is valid, the query continues to execute. If the request is invalid, because the user does not have the necessary privileges to query an object, an error is returned. Ranger policies are cached in SEP to improve performance.
Authentication is handled outside of Ranger, for example using LDAP. Ranger matches the authenticated user and the user's groups against the policy definitions.
Note
SEP integration with Ranger requires a valid Starburst Enterprise license.
For more information on integrating Apache Ranger with SEP, see the following pages:
Requirements#
Before you configure SEP for any integration with Apache Ranger or Privacera Platform, verify the following prerequisites:
The SEP coordinator and workers have the appropriate network access to communicate with the Ranger service. Typically this is port 6080, or 6182 if SSL is used.
Apache Ranger 2.0.0 or higher must be used.
Privacera Platform version 4.7.0.3 is recommended.
A policy covering all users that provides read access to system.metadata, system.jdbc, and system.runtime. Access to the system.jdbc schema is granted automatically.
Ranger usage options#
SEP offers the following different integrations with Ranger:
Global access control with Apache Ranger, which works for any catalog using any connector and requires the SEP Ranger plugin to be installed with Ranger.
Hive and Delta Lake access control with Apache Ranger, which works for any catalog using the Hive connector and can use an unmodified Apache Ranger.
We highly recommend implementing Ranger for global access control. This allows you to use Ranger policies for all configured catalogs.
Note
When used for global access control, the Starburst Ranger integration extends the basic functionality of Ranger with the Starburst Ranger plugin. It allows Ranger to provide access control for all data sources defined by a catalog in Starburst Enterprise, and all other data sources supported by SEP.
Key concepts#
The concepts and features described in the following section apply to all Ranger usage.
Policies#
A policy is a combination of a set of resources and the associated privileges. Ranger provides a user interface, or optionally a REST API, to create and manage these access control policies.
Resource sets#
A resource set includes one or more resources of different resource types. Wildcard characters are supported to select a number of resources based on a pattern.
catalog
catalog - schema
catalog - schema - table
catalog - schema - table - column
catalog - schema - function-kind
catalog - schema - function-kind - catalog-schema-function
catalog - schema - procedure
catalog - session property
data-product
data-product - data-product-domain
function
system session property
query
user
As you can see from the list above, some resources are organized hierarchically within a catalog. This allows you, for example, to restrict access to a complete catalog, a specific schema or table, or even a single column, procedure, or function within a schema.
For example, you can define a set of resources that allows you to restrict access to two tables, credit-info and cards-info, in all schemas in the hdfs catalog.
Catalog: hdfs
Schema: *
Table: credit-info, cards-info
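As a rough illustration of how such a resource set selects resources, the following Python sketch checks a catalog, schema, and table against the patterns from the example. The dictionary layout and the function name are hypothetical, and the standard library `fnmatch` module stands in for Ranger's own wildcard matching, which may differ in detail.

```python
from fnmatch import fnmatchcase

# Hypothetical representation of the resource set from the example above.
RESOURCE_SET = {
    "catalog": ["hdfs"],
    "schema": ["*"],
    "table": ["credit-info", "cards-info"],
}


def resource_set_matches(resource_set, catalog, schema, table):
    """Return True if every level of the resource matches one of its patterns."""
    resource = {"catalog": catalog, "schema": schema, "table": table}
    return all(
        any(fnmatchcase(resource[level], pattern) for pattern in patterns)
        for level, patterns in resource_set.items()
    )
```

With this sketch, the table credit-info in any schema of the hdfs catalog matches, while other tables or other catalogs do not.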
A set of resources works as a primary key for a policy and must be unique. However, multiple policies may cover a single resource because of wildcards.
It is best to create fine-grained resource sets, especially when using column masking and row filtering. Policies with wildcards can create hard-to-understand, or even unpredictable, behavior when multiple policies apply to the same resource. For example, both *-schema-table-column and catalog-*-table-column apply to column in table in catalog. The second definition is more specific and therefore preferred to keep your configuration easier to understand.
Privilege sets#
A set of privileges consists of one or more user groups, roles and users, and a set of access types for the specified resource set. Privileges can allow or deny operations.
The catalog, schema, table and column resources, which grant access to resources for queries, have the following access types.
SELECT to read data from the resource
INSERT to add data to the resource
UPDATE to change data in the resource
DELETE to remove data from the resource
CREATE to create a resource
ALTER to alter a resource
DROP to remove a resource
SHOW to show information about a resource
PUBLISH to publish a data product
OWNERSHIP to claim ownership of the resource, which provides complete access
IMPERSONATE to impersonate another user, and therefore use the privileges of that user
In addition there are privileges that determine access to queries and their usage, and are therefore of a more general nature.
SELECT to list queries.
EXECUTE to initiate processing of any query. Without this privilege user action is extremely limited.
KILL to stop processing of any query.
Users, groups, and roles#
Users, groups, and roles are sourced from your configured authentication system, ideally a connected LDAP directory, and are used as the target users for each policy.
Column masking#
SEP’s Apache Ranger integration supports most of the column masking methods that are supported in Hive with Ranger. SEP does not distinguish between uppercase letters, lowercase letters, and digits when masking. x is used for all of these character types.
Note
If an unsupported column masking method is used, MASK_NULL is applied instead.
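The masking behavior described above, where x replaces uppercase letters, lowercase letters, and digits alike, can be sketched in Python as follows. This is only an illustration and not SEP's implementation. The function name is hypothetical, and leaving all other characters unchanged is an assumption.

```python
def mask_value(value: str) -> str:
    """Replace every letter and digit with 'x'.

    Assumption: characters that are neither letters nor digits
    (for example '-' or '@') are left as they are.
    """
    return "".join("x" if ch.isalnum() else ch for ch in value)
```

For example, under these assumptions a value such as Ab1-cd is masked to xxx-xx.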
Service and catalog integrations#
In addition to enforcing the policies in Apache Ranger, SEP integrates with the Apache Ranger Key Management Service, and has support for AWS Glue Data Catalog, row level filtering and tag-based policies.
Location privileges#
The SEP integration with Ranger allows you to set location privileges to ensure the correct users have access to create objects in specific object storage locations. Location privileges support CREATE TABLE and CREATE SCHEMA operations, as well as CALL system.register_partition for Hive catalogs.
To enable Ranger location privileges, create a location-access-control.properties file in your etc directory with the following attributes, replacing the example text with your own:
location-access-control.name=ranger
ranger.policy-rest-url=some_url
ranger.service-name=service_name
ranger.username=name
ranger.password=pass
For Kubernetes deployments, define a new etcFiles.properties.location-access-control.properties section of the top-level coordinator node in the values.yaml file:
coordinator:
  etcFiles:
    properties:
      location-access-control.properties: |
        location-access-control.name=ranger
        ranger.policy-rest-url=some_url
        ranger.service-name=service_name
        ranger.username=name
        ranger.password=pass
In Ranger, you must create the appropriate policies as locations are denied by default. Location privileges support recursive or non-recursive policies. For example, if you have a recursive policy with the location /tmp/allow then /tmp/allow/nested is valid.
Additionally, policies can contain wildcards, such as /tmp/*/my_table.
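The difference between recursive and non-recursive location policies can be sketched in Python as follows. This is an illustration under stated assumptions, not the matching logic Ranger actually uses: the function name is hypothetical, `fnmatch` stands in for Ranger's wildcard handling, and its `*` also matches across `/` separators, which Ranger may treat differently.

```python
from fnmatch import fnmatchcase


def location_allowed(location: str, policy_path: str, recursive: bool) -> bool:
    """Check a storage location against a single allow policy (sketch)."""
    location = location.rstrip("/")
    policy_path = policy_path.rstrip("/")
    # The policy path itself, including any wildcards, always matches.
    if fnmatchcase(location, policy_path):
        return True
    # A recursive policy additionally covers everything nested below it.
    if recursive:
        return fnmatchcase(location, policy_path + "/*")
    return False
```

With a recursive policy on /tmp/allow, the nested location /tmp/allow/nested is accepted, while the same policy without recursion rejects it. Any location that matches no policy stays denied, in line with the default described above.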
Features and use cases#
The following features and use cases are applicable with all Ranger usage.
Controlling access to User Defined Functions with Ranger#
You can use the Ranger system access control to enforce User Defined Function (UDF) policies. A UDF in SEP is deployed as a plugin (Functions) and stored in the SEP global namespace. This global namespace is managed at the system access control level.
This is independent of the global and Hive access control with Ranger and the Privacera Platform.
The Ranger resource hierarchy for all UDF policies requires an associated database (or schema) namespace when creating the policy. Because the global namespace is independent of any connector namespace, this poses a slight challenge to control access to UDFs using Ranger. To overcome this you must specify $sep as the database name in Ranger. This keeps all SEP functions under the $sep database in the Ranger resource hierarchy.
To configure Ranger system access control for UDFs, you need to add the following to a system access control properties file, for example named etc/access-control-ranger-udf.properties:
access-control.name=ranger-system-access-control
ranger.policy-rest-url=https://ranger-host:6182
ranger.service-name=hive
ranger.authentication-type=KERBEROS
ranger.kerberos-principal=sep-server/sep-server-node@EXAMPLE.COM
ranger.kerberos-keytab=/etc/sep/conf/sep-server.keytab
ranger.plugin-policy-ssl-config-file=/etc/hive/conf/ranger-policymgr-ssl.xml
This additional configuration is needed because the Ranger system access control uses a Ranger client that is independent of the Hive access control. Only one Ranger system access control can be defined, while Hive access control can be configured separately for each Hive catalog. In the scenario where there are multiple Hive catalogs and multiple Ranger services, only one of those Ranger services can be used to manage the UDF policies.
Note
All Ranger configuration properties supported for global access control with Ranger are supported for Hive access control. Ranger properties related to row filtering or column masking are unsupported in global access control.
Audit#
When Ranger audit is implemented, whenever access is granted or denied through Ranger, an audit event is logged if auditing is enabled in a given resource policy.
Ranger audit is configured in the Ranger-specific file /etc/hive/conf/ranger-hive-audit.xml. Configuring Ranger audit is complex, and outside the scope of Starburst documentation; please refer to your Ranger documentation to learn how to set up audit optimally for your environment.
For Audit to work with SEP, the location of the file must be specified in your catalog properties file:
ranger.config-resources=/etc/hive/conf/ranger-hive-audit.xml
Caveat regarding performance
Ranger audits are performed by accessing the internal table system.runtime.queries. Any access to the table is logged.
The Trino web UI makes heavy use of the queries table. The property ranger.audit.system-runtime-queries.enabled is set to true by default and controls this logging behavior. Using the web interface causes a flood of audit events. Setting the property to false disables this audit logging.
Caching#
Caching is used to improve performance and reduce the number of requests to the Ranger service. Caching is enabled through configuration properties, which can be found in the Ranger installation and configuration page.
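The general idea of such a cache can be sketched as follows: policies are fetched once and reused until a refresh interval has elapsed, so changes in Ranger become visible only after that delay. This is a minimal sketch and not SEP's actual caching code. The class name, the `load_policies` callable, and the injectable clock are all hypothetical.

```python
import time


class RefreshingPolicyCache:
    """Keep a copy of the policies and reload it after a fixed interval."""

    def __init__(self, load_policies, refresh_interval_seconds, clock=time.monotonic):
        self._load_policies = load_policies  # hypothetical callable that contacts Ranger
        self._interval = refresh_interval_seconds
        self._clock = clock
        self._policies = None
        self._loaded_at = None

    def get(self):
        now = self._clock()
        expired = self._loaded_at is None or now - self._loaded_at >= self._interval
        if expired:
            # Only this branch causes a request to the policy source.
            self._policies = self._load_policies()
            self._loaded_at = now
        return self._policies
```

A longer interval reduces requests to the Ranger service, at the cost of policy changes taking longer to become visible.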
Configuration properties#
The properties listed in this table apply to Ranger-related configurations in system access control properties files as well as catalog files using the Hive connector for Hive access control with Apache Ranger or the Privacera Platform.
Note
Some properties, such as ranger.row-filtering.enabled, are unsupported when Ranger is configured for global access control.
Property name | Description |
---|---|
ranger.policy-rest-url | URL address of the Ranger REST service, required to use HTTPS with Kerberos |
ranger.service-name | SEP Ranger plugin service name. |
ranger.authentication-type | Authentication type for SEP connecting to Ranger, |
ranger.username | SEP Ranger plugin user name. This property is used when |
ranger.password | SEP Ranger plugin user password. This property is used when |
ranger.kerberos-principal | Ranger service Kerberos principal. |
ranger.kerberos-keytab | Path to the Ranger service Kerberos keytab file. |
ranger.plugin-policy-ssl-config-file | Path to Ranger plugin SSL configuration. |
| Path to the Ranger cache directory for policies. This allows loading policies from cache on startup, even if Ranger Policy Admin is not available at that moment. |
| Interval determining how often authorization policies are refreshed. The highest latency after which changes in Ranger authorization policies are visible in SEP. Default is |
| Ranger service connection timeout. Default is |
| Ranger service read timeout. Default is |
ranger.user-group-source | Source of user/group information, |
| Period for how long group mapping information is cached in SEP. |
| Period for how long group mapping information is refreshed in SEP. Any value greater than |
ranger.row-filtering.enabled | To enable row filtering, set this flag to |
| To enable resource wildcard matching for row filtering, set this flag to |
| To enable resource wildcard matching for column masking, set this flag to |
ranger.config-resources | Additional XML configuration files which are read before applying your SEP Ranger configuration. Useful for reusing existing Hive-level Ranger configuration with things like Ranger Audit configuration. |
| Enable Ranger policy management with SQL as supported for Hive access control only. Default is |
| Skip authorization check when setting catalog session properties. This property can only be used in Hive and Delta Lake catalog access control, not global access control. Defaults to |
| Path to an auth-to-local translations file to configure username translation for Ranger. |
| To enable URL policies on external table location, set this flag to |
Ensure Ranger works with TLS#
If your organization implements TLS for network traffic between SEP and Ranger, you must ensure that both are correctly configured. You must add the SEP certificate to a JKS keystore and configure it in the Ranger SSL configuration file:
All catalogs accessing Ranger must define the ranger.plugin-policy-ssl-config-file property and point to the XML configuration file:
ranger.plugin-policy-ssl-config-file=/etc/starburst/ranger-policymgr-ssl.xml
If Ranger and SEP use globally trusted certificates, you can use the following Ranger SSL configuration file:
<configuration>
  <!-- The following properties are used for 2-way SSL client server validation -->
  <property>
    <name>xasecure.policymgr.clientssl.keystore</name>
    <value>/etc/starburst/sb-admin-keystore.jks</value><!-- the coordinator's certificate goes here -->
  </property>
  <property>
    <name>xasecure.policymgr.clientssl.keystore.credential.file</name>
    <value>jceks://file/etc/starburst/sb-admin-keystore.jceks</value><!-- password store for the coordinator's JKS file -->
  </property>
</configuration>
Without globally trusted certificates, you need to add Ranger’s certificate to a JKS truststore and link it in the XML file:
<configuration>
  <!-- The following properties are used for 2-way SSL client server validation -->
  <property>
    <name>xasecure.policymgr.clientssl.keystore</name>
    <!-- Keystore file with the coordinator certificate and private key. -->
    <value>/etc/hive/conf/ranger-plugin-keystore.jks</value>
  </property>
  <property>
    <name>xasecure.policymgr.clientssl.truststore</name>
    <!-- Truststore file with the Ranger certificate. -->
    <value>/etc/hive/conf/ranger-plugin-truststore.jks</value>
  </property>
  <property>
    <name>xasecure.policymgr.clientssl.keystore.credential.file</name>
    <!-- This file holds the password for the Starburst keystore -->
    <value>jceks://file/etc/ranger/hivedev/cred.jceks</value>
  </property>
  <property>
    <name>xasecure.policymgr.clientssl.truststore.credential.file</name>
    <value>jceks://file/etc/ranger/hivedev/cred.jceks</value>
  </property>
</configuration>
More information about working with certificates is available in JKS files and PEM files. Avoid renaming JCEKS files after generating them, since that invalidates them.
When using the SEP Helm charts, you have to configure the XML file and the access control file inline in the YAML files.
When using the Starburst Ranger plugin, you also need to configure TLS for the connection between Ranger and SEP.
Ensuring Ranger works with your authentication service#
You need to configure SEP to work with the authentication service used by Ranger. Ranger needs to have information about your users, groups, and roles in your authentication system. There are two ways of getting that information into Ranger:
SEP user sync - SEP pushes authenticated user data to Ranger directly. This happens whenever SEP needs to check user permissions with Ranger, but is cached per user so it does not happen too often. This is simpler to set up, as it only requires setting the ranger.user-group-source configuration property to STARBURST. In addition to authentication, you need to have a user group provider configured in SEP using group-provider.properties, for example the LDAP group provider, to get a list of groups for each logged-in user. It only works if user groups in Ranger policies have the same names that SEP user groups have. If it is not configured when a user is created, they are not assigned to any groups in Ranger.
Ranger user sync - a separate ETL process from the authentication service to Ranger. Requires separate setup of the user sync process.
While SEP does offer Kerberos support, SEP encourages the use of LDAP. The following configuration property is provided:
LDAP#
If your organization uses an LDAP system for user and group information, Ranger can use that information to define role-based access to catalogs using any connector, as well as a number of other system resources. Policies in Ranger define access and authorization, and are created with the Ranger user interface. Users, groups, and roles are sourced from your connected LDAP directory and are used to target users for a Ranger policy. Each policy combines user and group information with a resource and access rights to the resource.
With the K8s and AWS installation methods, all details are already configured. For existing Ranger usage or manual installation, you must ensure that Ranger has data from your LDAP directory provider, and that a synchronization process (either SEP or Ranger user sync) is in place.
The process of connecting your existing Ranger installation depends on your particular LDAP implementation as well as your Ranger configuration. Learn more about that in the LDAP Authentication page.
SEP user synchronization#
SEP can sync user names and user groups to Ranger when using LDAP, LDAPS or OAuth2 for authentication. This requires setting ranger.user-group-source in Configuration properties to STARBURST.
Note
When using sync mode with ranger.user-group-source, you do not need to configure Ranger user synchronization.
This mode requires a configured LDAP group provider or File group provider for groups information in SEP using group-provider.properties. No other methods of resolving group membership are supported. It only works if user groups in Ranger policies have the same names that SEP user groups have. If it is not configured when a user is created, they are not assigned to any groups in Ranger.
Alternatively, Kubernetes users can use the LDAP and Ranger user sync support.
Kerberos#
SEP can use Kerberos authentication, and the Ranger integration also supports Kerberos.
Warning
Most organizations that use Kerberos also use LDAP. We strongly encourage you to use LDAP instead of Kerberos, due to the relative unreliability of Kerberos servers, their lack of clear error messaging, and their rigid OS and JVM dependencies.
A sys admin Ranger user (user with role ROLE_SYS_ADMIN) must exist that matches the SEP Kerberos principal ranger.kerberos-principal when Kerberos auth is used, or the SEP Ranger plugin username ranger.username and password ranger.password if BASIC auth is used.
The SEP Kerberos principal is translated to a Ranger user name via auth-to-local Hadoop rules from core-site.xml.
Note
Ranger version 2.1.0 removes the possibility to connect to Kerberized Ranger using basic user and password authentication. You have to add the following configuration to your Ranger core-site.xml file to restore this possibility by allowing unauthenticated access:
<property>
<name>ranger.admin.allow.unauthenticated.access</name>
<value>true</value>
</property>
Alternatively, you can configure SEP to authenticate to Ranger using Kerberos.
Starburst Ranger CLI#
You can use the Starburst Ranger CLI to manage integration of SEP with Apache Ranger or the Privacera Platform for the following tasks:
Service definition setup for initial configuration
Service definition setup for plugin upgrade
The command line application is an executable Java archive that requires Java 17 or higher to be available on the system path. You can download it from Starburst and install it with the following steps on Linux or macOS.
Ensure the computer is able to reach the Ranger server via HTTP, since the CLI interacts with the REST API. This can be the coordinator or a worker in the cluster, or any other computer.
Verify Java with java -version
Move the binary to a directory in your path, such as ~/bin, and rename it.
mv starburst-ranger-cli-*-executable.jar ~/bin/starburst-ranger-cli
Verify the folder is on the path.
echo $PATH
If necessary, add the folder.
export PATH=~/bin:$PATH
Now you can run the help command to verify the CLI works.
starburst-ranger-cli help
The resulting output is similar to the following:
Starburst Ranger command line interface USAGE: starburst-ranger-cli [--properties=<configFile>] [-p=<String=String>]... [COMMAND] ...
Help commands#
The help command can provide details about the other commands and their specific options, if you append help to the desired command, with a few examples shown in the following block:
starburst-ranger-cli help
starburst-ranger-cli user help
starburst-ranger-cli service-definition help
starburst-ranger-cli group create help
starburst-ranger-cli user create help
Windows installation is supported as well and requires similar commands. You can also run the application directly with Java on Linux, macOS or Windows.
java -jar starburst-ranger-cli-*-executable.jar
You have to supply the connection details from SEP to Ranger in a properties file. Typically you can simply use the Ranger access control properties file by copying it to the computer running the CLI. Alternatively you can use individual properties as command line options.
Use the --properties option to specify the full path to a .properties file that contains one key=value pair on each line.
Use the -p option for each property separately with the format -p=key=value.
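If you prefer individual options over a file, the conversion from key=value lines to -p=key=value options can be sketched in Python. The helper name is hypothetical, and the parsing covers only simple key=value lines plus blank lines and comment lines, not the complete Java properties syntax.

```python
def properties_to_cli_options(properties_text: str) -> list[str]:
    """Turn simple key=value lines into -p=key=value command line options."""
    options = []
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        key, separator, value = line.partition("=")
        if not separator:
            continue  # not a key=value line
        options.append(f"-p={key.strip()}={value.strip()}")
    return options
```

The resulting list can be appended to a starburst-ranger-cli invocation in place of the --properties option.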
In the following examples these properties are usually omitted, but they are necessary to find the Ranger endpoint.
Ranger user group management#
You can manage user groups in Ranger with the CLI. Properties are used to provide the details for Ranger access.
The following operations are available:
create a group
get a list of all groups
get a list of all groups a certain user belongs to
delete a group
The CLI uses access control properties and positional parameters to pass group names using the following syntax:
starburst-ranger-cli group get [username]
starburst-ranger-cli group create group1 [group2] ...
starburst-ranger-cli group delete group1 [group2] ...
The following complete example gets all groups in Ranger specified by the properties file and displays them:
starburst-ranger-cli group get --properties=ranger-access-control.properties
If a username is specified, only the groups of the user are displayed:
starburst-ranger-cli group get --properties=ranger-access-control.properties myusername
You can create one or multiple groups, and the identifier of each created group is displayed as confirmation:
starburst-ranger-cli group create group1 [group2] ...
Deleting groups is similar:
starburst-ranger-cli group delete group1 [group2] ...
Ranger user management#
You can manage users in Ranger with the CLI. Properties are used to provide the details for Ranger access.
The following operations are available:
create a user
get user details
delete a user
The CLI uses access control properties and a mixture of positional parameters and options to pass user information using the following syntax:
starburst-ranger-cli user get
starburst-ranger-cli user create
starburst-ranger-cli user delete
A full example to get a user can look like this:
starburst-ranger-cli user get --properties=ranger-access-control.properties username
Creating a user can be done in two ways:
a basic user created from a default template:
starburst-ranger-cli user create [--groups=group1,group2,...] -- user1 [user2] ...
using a JSON file, such as alice.json, with the following syntax:
{
  "name": "alice",
  "firstName": "Alice",
  "lastName": "Wonderland",
  "emailAddress": "alice@example.com",
  "password": "not@trivialP225w0rd",
  "description": "She went down the rabbit hole.",
  "groups": ["admin", "finance"],
  "roles": ["user", "account_owner"]
}
The file is passed with the -f or --from-file option:
starburst-ranger-cli user create -f=alice.json
If any group from groups doesn’t exist, it is automatically created.
It’s also possible to create multiple users using a file with a list of user definitions:
{
  "users": [{
    "name": "alice",
    "firstName": "Alice",
    "lastName": "Wonderland",
    "emailAddress": "alice@example.com",
    "password": "not@trivialP225w0rd",
    "description": "She went down the rabbit hole.",
    "groups": ["admin", "finance"],
    "roles": ["user", "account_owner"]
  }, {
    "name": "bob",
    "firstName": "Bob's firstName",
    "lastName": "Bob's lastName",
    "emailAddress": "bob@bobiverse.com",
    "password": "ForW3AreM@ny"
  }]
}
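When many users need to be created, a file in this format can be generated rather than written by hand. The following Python sketch builds the document from a list of dictionaries. The function name is hypothetical, the field names are taken from the examples above, and the sketch does not validate which fields the CLI requires.

```python
import json


def build_users_document(users: list[dict]) -> str:
    """Serialize user definitions into the multi-user layout shown above."""
    return json.dumps({"users": users}, indent=2)


example_users = [
    {
        "name": "alice",
        "firstName": "Alice",
        "lastName": "Wonderland",
        "emailAddress": "alice@example.com",
        "password": "not@trivialP225w0rd",
        "groups": ["admin", "finance"],
        "roles": ["user", "account_owner"],
    },
    {
        "name": "bob",
        "emailAddress": "bob@bobiverse.com",
        "password": "ForW3AreM@ny",
    },
]

users_json = build_users_document(example_users)
```

The resulting string can be written to a file and kept out of version control, since it contains passwords.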
Service definition management#
You can find information about creating and overriding the service definition in the sections about installing and upgrading the SEP Ranger plugin.
Ranger REST API#
Apache Ranger includes a REST API that can be used for automating and troubleshooting your configuration and setup. Use it with caution and reference the API documentation as needed.
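As a hedged starting point for automation, the following Python sketch prepares a request that lists the policies of one Ranger service using basic authentication. The endpoint path is an assumption based on Ranger's public v2 API and should be checked against the API documentation for your Ranger version. The function name, host, service name, and credentials are placeholders.

```python
import base64
import urllib.parse
import urllib.request


def build_policy_list_request(base_url, service_name, username, password):
    """Prepare, but do not send, a GET request for the policies of one service."""
    # Assumed endpoint: /service/public/v2/api/service/{servicename}/policy
    path = (
        "/service/public/v2/api/service/"
        + urllib.parse.quote(service_name, safe="")
        + "/policy"
    )
    request = urllib.request.Request(base_url.rstrip("/") + path, method="GET")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", "Basic " + token)
    request.add_header("Accept", "application/json")
    return request


# Placeholder values; sending the request requires a reachable Ranger server:
# with urllib.request.urlopen(build_policy_list_request(
#         "https://ranger-host:6182", "hive", "name", "pass")) as response:
#     print(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the URL and headers before anything reaches the Ranger server.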