AWS Lake Formation access control support#

Starburst Enterprise platform (SEP) provides support for using an existing AWS Lake Formation access control system.

Requirements#

To use AWS Lake Formation integration with Starburst Enterprise, you need:

  • An existing AWS Lake Formation configuration and AWS credentials that allow interacting with its API.

  • A valid Starburst Enterprise license.

Overview#

AWS Lake Formation provides a single place to manage access control policies. You can define security policies that restrict access to data at the database, table, column, row, and cell levels.

AWS Lake Formation access control support is only available for catalogs that use the Hive connector because it utilizes HDFS file system support.

AWS Lake Formation can be enabled alongside built-in access control as long as the two security systems are securing mutually exclusive entities. Using both AWS Lake Formation and built-in access control for authorization on the same catalog is not supported.

Configure AWS Lake Formation#

Each catalog that needs to be controlled with AWS Lake Formation must have the catalog properties file configured to use the lake-formation Hive security:

hive.security=lake-formation

The following is a more complex example of a catalog properties file that is configured to use AWS Lake Formation for authorization with the Hive connector.

connector.name=hive
hive.security=lake-formation
hive.metastore=glue
hive.metastore.glue.region=us-east-2
hive.metastore.glue.default-warehouse-dir=s3://data-lake-bucket
hive.metastore.glue.iam-role=arn:aws:iam::<account_id>:role/<admin-role>
lake-formation.authorized-caller-tag=starburst-enterprise
lake-formation.security-mapping.config-file=etc/lakeformation-security-mapping.json

More information on Lake Formation security mapping can be found later in this topic.

Lake Formation data lake locations#

Depending on how you intend to use AWS Lake Formation, you may need to register one or more S3 locations:

  • To run INSERT statements on a table, its S3 location must be registered in Lake Formation.

  • When credential vending is enabled, all S3 locations used must be registered in Lake Formation.

You can then use AWS Lake Formation permissions for access control to objects that point to your data lake location, and to the underlying data in the location.

Warning

Do not use the AWSServiceRoleForLakeFormationDataAccess service-linked role for registering data locations.

The trust relationships of the IAM role registered for a location in AWS Lake Formation must allow the Lake Formation AWS service to assume the role. The following is an example trust relationship statement for the data access role.

{
  "Effect": "Allow",
  "Principal": {
    "Service": [
      "lakeformation.amazonaws.com",
      "glue.amazonaws.com"
    ]
  },
  "Action": "sts:AssumeRole"
}

The IAM role registered for a location in AWS Lake Formation should have at least the following S3 API actions granted for all S3 paths that it is registered for: s3:ListBucket, s3:GetObject, s3:PutObject, s3:DeleteObject.
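As an illustration of the permissions listed above, the following Python sketch builds an IAM permissions policy document for one registered location. The function name data_access_policy and its bucket and prefix parameters are hypothetical and not part of SEP or AWS. The sketch relies on the fact that s3:ListBucket is granted on the bucket ARN, while the object-level actions are granted on object ARNs. Treat it as a starting point under those assumptions, not as a complete policy for your environment.

```python
import json


def data_access_policy(bucket: str, prefix: str = "") -> dict:
    """Build an IAM policy document for a Lake Formation data access role.

    `bucket` and `prefix` are hypothetical inputs that describe one
    registered S3 location; adapt them to your own data lake layout.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    cleaned = prefix.strip("/")
    # Object-level actions apply to object ARNs (bucket/key pattern).
    object_arn = f"{bucket_arn}/{cleaned}/*" if cleaned else f"{bucket_arn}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # s3:ListBucket is a bucket-level action.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [object_arn],
            },
        ],
    }


if __name__ == "__main__":
    # Print a policy for the bucket used in the catalog example.
    print(json.dumps(data_access_policy("data-lake-bucket"), indent=2))
```

The resulting JSON can be attached to the data access role as an inline or managed policy. Narrow the object ARN pattern if only part of the bucket is registered.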

For more information on registering data locations in AWS Lake Formation, refer to the AWS documentation on registering a data lake location.

Configuration properties#

AWS Lake Formation configuration properties#

Property

Description

lake-formation.authorized-caller-tag

The value of LakeFormationAuthorizedCaller registered for SEP in third-party query engine integration.

lake-formation.max-connections

The maximum number of concurrent connections to the AWS Lake Formation client. Defaults to 30.

lake-formation.max-error-retries

Maximum number of error retries for the AWS Lake Formation client. Defaults to 10.

lake-formation.region

AWS region of the data catalog that AWS Lake Formation is securing. Must be configured when not running in EC2, or when the catalog is in a different region.

lake-formation.pin-client-to-current-region

Set AWS Lake Formation API requests to the same region as the EC2 instance where SEP is running. Defaults to false.

lake-formation.endpoint-url

(Optional) URL of an AWS Lake Formation API endpoint, such as https://lakeformation.us-east-1.amazonaws.com.

In order for SEP to integrate with AWS Lake Formation, you must add a session tag for SEP to AWS Lake Formation external data filtering. The session tag value added to AWS Lake Formation is set in the lake-formation.authorized-caller-tag property. Permission checking for SEP fails for AWS Lake Formation if this property is not set. Read the AWS documentation for details on how to complete this configuration.

To use Lake Formation access control when accessing resources shared between different AWS accounts, you need the following prerequisites:

  • For AWS accounts that are sharing resources:

    • Lake Formation external data filtering must be enabled.

    • One of the allowed session tag values must match what is configured in SEP.

    • IDs of the accounts that are accessing resources must be listed in Lake Formation external data filtering settings, under AWS account IDs.

  • For AWS accounts that are accessing shared resources:

    • Lake Formation external data filtering must be enabled.

    • One of the allowed session tag values must match what is configured in SEP.

Access to S3 in the source accounts must be configured separately using S3 security mapping.

Write operations on Lake Formation resource links are not supported.

Lake Formation credential vending integration#

SEP supports AWS credential vending with Lake Formation. SEP calls Lake Formation credential vending API operations to generate temporary credentials to determine read access to the table. S3 locations that are registered with Lake Formation can only be accessed using the role specified at the time of registering the location.

To enable AWS credential vending, add the lake-formation.credential-vending.enabled=true catalog configuration property to your hive.properties configuration file.

Note

Enabling credential vending makes the catalog read-only.

AWS Lake Formation works only with the native filesystem, so the fs.native-s3.enabled property must be set to true. When a Hive catalog uses credential vending, Hive S3 configuration properties are no longer valid.

The following properties must be used instead:

Credential vending S3 configuration properties#

Property

Description

s3.region

Optional property to force the S3 client to connect to the specified region only.

s3.endpoint

The S3 storage endpoint server. This can be used to connect to an S3-compatible storage system instead of AWS. When using v4 signatures, it is recommended to set this to the AWS region-specific endpoint, such as http[s]://s3.<AWS-region>.amazonaws.com.

s3.path-style-access

Use path-style access for all requests to the S3-compatible storage. This is for S3-compatible storage that doesn’t support virtual-hosted-style access. Defaults to false.

s3.max-connections

Maximum number of simultaneous open connections to S3.

s3.http-proxy-secure

Proxy protocol. HTTPS.

s3.http-proxy

Proxy protocol. HTTP.

s3.streaming.part-size

The part size for S3 streaming upload. Defaults to 16MB.

s3.requester.pays

Enables Requester Pays.

s3.sse.type

The type of key management for S3 server-side encryption. Use S3 for S3-managed keys or KMS for KMS-managed keys. Defaults to S3.

s3.sse.kms-key-id

If set, the value of this property is used as the AWS KMS key ID for server-side encryption of newly created objects.
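The rules in this section lend themselves to a quick check before deploying a catalog file. The following Python sketch is one possible way to do that. The helper names parse_properties and check_credential_vending are hypothetical, and the sketch assumes that "Hive S3 configuration properties" means keys that start with hive.s3., which you should verify against your SEP version.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_credential_vending(props: dict[str, str]) -> list[str]:
    """Return a list of problems for a catalog that uses credential vending."""
    problems = []
    enabled = props.get("lake-formation.credential-vending.enabled", "false")
    if enabled.lower() != "true":
        return problems  # Credential vending is off, nothing to check.
    if props.get("fs.native-s3.enabled", "false").lower() != "true":
        problems.append("fs.native-s3.enabled must be set to true")
    # Assumption: legacy Hive S3 properties share the hive.s3. prefix.
    legacy = sorted(key for key in props if key.startswith("hive.s3."))
    if legacy:
        problems.append("replace with s3.* properties: " + ", ".join(legacy))
    return problems


if __name__ == "__main__":
    sample = """
    connector.name=hive
    hive.security=lake-formation
    lake-formation.credential-vending.enabled=true
    hive.s3.endpoint=https://s3.us-east-2.amazonaws.com
    """
    for problem in check_credential_vending(parse_properties(sample)):
        print(problem)
```

The check only reports problems; it does not rewrite the file. An empty list means that none of the rules covered by the sketch are violated.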

Credential vending configuration properties#

The following additional catalog configuration properties are available for credential vending:

Property

Description

lake-formation.credential-vending.validity

Specifies the length of time in which the generated temporary credentials are valid. Duration can be set between 15m and 6h. For example, use 15m to keep the temporary credentials valid for 15 minutes.

lake-formation.credential-vending.stale-time

Specifies the length of time until temporary credentials are marked stale and new credentials are fetched. There is no minimum value, but it should be less than the value set for lake-formation.credential-vending.validity.
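The relationship between the two durations can be checked with a few lines of Python. This is a minimal sketch: the functions parse_duration and check_vending_durations are hypothetical, and the parser only accepts the s, m, h, and d unit suffixes, which is an assumption that covers the examples in this topic rather than the full duration syntax accepted by SEP.

```python
import re
from datetime import timedelta

# Assumption: only these unit suffixes are handled by this sketch.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text: str) -> timedelta:
    """Convert a value such as '15m' or '6h' into a timedelta."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([smhd])\s*", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: float(value)})


def check_vending_durations(validity: str, stale_time: str) -> list[str]:
    """Check the documented constraints on validity and stale-time."""
    problems = []
    valid_for = parse_duration(validity)
    stale_after = parse_duration(stale_time)
    if not timedelta(minutes=15) <= valid_for <= timedelta(hours=6):
        problems.append("validity must be between 15m and 6h")
    if stale_after >= valid_for:
        problems.append("stale-time should be less than validity")
    return problems


if __name__ == "__main__":
    print(check_vending_durations("1h", "45m"))
    print(check_vending_durations("10m", "20m"))
```

Choosing a stale time comfortably below the validity leaves room for new credentials to be fetched before the old ones expire.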

Lake Formation security mapping#

SEP supports flexible security mapping for Lake Formation, which associates SEP users or groups with AWS security entities such as IAM roles according to a JSON mapping file. The IAM role for a specific query can be selected from a list of allowed roles using the SHOW ROLE GRANTS FROM <catalog> and SET ROLE <user-role> IN <catalog> SQL statements.

Each security mapping entry may specify one or more match criteria. If multiple criteria are specified, all criteria must match. The following match criteria are available:

  • "user": Regular expression to match against the username. For example: alice|bob to match either of the SEP users “alice” or “bob”.

  • "group": Regular expression to match against any of the groups that the user belongs to. For example: finance|sales to match either the finance or sales groups in SEP.

Each SEP match criteria can be mapped to one or more of the following AWS security entities:

  • "iamRole": IAM role to use if no user provided role is specified. This overrides any globally configured IAM role.

  • "roleSessionName": (Optional) Only valid when iamRole is specified. If roleSessionName includes the string ${USER}, then the ${USER} portion of the string will be replaced with the current session’s username. If roleSessionName is not specified, it defaults to trino-session.

  • "allowedIamRoles": List of IAM roles that the matched users are limited to, given as a JSON array.

The security mapping entries are processed in the order listed in the JSON mapping, so specify more specific mapping entries before less specific ones. For example, the mapping list might have a "group": entry for “salesnorth” followed by an entry for “sales”. This applies a more specific Lake Formation security mapping to the north sales team before the broader security mapping for the whole sales department is applied.

You can set a default mapping by adding an entry to the end of the file that does not specify an SEP match criteria. If no mapping entry matches and no default is configured, access is denied with a “Cannot set role NONE” error.

The JSON mapping can be retrieved from either a file or a REST endpoint, as specified by the lake-formation.security-mapping.config-file configuration property.

The following example JSON mapping applies SEP user and group mappings to security entities in AWS Lake Formation:

{
  "mappings": [
    {
      "user": "bob|charlie",
      "iamRole": "arn:aws:iam::123456789101:role/test_default",
      "allowedIamRoles": [
        "arn:aws:iam::123456789101:role/test_default",
        "arn:aws:iam::123456789101:role/test1",
        "arn:aws:iam::123456789101:role/test2",
        "arn:aws:iam::123456789101:role/test3"
      ]
    },
    {
      "group": "salesnorth",
      "iamRole": "arn:aws:iam::123456789101:role/sales_north_users"
    },
    {
      "group": "sales*",
      "iamRole": "arn:aws:iam::123456789101:role/sales_all_users"
    },
    {
      "iamRole": "arn:aws:iam::123456789101:role/default"
    }
  ]
}
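The first-match processing described in this section can be sketched in Python as follows. This is an illustration only, not SEP's implementation. The resolve function and the EXAMPLE_MAPPINGS data are hypothetical, the sketch uses Python's re module rather than the regular expression engine used by SEP, and it assumes that a pattern must match the whole username or group name.

```python
import re

# Hypothetical mapping list, ordered from most to least specific.
EXAMPLE_MAPPINGS = [
    {"user": "bob|charlie",
     "iamRole": "arn:aws:iam::123456789101:role/test_default"},
    {"group": "salesnorth",
     "iamRole": "arn:aws:iam::123456789101:role/sales_north_users"},
    {"group": "sales.*",
     "iamRole": "arn:aws:iam::123456789101:role/sales_all_users"},
    # No match criteria: this entry acts as the default mapping.
    {"iamRole": "arn:aws:iam::123456789101:role/default"},
]


def resolve(mappings, user, groups):
    """Return the first entry whose criteria all match, or None."""
    for entry in mappings:
        user_pattern = entry.get("user")
        group_pattern = entry.get("group")
        # Every criterion that is present must match.
        if user_pattern is not None and not re.fullmatch(user_pattern, user):
            continue
        if group_pattern is not None and not any(
            re.fullmatch(group_pattern, group) for group in groups
        ):
            continue
        return entry
    return None  # No entry matched and no default entry exists.


if __name__ == "__main__":
    for user, groups in [("bob", []), ("alice", ["salesnorth"]),
                         ("alice", ["saleswest"]), ("dave", ["hr"])]:
        print(user, groups, resolve(EXAMPLE_MAPPINGS, user, groups)["iamRole"])
```

Because the salesnorth entry comes before the broader sales.* entry, a member of salesnorth resolves to the more specific role even though both patterns match. Reversing the two entries would make the salesnorth entry unreachable.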

Security mapping configuration properties#

Property

Description

lake-formation.security-mapping.config-file

Path and filename of the JSON mapping file, or REST-endpoint URI containing security mappings.

lake-formation.security-mapping.refresh-period

How often to refresh the security mapping configuration. For example, use 5m to direct SEP to refresh security mappings every 5 minutes against the JSON mapping.

The following example shows the Lake Formation security mapping configuration properties:

lake-formation.authorized-caller-tag=starburst-enterprise
lake-formation.security-mapping.config-file=etc/example-lake-formation-security-mapping.json
lake-formation.security-mapping.refresh-period=5m

Security mapping role requirements#

AWS Lake Formation permissions are read from AWS using two different sets of assumed role credentials when executing queries against catalogs protected by AWS Lake Formation security policies:

  • admin - Identified by hive.metastore.glue.iam-role configuration property.

  • user - Selected according to Security Mapping rules.

You must configure the following in AWS IAM:

  • admin role must be configured to assume user role in AWS trust relationships. Read the AWS trust relationship documentation for more information.

  • sts:TagSession and sts:AssumeRole actions must both be allowed.

  • For read access:

    • glue:GetDatabases, glue:GetDatabase, glue:GetTables, glue:GetTable, glue:GetPartitions, glue:GetPartition, glue:BatchGetPartition AWS Glue API actions must be granted to the user role.

  • For write access:

    • glue:GetDatabases, glue:GetDatabase, glue:GetTables, glue:GetTable, glue:GetPartitions, glue:GetPartition, glue:BatchGetPartition, glue:CreateTable, glue:DeleteTable, glue:UpdateTable, glue:BatchCreatePartition, glue:UpdatePartition, glue:DeletePartition, lakeformation:GetDataAccess AWS Glue and Lake Formation API actions must be granted to the user role.
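The action lists above can be turned into IAM policy documents for the user role. The following Python sketch shows one way to do so. The user_role_policy function is hypothetical, and the use of "*" as the resource is an assumption made to keep the sketch short; restrict it to specific Glue catalog, database, and table ARNs where possible.

```python
import json

# Actions required for read access, as listed in this section.
READ_ACTIONS = [
    "glue:GetDatabases",
    "glue:GetDatabase",
    "glue:GetTables",
    "glue:GetTable",
    "glue:GetPartitions",
    "glue:GetPartition",
    "glue:BatchGetPartition",
]

# Additional actions required for write access.
WRITE_ONLY_ACTIONS = [
    "glue:CreateTable",
    "glue:DeleteTable",
    "glue:UpdateTable",
    "glue:BatchCreatePartition",
    "glue:UpdatePartition",
    "glue:DeletePartition",
    "lakeformation:GetDataAccess",
]


def user_role_policy(write: bool = False) -> dict:
    """Build a policy document for a user role with read or write access."""
    actions = READ_ACTIONS + (WRITE_ONLY_ACTIONS if write else [])
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": actions,
                # Assumption: a wildcard resource, for brevity only.
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(user_role_policy(write=True), indent=2))
```

Keeping the read and write-only actions in separate lists makes it clear that write access is a strict superset of read access.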

Listing and selecting available user roles#

The following SQL statements are available to list and set roles:

  • SHOW ROLE GRANTS FROM <catalog_name> - Lists AWS roles available to the user in a catalog protected by Lake Formation.

  • SHOW CURRENT ROLES IN <catalog_name> - Returns currently enabled roles.

  • SET ROLE "arn:aws:iam::..." IN <catalog_name> - Selects a specific role.

  • SET ROLE NONE IN <catalog_name> - Causes the role to default to the role defined by iamRole in the security mapping configuration, if it is set.

Use the roles=<catalog_name>:arn:aws:iam::... connection property to select a specific role for a JDBC connection.

Caching#

To make permission checks run as fast as possible, Lake Formation access control caches permission data for each user. By default, permissions are stored for a maximum of 1000 users and expire after 10 minutes. In the following example, the permissions cache size is reduced to 100 users, and the permissions are set to expire after two hours:

lake-formation.cache-ttl=2h
lake-formation.cache-size=100

Cache configuration properties#

Property

Description

lake-formation.cache-ttl

Time duration for which to store Lake Formation permission data for each user.

lake-formation.cache-size

Maximum number of users for which to store Lake Formation permission data.

If needed, the cache can be manually cleared using a SQL procedure call in the catalog:

CALL system.flush_access_control_cache()

Views#

Views that were created before enabling AWS Lake Formation access control or views created in older SEP versions must be manually migrated before they can be queried using an AWS Lake Formation-secured catalog:

CREATE OR REPLACE VIEW view_name AS query

If a view is using DEFINER security mode, it must be dropped and created again in INVOKER security mode. Views with DEFINER security mode are not supported in AWS Lake Formation access control.

After migration, you can manage permissions for a view in AWS Lake Formation.

Limitations#

Creating views in DEFINER security mode is not supported in AWS Lake Formation.

If AWS Lake Formation controls access to a database and is configured to use data filters, another access control system should not be configured to control access to that same database. Overlapping policies may interact in unexpected ways.