Starburst Alteryx connector#

The Alteryx connector lets Starburst Enterprise perform bulk operations with
Alteryx Designer. Use the connector to move large datasets from Alteryx
Designer to Starburst Enterprise by performing atomic bulk inserts into
target tables using generated pre-signed PUT URLs.
Requirements#
Data files stored in the Parquet, ORC, or Avro file format on a supported file system.
A valid Starburst Enterprise license.
Configuration#
To configure the Alteryx connector, create a catalog properties file that
specifies the Alteryx connector by setting the connector.name to
alteryx.
You must select and configure one of the supported file systems.
connector.name=alteryx
alteryx.upload-location-prefix=<s3://alteryx-bucket-path>
fs.native-s3.enabled=true
s3.aws-access-key=<ACCESS_KEY>
s3.aws-secret-key=<SECRET_KEY>
s3.region=<AWS_REGION>
The identity credentials used by SEP must have access to the location
specified in alteryx.upload-location-prefix.
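As a minimal sketch of such access for the native S3 file system, the following illustrative IAM policy grants object reads, writes, and deletes under a bucket named alteryx-bucket-path. The bucket name and the exact set of required actions are assumptions; adapt them to your deployment:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::alteryx-bucket-path/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::alteryx-bucket-path"
    }
  ]
}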
General configuration properties#
The following configuration property is supported:
| Property name | Description |
|---|---|
| alteryx.upload-location-prefix | The prefix location on the object store where the files uploaded with Alteryx Designer are temporarily stored. The connector creates a subdirectory named after a UUID under this prefix for each upload. |
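For illustration, with alteryx.upload-location-prefix=s3://alteryx-bucket-path, an uploaded object lands under a path of the following shape, where the UUID and user hash segments are generated for each request:
s3://alteryx-bucket-path/<uuid>/<user-sha-256-hash>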
Supported types#
The Alteryx connector supports the following Trino types:
BOOLEAN, TINYINT, SMALLINT, INTEGER, BIGINT, REAL, DOUBLE, VARCHAR, DATE, TIME, TIMESTAMP, TIMESTAMP(3), VARBINARY
File system access configuration#
The connector supports accessing object storage file systems, such as the native S3 file system shown in the configuration example.
Functions#
The connector supports the following functions.
object_upload_urls#
Generates pre-signed URLs for uploading objects to the object store using
HTTP PUT requests. The generated PUT object URLs are placed under the
directory prefix defined by the alteryx.upload-location-prefix configuration
property, followed by a UUID and the SHA-256 hash of the Starburst Enterprise user
making the request. The function returns the generated URLs as ARRAY(VARCHAR).
SELECT system.object_upload_urls(1, '5m');
The function takes the following parameters:
The number of URLs to generate. This must be equal to the number of chunks the file is split into.
The duration for which each pre-signed URL remains valid.
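For example, to generate three URLs, each valid for ten minutes, for a file split into three chunks (the chunk count and duration here are illustrative):
SELECT system.object_upload_urls(3, '10m');
The query returns an ARRAY(VARCHAR) containing three URLs, one per chunk.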
object_delete_urls#
Generates pre-signed URLs to delete the objects previously uploaded to the
object store. The function returns the generated URLs as ARRAY(VARCHAR).
SELECT system.object_delete_urls('s3://bucket/alteryx/uuid/username-sha-256-hash', '5m');
The function takes the following parameters:
The prefix location of the objects to delete.
The duration for which each pre-signed URL remains valid.
Table functions#
The connector supports the following table function.
LOAD#
Selects and loads content from the data files previously uploaded to the object store by Alteryx Designer.
SELECT
*
FROM
TABLE(
system.load(
location => 's3://bucket/alteryx/uuid/username-sha-256-hash',
format => 'PARQUET',
columns => DESCRIPTOR("regionkey" BIGINT, "name" VARCHAR(25))
)
);
The function takes the following required parameters:
location: The object store path where the uploaded files are stored.
format: File format of the uploaded data files. Supported values are
PARQUET, ORC, and AVRO.
columns: A descriptor of the columns to be read. The descriptor must be in the following format:
DESCRIPTOR(column_name1 type1, column_name2 type2, ...).
Upload generated file chunks#
For each URL string returned by object_upload_urls, you can execute an HTTP
PUT request with the corresponding file chunk as the data. Include the
following HTTP header in the PUT request: If-None-Match: *.
curl -v -X PUT -H "If-None-Match: *" -T /tmp/upload/iris.parquet {https://s3.amazonaws.com/domain-user-region/alteryx/SHA-256-hash123456}
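As a sketch, assuming object_upload_urls(2, '5m') returned two URLs and GNU split is used to divide the file into two chunks, each chunk is uploaded to its matching URL in order. The chunk names and URL placeholders are illustrative:
# split the file into two chunks: chunk_aa and chunk_ab
split -n 2 /tmp/upload/iris.parquet /tmp/upload/chunk_
# PUT each chunk to the URL at the same position in the returned array
curl -v -X PUT -H "If-None-Match: *" -T /tmp/upload/chunk_aa {https://first-generated-url}
curl -v -X PUT -H "If-None-Match: *" -T /tmp/upload/chunk_ab {https://second-generated-url}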
Example#
This example demonstrates how to perform bulk upload operations.
Generate pre-signed URLs to be used to PUT the objects on the object store
using the object_upload_urls function:
SELECT alteryx.system.object_upload_urls(1, '5m');
The function returns an array containing one pre-signed PUT URL. The URL is
placed under the directory prefix defined by the alteryx.upload-location-prefix
configuration property, followed by a UUID and the SHA-256 hash of the user
making the request.
{https://s3.amazonaws.com/domain-user-region/alteryx/SHA-256-hash123456}
Use the generated URL to upload the data file from Alteryx Designer to your object store:
curl -v -X PUT -H "If-None-Match: *" -T /tmp/upload/iris.parquet {https://s3.amazonaws.com/domain-user-region/alteryx/SHA-256-hash123456}
Select and load content from the data files previously uploaded to the object store by Alteryx Designer:
SELECT
*
FROM
TABLE(
system.load(
location => 's3://bucket/alteryx/uuid/username-sha-256-hash',
format => 'PARQUET',
columns => DESCRIPTOR("sepal.length" double, "sepal.width" double)
)
);
Output:
 sepal.length | sepal.width
--------------+-------------
          5.1 |         3.5
          4.9 |         3.0
          4.7 |         3.2
You can use the loaded data in SQL statements such as INSERT INTO ... SELECT FROM ... to write into a target table:
INSERT INTO alteryx.schema_name.target_table
SELECT
  *
FROM
  TABLE(
    alteryx.system.load(
      's3://bucket/alteryx/uuid/username-sha-256-hash',
      'PARQUET',
      DESCRIPTOR("sepal.length" double, "sepal.width" double)
    )
  );
After completing the DML operation, run the object_delete_urls function to
generate a pre-signed URL for deleting the objects previously uploaded to the
object store:
SELECT system.object_delete_urls('s3://bucket/alteryx/uuid/username-sha-256-hash', '5m');
Output:
{https://s3.amazonaws.com/domain-user-region/alteryx/delete-SHA-256-hash123456}
In Alteryx Designer, delete the temporarily uploaded content on the object store:
curl -v -X DELETE {https://s3.amazonaws.com/domain-user-region/alteryx/delete-SHA-256-hash123456}