The Schema discovery pane at the catalog level of the catalog explorer lets you examine the metadata of a specified object storage location. Schema discovery is available only for catalogs that connect to object storage data sources.
Use schema discovery to identify and register tables or views that are newly added to a known schema location. For example, a logging process might drop a new log file every hour, rolling over from the previous hour’s log file. The purpose of schema discovery is to find the newly added files to make sure Starburst Galaxy knows how to query them.
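To make the idea concrete, the following minimal sketch shows incremental detection in its simplest form: comparing the current listing of object keys with the keys seen on a previous run. It illustrates the concept only and is not how Starburst Galaxy implements discovery. The function name and the sample keys are hypothetical.

```python
def find_new_files(previous_keys, current_keys):
    """Return keys present in the current listing but not in the previous one, sorted."""
    return sorted(set(current_keys) - set(previous_keys))


# Hypothetical hourly log files under one table directory.
previous = ["logs/app/2024-01-01-00.json", "logs/app/2024-01-01-01.json"]
current = previous + ["logs/app/2024-01-01-02.json"]

new_files = find_new_files(previous, current)
print(new_files)
```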
Schema discovery requires the catalog’s metastore to have Allow creating external tables enabled.
Schema discovery is only available to catalogs that support write operations.
If you are running schema discovery for the first time, click Run schema discovery to analyze a root object in an object storage location and return the structure of any discovered tables. If you have previously performed schema discovery for the specified location, click Run discovery:
In the Catalog location URL field, enter the URL of the bucket and directory to scan. Schema discovery expects a directory laid out as schema/table/<files/partition>. It cannot run on a file. For example, s3://my-s3-bucket/my_csv_file.csv does not work.
A role in your current active role set must have the location privilege for the specified location. For this reason, Add location privilege is pre-selected to automatically grant the location privilege if not already present.
Enter the name of a schema in the Set default schema field. This is a backup schema name in which to place any discovered tables that are not already part of a schema.
Optionally, under Advanced settings, select the maximum sample file lines, and the maximum files per table.
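The schema/table/<files/partition> layout mentioned in the steps above can be illustrated with a small sketch that groups object keys, relative to the scanned root, by schema and table. This shows the directory convention only, not Starburst Galaxy's discovery logic. The function name and the sample keys are hypothetical.

```python
from collections import defaultdict


def group_by_schema_and_table(relative_keys):
    """Group keys shaped like schema/table/<files or partitions> by (schema, table)."""
    tables = defaultdict(list)
    for key in relative_keys:
        parts = key.strip("/").split("/")
        if len(parts) < 3:
            # Too shallow to sit under a schema/table directory; skip it.
            continue
        schema, table = parts[0], parts[1]
        tables[(schema, table)].append("/".join(parts[2:]))
    return dict(tables)


# Hypothetical keys relative to the scanned root directory.
keys = [
    "sales/orders/part-0001.parquet",
    "sales/orders/year=2024/part-0002.parquet",
    "web/clicks/2024-01-01-00.json",
    "my_csv_file.csv",  # a bare file at the root does not fit the layout
]

layout = group_by_schema_and_table(keys)
for (schema, table), files in sorted(layout.items()):
    print(schema, table, files)
```

In this sketch the bare file at the root is skipped, which mirrors the rule that the location to scan must be a directory rather than a single file.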
Results for an Incremental discovery from last run populate a list with the following information:
Select the tables you would like to register, then click Create selected tables to go to the log events pane.
Results for Full discovery populate a list with useful information for your discoveries:
The log events pane lets you view a list of log entries for each discovery-related event. The Summary dialog shows the number of successful query executions and the number of errors that occurred during the discovery run.
The list of log events includes the following information:
The query text, for example CREATE TABLE or CREATE SCHEMA. Click the text to view the full query.
The discovery results pane lists tables found from the source during discovery:
Click Create all tables to navigate to the log events pane and to see each table being created. You can view your discovered schema in the schemas pane.
Schema discovery identifies the Iceberg, Delta Lake, and Hive table formats supported by Starburst Galaxy’s Great Lakes connectivity. Schema discovery does not identify Hudi tables.
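For reference, the Iceberg and Delta Lake connectors in Trino expose a system.register_table procedure that takes schema_name, table_name, and table_location arguments. The sketch below builds such a CALL statement as a string. The catalog, schema, table, and location values are hypothetical, and the exact statement that schema discovery generates may differ.

```python
def build_register_table_call(catalog, schema, table, location):
    """Build a CALL statement for the register_table procedure (illustrative only)."""

    def quote(value):
        # Escape single quotes for a SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    return (
        f"CALL {catalog}.system.register_table("
        f"schema_name => {quote(schema)}, "
        f"table_name => {quote(table)}, "
        f"table_location => {quote(location)})"
    )


# Hypothetical names and location.
statement = build_register_table_call(
    "example", "sales", "orders", "s3://my-s3-bucket/sales/orders"
)
print(statement)
```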
For Iceberg and Delta Lake tables, schema discovery registers tables using the register_table procedure. For Hive tables, schema discovery registers tables using the table metadata.
Schema discovery identifies tables and views that are saved in the following file formats:
JSON
CSV
ORC
PARQUET
Schema discovery identifies tables and views that use the following compression codecs:
ZSTD
LZ4
SNAPPY
GZIP
DEFLATE
BZIP2
LZO
LZOP
Schema discovery locates certain file formats as described on the file formats page.