The PyStarburst library implements the standard Python DataFrame API, which uses a data structure called a DataFrame to analyze and manipulate two-dimensional data. Use PyStarburst to query and transform data in Starburst Galaxy and Starburst Enterprise platform (SEP) clusters in a data pipeline using Python syntax.
With PyStarburst, you can create complex transformation pipelines, build data apps, and interact with data using Python without moving data to the system where your application code runs.
PyStarburst provides familiar syntax for writing and running production-grade ETL pipelines and data transformations. This makes it possible not only to build new pipelines but also to migrate existing PySpark or Snowpark workloads to Starburst Galaxy and SEP.
To install PyStarburst and its dependencies, run the following pip command from your command prompt:
pip install pystarburst
Use your preferred local development environment to connect to a Starburst Galaxy cluster. Establish a session using the same connection parameters you use to log into Starburst Galaxy.
Specify these settings in a dictionary that associates parameter names with values. Then pass this dictionary to the Session.builder.configs method and call the create method to establish your session:
import trino
from pystarburst import Session

# Replace the <...> placeholders with your own connection values.
db_parameters = {
    "host": "<host>",
    "port": <port>,
    "http_scheme": "https",
    "catalog": "sample",
    "schema": "burstbank",
    "auth": trino.auth.BasicAuthentication("<user>", "<password>")
}
session = Session.builder.configs(db_parameters).create()
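To keep credentials out of source code, the parameter dictionary can be built from environment variables instead of literals. The following is a minimal sketch, not part of PyStarburst itself: the helper names `load_connection_settings` and `create_session`, the `STARBURST_*` variable names, and the default port of 443 are all assumptions chosen for illustration.

```python
import os

# Hypothetical environment variable names; adapt them to your own setup.
REQUIRED_VARS = ("STARBURST_HOST", "STARBURST_USER", "STARBURST_PASSWORD")


def load_connection_settings(env=None):
    """Collect connection settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "host": env["STARBURST_HOST"],
        # Assumed default port; override it with STARBURST_PORT if yours differs.
        "port": int(env.get("STARBURST_PORT", "443")),
        "http_scheme": "https",
        "catalog": env.get("STARBURST_CATALOG", "sample"),
        "schema": env.get("STARBURST_SCHEMA", "burstbank"),
        "user": env["STARBURST_USER"],
        "password": env["STARBURST_PASSWORD"],
    }


def create_session(settings):
    """Build a PyStarburst session from the settings returned above."""
    # Imported here so the helper above works without the packages installed.
    from trino.auth import BasicAuthentication
    from pystarburst import Session

    db_parameters = {
        key: value
        for key, value in settings.items()
        if key not in ("user", "password")
    }
    db_parameters["auth"] = BasicAuthentication(
        settings["user"], settings["password"]
    )
    return Session.builder.configs(db_parameters).create()
```

With the variables exported in your shell, `session = create_session(load_connection_settings())` produces the same kind of session as the literal dictionary above.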
To determine the values for the connection parameters host, port, and user:
To enable PyStarburst in SEP, set the following configuration property to true in your SEP coordinator:
starburst.dataframe-api-enabled
After you have established a connection with a cluster, use Python to construct DataFrames and query tables. PyStarburst has a number of methods to perform DataFrame operations on your data.
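As a small illustration of a DataFrame operation, the sketch below fetches the first few rows of a table. It assumes that the session exposes a `table` method and that the resulting DataFrame offers `limit` and `collect`, as in the Snowpark-style API that PyStarburst follows; confirm the exact names in the API documentation. The helper `preview_table` and its parameters are hypothetical.

```python
def preview_table(session, table_name, row_limit=5):
    """Return up to row_limit rows from table_name as a list."""
    # Reference the table in the session's current catalog and schema.
    df = session.table(table_name)
    # collect() runs the query on the cluster and returns the resulting rows.
    return df.limit(row_limit).collect()
```

For example, `rows = preview_table(session, "<table>")` returns at most five rows from a table in the catalog and schema configured for the session.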
View technical documentation for PyStarburst’s API methods at: https://pystarburst.eng.starburstdata.net/.
Try out PyStarburst using the example Jupyter notebook in the starburstdata/pystarburst-demo GitHub repository.