Python clients #

Starburst Galaxy, Starburst Enterprise platform (SEP), and Trino fully support client access from Python code and Python-based client apps.

Python libraries and clients #

The following data query libraries and clients take advantage of the Trino Python client package.

  • trino-python-client is an open source library for querying clusters managed by Trino, Starburst Galaxy, and Starburst Enterprise. This package is described in the Trino Python client section of this page.

  • PyStarburst is Starburst Galaxy’s library that supports the Python DataFrame API.

  • Ibis is a portable Python DataFrame library, with support for Trino connections.

  • dbt is a data transformation workflow development framework that lets teams quickly and collaboratively deploy analytics code. Starburst provides a supported adapter. The dbt client page describes the steps to use the adapter and dbt with Trino, Starburst Enterprise, or Starburst Galaxy.

  • Apache Superset is a data exploration and visualization platform. Connections to clusters use the SQLAlchemy-Trino package in conjunction with the Trino Python client package. The Superset client page describes the steps to use Superset with Trino, Starburst Enterprise, or Starburst Galaxy.

  • Querybook is a browser-based data analysis tool that turns SQL queries into natural language reports and graphs called DataDocs. The Querybook client page describes the steps to use Querybook with Trino, Starburst Enterprise, or Starburst Galaxy.

Trino Python client #

The client supports running queries within transactions, as described in the GitHub project’s README.
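
As a rough sketch of the transaction pattern, the helper below runs a batch of statements on any DBAPI connection and commits only if all of them succeed. The name run_in_transaction is hypothetical, and the demonstration uses the standard library's sqlite3 module as a stand-in so that it runs without a cluster. With the Trino client, the README describes enabling transactions by passing an isolation_level (for example IsolationLevel.REPEATABLE_READ from trino.transaction) to trino.dbapi.connect; check the README for the current details.

```python
import sqlite3


def run_in_transaction(conn, statements):
    """Run every statement, then commit; roll back if any statement fails."""
    cur = conn.cursor()
    try:
        for sql in statements:
            cur.execute(sql)
            cur.fetchall()  # drain the result so the statement finishes
        conn.commit()
    except Exception:
        conn.rollback()
        raise


# Stand-in connection (assumption: sqlite3 replaces a real cluster here).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE t (id INTEGER)")
run_in_transaction(demo, ["INSERT INTO t VALUES (1)", "INSERT INTO t VALUES (2)"])
```

Because the helper relies only on cursor(), commit(), and rollback(), the same shape should apply to a connection returned by trino.dbapi.connect when transactions are enabled.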

Setup #

The Python client package requires Python 3.6 or later, or PyPy 3.

To use the package directly in your Python code, install it locally with pip install trino (or use pip3 if your system is so configured). Thereafter, import trino into your code.
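
A quick way to confirm the setup is to check the interpreter version and whether the trino package can be found. The following is a minimal sketch; the helper names python_is_supported and trino_is_installed are made up for illustration, and the minimum version is taken from the requirement stated above.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 6)  # minimum version stated in this section


def python_is_supported(version_info=sys.version_info):
    """Return True if the major.minor version meets the stated minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def trino_is_installed():
    """Return True if the trino package can be located without importing it."""
    # find_spec returns None when the package is not installed
    return importlib.util.find_spec("trino") is not None


if __name__ == "__main__":
    print("Python version supported:", python_is_supported())
    print("trino package installed:", trino_is_installed())
```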

To use one of the Python-based clients, follow the setup instructions for that client, which incorporates the trino package internally.

Authentication methods #

The Python client package supports the following Trino authentication methods:

Package comparison #

The Python Database API Specification (DBAPI) defines a standard way for Python clients to access databases. The Trino Python client is a direct implementation of the DBAPI specification.
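
Because DBAPI standardizes the cursor interface, helper code written against it does not depend on a particular driver. The sketch below turns any DBAPI cursor's result into dictionaries keyed by column name, using cursor.description and fetchall() from the specification. The helper name rows_as_dicts is hypothetical, and sqlite3 from the standard library stands in for a cluster connection so that the example runs anywhere.

```python
import sqlite3


def rows_as_dicts(cursor):
    """Pair each fetched row with the column names from cursor.description."""
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Stand-in for a cluster connection (assumption: any DBAPI driver works here).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("SELECT 1 AS nationkey, 'ALGERIA' AS name")
result = rows_as_dicts(cur)
print(result)
```

The same function should accept a cursor obtained from trino.dbapi.connect(...).cursor() after execute() has been called.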

SQLAlchemy is a toolkit whose core component provides a SQL abstraction layer over many DBAPI implementations. Several Python clients use SQLAlchemy along with the trino-python-client package to provide SQL access to Trino clusters.
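
As an illustration of the SQLAlchemy path, the sketch below builds a connection URL and hands it to SQLAlchemy. It assumes the trino://user@host:port/catalog/schema URL form described in the client's README, and that both SQLAlchemy and the trino package with its SQLAlchemy support are installed. The helper names trino_sqlalchemy_url and make_engine are hypothetical.

```python
from urllib.parse import quote


def trino_sqlalchemy_url(user, host, port, catalog, schema=None):
    """Build a trino:// URL; the user name is percent-encoded."""
    url = f"trino://{quote(user, safe='')}@{host}:{port}/{catalog}"
    if schema:
        url += f"/{schema}"
    return url


def make_engine(url):
    """Create a SQLAlchemy engine for the given URL."""
    # Imported lazily so this sketch loads without SQLAlchemy installed.
    from sqlalchemy import create_engine

    return create_engine(url)


url = trino_sqlalchemy_url("sep-user", "localhost", 8080, "system", "runtime")
print(url)
```

Calling make_engine(url) and then engine.connect() would open the connection. That step is left out here because it needs a running cluster.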

Python clients that use the Trino DBAPI implementation directly, or that use SQLAlchemy along with the Trino DBAPI package, are the most direct path to querying Trino, Starburst Enterprise, and Starburst Galaxy clusters.

Several alternative Python access methods are not as direct, and are not recommended:

  • PySpark requires Spark JARs as well as a JDBC driver. This leaves your SQL query two layers removed from a direct DBAPI implementation.

  • PyJDBC implements DBAPI, but also places a JDBC driver in the path of your query.

  • PyHive implements DBAPI, can be used with SQLAlchemy, and has support for the Trino client package. However, it is designed to use the Hive query language, and not SQL. Although the two languages are similar, they are not identical, so using the PyHive library can result in unexpected query results or failures.

Examples #

The following example shows how to use the Python API to connect to a local cluster running without security to submit a single query and return the results.

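import trino

# Connect to an unsecured local cluster; catalog and schema set the defaults
# used to resolve unqualified table names.
conn = trino.dbapi.connect(
    host='localhost',
    port=8080,
    user='sep-user',
    catalog='system',
    schema='runtime',
)
cur = conn.cursor()
# "nodes" resolves to system.runtime.nodes through the defaults above.
cur.execute('SELECT * FROM nodes')
rows = cur.fetchall()
for row in rows:
    print(row)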

The next example runs the same query on a remote cluster secured with LDAP authentication. The user parameter is not needed for LDAP because you specify the username in the auth parameter. The catalog and schema parameters are not required either, because the query uses the fully qualified catalog.schema.table name:

import trino
conn = trino.dbapi.connect(
    host='cluster.example.com',
    port=8443,
    http_scheme='https',
    auth=trino.auth.BasicAuthentication("ldap-username", "ldap-password"),
)
cur = conn.cursor()
cur.execute('SELECT * FROM system.runtime.nodes')
rows = cur.fetchall()
for row in rows:
    print(row)

The next example runs a query on a Starburst Galaxy cluster secured using HTTPS and the default port 443. This example uses username and password credentials for authentication and is appropriate for establishing a connection to any cluster that relies on basic authentication.

import trino
conn = trino.dbapi.connect(
    host='cluster.trino.galaxy.starburst.io',
    port=443,
    http_scheme='https',
    auth=trino.auth.BasicAuthentication("username", "password"),
)
cur = conn.cursor()
cur.execute('SELECT nationkey, name FROM tpch.sf1.nation')
rows = cur.fetchall()
for row in rows:
    print(row)
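
The examples above place credentials directly in the source for brevity. One possible alternative, sketched below, reads the connection settings from environment variables. The variable names TRINO_HOST, TRINO_PORT, TRINO_USER, and TRINO_PASSWORD and both helper functions are assumptions made for illustration and are not defined by the client package.

```python
import os


def connection_settings(env=os.environ):
    """Collect connection settings from environment-style variables."""
    return {
        "host": env["TRINO_HOST"],
        "port": int(env.get("TRINO_PORT", "443")),  # HTTPS default port
        "http_scheme": "https",
        "username": env["TRINO_USER"],
        "password": env["TRINO_PASSWORD"],
    }


def connect_from_env(env=os.environ):
    """Open a basic-authentication connection using the collected settings."""
    import trino  # imported lazily so the settings helper works without it

    s = connection_settings(env)
    return trino.dbapi.connect(
        host=s["host"],
        port=s["port"],
        http_scheme=s["http_scheme"],
        auth=trino.auth.BasicAuthentication(s["username"], s["password"]),
    )
```

With the variables exported in the shell, conn = connect_from_env() yields a connection that can be used exactly like the ones in the earlier examples.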