Starburst Galaxy

  •  Get started

  •  Working with data

  •  Data engineering

  •  Developer tools

  •  Cluster administration

  •  Troubleshooting

  • Galaxy status

  •  Reference
  • Learn SQL basics #

    Welcome to the Learn SQL basics tutorial, where you explore useful ways to craft your SQL statements using the Sample dataset. The SQL statements start with SELECT expressions that return every row, and increase in complexity with the addition of clauses and operations like WHERE, GROUP BY, and ORDER BY.

    Requirements #

    To complete the tutorial using Starburst Galaxy, you must create a free trial or activate your user account.

    Once you login to Starburst Galaxy, a cluster named sample or free-cluster, containing a catalog named sample, is usually pre-configured and ready for querying.

    • If you do not see the sample catalog automatically available, create a Sample dataset catalog with the name sample.
    • If you do not see the free-cluster or sample cluster automatically available, create a cluster named sample and add the sample catalog.

    In the query editor, navigate to the Cluster explorer.

    1. In the navigation menu, expand the free-cluster or sample cluster to view its catalogs.
    2. Select the sample catalog.

        Query editor sample cluster explorer

    In the location drop-down menus, select the cluster and the catalog sample in order to run the queries without having to specify the full table path location in each query.

      Query editor use sample catalog button

    The namespace for a table is typically specified as catalog_name.schema_name.table_name. For the predefined sample dataset, this configuration is as follows:

    sample.demo.<table_name>
    

    If you choose to name your catalog differently, either adjust the queries in the tutorials accordingly or select the appropriate catalog in the location drop-down menus.

    If you choose to name your cluster differently, select the appropriate cluster in the location drop-down menus.

    SELECT statement #

    The SELECT clause specifies the output of the query. There are multiple Select expressions that can be utilized depending on the query requirements.

    Use the SELECT * expression and query the astronauts table to view the data available for analysis.

    SELECT
      *
    FROM
      sample.demo.astronauts;
    

    The astronauts table displays all the astronauts that are associated with a mission, which creates duplicate entries of the same astronaut because each row represents not only the astronaut but each mission the astronaut has completed.

    Use the SELECT * expression and query the missions table to view the data available for analysis.

    SELECT
      *
    FROM
      sample.demo.missions;
    

    Instead of using the SELECT * expression to return all the columns in the table, only return columns that are beneficial for your analysis of the astronauts table.

    SELECT
      name,
      nationality,
      mission_title,
      mission_number,
      hours_mission
    FROM
      sample.demo.astronauts;
    

    Instead of using the SELECT * expression to return all the columns in the table, only return columns that are beneficial for your analysis of the missions table.

    SELECT
      company_name,
      status_rocket,
      cost,
      status_mission
    FROM
      sample.demo.missions;
    

    WHERE clause #

    The WHERE clause is used to specify a condition and only returns records that satisfy the conditional criteria.

    There are many astronauts from the U.S. and U.S.S.R, so use the WHERE clause to alter the query and only view the astronauts of other countries.

    SELECT
      name,
      nationality,
      mission_title,
      mission_number,
      hours_mission
    FROM
      sample.demo.astronauts
    WHERE
      nationality NOT LIKE 'U.S.%';
    

    Use the WHERE clause to query the table and only return missions that are classified as a success.

    SELECT
      company_name,
      status_rocket,
      cost,
      status_mission
    FROM
      sample.demo.missions
    WHERE
        status_mission = 'Success';
    

    ORDER BY clause #

    The ORDER BY clause is used to sort a result set by one or more output expressions.

    Add a sort to the astronauts query to view the results in order of nationality and name.

    SELECT
      name,
      nationality,
      mission_title,
      mission_number,
      hours_mission
    FROM
      sample.demo.astronauts
    WHERE
      nationality NOT LIKE 'U.S.%'
    ORDER BY
      nationality,
      name;
    

    A gentle reminder that the astronauts table includes duplicate entries for selected astronauts; however, these astronauts have a unique identifier for each completed mission.

      Sample astronaut query result

    Aggregate functions #

    Aggregate functions operate on a set of values to compute a single result.

    Use one query to find three different aggregate values. Discover the amount of trips, the longest mission in hours, and the smallest mission in hours from select countries.

    SELECT
      count() as trips,
      max(hours_mission) as longest_mission,
      min(hours_mission) as shortest_mission
    FROM
      sample.demo.astronauts
    WHERE
      nationality NOT LIKE 'U.S.%';
    

    Instead of evaluating all the data together, increase the granularity level and evaluate the aggregates specifically for each country.

    SELECT
      nationality,
      count() AS number_trips,
      max(hours_mission) AS longest_time,
      min(hours_mission) AS shortest_time
    FROM
      sample.demo.astronauts
    WHERE
      nationality NOT LIKE 'U.S.%'
    GROUP BY
      nationality;
    

    Add an ORDER BY clause and sort the results by most trips per country. In the case of a tie, sort additionally by longest mission.

    SELECT
      nationality,
      count() AS number_trips,
      max(hours_mission) AS longest_time,
      min(hours_mission) AS shortest_time
    FROM
      sample.demo.astronauts
    WHERE
      nationality NOT LIKE 'U.S.%'
    GROUP BY
      nationality
    ORDER BY
      number_trips DESC,
      longest_time DESC;
    

    Revisit the previous missions query and add an Aggregate function to determine which company spent the most money on successful missions. Order this query by the total highest cost.

    SELECT
      company_name,
      sum(cost),
      status_rocket
    FROM
      sample.demo.missions
    WHERE
      status_mission = 'Success'
    GROUP BY
      company_name,
      status_rocket
    ORDER BY
      sum(cost) DESC;
    

    Next steps #

    Check out our other tutorials, or dive right into the SQL documentation and experiment with your own data.