Learn SQL basics#
Welcome to the SQL basics tutorial, which describes useful ways to craft
your SQL statements using a sample dataset. The SQL statements start with
SELECT
expressions that return every row, and increase in complexity with the
addition of clauses and operations like WHERE
, GROUP BY
, and ORDER BY
.
Note
The Sample
cluster used in the following example is not available in Starburst
Enterprise (SEP). For more information on the sample dataset used in this
tutorial, see the Starburst Galaxy Sample
dataset documentaion.
SELECT statement#
The SELECT
clause specifies the output of the query. There are multiple
Select expressions that can be utilized depending on the
query requirements.
Use the SELECT *
expression and query the astronauts
table to view the
data available for analysis.
SELECT
*
FROM
sample.demo.astronauts;
The astronauts table displays all the astronauts that are associated with a mission, which creates duplicate entries of the same astronaut because each row represents not only the astronaut but each mission the astronaut has completed.
Use the SELECT *
expression and query the missions
table to view the data
available for analysis.
SELECT
*
FROM
sample.demo.missions;
Instead of using the SELECT *
expression to return all the columns in the
table, only return columns that are beneficial for your analysis of the
astronauts
table.
SELECT
name,
nationality,
mission_title,
mission_number,
hours_mission
FROM
sample.demo.astronauts;
Instead of using the SELECT *
expression to return all the columns in the
table, only return columns that are beneficial for your analysis of the
missions
table.
SELECT
company_name,
status_rocket,
cost,
status_mission
FROM
sample.demo.missions;
WHERE clause#
The WHERE
clause is used to specify a condition and only returns records that
satisfy the conditional criteria.
There are many astronauts from the U.S. and U.S.S.R, so use the WHERE
clause
to alter the query and only view the astronauts of other countries.
SELECT
name,
nationality,
mission_title,
mission_number,
hours_mission
FROM
sample.demo.astronauts
WHERE
nationality NOT LIKE 'U.S.%';
Use the WHERE
clause to query the table and only return missions that are
classified as a success.
SELECT
company_name,
status_rocket,
cost,
status_mission
FROM
sample.demo.missions
WHERE
status_mission = 'Success';
ORDER BY clause#
The ORDER BY
clause is used to sort a result set by one or more output
expressions.
Add a sort to the astronauts query to view the results in order of nationality and name.
SELECT
name,
nationality,
mission_title,
mission_number,
hours_mission
FROM
sample.demo.astronauts
WHERE
nationality NOT LIKE 'U.S.%'
ORDER BY
nationality,
name;
Note
A gentle reminder that the astronauts table includes duplicate entries for selected astronauts; however, these astronauts have a unique identifier for each completed mission.
Aggregate functions#
Aggregate functions operate on a set of values to compute a single result.
Use one query to find three different aggregate values. Discover the amount of trips, the longest mission in hours, and the smallest mission in hours from select countries.
SELECT
count() as trips,
max(hours_mission) as longest_mission,
min(hours_mission) as shortest_mission
FROM
sample.demo.astronauts
WHERE
nationality NOT LIKE 'U.S.%';
Instead of evaluating all the data together, increase the granularity level and evaluate the aggregates specifically for each country.
SELECT
nationality,
count() AS number_trips,
max(hours_mission) AS longest_time,
min(hours_mission) AS shortest_time
FROM
sample.demo.astronauts
WHERE
nationality NOT LIKE 'U.S.%'
GROUP BY
nationality;
Add an ORDER BY
clause and sort the results by most trips per country. In the
case of a tie, sort additionally by longest mission.
SELECT
nationality,
count() AS number_trips,
max(hours_mission) AS longest_time,
min(hours_mission) AS shortest_time
FROM
sample.demo.astronauts
WHERE
nationality NOT LIKE 'U.S.%'
GROUP BY
nationality
ORDER BY
number_trips DESC,
longest_time DESC;
Revisit the previous missions
query and add an Aggregate functions to determine which company spent the most money on
successful missions. Order this query by the total highest cost.
SELECT
company_name,
sum(cost),
status_rocket
FROM
sample.demo.missions
WHERE
status_mission = 'Success'
GROUP BY
company_name,
status_rocket
ORDER BY
sum(cost) DESC;
Other useful resources#
Next steps#
Check out our other SQL tutorials, or dive right into the SQL documentation and experiment with your own data.