SELECT

Dremio supports querying using standard SELECT statements. You can query tables and views that are contained in Dremio's space entity (including your home space), connected sources, and Arctic catalogs.

When working with Apache Iceberg tables, you can query a table's metadata as well as run queries by snapshot ID.

note:

For Apache Iceberg tables, Dremio supports only the copy-on-write storage mechanism and reads only the latest data files for each Iceberg v2 table that you run SQL commands against. Dremio does not support Iceberg v2 tables that have merge-on-read manifests.

Syntax

[ WITH ... ]
SELECT [ ALL | DISTINCT ]

   {
    { * | <column_name1>, <column_name2>, ... } FROM { <table_name> | <view_name> | TABLE( <iceberg_metadata> ( '<table_name>' ) ) }
      [ { PIVOT | UNPIVOT } `<expression>` ]
      [ WHERE <condition> ]
      [ GROUP BY `<expression>` ]
      [ ORDER BY <column_name1>, <column_name2>, ... [ DESC ] ]
      [ LIMIT <count> ]

      [ AT {
        REF[ERENCE] <reference>
        | BRANCH <branch_name>
        | TAG <tag_name>
        | COMMIT <"commit_hash">
        | SNAPSHOT '<snapshot_id>'
        | <timestamp>
        }
      ]
    }

Parameters

{{< sql-section file="data/sql/SELECT-statements.json" data="queryingATable" >}}

Examples

Query an existing table in a data lake source

SELECT *
FROM Samples."samples.dremio.com"."zips.json"

Query a specified column in an existing table

SELECT city
FROM Samples."samples.dremio.com"."zips.json"

Query a table using the DISTINCT option to eliminate duplicates from the result set

SELECT DISTINCT city
FROM Samples."samples.dremio.com"."zips.json"

Query a table and filter the results using the WHERE clause

SELECT *
FROM Samples."samples.dremio.com"."zips.json"
WHERE state = 'MA' AND city = 'AGAWAM'

Query a table and filter the result using QUALIFY with window functions in the SELECT list

SELECT passenger_count, trip_distance_mi, fare_amount,
RANK() OVER (PARTITION BY passenger_count ORDER BY trip_distance_mi) AS pc_rank
FROM "NYC-taxi-trips"
QUALIFY pc_rank = 1

Query a table and filter the result using QUALIFY with window functions in the QUALIFY clause

SELECT passenger_count, trip_distance_mi, fare_amount
FROM "NYC-taxi-trips"
QUALIFY RANK() OVER (PARTITION BY passenger_count ORDER BY trip_distance_mi) = 1

Query a table and group and order the result by the specified expression

SELECT COUNT(city), city, state
FROM Samples."samples.dremio.com"."zips.json"
GROUP BY state, CITY
ORDER BY COUNT(city) DESC

Query an existing table using a CTE clause

WITH cte_quantity (Total)
    AS (
        SELECT SUM(passenger_count) as Total
        FROM Samples."samples.dremio.com"."NYC-taxi-trips" where passenger_count > 2
        GROUP BY pickup_datetime
        )
SELECT AVG(Total) average_pass
FROM cte_quantity

Query a table using the PIVOT and UNPIVOT clauses

ALTER DATASET Samples."samples.dremio.com"."SF weather 2018-2019.csv" REFRESH METADATA auto promotion FORCE UPDATE;

SELECT * FROM (
    SELECT  EXTRACT(YEAR FROM CAST(F AS DATE)) as "YEAR",
            EXTRACT(MONTH FROM CAST(F AS DATE)) as "MONTH",
            K as MAX_TEMP
    FROM Samples."samples.dremio.com"."SF weather 2018-2019.csv"
    where F `<>` 'DATE'
)
PIVOT (
    max(MAX_TEMP) for "MONTH" in (1 as JAN, 2 as FEB, 3 as MAR, 4 as APR, 5 as MAY, 6 as JUN, 7 as JUL, 8 as AUG, 9 as SEP, 10 as OCT, 11 as NOV, 12 as "DEC")
)
UNPIVOT (
    GLOBAL_MAX_TEMP for "MONTH" in (JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, "DEC")
)
ORDER BY "YEAR", "MONTH";

Examples for Iceberg Tables

Time travel query on an Iceberg table using a timestamp

SELECT *
FROM myTable AT TIMESTAMP '2022-01-01 17:30:50.000'

Time travel query on Iceberg table using a snapshot ID

SELECT *
FROM myTable AT SNAPSHOT '5393090506354317772'

Query an Iceberg table using the table's history metadata and a snapshot ID

SELECT *
FROM TABLE(table_history('myTable'))
WHERE snapshot_id = 4593468819579153853

Query an Iceberg table to find the number of snapshots for a table

SELECT count(*)
FROM TABLE(table_snapshot('myTable'))
GROUP BY snapshot_id

Column Aliasing

If you specify an alias for a column or an expression in the SELECT clause, you can refer to that alias elsewhere in the query, including in the SELECT list or in the WHERE clause.

Example 1

SELECT c_custkey AS c, lower(c)
FROM "customer.parquet"

Example 2

SELECT c_custkey AS c, lower(c)
FROM (
  SELECT c_custkey, c_mktsegment AS c  
  FROM "customer.parquet")

Example 3

SELECT  c_name AS n, n
FROM (
  SELECT c_mktsegment AS n, c_name
  FROM "customer.parquet")
AS MY_TABLE
WHERE n = 'BUILDING'

Example 4

SELECT c_custkey
FROM (
  SELECT c_custkey, c_name AS c
  FROM "customer.parquet" )
WHERE c = 'aa'

Example 5

SELECT *
FROM (
  SELECT c_custkey AS c, c_name
  FROM "customer.parquet" )
JOIN "orders.parquet" ON c = o_orderkey

Example 6

SELECT c_custkey AS c
FROM "customer.parquet"
JOIN "orders.parquet" ON c = o_orderkey

Examples for Arctic Catalogs

Query a table using a (branch) reference

SELECT *
  FROM myCatalog.demo_table AT REF main_branch

Query a view using a commit

SELECT *
  FROM myCatalog.demo_view AT COMMIT "7f643f2b9cf250ce1f5d6ff4397237b705d866fbf34d714"

Distributing Data Evenly Across Execution Engines During Joins

You can use a BROADCAST hint if a query profile indicates that data involved in a join of two tables is heavily skewed and overloading one or more execution engines. The hint forces an even distribution of the data across all execution engines.

note:

These hints are ignored for nested-loop joins.

A BROADCAST hint must be used immediately after the name of a table.

Syntax of a BROADCAST hint

/*+ BROADCAST */

Example 1

SELECT *
  FROM T1 /*+ BROADCAST */
  INNER JOIN t2 ON t1.key = t2.key
  INNER JOIN t3 ON t2.key = t3.key

Example 2

SELECT *
  FROM T1
  INNER JOIN (select key, max(cost) cost from t2 /*+ BROADCAST */) T2 ON t1.key = t2.key
  INNER JOIN t3 ON t2.key = t3.key

SELECT

Parameters​

Examples​

Examples for Iceberg Tables​

Column Aliasing​

Examples for Arctic Catalogs​

Distributing Data Evenly Across Execution Engines During Joins​