Version: 0.0.1

SELECT

Dremio supports querying using standard SELECT statements. You can query tables and views that are contained in Dremio's space entity (including your home space) and in the sources that are connected to Dremio (such as Amazon S3 and Oracle).

When working with Apache Iceberg tables, you can query a table's metadata as well as run queries by snapshot ID.

note:

For Apache Iceberg tables, Dremio supports only the copy-on-write storage mechanism and reads only the latest data files for each Iceberg v2 table that you run SQL commands against. Dremio does not support Iceberg v2 tables that have merge-on-read manifests.

Querying a Table's Data

[ WITH ... ] 
SELECT [ ALL | DISTINCT ] 

  {
    { * | <column_name1>, <column_name2>, ... } FROM { <table_name> | <view_name> | TABLE( <iceberg_metadata> ( '<table_name>' ) ) }
    [ { PIVOT | UNPIVOT } ( `<expression>` ) ]
    [ JOIN `<expression>` ]
    [ WHERE <condition> ]
    [ GROUP BY `<expression>` ]
    [ QUALIFY `<expression>` ]
    [ ORDER BY <column_name1>, <column_name2>, ... [ DESC ] ]
    [ LIMIT <count> ]

    [ AT {
      SNAPSHOT '<snapshot_id>'
      | <timestamp>
      }
    ]
  }

Parameters

{{< sql-section file="data/sql/SELECT-statements-sw.json" data="queryingATable" >}}

Examples

{{< codeheader "Query an existing table in a data lake source" >}}

SELECT * 
FROM Samples."samples.dremio.com"."zips.json"

{{< codeheader "Query a specified column in an existing table" >}}

SELECT city 
FROM Samples."samples.dremio.com"."zips.json"

{{< codeheader "Query a table using the DISTINCT option to eliminate duplicates from the result set" >}}

SELECT DISTINCT city 
FROM Samples."samples.dremio.com"."zips.json"

{{< codeheader "Query a table and filter the results using the WHERE clause" >}}

SELECT * 
FROM Samples."samples.dremio.com"."zips.json"
WHERE state = 'MA' AND city = 'AGAWAM'

{{< codeheader "Query a table and group and order the result by the specified expression" >}}

SELECT COUNT(city), city, state
FROM Samples."samples.dremio.com"."zips.json"
GROUP BY state, CITY
ORDER BY COUNT(city) DESC

{{< codeheader "Query a table and filter the result using QUALIFY with window functions in the SELECT list" >}}

SELECT passenger_count, trip_distance_mi, fare_amount,
RANK() OVER (PARTITION BY passenger_count ORDER BY trip_distance_mi) AS pc_rank
FROM "NYC-taxi-trips"
QUALIFY pc_rank = 1

{{< codeheader "Query a table and filter the result using QUALIFY with window functions in the QUALIFY clause" >}}

SELECT passenger_count, trip_distance_mi, fare_amount
FROM "NYC-taxi-trips"
QUALIFY RANK() OVER (PARTITION BY passenger_count ORDER BY trip_distance_mi) = 1

{{< codeheader "Query a table using the PIVOT and UNPIVOT clauses" >}}

ALTER DATASET Samples."samples.dremio.com"."SF weather 2018-2019.csv" REFRESH METADATA auto promotion FORCE UPDATE;

SELECT * FROM (
    SELECT  EXTRACT(YEAR FROM CAST(F AS DATE)) as "YEAR",
            EXTRACT(MONTH FROM CAST(F AS DATE)) as "MONTH",
            K as MAX_TEMP
    FROM Samples."samples.dremio.com"."SF weather 2018-2019.csv"
    where F `<>` 'DATE'
)
PIVOT (
    max(MAX_TEMP) for "MONTH" in (1 as JAN, 2 as FEB, 3 as MAR, 4 as APR, 5 as MAY, 6 as JUN, 7 as JUL, 8 as AUG, 9 as SEP, 10 as OCT, 11 as NOV, 12 as "DEC")
)
UNPIVOT (
    GLOBAL_MAX_TEMP for "MONTH" in (JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, "DEC")
)
ORDER BY "YEAR", "MONTH";

SELECT
  c.c_nationkey,
  o.o_orderdate,
  o.o_shippriority
FROM
  (NAS2."customer.parquet" c
  left join
  NAS2."orders.parquet" as o
ON c.c_custkey = o.o_custkey)

SELECT
  c.c_nationkey,
  o.o_orderdate,
  o.o_shippriority
FROM
  (NAS2."customer.parquet" c
  right join
  NAS2."orders.parquet" as o
ON c.c_custkey = o.o_custkey)

SELECT
  c.c_nationkey,
  o.o_orderdate,
  o.o_shippriority
FROM
  (NAS2."customer.parquet" c
  full join
  NAS2."orders.parquet" as o
ON c.c_custkey = o.o_custkey)

SELECT
  c.c_nationkey,
  o.o_orderdate,
  o.o_shippriority
FROM
  (NAS2."customer.parquet" c
  inner join
  NAS2."orders.parquet" as o
ON c.c_custkey = o.o_custkey)

SELECT
  c.c_nationkey,
  o.o_orderdate,
  o.o_shippriority
FROM
  (NAS2."customer.parquet" c
  cross join
  NAS2."orders.parquet" as o)

Examples for Iceberg Tables

{{< codeheader "Time travel query on an Iceberg table using a timestamp" >}}

SELECT count(*) 
FROM my_table AT TIMESTAMP '2022-07-01 01:30:00.000'

{{< codeheader "Time travel query on an Iceberg table using a snapshot ID" >}}

SELECT * 
FROM myTable AT SNAPSHOT '5393090506354317772'

{{< codeheader "Query an Iceberg table using the table's history metadata and a snapshot ID" >}}

SELECT * 
FROM TABLE(table_history('myTable')) 
WHERE snapshot_id = 4593468819579153853

{{< codeheader "Query an Iceberg table to find the number of snapshots for a table" >}}

SELECT count(*) 
FROM TABLE(table_snapshot('myTable')) 
GROUP BY snapshot_id

Distributing Data Evenly Across Executor Nodes During Joins

You can use a BROADCAST hint if a query profile indicates that data involved in a join of two tables is heavily skewed and overloading one or more executor nodes. The hint forces an even distribution of the data across all executor nodes.

note:

These hints are ignored for nested-loop joins.

A BROADCAST hint must be used immediately after the name of a table.

/*+ BROADCAST */

SELECT * 
  FROM T1 /*+ BROADCAST */ 
  INNER JOIN t2 ON t1.key = t2.key 
  INNER JOIN t3 ON t2.key = t3.key

SELECT * 
  FROM T1 
  INNER JOIN (select key, max(cost) cost from t2 /*+ BROADCAST */) T2 ON t1.key = t2.key 
  INNER JOIN t3 ON t2.key = t3.key

SELECT

Querying a Table's Data​

Parameters​

Examples​

Examples for Iceberg Tables​

Distributing Data Evenly Across Executor Nodes During Joins​

Querying a Table's Data

Parameters

Examples

Examples for Iceberg Tables

Distributing Data Evenly Across Executor Nodes During Joins