Skip to main content
Version: current

Querying Apache Iceberg Tables

The SELECT command queries table data and table metadata, and can query by timestamp and snapshot ID.

note:

Dremio supports only the copy-on-write storage mechanism and reads only the latest data files for each Iceberg v2 table that you run SQL commands against. Dremio does not support Iceberg v2 tables that have merge-on-read manifests.

Querying a Table's Data

Syntax
SELECT [ <column1_name> [ , <column2_name> ... ] | * ] 
FROM <table_path>.<table_name>
[WHERE where_condition]
[constructs]

Parameters

{{< sql-section file="data/sql/apache-iceberg-tables.json" data="queryingATable" >}}

Metadata Queries

Iceberg includes helpful system-table references which provide an easy access to Iceberg-specific information on tables, including:

  • The data files for a table
  • The history of a table
  • The manifest files for a table
  • The snapshots for a table

Querying a Table's Data File Metadata

Queries use the table_files() function.

To run a SELECT command, the user must have Dremio's SELECT privilege.

Syntax
SELECT * 
FROM TABLE( table_files('<table_path>.<table_name>') )

Dremio returns records that have these fields:

ColumnData TypeDescription
file_pathVARCHARFull file path and name
file_formatVARCHARFormat, e.g. PARQUET
record_countBIGINTNumber of rows
file_size_in_bytesBIGINTSize of file
column_sizesVARCHARList of columns with size of each column
value_countsVARCHARList of columns with number of records with a value
null_value_countsVARCHARList of columns with number of records as NULL
nan_value_countsVARCHARList of columns with number of records as NaN
lower_boundsVARCHARList of columns with lower bound of each
upper_boundsVARCHARList of columns with upper bound of each
key_metadataVARCHARKey metrics
split_offsetsVARCHARSplit offsets

Parameters

{{< sql-section file="data/sql/apache-iceberg-tables.json" data="metadataQueries" >}}

Querying a Table's History Metadata

Queries use the table_history() function.

To run a SELECT command, the user must have Dremio's SELECT privilege.

Syntax
SELECT * 
FROM TABLE( table_history('<table_path>.<table_name>') )

Dremio returns records that have these fields:

ColumnData TypeDescription
made_current_atTIMESTAMPThe timestamp the Iceberg snapshot was made at
snapshot_idVARCHARThe Iceberg snapshot ID
parent_idVARCHARThe parent snapshot ID, null if not exists
is_current_ancestorBOOLEANIf the snapshot is part of the current history, shows abandoned snapshots

Example

Example with table_history()
SELECT * 
FROM TABLE( table_history('myTable'))
WHERE snapshot_id = 4593468819579153853

Querying a Table's Manifest File Metadata

Queries use the table_manifests() function.

To run a SELECT command, the user must have Dremio's SELECT privilege.

Syntax
SELECT * 
FROM TABLE( table_manifests('<table_path>.<table_name>') )

Dremio returns records that have these fields:

ColumnData TypeDescription
pathVARCHARFull path and name of manifest file
lengthBIGINTSize in bytes
partition_spec_idVARCHAR
added_snapshot_idVARCHARID of snapshot added to manifest
added_data_files_countBIGINTNumber of new data files added
existing_data_files_countBIGINTNumber of existing data files
deleted_data_files_countBIGINTNumber of files removed
partition_summariesVARCHARPartition information

Querying a Table's Snapshot Metadata

Queries use the table_snapshot() function.

To run a SELECT command, the user must have Dremio's SELECT privilege.

Syntax
SELECT * 
FROM TABLE( table_snapshot('<table_path>.<table_name>') )

Dremio returns records that have these fields:

ColumnData TypeDescription
committed_atTIMESTAMPThe timestamp the Iceberg snapshot was committed
snapshot_idVARCHARThe Iceberg snapshot ID
parent_idVARCHARThe parent snapshot ID, null if not exists
operationVARCHARThe Iceberg operation (e.g. append)
manifest_listVARCHARList of manifest files for the snapshot
summaryVARCHARAdditional attributes (records added, etc)

Example

Finds the number of snapshots for a table
SELECT count(*) 
FROM TABLE( table_snapshot('myTable'))
GROUP BY snapshot_id

Time Travel Queries

Iceberg tables support both TIMESTAMP-based references and Snapshot ID-based references to specify an earlier version of a table to read.

Time Travel by Timestamps

The query returns an error stating that the reference for this table was out of range if either of these two conditions is true:

  • The value of the timestamp is some time in the future, even by one second.
  • The value of the timestamp is older than the oldest valid snapshot for the table.

To run a SELECT command, the user must have Dremio's SELECT privilege.

Syntax
SELECT <column_name> 
FROM <table_path>.<table_name> AT <timestamp>

Parameters

{{< sql-section file="data/sql/apache-iceberg-tables.json" data="timeTravelTimestamps" >}}

Example

Time travel query on an Iceberg table using a timestamp
SELECT count(*) 
FROM my_table AT TIMESTAMP '2022-07-01 01:30:00.000'

Time Travel by Snapshot ID

To run a SELECT command, the user must have Dremio's SELECT privilege.

Syntax
SELECT <column_name> 
FROM <table_path>.<table_name> AT SNAPSHOT '<snapshot-id>'

Parameters

{{< sql-section file="data/sql/apache-iceberg-tables.json" data="timeTravelSnapshots" >}}

Example

Time travel query on an Iceberg table using a snapshot ID
SELECT *
FROM myTable AT SNAPSHOT '5393090506354317772'