Tables

The SQL commands for tables enable you to create and work with tables in the scratch directory and in data lake sources (such as Amazon S3). Apache Iceberg tables are created in AWS Glue data sources, Amazon S3 data sources, or external Nessie data sources.

Tables created in these sources can be queried, altered, analyzed, and dropped by users and groups with the appropriate permissions. For information about querying tables (including Apache Iceberg tables), see SELECT.

Apache Iceberg Tables

Prerequisites for Apache Iceberg Tables

Before you attempt to create Iceberg tables, ensure that you are using an AWS Glue, Amazon S3, or external Nessie data source. Also ensure that any existing Apache Iceberg tables that you want to use in your data source are at Apache Iceberg version 2.

Default Table Formats Used for New Tables

AWS Glue data sources added to projects default to using the Apache Iceberg table format.

Amazon S3 data sources added to projects default to using the Apache Parquet table format. Follow these steps to ensure that the default table format for new tables is Apache Iceberg:

In Dremio, click the Amazon S3 data source.
Click the gear icon in the top-right corner above the list of the data source's contents.
On the Advanced Options page of the Edit Source dialog, select ICEBERG under Default CTAS Format.
Click Save.

Locations in which Iceberg Tables are Created

Where the CREATE TABLE command creates a table depends on the type of data source being used.

Location in AWS Glue Data Sources

The root directory is assumed by default to be /user/hive/warehouse.

If you want to create tables in a different location, you must specify the S3 address of an Amazon S3 bucket in which to create them:

In Dremio Cloud, click the AWS Glue data source.
Click the gear icon in the top-right corner above the list of the data source's contents.
On the Advanced Options page of the Edit Source dialog, add this connection property: hive.metastore.warehouse.dir
Set the value to the S3 address of an S3 bucket.

To the root location are appended the schema path and table name to determine the default physical location for a new table. For example, this CREATE TABLE command creates the table table_A in the directory <rootdir>/database/schema/table_A

CREATE TABLE command

CREATE TABLE database.schema.table_A

Location in Amazon S3 Data Sources

The root physical location is the main root directory for the filesystem. From this location, the path and table name are appended to determine the physical location for a new table.

For example, this CREATE TABLE command creates the table table_A in the directory rootdir/folder1/folder2/table_A:

CREATE TABLE command

CREATE TABLE <Amazon_S3_data_source>.folder1.folder2.table_A

Location in Nessie Data Sources

Top-level Nessie schemas have a configurable physical storage. This is used as the default root physical location.

In the project store each top level Nessie schema has its own directory path. So for example in the project’s Nessie the top level schema “marketing” would be located in “project_store/marketing” and this directory would be used by default as the root physical location. From there, the same schema.table resolution as described for Hive above would apply.

Sources and Privileges

In order to use these SQL commands, the appropriate privileges must be set for each data source that you want to work with, including the spaces of other users in your project and the connected data lake sources. For information about setting privileges, see Privileges.

Creating a Table (CTAS)

Tables that are created using the CREATE TABLE AS (CTAS) command are in Parquet file format, which can be formatted to a table.

CREATE TABLE AS command

CREATE TABLE [IF NOT EXISTS] <table_path>
  [ PARTITION BY ( <column_name1>, <column_name2>, ... ) ]

  [LOCALSORT BY ( <column_name> ) ]
  AS <query>

Parameters

{{< sql-section file="data/sql/tables.json" data="creatingATable" >}}

Parameters for Iceberg Tables Only

{{< sql-section file="data/sql/tables.json" data="creatingAnIcebergTable" >}}

Optional Parameters

{{< sql-section file="data/sql/tables.json" data="creatingATableOptional" >}}

Examples

note:

The following examples use Dremio’s Samples Data Lake, which includes a number of sample datasets that you can practice with.

Create table in your project store

CREATE TABLE $scratch.demo_table AS
  SELECT *
  FROM Samples."samples.dremio.com"."zips.json";

Create a table in your project store using PARTITION BY and LOCALSORT

CREATE TABLE $scratch.demo_table2
  PARTITION BY (state)
  LOCALSORT BY (city)
  AS SELECT *
  FROM Samples."samples.dremio.com"."zips.json";

Describing a Table

The DESCRIBE TABLE command is used to provide high-level information regarding the overall column properties of an existing dataset.

Describe table syntax

DESCRIBE TABLE <table_name>;

Example

Describing a table

DESCRIBE TABLE taxistats;

Once the query is run, Dremio displays the following:

Column headers indicating the type of information being described
Rows for each table column with their associated attributes described, such as data types, properties, and policies

COLUMN_NAME	DATA_TYPE	IS_NULLABLE	NUMERIC_PRECISION	NUMERIC_SCALE	EXTENDED_PROPERTIES	MASKING_POLICY
pickup_datetime	TIMESTAMP	YES	null	null	[]	[]
passenger_count	BIGINT	YES	64	0	[]	count_hide
trip_distance_mi	DOUBLE	YES	53	null	[]	[]
fare_amount	DOUBLE	YES	53	null	[]	[]
tip_amount	DOUBLE	YES	53	null	[]	[]
total_amount	DOUBLE	YES	53	null	[]	[]

The cells containing a '[]' indicate "empty" values.

Altering a Table

The ALTER command updates a table’s schema in a new snapshot.

Alter command syntax

ALTER TABLE <table_path>
[ ADD COLUMNS ( <column_name1> <data_type1>( <size1> ), <column_name2> <data_type2>( <size2> ), ...) ]
| [ DROP COLUMN <column_name> ]
| [ { ALTER | MODIFY } COLUMN <source_column_name> <source_column_name> <new_data_type>( <size> ) ]
| [ REFRESH METADATA ] [ FOR PARTITIONS ( <partition_name> = '<value>' ) ]
    [ { AVOID | AUTO } PROMOTION ]
    [ { FORCE | LAZY } UPDATE ]
    [ { MAINTAIN | DELETE } WHEN MISSING ]
| [ FORGET METADATA ];

Parameters

{{< sql-section file="data/sql/tables.json" data="alteringATable" >}}

Optional Parameters

{{< sql-section file="data/sql/tables.json" data="alteringATableOptional" >}}

REFRESH METADATA

{{< api-section file="data/sql/tables.json" data="refreshOptions" >}}

Examples

note:

The following examples use Dremio’s Samples Data Lake, which includes a number of sample datasets that you can practice with.

Add a column

ALTER TABLE $scratch.demo_table ADD COLUMNS (county varchar)

Modify a column

ALTER TABLE $scratch.demo_table MODIFY COLUMN _id _id bigint

Refresh all the metadata for a table

ALTER TABLE $scratch.demo_table REFRESH METADATA

Refresh all the metadata for a table using optional clauses

ALTER TABLE $scratch.demo_table REFRESH METADATA
  AUTO PROMOTION
  LAZY UPDATE
  MAINTAIN WHEN MISSING

Refresh the metadata for a single partition

ALTER TABLE Samples."samples.dremio.com"."zips.json"  
  REFRESH METADATA FOR PARTITIONS (state = 'TX')

Refresh the metadata for a single partition using optional clauses

ALTER TABLE Samples."samples.dremio.com"."zips.json"  
  REFRESH METADATA FOR PARTITIONS (state = 'TX')
  AUTO PROMOTION
  LAZY UPDATE
  MAINTAIN WHEN MISSING

Forget the metadata for a table

ALTER TABLE $scratch.demo_table FORGET METADATA

Analyzing a Table

Computes and deletes statistics for tables, including:

Estimated number of distinct values
Number of rows
Number of null values

Syntax for analyzing a table

ANALYZE TABLE <table_name>
  FOR { ALL COLUMNS |  COLUMNS } ( <column_name1>, <column_name2>, ... )
  { COMPUTE | DELETE } STATISTICS

Parameters

{{< sql-section file="data/sql/tables.json" data="analyzingATable" >}}

Examples

Analyze a table example

ANALYZE TABLE $scratch.demo_table
  FOR ALL COLUMNS
  COMPUTE STATISTICS

Analyze the specified columns for a table

ANALYZE TABLE $scratch.demo_table
  FOR COLUMNS (state, city)
  COMPUTE STATISTICS

Dropping a Table

note:

When dropping an Apache Iceberg table:

For external Nessie data sources, the DROP TABLE command logically removes a table from the source. Even though the table is removed from the catalog, it still physically exists in storage until garbage collection removes it.
For Amazon S3 data sources, the DROP TABLE command logically removes a table from the source and physically removes all files associated with the table. After the table is dropped, it is permanently deleted and cannot be restored.
For AWS Glue data sources, the DROP TABLE command removes a table from the catalog. The datafiles are not deleted from the warehouse.

Drop table syntax

DROP TABLE <table_path>

Parameters

{{< sql-section file="data/sql/tables.json" data="droppingATable" >}}

Examples

Drop table example

DROP TABLE $scratch.demo_table

Tables

Apache Iceberg Tables​

Prerequisites for Apache Iceberg Tables​

Default Table Formats Used for New Tables​

Locations in which Iceberg Tables are Created​

Location in AWS Glue Data Sources​

Location in Amazon S3 Data Sources​

Location in Nessie Data Sources​

Sources and Privileges​

Creating a Table (CTAS)​

Parameters​

Parameters for Iceberg Tables Only​

Optional Parameters​

Examples​

Describing a Table​

Example​

Altering a Table​

Parameters​

Optional Parameters​

REFRESH METADATA​

Examples​

Analyzing a Table​

Parameters​

Examples​

Dropping a Table​

Parameters​

Examples​

Apache Iceberg Tables

Prerequisites for Apache Iceberg Tables

Default Table Formats Used for New Tables

Locations in which Iceberg Tables are Created

Location in AWS Glue Data Sources

Location in Amazon S3 Data Sources

Location in Nessie Data Sources

Sources and Privileges

Creating a Table (CTAS)

Parameters

Parameters for Iceberg Tables Only

Optional Parameters

Examples

Describing a Table

Example

Altering a Table

Parameters

Optional Parameters

REFRESH METADATA

Examples

Analyzing a Table

Parameters

Examples

Dropping a Table

Parameters

Examples