Tables

The SQL commands for tables enable you to create and work with tables in the scratch directory and in data lake sources (such as Amazon S3). Apache Iceberg tables are created in AWS Glue data sources, Amazon S3 data sources, or external Nessie data sources.

Tables created in these sources can be queried, altered, analyzed, and dropped by users and groups with the appropriate permissions. For information about querying tables (including Apache Iceberg tables), see SELECT.

Apache Iceberg Tables

Prerequisites for Apache Iceberg Tables

Before you attempt to create Iceberg tables, ensure that you are using an AWS Glue, Amazon S3, or external Nessie data source. Also ensure that any existing Apache Iceberg tables that you want to use in your data source are at Apache Iceberg version 2.

Default Table Formats Used for New Tables

AWS Glue data sources added to projects default to using the Apache Iceberg table format.

Amazon S3 data sources added to projects default to using the Apache Parquet table format. Follow these steps to ensure that the default table format for new tables is Apache Iceberg:

  1. In Dremio, click the Amazon S3 data source.
  2. Click the gear icon in the top-right corner above the list of the data source's contents.
  3. On the Advanced Options page of the Edit Source dialog, select ICEBERG under Default CTAS Format.
  4. Click Save.

Locations in which Iceberg Tables are Created

Where the CREATE TABLE command creates a table depends on the type of data source being used.

Location in AWS Glue Data Sources

The root directory is assumed by default to be /user/hive/warehouse.

If you want to create tables in a different location, you must specify the S3 address of an Amazon S3 bucket in which to create them:

  1. In Dremio Cloud, click the AWS Glue data source.
  2. Click the gear icon in the top-right corner above the list of the data source's contents.
  3. On the Advanced Options page of the Edit Source dialog, add this connection property: hive.metastore.warehouse.dir
  4. Set the value to the S3 address of an S3 bucket.

The schema path and table name are appended to the root location to determine the default physical location for a new table. For example, this CREATE TABLE command creates the table table_A in the directory <rootdir>/database/schema/table_A:

CREATE TABLE command
CREATE TABLE database.schema.table_A

Location in Amazon S3 Data Sources

The root physical location is the root directory of the filesystem. The path and table name are appended to this location to determine the physical location for a new table.

For example, this CREATE TABLE command creates the table table_A in the directory rootdir/folder1/folder2/table_A:

CREATE TABLE command
CREATE TABLE <Amazon_S3_data_source>.folder1.folder2.table_A

Location in Nessie Data Sources

Top-level Nessie schemas have configurable physical storage, which is used as the default root physical location.

In the project store, each top-level Nessie schema has its own directory path. For example, in the project's Nessie catalog, the top-level schema marketing would be located in project_store/marketing, and that directory would be used by default as the root physical location. From there, the same schema-and-table path resolution described for AWS Glue data sources above applies.
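As a sketch of this resolution (the schema and table names here are hypothetical), creating a table under the top-level schema marketing would place its files under that schema's directory:

Create a table in a Nessie source
CREATE TABLE marketing.summer_campaigns AS
SELECT * FROM Samples."samples.dremio.com"."zips.json";

By default, the files for this table would be written to project_store/marketing/summer_campaigns.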

Sources and Privileges

To use these SQL commands, you must have the appropriate privileges for each data source that you want to work with, including the spaces of other users in your project and the connected data lake sources. For information about setting privileges, see Privileges.

Creating a Table (CTAS)

Tables created with the CREATE TABLE AS (CTAS) command are written in the Parquet file format; the resulting files can be formatted (promoted) as a table.

CREATE TABLE AS command
CREATE TABLE [IF NOT EXISTS] <table_path>
[ PARTITION BY ( <column_name1>, <column_name2>, ... ) ]
[ LOCALSORT BY ( <column_name> ) ]
AS <query>

Parameters

{{< sql-section file="data/sql/tables.json" data="creatingATable" >}}

Parameters for Iceberg Tables Only

{{< sql-section file="data/sql/tables.json" data="creatingAnIcebergTable" >}}

Optional Parameters

{{< sql-section file="data/sql/tables.json" data="creatingATableOptional" >}}

Examples

note:

The following examples use Dremio’s Samples Data Lake, which includes a number of sample datasets that you can practice with.

Create table in your project store
CREATE TABLE $scratch.demo_table AS
SELECT *
FROM Samples."samples.dremio.com"."zips.json";
Create a table in your project store using PARTITION BY and LOCALSORT
CREATE TABLE $scratch.demo_table2
PARTITION BY (state)
LOCALSORT BY (city)
AS SELECT *
FROM Samples."samples.dremio.com"."zips.json";
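The CTAS syntax above also accepts the IF NOT EXISTS clause, which prevents an error if a table with the same path already exists. A minimal sketch, reusing the sample dataset:

Create a table only if it does not already exist
CREATE TABLE IF NOT EXISTS $scratch.demo_table AS
SELECT *
FROM Samples."samples.dremio.com"."zips.json";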

Describing a Table

The DESCRIBE TABLE command provides high-level information about the column properties of an existing dataset.

Describe table syntax
DESCRIBE TABLE <table_name>;

Example

Describing a table
DESCRIBE TABLE taxistats;

Once the query is run, Dremio displays the following:

  • Column headers indicating the type of information being described
  • Rows for each table column with their associated attributes described, such as data types, properties, and policies
COLUMN_NAME       DATA_TYPE  IS_NULLABLE  NUMERIC_PRECISION  NUMERIC_SCALE  EXTENDED_PROPERTIES  MASKING_POLICY
pickup_datetime   TIMESTAMP  YES          null               null           []                   []
passenger_count   BIGINT     YES          64                 0              []                   count_hide
trip_distance_mi  DOUBLE     YES          53                 null           []                   []
fare_amount       DOUBLE     YES          53                 null           []                   []
tip_amount        DOUBLE     YES          53                 null           []                   []
total_amount      DOUBLE     YES          53                 null           []                   []

Cells containing '[]' indicate empty values.

Altering a Table

The ALTER TABLE command updates a table's schema in a new snapshot.

Alter command syntax
ALTER TABLE <table_path>
[ ADD COLUMNS ( <column_name1> <data_type1>( <size1> ), <column_name2> <data_type2>( <size2> ), ...) ]
| [ DROP COLUMN <column_name> ]
| [ { ALTER | MODIFY } COLUMN <column_name> <new_column_name> <new_data_type>( <size> ) ]
| [ REFRESH METADATA ] [ FOR PARTITIONS ( <partition_name> = '<value>' ) ]
[ { AVOID | AUTO } PROMOTION ]
[ { FORCE | LAZY } UPDATE ]
[ { MAINTAIN | DELETE } WHEN MISSING ]
| [ FORGET METADATA ];

Parameters

{{< sql-section file="data/sql/tables.json" data="alteringATable" >}}

Optional Parameters

{{< sql-section file="data/sql/tables.json" data="alteringATableOptional" >}}

REFRESH METADATA

{{< api-section file="data/sql/tables.json" data="refreshOptions" >}}

Examples

note:

The following examples use Dremio’s Samples Data Lake, which includes a number of sample datasets that you can practice with.

Add a column
ALTER TABLE $scratch.demo_table ADD COLUMNS (county varchar)
Modify a column
ALTER TABLE $scratch.demo_table MODIFY COLUMN _id _id bigint
Refresh all the metadata for a table
ALTER TABLE $scratch.demo_table REFRESH METADATA
Refresh all the metadata for a table using optional clauses
ALTER TABLE $scratch.demo_table REFRESH METADATA
AUTO PROMOTION
LAZY UPDATE
MAINTAIN WHEN MISSING
Refresh the metadata for a single partition
ALTER TABLE Samples."samples.dremio.com"."zips.json"  
REFRESH METADATA FOR PARTITIONS (state = 'TX')
Refresh the metadata for a single partition using optional clauses
ALTER TABLE Samples."samples.dremio.com"."zips.json"  
REFRESH METADATA FOR PARTITIONS (state = 'TX')
AUTO PROMOTION
LAZY UPDATE
MAINTAIN WHEN MISSING
Forget the metadata for a table
ALTER TABLE $scratch.demo_table FORGET METADATA

Analyzing a Table

The ANALYZE TABLE command computes or deletes statistics for tables, including:

  • Estimated number of distinct values
  • Number of rows
  • Number of null values
Syntax for analyzing a table
ANALYZE TABLE <table_name>
FOR { ALL COLUMNS | COLUMNS } ( <column_name1>, <column_name2>, ... )
{ COMPUTE | DELETE } STATISTICS

Parameters

{{< sql-section file="data/sql/tables.json" data="analyzingATable" >}}

Examples

Analyze a table example
ANALYZE TABLE $scratch.demo_table
FOR ALL COLUMNS
COMPUTE STATISTICS
Analyze the specified columns for a table
ANALYZE TABLE $scratch.demo_table
FOR COLUMNS (state, city)
COMPUTE STATISTICS
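The syntax above also supports DELETE STATISTICS, which removes previously computed statistics for a table. A sketch:

Delete the statistics for a table
ANALYZE TABLE $scratch.demo_table
FOR ALL COLUMNS
DELETE STATISTICS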

Dropping a Table

note:

When dropping an Apache Iceberg table:

  • For external Nessie data sources, the DROP TABLE command logically removes a table from the source. Even though the table is removed from the catalog, it still physically exists in storage until garbage collection removes it.
  • For Amazon S3 data sources, the DROP TABLE command logically removes a table from the source and physically removes all files associated with the table. After the table is dropped, it is permanently deleted and cannot be restored.
  • For AWS Glue data sources, the DROP TABLE command removes a table from the catalog. The datafiles are not deleted from the warehouse.
Drop table syntax
DROP TABLE <table_path>

Parameters

{{< sql-section file="data/sql/tables.json" data="droppingATable" >}}

Examples

Drop table example
DROP TABLE $scratch.demo_table