Tables
The SQL commands for tables enable you to create and work with tables in the scratch directory and in data lake sources (such as Amazon S3). Apache Iceberg tables are created in AWS Glue data sources, Amazon S3 data sources, or external Nessie data sources.
Tables created in these sources can be queried, altered, analyzed, and dropped by users and groups with the appropriate permissions. For information about querying tables (including Apache Iceberg tables), see SELECT.
Apache Iceberg Tables
Prerequisites for Apache Iceberg Tables
Before you attempt to create Iceberg tables, ensure that you are using an AWS Glue, Amazon S3, or external Nessie data source. Also ensure that any existing Apache Iceberg tables that you want to use in your data source are at Apache Iceberg version 2.
Default Table Formats Used for New Tables
AWS Glue data sources added to projects default to using the Apache Iceberg table format.
Amazon S3 data sources added to projects default to using the Apache Parquet table format. Follow these steps to ensure that the default table format for new tables is Apache Iceberg:
- In Dremio, click the Amazon S3 data source.
- Click the gear icon in the top-right corner above the list of the data source's contents.
- On the Advanced Options page of the Edit Source dialog, select ICEBERG under Default CTAS Format.
- Click Save.
Locations in which Iceberg Tables are Created
Where the CREATE TABLE command creates a table depends on the type of data source being used.
Location in AWS Glue Data Sources
The root directory is assumed by default to be /user/hive/warehouse.
If you want to create tables in a different location, you must specify the S3 address of an Amazon S3 bucket in which to create them:
- In Dremio Cloud, click the AWS Glue data source.
- Click the gear icon in the top-right corner above the list of the data source's contents.
- On the Advanced Options page of the Edit Source dialog, add this connection property:
hive.metastore.warehouse.dir - Set the value to the S3 address of an S3 bucket.
To the root location are appended the schema path and table name to determine the default physical location for a new table. For example, this CREATE TABLE command creates the table table_A in the directory <rootdir>/database/schema/table_A
CREATE TABLE database.schema.table_A
Location in Amazon S3 Data Sources
The root physical location is the main root directory for the filesystem. From this location, the path and table name are appended to determine the physical location for a new table.
For example, this CREATE TABLE command creates the table table_A in the directory rootdir/folder1/folder2/table_A:
CREATE TABLE <Amazon_S3_data_source>.folder1.folder2.table_A
Location in Nessie Data Sources
Top-level Nessie schemas have a configurable physical storage. This is used as the default root physical location.
In the project store each top level Nessie schema has its own directory path. So for example in the project’s Nessie the top level schema “marketing” would be located in “project_store/marketing” and this directory would be used by default as the root physical location. From there, the same schema.table resolution as described for Hive above would apply.
Sources and Privileges
In order to use these SQL commands, the appropriate privileges must be set for each data source that you want to work with, including the spaces of other users in your project and the connected data lake sources. For information about setting privileges, see Privileges.
Creating a Table (CTAS)
Tables that are created using the CREATE TABLE AS (CTAS) command are in Parquet file format, which can be formatted to a table.
CREATE TABLE [IF NOT EXISTS] <table_path>
[ PARTITION BY ( <column_name1>, <column_name2>, ... ) ]
[LOCALSORT BY ( <column_name> ) ]
AS <query>
Parameters
{{< sql-section file="data/sql/tables.json" data="creatingATable" >}}
Parameters for Iceberg Tables Only
{{< sql-section file="data/sql/tables.json" data="creatingAnIcebergTable" >}}
Optional Parameters
{{< sql-section file="data/sql/tables.json" data="creatingATableOptional" >}}
Examples
The following examples use Dremio’s Samples Data Lake, which includes a number of sample datasets that you can practice with.
CREATE TABLE $scratch.demo_table AS
SELECT *
FROM Samples."samples.dremio.com"."zips.json";
CREATE TABLE $scratch.demo_table2
PARTITION BY (state)
LOCALSORT BY (city)
AS SELECT *
FROM Samples."samples.dremio.com"."zips.json";
Describing a Table
The DESCRIBE TABLE command is used to provide high-level information regarding the overall column properties of an existing dataset.
DESCRIBE TABLE <table_name>;
Example
Describing a tableDESCRIBE TABLE taxistats;
Once the query is run, Dremio displays the following:
- Column headers indicating the type of information being described
- Rows for each table column with their associated attributes described, such as data types, properties, and policies
| COLUMN_NAME | DATA_TYPE | IS_NULLABLE | NUMERIC_PRECISION | NUMERIC_SCALE | EXTENDED_PROPERTIES | MASKING_POLICY |
|---|---|---|---|---|---|---|
| pickup_datetime | TIMESTAMP | YES | null | null | [] | [] |
| passenger_count | BIGINT | YES | 64 | 0 | [] | count_hide |
| trip_distance_mi | DOUBLE | YES | 53 | null | [] | [] |
| fare_amount | DOUBLE | YES | 53 | null | [] | [] |
| tip_amount | DOUBLE | YES | 53 | null | [] | [] |
| total_amount | DOUBLE | YES | 53 | null | [] | [] |
The cells containing a '[]' indicate "empty" values.
Altering a Table
The ALTER command updates a table’s schema in a new snapshot.
ALTER TABLE <table_path>
[ ADD COLUMNS ( <column_name1> <data_type1>( <size1> ), <column_name2> <data_type2>( <size2> ), ...) ]
| [ DROP COLUMN <column_name> ]
| [ { ALTER | MODIFY } COLUMN <source_column_name> <source_column_name> <new_data_type>( <size> ) ]
| [ REFRESH METADATA ] [ FOR PARTITIONS ( <partition_name> = '<value>' ) ]
[ { AVOID | AUTO } PROMOTION ]
[ { FORCE | LAZY } UPDATE ]
[ { MAINTAIN | DELETE } WHEN MISSING ]
| [ FORGET METADATA ];
Parameters
{{< sql-section file="data/sql/tables.json" data="alteringATable" >}}
Optional Parameters
{{< sql-section file="data/sql/tables.json" data="alteringATableOptional" >}}
REFRESH METADATA
{{< api-section file="data/sql/tables.json" data="refreshOptions" >}}
Examples
The following examples use Dremio’s Samples Data Lake, which includes a number of sample datasets that you can practice with.
ALTER TABLE $scratch.demo_table ADD COLUMNS (county varchar)
ALTER TABLE $scratch.demo_table MODIFY COLUMN _id _id bigint
ALTER TABLE $scratch.demo_table REFRESH METADATA
ALTER TABLE $scratch.demo_table REFRESH METADATA
AUTO PROMOTION
LAZY UPDATE
MAINTAIN WHEN MISSING
ALTER TABLE Samples."samples.dremio.com"."zips.json"
REFRESH METADATA FOR PARTITIONS (state = 'TX')
ALTER TABLE Samples."samples.dremio.com"."zips.json"
REFRESH METADATA FOR PARTITIONS (state = 'TX')
AUTO PROMOTION
LAZY UPDATE
MAINTAIN WHEN MISSING
ALTER TABLE $scratch.demo_table FORGET METADATA
Analyzing a Table
Computes and deletes statistics for tables, including:
- Estimated number of distinct values
- Number of rows
- Number of null values
ANALYZE TABLE <table_name>
FOR { ALL COLUMNS | COLUMNS } ( <column_name1>, <column_name2>, ... )
{ COMPUTE | DELETE } STATISTICS
Parameters
{{< sql-section file="data/sql/tables.json" data="analyzingATable" >}}
Examples
Analyze a table exampleANALYZE TABLE $scratch.demo_table
FOR ALL COLUMNS
COMPUTE STATISTICS
ANALYZE TABLE $scratch.demo_table
FOR COLUMNS (state, city)
COMPUTE STATISTICS
Dropping a Table
When dropping an Apache Iceberg table:
- For external Nessie data sources, the DROP TABLE command logically removes a table from the source. Even though the table is removed from the catalog, it still physically exists in storage until garbage collection removes it.
- For Amazon S3 data sources, the DROP TABLE command logically removes a table from the source and physically removes all files associated with the table. After the table is dropped, it is permanently deleted and cannot be restored.
- For AWS Glue data sources, the DROP TABLE command removes a table from the catalog. The datafiles are not deleted from the warehouse.
DROP TABLE <table_path>
Parameters
{{< sql-section file="data/sql/tables.json" data="droppingATable" >}}
Examples
Drop table exampleDROP TABLE $scratch.demo_table