Version: 0.0.1

Creating Apache Iceberg Tables

The CREATE TABLE command creates Apache Iceberg tables in Amazon Glue datasources, Amazon S3 datasources, or external Nessie datasources.

Prerequisites

Before you attempt to create Iceberg tables, ensure that you are using an Amazon Glue, Amazon S3, or external Nessie datasource.

Default Table Formats Used for New Tables

Beginning with Dremio v22.0:

  • Amazon Glue datasources added to projects default to using the Apache Iceberg table format.

  • Amazon S3 datasources added to projects default to using the Apache Parquet table format. Follow these steps to ensure that the default table format for new tables is Apache Iceberg:

    1. In Dremio, click the Amazon S3 datasource.
    2. Click the gear icon in the top-right corner above the list of the datasource's contents.
    3. On the Advanced Options page of the Edit Source dialog, select ICEBERG under Default CTAS Format.
    4. Click Save.

Amazon Glue datasources added to projects before Dremio v22.0 are modified by Dremio to default to the Apache Iceberg table format.

Amazon S3 datasources added to projects before Dremio v22.0 continue to use the Parquet table format for new tables. For the SQL commands that you can use to create and query tables in such datasources, see Tables.

Locations in which Iceberg Tables are Created

Where the CREATE TABLE command creates a table depends on the type of datasource being used.

Location in Amazon Glue Datasources

The root directory is assumed by default to be /user/hive/warehouse.

If you want to create tables in a different location, you must specify the S3 address of an Amazon S3 bucket in which to create them:

  1. In Dremio, click the Amazon Glue datasource.
  2. Click the gear icon in the top-right corner above the list of the datasource's contents.
  3. On the Advanced Options page of the Edit Source dialog, add this connection property: hive.metastore.warehouse.dir
  4. Set the value to the S3 address of an S3 bucket.

The schema path and table name are appended to the root location to determine the default physical location for a new table. For example, this CREATE TABLE command creates the table table_A in the directory <rootdir>/database/schema/table_A:

{{< codeheader "CREATE TABLE example" >}}

CREATE TABLE database.schema.table_A

Location in Amazon S3 Datasources

The root physical location is the main root directory for the filesystem. The path and table name are appended to this location to determine the physical location for a new table.

For example, this CREATE TABLE command creates the table table_A in the directory <rootdir>/folder1/folder2/table_A:

{{< codeheader "CREATE TABLE example" >}}

CREATE TABLE <Amazon_S3_datasource>.folder1.folder2.table_A
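
As a fuller sketch of the same rule, the statement below adds a column list so that it matches the syntax given in the Syntax section. The source name myAmazonS3Source and the column names are hypothetical, and the path in the comment is what the rule above implies rather than output captured from a running system.

{{< codeheader "CREATE TABLE example with a column list (illustrative)" >}}

```sql
-- Hypothetical source and column names, used only for illustration.
-- Per the rule above, the table is expected to be created under
-- <rootdir>/folder1/folder2/table_A.
CREATE TABLE myAmazonS3Source.folder1.folder2.table_A
( id INT, created_on DATE )
```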

Location in Nessie Datasources

Top-level Nessie schemas have configurable physical storage, which is used as the default root physical location.

In the project store, each top-level Nessie schema has its own directory path. For example, in the project's Nessie, the top-level schema "marketing" would be located in "project_store/marketing", and this directory would be used by default as the root physical location. From there, the same schema and table-name resolution described above for Amazon Glue datasources applies.
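
The following is a minimal sketch of how this might look, assuming a Nessie datasource named myNessieSource. That name, the table name, and the columns are hypothetical. The path in the comment follows from the description above and assumes the default root location has not been reconfigured.

{{< codeheader "CREATE TABLE example (illustrative)" >}}

```sql
-- myNessieSource, campaigns, and the column names are hypothetical.
-- Assuming the default root location described above, the table is
-- expected to be created under project_store/marketing/campaigns.
CREATE TABLE myNessieSource.marketing.campaigns
( campaign_id INT, launch_date DATE )
```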

Syntax

{{< codeheader "CREATE TABLE syntax" >}}

CREATE TABLE [IF NOT EXISTS] <table_path>.<table_name>
( <column_spec> [, ...] )
[ PARTITION BY ( [<column_name>|<partition_transform>] [ , ... ] ) ]
[ LOCALSORT BY (<column_name>) ]
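
To show how the optional clauses fit together, here is a sketch that uses IF NOT EXISTS, PARTITION BY, and LOCALSORT BY in one statement. The source, folder, table, and column names are all hypothetical, and the data types are assumed to be available in your Dremio version.

{{< codeheader "CREATE TABLE with optional clauses (illustrative)" >}}

```sql
-- All names below are hypothetical.
-- IF NOT EXISTS: do not fail if the table already exists.
CREATE TABLE IF NOT EXISTS myAmazonS3Source.myFolder.orders
( order_id INT, region VARCHAR, order_date DATE )
-- Partition the table by the values of the region column.
PARTITION BY ( region )
-- Sort rows by order_id within each written file.
LOCALSORT BY ( order_id )
```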

Parameters

{{< sql-section file="data/sql/apache-iceberg-tables.json" data="createTable" >}}

Examples

{{< codeheader "Creating a table as SELECT * from another table" >}}

create table myAmazonS3Source.myFolder.myTable as select * from myAmazonS3Source.anotherFolder.anotherTable

{{< codeheader "Creating a table and partitioning it by month" >}}

create table myTable (col1 int, col2 date) partition by (month(col2))
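
The PARTITION BY clause accepts a comma-separated list, so a plain column and a partition transform can be combined. The sketch below assumes hypothetical table and column names and reuses the month transform from the previous example.

{{< codeheader "Creating a table and partitioning it by a column and by month (illustrative)" >}}

```sql
-- hypothetical source, folder, table, and column names
create table myAmazonS3Source.myFolder.events (region varchar, event_date date)
partition by (region, month(event_date))
```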