COPY INTO <table>
You can load data from CSV or JSON files that are in a source into an existing Apache Iceberg table.
For information about how the COPY INTO <table> operation works, as well as the supported storage locations and requirements, see Copying Data Into Apache Iceberg Tables.
Syntax
SyntaxCOPY INTO <table_name>
FROM '@<storage_location_name>[/<path>[/<file_name>] ]'
[ FILES ( '<file_name>' [ , ... ] ) |
REGEX '<regex_pattern>' ]
[ FILE_FORMAT 'csv' | 'json' ]
[ ( [csv_format_options] | [json_format_options] ) ]
DATE_FORMAT '<string>' | TIME_FORMAT '<string>' | TIMESTAMP_FORMAT '<string>' | TRIM_SPACE [ '<boolean>' ] | NULL_IF ( '<string>' [, ...] ) | RECORD_DELIMITER '<character>' | FIELD_DELIMITER '<character>' | QUOTE_CHAR '<character>' | ESCAPE_CHAR '<escape_character>' | EMPTY_AS_NULL [ '<boolean>' ] [, ...]
DATE_FORMAT '<string>' | TIME_FORMAT '<string>' | TIMESTAMP_FORMAT '<string>' | TRIM_SPACE [ '<boolean>' ] | NULL_IF ( '<string>' [, ...] ) [, ...]
Parameters
{{< sql-section file="data/sql/copy-into-cloud.json" data="mainSyntax" >}}
CSV Format Options
{{< sql-section file="data/sql/copy-into-cloud.json" data="csvFormatOptions" >}}
JSON Format Options
{{< sql-section file="data/sql/copy-into-cloud.json" data="jsonFormatOptions" >}}
Type Coercion
| Source Data Type | Target Dremio Data Type |
|---|---|
| String | BIGINT, BOOLEAN, DATE, DECIMAL, DOUBLE, FLOAT, INT, TIME, TIMESTAMP, VARCHAR |
| Integer | BIGINT, DECIMAL, DOUBLE, FLOAT, INT, VARCHAR |
| Floating-point numbers | DECIMAL, DOUBLE, FLOAT, VARCHAR |
| Boolean | BOOLEAN, VARCHAR |
| Object (JSON only) | STRUCT |
| Array (JSON only) | LIST |
Although the TIMESTAMP data type is supported as a target schema data type, TIMESTAMP with time zone is not supported.
Output
The command returns this output: | Column name | Description | | --- | --- | | Rows Inserted | The number of rows loaded from the source data files. |
Examples
Copy all files present in a folderCOPY INTO context.MyTable
FROM '@SOURCE/bucket/path/folder/'
FILE_FORMAT 'json'
COPY INTO context.myTable
FROM '@SOURCE/bucket/path/folder/file1.json'
COPY INTO context.myTable
FROM '@SOURCE/bucket/path/folder'
REGEX '.*.csv'
COPY INTO 'context.myTable'
FROM '@SOURCE/bucket/path/folder'
REGEX '^2020-11-1[2-3]/.*/'
FILE_FORMAT 'json'
COPY INTO context.myTable
FROM '@SOURCE/bucket/path/folder'
REGEX 'part.*'
FILE_FORMAT 'json'
COPY INTO context.myTable
FROM '@SOURCE/bucket/path/folder'
FILES ('foo.csv', 'dir/bar.csv')
COPY INTO context.myTable
FROM '@SOURCE/bucket/path/folder'
FILES ('date_time_data_file1.csv', 'date_time_data_file2.csv')
(DATE_FORMAT 'DD-MM-YYYY', TIME_FORMAT 'HH24:MI:SS')
COPY INTO context.myTable
FROM '@SOURCE/bucket/path/folder'
FILES ('file1.csv', 'file2.csv', 'file3.csv')
(NULL_IF ('None', 'NA'))
COPY INTO context.myTable
FROM '@SOURCE/bucket/path/folder'
(TRIM_SPACE 'true', EMPTY_AS_NULL 'true')
COPY INTO context.myTable
FROM '@SOURCE/bucket/path/folder'
FILE_FORMAT 'csv'
(RECORD_DELIMITER '\n', FIELD_DELIMITER '\t')
COPY INTO context.myTable
FROM '@SOURCE/bucket/path/folder'
FILES ('fileName.csv')
(ESCAPE_CHAR '|')