Migrate Hive tables to Iceberg in Trino

Trino version 411 introduced the migrate procedure in the Iceberg connector. This procedure converts existing Hive tables in ORC, Parquet, or Avro format to Iceberg tables. This article explains how the procedure works. If you currently use the CREATE TABLE AS SELECT statement to convert Hive tables to Iceberg, I recommend trying this procedure instead. It is much faster because it doesn’t rewrite the data files.
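
For comparison, a CTAS-based conversion copies every row into a brand-new Iceberg table. A minimal sketch, assuming both a hive and an iceberg catalog are configured and using placeholder schema and table names, might look like this:

CREATE TABLE iceberg.testdb.customer_orders_iceberg
WITH (format = 'PARQUET')
AS SELECT * FROM hive.testdb.customer_orders;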

The procedure takes three arguments: schema_name, table_name, and an optional recursive_directory. The possible values for recursive_directory are true, false, and fail: true migrates files in nested directories, false ignores them, and fail, the default, throws an exception if there are nested directories under the table or partition location.

CALL iceberg.system.migrate(
  schema_name => 'testdb', 
  table_name => 'customer_orders', 
  recursive_directory => 'true'
);
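
Because recursive_directory is optional, it can be omitted when the table location has no nested directories; the call then falls back to the default fail behavior:

CALL iceberg.system.migrate(
  schema_name => 'testdb',
  table_name => 'customer_orders'
);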

Here’s how the procedure works:

  1. It generates an Iceberg schema object based on the Hive table definition.
  2. It scans the table or partition location to create Iceberg metadata files:
    1. Builds Iceberg metrics (per-file column statistics such as value counts and lower/upper bounds).
    2. Builds Iceberg DataFile objects.
  3. It updates the table definition in the metastore.

All the logic can be found in MigrateProcedure.java if you want to review the code.
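
After the call completes, you can sanity-check the result from Trino itself. As a rough example using the table from above, SHOW CREATE TABLE should now report Iceberg table properties, and the $files metadata table exposes the data file entries the procedure registered:

SHOW CREATE TABLE iceberg.testdb.customer_orders;

SELECT file_path, file_format, record_count
FROM iceberg.testdb."customer_orders$files";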

Limitations

  • The procedure scans files sequentially. If the target table has many files, the migration may take a long time.
  • Work is in progress to support migrating Delta Lake tables to Iceberg: https://github.com/trinodb/trino/pull/17131. There are currently no plans to support other formats (such as Hudi).