Skip to content

Convert Existing Table to Havasu Table

By default, values in GEOMETRY columns were stored as EWKB (Extended Well-Known Binary) format in the data files. The GEOMETRY column is defined as BINARY column with some Havasu specific metadata in the table metadata, so that other implementations of Apache Iceberg could still read the data as binary values. The benefit of using GEOMETRY column to store geospatial data is that it enforces the data in the geometry column be to valid geospatial data, and Havasu will collect statistics on the geometry data in the column, which could be used to optimize spatial range queries (see Cluster by geospatial fields for faster queries).

Havasu also supports storing geometry data in other formats, such as WKT, WKB, GeoJSON, etc. Though user cannot explicitly specify the format of the geometry column to formats other than EWKB. These geometry formats are designed solely for easier data migration from already existing Iceberg tables or parquet files to Havasu. If user has an Iceberg table containing serialized geometry data, they can convert the data to GEOMETRY column without re-writing the table. Plain parquet files containing serialized geometry data can also be directly loaded into Havasu without scanning the data files. Please refer to Migrating Parquet Files to Havasu for details.

If you have an existing Iceberg table containing spatial data, you can upgrade it to use GEOMETRY column by running the following command:

ALTER TABLE <table_name> SET GEOMETRY FIELDS <column_name> AS <geometry_format>;

Supported column types and geometry formats are:

Column Type Geometry Format
BINARY wkb
BINARY ewkb
STRING wkt
STRING geojson

For example, we can upgrade a table containing WKT values in geom column to use GEOMETRY column:

ALTER TABLE wherobots.db.table_name SET GEOMETRY FIELDS wkt AS 'wkt';

This command will only attach Havasu-specific metadata to the converted column and won’t rewrite the data. The data will remain in its original format (STRING column storing WKT in this case), and any geospatial data written into this table will be serialized as WKT, which make the table backward compatible. Again, the benefit is that user get strong enforcements on the data in the geometry column. Havasu will raise an error if invalid data was encountered when reading the data, and writing plain string values into the geometry column will not be allowed. If user want to write data to the upgraded Havasu table, they need to write geometry objects into the geometry column.

INSERT INTO wherobots.db.table_name (`wkt`, `id`) VALUES (ST_GeomFromWKT('POINT (1 1)'), 1);

Havasu will write new data to migrated table in its original serialized geometry format. In this example, new geometry values written to wherobots.db.table_name will also be serialized as WKT. This is for being compatible with the original data type of the geometry column.

If you want to revert the upgrade, you can run the following command:

ALTER TABLE wherobots.db.table_name SET GEOMETRY FIELDS wkt AS 'none';

The wkt column will be converted back to STRING column, and user could write any plain string values into this column.

Note

The GEOMETRY column is converted in this way does not have any Havasu-specific spatial statistics metadata, so Havasu will not be able to optimize spatial range queries on this column. If you want Havasu to be able to optimize spatial range queries on this column, you need to rewrite the table using Iceberg rewrite_data_files procedure or CREATE SPATIAL INDEX FOR statement. Please refer to Cluster by geospatial fields for faster queries for details.


Last update: October 3, 2023 07:45:18