Skip to content

Raster Support in Havasu

Havasu supports storing raster data as in-db or out-db rasters, and allows users to use raster functions in Sedona for manipulating raster data. This document describes how to use raster data type and raster functions in Havasu.

Raster as a primitive type

Besides the primitive types supported by Apache Iceberg, Havasu introduced a new data type raster to represent raster data. For instance, user can create a table with a raster column using SQL:

sedona.sql("""
CREATE TABLE IF NOT EXISTS wherobots.test_db.test_table (
    id bigint,
    data string,
    rast raster
)""")
sedona.sql("""
CREATE TABLE IF NOT EXISTS wherobots.test_db.test_table (
    id bigint,
    data string,
    rast raster
)""")
sedona.sql(
    "CREATE TABLE IF NOT EXISTS wherobots.test_db.test_table (" +
    "id bigint," +
    "data string," +
    "rast raster" +
    ")"
);

This will create an empty table with a raster column rast. We can inspect the table schema using DESCRIBE command:

sedona.sql("DESCRIBE TABLE wherobots.test_db.test_table").show()
sedona.sql("DESCRIBE TABLE wherobots.test_db.test_table").show()
sedona.sql("DESCRIBE TABLE wherobots.test_db.test_table").show();
+--------+---------+-------+
|col_name|data_type|comment|
+--------+---------+-------+
|      id|   bigint|   null|
|    data|   string|   null|
|    rast|   raster|   null|
+--------+---------+-------+

Or using .printSchema() function:

sedona.table("wherobots.test_db.test_table").printSchema()
sedona.table("wherobots.test_db.test_table").printSchema()
sedona.table("wherobots.test_db.test_table").printSchema();
root
 |-- id: long (nullable = true)
 |-- data: string (nullable = true)
 |-- rast: raster (nullable = true)

Creating table with raster column

As mentioned above, user can create a Havasu table using SQL:

sedona.sql("""
CREATE TABLE IF NOT EXISTS wherobots.test_db.test_table (
    id bigint,
    data string,
    rast raster
)""")
sedona.sql("""
CREATE TABLE IF NOT EXISTS wherobots.test_db.test_table (
    id bigint,
    data string,
    rast raster
)""")
sedona.sql(
    "CREATE TABLE IF NOT EXISTS wherobots.test_db.test_table (" +
    "id bigint," +
    "data string," +
    "rast raster" +
    ")"
);

If user has a DataFrame containing raster data, they can also create a Havasu table using the DataFrame:

# Create a DataFrame containing raster data
df_binary = sedona.read.format("binaryFile")\
    .option("pathGlobFilter", "*.tif")\
    .option("recursiveFileLookup", "true")\
    .load('s3a://wherobots-examples/data/eurosat_small')
df_geotiff = df_binary.withColumn("rast", expr("RS_FromGeoTiff(content)"))\
    .selectExpr("rast", "path AS data")

# Create a Havasu table using the DataFrame
df_geotiff.writeTo("wherobots.test_db.test_table").create()
// Create a DataFrame containing raster data
val df_binary = sedona.read.format("binaryFile")
    .option("pathGlobFilter", "*.tif")
    .option("recursiveFileLookup", "true")
    .load("s3a://wherobots-examples/data/eurosat_small")

val df_geotiff = df_binary.withColumn("rast", expr("RS_FromGeoTiff(content)"))
    .selectExpr("rast", "path AS data")

// Create a Havasu table using the DataFrame
df_geotiff.writeTo("wherobots.test_db.test_table").create()
// Create a DataFrame containing raster data
Dataset<Row> df_binary = sedona.read().format("binaryFile")
    .option("pathGlobFilter", "*.tif")
    .option("recursiveFileLookup", "true")
    .load("s3a://wherobots-examples/data/eurosat_small");

Dataset<Row> df_geotiff = df_binary.withColumn("rast", expr("RS_FromGeoTiff(content)"))
    .selectExpr("rast", "path AS data");

// Create a Havasu table using the DataFrame
df_geotiff.writeTo("wherobots.test_db.test_table").create();

Writing data

Writing DataFrame to Havasu table

Users can write a DataFrame containing geometry data to a Havasu table:

df_geotiff.writeTo("wherobots.test_db.test_table").append()
df_geotiff.writeTo("wherobots.test_db.test_table").append()
df_geotiff.writeTo("wherobots.test_db.test_table").append();

ACID Transactions

Havasu supports all table update statements supported by Apache Iceberg, such as UPDATE, DELETE and MERGE INTO. The syntax is identical to the open source Apache Iceberg, please refer to Apache Iceberg - Write with SQL for more information.

Querying data

User can load data from a Havasu table using sedona.table(...):

df = sedona.table("wherobots.test_db.test_table")
val df = sedona.table("wherobots.test_db.test_table")
Dataset<Row> df = sedona.table("wherobots.test_db.test_table");

You can apply some configurations for reading the table, such as the split size if you want to read the table into a DataFrame with more partitions:

df = sedona.read.option('split-size', '1000').table("wherobots.test_db.test_table")
val df = sedona.read.option('split-size', '1000').table("wherobots.test_db.test_table")
Dataset<Row> df = sedona.read().option('split-size', '1000').table("wherobots.test_db.test_table");

You can run spatial range query on the table using Sedona SQL:

df = sedona.sql("""
SELECT * FROM wherobots.test_db.test_table
WHERE RS_Intersects(rast, ST_GeomFromText('POLYGON ((0 0, 0 2, 2 2, 2 0, 0 0))'))
""")
val df = sedona.sql("""
SELECT * FROM wherobots.test_db.test_table
WHERE RS_Intersects(rast, ST_GeomFromText('POLYGON ((0 0, 0 2, 2 2, 2 0, 0 0))'))
""")
Dataset<Row> df = sedona.sql(
    "SELECT * FROM wherobots.test_db.test_table " +
    "WHERE RS_Intersects(rast, ST_GeomFromText('POLYGON ((0 0, 0 2, 2 2, 2 0, 0 0))'))"
);

Or using the DataFrame API:

df = sedona.table("wherobots.test_db.test_table")\
    .where("RS_Intersects(rast, ST_GeomFromText('POLYGON ((0 0, 0 2, 2 2, 2 0, 0 0))'))")
val df = sedona.table("wherobots.test_db.test_table")
    .where("RS_Intersects(rast, ST_GeomFromText('POLYGON ((0 0, 0 2, 2 2, 2 0, 0 0))'))")
Dataset<Row> df = sedona.table("wherobots.test_db.test_table")
    .where("RS_Intersects(rast, ST_GeomFromText('POLYGON ((0 0, 0 2, 2 2, 2 0, 0 0))'))");

Users can also load a Havasu table by specifying the name of the data source explicitly using .format("havasu.iceberg"), this will load an isolated table reference that will not automatically refresh tables used by queries.

df = sedona.read.format("havasu.iceberg").load("wherobots.test_db.test_table")
val df = sedona.read.format("havasu.iceberg").load("wherobots.test_db.test_table")
Dataset<Row> df = sedona.read().format("havasu.iceberg").load("wherobots.test_db.test_table");

Havasu is capable of optimizing raster spatial range queries using data skipping. This feature allows user to skip reading data files that don’t contain data that satisfy the spatial filter. Please refer to Spatial Filter Push-down for Rasters for more information.

Working with raster data

Data in columns with raster type will be loaded as RasterUDT values in Sedona, user can use any RS_ funtions provided by Sedona to manipulate the raster data. For example, user can use RS_MapAlgebra to create a new raster by applying a map algebra expression to the raster column:

sedona.sql("SELECT RS_MapAlgebra(rast, NULL, 'out = rast[0] * 0.5;') FROM wherobots.test_db.test_table").show()
sedona.sql("SELECT RS_MapAlgebra(rast, NULL, 'out = rast[0] * 0.5;') FROM wherobots.test_db.test_table").show()
sedona.sql("SELECT RS_MapAlgebra(rast, NULL, 'out = rast[0] * 0.5;') FROM wherobots.test_db.test_table").show();

The resulting DataFrame can also be written back to a Havasu table using a CTAS statement:

sedona.sql("""
CREATE TABLE wherobots.test_db.test_table_mapalg AS
SELECT RS_MapAlgebra(rast, NULL, 'out = rast[0] * 0.5;') FROM wherobots.test_db.test_table
""")
sedona.sql("""
CREATE TABLE wherobots.test_db.test_table_mapalg AS
SELECT RS_MapAlgebra(rast, NULL, 'out = rast[0] * 0.5;') FROM wherobots.test_db.test_table
""")
sedona.sql(
    "CREATE TABLE wherobots.test_db.test_table_mapalg AS " +
    "SELECT RS_MapAlgebra(rast, NULL, 'out = rast[0] * 0.5;') FROM wherobots.test_db.test_table"
);

Or simply call writeTo function on the resulting DataFrame of the query:

sedona.sql("SELECT RS_MapAlgebra(rast, NULL, 'out = rast[0] * 0.5;') FROM wherobots.test_db.test_table")\
    .writeTo("wherobots.test_db.test_table_mapalg").create()
sedona.sql("SELECT RS_MapAlgebra(rast, NULL, 'out = rast[0] * 0.5;') FROM wherobots.test_db.test_table")
    .writeTo("wherobots.test_db.test_table_mapalg").create()
sedona.sql("SELECT RS_MapAlgebra(rast, NULL, 'out = rast[0] * 0.5;') FROM wherobots.test_db.test_table")
    .writeTo("wherobots.test_db.test_table_mapalg").create()

Further reading

Havasu is based on Apache Iceberg, all features of Apache Iceberg except MOR tables are supported in Sedona-DB. Please refer to Apache Iceberg documentation for Spark for more information.

Out-db rasters usually requires tuning to achieve good performance. To learn more about out-db rasters, please refer to Out-DB Rasters.

Havasu supports spatial filter push-down for raster tables. Please refer to Spatial Filter Push-down for Rasters to learn how to take advantage of this feature.


Last update: February 9, 2024 03:04:11