Skip to content

Working with Raster Data

A raster is a regular grid of numeric values with geo-referencing metadata that describes the location and resolution of the grid. Raster data is commonly used to represent terrain, elevation, and satellite imagery. Havasu supports raster data type and allows users to store and query raster data. This tutorial describes how to use raster data type and raster functions in Sedona for manipulating raster data.

Create Table using CREATE TABLE Command

To create a table with raster column in Sedona, use sedona.sql(...) to run a CREATE TABLE command:

sedona.sql("CREATE DATABASE IF NOT EXISTS wherobots.test_db")
sedona.sql("CREATE TABLE wherobots.test_db.test_table (data string, rast raster)")
sedona.sql("CREATE DATABASE IF NOT EXISTS wherobots.test_db")
sedona.sql("CREATE TABLE wherobots.test_db.test_table (data string, rast raster)")
sedona.sql("CREATE DATABASE IF NOT EXISTS wherobots.test_db");
sedona.sql("CREATE TABLE wherobots.test_db.test_table (data string, rast raster)");

This will create an empty table. Notice that Havasu introduced a new data type raster to represent raster data. Data in columns with rast type will be loaded as RasterUDT values in Sedona, user can use any RT_ funtions provided by Sedona to manipulate the raster data.

Create Table using DataFrame

User can also create a table using a DataFrame. The raster column in the DataFrame will go into the raster column in the created geospatial table.

For example,

  • Load GeoTIFF files as DataFrame using sedona.read.format("binaryFile") and specify file patterns and lookup behavior through .option. (Genereic File Source Options)
  • Convert binary data to raster format using RS_FromGeoTiff(content).
  • Select relevant columns for the table. The .selectExpr method is used to select the newly created raster column "rast" and file paths as "data" from the DataFrame.
df_binary = sedona.read.format("binaryFile")\
    .option("pathGlobFilter", "*.tif")\
    .option("recursiveFileLookup", "true")\
    .load('s3a://wherobots-examples/data/eurosat_small')

df_geotiff = df_binary.withColumn("rast", expr("RS_FromGeoTiff(content)"))\
    .selectExpr("rast", "path AS data")
val df_binary = sedona.read.format("binaryFile")
    .option("pathGlobFilter", "*.tif")
    .option("recursiveFileLookup", "true")
    .load("s3a://wherobots-examples/data/eurosat_small")

val df_geotiff = df_binary.withColumn("rast", expr("RS_FromGeoTiff(content)"))
    .selectExpr("rast", "path AS data")
Dataset<Row> df_binary = sedona.read().format("binaryFile")
    .option("pathGlobFilter", "*.tif")
    .option("recursiveFileLookup", "true")
    .load("s3a://wherobots-examples/data/eurosat_small");

Dataset<Row> df_geotiff = df_binary.withColumn("rast", expr("RS_FromGeoTiff(content)"))
    .selectExpr("rast", "path AS data");

Then we can write this DataFrame to a Havasu table:

df_geotiff.writeTo("wherobots.test_db.test_table").create()
df_geotiff.writeTo("wherobots.test_db.test_table").create()
df_geotiff.writeTo("wherobots.test_db.test_table").create();

Havasu supports a full range of SQL DDL and DML commands which Apache Iceberg supports, such as ALTER TABLE, DROP TABLE, etc.

Writing Raster Data

Raster data in Havasu is similar to PostGIS. It provides 2 options for storing raster data:

  • in-db raster: raster data completely stored in a Havasu table.
  • out-db raster: raster data is stored in an external file (such as GeoTIFF file on S3) and the Havasu table stores the path to the file and geo-referencing metadata of that raster.

In-DB Raster

To construct in-db raster data and write them to a Havasu table, user can use RS_FromGeoTiff to construct a raster value and write the DataFrame using .writeTo(...).append()

# Read GeoTIFF files using RS_FromGeoTiff
df_binary = sedona.read.format("binaryFile")\
    .option("pathGlobFilter", "*.tif")\
    .option("recursiveFileLookup", "true")\
    .load('s3a://wherobots-examples/data/eurosat_small')
df_geotiff = df_binary.withColumn("rast", expr("RS_FromGeoTiff(content)"))\
    .selectExpr("rast", "path AS data")

# Writing the data into Havasu table
df_geotiff.writeTo("wherobots.test_db.test_table").append()
// Read GeoTIFF files using RS_FromGeoTiff
val df_binary = sedona.read.format("binaryFile")
    .option("pathGlobFilter", "*.tif")
    .option("recursiveFileLookup", "true")
    .load("s3a://wherobots-examples/data/eurosat_small")

val df_geotiff = df_binary.withColumn("rast", expr("RS_FromGeoTiff(content)"))
    .selectExpr("rast", "path AS data")

// Writing the data into Havasu table
df_geotiff.writeTo("wherobots.test_db.test_table").append()
// Read GeoTIFF files using RS_FromGeoTiff
Dataset<Row> df_binary = sedona.read().format("binaryFile")
    .option("pathGlobFilter", "*.tif")
    .option("recursiveFileLookup", "true")
    .load("s3a://wherobots-examples/data/eurosat_small");

Dataset<Row> df_geotiff = df_binary.withColumn("rast", expr("RS_FromGeoTiff(content)"))
    .selectExpr("rast", "path AS data");

// Writing the data into Havasu table
df_geotiff.writeTo("wherobots.test_db.test_table").append();

Or user can first load the GeoTIFF file as out-db raster using RS_FromPath, then use RS_AsInDB to convert the out-db raster to in-db raster:

df_paths = spark.createDataFrame([
    ("s3a://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_10.tif",),
    ("s3a://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_11.tif",)
], ["path"])

# Read GeoTIFF files as out-db rasters using RS_FromPath, then convert to in-db rasters using RS_AsInDB
df_rast = df_paths.withColumn("rast", expr("RS_AsInDB(RS_FromPath(path))"))

# Writing the data into Havasu table
df_rast.selectExpr("rast", "path AS data").writeTo("wherobots.test_db.test_table").append()
val df_paths = Seq(
    "s3a://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_10.tif",
    "s3a://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_11.tif"
).toDF("path")

// Read GeoTIFF files as out-db rasters using RS_FromPath, then convert to in-db rasters using RS_AsInDB
val df_rast = df_paths.withColumn("rast", expr("RS_AsInDB(RS_FromPath(path))"))

// Writing the data into Havasu table
df_rast.selectExpr("rast", "path AS data").writeTo("wherobots.test_db.test_table").append()
List<Row> data = Arrays.asList(
    RowFactory.create("s3a://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_10.tif"),
    RowFactory.create("s3a://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_11.tif")
);
List<StructField> fields = Arrays.asList(
    DataTypes.createStructField("path", DataTypes.StringType, true)
);
StructType schema = DataTypes.createStructType(fields);
Dataset<Row> df_paths = spark.createDataFrame(data, schema);

// Read GeoTIFF files as out-db rasters using RS_FromPath, then convert to in-db rasters using RS_AsInDB
Dataset<Row> df_rast = df_paths.withColumn("rast", expr("RS_AsInDB(RS_FromPath(path))"));

// Writing the data into Havasu table
df_rast.selectExpr("rast", "path AS data").writeTo("wherobots.test_db.test_table").append();

Out-DB Raster

To construct out-db raster data and write them to a Havasu table, user can use RS_FromPath to construct a raster value and insert it into the table:

df_paths = spark.createDataFrame([
    ("s3a://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_10.tif",),
    ("s3a://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_11.tif",)
], ["path"])

# Read GeoTIFF files as out-db rasters using RS_FromPath
df_rast = df_paths.withColumn("rast", expr("RS_FromPath(path)"))

# Writing the data into Havasu table
df_rast.selectExpr("rast", "path AS data").writeTo("wherobots.test_db.test_table").append()
val df_paths = Seq(
    "s3a://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_10.tif",
    "s3a://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_11.tif"
).toDF("path")

// Read GeoTIFF files as out-db rasters using RS_FromPath
val df_rast = df_paths.withColumn("rast", expr("RS_FromPath(path)"))

// Writing the data into Havasu table
df_rast.selectExpr("rast", "path AS data").writeTo("wherobots.test_db.test_table").append()

List<Row> data = Arrays.asList(
    RowFactory.create("s3a://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_10.tif"),
    RowFactory.create("s3a://wherobots-examples/data/eurosat_small/AnnualCrop/AnnualCrop_11.tif")
);
List<StructField> fields = Arrays.asList(
    DataTypes.createStructField("path", DataTypes.StringType, true)
);
StructType schema = DataTypes.createStructType(fields);
Dataset<Row> df_paths = spark.createDataFrame(data, schema);

// Read GeoTIFF files as out-db rasters using RS_FromPath
Dataset<Row> df_rast = df_paths.withColumn("rast", expr("RS_FromPath(path)"));

// Writing the data into Havasu table
df_rast.selectExpr("rast", "path AS data").writeTo("wherobots.test_db.test_table").append();
```

Using STAC files with RS_FromPath for Out-DB Raster

This method is useful when dealing with large raster datasets stored in external storage systems like Amazon S3, as it allows the Havasu table to store only the path to the raster file along with geo-referencing metadata.

The following examples shows how to read a STAC file using the GeoJSON reader, load raster data from the STAC file and write it as an out-db raster to a Havasu table.

# Read STAC file using GeoJSON reader
df = sedona.read.format("geojson").load("stac-file-path")

# Read GeoTIFF files as out-db rasters using RS_FromPath
df_rast = df.withColumn("rast", expr("RS_FromPath(assets.analytic.href)"))

# Writing the data into Havasu table
df_rast.writeTo("wherobots.test_db.test_table").append()
// Read STAC file using GeoJSON reader
val df = sedona.read.format("geojson").load("stac-file-path")

// Read GeoTIFF files as out-db rasters using RS_FromPath
val df_rast = df.withColumn("rast", expr("RS_FromPath(assets.analytic.href)"))

// Write the out-db raster data into a Havasu table
df_rast.writeTo("wherobots.test_db.test_table").append()
// Read STAC file using GeoJSON reader
Dataset<Row> df = sedona.read.format("geojson").load("stac-file-path");

// Read GeoTIFF files as out-db rasters using RS_FromPath
Dataset<Row> df_rast = df.withColumn("rast", expr("RS_FromPath(assets.analytic.href)"));

// Write the out-db raster data into a Havasu table
df_rast.selectExpr("rast", "path AS data").writeTo("wherobots.test_db.test_table").append();

We can use RS_BandPath to show the path of the raster file referenced by an out-db raster:

df_rast.where("RS_BandPath(rast) IS NOT NULL").show()
df_rast.where("RS_BandPath(rast) IS NOT NULL").show()
df_rast.where("RS_BandPath(rast) IS NOT NULL").show();
+--------------------+--------------------+
|                path|                rast|
+--------------------+--------------------+
|s3a://wherobots-e...|OutDbGridCoverage...|
|s3a://wherobots-e...|OutDbGridCoverage...|
+--------------------+--------------------+

Havasu also supports a range of table update operations, such as UPDATE, DELETE, MERGE INTO, etc. Please refer to Apache Iceberg - Spark Writes for details.

Reading Raster Table

Havasu tables can be queried using SQL SELECT statements:

df = sedona.sql("SELECT * FROM wherobots.db_name.table_name")
df = sedona.sql("SELECT * FROM wherobots.db_name.table_name")
Dataset<Row> df = sedona.sql("SELECT * FROM wherobots.db_name.table_name");

Or using DataFrame API:

df = sedona.table("wherobots.db_name.table_name")
df = sedona.table("wherobots.db_name.table_name")
Dataset<Row> df = sedona.table("wherobots.db_name.table_name");

Havasu also supports time travel and inspecting metadata of table. Please refer to Apache Iceberg - Queries for details.

Processing Raster Data using Sedona SQL

Users can use any RT_ functions provided by Sedona to manipulate raster data read from Havasu.

# Read and manipulate geospatial data in a Havasu table
df = sedona.table("wherobots.test_db.test_table")
df_mapalg = df.withColumn("rast", expr("RS_MapAlgebra(rast, NULL, 'out = 0.5 * (rast[0] + rast[1]);')"))

# Write geospatial data back to a Havasu table
df_mapalg.writeTo("wherobots.test_db.test_table_2").create()
// Read and manipulate geospatial data in a Havasu table
val df = sedona.table("wherobots.test_db.test_table")
val df_mapalg = df.withColumn("rast", expr("RS_MapAlgebra(rast, NULL, 'out = 0.5 * (rast[0] + rast[1]);')"))

// Write geospatial data back to a Havasu table
df_mapalg.writeTo("wherobots.test_db.test_table_2").create()
// Read and manipulate geospatial data in a Havasu table
Dataset<Row> df = sedona.table("wherobots.test_db.test_table");
Dataset<Row> df_mapalg = df.withColumn("rast", expr("RS_MapAlgebra(rast, NULL, 'out = 0.5 * (rast[0] + rast[1]);')"));

// Write geospatial data back to a Havasu table
df_mapalg.writeTo("wherobots.test_db.test_table_2").create();

Further Reading

For more information, please refer to Raster Support in Havasu.


Last update: February 9, 2024 03:04:11