Raster Loaders

Note

Sedona loaders are available in Scala, Java, and Python and share the same APIs.

Load any raster to Raster format

The Sedona raster loader leverages Spark's built-in binary data source and works with several RS constructors to produce the Raster type. Each raster is a row in the resulting DataFrame and is stored in Raster format.

By default, these functions use lon/lat order.

Load raster to a binary DataFrame

You can load any type of raster data using the code below. Then use the RS constructors below to create a Raster DataFrame.

sedona.read.format("binaryFile").load("/some/path/*.asc")

RS_FromArcInfoAsciiGrid

Introduction: Returns a raster geometry from an Arc Info Ascii Grid file.

Format: RS_FromArcInfoAsciiGrid(asc: ARRAY[Byte])

SQL example:

import org.apache.spark.sql.{functions => f}

var df = sedona.read.format("binaryFile").load("/some/path/*.asc")
df = df.withColumn("raster", f.expr("RS_FromArcInfoAsciiGrid(content)"))
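To make the `content` argument more concrete, here is a minimal pure-Python sketch of the bytes such a file typically holds. It assumes the common Esri ASCII grid layout (a short keyword header followed by rows of cell values) and does not call Sedona; the helper name `parse_ascii_grid` and the sample values are hypothetical.

```python
# A tiny ASCII grid, written as the raw bytes that a binaryFile read
# would place in the `content` column (sample values are made up).
content = (
    b"ncols 3\n"
    b"nrows 2\n"
    b"xllcorner 0.0\n"
    b"yllcorner 0.0\n"
    b"cellsize 1.0\n"
    b"NODATA_value -9999\n"
    b"1 2 3\n"
    b"4 5 -9999\n"
)


def parse_ascii_grid(data):
    """Split an ASCII grid into a header dict and a list of rows of floats.

    Illustrative only: assumes the conventional header keywords and
    whitespace-separated values; this is not Sedona's reader.
    """
    header, rows = {}, []
    for line in data.decode("ascii").splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        if tokens[0][0].isalpha():
            # Header line such as "ncols 3"
            header[tokens[0].lower()] = float(tokens[1])
        else:
            # Data line: one row of cell values
            rows.append([float(t) for t in tokens])
    return header, rows


header, rows = parse_ascii_grid(content)
print(header["ncols"], header["nrows"], len(rows))
```

In Sedona itself the parsing is done by RS_FromArcInfoAsciiGrid; the sketch only shows why the constructor takes a byte array rather than a path.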

RS_FromGeoTiff

Introduction: Returns a raster geometry from a GeoTiff file.

Format: RS_FromGeoTiff(asc: ARRAY[Byte])

SQL example:

import org.apache.spark.sql.{functions => f}

var df = sedona.read.format("binaryFile").load("/some/path/*.tiff")
df = df.withColumn("raster", f.expr("RS_FromGeoTiff(content)"))

RS_MakeEmptyRaster

Introduction: Returns an empty raster geometry. Every band in the raster is initialized to 0.0.

Format:

RS_MakeEmptyRaster(numBands: Integer, bandDataType: String = 'D', width: Integer, height: Integer, upperleftX: Double, upperleftY: Double, cellSize: Double)
  • NumBands: The number of bands in the raster. If not specified, the raster will have a single band.
  • BandDataType: Optional parameter specifying the data types of all the bands in the created raster. Accepts one of:
    1. "D" - 64 bits Double
    2. "F" - 32 bits Float
    3. "I" - 32 bits signed Integer
    4. "S" - 16 bits signed Short
    5. "US" - 16 bits unsigned Short
    6. "B" - 8 bits Byte
  • Width: The width of the raster in pixels.
  • Height: The height of the raster in pixels.
  • UpperleftX: The X coordinate of the upper left corner of the raster, in terms of the CRS units.
  • UpperleftY: The Y coordinate of the upper left corner of the raster, in terms of the CRS units.
  • Cell Size (pixel size): The size of the cells in the raster, in terms of the CRS units.

This form of the function uses the default Cartesian coordinate system.

Format:

RS_MakeEmptyRaster(numBands: Integer, bandDataType: String = 'D', width: Integer, height: Integer, upperleftX: Double, upperleftY: Double, scaleX: Double, scaleY: Double, skewX: Double, skewY: Double, srid: Integer)
  • NumBands: The number of bands in the raster. If not specified, the raster will have a single band.
  • BandDataType: Optional parameter specifying the data types of all the bands in the created raster. Accepts one of:
    1. "D" - 64 bits Double
    2. "F" - 32 bits Float
    3. "I" - 32 bits signed Integer
    4. "S" - 16 bits signed Short
    5. "US" - 16 bits unsigned Short
    6. "B" - 8 bits Byte
  • Width: The width of the raster in pixels.
  • Height: The height of the raster in pixels.
  • UpperleftX: The X coordinate of the upper left corner of the raster, in terms of the CRS units.
  • UpperleftY: The Y coordinate of the upper left corner of the raster, in terms of the CRS units.
  • ScaleX (pixel size on X): The size of the cells on the X axis, in terms of the CRS units.
  • ScaleY (pixel size on Y): The size of the cells on the Y axis, in terms of the CRS units.
  • SkewX: The skew of the raster on the X axis, in terms of the CRS units.
  • SkewY: The skew of the raster on the Y axis, in terms of the CRS units.
  • SRID: The SRID of the raster. Use 0 if you want to use the default Cartesian coordinate system. Use 4326 if you want to use WGS84.
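The upper-left, scale, and skew parameters above together describe an affine mapping from pixel indices to CRS coordinates. The following is a minimal pure-Python sketch of that mapping, assuming the conventional GDAL-style geotransform; it does not call Sedona, and the helper names `pixel_to_world` and `raster_envelope` are hypothetical.

```python
def pixel_to_world(col, row, upperleft_x, upperleft_y,
                   scale_x, scale_y, skew_x=0.0, skew_y=0.0):
    """Map the upper-left corner of pixel (col, row) to CRS coordinates.

    Assumes the conventional affine geotransform; not Sedona's own code.
    """
    x = upperleft_x + col * scale_x + row * skew_x
    y = upperleft_y + col * skew_y + row * scale_y
    return (x, y)


def raster_envelope(width, height, upperleft_x, upperleft_y,
                    scale_x, scale_y, skew_x=0.0, skew_y=0.0):
    """Return (min_x, min_y, max_x, max_y) covering all four raster corners."""
    corners = [
        pixel_to_world(c, r, upperleft_x, upperleft_y,
                       scale_x, scale_y, skew_x, skew_y)
        for c in (0, width) for r in (0, height)
    ]
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))


# A 10 x 10 raster anchored at (0.0, 0.0) with scaleX = 1.0 and scaleY = -1.0
envelope = raster_envelope(10, 10, 0.0, 0.0, 1.0, -1.0)
print(envelope)  # (0.0, -10.0, 10.0, 0.0)
```

Under this convention a negative ScaleY means Y decreases as the row index grows, which is the usual north-up arrangement.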

Note

If bandDataType is set to any value other than those listed above, RS_MakeEmptyRaster falls back to double as the data type for the raster.
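The fallback described in this note, and the effect of the band type on raster size, can be sketched in plain Python. This is an illustration built from the bit widths listed above, not Sedona's implementation; the names `BAND_TYPE_BITS`, `resolve_band_type`, and `estimated_pixel_bytes` are hypothetical, and the exact-match (case-sensitive) lookup is an assumption.

```python
# Bit widths taken from the bandDataType list above.
BAND_TYPE_BITS = {"D": 64, "F": 32, "I": 32, "S": 16, "US": 16, "B": 8}


def resolve_band_type(code):
    """Return the code if it is recognised, else 'D' (the documented fallback)."""
    return code if code in BAND_TYPE_BITS else "D"


def estimated_pixel_bytes(num_bands, width, height, code="D"):
    """Rough size of the raw pixel data only (no headers or metadata)."""
    bits = BAND_TYPE_BITS[resolve_band_type(code)]
    return num_bands * width * height * bits // 8


print(resolve_band_type("X"))                  # D
print(estimated_pixel_bytes(2, 10, 10, "I"))   # 800
print(estimated_pixel_bytes(2, 10, 10, "X"))   # 1600
```

For large rasters, choosing the narrowest type that fits the data can noticeably reduce the pixel payload, which is why a silently applied fallback to double is worth checking for.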

SQL example 1 (with 2 bands):

SELECT RS_MakeEmptyRaster(2, 10, 10, 0.0, 0.0, 1.0)

Output:

+--------------------------------------------+
|rs_makeemptyraster(2, 10, 10, 0.0, 0.0, 1.0)|
+--------------------------------------------+
|                        GridCoverage2D["g...|
+--------------------------------------------+

SQL example 2 (with 2 bands and dataType):

SELECT RS_MakeEmptyRaster(2, 'I', 10, 10, 0.0, 0.0, 1.0) -- Create a raster with integer datatype

Output:

+--------------------------------------------+
|rs_makeemptyraster(2, 10, 10, 0.0, 0.0, 1.0)|
+--------------------------------------------+
|                        GridCoverage2D["g...|
+--------------------------------------------+

SQL example 3 (with 2 bands, scale, skew, and SRID):

SELECT RS_MakeEmptyRaster(2, 10, 10, 0.0, 0.0, 1.0, -1.0, 0.0, 0.0, 4326)

Output:

+------------------------------------------------------------------+
|rs_makeemptyraster(2, 10, 10, 0.0, 0.0, 1.0, -1.0, 0.0, 0.0, 4326)|
+------------------------------------------------------------------+
|                                              GridCoverage2D["g...|
+------------------------------------------------------------------+

SQL example 4 (with 2 bands, dataType, scale, skew, and SRID):

SELECT RS_MakeEmptyRaster(2, 'F', 10, 10, 0.0, 0.0, 1.0, -1.0, 0.0, 0.0, 4326) -- Create a raster with float datatype

Output:

+------------------------------------------------------------------+
|rs_makeemptyraster(2, 10, 10, 0.0, 0.0, 1.0, -1.0, 0.0, 0.0, 4326)|
+------------------------------------------------------------------+
|                                              GridCoverage2D["g...|
+------------------------------------------------------------------+

RS_FromPath

Introduction: Creates an out-of-database (out-db) raster from a remote file path, typically used for managing large raster datasets stored externally, such as on cloud storage platforms.

It is well suited to scenarios that require efficient handling of remote raster data: the database stores only the path and metadata, which significantly reduces storage requirements.

Format:

RS_FromPath(path: String)

SQL Example:

SELECT path, RS_FromPath(path) AS raster_outdb FROM Table

Output:

+----------------------+------------------------------------------------------------+
|path                  |raster_outdb                                                |
+----------------------+------------------------------------------------------------+
|/Users/.../test1.tiff |OutDbGridCoverage2D["", GeneralEnvelope[(-1.3095817809482...|
|/Users/.../test2.tiff |OutDbGridCoverage2D["", GeneralEnvelope[(-1.3095817809482...|
|/Users/.../test3.tiff |OutDbGridCoverage2D["", GeneralEnvelope[(382240.0, 615266...|
|/Users/.../test4.tiff |OutDbGridCoverage2D["", GeneralEnvelope[(-180.0, -90.0), ...|
|/Users/.../test5.tiff |OutDbGridCoverage2D["", GeneralEnvelope[(223586.236519645...|
+----------------------+------------------------------------------------------------+

RS_AsInDB

Introduction: Converts an out-of-database (out-db) raster to an in-database (in-db) raster, facilitating raster data management within the database.

This function is useful for scenarios where raster data initially stored outside the database needs to be managed within the database, enhancing data integrity and access efficiency.

Format:

RS_AsInDB(raster: Raster)

SQL example:

SELECT path, raster_outdb, RS_AsInDB(raster_outdb) As raster FROM Table

Output:

+----------------------+------------------------------------------------------------+-------------------------------------------------------+
|path                  |raster_outdb                                                |raster                                                 |
+----------------------+------------------------------------------------------------+-------------------------------------------------------+
|/Users/.../test1.tiff |OutDbGridCoverage2D["", GeneralEnvelope[(-1.3095817809482...|GridCoverage2D["", GeneralEnvelope[(-1.3095817809482...|
|/Users/.../test2.tiff |OutDbGridCoverage2D["", GeneralEnvelope[(-1.3095817809482...|GridCoverage2D["", GeneralEnvelope[(-1.3095817809482...|
|/Users/.../test3.tiff |OutDbGridCoverage2D["", GeneralEnvelope[(382240.0, 615266...|GridCoverage2D["", GeneralEnvelope[(382240.0, 615266...|
|/Users/.../test4.tiff |OutDbGridCoverage2D["", GeneralEnvelope[(-180.0, -90.0), ...|GridCoverage2D["", GeneralEnvelope[(-180.0, -90.0), ...|
|/Users/.../test5.tiff |OutDbGridCoverage2D["", GeneralEnvelope[(223586.236519645...|GridCoverage2D["", GeneralEnvelope[(223586.236519645...|
+----------------------+------------------------------------------------------------+-------------------------------------------------------+

RS_BandPath

Introduction: Retrieves the file path of an out-of-database (out-db) raster, providing a link to the external raster file it references. Primarily used with out-db rasters to access their storage location.

This is useful when working with out-db rasters, where only the raster path and geo-referencing metadata are stored in the database.

Format:

RS_BandPath(raster: Raster)

SQL Example:

SELECT raster_outdb, RS_BandPath(raster_outdb) AS band_path FROM Table

Output:

+------------------------------------------------------------+----------------------+
|raster_outdb                                                |band_path             |
+------------------------------------------------------------+----------------------+
|OutDbGridCoverage2D["", GeneralEnvelope[(-1.3095817809482...|/Users/.../test1.tiff |
|OutDbGridCoverage2D["", GeneralEnvelope[(-1.3095817809482...|/Users/.../test2.tiff |
|OutDbGridCoverage2D["", GeneralEnvelope[(382240.0, 615266...|/Users/.../test3.tiff |
|OutDbGridCoverage2D["", GeneralEnvelope[(-180.0, -90.0), ...|/Users/.../test4.tiff |
|OutDbGridCoverage2D["", GeneralEnvelope[(223586.236519645...|/Users/.../test5.tiff |
+------------------------------------------------------------+----------------------+

RS_FromNetCDF

Introduction: Returns a raster geometry representing the record variable with the given short name from a NetCDF file. This API reads the array data of the record variable into memory along with all its dimensions. Because the netCDF format has many variants, the reader might not work for your case; if so, please report it on the public forums.

This API has been tested for netCDF classic (NetCDF 1, 2, 5) and netCDF4/HDF5 files.

This API requires the name of the record variable. It is assumed that a variable of the given name exists, and its last 2 dimensions are 'lat' and 'lon' dimensions respectively.

If this assumption does not hold true for your case, you can choose to pass the lonDimensionName and latDimensionName explicitly.

You can use RS_NetCDFInfo to get the details of the passed netCDF file (its variables and their dimensions).

Format 1: RS_FromNetCDF(netCDF: ARRAY[Byte], recordVariableName: String)

Format 2: RS_FromNetCDF(netCDF: ARRAY[Byte], recordVariableName: String, lonDimensionName: String, latDimensionName: String)

SQL Example:

import org.apache.spark.sql.{functions => f}

val df = sedona.read.format("binaryFile").load("/some/path/test.nc")
// Format 1: rely on the default 'lat' and 'lon' dimension names
val rasterDf = df.withColumn("raster", f.expr("RS_FromNetCDF(content, 'O3')"))
// Format 2: pass the lon and lat dimension names explicitly
val rasterDfExplicit = df.withColumn("raster", f.expr("RS_FromNetCDF(content, 'O3', 'lon', 'lat')"))

RS_NetCDFInfo

Introduction: Returns a string containing the names of the variables in a given netCDF file along with their dimensions.

Format: RS_NetCDFInfo(netCDF: ARRAY[Byte])

SQL Example:

val df = sedona.read.format("binaryFile").load("/some/path/test.nc")
val recordInfo = df.selectExpr("RS_NetCDFInfo(content) as record_info").first().getString(0)
println(recordInfo)

Output:

O3(time=2, z=2, lat=48, lon=80)

NO2(time=2, z=2, lat=48, lon=80)
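Because RS_FromNetCDF assumes that the last two dimensions are 'lat' and 'lon', it can help to check the RS_NetCDFInfo string before choosing between the two RS_FromNetCDF formats. The following is a minimal pure-Python sketch that parses text shaped like the output above; it assumes each variable appears on its own line as `name(dim=size, ...)`, and the helper names `parse_netcdf_info` and `uses_default_lat_lon` are hypothetical.

```python
import re

# Sample text copied from the output above.
record_info = """O3(time=2, z=2, lat=48, lon=80)

NO2(time=2, z=2, lat=48, lon=80)"""

LINE_PATTERN = re.compile(r"^(\w+)\((.*)\)$")


def parse_netcdf_info(text):
    """Parse 'name(dim=size, ...)' lines into {name: [(dim, size), ...]}."""
    variables = {}
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if not match:
            continue  # skip blank or unrecognised lines
        name, dims_text = match.groups()
        dims = []
        for part in dims_text.split(","):
            if not part.strip():
                continue  # variable without dimensions
            dim_name, size = part.strip().split("=")
            dims.append((dim_name, int(size)))
        variables[name] = dims
    return variables


def uses_default_lat_lon(dims):
    """True if the last two dimensions are named 'lat' and 'lon', in that order."""
    return [d for d, _ in dims[-2:]] == ["lat", "lon"]


info = parse_netcdf_info(record_info)
print(info["O3"])                        # [('time', 2), ('z', 2), ('lat', 48), ('lon', 80)]
print(uses_default_lat_lon(info["O3"]))  # True
```

When the check returns False, Format 2 of RS_FromNetCDF with explicit lonDimensionName and latDimensionName is the one to reach for.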

Last update: January 8, 2024 17:42:15