
Load data from external storage

Assume we have a single raster data file called rasterData.tiff, whose location is stored in path_to_raster_data.

Use the following code to load the data and create a Sedona DataFrame.

Python

rawDf = sedona.read.format("binaryFile").load(path_to_raster_data)
rawDf.createOrReplaceTempView("rawdf")
rawDf.show()

Scala

var rawDf = sedona.read.format("binaryFile").load(path_to_raster_data)
rawDf.createOrReplaceTempView("rawdf")
rawDf.show()

Java

Dataset<Row> rawDf = sedona.read.format("binaryFile").load(path_to_raster_data);
rawDf.createOrReplaceTempView("rawdf");
rawDf.show();

The output will look like this:

|                path|    modificationTime|length|             content|
+--------------------+--------------------+------+--------------------+
|file:/Download/ra...|2023-09-06 16:24:...|174803|[49 49 2A 00 08 0...|

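For multiple raster data files, use the following code to load the data from a folder (path_to_raster_data_folder) and create a Sedona DataFrame.

Note

The single-file code above also works for loading multiple raster data files when it is given a directory. The code below additionally finds raster files that sit in separate subdirectories (recursiveFileLookup), and its pathGlobFilter option makes sure that only files matching *.tif* (that is, .tif or .tiff files) are loaded.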

Python

rawDf = sedona.read.format("binaryFile").option("recursiveFileLookup", "true").option("pathGlobFilter", "*.tif*").load(path_to_raster_data_folder)
rawDf.createOrReplaceTempView("rawdf")
rawDf.show()

Scala

var rawDf = sedona.read.format("binaryFile").option("recursiveFileLookup", "true").option("pathGlobFilter", "*.tif*").load(path_to_raster_data_folder)
rawDf.createOrReplaceTempView("rawdf")
rawDf.show()

Java

Dataset<Row> rawDf = sedona.read.format("binaryFile").option("recursiveFileLookup", "true").option("pathGlobFilter", "*.tif*").load(path_to_raster_data_folder);
rawDf.createOrReplaceTempView("rawdf");
rawDf.show();

The output will look like this:

|                path|    modificationTime|length|             content|
+--------------------+--------------------+------+--------------------+
|file:/Download/ra...|2023-09-06 16:24:...|209199|[4D 4D 00 2A 00 0...|
|file:/Download/ra...|2023-09-06 16:24:...|174803|[49 49 2A 00 08 0...|
|file:/Download/ra...|2023-09-06 16:24:...|174803|[49 49 2A 00 08 0...|
|file:/Download/ra...|2023-09-06 16:24:...|  6619|[49 49 2A 00 08 0...|

The content column in the raster table is still in raw binary form.
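The leading bytes shown in the content column can serve as a quick sanity check. In the TIFF format, the first four bytes are 49 49 2A 00 for little-endian files and 4D 4D 00 2A for big-endian files, and both variants appear in the output above. The following is a minimal sketch in plain Python; the helper name tiff_byte_order is hypothetical and not part of Sedona.

```python
from typing import Optional

# A classic TIFF file starts with a 2-byte byte-order mark followed by 42.
_LITTLE_ENDIAN_MAGIC = bytes.fromhex("49492A00")  # "II" + 42, little-endian
_BIG_ENDIAN_MAGIC = bytes.fromhex("4D4D002A")     # "MM" + 42, big-endian


def tiff_byte_order(content) -> Optional[str]:
    """Return "little", "big", or None if there is no classic TIFF header.

    Hypothetical helper for illustration; BigTIFF (version 43) is not handled.
    Accepts bytes or bytearray.
    """
    header = bytes(content[:4])
    if header == _LITTLE_ENDIAN_MAGIC:
        return "little"
    if header == _BIG_ENDIAN_MAGIC:
        return "big"
    return None
```

It could be applied to the content values of a few collected rows, for example those returned by rawDf.take(5), before converting the column to the Raster type.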

Note

Recursive file lookup works the same way for raster datasets in the .asc file format.
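To get a feel for which file names a pathGlobFilter pattern would select, the pattern can be tried out locally. This sketch uses Python's fnmatch module as an approximation of the glob matching that Spark performs; the file names and the helper select_files are made up for illustration.

```python
from fnmatch import fnmatchcase


def select_files(names, pattern):
    """Keep the names that match a glob pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical directory listing.
names = ["a.tif", "b.tiff", "c.tif.aux.xml", "d.asc", "notes.txt"]

tiff_like = select_files(names, "*.tif*")  # pattern used for GeoTIFF files
asc_only = select_files(names, "*.asc")    # a pattern for Arc Grid files
```

Under this approximation, *.tif* also matches c.tif.aux.xml, so a stricter pattern may be preferable when sidecar files sit next to the rasters.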

Create a Raster type column

All raster operations in SedonaSQL require Raster type objects. Therefore, this should be the next step after loading the data.

From GeoTIFF

SELECT RS_FromGeoTiff(content) AS rast, modificationTime, length, path FROM rawdf
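The query above has to be run through the Sedona session to produce the rasterDf DataFrame that the rest of this page refers to. A minimal Python sketch, assuming sedona is the session used for loading and rawdf is the temporary view registered earlier; the function name build_raster_df is hypothetical:

```python
GEOTIFF_QUERY = (
    "SELECT RS_FromGeoTiff(content) AS rast, modificationTime, length, path "
    "FROM rawdf"
)


def build_raster_df(sedona, view_name="rasterDf"):
    """Run the constructor query and register the result as a temp view."""
    # sedona.sql returns a DataFrame; its rast column holds Raster objects.
    raster_df = sedona.sql(GEOTIFF_QUERY)
    # Registering the view lets later SQL queries use "FROM rasterDf".
    raster_df.createOrReplaceTempView(view_name)
    return raster_df
```

Calling rasterDf = build_raster_df(sedona) then gives both a DataFrame variable and a view with the same name.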

To verify this, print the schema of the DataFrame returned by the query (called rasterDf here):

rasterDf.printSchema()

The output will be like this:

root
 |-- rast: raster (nullable = true)
 |-- modificationTime: timestamp (nullable = true)
 |-- length: long (nullable = true)
 |-- path: string (nullable = true)

From Arc Grid

Raster data in Arc Grid format is loaded the same way as a TIFF file, but it is stored in ASCII format with the .asc extension. The following code creates Raster type objects from the binary data:

SELECT RS_FromArcInfoAsciiGrid(content) AS rast, modificationTime, length, path FROM rawdf

Verify Raster Geo-referencing attributes

Metadata

SedonaSQL exposes RS_MetaData, which returns an array of metadata containing the raster's geo-referencing attributes, dimensions, CRS and number of bands.

SELECT RS_MetaData(rast) FROM rasterDf

The output of the function above will be:

[-1.3095817809482181E7, 4021262.7487925636, 512.0, 517.0, 72.32861272132695, -72.32861272132695, 0.0, 0.0, 3857.0, 1.0]

Please refer to the function documentation for more details.
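The positions in the metadata array are easier to read once labeled. The sketch below assumes the ordering upper-left x, upper-left y, width, height, scale x, scale y, skew x, skew y, SRID, number of bands, which is consistent with the sample output above. The field names and helper names are illustrative, and the function documentation remains the authority for the exact layout in a given Sedona version. The sketch also derives the lower-right corner under the assumption of zero skew.

```python
# Assumed ordering of the first ten metadata values (illustrative names).
METADATA_FIELDS = (
    "upper_left_x", "upper_left_y", "width", "height",
    "scale_x", "scale_y", "skew_x", "skew_y", "srid", "num_bands",
)


def label_metadata(values):
    """Pair each metadata value with a name; extra values are ignored."""
    return dict(zip(METADATA_FIELDS, values))


def lower_right_corner(meta):
    """Lower-right corner in CRS units, assuming skew_x == skew_y == 0."""
    x = meta["upper_left_x"] + meta["width"] * meta["scale_x"]
    y = meta["upper_left_y"] + meta["height"] * meta["scale_y"]
    return x, y


# Sample values copied from the output shown above.
sample = [-1.3095817809482181E7, 4021262.7487925636, 512.0, 517.0,
          72.32861272132695, -72.32861272132695, 0.0, 0.0, 3857.0, 1.0]

meta = label_metadata(sample)
lr_x, lr_y = lower_right_corner(meta)
```

Here meta["srid"] is 3857.0 and meta["num_bands"] is 1.0, and the negative scale_y reflects rows that run from north to south.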

World File

RS_GeoReference returns the georeference in one of two world-file formats, GDAL or ESRI. For more information, please refer to the RS_GeoReference documentation.

SELECT RS_GeoReference(rast, "ESRI") FROM rasterDf

The output will be as follows:

72.328613
0.000000
0.000000
-72.328613
-13095781.64517
4021226.584486

World files are used to geo-reference and geo-locate images by establishing an image-to-world coordinate transformation that assigns real-world geographic coordinates to the pixels of the image.
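The six lines of a world file are the coefficients of an affine transform, in the order A, D, B, E, C, F, where x = A*col + B*row + C and y = D*col + E*row + F. The following sketch parses the ESRI output shown above and applies it to pixel indices. It relies on the common world-file convention that C and F give the center of the upper-left pixel; the function names are hypothetical.

```python
def parse_world_file(text):
    """Parse six world-file lines (order A, D, B, E, C, F) into a dict."""
    a, d, b, e, c, f = (float(token) for token in text.split())
    return {"A": a, "D": d, "B": b, "E": e, "C": c, "F": f}


def pixel_to_world(wf, col, row):
    """Apply the affine transform to a (col, row) pixel position."""
    x = wf["A"] * col + wf["B"] * row + wf["C"]
    y = wf["D"] * col + wf["E"] * row + wf["F"]
    return x, y


# ESRI-format output copied from above.
esri_text = """72.328613
0.000000
0.000000
-72.328613
-13095781.64517
4021226.584486"""

wf = parse_world_file(esri_text)
center_of_first_pixel = pixel_to_world(wf, 0, 0)
# Half a pixel up and to the left of that center is the raster's outer corner.
corner_of_raster = pixel_to_world(wf, -0.5, -0.5)
```

Shifting by half a pixel recovers, up to rounding, the upper-left corner reported by RS_MetaData, which illustrates the difference between a pixel-center and a pixel-corner convention.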


Last update: February 9, 2024 03:04:11