
Save geospatial data

Save as txt files

To save a Spatial DataFrame to permanent storage such as Hive tables or HDFS, convert each geometry in the Geometry type column back to a plain String and save the resulting DataFrame with any writer you like.

Use the following code to convert the Geometry column in a DataFrame back to a WKT string column:

SELECT ST_AsText(countyshape)
FROM polygondf

Then you can use any Spark writer to save this DataFrame.

df.write.format("YOUR_FORMAT").save("YOUR_PATH")
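For example, a minimal end-to-end sketch in PySpark, assuming a Sedona-enabled SparkSession named spark, a temporary view polygondf with a geometry column countyshape, and a hypothetical output path:

# Convert the geometry column to WKT, then write with a built-in Spark writer.
wkt_df = spark.sql("SELECT ST_AsText(countyshape) AS countyshape_wkt FROM polygondf")
wkt_df.write.format("csv").option("header", "true").save("/tmp/polygondf_wkt")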

Note

SedonaSQL provides many functions for serializing the Geometry column; see the SedonaSQL API documentation.

Save as GeoParquet

Sedona can directly save a DataFrame with the Geometry column as a GeoParquet file. You need to specify geoparquet as the write format. The Geometry type will be preserved in the GeoParquet file.

df.write.format("geoparquet").save(geoparquetoutputlocation + "/GeoParquet_File_Name.parquet")

To maximize the performance of Sedona GeoParquet filter pushdown, we suggest that you sort the data by their geohash values (see ST_GeoHash) and then save as a GeoParquet file. An example is as follows:

SELECT col1, col2, geom, ST_GeoHash(geom, 5) as geohash
FROM spatialDf
ORDER BY geohash
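Putting the two steps together, a hedged PySpark sketch (spark is assumed to be a Sedona-enabled SparkSession, spatialDf a registered view with columns col1, col2 and geometry column geom; the output file name is a placeholder):

# Sort by geohash so spatially close rows end up in the same row groups.
sorted_df = spark.sql("SELECT col1, col2, geom, ST_GeoHash(geom, 5) AS geohash FROM spatialDf ORDER BY geohash")

# Dropping the helper geohash column before writing is optional.
sorted_df.drop("geohash").write.format("geoparquet").save(geoparquetoutputlocation + "/Sorted_GeoParquet_File_Name.parquet")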

Save as Single Line GeoJSON Features

Sedona can save a Spatial DataFrame as a GeoJSON file in which each feature occupies a single line.

Scala:

Adapter.toSpatialRdd(df, "geometry").saveAsGeoJSON("YOUR/PATH.json")

Python:

from sedona.utils.adapter import Adapter

Adapter.toSpatialRdd(df, "geometry").saveAsGeoJSON("YOUR/PATH.json")

The generated file will look like this:

{"type":"Feature","geometry":{"type":"Point","coordinates":[102.0,0.5]},"properties":{"prop0":"value0"}}
{"type":"Feature","geometry":{"type":"LineString","coordinates":[[102.0,0.0],[103.0,1.0],[104.0,0.0],[105.0,1.0]]},"properties":{"prop0":"value1"}}
{"type":"Feature","geometry":{"type":"Polygon","coordinates":[[[100.0,0.0],[101.0,0.0],[101.0,1.0],[100.0,1.0],[100.0,0.0]]]},"properties":{"prop0":"value2"}}

Save to PostGIS

Unfortunately, the Spark SQL JDBC data source doesn't support creating geometry types in PostGIS using the 'createTableColumnTypes' option. Only the Spark built-in types are recognized. This means that you'll need to manage your PostGIS schema separately from Spark. One way to do this is to create the table with the correct geometry column before writing data to it with Spark. Alternatively, you can write your data to the table using Spark and then manually alter the column to be a geometry type afterward.

PostGIS uses EWKB to serialize geometries. If you convert your geometries to EWKB format in Sedona, you don't have to do any additional conversion in PostGIS.

Step 1: In PostGIS

my_postgis_db# create table my_table (id int8, geom geometry);

Step 2: In Spark

df.withColumn("geom", expr("ST_AsEWKB(geom)"))
  .write.format("jdbc")
  .option("truncate", "true") // Don't let Spark recreate the table.
  // Other options.
  .save()
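The same step in PySpark looks roughly like the sketch below; the JDBC URL, table name, and credentials are placeholders, not values from this guide:

from pyspark.sql.functions import expr

(df.withColumn("geom", expr("ST_AsEWKB(geom)"))
    .write.format("jdbc")
    .option("url", "jdbc:postgresql://localhost:5432/my_postgis_db")  # placeholder connection string
    .option("dbtable", "my_table")
    .option("user", "postgres")  # placeholder credentials
    .option("password", "****")
    .option("truncate", "true")  # Don't let Spark recreate the table.
    .mode("overwrite")  # the truncate option only takes effect in overwrite mode
    .save())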

Step 3 (optional): In PostGIS

If you didn't create the table before writing you can change the type afterward.

my_postgis_db# alter table my_table alter column geom type geometry;

Save to GeoPandas

A Sedona DataFrame can be converted directly into a GeoPandas GeoDataFrame.

import geopandas as gpd

df = spatialDf.toPandas()
gdf = gpd.GeoDataFrame(df, geometry="geometry")

You can then plot the GeoPandas DataFrame using many tools in the GeoPandas ecosystem.

gdf.plot(
    figsize=(10, 8),
    column="value",
    legend=True,
    cmap='YlOrBr',
    scheme='quantiles',
    edgecolor='lightgray'
)
