Getting started

SedonaSQL provides various SQL APIs to work with vector and raster data. For vector data, SedonaSQL follows SQL/MM Part3 Spatial SQL Standard.

Create SedonaContext¶

To start using SedonaDB, we need to launch a SedonaContext in Wherobots cloud:

PythonScalaJava

from sedona.spark import SedonaContext

config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)

import org.apache.sedona.spark.SedonaContext

val config = SedonaContext.builder.getOrCreate()
val sedona = SedonaContext(config)

import org.apache.spark.sql.SparkSession;
import org.apache.sedona.spark.SedonaContext;

SparkSession config = SedonaContext.builder().getOrCreate();
SparkSession sedona = SedonaContext.create(config);

All the SQL APIs can be accessed via:

PythonScalaJava

myDataFrame = sedona.sql("YOUR_SQL")
myDataFrame.createOrReplaceTempView("spatialDf")

var myDataFrame = sedona.sql("YOUR_SQL")
myDataFrame.createOrReplaceTempView("spatialDf")

Dataset<Row> myDataFrame = sedona.sql("YOUR_SQL")
myDataFrame.createOrReplaceTempView("spatialDf")

Alternatively, expr and selectExpr can be used:

myDataFrame.withColumn("geometry", expr("ST_*")).selectExpr("ST_*")

myDataFrame.withColumn("raster", expr("RS_*")).selectExpr("RS_*")

Detailed SedonaSQL APIs for vector and raster can be found at Vector SQL APIs and Raster SQL APIs respectively. References of Havasu table format can be found at Havasu APIs

Quick start¶

After starting SedonaContext, users can start working with SedonaDB on Wherobots cloud without any setup.

Create a SedonaDB Havasu table¶

Wherobots cloud provides a set of default configurations for the users to store data under the S3 prefix allocated for the users. Simply use wherobots as the catalog name when creating a Havasu table.

PythonScalaJava

sedona.sql("CREATE DATABASE IF NOT EXISTS wherobots.test_db")
sedona.sql("""
CREATE TABLE IF NOT EXISTS wherobots.test_db.taxi (
  pickup GEOMETRY,
  Trip_Pickup_DateTime STRING,
  Payment_Type STRING,
  Fare_Amt DECIMAL
) USING havasu.iceberg
""")

sedona.sql("CREATE DATABASE IF NOT EXISTS wherobots.test_db")
sedona.sql("""
CREATE TABLE IF NOT EXISTS wherobots.test_db.taxi (
  pickup GEOMETRY,
  Trip_Pickup_DateTime STRING,
  Payment_Type STRING,
  Fare_Amt DECIMAL
)

sedona.sql("CREATE DATABASE IF NOT EXISTS wherobots.test_db");
sedona.sql(
  "CREATE TABLE IF NOT EXISTS wherobots.test_db.taxi (" +
  "pickup GEOMETRY," +
  "Trip_Pickup_DateTime STRING," +
  "Payment_Type STRING," +
  "Fare_Amt DECIMAL" +
")");

Read data from external storage¶

User can ingest and query data using SedonaDB. For example, users can load an example CSV file:

PythonScalaJava

taxidf = sedona.read.format('csv').option("header","true").option("delimiter", ",").load("s3a://wherobots-examples/data/nyc-taxi-data.csv")
taxidf = taxidf.selectExpr('ST_Point(CAST(Start_Lon AS Decimal(24,20)), CAST(Start_Lat AS Decimal(24,20))) AS pickup', 'Trip_Pickup_DateTime', 'Payment_Type', 'CAST(Fare_Amt AS DECIMAL)')
taxidf = taxidf.filter(col("pickup").isNotNull())

val taxidf = sedona.read.format("csv").option("header","true").option("delimiter", ",").load("s3a://wherobots-examples/data/nyc-taxi-data.csv")
val taxidf = taxidf.selectExpr("ST_Point(CAST(Start_Lon AS Decimal(24,20)), CAST(Start_Lat AS Decimal(24,20))) AS pickup", "Trip_Pickup_DateTime", "Payment_Type", "CAST(Fare_Amt AS DECIMAL)")
val taxidf = taxidf.filter(col("pickup").isNotNull)

Dataset<Row> taxidf = sedona.read().format("csv").option("header","true").option("delimiter", ",").load("s3a://wherobots-examples/data/nyc-taxi-data.csv");
Dataset<Row> taxidf = taxidf.selectExpr("ST_Point(CAST(Start_Lon AS Decimal(24,20)), CAST(Start_Lat AS Decimal(24,20))) AS pickup", "Trip_Pickup_DateTime", "Payment_Type", "CAST(Fare_Amt AS DECIMAL)");
Dataset<Row> taxidf = taxidf.filter(col("pickup").isNotNull());

Insert data to a SedonaDB Havasu table¶

User can insert the data above to an existing SedonaDB table.

PythonScalaJava

taxidf.writeTo("wherobots.test_db.taxi").append()

taxidf.writeTo("wherobots.test_db.taxi").append()

taxidf.writeTo("wherobots.test_db.taxi").append();

Read from a SedonaDB Havasu table¶

User can immediately query the data stored in Havasu table.

PythonScalaJava

df_result = sedona.sql("SELECT ST_Buffer(pickup, 0.001) pickup_buf FROM wherobots.test_db.taxi WHERE ST_Within(pickup, ST_PolygonFromEnvelope(-74, 40, -73, 41))")
df_result.show()

val df_result = sedona.sql("SELECT ST_Buffer(pickup, 0.001) pickup_buf FROM wherobots.test_db.taxi WHERE ST_Within(pickup, ST_PolygonFromEnvelope(-74, 40, -73, 41))")
df_result.show()

Dataset<Row> df_result = sedona.sql("SELECT ST_Buffer(pickup, 0.001) pickup_buf FROM wherobots.test_db.taxi WHERE ST_Within(pickup, ST_PolygonFromEnvelope(-74, 40, -73, 41))");
df_result.show();

Limitations¶

Concurrent Writing: Currently, Havasu stores catalog data on S3, which is not safe for concurrent writing. There is possibility that concurrent writing could overwrite each other's changes. We are working on a solution to address this issue.
MOR Tables: Havasu does not support MOR tables. We will add MOR support to Havasu in the future.
File Formats: Havasu only supports storing geometry or raster values in parquet files. Other formats such as ORC or Avro are not supported.

Last update: February 9, 2024 03:04:11