
Wherobots open data

Wherobots collects open datasets from various sources, then cleans and transforms them into the Havasu format so that you can link enterprise data to the real physical world.

All datasets are provided free of charge, apart from AWS data transfer fees. Certain datasets are accessible only to Pro Edition users. If you are interested in upgrading your plan, please contact us.

Open data catalog

| Dataset name | Availability in Wherobots | Type | Count | Description |
|---|---|---|---|---|
| Overture Maps buildings/building | Community edition | Polygon | 785 million | Human-made structures with roofs or interior spaces |
| Overture Maps places/place | Community edition | Point | 59 million | Businesses and points of interest around the world |
| Overture Maps admins/administrativeBoundary | Community edition | LineString | 96 thousand | Officially defined borders between two administrative localities |
| Overture Maps admins/locality | Community edition | Point | 2,948 | Countries and hierarchical subdivisions of countries |
| Overture Maps transportation/connector | Community edition | Point | 330 million | Points of physical connection between two or more segments |
| Overture Maps transportation/segment | Community edition | LineString | 294 million | Center lines of paths that may be traveled |
| Google & Microsoft open buildings | Professional edition | Polygon | 2.5 billion | Google and Microsoft Open Buildings, combined by VIDA |
| Landsat surface temperature | Professional edition | Raster (GeoTIFF) | 166 thousand images (10 TB) | Temperature of the Earth's surface in Kelvin, from Aug 2023 to Oct 2023 |
| US Census ZCTA codes | Professional edition | Polygon | 33,144 | ZIP Code Tabulation Areas defined in 2018 |
| NYC TLC taxi trip records | Professional edition | Point | 200 million | Pickup and dropoff records for each NYC TLC taxi trip |
| OpenStreetMap all nodes | Professional edition | Point | 8 billion | All nodes of the OpenStreetMap Planet dataset |
| OpenStreetMap postal codes | Professional edition | Polygon | 154 thousand | Boundaries of postal code areas as defined in OpenStreetMap |
| Weather events | Professional edition | Point | 8.6 million | Events such as rain, snow, and storms, from 2016 to 2022 |
| Wildfires | Professional edition | Point | 1.8 million | Wildfires that occurred in the United States from 1992 to 2015 |

Use case notebooks

We provide use case notebooks that demonstrate how to link your data to the physical world and derive insights from it.

Pro edition users will be able to execute these notebooks on Wherobots cloud.

Overviews of these notebooks are as follows.

Access open data

Tables in the open data catalog are referenced using the format CATALOG_NAME.DATABASE_NAME.TABLE_NAME.

To read a table, pass its full name as a string: sedona.table("CATALOG_NAME.DATABASE_NAME.TABLE_NAME").show().
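As a minimal sketch, the three-part name can be assembled with a small helper so that typos in the separators are caught early. The function `full_table_name` and the `CATALOGS` mapping are hypothetical conveniences, not part of Sedona or Wherobots; only the two catalog names come from this page.

```python
# Hypothetical helper (not part of Sedona or Wherobots): builds the
# CATALOG_NAME.DATABASE_NAME.TABLE_NAME string expected by sedona.table().

# Catalog names per edition, as documented on this page.
CATALOGS = {
    "community": "wherobots_examples",
    "pro": "wherobots_open_data",
}


def full_table_name(catalog: str, database: str, table: str) -> str:
    """Join the three name parts with dots, rejecting empty or dotted parts."""
    parts = (catalog, database, table)
    for part in parts:
        if not part or "." in part:
            raise ValueError(f"invalid name part: {part!r}")
    return ".".join(parts)


if __name__ == "__main__":
    name = full_table_name(CATALOGS["pro"], "weather", "weather_events")
    print(name)  # wherobots_open_data.weather.weather_events
    # With an active SedonaContext named `sedona`, you would then run:
    # sedona.table(name).show()
```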

To connect to the Wherobots open data catalog, set the following configuration when building the SedonaContext:

Settings for community users

The catalog name for community users is wherobots_examples.

from sedona.spark import *

config = SedonaContext.builder(). \
    config("spark.sql.catalog.wherobots_examples.type", "hadoop"). \
    config("spark.sql.catalog.wherobots_examples", "org.apache.iceberg.spark.SparkCatalog"). \
    config("spark.sql.catalog.wherobots_examples.warehouse", "s3://wherobots-examples-prod/havasu/warehouse"). \
    config("spark.sql.catalog.wherobots_examples.io-impl", "org.apache.iceberg.aws.s3.S3FileIO"). \
    getOrCreate()

sedona = SedonaContext.create(config)

Settings for pro users

The catalog name for pro users is wherobots_open_data.

from sedona.spark import *

config = SedonaContext.builder(). \
    config("spark.sql.catalog.wherobots_open_data.type", "hadoop"). \
    config("spark.sql.catalog.wherobots_open_data", "org.apache.iceberg.spark.SparkCatalog"). \
    config("spark.sql.catalog.wherobots_open_data.warehouse", "s3://wherobots-open-data-prod/havasu/warehouse"). \
    config("spark.sql.catalog.wherobots_open_data.io-impl", "org.apache.iceberg.aws.s3.S3FileIO"). \
    getOrCreate()

sedona = SedonaContext.create(config)
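The community and pro settings differ only in the catalog name and the warehouse path. A minimal sketch that generates either set from one function might look like this; `iceberg_catalog_settings` and `apply_settings` are hypothetical helpers, while the keys and values are exactly those used in the two blocks above. `apply_settings` assumes a builder whose `config(key, value)` returns the builder, as Spark's session builder does.

```python
# Sketch: generate the four Iceberg catalog settings for either edition.
# `iceberg_catalog_settings` and `apply_settings` are hypothetical helpers.


def iceberg_catalog_settings(catalog: str, warehouse: str) -> dict:
    """Return the Spark settings used on this page for a Havasu catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        f"{prefix}.type": "hadoop",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.warehouse": warehouse,
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    }


def apply_settings(builder, settings: dict):
    """Call builder.config(key, value) for each setting and return the builder."""
    for key, value in settings.items():
        builder = builder.config(key, value)
    return builder


COMMUNITY = iceberg_catalog_settings(
    "wherobots_examples", "s3://wherobots-examples-prod/havasu/warehouse")
PRO = iceberg_catalog_settings(
    "wherobots_open_data", "s3://wherobots-open-data-prod/havasu/warehouse")

# Usage with Sedona installed (not executed here):
# from sedona.spark import SedonaContext
# config = apply_settings(SedonaContext.builder(), PRO).getOrCreate()
# sedona = SedonaContext.create(config)
```

Keeping the settings in a dictionary also makes it easy to print or log the exact configuration a session was created with.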

Inspect open data catalog

You can inspect the existing databases and tables in a catalog as follows:

Show database names

sedona.sql("SHOW SCHEMAS IN wherobots_open_data").show()
+----------------+
|       namespace|
+----------------+
|google_microsoft|
|         landsat|
|        nyc_taxi|
|             osm|
|       us_census|
|         weather|
+----------------+

Show table names

Using the weather database as an example:

sedona.sql("SHOW TABLES IN wherobots_open_data.weather").show()
+---------+--------------+-----------+
|namespace|     tableName|isTemporary|
+---------+--------------+-----------+
|  weather|weather_events|      false|
|  weather|    wild_fires|      false|
+---------+--------------+-----------+
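The two commands above can be combined to list every table in a catalog. The sketch below assumes the output columns shown above (`namespace` from SHOW SCHEMAS, `tableName` from SHOW TABLES); `list_tables` is a hypothetical helper that accepts any function running a SQL string and returning rows, such as `lambda q: sedona.sql(q).collect()`. Spark `Row` objects support lookup by field name, so `row["namespace"]` works on them as well as on plain dictionaries.

```python
# Sketch: walk every schema in a catalog and collect fully qualified table
# names. `list_tables` is a hypothetical helper, not a Sedona API.


def list_tables(run_sql, catalog: str) -> list:
    """Return sorted 'catalog.namespace.table' names for all schemas."""
    names = []
    for schema_row in run_sql(f"SHOW SCHEMAS IN {catalog}"):
        namespace = schema_row["namespace"]
        for table_row in run_sql(f"SHOW TABLES IN {catalog}.{namespace}"):
            names.append(f"{catalog}.{namespace}.{table_row['tableName']}")
    return sorted(names)


# Usage with an active SedonaContext `sedona` (not executed here):
# for name in list_tables(lambda q: sedona.sql(q).collect(), "wherobots_open_data"):
#     print(name)
```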

Show table schema and content

Using the weather.weather_events table as an example:

sedona.table("wherobots_open_data.weather.weather_events").printSchema()
root
 |-- EventId: string (nullable = true)
 |-- Type: string (nullable = true)
 |-- Severity: string (nullable = true)
 |-- StartTime(UTC): string (nullable = true)
 |-- EndTime(UTC): string (nullable = true)
 |-- Precipitation(in): string (nullable = true)
 |-- TimeZone: string (nullable = true)
 |-- AirportCode: string (nullable = true)
 |-- LocationLat: string (nullable = true)
 |-- LocationLng: string (nullable = true)
 |-- City: string (nullable = true)
 |-- County: string (nullable = true)
 |-- State: string (nullable = true)
 |-- ZipCode: string (nullable = true)
 |-- geometry: geometry (nullable = true)
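The schema shows that most columns are stored as strings and that some names, such as StartTime(UTC) and Precipitation(in), contain parentheses, which Spark SQL only accepts inside backtick-quoted identifiers. The sketch below builds such a query; `quote_ident` and `events_query` are hypothetical helpers, and it assumes that StartTime(UTC) values are in a format Spark's CAST can parse as a timestamp and that State holds values like "CA" (both unverified).

```python
# Sketch: build a Spark SQL query over weather_events with quoted column
# names and string-to-type casts. Helper names are hypothetical.


def quote_ident(name: str) -> str:
    """Wrap an identifier in backticks, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def events_query(table: str, state: str, limit: int = 10) -> str:
    """Select typed columns for one state; casts assume parseable strings."""
    if "'" in state:
        raise ValueError("state must not contain quotes")
    start = quote_ident("StartTime(UTC)")
    precip = quote_ident("Precipitation(in)")
    return (
        "SELECT EventId, Type, Severity, "
        f"CAST({start} AS TIMESTAMP) AS start_time, "
        f"CAST({precip} AS DOUBLE) AS precipitation_in, geometry "
        f"FROM {table} WHERE State = '{state}' LIMIT {int(limit)}"
    )


# Usage with an active SedonaContext `sedona` (not executed here):
# sedona.sql(events_query("wherobots_open_data.weather.weather_events", "CA")).show()
```

Depending on Spark's ANSI mode setting, strings that cannot be cast either become NULL or raise an error, so check a few rows before relying on the converted columns.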

Last update: October 22, 2023 10:35:47