Wherobots open data
Wherobots collects open datasets from various data sources, then cleans and transforms them into the Havasu format so that you can link enterprise data to the real physical world.
All datasets are provided free of charge, apart from AWS data transfer fees. Certain datasets are accessible only to Pro Edition users. If you are interested in upgrading your plan, please contact us.
Open data catalog
Dataset name | Availability in Wherobots | Type | Count | Description |
---|---|---|---|---|
Overture Maps buildings/building | Community edition | Polygon | 785 million | Any human-made structures with roofs or interior spaces |
Overture Maps places/place | Community edition | Point | 59 million | Any business or point of interest within the world |
Overture Maps admins/administrativeBoundary | Community edition | LineString | 96 thousand | Any officially defined border between two Administrative Localities |
Overture Maps admins/locality | Community edition | Point | 2948 | Countries and hierarchical subdivisions of countries |
Overture Maps transportation/connector | Community edition | Point | 330 million | Points of physical connection between two or more segments |
Overture Maps transportation/segment | Community edition | LineString | 294 million | Center-line of a path which may be traveled |
Google & Microsoft open buildings | Professional edition | Polygon | 2.5 billion | Google & Microsoft Open Buildings, combined by VIDA |
Landsat surface temperature | Professional edition | Raster (GeoTIFF) | 166K images, 10 TB size | The temperature of the Earth's surface in Kelvin, from Aug 2023 to Oct 2023 |
US Census ZCTA codes | Professional edition | Polygon | 33144 | ZIP Code Tabulation Areas defined in 2018 |
NYC TLC taxi trip records | Professional edition | Point | 200 million | NYC TLC taxi trip pickup and dropoff records per trip |
OpenStreetMap all nodes | Professional edition | Point | 8 billion | All the nodes of the OpenStreetMap Planet dataset |
OpenStreetMap postal codes | Professional edition | Polygon | 154 thousand | Boundaries of postal code areas as defined in OpenStreetMap |
Weather events | Professional edition | Point | 8.6 million | Events such as rain, snow, and storms, from 2016 to 2022 |
Wildfires | Professional edition | Point | 1.8 million | Wildfires that occurred in the United States from 1992 to 2015 |
Use case notebooks
We provide use case notebooks that demonstrate how to link your data to the physical world and derive insights.
Pro Edition users can run these notebooks on Wherobots Cloud.
Overviews of these notebooks are as follows.
Access open data
Our data can be referenced in the format CATALOG_NAME.DATABASE_NAME.TABLE_NAME.
To read a table, call sedona.table("CATALOG_NAME.DATABASE_NAME.TABLE_NAME").show() with the fully qualified name passed as a string.
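As a minimal sketch of the naming scheme above, the helpers below assemble the three-part name and preview a table. The function names qualified_table_name and preview_table are hypothetical, not part of Sedona or Wherobots, and the sedona argument is assumed to be a session that has already been configured for the catalog.

```python
def qualified_table_name(catalog: str, database: str, table: str) -> str:
    """Join the parts into CATALOG_NAME.DATABASE_NAME.TABLE_NAME."""
    parts = [catalog, database, table]
    for part in parts:
        # Hypothetical guard: each part must be a single, non-empty identifier.
        if not part or "." in part:
            raise ValueError(f"invalid identifier part: {part!r}")
    return ".".join(parts)


def preview_table(sedona, catalog: str, database: str, table: str, n: int = 5) -> None:
    """Print the first n rows of a table; `sedona` is an already configured session."""
    sedona.table(qualified_table_name(catalog, database, table)).show(n)
```

For example, preview_table(sedona, "wherobots_open_data", "weather", "weather_events") would read the weather events table, assuming the Pro catalog is available to your account.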
To connect to the Wherobots open data catalog, set the following options when building the SedonaContext.
Settings for community users
The catalog name for community users is wherobots_examples.
from sedona.spark import *

# Register the wherobots_examples catalog and point it at its S3 warehouse
config = (
    SedonaContext.builder()
    .config("spark.sql.catalog.wherobots_examples", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.wherobots_examples.type", "hadoop")
    .config("spark.sql.catalog.wherobots_examples.warehouse", "s3://wherobots-examples-prod/havasu/warehouse")
    .config("spark.sql.catalog.wherobots_examples.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .getOrCreate()
)
# Register Sedona's spatial types and functions on the Spark session
sedona = SedonaContext.create(config)
Settings for pro users
The catalog name for pro users is wherobots_open_data.
from sedona.spark import *

# Register the wherobots_open_data catalog and point it at its S3 warehouse
config = (
    SedonaContext.builder()
    .config("spark.sql.catalog.wherobots_open_data", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.wherobots_open_data.type", "hadoop")
    .config("spark.sql.catalog.wherobots_open_data.warehouse", "s3://wherobots-open-data-prod/havasu/warehouse")
    .config("spark.sql.catalog.wherobots_open_data.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .getOrCreate()
)
# Register Sedona's spatial types and functions on the Spark session
sedona = SedonaContext.create(config)
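The community and pro settings differ only in the catalog name and the warehouse path, so the four options can be generated from those two values. The following is a sketch under that assumption; iceberg_catalog_conf and create_sedona are hypothetical helper names, not part of the Sedona API.

```python
def iceberg_catalog_conf(catalog_name: str, warehouse: str) -> dict:
    """Build the four Spark options shown above for one catalog."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hadoop",
        f"{prefix}.warehouse": warehouse,
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    }


def create_sedona(catalog_name: str, warehouse: str):
    """Create a Sedona session with the given catalog registered."""
    # Imported lazily so iceberg_catalog_conf can be used without Sedona installed.
    from sedona.spark import SedonaContext

    builder = SedonaContext.builder()
    for key, value in iceberg_catalog_conf(catalog_name, warehouse).items():
        builder = builder.config(key, value)
    return SedonaContext.create(builder.getOrCreate())
```

Community users would then call create_sedona("wherobots_examples", "s3://wherobots-examples-prod/havasu/warehouse"), and pro users would pass wherobots_open_data with its own warehouse path.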
Inspect open data catalog
You can inspect the existing databases and tables in a catalog as follows:
Show database names
sedona.sql("SHOW SCHEMAS IN wherobots_open_data").show()
+----------------+
|       namespace|
+----------------+
|google_microsoft|
|         landsat|
|        nyc_taxi|
|             osm|
|       us_census|
|         weather|
+----------------+
Show table names
Using the weather database as an example:
sedona.sql("SHOW TABLES IN wherobots_open_data.weather").show()
+---------+--------------+-----------+
|namespace|     tableName|isTemporary|
+---------+--------------+-----------+
|  weather|weather_events|      false|
|  weather|    wild_fires|      false|
+---------+--------------+-----------+
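The two SHOW statements can be combined to list every table in a catalog at once. This is a rough sketch: list_catalog_tables is a hypothetical helper, and it assumes the result columns are named namespace and tableName, as in the output shown above.

```python
def list_catalog_tables(sedona, catalog: str) -> dict:
    """Return {database: [table, ...]} for every schema in the catalog."""
    tables = {}
    for schema_row in sedona.sql(f"SHOW SCHEMAS IN {catalog}").collect():
        database = schema_row["namespace"]
        table_rows = sedona.sql(f"SHOW TABLES IN {catalog}.{database}").collect()
        tables[database] = [row["tableName"] for row in table_rows]
    return tables
```

Calling list_catalog_tables(sedona, "wherobots_open_data") issues one query per database, which is fine for a catalog of this size.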
Show table schema and content
Using weather.weather_events as an example:
sedona.table("wherobots_open_data.weather.weather_events").printSchema()
root
|-- EventId: string (nullable = true)
|-- Type: string (nullable = true)
|-- Severity: string (nullable = true)
|-- StartTime(UTC): string (nullable = true)
|-- EndTime(UTC): string (nullable = true)
|-- Precipitation(in): string (nullable = true)
|-- TimeZone: string (nullable = true)
|-- AirportCode: string (nullable = true)
|-- LocationLat: string (nullable = true)
|-- LocationLng: string (nullable = true)
|-- City: string (nullable = true)
|-- County: string (nullable = true)
|-- State: string (nullable = true)
|-- ZipCode: string (nullable = true)
|-- geometry: geometry (nullable = true)
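Every attribute column in this schema is a string, and some names contain parentheses, so a query usually needs to cast values and quote those names with backticks. The sketch below builds such a query with a spatial filter. typed_weather_query is a hypothetical helper, and the example polygon further down assumes the geometry column stores longitude/latitude coordinates, which the schema alone does not confirm.

```python
def typed_weather_query(table: str, region_wkt: str) -> str:
    """Build a Spark SQL query that casts string columns and filters by region."""
    return f"""
        SELECT
            EventId,
            Type,
            Severity,
            -- Backticks are needed because the column name contains parentheses
            CAST(`Precipitation(in)` AS DOUBLE) AS precipitation_in,
            CAST(LocationLat AS DOUBLE) AS lat,
            CAST(LocationLng AS DOUBLE) AS lng,
            geometry
        FROM {table}
        WHERE ST_Intersects(geometry, ST_GeomFromWKT('{region_wkt}'))
    """
```

With a configured session, the query could be run as sedona.sql(typed_weather_query("wherobots_open_data.weather.weather_events", "POLYGON((-125 32, -114 32, -114 42, -125 42, -125 32))")).show(), where the polygon is an arbitrary illustrative bounding box.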