Map Matching
Map matching is the process of mapping noisy GPS points to correct road segments.
Class: sedonamaps.core.MapMatching
This class provides methods to perform map matching on a road network.
loadOSM method
Method signature
Python definition:
def sedonamaps.core.MapMatching.loadOSM(osmPath: str, tagsFilter: str = "")
Scala definition:
def loadOSM(osmPath: String, tagsFilter: String = ""): DataFrame
Java definition:
public static Dataset<Row> loadOSM(String osmPath, String tagsFilter);
public static Dataset<Row> loadOSM(String osmPath);
Parameters:
- osmPath (String)
- Path to the OSM XML file.
- tagsFilter (String)
- Values of the OSM highway tag used to filter the edges. Multiple values can be given as a comma-separated list, for example "primary,secondary". Pass an empty string (the default) to keep all edges. The special value [car] keeps the edges usable by cars and expands to the following highway values: motorway, motorway_link, trunk, trunk_link, primary, primary_link, secondary, secondary_link, tertiary, tertiary_link, unclassified, residential, living_street, service, road, track.
Returns a Sedona DataFrame containing the road network edges loaded from the OSM file.
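To make the tagsFilter rules concrete, here is a minimal pure-Python sketch of how a filter string could be expanded and applied to an edge's highway value. It is not the SedonaMaps implementation: the helpers expand_tags_filter and keep_edge are hypothetical, and the sketch assumes that [car] may be combined with other comma-separated values, which the reference above does not state.

```python
from typing import Optional, Set

# Expansion of the special [car] value, as listed in the parameter description
CAR_HIGHWAY_TAGS = [
    "motorway", "motorway_link", "trunk", "trunk_link", "primary",
    "primary_link", "secondary", "secondary_link", "tertiary",
    "tertiary_link", "unclassified", "residential", "living_street",
    "service", "road", "track",
]


def expand_tags_filter(tags_filter: str) -> Optional[Set[str]]:
    """Return the accepted highway values, or None to keep every edge (hypothetical)."""
    if tags_filter.strip() == "":
        return None  # empty string: preserve all edges
    accepted: Set[str] = set()
    for token in tags_filter.split(","):
        token = token.strip()
        if token == "[car]":
            accepted.update(CAR_HIGHWAY_TAGS)
        elif token:
            accepted.add(token)
    return accepted


def keep_edge(highway_value: str, tags_filter: str) -> bool:
    """True if an edge with this highway value survives the filter."""
    accepted = expand_tags_filter(tags_filter)
    return accepted is None or highway_value in accepted
```

For example, keep_edge("residential", "[car]") is True while keep_edge("footway", "[car]") is False, and any value passes when the filter is empty.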
Example
Python:
from sedonamaps.core import MapMatching as mm
# Load the road network, keeping only the edges usable by cars
dfEdge = mm.loadOSM(PATH_PREFIX + "data/osm2.xml", "[car]")
dfEdge.show(5)
Scala:
import com.wherobots.sedonamaps.MapMatching
val dfEdge = MapMatching.loadOSM(resourceFolder + "osm2.xml", "[car]")
dfEdge.show(5)
Java:
import com.wherobots.sedonamaps.MapMatching;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
Dataset<Row> dfEdge = MapMatching.loadOSM(resourceFolder + "data/osm2.xml", "[car]");
dfEdge.show(5);
perform method
Method signature
Python definition:
def sedonamaps.core.MapMatching.perform(edgesDf: DataFrame, pathsDf: DataFrame, colEdgesGeom: str, colPathsGeom: str, idFieldName: str = None)
Scala definition:
def perform(edgesDf: DataFrame, pathsDf: DataFrame, colEdgesGeom: String, colPathsGeom: String, idFieldName: String = None): DataFrame
Java definition:
public static Dataset<Row> perform(Dataset<Row> edgesDf, Dataset<Row> pathsDf, String colEdgesGeom, String colPathsGeom);
// Or
public static Dataset<Row> perform(Dataset<Row> edgesDf, Dataset<Row> pathsDf, String colEdgesGeom, String colPathsGeom, String idFieldName);
Parameters:
- edgesDf (DataFrame)
- Sedona DataFrame containing the attributes loaded from the OSM file.
- pathsDf (DataFrame)
- Sedona DataFrame containing the GPS trips or LineStrings for which map matching will be performed.
- colEdgesGeom (String)
- Name of the geometry column in the DataFrame edgesDf.
- colPathsGeom (String)
- Name of the geometry column in the DataFrame pathsDf.
- idFieldName (String)
- Optional: the column in the pathsDf DataFrame that contains the unique identifier of each GPS trip. If not provided, the first non-geometry column is used.
Returns a Sedona DataFrame containing the results of map matching. This DataFrame includes fields such as ids, observed_points, matched_points, and matched_nodes.
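The fallback rule for idFieldName can be sketched as follows. This is an illustration only: the schema is modeled as hypothetical (name, type) pairs in which the label "geometry" marks a geometry column, and resolve_id_field is not part of the SedonaMaps API.

```python
from typing import List, Optional, Tuple


def resolve_id_field(
    columns: List[Tuple[str, str]], id_field_name: Optional[str] = None
) -> str:
    """Return the column used to identify each GPS trip in pathsDf (hypothetical)."""
    if id_field_name is not None:
        # An explicit idFieldName must exist in the schema
        if id_field_name not in {name for name, _ in columns}:
            raise ValueError(f"column {id_field_name!r} not found")
        return id_field_name
    # Otherwise fall back to the first non-geometry column
    for name, col_type in columns:
        if col_type != "geometry":
            return name
    raise ValueError("no non-geometry column available as a trip id")
```

With columns geometry, trip_id and driver, the default resolves to trip_id; passing "driver" explicitly overrides it.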
Example
Python:
# dfPaths holds the GPS trips; both DataFrames store geometries in a column named "geometry"
dfMmResult = mm.perform(dfEdge, dfPaths, "geometry", "geometry")
dfMmResult.show(5)
Scala:
val dfMmResult = MapMatching.perform(dfEdge, dfPaths, "geometry", "geometry")
dfMmResult.show(5)
Java:
Dataset<Row> dfMmResult = MapMatching.perform(dfEdge, dfPaths, "geometry", "geometry");
dfMmResult.show(5);
Advanced Configuration
SedonaMaps has several advanced configurations that can be set when building the Sedona context:
Python:
from sedona.spark import SedonaContext
config = (
    SedonaContext.builder()
    .config("sedonamaps.mm.maxdist", "50.0")
    .getOrCreate()
)
sedona = SedonaContext.create(config)
Scala:
import org.apache.sedona.spark.SedonaContext
val config = SedonaContext.builder()
  .config("sedonamaps.mm.maxdist", "50.0")
  .getOrCreate()
val sedona = SedonaContext.create(config)
Java:
import org.apache.sedona.spark.SedonaContext;
import org.apache.spark.sql.SparkSession;
SparkSession config = SedonaContext.builder()
    .config("sedonamaps.mm.maxdist", "50.0")
    .appName("SparkSedonaExample")
    .getOrCreate();
SparkSession sedona = SedonaContext.create(config);
These configurations can also be changed after the Sedona context has been created:
Python:
sedona.conf.set("sedonamaps.mm.maxdist", "50.0")
Scala:
sedona.conf.set("sedonamaps.mm.maxdist", "50.0")
Java:
sedona.conf().set("sedonamaps.mm.maxdist", "50.0");
Explanation
How Distributed Map-Matching Works
SedonaMaps map-matches a large collection of trajectories in batch, in a distributed manner. The process has two phases: distributing workloads and local map matching. The workload distribution phase co-locates each trajectory and the road segments near it in the same partition. The local map-matching phase then matches the trajectories in each partition against the road network that is already co-located with them.
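The co-location idea can be illustrated on a single machine. The sketch below is an assumption-laden toy, not how SedonaMaps partitions data: it assigns each trip to a uniform grid cell (cell_size and margin are hypothetical knobs), then copies every road segment whose bounding box overlaps a trip's expanded bounding box into that trip's cell. A naive nested loop stands in for the distributed spatial join, and a segment near several trips may be replicated into several partitions.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


@dataclass
class Partition:
    trips: List[str] = field(default_factory=list)
    segments: Set[str] = field(default_factory=set)


def bbox(points: List[Point], margin: float = 0.0) -> BBox:
    """Bounding box of a polyline, optionally expanded by a margin."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs) - margin, min(ys) - margin, max(xs) + margin, max(ys) + margin)


def intersects(a: BBox, b: BBox) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def co_locate(
    trajectories: Dict[str, List[Point]],
    segments: Dict[str, List[Point]],
    cell_size: float,
    margin: float,
) -> Dict[Tuple[int, int], Partition]:
    """Group each trip with the road segments near it (toy, single-machine)."""
    partitions: Dict[Tuple[int, int], Partition] = defaultdict(Partition)
    trip_boxes = []
    for trip_id, pts in trajectories.items():
        box = bbox(pts, margin)
        # Partition key: grid cell containing the centre of the expanded box
        key = (int(((box[0] + box[2]) / 2) // cell_size),
               int(((box[1] + box[3]) / 2) // cell_size))
        partitions[key].trips.append(trip_id)
        trip_boxes.append((key, box))
    # Naive nested loop standing in for a distributed spatial join
    for seg_id, pts in segments.items():
        seg_box = bbox(pts)
        for key, box in trip_boxes:
            if intersects(seg_box, box):
                partitions[key].segments.add(seg_id)
    return dict(partitions)
```

Road segments that are not near any trip end up in no partition, which is what keeps each local matching task small.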
Parameters for Distributing Map-Matching Workloads
- sedonamaps.mm.numspatialpartitions
- Number of spatial partitions generated in the spatial join phase. This controls the parallelism of the spatial join between trajectories and road networks. The recommended value is 10 times the total number of executor cores.
- Default value: None
- Possible values: any positive integer value
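The sizing recommendation is simple arithmetic; the sketch below only encodes it. The function name is hypothetical, and the executor and core counts are inputs you would read from your own cluster configuration.

```python
def recommended_num_spatial_partitions(num_executors: int, cores_per_executor: int) -> int:
    """10 x total executor cores, following the recommendation above (hypothetical helper)."""
    if num_executors <= 0 or cores_per_executor <= 0:
        raise ValueError("executor and core counts must be positive")
    total_cores = num_executors * cores_per_executor
    return 10 * total_cores


# Example: 4 executors with 8 cores each -> 320 spatial partitions
num_partitions = recommended_num_spatial_partitions(4, 8)
```

The value can then be applied with sedona.conf.set("sedonamaps.mm.numspatialpartitions", str(num_partitions)). On many cluster managers, sedona.sparkContext.defaultParallelism reports the total number of executor cores and could stand in for the product above, but verify this against your deployment.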
The Local Map-Matching Algorithm
The local map-matching algorithm is based on a Hidden Markov Model (HMM), an approach popularized by the paper Hidden Markov Map Matching Through Noise and Sparseness (Newson and Krumm, 2009). SedonaMaps implements a variation of this algorithm.
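For readers unfamiliar with HMM map matching, here is a compact Viterbi sketch in the style of the Newson and Krumm formulation: a Gaussian emission probability on the GPS-to-road distance and an exponential transition probability on the difference between straight-line and route distance. It is not SedonaMaps' variation. Candidate roads, distances, and route lengths are supplied as plain dictionaries (a hypothetical input format), and the sigma_m and beta_m defaults are illustrative assumptions, not SedonaMaps settings.

```python
import math
from typing import Dict, List, Optional, Tuple


def log_emission(distance_m: float, sigma_m: float) -> float:
    """Log of a zero-mean Gaussian density for the GPS-to-road distance."""
    return -0.5 * (distance_m / sigma_m) ** 2 - math.log(math.sqrt(2 * math.pi) * sigma_m)


def log_transition(straight_m: float, route_m: float, beta_m: float) -> float:
    """Log of an exponential density on |straight-line distance - route distance|."""
    return -abs(straight_m - route_m) / beta_m - math.log(beta_m)


def viterbi(
    candidates: List[List[str]],               # candidates[t]: road ids near observation t
    distances: List[Dict[str, float]],         # distances[t][road]: GPS point to road, meters
    straight: List[float],                     # straight[t]: distance from obs t to obs t+1
    route: Dict[Tuple[int, str, str], float],  # route[(t, a, b)]: network distance a -> b
    sigma_m: float = 20.0,
    beta_m: float = 5.0,
) -> List[str]:
    """Return the most likely road for each observation."""
    scores = {c: log_emission(distances[0][c], sigma_m) for c in candidates[0]}
    back: List[Dict[str, str]] = []
    for t in range(1, len(candidates)):
        new_scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for b in candidates[t]:
            best_prev: Optional[str] = None
            best = -math.inf
            for a, score in scores.items():
                r = route.get((t - 1, a, b))
                if r is None:
                    continue  # b is unreachable from a
                value = score + log_transition(straight[t - 1], r, beta_m)
                if value > best:
                    best_prev, best = a, value
            if best_prev is not None:
                new_scores[b] = best + log_emission(distances[t][b], sigma_m)
                pointers[b] = best_prev
        if not new_scores:
            raise ValueError(f"no reachable candidate at observation {t}")
        scores = new_scores
        back.append(pointers)
    # Backtrack from the best final state
    path = [max(scores, key=scores.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy example: two parallel roads A and B. Switching roads needs 250 m of routing
# for 100 m of straight-line movement, so the matcher stays on A even though
# observation 1 is slightly closer to B.
route = {(t, a, b): 100.0 if a == b else 250.0 for t in range(2) for a in "AB" for b in "AB"}
dists = [{"A": 5.0, "B": 15.0}, {"A": 12.0, "B": 8.0}, {"A": 4.0, "B": 20.0}]
best_path = viterbi([["A", "B"]] * 3, dists, [100.0, 100.0], route)
```

The toy case shows why an HMM copes with noise: a single observation drifting toward a neighbouring road is outweighed by the implausible detour that switching roads would require.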
Parameters for the Local Map-Matching Algorithm
- sedonamaps.mm.matcher
- The algorithm used by the local map matcher. The legacy matcher works better for dense trajectories (high sampling rate), while the advanced matcher works better for sparse trajectories (low sampling rate).
- Default value: legacy
- Possible values: legacy, advanced
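The reference does not say where "dense" ends and "sparse" begins. One way to reason about the choice is to look at the median time between consecutive GPS fixes, as in the sketch below. The helper names are hypothetical, and the 30-second threshold is an arbitrary placeholder, not a SedonaMaps default; tune it against your own data.

```python
from statistics import median
from typing import Sequence


def median_sampling_interval(timestamps_s: Sequence[float]) -> float:
    """Median gap, in seconds, between consecutive observations of one trip."""
    if len(timestamps_s) < 2:
        raise ValueError("need at least two timestamps")
    gaps = [b - a for a, b in zip(timestamps_s, timestamps_s[1:])]
    return median(gaps)


def suggest_matcher(timestamps_s: Sequence[float], sparse_threshold_s: float = 30.0) -> str:
    """Suggest a value for sedonamaps.mm.matcher (threshold is an assumption)."""
    if median_sampling_interval(timestamps_s) > sparse_threshold_s:
        return "advanced"  # sparse trajectory: low sampling rate
    return "legacy"  # dense trajectory: high sampling rate
```

A 1 Hz trace suggests legacy, while a trace with one fix per minute suggests advanced.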
- sedonamaps.mm.adv.gpsaccuracy
- The GPS accuracy of the input data, in meters. This controls the search radius around each observation. For sparse data, a higher value (e.g., 40 meters) improves accuracy but slows down matching.
- Default value: 20
- Possible values: any positive integer value
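The speed/accuracy trade-off of a larger search radius can be seen in a small candidate-search sketch. It assumes projected coordinates in meters and treats the search radius as given; how SedonaMaps derives the radius from sedonamaps.mm.adv.gpsaccuracy is not specified here, and candidate_segments is a hypothetical helper.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def point_segment_distance(p: Point, seg: Segment) -> float:
    """Euclidean distance from a point to a line segment."""
    (x1, y1), (x2, y2) = seg
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(p[0] - x1, p[1] - y1)
    # Clamp the projection parameter so the nearest point stays on the segment
    t = max(0.0, min(1.0, ((p[0] - x1) * dx + (p[1] - y1) * dy) / length_sq))
    return math.hypot(p[0] - (x1 + t * dx), p[1] - (y1 + t * dy))


def candidate_segments(p: Point, segments: Dict[str, Segment], search_radius_m: float) -> List[str]:
    """Segments within the search radius of an observation, nearest first."""
    hits = sorted((point_segment_distance(p, s), sid) for sid, s in segments.items())
    return [sid for d, sid in hits if d <= search_radius_m]
```

Doubling the radius admits more candidate roads per observation, which raises the chance that the true road is considered but also enlarges the HMM state space the matcher must search.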
- sedonamaps.mm.adv.partialmatch
- When the local map-matching algorithm cannot find a match for every observation of a trip, it terminates early and the result is a partial match of the original trip. This parameter controls whether partial matches are included in the output. If false, partial matches become LineString EMPTY in the final output DataFrame.
- Default value: false
- Possible values: true, false
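The effect of sedonamaps.mm.adv.partialmatch can be sketched as a post-processing rule. In this illustration the matcher's per-observation output is a hypothetical list where None marks an observation with no match; matching stops at the first None, and an empty list stands in for LineString EMPTY.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def finalize_match(matched: List[Optional[Point]], partial_match: bool) -> List[Point]:
    """Apply the partial-match rule to one trip (hypothetical representation)."""
    prefix: List[Point] = []
    for m in matched:
        if m is None:
            break  # early termination at the first unmatched observation
        prefix.append(m)
    complete = len(prefix) == len(matched)
    if complete or partial_match:
        return prefix
    return []  # stands in for LineString EMPTY
```

A fully matched trip is returned unchanged either way; a trip that stops matching halfway keeps its matched prefix only when partial matches are enabled.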