Skip to content

Map Matching

Map matching is the process of mapping noisy GPS points to correct road segments.

Class: sedonamaps.core.MapMatching

This class provides methods to perform map matching on a road network.

loadOSM method

Method signature

Python definition:

def sedonamaps.core.MapMatching.loadOSM(osmPath: str, tagsFilter: str = "")

Scala definition:

def loadOSM(osmPath: String, tagsFilter: String = ""): DataFrame

Java definition:

public static DataFrame loadOSM(String osmPath, String tagsFilter);
public static DataFrame loadOSM(String osmPath);

Parameters:

  • osmPath: Path to the OSM XML file
  • tagsFilter: Tag values of the highway tag to be used for filtering the OSM data. Multiple values delimited by , can be specified. Specify empty string to preserve all the edges. Default value is empty string.

    There is a special value [car] for filtering the OSM edges for cars. This value expands to the following tags:

    motorway,motorway_link,trunk,trunk_link,primary,primary_link,secondary,secondary_link,tertiary,tertiary_link,unclassified,residential,living_street,service,road,track
    

Returns a Sedona DataFrame.

Example

from sedonamaps.core import MapMatching as mm
dfEdge = mm.loadOSM(PATH_PREFIX + "data/osm2.xml", "[car]")
dfEdge.show(5)
import com.wherobots.sedonamaps.MapMatchingDf
val dfEdge = MapMatchingDf.loadOSM(resourceFolder + "osm2.xml", "[car]")
dfEdge.show(5)
import com.wherobots.sedonamaps.MapMatchingDf
Dataset dfEdge = MapMatchingDf.loadOSM(resourceFoler + "data/osm2.xml", "[car]");
dfEdge.show(5)

perform_matching method

Method signature

Python Definition:

def sedonamaps.core.MapMatching.performMapMatching(edgesDf: DataFrame, pathsDf: DataFrame, colEdgesGeom: String, colPathsGeom: String)

Scala Definition:

def performMapMatching(edgesDf: DataFrame, pathsDf: DataFrame, colEdgesGeom: String, colPathsGeom: String): DataFrame

Java Definition:

public static DataFrame performMapMatching(DataFrame edgesDf, DataFrame pathsDf, String colEdgesGeom, String colPathsGeom);

Parameters: - edgesDf (DataFrame) - Sedona DataFrame containing the attributes loaded from the OSM file. - pathsDf (DataFrame) - Sedona DataFrame containing the GPS trips or LineStrings for which map matching will be performed. - colEdgesGeom (String) - Name of the geometry type column in the DataFrame edgesDf. - colPathsGeom (String) - Name of the geometry type column in the DataFrame pathsDf.

Returns a PySpark DataFrame object containing the results of map matching. This DataFrame includes fields such as ids, observed_points, matched_points, and matched_nodes.

Example

dfMmResult = mm.performMapMatching(dfEdge, dfPaths, "geometry", "geometry")
dfMmResult.show(5)
val dfMmResult = MapMatchingDf.performMapMatching(dfEdge, dfPaths, "geometry", "geometry")
dfMmResult.show(5)
Dataset matchingResultDf = MapMatchingDf.performMapMatching(edgesDf, pathsSpatialDf, "geometry", "geometry");
matchingResultDf.show();

Advanced Configuration

SedonaMaps has several advanced configs that can be set through Config:

config = SedonaContext.builder() .\
    config("sedonamaps.mm.maxdist","50.0"). \
    getOrCreate()
sedona = SedonaContext.create(config)
val config = SedonaContext.builder().
    config("sedonamaps.mm.maxdist","50.0").
    .getOrCreate()
val sedona = SedonaContext.create(config)
SparkSession config = SedonaContext.builder()
            .config("sedonamaps.mm.maxdist","50.0")
            .appName("SparkSedonaExample")
            .getOrCreate();
SparkSession sedona = SedonaContext.create(config);

These configurations can also be tuned when the Sedona context was already created:

sedona.conf.set("sedonamaps.mm.maxdist", "50.0")
sedona.conf.set("sedonamaps.mm.maxdist", "50.0")
sedona.conf().set("sedonamaps.mm.maxdist", "50.0");

Explanation

How Distributed Map-Matching Works

SedonaMaps runs batch map-matching on a large collection of trajectories in a distributed manner. The map-matching process is divided into two phases: distributing workloads and local map-matching. The distributing workloads phase rearranges the trajectories and the road segments near those trajectories to the same partition, and the local map-matching phase performs map-matching on each partition, where we already have trajectories and their surrounding road network co-located.

Parameters for Distributing Map-Matching Workloads

  • sedonamaps.mm.maxseglengthkm

    • Trajectories will be splitted into smaller segments, where each segment is at most sedonamaps.mm.maxseglengthkm long. This is for speeding up the spatial join phase of the map matching process. Long trajectory segments would duplicate the segments to multiple spatial partitions, and long segments produce join results with lots of edges, which will hurt the spatial join performance. The default value works well for most cases.
    • Default value: 0.5
    • Possible values: any double value
  • sedonamaps.mm.numpartitions

    • Number of partitions of the local map-matching phase. This controls the parallelism of performing local map-matching on local road networks. A recommended value is 10 * number of executor cores.
    • Default value: None
    • Possible values: any positive integer value
  • sedonamaps.mm.numspatialpartitions

    • Number of spatial partitions generated in the spatial join phase. This controls the parallelism of performing spatial join between trajectories and road networks. A recommended value is 10 * number of executor cores.
    • Default value: None
    • Possible values: any positive integer value

The Local Map-Matching Algorithm

The local map-matching algorithm is based on a Hidden Markov Model (HMM), which is popularized by the paper Hidden Markov Map Matching Through Noise and Sparseness. SedonaMaps implements a variation of this algorithm, which generates "non-emitting states" that are not associated with any observation. These non-emitting states are for dynamic interpolation between 2 consecutive observations, so that non-connecting road segments matched to 2 consecutive GPS coordinates could be connected. For sparse GPS data, the algorithm should be tuned to allow generating more non-emitting states to heal the gaps between observations.

Parameters for Map-Matching Algorithm

  • sedonamaps.mm.maxdist

    • During a step of map matching, when the algorithm searches for a predicted point from an observation, the predicted point needs to be within maxDist distance. Setting maxDist to a higher value results in slowing the map-matching process, but it increases the probability of finding a prediction instead of stopping the process early. In the case of low maxDist value, the algorithm runs faster, but the map matching process may terminate without finding prediction for all observations.
    • Default value: 50.0
    • Possible values: any double value
  • sedonamaps.mm.maxdistinit

    • Similar to maxDist, but only applicable to the first point in the given path or first observation. If not provided, this parameter is set to the value of maxDist.
    • Default value: 60.0
    • Possible values: any double value
  • sedonamaps.mm.minprobnorm

    • Minimum normalized probability of all the states at an observation. Similar to maxDist, this parameter is used to control when to terminate the map matching process. The lower value will increase the probability of finishing the matching for all observations successfully with a trade of which will make the process slow. Higher value of minProbNorm makes the map matching faster with the risk of terminating the matching early without finding predictions for all observations.
    • Default value: 0.1
    • Possible values: any double value ≤ 1.0
  • sedonamaps.mm.nonemittingstates

    • a boolean parameter indicating whether to allow non emitting states. A non-emitting state is a state that is not associated with an observation. Assume that it can be associated with a location in between two observations. Set this parameter to true if there are multiple road segments or edges between two observations. This case is seen in many GPS paths so the default is true.
    • Default value: true
    • Possible values: true, false
  • sedonamaps.mm.obsnoise

    • When calculating the log probability of various states from an observation, a standard deviation of noise equal to this parameter value is considered for the distance between observation and target states.
    • Default value: 20
    • Possible values: any double value
  • sedonamaps.mm.obsnoisene

    • Standard deviation of noise for non-emitting states. This value should be larger than obsNoise as it is mapped to the line between the previous and next observation, which does not necessarily run over the relevant segment. If not provided, this parameter is set to the value of obsNoise.
    • Default value: same as obsNoise
    • Possible values: any double value
  • sedonamaps.mm.nonemittinglengthfactor

    • Reduce the probability of a sequence of non-emitting states the longer it is. This can be used to prefer shorter paths. If the GPS samples are really sparse, this parameter should be set to a higher value to allow longer non-emitting states.
    • Default value: 0.5
    • Possible values: any double value ≤ 1.0
  • sedonamaps.mm.distnoise

    • Standard deviation of difference between distance between states and distance between observations.
    • Default value: 10
    • Possible values: any double value
  • sedonamaps.mm.distnoisene

    • Standard deviation of difference between distance between non-emitting states and distance between observations.
    • Default value: 1
    • Possible values: any double value
  • sedonamaps.mm.maxlatticewidth

    • Only continue from a limited number of states (thus locations) for a given observation. This possibly speeds up the matching by a lot. If there are more possible next states, the states with the best likelihood so far are selected. This parameter trades accuracy for speed. Tuning this value to 30 speeds up the local map-matching phase by 1.5x without compromising too much accuracy.
    • Default value: None
    • Possible values: any integer value

Parameter Tuning Guidelines

SedonaMaps is quite sensitive to the parameters, you have to tune the parameters to make SedonaMaps match your data. This section provides some guidelines for tuning the parameters.

Tuning for dense data

For dense GPS data with ≥ 1 Hz sampling rate, we can set smaller sedonamaps.mm.maxdist (for example, 100) to generate less non-emitting states, and set smaller sedonamaps.mm.obsnoise, sedonamaps.mm.obsnoisene, sedonamaps.mm.distnoise and sedonamaps.mm.distnoisene values (somewhere between 10 ~ 50 is usually good). Usually the default parameters are good enough to match trajectories with dense data. If you find some of your data cannot be matched, you can try to tuning the parameters mentioned above to larger values. Please note that the larger values of these parameters will hurt the accuracy of the map matching.

Tuning for sparse data

For sparse GPS data with ≤ 0.1Hz sampling rate, we need to set sedonamaps.mm.maxdist to a larger value to make SedonaMaps generate more non-emitting states for each observation, otherwise the non-emitting state may not be able to reach the next GPS sample. Setting sedonamaps.mm.maxdist to 200 is a good starting point, and please note that setting a large sedonamaps.mm.maxdist value will slow down the map-matching process. If trajectory cannot be matched when a large sedonamaps.mm.maxdist value is set, we could set sedonamaps.mm.nonemittinglengthfactor to a higher value (for example, 0.9) to apply less penalty to non-emitting states, so that the algorithm won't stop early when there are many non-emitting states.

We also want to be more tolerant to the noise of the GPS data and the difference between matched roads and GPS samples when working with sparse data. We can set larger sedonamaps.mm.obsnoise, sedonamaps.mm.obsnoisene, sedonamaps.mm.distnoise and sedonamaps.mm.distnoisene values (somewhere between 50 ~ 100 is usually good). Please note that the larger values of these parameters will hurt the accuracy of the map matching.

Map-matching will be slower for sparse data, since there will be lots of non-emitting states generated for interpolating between GPS samples, and we can set sedonamaps.mm.maxlatticewidth to prune unlikely states. 30 is usually a good starting point which balances speed and accuracy.


Last update: February 20, 2024 15:18:12