Map Matching

Map matching is the process of mapping noisy GPS points to correct road segments.
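
To make the idea concrete, here is a minimal sketch of the simplest possible matcher: snapping one noisy point to the closest road segment in planar coordinates. This is only an illustration of the problem, not the algorithm SedonaMaps uses. The names `project_point_on_segment`, `snap_to_nearest_segment`, `roads`, and `noisy_point` are hypothetical.

```python
import math

def project_point_on_segment(p, a, b):
    """Return the point on segment a-b closest to p, and its distance to p."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        t = 0.0  # degenerate segment: a and b coincide
    else:
        # Parameter of the orthogonal projection, clamped to the segment.
        t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
        t = max(0.0, min(1.0, t))
    qx, qy = ax + t * dx, ay + t * dy
    return (qx, qy), math.hypot(px - qx, py - qy)

def snap_to_nearest_segment(p, segments):
    """Return (segment_index, snapped_point, distance) for the closest segment."""
    best = None
    for i, (a, b) in enumerate(segments):
        q, d = project_point_on_segment(p, a, b)
        if best is None or d < best[2]:
            best = (i, q, d)
    return best

# Two parallel roads (hypothetical data) and one noisy observation.
roads = [((0.0, 0.0), (10.0, 0.0)), ((0.0, 5.0), (10.0, 5.0))]
noisy_point = (4.0, 1.0)
match = snap_to_nearest_segment(noisy_point, roads)
```

Snapping each point independently ignores the order of the points along a trip, which is why the parameters described later on this page refer to states, observations, and probabilities rather than to distance alone.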

Class: sedonamaps.core.MapMatching

This class provides methods to perform map matching on a road network.

  • Method: loadOSM

    • Scala definition:
      def loadOSM(osmPath: String, tagsFilter: String = ""): DataFrame
    • Java definition:
      public static DataFrame loadOSM(String osmPath, String tagsFilter);
      public static DataFrame loadOSM(String osmPath);
    • Python definition:
      def sedonamaps.core.MapMatching.loadOSM(osmPath: str, tagsFilter: str = "")
    • Parameters:

      • osmPath: Path to the OSM XML file
      • tagsFilter: Values of the highway tag used to filter the OSM data. Separate multiple values with commas. Pass an empty string to keep all edges. The default value is an empty string.

        There is a special value [car] for filtering the OSM edges for cars. This value expands to the following tags:


    • Returns: A Sedona DataFrame.

    • Example:
// Scala
import com.wherobots.sedonamaps.MapMatchingDf
val dfEdge = MapMatchingDf.loadOSM(resourceFolder + "osm2.xml", "[car]")
// Java
import com.wherobots.sedonamaps.MapMatchingDf;
Dataset dfEdge = MapMatchingDf.loadOSM(resourceFolder + "data/osm2.xml", "[car]");
# Python
from sedonamaps.core import MapMatching as mm
dfEdge = mm.loadOSM(PATH_PREFIX + "data/osm2.xml", "[car]")
  • Method: performMapMatching
    • Scala definition:
      def performMapMatching(edgesDf: DataFrame, pathsDf: DataFrame, colEdgesGeom: String, colPathsGeom: String): DataFrame
    • Java definition:
      public static DataFrame performMapMatching(DataFrame edgesDf, DataFrame pathsDf, String colEdgesGeom, String colPathsGeom);
    • Python definition:
      def sedonamaps.core.MapMatching.performMapMatching(edgesDf: DataFrame, pathsDf: DataFrame, colEdgesGeom: str, colPathsGeom: str)
    • Parameters:
      • edgesDf (DataFrame) - Sedona DataFrame containing the attributes loaded from the OSM file.
      • pathsDf (DataFrame) - Sedona DataFrame containing the GPS trips or LineStrings for which map matching will be performed.
      • colEdgesGeom (String) - Name of the geometry type column in the DataFrame edgesDf.
      • colPathsGeom (String) - Name of the geometry type column in the DataFrame pathsDf.
    • Returns:
      • A DataFrame containing the results of map matching, including fields such as ids, observed_points, matched_points, and matched_nodes.
    • Example:
// Scala
val dfMmResult = MapMatchingDf.performMapMatching(dfEdge, dfPaths, "geometry", "geometry")
// Java
Dataset matchingResultDf = MapMatchingDf.performMapMatching(edgesDf, pathsSpatialDf, "geometry", "geometry");
# Python
dfMmResult = mm.performMapMatching(dfEdge, dfPaths, "geometry", "geometry")

Advanced Configuration

SedonaMaps exposes several advanced configuration options that can be set when building the Sedona context:

// Scala
val config = SedonaContext.builder().config("", "50.0").getOrCreate()
val sedona = SedonaContext.create(config)
// Java
SparkSession config = SedonaContext.builder().config("", "50.0").getOrCreate();
SparkSession sedona = SedonaContext.create(config);
# Python
config = SedonaContext.builder(). \
    config("", "50.0"). \
    getOrCreate()
sedona = SedonaContext.create(config)

These configurations can also be changed after the Sedona context has been created:

// Scala
sedona.conf.set("", "50.0")
// Java
sedona.conf().set("", "50.0");
# Python
sedona.conf.set("", "50.0")


Parameters for Distributing Map-Matching Workloads


    • Trajectories are split into smaller segments, and this parameter sets the maximum length of each segment. Splitting speeds up the spatial join phase of the map matching process: a long trajectory segment is duplicated across multiple spatial partitions and produces join results with many edges, both of which hurt spatial join performance. The default value works well for most cases.
    • Default value: 0.5
    • Possible values: any double value
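
The splitting step can be pictured with a minimal sketch that cuts a polyline into consecutive pieces of bounded length. It assumes planar coordinates and a length limit in the same units as those coordinates, and it is not the SedonaMaps implementation. The function `split_polyline` and the sample `path` are hypothetical.

```python
import math

def split_polyline(points, max_len):
    """Split a polyline into consecutive pieces, each at most max_len long."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    pieces = []
    current = [points[0]]
    remaining = max_len  # length still available in the current piece
    for end in points[1:]:
        start = current[-1]
        seg_len = math.dist(start, end)
        while seg_len > remaining:
            # Cut the segment at the point where the current piece is full.
            t = remaining / seg_len
            cut = tuple(s + t * (e - s) for s, e in zip(start, end))
            current.append(cut)
            pieces.append(current)
            current = [cut]
            start = cut
            seg_len = math.dist(start, end)
            remaining = max_len
        current.append(end)
        remaining -= seg_len
    if len(current) > 1:
        pieces.append(current)
    return pieces

def polyline_length(points):
    """Total length of a polyline given as a list of coordinate tuples."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

# A path of total length 1.2, split with a limit of 0.5.
path = [(0.0, 0.0), (0.3, 0.0), (0.3, 0.9)]
pieces = split_polyline(path, 0.5)
```

Each piece starts where the previous one ends, so the original geometry can be recovered by concatenating the pieces.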

    • Number of partitions of the local map-matching phase. This controls the parallelism of performing local map-matching on local road networks. A recommended value is 10 * number of executor cores.
    • Default value: None
    • Possible values: any positive integer value

    • Number of spatial partitions generated in the spatial join phase. This controls the parallelism of performing spatial join between trajectories and road networks. A recommended value is 10 * number of executor cores.
    • Default value: None
    • Possible values: any positive integer value
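
The recommended value for both partition counts is simple arithmetic. The sketch below assumes that "number of executor cores" means the total number of cores across all executors; that reading, and the helper name `recommended_partitions`, are assumptions rather than something this page states.

```python
def recommended_partitions(num_executors, cores_per_executor, factor=10):
    """Return factor * total executor cores (assumed to be cores summed over executors)."""
    if num_executors <= 0 or cores_per_executor <= 0:
        raise ValueError("executor and core counts must be positive")
    total_cores = num_executors * cores_per_executor
    return factor * total_cores

# Hypothetical cluster: 4 executors with 8 cores each.
partitions = recommended_partitions(num_executors=4, cores_per_executor=8)
```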

Parameters for Map-Matching Algorithm


    • During each step of map matching, the algorithm searches for a predicted point for an observation, and that point must lie within maxDist of the observation. A higher maxDist slows down the map-matching process but increases the probability of finding a prediction instead of stopping early. A lower maxDist makes the algorithm run faster, but the process may terminate without finding predictions for all observations.
    • Default value: 50.0
    • Possible values: any double value
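
The trade-off can be illustrated with a simplified sketch that filters candidate edges by distance and stops at the first observation with no candidate in range. The real matching is probabilistic; this sketch only shows why a small maxDist can end matching early. The names `candidates_within`, `match_until_stuck`, and the sample distances are hypothetical.

```python
def candidates_within(distances, max_dist):
    """Keep only the edges whose distance to the observation is <= max_dist."""
    return {edge: d for edge, d in distances.items() if d <= max_dist}

def match_until_stuck(per_observation_distances, max_dist):
    """Collect candidates per observation, stopping at the first one with none."""
    matched = []
    for distances in per_observation_distances:
        candidates = candidates_within(distances, max_dist)
        if not candidates:
            break  # no edge in range: matching cannot continue
        matched.append(candidates)
    return matched

# Hypothetical distances from three observations to nearby edges.
observations = [
    {"e1": 12.0, "e2": 70.0},
    {"e1": 55.0, "e2": 80.0},
    {"e2": 10.0},
]
strict = match_until_stuck(observations, max_dist=50.0)
relaxed = match_until_stuck(observations, max_dist=60.0)
```

With the smaller limit the second observation has no candidate, so only the first observation is matched; with the larger limit all three are.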

    • Similar to maxDist, but only applicable to the first point in the given path or first observation. If not provided, this parameter is set to the value of maxDist.
    • Default value: 60.0
    • Possible values: any double value

    • Minimum normalized probability of all the states at an observation. Like maxDist, this parameter controls when the map matching process terminates. A lower value increases the probability of successfully matching all observations, at the cost of a slower process. A higher minProbNorm makes map matching faster, at the risk of terminating early without finding predictions for all observations.
    • Default value: 0.1
    • Possible values: any double value <= 1.0

    • A boolean parameter indicating whether to allow non-emitting states. A non-emitting state is a state that is not associated with an observation; it can instead be associated with a location between two observations. Set this parameter to true if there are multiple road segments or edges between two observations. This case occurs in many GPS paths, so the default is true.
    • Default value: true
    • Possible values: true, false

    • Standard deviation of the noise assumed for the distance between an observation and its target states. It is used when calculating the log probability of the various states for an observation.
    • Default value: 20
    • Possible values: any double value
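
One common way to turn a distance and a noise standard deviation into a log probability is a zero-mean Gaussian density. The sketch below uses that model as an assumption; this page does not confirm the exact formula SedonaMaps applies. The function name `emission_log_prob` is hypothetical.

```python
import math

def emission_log_prob(distance, obs_noise):
    """Log density of a zero-mean Gaussian with std dev obs_noise at `distance`."""
    if obs_noise <= 0:
        raise ValueError("obs_noise must be positive")
    z = distance / obs_noise
    return -0.5 * z * z - math.log(obs_noise * math.sqrt(2.0 * math.pi))

# With the default noise of 20, a state on top of the observation scores
# higher than one a full standard deviation away.
lp_near = emission_log_prob(0.0, 20.0)
lp_far = emission_log_prob(20.0, 20.0)
```

Under this model a larger noise value flattens the curve, so distant states are penalized less relative to nearby ones.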

    • Standard deviation of noise for non-emitting states. If not provided, this parameter is set to the value of obsNoise.
    • Default value: same as obsNoise
    • Possible values: any double value

    • Only continue from a limited number of states (and thus locations) for a given observation. This can speed up matching considerably. If there are more possible next states than the limit, the states with the best likelihood so far are selected. This parameter trades accuracy for speed. Tuning this value to 30 speeds up the local map-matching phase by 1.5x without compromising too much accuracy.
    • Default value: None
    • Possible values: any integer value
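
Limiting the number of states per observation works like a beam search: only the best-scoring states survive to the next step. The following is a minimal sketch of that pruning step, assuming each state carries a log-likelihood; the name `prune_states` and the sample scores are hypothetical, and treating `None` as "no limit" mirrors the default listed above.

```python
import heapq

def prune_states(states, max_width):
    """Keep the max_width states with the highest log-likelihood.

    `states` maps a state id to its log-likelihood so far.
    A max_width of None keeps every state.
    """
    if max_width is None:
        return dict(states)
    best = heapq.nlargest(max_width, states.items(), key=lambda kv: kv[1])
    return dict(best)

# Hypothetical log-likelihoods for four candidate states.
states = {"a": -1.0, "b": -5.0, "c": -0.5, "d": -3.0}
kept = prune_states(states, 2)
unpruned = prune_states(states, None)
```

A state that is dropped here can never be part of the final path, which is where the loss of accuracy comes from.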

Last update: November 14, 2023 06:27:23