Advanced Analysis with Swiss Data • mapAI

This vignette demonstrates a workflow for analyzing and correcting a historical map with complex, non-linear distortions. While the Getting Started vignette used a synthetic data set, this one focuses on a more challenging real-world scenario using the swiss_cps dataset.

The key feature of this dataset is that the gross, linear distortions have already been removed with a Helmert transformation. The remaining errors are subtle, non-linear, and spatially variable. Our goal is to assess different validation techniques, train a robust model for these non-linear residuals, and conduct an in-depth analysis of the spatial distortion.

Setup

First, we load all the necessary libraries for our workflow.

# Load the necessary libraries
library(mapAI)
library(sf)
#> Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.4.0; sf_use_s2() is TRUE
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)
library(knitr)

Initial Data Analysis

We begin by loading the swiss_cps dataset. This sf object contains 343 control points where the source coordinates have been pre-aligned, so we can focus on the residual dx and dy errors.

# Load the built-in Swiss control points dataset
data(swiss_cps)

# Inspect the data structure
glimpse(swiss_cps)
#> Rows: 343
#> Columns: 8
#> $ Index    <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
#> $ source_x <dbl> 612293.0, 612225.0, 615470.3, 616339.8, 616927.1, 617571.9, 6…
#> $ source_y <dbl> 267353.6, 270412.7, 270195.1, 269367.5, 269507.6, 269273.5, 2…
#> $ target_x <dbl> 611375.9, 611573.1, 615840.8, 616129.6, 617031.0, 618168.7, 6…
#> $ target_y <dbl> 267719.1, 270370.6, 270463.6, 269380.6, 268932.1, 269179.3, 2…
#> $ dx       <dbl> -917.1063, -651.9320, 370.4737, -210.1946, 103.8862, 596.7662…
#> $ dy       <dbl> 365.468249, -42.050863, 268.478200, 13.144895, -575.530782, -…
#> $ geometry <POINT [m]> POINT (612293 267353.6), POINT (612225 270412.7), POINT…

# Visualize the residual displacement vectors
plot_displacement(swiss_cps, title = "Displacement Vectors") +
  labs(title = "", subtitle = "",
       x = "x (m)",
       y = "y (m)")

The plot shows clear spatial patterns in the remaining errors, which a simple linear model could not fix. This confirms the need for a non-linear approach.

Error Assessment and Model Validation

Before training a final model, it is crucial to assess its expected performance. The assess_pai_model function provides a robust framework for this. We will compare four different validation strategies to understand how the model might perform on new data.

Standard & Spatial Cross-Validation: We test a 10-fold cross-validation, once with random data splits and once with “spatial” splits, which ensures that training and validation sets are spatially separated.
Design-Based Validation: We also test validation based on sampling probability (probability) and stratified random sampling (stratified), which are useful when the data points have a known design or structure.

# Run four different validation methods using a GAM
cvra <- assess_pai_model(swiss_cps, pai_method = "gam", validation_type = "random", k_folds = 10, seed = 1)
cvsp <- assess_pai_model(swiss_cps, pai_method = "gam", validation_type = "spatial", k_folds = 10, seed = 1)
valr <- assess_pai_model(swiss_cps, pai_method = "gam", validation_type = "probability", seed = 1)
vals <- assess_pai_model(swiss_cps, pai_method = "gam", validation_type = "stratified", seed = 1)

# Combine and format the results into a table
validation_df <- rbind(cvra, cvsp, valr, vals) %>%
  select(Model_Method = Method, Mean_2D_RMSE_m = Mean_RMSE_2D, Std_Deviation_of_RMSE_m = SD_RMSE_2D) %>%
  mutate(across(where(is.numeric), ~ round(.x, 1)))

validation_df$type <- c("Standard 10-fold cross-validation",
                       "Spatial 10-fold cross-validation",
                       "Design-based validation",
                       "Design-based validation using stratified random sampling")

# Print the comparison table
kable(validation_df, caption = "Table: Validation performance across different assessment methods.")

Table: Validation performance across different assessment methods.
Model_Method	Mean_2D_RMSE_m	Std_Deviation_of_RMSE_m	type
gam	778.6	176.8	Standard 10-fold cross-validation
gam	1254.8	796.7	Spatial 10-fold cross-validation
gam	876.7	NA	Design-based validation
gam	702.3	NA	Design-based validation using stratified random sampling

Final Correction and Visualization

Now, confident in our modeling choice, we train a final Generalized Additive Model (gam) on the entire dataset. To visualize the effect of the correction, we will apply this model to a regular reference grid.

# Train the final GAM model on all data points
final_gam_model <- train_pai_model(swiss_cps, pai_method = "gam")
#> Training 'gam' model...

# Create a regular line grid to serve as our reference map
source_grid <- sf::st_make_grid(swiss_cps, n = c(15, 15)) %>% 
  st_cast("MULTILINESTRING") %>% 
  st_sf()

# Apply the trained model to correct the grid
gam_corrected_grid <- apply_pai_model(final_gam_model, source_grid)
#> Applying PAI model to map features...
#> Correction complete.

# Prepare data for plotting
source_grid$status <- "Original (Distorted)"
gam_corrected_grid$status <- "GAM"
comparison_data  <- rbind(source_grid[, "status"], gam_corrected_grid[, "status"])

# Create the comparison plot
ggplot(comparison_data) +
  geom_sf(aes(color = status, linetype = status), fill = NA, linewidth = 0.7) +
  scale_color_manual(name = "Map Status", values = c("Original (Distorted)" = "grey50", "GAM" = "#e41a1c")) +
  geom_point(data = swiss_cps, aes(source_x, source_y), size = 0.2) +
  scale_linetype_manual(name = "Map Status", values = c("Original (Distorted)" = "dashed", "GAM" = "solid")) +
  labs(x = "x (m)", y = "y (m)") +
  coord_sf(datum = sf::st_crs(21781)) +
  ggplot2::theme_minimal()

The plot shows how the regular grid (dashed grey) is warped by the model into the corrected grid (solid red). This transformation is precisely what’s needed to align the distorted map features with their true locations.

Residual Analysis

After applying the correction, we should examine the remaining errors, or residuals. This helps us confirm that our model has captured the systematic distortion patterns and that the remaining errors are small and random.

# Plot the residuals after applying the final GAM model
plot_residuals(final_gam_model, swiss_cps) +
  labs(
    title = "",
    subtitle = "",
    x = "x (m)",
    y = "y (m)"
  )
#> Calculating model residuals...

The residual vectors are much smaller and show no obvious spatial pattern, indicating a successful model fit.

Advanced Distortion Analysis

To quantify the nature of the distortion, we use the analyze_distortion function on a dense grid of points. This allows us to map out different distortion metrics across the entire area. We will visualize three key metrics:

Area Scale: Shows how much areas are stretched or compressed. A value of 1 means no distortion.
Maximum Angular Distortion: Measures the maximum change in angle at any point.
Airy-Kavrayskiy Measure: A combined measure of both areal and angular distortion.

# Create a dense grid of points for a high-resolution analysis
analysis_points <- sf::st_make_grid(swiss_cps, n = c(100, 100)) %>% st_centroid() %>% st_sf()
distortion_results <- analyze_distortion(final_gam_model, analysis_points)
#> Calculating distortion metrics for gam model...
#> Finalizing metrics from derivatives...
#> Distortion analysis complete.

# Plot 1: Area scale distortion
plot_distortion_surface(distortion_results, "area_scale", diverging = TRUE) +
  labs(title = "Area Scale Distortion", subtitle = "", x = "x (m)", y = "y (m)", fill = 'σ')
#> Regular grid detected. Creating a surface plot with geom_raster().
#> Warning: Raster pixels are placed at uneven horizontal intervals and will be shifted
#> ℹ Consider using `geom_tile()` instead.
#> Raster pixels are placed at uneven horizontal intervals and will be shifted
#> ℹ Consider using `geom_tile()` instead.


# Plot 2: Maximum angular distortion
plot_distortion_surface(distortion_results, "max_angular_distortion", palette = "magma") +
  labs(title = "Maximum Angular Distortion", subtitle = "", x = "x (m)", y = "y (m)", fill = "2Ω (rad)")
#> Regular grid detected. Creating a surface plot with geom_raster().
#> Warning: Raster pixels are placed at uneven horizontal intervals and will be shifted
#> ℹ Consider using `geom_tile()` instead.
#> Raster pixels are placed at uneven horizontal intervals and will be shifted
#> ℹ Consider using `geom_tile()` instead.


# Plot 3: Airy-Kavrayskiy distortion measure
plot_distortion_surface(distortion_results, "airy_kavrayskiy") +
  labs(title = "Airy-Kavrayskiy Combined Distortion", subtitle = "", x = "x (m)", y = "y (m)", fill = expression(E["AK"]))
#> Regular grid detected. Creating a surface plot with geom_raster().
#> Warning: Raster pixels are placed at uneven horizontal intervals and will be shifted
#> ℹ Consider using `geom_tile()` instead.
#> Raster pixels are placed at uneven horizontal intervals and will be shifted
#> ℹ Consider using `geom_tile()` instead.

We can also summarize these distortion metrics numerically.

# Calculate and print summary statistics for the distortion metrics
summary_stats <- st_drop_geometry(distortion_results) %>%
  reframe(
    Metric = c("Areal Scale", "Max Angular Distortion (rad)", "Airy-Kavrayskiy Measure"),
    Minimum = round(c(min(area_scale), min(max_angular_distortion), min(airy_kavrayskiy)), 2),
    Mean = round(c(mean(area_scale), mean(max_angular_distortion), mean(airy_kavrayskiy)), 2),
    Maximum = round(c(max(area_scale), max(max_angular_distortion), max(airy_kavrayskiy)), 2)
  )
print(kable(summary_stats, caption = "Table: Summary of distortion metrics from GAM model analysis."))
#> 
#> 
#> Table: Table: Summary of distortion metrics from GAM model analysis.
#> 
#> |Metric                       | Minimum| Mean| Maximum|
#> |:----------------------------|-------:|----:|-------:|
#> |Areal Scale                  |    0.45| 0.87|    1.41|
#> |Max Angular Distortion (rad) |    0.00| 0.21|    0.86|
#> |Airy-Kavrayskiy Measure      |    0.00| 0.04|    0.28|

Visualizing Distortion with Indicatrices

A classic way to visualize map distortion is with Tissot’s Indicatrix, which shows how an infinitesimal circle on the original surface is deformed into an ellipse. We can plot these at our control point locations to see the local distortion characteristics.

# Analyze distortion specifically at the control point locations
distortion_at_gcps <- analyze_distortion(final_gam_model, points_to_analyze = swiss_cps)
#> Calculating distortion metrics for gam model...
#> Finalizing metrics from derivatives...
#> Distortion analysis complete.

# Plot the indicatrices (ellipses) at each point
plot_indicatrices(distortion_at_gcps,  scale_factor = 700) +
  labs(
    title = "Distortion Indicatrices at Control Points",
    subtitle = "",
    x = "x (m)",
    y = "y (m)"
  )
#> Generating indicatrix polygons at source locations...

The size and shape of the ellipses clearly show the spatial variation in both the magnitude and direction of the distortion that our model has successfully captured.

This vignette has demonstrated a comprehensive workflow for a complex, real-world PAI task. By combining robust validation, non-linear modeling, and advanced distortion analysis, mapAI provides a powerful toolkit for any historical map correction project.