Estimates a PAI model's predictive performance with k-fold cross-validation, using either random or spatial fold assignment.

Usage

assess_pai_model(
  gcp_data,
  method,
  validation_type = "random",
  k_folds = 5,
  seed = 123,
  ...
)

Arguments

gcp_data

An sf object of homologous points, from read_gcps().

method

A character string specifying the algorithm to assess. One of: "lm", "gam", "rf", "helmert", "tps".

validation_type

A character string specifying the cross-validation strategy. One of "random" (default) or "spatial".

k_folds

An integer specifying the number of folds for cross-validation. Defaults to 5, as shown in the usage above.

seed

An integer for setting the random seed for reproducibility.

...

Additional arguments passed to the underlying train_pai_model() function (e.g., num.threads for ranger when method = "rf").

Value

A data frame summarizing the cross-validation results, containing:

Method

The algorithm that was assessed.

ValidationType

The CV strategy used.

Mean_RMSE_2D

The mean of the per-fold 2D RMSE values across all k_folds folds.

SD_RMSE_2D

The standard deviation of the per-fold 2D RMSE values across all k_folds folds.

Details

Model validation is crucial for understanding how well a model will generalize to new data. This function automates this process.

Validation Types:

  • random (default): Standard k-fold cross-validation. Data is randomly partitioned into folds. With spatial data, nearby and therefore similar points can land in both the training and test folds (spatial autocorrelation), so this can produce overly optimistic error estimates.

  • spatial: Spatial Cross-Validation (SCV). Homologous points are clustered into k_folds spatially distinct groups using k-means clustering on their coordinates. In each iteration, the model is trained on k_folds - 1 groups and tested on the held-out group, so test points are spatially separated from the training points.
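
The difference between the two fold-assignment strategies can be sketched in base R. This is a minimal illustration, not the package's internal code: the coordinates are synthetic, and the object names (coords, random_folds, spatial_folds) and the nstart setting are assumptions made for the example.

```r
# Illustrative only: synthetic coordinates stand in for GCP locations.
set.seed(123)
n <- 60
k_folds <- 5
coords <- cbind(x = runif(n, 0, 1000), y = runif(n, 0, 1000))

# Random CV: shuffle a balanced vector of fold labels.
random_folds <- sample(rep(seq_len(k_folds), length.out = n))

# Spatial CV: k-means on the coordinates; each cluster becomes one fold.
# nstart = 10 is an assumption here, chosen to make the clustering more stable.
spatial_folds <- kmeans(coords, centers = k_folds, nstart = 10)$cluster

table(random_folds)   # balanced fold sizes
table(spatial_folds)  # fold sizes depend on the point pattern
```

Random folds are balanced by construction, whereas k-means folds can differ considerably in size, which is one plausible reason per-fold RMSE values vary more under spatial CV.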

The function loops through the folds, trains a temporary model on the training data for each fold, predicts on the test data, and calculates the 2D Root Mean Squared Error (RMSE). The final output is the mean and standard deviation of the RMSEs across all folds.
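
The loop described above might look roughly like the following base R sketch. It makes several assumptions: lm() stands in for train_pai_model(), the displacement columns dx and dy are hypothetical names for the values being predicted, and the 2D RMSE is taken to be the square root of the mean squared planar error (dx error squared plus dy error squared). The package may define or compute these differently.

```r
# Illustrative only: synthetic displacements, with lm() as a stand-in model.
set.seed(123)
n <- 100
k_folds <- 5
pts <- data.frame(x = runif(n, 0, 1000), y = runif(n, 0, 1000))
pts$dx <- 2 + 0.01 * pts$x + rnorm(n, sd = 0.5)   # hypothetical x displacement
pts$dy <- -1 + 0.02 * pts$y + rnorm(n, sd = 0.5)  # hypothetical y displacement

# Random fold assignment; a k-means cluster vector could be used instead.
folds <- sample(rep(seq_len(k_folds), length.out = n))

rmse_per_fold <- vapply(seq_len(k_folds), function(i) {
  train <- pts[folds != i, ]
  test  <- pts[folds == i, ]

  # Temporary models fitted on the training folds only.
  fit_dx <- lm(dx ~ x + y, data = train)
  fit_dy <- lm(dy ~ x + y, data = train)

  # Prediction errors on the held-out fold.
  err_x <- test$dx - predict(fit_dx, newdata = test)
  err_y <- test$dy - predict(fit_dy, newdata = test)

  # Assumed 2D RMSE: root of the mean squared planar error.
  sqrt(mean(err_x^2 + err_y^2))
}, numeric(1))

# Summary laid out like the documented return value.
cv_summary <- data.frame(
  Method = "lm",
  ValidationType = "random",
  Mean_RMSE_2D = mean(rmse_per_fold),
  SD_RMSE_2D = sd(rmse_per_fold)
)
print(cv_summary)
```

Note that sd() uses the sample (n - 1) denominator; with only a handful of folds, SD_RMSE_2D is itself a rough estimate.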

Examples

if (FALSE) { # \dontrun{
# --- 1. Generate and read demo data ---
demo_files <- create_demo_data(seed = 42)
gcp_data <- read_gcps(gcp_path = demo_files$gcp_path, crs = 3857)

# --- 2. Assess a Random Forest model with RANDOM CV ---
random_assessment <- assess_pai_model(
  gcp_data,
  method = "rf",
  validation_type = "random",
  k_folds = 10
)
print(random_assessment)

# --- 3. Assess the SAME model with SPATIAL CV ---
spatial_assessment <- assess_pai_model(
  gcp_data,
  method = "rf",
  validation_type = "spatial",
  k_folds = 5
)
print(spatial_assessment)
} # }