Assess PAI Model Performance Using Cross-Validation
Performs k-fold cross-validation to provide a robust estimate of a PAI model's predictive performance, offering both random and spatial CV methods.
Usage
assess_pai_model(
gcp_data,
method,
validation_type = "random",
k_folds = 5,
seed = 123,
...
)
Arguments
- gcp_data
An sf object of homologous points, from read_gcps().
- method
A character string specifying the algorithm to assess. One of: "lm", "gam", "rf", "helmert", "tps".
- validation_type
A character string specifying the cross-validation strategy. One of "random" (default) or "spatial".
- k_folds
An integer specifying the number of folds for cross-validation. Defaults to 5, per the signature shown in Usage.
- seed
An integer for setting the random seed for reproducibility.
- ...
Additional arguments passed to the underlying train_pai_model function (e.g., num.threads for ranger).
Value
A data frame summarizing the cross-validation results, containing:
- Method
The algorithm that was assessed.
- ValidationType
The CV strategy used.
- Mean_RMSE_2D
The average 2D RMSE across all k-folds.
- SD_RMSE_2D
The standard deviation of the 2D RMSE across all k-folds.
Details
Model validation shows how well a model is likely to generalize to points it was not trained on. This function automates that check with k-fold cross-validation.
Validation Types:
- random (default): Standard k-fold cross-validation. Data is randomly partitioned into folds. This can produce overly optimistic results for spatial data due to spatial autocorrelation.
- spatial: Spatial Cross-Validation (SCV). Homologous points are clustered into k_folds spatially distinct groups using k-means clustering on their coordinates. The model is then trained on k-1 groups and tested on the held-out group.
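The difference between the two fold-assignment strategies can be sketched in base R. This is an illustration only, not the package's internal code: the coordinates are synthetic, and the variable names (coords, random_folds, spatial_folds) are assumptions made for the example.

```r
# Sketch only: illustrates the two fold-assignment strategies described above.
# Coordinates are synthetic; variable names are hypothetical.
set.seed(123)
n <- 60
k_folds <- 5
coords <- cbind(x = runif(n, 0, 1000), y = runif(n, 0, 1000))

# Random CV: shuffle fold labels so fold sizes are as equal as possible.
random_folds <- sample(rep(seq_len(k_folds), length.out = n))

# Spatial CV: k-means on the coordinates; each cluster becomes one fold.
spatial_folds <- kmeans(coords, centers = k_folds, nstart = 10)$cluster

# Random folds are balanced; spatial folds usually differ in size.
print(table(random_folds))
print(table(spatial_folds))
```

Random folds mix neighbouring points between training and test sets, whereas each k-means fold holds out a contiguous region, so the test points sit farther from the training points.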
The function loops through the folds, trains a temporary model on the training data for each fold, predicts on the test data, and calculates the 2D Root Mean Squared Error (RMSE). The final output is the mean and standard deviation of the RMSEs across all folds.
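The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the function's actual implementation: base R lm() stands in for train_pai_model(), the data and the dx/dy column names are synthetic, and the 2D RMSE is assumed to be the square root of the mean squared Euclidean error over the x and y components.

```r
# Sketch only: a stand-in for the cross-validation loop described above.
# lm() replaces train_pai_model(); data and column names are synthetic.
set.seed(123)
n <- 100
k_folds <- 5
pts <- data.frame(x = runif(n, 0, 1000), y = runif(n, 0, 1000))
# Synthetic displacements: a linear trend plus noise.
pts$dx <- 2 + 0.010 * pts$x + rnorm(n, sd = 0.5)
pts$dy <- -1 + 0.005 * pts$y + rnorm(n, sd = 0.5)

# Random fold assignment with near-equal fold sizes.
folds <- sample(rep(seq_len(k_folds), length.out = n))

fold_rmse <- vapply(seq_len(k_folds), function(i) {
  train <- pts[folds != i, ]
  test <- pts[folds == i, ]
  # Temporary models fitted on the training folds only.
  fit_dx <- lm(dx ~ x + y, data = train)
  fit_dy <- lm(dy ~ x + y, data = train)
  err_x <- test$dx - predict(fit_dx, newdata = test)
  err_y <- test$dy - predict(fit_dy, newdata = test)
  # Assumed definition of 2D RMSE: root of the mean squared Euclidean error.
  sqrt(mean(err_x^2 + err_y^2))
}, numeric(1))

# Summarize across folds using the column names listed under Value.
cv_summary <- data.frame(
  Method = "lm",
  ValidationType = "random",
  Mean_RMSE_2D = mean(fold_rmse),
  SD_RMSE_2D = sd(fold_rmse)
)
print(cv_summary)
```

A large SD_RMSE_2D relative to Mean_RMSE_2D suggests that performance varies strongly between folds, which is more common under spatial CV because each fold covers a different region.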
Examples
if (FALSE) { # \dontrun{
  # --- 1. Generate and read demo data ---
  demo_files <- create_demo_data(seed = 42)
  gcp_data <- read_gcps(gcp_path = demo_files$gcp_path, crs = 3857)

  # --- 2. Assess a Random Forest model with RANDOM CV ---
  random_assessment <- assess_pai_model(
    gcp_data,
    method = "rf",
    validation_type = "random",
    k_folds = 10
  )
  print(random_assessment)

  # --- 3. Assess the SAME model with SPATIAL CV ---
  spatial_assessment <- assess_pai_model(
    gcp_data,
    method = "rf",
    validation_type = "spatial",
    k_folds = 5
  )
  print(spatial_assessment)
} # }