Assess PAI Model Performance — assess_pai_model

Performs model validation to estimate a PAI model's predictive performance using k-fold cross-validation or design-based probability sampling.

Usage

assess_pai_model(
  gcp_data,
  pai_method,
  validation_type = "random",
  k_folds = 5,
  train_split_ratio = 0.8,
  n_strata = 4,
  seed = 123,
  ...
)

Arguments

gcp_data: An sf object of homologous points, from read_gcps().
pai_method: A character string specifying the algorithm to assess. One of "helmert", "tps", "gam", "lm", "rf", "svmRadial", or "svmLinear".
validation_type: A character string specifying the validation strategy. One of "random", "spatial", "probability", or "stratified".
k_folds: An integer giving the number of folds for cross-validation. Only used when validation_type is "random" or "spatial". Defaults to 5.
train_split_ratio: A numeric value between 0 and 1. The proportion of data for the training set. Used for "probability" and "stratified" types. Defaults to 0.8.
n_strata: An integer specifying the number of strata to create for stratified sampling. Only used for validation_type = "stratified". Defaults to 4 (quartiles).
seed: An integer used to set the random seed for reproducibility. Defaults to 123.
...: Additional arguments passed to the train_pai_model function.

Value

A data frame summarizing the validation results.

Details

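Model validation estimates how well a model will generalize to data it was not trained on. This function automates that process: it partitions the homologous points according to validation_type, trains the chosen pai_method on the training portion, and evaluates it on the held-out portion.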
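To make the idea of held-out evaluation concrete, the following is a minimal base R sketch of random k-fold cross-validation. It is not the package's internal code: the synthetic points, the lm() stand-in model, and the names pts, fold_id, and fold_rmse are all assumptions made for illustration.

```r
# Illustrative k-fold cross-validation in base R.
# Assumption: synthetic data and a plain lm() model stand in for real
# GCP data and a PAI model; this is not how assess_pai_model is implemented.
set.seed(123)

# Synthetic stand-in for homologous points: coordinates and a displacement dx
n <- 100
pts <- data.frame(x = runif(n), y = runif(n))
pts$dx <- 0.5 * pts$x - 0.2 * pts$y + rnorm(n, sd = 0.05)

k_folds <- 5

# Randomly assign each point to one of k folds of equal size
fold_id <- sample(rep(seq_len(k_folds), length.out = n))

# Hold out each fold in turn, fit on the rest, score on the held-out fold
fold_rmse <- vapply(seq_len(k_folds), function(k) {
  train <- pts[fold_id != k, ]
  test  <- pts[fold_id == k, ]
  fit   <- lm(dx ~ x + y, data = train)
  sqrt(mean((test$dx - predict(fit, newdata = test))^2))
}, numeric(1))

cv_summary <- data.frame(fold = seq_len(k_folds), rmse = fold_rmse)
print(cv_summary)
mean(fold_rmse)  # average held-out RMSE across folds
```

Each point is used for testing exactly once, so the averaged RMSE reflects performance on data the model did not see during fitting.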

Validation Types:

random: Standard k-fold cross-validation.
spatial: Spatial k-fold cross-validation.
probability: Design-based validation using a single train/test split based on simple random sampling.
stratified: Design-based validation using stratified random sampling. A single train/test split is performed. Strata are defined by the quantiles of the error-vector magnitude, sqrt(dx^2 + dy^2), so that the validation set represents all error magnitudes proportionally.
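The stratified split described above can be sketched in base R as follows. This is a hedged illustration rather than the package's actual implementation: the synthetic dx and dy values and the helper names gcp, err, stratum, and train_idx are assumptions, and the function may differ in how it handles rounding or ties.

```r
# Illustrative stratified train/test split on error magnitude (base R).
# Assumption: synthetic dx/dy values; not the package's internal code.
set.seed(123)

n <- 200
gcp <- data.frame(dx = rnorm(n), dy = rnorm(n))

train_split_ratio <- 0.75
n_strata <- 4  # quartiles

# Magnitude (Euclidean length) of each error vector
gcp$err <- sqrt(gcp$dx^2 + gcp$dy^2)

# Cut the magnitudes at their quantiles to form n_strata strata
breaks <- quantile(gcp$err, probs = seq(0, 1, length.out = n_strata + 1))
gcp$stratum <- cut(gcp$err, breaks = breaks,
                   include.lowest = TRUE, labels = FALSE)

# Within each stratum, draw the training share by simple random sampling
train_idx <- unlist(
  lapply(split(seq_len(n), gcp$stratum), function(idx) {
    n_train <- round(train_split_ratio * length(idx))
    idx[sample.int(length(idx), size = n_train)]
  }),
  use.names = FALSE
)

train <- gcp[train_idx, ]
test  <- gcp[-train_idx, ]

# Both sets contain every stratum in roughly the same proportions
table(train$stratum)
table(test$stratum)
```

Sampling inside each stratum, rather than across the whole data set, is what keeps small and large error magnitudes represented in the test set.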

Examples

if (FALSE) { # \dontrun{
# --- 1. Create a demo data set ---
demo_files <- create_demo_data(seed = 42)
gcp_data <- read_gcps(gcp_path = demo_files$gcp_path)

# --- 2. Assess with RANDOM k-fold CV ---
random_assessment <- assess_pai_model(
  gcp_data, pai_method = "rf", validation_type = "random", k_folds = 5
)
print(random_assessment)

# --- 3. Assess with SPATIAL k-fold CV ---
spatial_assessment <- assess_pai_model(
  gcp_data, pai_method = "rf", validation_type = "spatial", k_folds = 5
)
print(spatial_assessment)

# --- 4. Assess with PROBABILITY (simple random) sampling ---
prob_assessment <- assess_pai_model(
  gcp_data, pai_method = "rf", validation_type = "probability", train_split_ratio = 0.75
)
print(prob_assessment)

# --- 5. Assess with STRATIFIED probability sampling ---
stratified_assessment <- assess_pai_model(
  gcp_data,
  pai_method = "rf",
  validation_type = "stratified",
  train_split_ratio = 0.75,
  n_strata = 4 # Use quartiles for stratification
)
print(stratified_assessment)
} # }