Compares genotypes (presence of resistance markers) to observed phenotypes
(resistant vs susceptible) using a binary matrix from get_binary_matrix().
A genotypic prediction variable is defined on the basis of presence of
genotype markers (either any marker in the input table, or those defined by
an input inclusion list or exclusion list). This genotypic prediction is then
compared to the observed phenotypes using standard classification metrics
(via the yardstick package) and AMR-specific error rates (major error, ME,
and very major error, VME) per ISO 20776-2 (see also the
FDA definitions).
Supports evaluating both R and NWT outcomes
in a single call, with flexible prediction rules and marker inclusion options.
Usage
concordance(
  binary_matrix,
  markers = NULL,
  exclude_markers = NULL,
  ppv_threshold = NULL,
  ppv_results = NULL,
  truth = c("R", "NWT"),
  prediction_rule = "any",
  min_count = NULL,
  logreg_results = NULL,
  pval_threshold = NULL
)
Arguments
- binary_matrix
A data frame output by get_binary_matrix(), containing one row per sample, columns indicating binary phenotypes (R, I, NWT) and binary marker presence/absence.
- markers
A character vector of marker column names to include in a new binary outcome variable genotypic_prediction. Default NULL includes all marker columns.
- exclude_markers
A character vector of marker column names to exclude from the genotypic prediction. Applied after markers filtering.
- ppv_threshold
A numeric PPV threshold (0-1). Markers with solo PPV below this value are excluded. Requires ppv_results.
- ppv_results
Output of solo_ppv_analysis() or ppv(), used for solo PPV-based marker filtering when ppv_threshold is set. When the output of ppv() is supplied, single-marker entries (marker_count == 1) in the summary table are used to obtain per-marker PPVs, making the behaviour equivalent to supplying solo_ppv_analysis() output directly.
- truth
A character vector specifying the phenotypic truth column(s) to evaluate: "R" (resistant vs susceptible/intermediate), "NWT" (non-wildtype vs wildtype), or c("R", "NWT") (default) to evaluate both.
- prediction_rule
The rule for generating genotypic predictions. The marker-based rules operate on the markers that remain after applying the filters specified by markers, exclude_markers, ppv_threshold, pval_threshold and min_count: "any" (default) predicts positive if any remaining marker is present; "all" predicts positive only if all remaining markers are present; a positive integer predicts positive if at least that many remaining markers are present. "logistic" uses a logistic regression model from logreg_results to predict outcomes for each sample.
- min_count
An integer or NULL. Exclude markers with total frequency (column sum in binary_matrix) below this value. Default NULL (no filtering).
- logreg_results
Output of amr_logistic(). Used for p-value filtering (when pval_threshold is set) and for prediction_rule = "logistic".
- pval_threshold
A numeric p-value threshold. Exclude markers with logistic regression p-value >= this value. Requires logreg_results.
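The marker-count prediction rules can be illustrated with a small base R sketch. This is not the package's internal code: the marker names and the toy data are assumptions made up for the example, and it only shows how "any", "all" and an integer threshold differ once the set of markers has been fixed.

```r
# Toy marker matrix: one row per sample, one 0/1 column per marker.
# Marker names and values are illustrative assumptions only.
toy_markers <- data.frame(
  gyrA_S83L = c(1, 1, 0, 0),
  parC_S80I = c(1, 0, 0, 1),
  qnrS1     = c(1, 0, 0, 0)
)

# Number of markers present in each sample
n_present <- rowSums(toy_markers)

# prediction_rule = "any": positive if at least one marker is present
pred_any <- as.integer(n_present >= 1)

# prediction_rule = "all": positive only if every marker is present
pred_all <- as.integer(n_present == ncol(toy_markers))

# prediction_rule = 2: positive if at least two markers are present
pred_min2 <- as.integer(n_present >= 2)

print(data.frame(n_present, pred_any, pred_all, pred_min2))
```

In this toy data the second and fourth samples carry a single marker each, so they are predicted positive under "any" but negative under both "all" and a threshold of 2.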
Value
An S3 object of class "amr_concordance", a list containing:
- conf_mat: Named list of yardstick confusion matrix objects (e.g. list(R = <cm>, NWT = <cm>)).
- metrics: A tibble with columns outcome, metric, estimate.
- data: The input binary matrix with added R_pred and/or NWT_pred columns.
- markers_used: Named list of character vectors of markers used per outcome.
- truth_col: Character vector of truth columns evaluated.
- n: Named integer vector of sample counts per outcome.
- prediction_rule: The prediction rule used.
Details
The function identifies marker columns as all columns not in the reserved set
(id, pheno, ecoff, R, I, NWT, mic, disk). It then applies the filters in
this order: the inclusion list (markers), the exclusion list (exclude_markers),
min_count, ppv_threshold, and finally pval_threshold.
Marker filtering is performed per outcome (R, NWT), since PPV-based and p-value-based filters are category/outcome-specific.
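The first steps of this filtering order can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the function's actual implementation: the toy data frame, its marker names, and the chosen exclusion and min_count values are all hypothetical. The PPV and p-value filters are omitted because they depend on the output of other functions.

```r
# Toy binary matrix (illustrative assumption): reserved columns plus markers
bm <- data.frame(
  id        = c("s1", "s2", "s3", "s4"),
  R         = c(1, 1, 0, 0),
  NWT       = c(1, 1, 1, 0),
  gyrA_S83L = c(1, 1, 0, 0),
  parC_S80I = c(1, 0, 0, 0),
  qnrS1     = c(0, 0, 1, 0)
)

# Reserved column names, as listed in the Details text
reserved <- c("id", "pheno", "ecoff", "R", "I", "NWT", "mic", "disk")

# Step 1: every non-reserved column is treated as a marker
marker_cols <- setdiff(names(bm), reserved)

# Step 2: apply an exclusion list (here, the equivalent of
# exclude_markers = "qnrS1")
marker_cols <- setdiff(marker_cols, "qnrS1")

# Step 3: apply a minimum count (here, the equivalent of min_count = 2),
# using the column sums of the remaining markers
counts <- colSums(bm[, marker_cols, drop = FALSE])
kept <- marker_cols[counts >= 2]

print(kept)
```

Here parC_S80I is dropped because it occurs in only one sample, leaving gyrA_S83L as the only marker that would feed into the prediction.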
Standard metrics (sensitivity, specificity, PPV, NPV, accuracy, kappa,
F-measure) are calculated using the yardstick package. AMR-specific error
rates are computed internally:
- VME (Very Major Error): FN / (TP + FN) = 1 - sensitivity. Proportion of truly resistant isolates not predicted as such from genotype.
- ME (Major Error): FP / (TN + FP) = 1 - specificity. Proportion of truly susceptible isolates incorrectly predicted resistant from genotype.
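As a worked example of the two error rates, the following base R sketch computes them from a pair of 0/1 vectors. The truth and prediction vectors are made-up toy data, and the code only mirrors the formulas given above rather than reproducing the package's internals.

```r
# Toy data (illustrative assumption): 1 = resistant, 0 = susceptible
truth <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
pred  <- c(1, 1, 1, 0, 1, 0, 0, 0, 0, 0)

# Confusion matrix cells
tp <- sum(truth == 1 & pred == 1)  # resistant, predicted resistant
fn <- sum(truth == 1 & pred == 0)  # resistant, predicted susceptible
fp <- sum(truth == 0 & pred == 1)  # susceptible, predicted resistant
tn <- sum(truth == 0 & pred == 0)  # susceptible, predicted susceptible

# Very major error rate: missed resistance, equal to 1 - sensitivity
vme <- fn / (tp + fn)

# Major error rate: false resistance calls, equal to 1 - specificity
me <- fp / (tn + fp)

print(c(VME = vme, ME = me))
```

With these vectors one of the four resistant isolates is missed (VME = 0.25) and one of the six susceptible isolates is called resistant (ME = 1/6).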
Examples
if (FALSE) { # \dontrun{
geno_table <- import_amrfp(ecoli_geno_raw, "Name")
binary_matrix <- get_binary_matrix(
  geno_table = geno_table,
  pheno_table = ecoli_ast,
  antibiotic = "Ciprofloxacin",
  drug_class_list = c("Quinolones"),
  sir_col = "pheno_clsi"
)

# Basic concordance for both R and NWT
result <- concordance(binary_matrix)
result

# Single outcome
result <- concordance(binary_matrix, truth = "R")

# Exclude specific markers
result <- concordance(binary_matrix, exclude_markers = c("qnrS1"))

# Filter markers by solo PPV threshold, using solo_ppv_analysis() output
solo_ppv <- solo_ppv_analysis(binary_matrix = binary_matrix)
result <- concordance(
  binary_matrix,
  ppv_threshold = 0.5,
  ppv_results = solo_ppv
)

# Filter markers by solo PPV threshold, using ppv() output
ppv_res <- ppv(binary_matrix = binary_matrix)
result <- concordance(
  binary_matrix,
  ppv_threshold = 0.5,
  ppv_results = ppv_res
)

# Require at least 2 markers present for prediction
result <- concordance(binary_matrix, prediction_rule = 2)

# Use logistic regression model for prediction
logreg <- amr_logistic(binary_matrix = binary_matrix)
result <- concordance(
  binary_matrix,
  prediction_rule = "logistic",
  logreg_results = logreg
)

# Access components
result$conf_mat$R
result$metrics
result$markers_used
} # }
