This function generates a plot summarising phenotype results (assay distributions, phenotype category percentages, and/or the positive predictive value (PPV) for the phenotype) for each combination of markers observed in the data.
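The base R sketch below illustrates what "each combination of markers observed in the data" means. It collapses a toy binary matrix into one combination label per sample and then counts how often each combination occurs. The toy layout (an id column, a pheno column of S/I/R calls, and one 0/1 column per marker) and the marker names are hypothetical simplifications. The real get_binary_matrix() output has more columns, and this is not how amr_ppv() is implemented internally.

```r
# Toy binary matrix (hypothetical layout): one row per sample, one 0/1 column per marker
bm <- data.frame(
  id = paste0("s", 1:6),
  pheno = c("R", "R", "S", "R", "S", "I"),
  gyrA_S83L = c(1, 1, 0, 1, 0, 1),
  parC_S80I = c(1, 0, 0, 1, 0, 0),
  qnrS1 = c(0, 0, 0, 0, 1, 0)
)
markers <- c("gyrA_S83L", "parC_S80I", "qnrS1")

# Label each sample by the set of markers it carries ("none" if it carries none)
combo_label <- apply(bm[, markers], 1, function(row) {
  present <- markers[row == 1]
  if (length(present) == 0) "none" else paste(present, collapse = ", ")
})

# Count samples per observed combination, most frequent first
combo_counts <- sort(table(combo_label), decreasing = TRUE)
print(combo_counts)
```

In this toy data, "gyrA_S83L" and "gyrA_S83L, parC_S80I" each occur twice, while "none" and "qnrS1" each occur once.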
Usage
amr_ppv(
binary_matrix = NULL,
min_set_size = 2,
order = "ppv",
geno_table,
pheno_table,
pheno_drug = NULL,
geno_class = NULL,
geno_drug = NULL,
geno_sample_col = NULL,
pheno_sample_col = NULL,
sir_col = NULL,
ecoff_col = "ecoff",
marker_col = "marker",
colours_ppv = c(R = "maroon", NWT = "navy"),
SIR_col = c(S = "#3CAEA3", I = "#F6D55C", R = "#ED553B"),
upset_grid = FALSE,
marker_label_space = NULL,
plot_category = TRUE,
print_category_counts = TRUE,
plot_ppv = TRUE,
plot_assay = FALSE,
assay = NULL,
boxplot_col = "grey",
species = NULL,
bp_site = NULL,
guideline = "EUCAST 2025",
bp_S = NULL,
bp_R = NULL,
ecoff_bp = NULL,
pd = position_dodge(width = 0.8),
marker_order = NULL,
plot_title = NULL,
plot_subtitle = NULL
)
Arguments
- binary_matrix
A data frame containing the original binary matrix output from the get_binary_matrix() function. If not provided (or set to NULL), the user must specify geno_table, pheno_table, pheno_drug, and optionally geno_class, geno_drug, geno_sample_col, pheno_sample_col, sir_col, ecoff_col and marker_col to pass to get_binary_matrix().
- min_set_size
An integer specifying the minimum number of samples in which a marker combination must be observed for it to be included in the analysis and plots. Default is 2.
- order
A character string indicating the order of the marker combinations in the plot. Options are:
"": decreasing frequency of combinations
"genes": order by the number of genes in each combination
"value": order by the median assay value (MIC or disk zone) for each combination
"ppv" (default): order by the PPV estimated for each combination
- geno_table
(Required if binary_matrix not provided) A data frame containing genotype data, formatted with import_amrfp(). Only used if binary_matrix not provided.
- pheno_table
(Required if binary_matrix not provided) A data frame containing phenotype data, formatted with import_pheno(). Only used if binary_matrix not provided.
- pheno_drug
(Required if binary_matrix not provided) A character string specifying the drug of interest to filter phenotype data. The value must match one of the entries in the drug column of pheno_table or be coercible to a match using AMR::as.ab().
- geno_class
(Optional if binary_matrix not provided) A character vector of drug classes to filter genotype markers. Markers in geno_table are kept if their drug_class matches any value in this vector. If not provided, the AMR package is used to look up the class name(s) associated with pheno_drug, and these are used instead (they are printed to screen so the user can see what is being filtered).
- geno_drug
(Optional if binary_matrix not provided) A character vector of drug names whose relevant genotype markers should be included.
- geno_sample_col
(Optional) A character string specifying the column name in geno_table containing sample identifiers. Defaults to NULL, in which case the first column is assumed to contain identifiers. Only used if binary_matrix not provided.
- pheno_sample_col
(Optional) A character string specifying the column name in pheno_table containing sample identifiers. Defaults to NULL, in which case the first column is assumed to contain identifiers. Only used if binary_matrix not provided.
- sir_col
A character string specifying the column name in pheno_table that contains the resistance interpretation (SIR) data. The values should be "S", "I", "R" or otherwise interpretable by AMR::as.sir(). If not provided, the first column prefixed with "phenotype*" will be used if present; otherwise an error is thrown. Only used if binary_matrix not provided.
- ecoff_col
A character string specifying the column name in pheno_table that contains resistance interpretations (SIR) made against the ECOFF rather than a clinical breakpoint. The values should be "S", "I", "R" or otherwise interpretable by AMR::as.sir(). Default "ecoff". Set to NULL if not available. Only used if binary_matrix not provided.
- marker_col
A character string specifying the column name in geno_table containing the marker identifiers. Default "marker". Only used if binary_matrix not provided.
- colours_ppv
A named vector of colours for the plot of PPV estimates. The names should be "R" and "NWT" (as in the default), and the values should be valid colour names or hexadecimal colour codes.
- SIR_col
A named vector of colours for the percentage bar plot. The names should be the phenotype categories (e.g., "R", "I", "S"), and the values should be valid colour names or hexadecimal colour codes. Default values are those used in the AMR package's AMR::scale_colour_sir().
- upset_grid
Logical indicating whether to show marker combinations as an upset plot-style grid (default FALSE, in which case each row is instead labelled with a printed list of markers).
- marker_label_space
Relative width of the plotting area to provide to the marker list/grid. Default NULL, which results in a value of 3 when upset_grid = FALSE and 1 otherwise.
- plot_category
Logical indicating whether to include a stacked bar plot showing, for each marker combination, the proportion of samples with each phenotype classification (specified by the pheno column in the input data). Default is TRUE.
- print_category_counts
Logical indicating whether, if plot_category = TRUE, to print the number of strains in each resistance category for each marker combination in the plot. Default is TRUE.
- plot_ppv
Logical indicating whether to plot the estimates of positive predictive value for each marker combination (default TRUE).
- plot_assay
Logical indicating whether to plot the distribution of MIC/disk assay values for each marker combination (default FALSE). Note that you must also indicate which assay column to plot ("mic" or "disk") via assay.
- assay
A character string indicating whether to plot MIC or disk diffusion data. Must be one of:
NULL (default): no assay data is plotted
"mic": plot MIC data stored in column mic
"disk": plot disk diffusion data stored in column disk
- boxplot_col
Colour for the lines of the box plots summarising the assay (MIC or disk) distribution for each marker combination. Default is "grey".
- species
(Optional) Species name used to retrieve clinical breakpoints for annotation of the assay distribution plot.
- bp_site
(Optional) Breakpoint site (e.g. "Non-meningitis") used when retrieving clinical breakpoints.
- guideline
Guideline used for breakpoint lookup. Default is "EUCAST 2025".
- bp_S
(Optional) S breakpoint to add to assay distribution plot (numerical).
- bp_R
(Optional) R breakpoint to add to assay distribution plot (numerical).
- ecoff_bp
(Optional) ECOFF breakpoint to add to assay distribution plot (numerical).
- pd
A ggplot2::position_dodge() object controlling the horizontal spacing of points and confidence intervals in the PPV plot. Default is position_dodge(width = 0.8).
- marker_order
(Optional) A character string or vector indicating the order of the marker rows in the UpSet grid. Options are:
"freq" or NULL (default): order markers by decreasing frequency
"alpha": order markers alphabetically
character vector: vector of markers in the order in which they should appear
- plot_title
(Optional) A character string specifying a title for the plot. Default NULL, in which case a default title of the form paste(pheno_drug, "phenotypes") is constructed. Set to "" to remove the title entirely.
- plot_subtitle
(Optional) A character string specifying a subtitle for the plot. Default NULL, in which case a default subtitle is constructed: paste("vs markers for", paste0(geno_class, collapse = ", ")). Set to "" to remove the subtitle entirely.
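The following base R sketch shows how min_set_size and the documented order options could act on per-combination summaries. order_combinations() is a hypothetical helper, not part of the package. It makes several assumptions: PPV is taken as the proportion of samples with a combination that are "R"; "ppv" sorts from highest to lowest and "genes" from fewest to most markers (the directions are assumptions); and the "value" option is left out because it needs assay data.

```r
# Hypothetical helper, not part of the package: summarise marker combinations,
# drop rare ones (min_set_size) and order them like the documented `order` options.
order_combinations <- function(combo, pheno, min_set_size = 2, order_by = "ppv") {
  n <- tapply(pheno, combo, length)           # samples per combination
  n_r <- tapply(pheno == "R", combo, sum)     # resistant samples per combination
  labels <- names(n)
  summ <- data.frame(
    combination = labels,
    n = as.integer(n),
    # number of markers in the combination ("none" carries zero markers)
    n_genes = ifelse(labels == "none", 0L, lengths(strsplit(labels, ", "))),
    ppv = as.numeric(n_r) / as.integer(n),    # assumed PPV: proportion of samples that are R
    stringsAsFactors = FALSE
  )
  summ <- summ[summ$n >= min_set_size, ]      # keep combinations seen often enough
  idx <- if (order_by == "") {
    order(summ$n, decreasing = TRUE)          # decreasing frequency
  } else if (order_by == "genes") {
    order(summ$n_genes)                       # fewest markers first (direction assumed)
  } else if (order_by == "ppv") {
    order(summ$ppv, decreasing = TRUE)        # highest PPV first (direction assumed)
  } else {
    stop("order_by must be \"\", \"genes\" or \"ppv\" in this sketch")
  }
  summ[idx, ]
}

combo <- c("gyrA", "gyrA", "none", "gyrA, parC", "gyrA, parC", "none", "qnrS1")
pheno <- c("R", "S", "S", "R", "R", "S", "R")
res <- order_combinations(combo, pheno, min_set_size = 2, order_by = "ppv")
print(res)
```

Here "qnrS1" is seen in only one sample and is therefore dropped. The remaining combinations are ordered "gyrA, parC" (PPV 1), "gyrA" (PPV 0.5) and "none" (PPV 0).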
Value
A list containing the following elements:
plot: A grid of the requested plots
binary_matrix: A copy of the genotype-phenotype binary matrix (either provided as input or generated by the function)
summary: A data frame summarizing each marker combination observed, including the number of resistant isolates, positive predictive values, and median assay values (and interquartile range) where relevant.
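As a rough illustration of the per-combination quantities listed under summary, the base R sketch below computes the number of samples, the number of resistant samples, a PPV with a 95% confidence interval, and the median MIC with its interquartile range for toy data. The column names are hypothetical. The interval is an exact Clopper-Pearson interval from stats::binom.test(), chosen here for convenience; the package may compute its confidence intervals differently.

```r
# Toy per-sample data (hypothetical): combination label, SIR call and MIC (mg/L)
df <- data.frame(
  combination = c("gyrA", "gyrA", "gyrA", "gyrA, parC", "gyrA, parC", "none", "none"),
  pheno = c("R", "S", "R", "R", "R", "S", "S"),
  mic = c(1, 0.25, 2, 8, 16, 0.015, 0.03),
  stringsAsFactors = FALSE
)

# One summary row per combination
summarise_combo <- function(d) {
  n <- nrow(d)
  n_r <- sum(d$pheno == "R")
  ci <- binom.test(n_r, n)$conf.int                        # exact 95% CI for the proportion R
  q <- quantile(d$mic, c(0.25, 0.5, 0.75), names = FALSE)  # IQR bounds and median
  data.frame(
    combination = d$combination[1], n = n, n_R = n_r,
    ppv = n_r / n, ci_lower = ci[1], ci_upper = ci[2],
    median_mic = q[2], q25 = q[1], q75 = q[3],
    stringsAsFactors = FALSE
  )
}

summary_tbl <- do.call(rbind, lapply(split(df, df$combination), summarise_combo))
print(summary_tbl)
```

With only two or three samples per combination the intervals are very wide. This is one reason to keep min_set_size above 1.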
Examples
if (FALSE) { # \dontrun{
ecoli_geno <- import_amrfp(ecoli_geno_raw, "Name")
# Generate binary matrix
binary_matrix <- get_binary_matrix(
geno_table = ecoli_geno,
pheno_table = ecoli_pheno,
pheno_drug = "Ciprofloxacin",
geno_class = c("Quinolones"),
sir_col = "pheno_clsi",
keep_assay_values = TRUE,
keep_assay_values_from = "mic"
)
# Run ppv analysis using this binary_matrix
ppv <- amr_ppv(binary_matrix)
# Alternatively, generate the binary matrix and run amr_ppv() in one step
ppv <- amr_ppv(
geno_table = ecoli_geno,
pheno_table = ecoli_pheno,
pheno_drug = "Ciprofloxacin",
sir_col = "pheno_clsi"
)
} # }
