Import and process antimicrobial phenotype data from common sources

This function imports antibiotic susceptibility testing (AST) datasets in formats exported by EBI, NCBI, WHOnet and several automated AST instruments (Vitek, Microscan, Sensititre, Phoenix). It can optionally use the AMR package to interpret the susceptibility phenotype (SIR) based on EUCAST or CLSI guidelines (human breakpoints and/or ECOFF).

Usage

import_pheno(
  input,
  format = "ebi",
  interpret_eucast = FALSE,
  interpret_clsi = FALSE,
  interpret_ecoff = FALSE,
  ...
)

Arguments

input

A data frame, or a string giving the path to an input file, containing the phenotype data in a supported format. These files may be downloaded from public sources such as the EBI AMR web browser, EBI FTP site, or NCBI browser; retrieved using the functions download_ebi(), download_ncbi_pheno(), or query_ncbi_bq_geno(); or exported from supported AST instruments.

format

A string indicating the format of the data: "ebi" (default), "ebi_web", "ebi_ftp", "ncbi", "ncbi_biosample", "vitek", "microscan", "phoenix", "sensititre", or "whonet". This determines which importer function the data is passed on to for processing (see below).

interpret_eucast

A logical value (default is FALSE). If TRUE, the function will interpret the susceptibility phenotype (SIR) for each row based on the MIC or disk diffusion values, against EUCAST human breakpoints. These will be reported in a new column pheno_eucast, of class 'sir'.

interpret_clsi

A logical value (default is FALSE). If TRUE, the function will interpret the susceptibility phenotype (SIR) for each row based on the MIC or disk diffusion values, against CLSI human breakpoints. These will be reported in a new column pheno_clsi, of class 'sir'.

interpret_ecoff

A logical value (default is FALSE). If TRUE, the function will interpret the wildtype vs nonwildtype status for each row based on the MIC or disk diffusion values, against epidemiological cut-off (ECOFF) values. These will be reported in a new column ecoff, of class 'sir' and coded as NWT (nonwildtype) or WT (wildtype).

...

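Format-specific arguments, passed on to the importer function selected by format. See the documentation for the relevant importer: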

"ebi" : import_ebi_pheno()
"ebi_web" : import_ebi_pheno()
"ebi_ftp" : import_ebi_pheno_ftp()
"ncbi" : import_ncbi_pheno()
"ncbi_biosample" : import_ncbi_biosample()
"vitek" : import_vitek_pheno()
"microscan" : import_microscan_pheno()
"sensititre" : import_sensititre_pheno()
"phoenix" : import_phoenix_pheno()
"whonet" : import_whonet_pheno()
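
The mapping above can be pictured as a simple lookup from the format string to an importer name. The following is a minimal sketch only: importer_name() is a hypothetical helper written for illustration, not part of the package, and the actual dispatch inside import_pheno() may be implemented differently.

```r
# Hypothetical helper (not part of the package): returns the name of the
# importer documented for each supported format string.
importer_name <- function(format = "ebi") {
  switch(format,
    ebi = ,                                   # falls through to "ebi_web"
    ebi_web = "import_ebi_pheno",
    ebi_ftp = "import_ebi_pheno_ftp",
    ncbi = "import_ncbi_pheno",
    ncbi_biosample = "import_ncbi_biosample",
    vitek = "import_vitek_pheno",
    microscan = "import_microscan_pheno",
    sensititre = "import_sensititre_pheno",
    phoenix = "import_phoenix_pheno",
    whonet = "import_whonet_pheno",
    stop("Unsupported format: ", format)      # default branch for unknown formats
  )
}

importer_name("vitek")
```

Any arguments supplied through ... would then be the ones documented on the help page of the importer named here.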

Value

A data frame with the processed AST data, including additional columns:

id: The sample identifier (character).
spp_pheno: The species phenotype, formatted using the AMR::as.mo() function (class mo).
drug: The antibiotic used in the test, formatted using the AMR::as.ab() function (class ab).
mic: The minimum inhibitory concentration (MIC) value, formatted using the AMR::as.mic() function (class mic).
disk: The disk diffusion measurement (in mm), formatted using the AMR::as.disk() function (class disk).
method: The AST method (e.g., "broth dilution", "disk diffusion", "Etest", "agar dilution"). Expected values are based on the NCBI/EBI antibiogram specification (character).
platform: The AST platform/instrument (e.g., "Vitek", "Phoenix", "Sensititre") (character).
guideline: The AST standard recorded in the input file as being used for the AST assay (character).
pheno_eucast: The phenotype newly interpreted against EUCAST human breakpoint standards (as S/I/R), based on the MIC or disk diffusion data (class sir).
pheno_clsi: The phenotype newly interpreted against CLSI human breakpoint standards (as S/I/R), based on the MIC or disk diffusion data (class sir).
ecoff: The phenotype newly interpreted against the ECOFF (as WT/NWT), based on the MIC or disk diffusion data (class sir).
pheno_provided: The original phenotype interpretation provided in the input file, formatted using AMR::as.sir() (class sir).
source: The source of each data point (from the publications or bioproject field in the input file, or replaced with a single value passed in as the source parameter) (character).
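
To make the column classes above concrete, the sketch below builds a two-row toy data frame by hand using the AMR formatting functions named in the list, then interprets the MIC values against EUCAST breakpoints with AMR::as.sir(). The sample identifiers and MIC values are invented for illustration, and the exact way import_pheno() calls the AMR package internally is an assumption. It requires the AMR package to be installed.

```r
library(AMR)

# Toy data, invented for illustration (not real isolates)
toy <- data.frame(id = c("sample_1", "sample_2"), stringsAsFactors = FALSE)

# Format each column with the AMR function named in the Value section
toy$spp_pheno <- as.mo(rep("Escherichia coli", 2))  # class 'mo'
toy$drug <- as.ab(rep("ciprofloxacin", 2))          # class 'ab'
toy$mic <- as.mic(c("0.06", "8"))                   # class 'mic'

# Interpret the MICs against EUCAST breakpoints; this mirrors what the
# pheno_eucast column is described as containing (assumed, not confirmed)
toy$pheno_eucast <- as.sir(
  toy$mic,
  mo = "Escherichia coli",
  ab = "CIP",
  guideline = "EUCAST"
)

# Inspect the class of each column
sapply(toy, function(column) class(column)[1])
```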

Examples

if (FALSE) { # \dontrun{
# import NCBI data retrieved from Google Cloud, without re-interpreting resistance
head(staph_pheno_ncbi_cloud_raw)
pheno <- import_pheno(staph_pheno_ncbi_cloud_raw, format = "ncbi")

# import NCBI data where biosample column has been renamed to 'id'
head(staph_pheno_ncbi_raw)
import_pheno(staph_pheno_ncbi_raw, "ncbi", sample_col = "id")

# import NCBI data and re-interpret resistance (S/I/R) and WT/NWT (vs ECOFF)
head(ecoli_pheno_raw)
pheno <- import_pheno(ecoli_pheno_raw,
  format = "ncbi",
  interpret_eucast = TRUE, interpret_ecoff = TRUE
)

# download Klebsiella quasipneumoniae phenotype data from NCBI BioSample
kquasi_raw_ncbi <- download_ncbi_pheno("Klebsiella quasipneumoniae")
head(kquasi_raw_ncbi)
# import the data and interpret against EUCAST breakpoints
pheno <- import_pheno(kquasi_raw_ncbi,
  format = "ncbi_biosample",
  interpret_eucast = TRUE
)

# download Klebsiella quasipneumoniae phenotype data from EBI
kquasi_raw_ebi <- download_ebi(species = "Klebsiella quasipneumoniae")
head(kquasi_raw_ebi)
# import the data and interpret against ECOFF
pheno <- import_pheno(kquasi_raw_ebi,
  format = "ebi_ftp",
  interpret_ecoff = TRUE
)

# import Vitek data from file, with default parameters
pheno <- import_pheno("vitek_export.tsv",
  format = "vitek"
)

# import Vitek data from file, overriding the defaults:
# specify the guideline that was used, exclude dates, ignore expertized calls
pheno <- import_pheno("vitek_export.tsv",
  format = "vitek",
  instrument_guideline = "EUCAST 2025",
  use_expertized = FALSE,
  include_dates = FALSE
)
} # }
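
Once imported, the columns described under Value can be summarised with base R. The sketch below uses a small mock data frame standing in for the output of import_pheno(); the values are invented, and the plain character and factor columns are a simplification of the 'ab' and 'sir' classes the function is documented to return.

```r
# Mock stand-in for import_pheno() output; values invented for illustration
pheno_mock <- data.frame(
  id = c("s1", "s2", "s3", "s4"),
  drug = c("CIP", "CIP", "MEM", "MEM"),
  pheno_eucast = factor(c("R", "S", "S", "S"), levels = c("S", "I", "R")),
  stringsAsFactors = FALSE
)

# Cross-tabulate drug against interpreted phenotype
counts <- table(pheno_mock$drug, pheno_mock$pheno_eucast)

# Fraction of tested isolates called resistant, per drug
resistant_fraction <- counts[, "R"] / rowSums(counts)
print(resistant_fraction)
```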