Compares a genotype table with a phenotype table and summarises the number of samples present in both. For each drug in the phenotype table, it also summarises the number of phenotype observations and the genotype markers detected for the associated drug class(es).

Usage

summarise_geno_pheno(
  geno_table,
  pheno_table,
  geno_sample_col = NULL,
  pheno_sample_col = NULL,
  pheno_cols,
  drug_col = "drug",
  class_col = "drug_class",
  marker_col = "marker",
  gene_col = "gene",
  variation_col = "variation type",
  mic_col = "mic",
  disk_col = "disk",
  spp_col = "spp_pheno",
  method_cols = c("method", "platform", "guideline", "source")
)

Arguments

geno_table

A tibble or data frame containing genotype data, in the format output by import_amrfp.

pheno_table

A tibble or data frame containing phenotype data, in the format output by import_pheno.

geno_sample_col

Character. Name of the column in the genotype table containing sample identifiers. Default is "Name".

pheno_sample_col

Character. Name of the column in the phenotype table containing sample identifiers. Default is "id".

pheno_cols

Vector. Vector giving names of columns in the phenotype table containing categorical phenotype calls (S/I/R or NWT/WT). Default is any columns beginning with "pheno" or "ecoff".

drug_col

Character. Name of the column in the phenotype table containing drug agent identifiers. Default is "drug". Entries will be annotated with their full antibiotic names, converted using AMR::as.ab(). If the genotype table contains a column indicating individual agents it should share this same name.

class_col

Character. Name of the column in the genotype table containing drug class identifiers. Default is "drug_class". These should be antibiotic group names understood by the AMR package, as per drug_col columns generated by the import_geno() functions.

marker_col

Character. Name of the column in the genotype table containing marker identifiers. Default is "marker".

gene_col

Character. Name of the column in the genotype table containing gene identifiers. Default is "gene".

variation_col

Character. Name of the column in the genotype table containing variation type identifiers. Default is "variation type".

mic_col

Character. Name of the column in the phenotype table containing MIC measurements. Default is "mic".

disk_col

Character. Name of the column in the phenotype table containing disk diffusion zone measurements. Default is "disk".

spp_col

Character. Name of the column in the phenotype table containing species names. Default is "spp_pheno".

method_cols

Vector. Vector giving names of columns in the phenotype table containing method or source information by which to summarise MIC/disk data. Default is c("method", "platform", "guideline", "source").
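The default selection described for pheno_cols (any columns beginning with "pheno" or "ecoff") can be pictured with a short base R sketch. This is an illustration only and not the package's internal code; the toy `pheno` data frame and its column names are assumptions made for the example.

```r
# Toy phenotype table; the column names and values are illustrative assumptions
pheno <- data.frame(
  id = c("s1", "s2"),
  drug = c("AMK", "DOX"),
  pheno_clsi = c("R", "S"),
  pheno_provided = c("R", "S"),
  ecoff = c("NWT", "WT"),
  mic = c(32, 0.5),
  stringsAsFactors = FALSE
)

# Select columns whose names begin with "pheno" or "ecoff"
default_pheno_cols <- grep("^(pheno|ecoff)", names(pheno), value = TRUE)
default_pheno_cols
```

Passing pheno_cols explicitly, as in the example at the end of this page, avoids relying on column naming.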

Value

A named list with the following elements:

  • overlapping_samples: The number of samples present in both the genotype and phenotype tables.

  • drugs_with_pheno: A tibble listing the drugs included in the table, and the associated number of samples with MIC measures, disk measures, neither or both, for each drug; restricted to samples that also appear in the genotype table.

  • geno_hits: A tibble listing the drugs and/or drug classes corresponding to drugs with phenotypes, and the associated number of unique markers, unique samples, and total hits for each drug/class amongst samples with corresponding phenotype data.

  • geno_markers: A tibble listing the genotypic markers in the genotype table corresponding to drugs with phenotypes, and the associated drugs/classes and variation types (if present). Number indicates the count of hits detected per marker, amongst samples with corresponding phenotype data.

  • pheno_counts_list: A list of tibbles, each corresponding to a unique categorical phenotype column in the input, indicating the counts of each phenotypic category per drug and species; restricted to samples that also appear in the genotype table.
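The restriction to samples shared by both tables, and the per-drug tally of MIC and disk measurements, can be sketched in base R roughly as follows. This is a minimal sketch under stated assumptions, not the function's actual implementation: the toy tables, sample identifiers and the `measure` helper column are all hypothetical.

```r
# Toy tables; sample IDs, column names and values are illustrative assumptions
geno <- data.frame(
  Name = c("s1", "s1", "s2", "s4"),
  marker = c("aph(3')-IIIa", "tet(K)", "tet(M)", "mepA"),
  stringsAsFactors = FALSE
)
pheno <- data.frame(
  id = c("s1", "s2", "s3", "s1"),
  drug = c("AMK", "AMK", "AMK", "DOX"),
  mic = c(32, NA, 4, 0.5),
  disk = c(NA, NA, 20, NA),
  stringsAsFactors = FALSE
)

# Samples present in both the genotype and phenotype tables
shared <- intersect(unique(geno$Name), unique(pheno$id))
overlapping_samples <- length(shared)

# Restrict phenotype rows to shared samples
pheno_shared <- pheno[pheno$id %in% shared, ]

# Label which measurement type each row carries (hypothetical helper column)
has_mic <- !is.na(pheno_shared$mic)
has_disk <- !is.na(pheno_shared$disk)
pheno_shared$measure <- ifelse(
  has_mic & has_disk, "both",
  ifelse(has_mic, "mic", ifelse(has_disk, "disk", "none"))
)

# Count rows per drug and measurement type
measure_counts <- table(pheno_shared$drug, pheno_shared$measure)
measure_counts
```

Sample "s3" has phenotype data but no genotype record, so it is excluded before counting, mirroring the "restricted to samples that also appear in the genotype table" wording above.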

Details

The function automatically adapts to the presence or absence of columns in pheno_table. The force_ab parameter allows the addition of full antibiotic names using the ab_name() function even when the first column is not recognized as an "ab" object.
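One common way to adapt to columns that may or may not be present is to keep only the requested names that actually exist in the table. The base R sketch below illustrates that idea for method_cols; it is an assumption about the general approach, not a description of the package's code, and the toy `pheno` table is hypothetical.

```r
# Hypothetical phenotype table lacking the "platform" and "source" columns
pheno <- data.frame(
  id = c("s1", "s2"),
  drug = c("AMK", "DOX"),
  method = c("broth dilution", "disk diffusion"),
  guideline = c("CLSI", "CLSI"),
  stringsAsFactors = FALSE
)

method_cols <- c("method", "platform", "guideline", "source")

# Keep only the requested columns that are present in the table
present_cols <- intersect(method_cols, names(pheno))

# Requested columns that are absent and would be skipped
missing_cols <- setdiff(method_cols, names(pheno))

present_cols
missing_cols
```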

Examples

staph_geno_pheno <- summarise_geno_pheno(staph_geno_ebi, staph_pheno_ebi,
  pheno_cols = c("pheno_clsi", "pheno_provided")
)
staph_geno_pheno
#> $overlapping_samples
#> [1] 190
#> 
#> $drugs_with_pheno
#> # A tibble: 2 × 8
#>   drug     n drug_class      drug_name   spp_pheno              disk   mic  none
#>   <ab> <int> <chr>           <chr>       <chr>                 <int> <int> <int>
#> 1 AMK     84 Aminoglycosides Amikacin    Staphylococcus aureus     3    11    70
#> 2 DOX    134 Tetracyclines   Doxycycline Staphylococcus aureus    NA    47    87
#> 
#> $geno_hits
#> # A tibble: 7 × 6
#>   drug drug_name    drug_class      markers samples  hits
#>   <ab> <chr>        <chr>             <int>   <int> <int>
#> 1 AMK  Amikacin     Aminoglycosides       2      37   170
#> 2 GEN  Gentamicin   Aminoglycosides       1      27   108
#> 3 KAN  Kanamycin    Aminoglycosides       2      37   170
#> 4 STR1 Streptomycin Aminoglycosides       1      15    15
#> 5 TOB  Tobramycin   Aminoglycosides       1      27   108
#> 6 TGC  Tigecycline  Tetracyclines         1     116   116
#> 7 NA   NA           Tetracyclines         4     116   144
#> 
#> $geno_markers
#> # A tibble: 12 × 5
#>    marker                 drug drug_name    drug_class          n
#>    <chr>                  <ab> <chr>        <chr>           <int>
#>  1 aac(6')-Ie/aph(2'')-Ia AMK  Amikacin     Aminoglycosides   108
#>  2 aac(6')-Ie/aph(2'')-Ia GEN  Gentamicin   Aminoglycosides   108
#>  3 aac(6')-Ie/aph(2'')-Ia KAN  Kanamycin    Aminoglycosides   108
#>  4 aac(6')-Ie/aph(2'')-Ia TOB  Tobramycin   Aminoglycosides   108
#>  5 ant(6)-Ia              STR1 Streptomycin Aminoglycosides    15
#>  6 aph(3')-IIIa           AMK  Amikacin     Aminoglycosides    62
#>  7 aph(3')-IIIa           KAN  Kanamycin    Aminoglycosides    62
#>  8 mepA                   TGC  Tigecycline  Tetracyclines     116
#>  9 tet(38)                NA   NA           Tetracyclines     116
#> 10 tet(K)                 NA   NA           Tetracyclines      14
#> 11 tet(L)                 NA   NA           Tetracyclines       1
#> 12 tet(M)                 NA   NA           Tetracyclines      13
#> 
#> $pheno_counts_list
#> $pheno_counts_list$pheno_clsi
#> # A tibble: 2 × 7
#>   drug drug_name   spp_pheno              `NA`     S     I     R
#>   <ab> <chr>       <chr>                 <int> <int> <int> <int>
#> 1 AMK  Amikacin    Staphylococcus aureus    84    NA    NA    NA
#> 2 DOX  Doxycycline Staphylococcus aureus    87    41     5     1
#> 
#> $pheno_counts_list$pheno_provided
#> # A tibble: 2 × 6
#>   drug drug_name   spp_pheno                 S     R     I
#>   <ab> <chr>       <chr>                 <int> <int> <int>
#> 1 AMK  Amikacin    Staphylococcus aureus    12    72    NA
#> 2 DOX  Doxycycline Staphylococcus aureus   103    20    11
#> 
#>