
Summarise the intersection of a genotype table and a phenotype table
Source: R/summarise.R
summarise_geno_pheno.Rd
Compares a genotype table and a phenotype table to summarise the number of samples present in both tables. For each drug in the phenotype table, it summarises the number of phenotype observations and the details of genotype markers detected for the associated drug class(es).
Usage
summarise_geno_pheno(
  geno_table,
  pheno_table,
  geno_sample_col = NULL,
  pheno_sample_col = NULL,
  pheno_cols,
  drug_col = "drug",
  class_col = "drug_class",
  marker_col = "marker",
  gene_col = "gene",
  variation_col = "variation type",
  mic_col = "mic",
  disk_col = "disk",
  spp_col = "spp_pheno",
  method_cols = c("method", "platform", "guideline", "source")
)
Arguments
- geno_table
A tibble or data frame containing genotype data, in the format output by import_amrfp.
- pheno_table
A tibble or data frame containing phenotype data, in the format output by import_pheno.
- geno_sample_col
Character. Name of the column in the genotype table containing sample identifiers. Default is "Name".
- pheno_sample_col
Character. Name of the column in the phenotype table containing sample identifiers. Default is "id".
- pheno_cols
Vector. Names of the columns in the phenotype table containing categorical phenotype calls (S/I/R or NWT/WT). Default is any columns beginning with "pheno" or "ecoff".
- drug_col
Character. Name of the column in the phenotype table containing drug agent identifiers. Default is "drug". Entries will be annotated with their full antibiotic names, converted using AMR::as.ab(). If the genotype table contains a column indicating individual agents, it should share this same name.
- class_col
Character. Name of the column in the genotype table containing drug class identifiers. Default is "drug_class". These should be antibiotic group names understood by the AMR package, as per drug_col columns generated by the import_geno() functions.
- marker_col
Character. Name of the column in the genotype table containing marker identifiers. Default is "marker".
- gene_col
Character. Name of the column in the genotype table containing gene identifiers. Default is "gene".
- variation_col
Character. Name of the column in the genotype table containing variation type identifiers. Default is "variation type".
- mic_col
Character. Name of the column in the phenotype table containing MIC measurements. Default is "mic".
- disk_col
Character. Name of the column in the phenotype table containing disk diffusion zone measurements. Default is "disk".
- spp_col
Character. Name of the column in the phenotype table containing species names. Default is "spp_pheno".
- method_cols
Vector. Names of the columns in the phenotype table containing method or source information by which to summarise MIC/disk data. Default is c("method", "platform", "guideline", "source").
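The pheno_cols default described above (any columns beginning with "pheno" or "ecoff") can be pictured with a small base R sketch. The table pheno_toy and the helper default_pheno_cols() are hypothetical names made up for this illustration; this is not the package's own code, only one way such a prefix match might be written.

```r
# Toy phenotype table; all column names and values are invented for illustration.
pheno_toy <- data.frame(
  id = c("s1", "s2"),
  drug = c("AMK", "DOX"),
  pheno_clsi = c("S", "R"),
  pheno_provided = c("S", "R"),
  ecoff = c("WT", "NWT"),
  mic = c(2, 16)
)

# Hypothetical helper: return the names of columns that start with
# "pheno" or "ecoff", mirroring the documented default for pheno_cols.
default_pheno_cols <- function(tbl) {
  grep("^(pheno|ecoff)", names(tbl), value = TRUE)
}

pheno_cols <- default_pheno_cols(pheno_toy)
print(pheno_cols)
```

Passing an explicit pheno_cols vector, as in the Examples section, avoids relying on any such name matching.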
Value
A named list with the following elements:
- overlapping_samples: The number of samples present in both the genotype and phenotype tables.
- drugs_with_pheno: A tibble listing the drugs included in the table, and the associated number of samples with MIC measures, disk measures, neither or both, for each drug; restricted to samples that also appear in the genotype table.
- geno_hits: A tibble listing the drugs and/or drug classes corresponding to drugs with phenotypes, and the associated number of unique markers, unique samples, and total hits for each drug/class amongst samples with corresponding phenotype data.
- geno_markers: A tibble listing the genotypic markers in the genotype table corresponding to drugs with phenotypes, and the associated drugs/classes and variation types (if present). The count n indicates the number of hits detected per marker, amongst samples with corresponding phenotype data.
- pheno_counts_list: A list of tibbles, each corresponding to a unique categorical phenotype column in the input, indicating the counts of each phenotypic category per drug and species; restricted to samples that also appear in the genotype table.
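To make the "restricted to samples that also appear in the genotype table" wording concrete, here is a minimal base R sketch on invented data. It assumes the sample columns are called Name and id, and it only approximates the idea behind the overlap count and the MIC/disk tallies; the objects geno_toy, pheno_toy and counts are hypothetical and do not reproduce the function's internals.

```r
# Toy tables; identifiers and values are invented for illustration only.
geno_toy <- data.frame(
  Name = c("s1", "s1", "s2", "s4"),
  marker = c("m1", "m2", "m1", "m3")
)
pheno_toy <- data.frame(
  id = c("s1", "s2", "s3", "s1", "s2"),
  drug = c("AMK", "AMK", "AMK", "DOX", "DOX"),
  mic = c(4, NA, 8, 0.5, NA),
  disk = c(NA, 20, NA, 25, NA)
)

# Samples present in both tables.
shared <- intersect(unique(geno_toy$Name), unique(pheno_toy$id))
n_overlap <- length(shared)

# Keep phenotype rows for shared samples only, then label each row by
# which measurement types are available.
pheno_shared <- pheno_toy[pheno_toy$id %in% shared, ]
has_mic <- !is.na(pheno_shared$mic)
has_disk <- !is.na(pheno_shared$disk)
pheno_shared$measure <- ifelse(
  has_mic & has_disk, "both",
  ifelse(has_mic, "mic", ifelse(has_disk, "disk", "none"))
)

# Cross-tabulate drugs against measurement availability.
measure_levels <- c("mic", "disk", "both", "none")
counts <- table(
  drug = pheno_shared$drug,
  measure = factor(pheno_shared$measure, levels = measure_levels)
)
print(n_overlap)
print(counts)
```

Sample s3 has phenotype data but no genotype rows, and s4 has genotype rows but no phenotype data, so neither contributes to the counts.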
Details
The function automatically adapts to the presence or absence of columns in pheno_table.
The force_ab parameter allows the addition of full antibiotic names using the ab_name() function even when the first column is not recognized as an "ab" object.
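As a rough illustration of what adapting to the presence or absence of columns in pheno_table could look like, the base R sketch below keeps only those requested method columns that exist in a toy table before counting rows per group. The names pheno_toy, requested, present and method_counts are hypothetical, and the approach is an assumption for illustration rather than the function's actual logic.

```r
# Toy phenotype table with only two of the four default method columns.
pheno_toy <- data.frame(
  id = c("s1", "s2", "s3"),
  drug = c("AMK", "AMK", "DOX"),
  mic = c(4, 8, 0.5),
  method = c("BMD", "BMD", "Etest"),
  guideline = c("CLSI", "CLSI", "CLSI")
)

# Columns requested for grouping, as in the documented method_cols default.
requested <- c("method", "platform", "guideline", "source")

# Keep only the requested columns that are actually present.
present <- intersect(requested, names(pheno_toy))

# Count rows per drug and per combination of the available method columns.
method_counts <- aggregate(
  list(n = rep(1L, nrow(pheno_toy))),
  by = pheno_toy[c("drug", present)],
  FUN = sum
)
print(present)
print(method_counts)
```

Here "platform" and "source" are silently skipped because the toy table does not contain them.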
Examples
staph_geno_pheno <- summarise_geno_pheno(staph_geno_ebi, staph_pheno_ebi,
  pheno_cols = c("pheno_clsi", "pheno_provided")
)
staph_geno_pheno
#> $overlapping_samples
#> [1] 190
#>
#> $drugs_with_pheno
#> # A tibble: 2 × 8
#>   drug     n drug_class      drug_name   spp_pheno              disk   mic  none
#>   <ab> <int> <chr>           <chr>       <chr>                 <int> <int> <int>
#> 1 AMK     84 Aminoglycosides Amikacin    Staphylococcus aureus     3    11    70
#> 2 DOX    134 Tetracyclines   Doxycycline Staphylococcus aureus    NA    47    87
#>
#> $geno_hits
#> # A tibble: 7 × 6
#>   drug drug_name    drug_class      markers samples  hits
#>   <ab> <chr>        <chr>             <int>   <int> <int>
#> 1 AMK  Amikacin     Aminoglycosides       2      37   170
#> 2 GEN  Gentamicin   Aminoglycosides       1      27   108
#> 3 KAN  Kanamycin    Aminoglycosides       2      37   170
#> 4 STR1 Streptomycin Aminoglycosides       1      15    15
#> 5 TOB  Tobramycin   Aminoglycosides       1      27   108
#> 6 TGC  Tigecycline  Tetracyclines         1     116   116
#> 7 NA   NA           Tetracyclines         4     116   144
#>
#> $geno_markers
#> # A tibble: 12 × 5
#>    marker                 drug drug_name    drug_class          n
#>    <chr>                  <ab> <chr>        <chr>           <int>
#>  1 aac(6')-Ie/aph(2'')-Ia AMK  Amikacin     Aminoglycosides   108
#>  2 aac(6')-Ie/aph(2'')-Ia GEN  Gentamicin   Aminoglycosides   108
#>  3 aac(6')-Ie/aph(2'')-Ia KAN  Kanamycin    Aminoglycosides   108
#>  4 aac(6')-Ie/aph(2'')-Ia TOB  Tobramycin   Aminoglycosides   108
#>  5 ant(6)-Ia              STR1 Streptomycin Aminoglycosides    15
#>  6 aph(3')-IIIa           AMK  Amikacin     Aminoglycosides    62
#>  7 aph(3')-IIIa           KAN  Kanamycin    Aminoglycosides    62
#>  8 mepA                   TGC  Tigecycline  Tetracyclines     116
#>  9 tet(38)                NA   NA           Tetracyclines     116
#> 10 tet(K)                 NA   NA           Tetracyclines      14
#> 11 tet(L)                 NA   NA           Tetracyclines       1
#> 12 tet(M)                 NA   NA           Tetracyclines      13
#>
#> $pheno_counts_list
#> $pheno_counts_list$pheno_clsi
#> # A tibble: 2 × 7
#>   drug drug_name   spp_pheno              `NA`     S     I     R
#>   <ab> <chr>       <chr>                 <int> <int> <int> <int>
#> 1 AMK  Amikacin    Staphylococcus aureus    84    NA    NA    NA
#> 2 DOX  Doxycycline Staphylococcus aureus    87    41     5     1
#>
#> $pheno_counts_list$pheno_provided
#> # A tibble: 2 × 6
#>   drug drug_name   spp_pheno                 S     R     I
#>   <ab> <chr>       <chr>                 <int> <int> <int>
#> 1 AMK  Amikacin    Staphylococcus aureus    12    72    NA
#> 2 DOX  Doxycycline Staphylococcus aureus   103    20    11
#>
#>
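In the geno_hits output above, the hits value for a drug equals the sum of the per-marker n values in geno_markers (for AMK, 108 + 62 = 170). The base R sketch below shows one way the markers, samples and hits columns could be derived on invented data, assuming one genotype row per detected hit. The objects geno_toy, by_class and geno_hits_toy are hypothetical and this is not the package's implementation.

```r
# Toy genotype table: one row per marker hit in a sample (an assumption).
geno_toy <- data.frame(
  Name = c("s1", "s1", "s2", "s3", "s3"),
  marker = c("geneA", "geneB", "geneA", "geneA", "geneC"),
  drug_class = c(
    "Aminoglycosides", "Aminoglycosides", "Aminoglycosides",
    "Aminoglycosides", "Tetracyclines"
  )
)

# Split rows by drug class, then summarise each group.
by_class <- split(geno_toy, geno_toy$drug_class)
geno_hits_toy <- data.frame(
  drug_class = names(by_class),
  # Number of distinct markers seen for the class.
  markers = vapply(by_class, function(d) length(unique(d$marker)), integer(1)),
  # Number of distinct samples carrying at least one marker.
  samples = vapply(by_class, function(d) length(unique(d$Name)), integer(1)),
  # Total number of rows, i.e. marker hits across all samples.
  hits = vapply(by_class, nrow, integer(1)),
  row.names = NULL
)
print(geno_hits_toy)
```

Because a sample can carry several markers for the same class, hits can exceed samples, as it does for the aminoglycoside rows in the real output.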