An S4 object that holds the barcode data and the samples' metadata. A set of operations can be applied to a BarcodeObj object for quality control and for selecting subsets of barcodes or samples.

Value

A BarcodeObj object.

Details

The BarcodeObj object is an S4 object with three slots, messyBc, cleanBc and metadata, which can be accessed with the "@" operator. A BarcodeObj object can be generated by the bc_extract function, which accepts various input types, such as a data.frame, fastq files, or a ShortReadQ object.
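The sketch below makes the slot layout concrete. It defines a hypothetical toy class, ToyBarcodeObj, that reuses the same three slot names. It is not the actual BarcodeObj class definition. The per-sample tables are base R data.frames with invented values rather than data.tables.

```r
library(methods)

# Hypothetical toy class: NOT the real CellBarcode BarcodeObj definition.
# It only reuses the three slot names described above.
setClass(
  "ToyBarcodeObj",
  slots = c(
    messyBc  = "list",       # one raw-barcode table per sample
    cleanBc  = "list",       # one filtered-barcode table per sample
    metadata = "data.frame"  # one row per sample
  )
)

toy <- new(
  "ToyBarcodeObj",
  messyBc = list(
    s1 = data.frame(umi_seq = c("ACGT", "TTGA"),
                    barcode_seq = c("AGAG", "AGAG"),
                    count = c(10L, 2L))
  ),
  cleanBc = list(
    s1 = data.frame(barcode_seq = "AGAG", counts = 12L)
  ),
  metadata = data.frame(raw_read_count = 12L, row.names = "s1")
)

# Slots are read with the "@" operator
names(toy@messyBc)
toy@cleanBc$s1$counts
toy@metadata["s1", "raw_read_count"]
```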

Slot messyBc is a list that holds the raw barcode sequences before filtering. Each element is a data.table corresponding to one sample, in sample order. Each table has 3 columns:
1. umi_seq (optional): UMI sequence.
2. barcode_seq: barcode sequence.
3. count: number of reads carrying the full sequence.
In this table, barcode_seq values can be duplicated. Two different full read sequences can share the same barcode sequence because of UMI diversity or mutations in the constant region.
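The duplication of barcode_seq can be seen in a small sketch. The table below is a hypothetical messyBc element for one sample. It is built as a base R data.frame, whereas the package uses data.table, and the UMIs and counts are invented. Collapsing the duplicated rows gives the total number of reads per barcode. This only illustrates the table layout and is not how CellBarcode builds cleanBc.

```r
# Hypothetical messyBc element for one sample (base R data.frame instead of
# data.table; all values invented). Each row is one distinct full read sequence.
messy <- data.frame(
  umi_seq     = c("ACGT", "TTGA", "ACGT", "GGCC"),
  barcode_seq = c("AGAG", "AGAG", "AAAG", "AGAG"),
  count       = c(5L, 3L, 7L, 1L)
)

# "AGAG" occupies three rows because its reads carry three different UMIs
rows_per_barcode <- table(messy$barcode_seq)

# Collapsing the duplicated rows gives the total reads per barcode
reads_per_barcode <- aggregate(count ~ barcode_seq, data = messy, FUN = sum)
reads_per_barcode
```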

Slot cleanBc is a list that holds the barcode sequences after filtering. Each element is a data.table corresponding to one sample. Each table has 2 columns:
1. barcode_seq: barcode sequence.
2. counts: read count, or UMI count if cleanBc was created by bc_cure_umi.
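The difference between a read count and a UMI count can be sketched on a messyBc-like table. The read count sums count over all rows of a barcode. The UMI count tallies the distinct umi_seq values per barcode. The data are invented, and the sketch leaves out any filtering that bc_cure_umi may apply. Treat it as an illustration of the two counting schemes only.

```r
# Hypothetical messyBc-like table (base R data.frame, invented values)
messy <- data.frame(
  umi_seq     = c("ACGT", "TTGA", "ACGT", "GGCC"),
  barcode_seq = c("AGAG", "AGAG", "AAAG", "AGAG"),
  count       = c(5L, 3L, 7L, 1L)
)

# Read count: total reads per barcode
read_count <- tapply(messy$count, messy$barcode_seq, sum)
# UMI count: number of distinct UMIs per barcode
umi_count <- tapply(messy$umi_seq, messy$barcode_seq,
                    function(u) length(unique(u)))

# Two cleanBc-like tables with the barcode_seq / counts columns
clean_reads <- data.frame(barcode_seq = names(read_count),
                          counts = as.vector(read_count))
clean_umis  <- data.frame(barcode_seq = names(umi_count),
                          counts = as.vector(umi_count))
```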

Examples


#######
# Create a BarcodeObj from a fastq file
fq_file <- system.file("extdata", "simple.fq", package="CellBarcode")
library(ShortRead)
bc_extract(fq_file, pattern = "AAAAA(.*)CCCCC")
#> Bonjour le monde, This is a BarcodeObj.
#> ----------
#> It contains: 
#> ----------
#> @metadata: 2 field(s) available:
#> raw_read_count  barcode_read_count
#> ----------
#> @messyBc: 1 sample(s) for raw barcodes:
#>     In sample $simple.fq there are: 1 Tags 
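In the bc_extract call above, the capture group (.*) in the pattern marks the barcode between the constant flanks AAAAA and CCCCC. The base R sketch below shows that capture-group behaviour on hypothetical reads using regexec() and regmatches(). It assumes nothing about how bc_extract matches reads internally.

```r
# Hypothetical reads; the pattern is the one passed to bc_extract() above
reads   <- c("GGAAAAATTGACCCCCGG", "CAAAAAGGTTCCCCCA", "TTTTTTTTTT")
pattern <- "AAAAA(.*)CCCCC"

# For each read: full match followed by the capture group, or
# character(0) when the read does not match
m <- regmatches(reads, regexec(pattern, reads))

# Keep the captured barcode; NA for reads without a match
barcodes <- vapply(m, function(x) if (length(x) == 2) x[2] else NA_character_,
                   character(1))
barcodes
```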

#######
# Data manipulation on a BarcodeObj object
data(bc_obj)

bc_obj
#> Bonjour le monde, This is a BarcodeObj.
#> ----------
#> It contains: 
#> ----------
#> @metadata: 3 field(s) available:
#> raw_read_count  barcode_read_count  depth_cutoff
#> ----------
#> @messyBc: 2 sample(s) for raw barcodes:
#>     In sample $test1 there are: 10 Tags
#>     In sample $test2 there are: 9 Tags
#> ----------
#> @cleanBc: 2 samples for cleaned barcodes
#>     In sample $test1 there are: 4 barcodes
#>     In sample $test2 there are: 5 barcodes 

# Select barcodes
bc_subset(bc_obj, barcode = c("AACCTT", "AACCTT"))
#> Bonjour le monde, This is a BarcodeObj.
#> ----------
#> It contains: 
#> ----------
#> @metadata: 3 field(s) available:
#> raw_read_count  barcode_read_count  depth_cutoff
#> ----------
#> @messyBc: 2 sample(s) for raw barcodes:
#>     In sample $test1 there are: 0 Tags
#>     In sample $test2 there are: 0 Tags
#> ----------
#> @cleanBc: 2 samples for cleaned barcodes
#>     In sample $test1 there are: 0 barcodes
#>     In sample $test2 there are: 0 barcodes 
bc_obj[c("AGAG", "AAAG"), ]
#> Bonjour le monde, This is a BarcodeObj.
#> ----------
#> It contains: 
#> ----------
#> @metadata: 3 field(s) available:
#> raw_read_count  barcode_read_count  depth_cutoff
#> ----------
#> @messyBc: 2 sample(s) for raw barcodes:
#>     In sample $test1 there are: 6 Tags
#>     In sample $test2 there are: 5 Tags
#> ----------
#> @cleanBc: 2 samples for cleaned barcodes
#>     In sample $test1 there are: 2 barcodes
#>     In sample $test2 there are: 2 barcodes 

# Select samples by metadata
bc_meta(bc_obj)$phenotype <- c("l", "b")
bc_meta(bc_obj)
#>       raw_read_count barcode_read_count depth_cutoff phenotype
#> test1            184                178            5         l
#> test2            124                118            5         b
bc_subset(bc_obj, sample = phenotype == "l")
#> Bonjour le monde, This is a BarcodeObj.
#> ----------
#> It contains: 
#> ----------
#> @metadata: 4 field(s) available:
#> raw_read_count  barcode_read_count  depth_cutoff  phenotype
#> ----------
#> @messyBc: 1 sample(s) for raw barcodes:
#>     In sample $test1 there are: 10 Tags
#> ----------
#> @cleanBc: 1 samples for cleaned barcodes
#>     In sample $test1 there are: 4 barcodes 
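Selecting samples with an expression such as phenotype == "l" means evaluating the condition against the metadata columns. The matching sample names are then kept in every per-sample list. The base R sketch below shows that idea with invented per-sample tables. It is not the bc_subset implementation.

```r
# Hypothetical metadata (one row per sample) and cleanBc-like tables
meta <- data.frame(
  raw_read_count = c(184L, 124L),
  phenotype      = c("l", "b"),
  row.names      = c("test1", "test2")
)
clean <- list(
  test1 = data.frame(barcode_seq = c("AGAG", "AAAG"), counts = c(20L, 8L)),
  test2 = data.frame(barcode_seq = "AGAG", counts = 15L)
)

# Evaluate the condition using the metadata columns as variables
keep <- rownames(meta)[with(meta, phenotype == "l")]

# Subset the metadata rows and every per-sample list by the same names
meta_sub  <- meta[keep, , drop = FALSE]
clean_sub <- clean[keep]
```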

# Select samples by sample name
bc_obj[, "test1"]
#> Bonjour le monde, This is a BarcodeObj.
#> ----------
#> It contains: 
#> ----------
#> @metadata: 4 field(s) available:
#> raw_read_count  barcode_read_count  depth_cutoff  phenotype
#> ----------
#> @messyBc: 1 sample(s) for raw barcodes:
#>     In sample $test1 there are: 10 Tags
#> ----------
#> @cleanBc: 1 samples for cleaned barcodes
#>     In sample $test1 there are: 4 barcodes 
bc_obj[, c("test1", "test2")]
#> Bonjour le monde, This is a BarcodeObj.
#> ----------
#> It contains: 
#> ----------
#> @metadata: 4 field(s) available:
#> raw_read_count  barcode_read_count  depth_cutoff  phenotype
#> ----------
#> @messyBc: 2 sample(s) for raw barcodes:
#>     In sample $test1 there are: 10 Tags
#>     In sample $test2 there are: 9 Tags
#> ----------
#> @cleanBc: 2 samples for cleaned barcodes
#>     In sample $test1 there are: 4 barcodes
#>     In sample $test2 there are: 5 barcodes 
bc_subset(bc_obj, sample = "test1", barcode = c("AACCTT", "AACCTT"))
#> Bonjour le monde, This is a BarcodeObj.
#> ----------
#> It contains: 
#> ----------
#> @metadata: 4 field(s) available:
#> raw_read_count  barcode_read_count  depth_cutoff  phenotype
#> ----------
#> @messyBc: 1 sample(s) for raw barcodes:
#>     In sample $test1 there are: 0 Tags
#> ----------
#> @cleanBc: 1 samples for cleaned barcodes
#>     In sample $test1 there are: 0 barcodes 

# Restrict both samples to a barcode whitelist
bc_subset(
    bc_obj,
    sample = c("test1", "test2"),
    barcode = c("AACCTT"))
#> Bonjour le monde, This is a BarcodeObj.
#> ----------
#> It contains: 
#> ----------
#> @metadata: 4 field(s) available:
#> raw_read_count  barcode_read_count  depth_cutoff  phenotype
#> ----------
#> @messyBc: 2 sample(s) for raw barcodes:
#>     In sample $test1 there are: 0 Tags
#>     In sample $test2 there are: 0 Tags
#> ----------
#> @cleanBc: 2 samples for cleaned barcodes
#>     In sample $test1 there are: 0 barcodes
#>     In sample $test2 there are: 0 barcodes 

# Join two samples with no overlapping barcodes
bc_obj["AGAG", "test1"] + bc_obj["AAAG", "test2"]
#> Bonjour le monde, This is a BarcodeObj.
#> ----------
#> It contains: 
#> ----------
#> @metadata: 8 field(s) available:
#> raw_read_count.bc_obj["AGAG", "test1"]  barcode_read_count.bc_obj["AGAG", "test1"]  depth_cutoff.bc_obj["AGAG", "test1"]  phenotype.bc_obj["AGAG", "test1"]  raw_read_count.bc_obj["AAAG", "test2"]  barcode_read_count.bc_obj["AAAG", "test2"]  depth_cutoff.bc_obj["AAAG", "test2"]  phenotype.bc_obj["AAAG", "test2"]
#> ----------
#> @messyBc: 2 sample(s) for raw barcodes:
#>     In sample $test1 there are: 3 Tags
#>     In sample $test2 there are: 2 Tags
#> ----------
#> @cleanBc: 2 samples for cleaned barcodes
#>     In sample $test1 there are: 1 barcodes
#>     In sample $test2 there are: 1 barcodes 

# Join two samples with overlapping barcodes
bc_obj_join <- bc_obj["AGAG", "test1"] + bc_obj["AGAG", "test2"]
bc_obj_join
#> Bonjour le monde, This is a BarcodeObj.
#> ----------
#> It contains: 
#> ----------
#> @metadata: 8 field(s) available:
#> raw_read_count.bc_obj["AGAG", "test1"]  barcode_read_count.bc_obj["AGAG", "test1"]  depth_cutoff.bc_obj["AGAG", "test1"]  phenotype.bc_obj["AGAG", "test1"]  raw_read_count.bc_obj["AGAG", "test2"]  barcode_read_count.bc_obj["AGAG", "test2"]  depth_cutoff.bc_obj["AGAG", "test2"]  phenotype.bc_obj["AGAG", "test2"]
#> ----------
#> @messyBc: 2 sample(s) for raw barcodes:
#>     In sample $test1 there are: 3 Tags
#>     In sample $test2 there are: 3 Tags
#> ----------
#> @cleanBc: 2 samples for cleaned barcodes
#>     In sample $test1 there are: 1 barcodes
#>     In sample $test2 there are: 1 barcodes 
# The same barcode will be merged after applying bc_cure_depth()
bc_cure_depth(bc_obj_join)
#> ------------
#> bc_cure_depth: isUpdate is TRUE, update the cleanBc.
#> ------------
#> Bonjour le monde, This is a BarcodeObj.
#> ----------
#> It contains: 
#> ----------
#> @metadata: 9 field(s) available:
#> raw_read_count.bc_obj["AGAG", "test1"]  barcode_read_count.bc_obj["AGAG", "test1"]  depth_cutoff.bc_obj["AGAG", "test1"]  phenotype.bc_obj["AGAG", "test1"]  raw_read_count.bc_obj["AGAG", "test2"]  barcode_read_count.bc_obj["AGAG", "test2"]  depth_cutoff.bc_obj["AGAG", "test2"]  phenotype.bc_obj["AGAG", "test2"]  depth_cutoff
#> ----------
#> @messyBc: 2 sample(s) for raw barcodes:
#>     In sample $test1 there are: 3 Tags
#>     In sample $test2 there are: 3 Tags
#> ----------
#> @cleanBc: 2 samples for cleaned barcodes
#>     In sample $test1 there are: 1 barcodes
#>     In sample $test2 there are: 1 barcodes 

# Remove barcodes
bc_obj
#> Bonjour le monde, This is a BarcodeObj.
#> ----------
#> It contains: 
#> ----------
#> @metadata: 4 field(s) available:
#> raw_read_count  barcode_read_count  depth_cutoff  phenotype
#> ----------
#> @messyBc: 2 sample(s) for raw barcodes:
#>     In sample $test1 there are: 10 Tags
#>     In sample $test2 there are: 9 Tags
#> ----------
#> @cleanBc: 2 samples for cleaned barcodes
#>     In sample $test1 there are: 4 barcodes
#>     In sample $test2 there are: 5 barcodes 
bc_obj - "AAAG"
#> Bonjour le monde, This is a BarcodeObj.
#> ----------
#> It contains: 
#> ----------
#> @metadata: 4 field(s) available:
#> raw_read_count  barcode_read_count  depth_cutoff  phenotype
#> ----------
#> @messyBc: 2 sample(s) for raw barcodes:
#>     In sample $test1 there are: 7 Tags
#>     In sample $test2 there are: 7 Tags
#> ----------
#> @cleanBc: 2 samples for cleaned barcodes
#>     In sample $test1 there are: 3 barcodes
#>     In sample $test2 there are: 4 barcodes 

# Select barcodes in a whitelist
bc_obj
#> Bonjour le monde, This is a BarcodeObj.
#> ----------
#> It contains: 
#> ----------
#> @metadata: 4 field(s) available:
#> raw_read_count  barcode_read_count  depth_cutoff  phenotype
#> ----------
#> @messyBc: 2 sample(s) for raw barcodes:
#>     In sample $test1 there are: 10 Tags
#>     In sample $test2 there are: 9 Tags
#> ----------
#> @cleanBc: 2 samples for cleaned barcodes
#>     In sample $test1 there are: 4 barcodes
#>     In sample $test2 there are: 5 barcodes 
bc_obj * "AAAG"
#> Bonjour le monde, This is a BarcodeObj.
#> ----------
#> It contains: 
#> ----------
#> @metadata: 4 field(s) available:
#> raw_read_count  barcode_read_count  depth_cutoff  phenotype
#> ----------
#> @messyBc: 2 sample(s) for raw barcodes:
#>     In sample $test1 there are: 3 Tags
#>     In sample $test2 there are: 2 Tags
#> ----------
#> @cleanBc: 2 samples for cleaned barcodes
#>     In sample $test1 there are: 1 barcodes
#>     In sample $test2 there are: 1 barcodes 
###
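The "-" and "*" operators in the examples act as a blacklist and a whitelist, respectively. The sketch below reproduces that filtering on a list of hypothetical cleanBc-like tables. The helper names remove_barcodes() and keep_barcodes() are invented for illustration. Judging from the printed Tag counts, the real operators also filter messyBc, which this sketch omits.

```r
# Hypothetical cleanBc-like tables, one per sample (invented values)
clean <- list(
  test1 = data.frame(barcode_seq = c("AGAG", "AAAG", "TTCC"),
                     counts = c(20L, 8L, 3L)),
  test2 = data.frame(barcode_seq = c("AAAG", "GGTT"),
                     counts = c(5L, 9L))
)

# Like `bc_obj - x`: drop the listed barcodes from every sample (blacklist)
remove_barcodes <- function(tabs, bl) {
  lapply(tabs, function(d) d[!d$barcode_seq %in% bl, , drop = FALSE])
}

# Like `bc_obj * x`: keep only the listed barcodes in every sample (whitelist)
keep_barcodes <- function(tabs, wl) {
  lapply(tabs, function(d) d[d$barcode_seq %in% wl, , drop = FALSE])
}

minus <- remove_barcodes(clean, "AAAG")
times <- keep_barcodes(clean, "AAAG")
sapply(minus, nrow)  # test1: 2 rows, test2: 1 row
sapply(times, nrow)  # test1: 1 row,  test2: 1 row
```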