bc_extract_sc_sam extracts cellular barcode, UMI, and lineage barcode sequences from a 10X Genomics scRNA-seq SAM file (or another SAM file with a similar structure). This function cannot process a BAM file directly, so users need to convert the BAM file to a SAM file first (see Examples). bc_extract_sc_bam takes a BAM file and performs this conversion internally.

bc_extract_sc_sam(sam, pattern, cell_barcode_tag = "CR", umi_tag = "UR")

bc_extract_sc_bam(bam, pattern, cell_barcode_tag = "CR", umi_tag = "UR")

Arguments

sam

A string, the path to the SAM file containing the sequences to process (for example, the unmapped reads).

pattern

A string, the regular expression used to match the barcode sequence. The barcode sequence must be in the first capture group. See the documentation of bc_extract and the Examples for more information.
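To illustrate what "first capture group" means, the base R sketch below applies the example pattern to a made-up read and pulls out the text matched inside the parentheses. It uses only regexec() and regmatches(); it is not necessarily how bc_extract_sc_sam works internally, and the read sequence is hypothetical.

```r
# Hypothetical read: extra bases, 5' flank, 7-nt barcode, 3' flank, extra bases
read <- "TTAGATCAGACGTACGTGTGGTAAA"
pattern <- "AGATCAG(.*)TGTGGTA"

# regexec() records the positions of the full match and of each capture
# group; regmatches() turns those positions into strings
m <- regmatches(read, regexec(pattern, read))[[1]]
full_match <- m[1]  # whole matched region, including both flanks
barcode <- m[2]     # first capture group: the barcode itself
barcode
#> [1] "ACGTACG"
```

Because `.*` is greedy, a read containing the 3' flank more than once would be captured up to its last occurrence; when the barcode length is known, a bounded group such as `([ACGT]{7})` avoids that.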

cell_barcode_tag

A string, the tag of the cellular barcode field in the SAM file. The default is "CR".

umi_tag

A string, the tag of the UMI field in the SAM file. The default is "UR".
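In a SAM record, optional fields follow the 11 mandatory columns and have the form TAG:TYPE:VALUE, so with the defaults the cell barcode appears as `CR:Z:...` and the UMI as `UR:Z:...`. The base R sketch below reads a tag from one hypothetical record to show what cell_barcode_tag and umi_tag refer to. The record and the helper get_sam_tag() are illustrative only and are not part of CellBarcode.

```r
# Build one hypothetical SAM record: 11 mandatory columns, then tags
read_seq <- "TTAGATCAGACGTACGTGTGGTAAA"
sam_line <- paste(
  "read1", "4", "*", "0", "0", "*", "*", "0", "0",
  read_seq, strrep("I", nchar(read_seq)),
  "CR:Z:AAACGAAAGTTCATGC", "UR:Z:ACGTACGTAC",
  sep = "\t"
)

# Hypothetical helper: return the value of `tag` from one SAM line,
# or NA if the record does not carry that tag
get_sam_tag <- function(line, tag) {
  fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
  optional <- fields[-seq_len(11)]  # drop the 11 mandatory columns
  hit <- optional[startsWith(optional, paste0(tag, ":"))]
  if (length(hit) == 0) return(NA_character_)
  # Strip the "TAG:TYPE:" prefix, keeping only the value
  sub("^[A-Za-z][A-Za-z0-9]:[A-Za-z]:", "", hit[1])
}

get_sam_tag(sam_line, "CR")
#> [1] "AAACGAAAGTTCATGC"
get_sam_tag(sam_line, "UR")
#> [1] "ACGTACGTAC"
```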

bam

A string, the path to the BAM file. It is converted to a SAM file before extraction.

Value

A BarcodeObj object with each cell as a sample.

Details

Although `bc_extract_sc_bam` can process a BAM file directly, it has not been fully optimized yet. It is currently much more efficient to convert the BAM file to a SAM file with `samtools` and use `bc_extract_sc_sam`.

Moreover, if the barcode sequence does not map to the reference genome, users should extract only the unmapped reads with `samtools`, save them in SAM format, and use that file as the input. Because the input then contains only the unmapped reads, this can save a lot of time. The unmapped reads can be extracted with:


samtools view -f 4 input.bam > output.sam 
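The `-f 4` option keeps only records whose FLAG field (column 2) has bit 0x4, "segment unmapped", set. The base R sketch below mirrors that filter on SAM text lines to make the bitwise test explicit. The helper names and records are hypothetical, and for real data the samtools command above is the practical choice.

```r
# TRUE when the 0x4 ("segment unmapped") bit is set in a SAM FLAG
is_unmapped <- function(flag) bitwAnd(as.integer(flag), 4L) != 0L

# Keep alignment lines whose FLAG (column 2) has bit 0x4 set,
# dropping "@" header lines as `samtools view` does without -h
filter_unmapped <- function(lines) {
  body <- lines[!startsWith(lines, "@")]
  flags <- vapply(strsplit(body, "\t", fixed = TRUE), `[`, character(1), 2)
  body[is_unmapped(flags)]
}

is_unmapped(c(0, 4, 16, 77, 141))
#> [1] FALSE  TRUE FALSE  TRUE  TRUE

sam_lines <- c(
  "@HD\tVN:1.6",
  "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
  "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTAGATCAGA\tIIIIIIIIII"
)
filter_unmapped(sam_lines)  # keeps only the "r2" record
```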

Examples

## NOT run
# When the barcode sequence does not map to the reference
# genome, it is much more efficient to use only the unmapped
# reads as the input.

## Get un-mapped reads
# samtools view -f 4 input.bam > scRNASeq_10X.sam 

sam_file <- system.file("extdata", "scRNASeq_10X.sam", package = "CellBarcode")

bc_extract_sc_sam(
  sam = sam_file,
  pattern = "AGATCAG(.*)TGTGGTA",
  cell_barcode_tag = "CR",
  umi_tag = "UR"
)
#> Bonjour le monde, This is a BarcodeObj.
#> ----------
#> It contains: 
#> ----------
#> @metadata: 2 field(s) available:
#> raw_read_count  barcode_read_count
#> ----------
#> @messyBc: 49 sample(s) for raw barcodes:
#>     In sample $AAACGAAAGTTCATGC there are: 1 Tags
#>     In sample $AAACGCTAGCTGACCC there are: 1 Tags
#>     In sample $AAAGGATAGTGTGTTC there are: 1 Tags
#>     In sample $AAAGGGCAGCCTGAGA there are: 1 Tags
#>     In sample $AAAGGTAAGGCCCACT there are: 1 Tags
#>     In sample $AAAGTCCCACTGCGAC there are: 1 Tags
#>     In sample $AAAGTGATCTGACGCG there are: 1 Tags
#>     In sample $AACACACCAATCCAGT there are: 1 Tags
#>     In sample $AACACACTCACCGACG there are: 1 Tags
#>     In sample $AACCCAAGGTAATTTA there are: 1 Tags
#>     In sample $AACCTTTGTCATTCCC there are: 1 Tags
#>     In sample $AACGAAATCTGGGTCG there are: 1 Tags
#>     In sample $AACTTCTGTGTTCCTC there are: 1 Tags
#>     In sample $AAGACAAAGCTAAATG there are: 1 Tags
#>     In sample $AAGACAAAGGCGCTTC there are: 1 Tags
#>     In sample $AAGCAGTGGTATCAAC there are: 8 Tags
#>     In sample $AAGGAGCTGACAGGTG there are: 1 Tags
#>     In sample $AAGTCGTAGCACGGAT there are: 1 Tags
#>     In sample $AAGTGAAAGTCCCGGT there are: 1 Tags
#>     In sample $AAGTTCGAGAGGTATT there are: 1 Tags
#>     In sample $AATAGAGGTTCTAACG there are: 1 Tags
#>     In sample $AATCGACTCATCGCTC there are: 1 Tags
#>     In sample $ACAAAGAGTAGGTCAG there are: 1 Tags
#>     In sample $ACAACCATCACGGAGA there are: 1 Tags
#>     In sample $ACATGGGGTAAAAGGA there are: 1 Tags
#>     In sample $AGCAGTGGTATCAACG there are: 1 Tags
#>     In sample $AGGCATTAAAGCAGCG there are: 1 Tags
#>     In sample $AGGCCTCACATTCTTC there are: 1 Tags
#>     In sample $AGTCTTTCGTCAAACA there are: 1 Tags
#>     In sample $CAAATTTTGTAATCCA there are: 1 Tags
#>     In sample $CACAAATTTTGTAATC there are: 1 Tags
#>     In sample $CACAACTCCTCATAAA there are: 1 Tags
#>     In sample $CAGTGGTATCAACGCA there are: 1 Tags
#>     In sample $CCTTGTGAGTGTTACC there are: 1 Tags
#>     In sample $CTACGGGAAGCAATAG there are: 1 Tags
#>     In sample $GATACAAAGGCATTAA there are: 1 Tags
#>     In sample $GCAGTGGTATCAACGC there are: 2 Tags
#>     In sample $GGAAGCAATAGCATGA there are: 1 Tags
#>     In sample $GTGGTATCAACGCAGA there are: 2 Tags
#>     In sample $GTTAAGAATACCAGTC there are: 1 Tags
#>     In sample $TAAGCCAAAAGAACAA there are: 1 Tags
#>     In sample $TAAGCCATAAACATAT there are: 1 Tags
#>     In sample $TGAAAGTGACAACTGA there are: 1 Tags
#>     In sample $TTCACCGATTTTGTAA there are: 2 Tags
#>     In sample $TTCCAAATTTTGTAAT there are: 1 Tags
#>     In sample $TTCCTCTCAGAATTGG there are: 1 Tags
#>     In sample $TTCGCTGATTTTGTAA there are: 1 Tags
#>     In sample $TTCTTCACAGAATTGG there are: 1 Tags
#>     In sample $TTGGGCGTCTTTGGGC there are: 1 Tags 

## Read bam file directly
bam_file <- system.file("extdata", "scRNASeq_10X.bam", package = "CellBarcode")
bc_extract_sc_bam(
   bam = bam_file,
   pattern = "AGATCAG(.*)TGTGGTA",
   cell_barcode_tag = "CR",
   umi_tag = "UR"
)
#> Start to convert bam file to sam file.
#> sam file path:  /var/folders/46/vh_b6qzs5kzgdhvx1kcn7hyh0000gn/T//RtmpdbkBWL/output.sam 
#> Bonjour le monde, This is a BarcodeObj.
#> ----------
#> It contains: 
#> ----------
#> @metadata: 2 field(s) available:
#> raw_read_count  barcode_read_count
#> ----------
#> @messyBc: 49 sample(s) for raw barcodes:
#>     In sample $AAACGAAAGTTCATGC there are: 1 Tags
#>     In sample $AAACGCTAGCTGACCC there are: 1 Tags
#>     In sample $AAAGGATAGTGTGTTC there are: 1 Tags
#>     In sample $AAAGGGCAGCCTGAGA there are: 1 Tags
#>     In sample $AAAGGTAAGGCCCACT there are: 1 Tags
#>     In sample $AAAGTCCCACTGCGAC there are: 1 Tags
#>     In sample $AAAGTGATCTGACGCG there are: 1 Tags
#>     In sample $AACACACCAATCCAGT there are: 1 Tags
#>     In sample $AACACACTCACCGACG there are: 1 Tags
#>     In sample $AACCCAAGGTAATTTA there are: 1 Tags
#>     In sample $AACCTTTGTCATTCCC there are: 1 Tags
#>     In sample $AACGAAATCTGGGTCG there are: 1 Tags
#>     In sample $AACTTCTGTGTTCCTC there are: 1 Tags
#>     In sample $AAGACAAAGCTAAATG there are: 1 Tags
#>     In sample $AAGACAAAGGCGCTTC there are: 1 Tags
#>     In sample $AAGCAGTGGTATCAAC there are: 8 Tags
#>     In sample $AAGGAGCTGACAGGTG there are: 1 Tags
#>     In sample $AAGTCGTAGCACGGAT there are: 1 Tags
#>     In sample $AAGTGAAAGTCCCGGT there are: 1 Tags
#>     In sample $AAGTTCGAGAGGTATT there are: 1 Tags
#>     In sample $AATAGAGGTTCTAACG there are: 1 Tags
#>     In sample $AATCGACTCATCGCTC there are: 1 Tags
#>     In sample $ACAAAGAGTAGGTCAG there are: 1 Tags
#>     In sample $ACAACCATCACGGAGA there are: 1 Tags
#>     In sample $ACATGGGGTAAAAGGA there are: 1 Tags
#>     In sample $AGCAGTGGTATCAACG there are: 1 Tags
#>     In sample $AGGCATTAAAGCAGCG there are: 1 Tags
#>     In sample $AGGCCTCACATTCTTC there are: 1 Tags
#>     In sample $AGTCTTTCGTCAAACA there are: 1 Tags
#>     In sample $CAAATTTTGTAATCCA there are: 1 Tags
#>     In sample $CACAAATTTTGTAATC there are: 1 Tags
#>     In sample $CACAACTCCTCATAAA there are: 1 Tags
#>     In sample $CAGTGGTATCAACGCA there are: 1 Tags
#>     In sample $CCTTGTGAGTGTTACC there are: 1 Tags
#>     In sample $CTACGGGAAGCAATAG there are: 1 Tags
#>     In sample $GATACAAAGGCATTAA there are: 1 Tags
#>     In sample $GCAGTGGTATCAACGC there are: 2 Tags
#>     In sample $GGAAGCAATAGCATGA there are: 1 Tags
#>     In sample $GTGGTATCAACGCAGA there are: 2 Tags
#>     In sample $GTTAAGAATACCAGTC there are: 1 Tags
#>     In sample $TAAGCCAAAAGAACAA there are: 1 Tags
#>     In sample $TAAGCCATAAACATAT there are: 1 Tags
#>     In sample $TGAAAGTGACAACTGA there are: 1 Tags
#>     In sample $TTCACCGATTTTGTAA there are: 2 Tags
#>     In sample $TTCCAAATTTTGTAAT there are: 1 Tags
#>     In sample $TTCCTCTCAGAATTGG there are: 1 Tags
#>     In sample $TTCGCTGATTTTGTAA there are: 1 Tags
#>     In sample $TTCTTCACAGAATTGG there are: 1 Tags
#>     In sample $TTGGGCGTCTTTGGGC there are: 1 Tags
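The Details section recommends running samtools first and passing the resulting SAM file to bc_extract_sc_sam. A minimal sketch of doing that step from within R follows. It assumes samtools is installed and on the PATH, and the helper bam_to_unmapped_sam() and the file name "input.bam" are hypothetical.

```r
# Hypothetical helper: write the unmapped reads of `bam` to a SAM file
# by calling `samtools view -f 4`, and return the SAM file path.
# Assumes samtools is installed and on the PATH.
bam_to_unmapped_sam <- function(bam, sam = tempfile(fileext = ".sam")) {
  samtools <- Sys.which("samtools")
  if (!nzchar(samtools)) stop("samtools was not found on the PATH")
  # With stdout = <file>, system2() returns the exit status of the command
  status <- system2(samtools, c("view", "-f", "4", shQuote(bam)), stdout = sam)
  if (status != 0) stop("samtools view exited with status ", status)
  sam
}

## NOT run
# sam_file <- bam_to_unmapped_sam("input.bam")
# bc_extract_sc_sam(
#   sam = sam_file,
#   pattern = "AGATCAG(.*)TGTGGTA",
#   cell_barcode_tag = "CR",
#   umi_tag = "UR"
# )
```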