bc_extract_sc_fastq extracts cellular barcode, UMI, and lineage barcode sequences from a 10X Genomics scRNASeq fastq file. It can process the barcodes in scRNASeq fastq files or in target-amplified fastq files directly.

bc_extract_sc_fastq(
  fq1,
  fq2 = NULL,
  patternCellBarcode = NULL,
  patternUMI = NULL,
  patternBarcode = NULL
)

Arguments

fq1

A string, the fastq file that contains the cellular barcode and lineage barcode.

fq2

A string, optional. The second fastq file, which contains the cellular barcode and lineage barcode. The two fastq files will be concatenated for barcode extraction.

patternCellBarcode

A string, defines the regular expression that matches the single-cell cellular barcode sequence. The expected sequence should be in the first capture group. See the documentation of bc_extract and the example for more information.

patternUMI

A string, defines the regular expression that matches the UMI sequence. The expected sequence should be in the first capture group. See the documentation of bc_extract and the example for more information.

patternBarcode

A string, defines the regular expression that matches the lineage barcode. The expected sequence should be in the first capture group. See the documentation of bc_extract and the example for more information.
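To illustrate what "the first capture group" means for these pattern arguments, here is a minimal sketch using base R only. The toy read and its barcode "ACGTACGT" are made up for illustration, and base R's regex engine may differ in detail from the one the package uses internally, so treat this as a way to sanity-check a pattern rather than as the package's own matching code.

```r
# Hypothetical read: 4 bp padding, left flank, barcode, right flank, 4 bp padding.
read <- "TTTTCGAAGTATCAAGACGTACGTCCGTAGCAAGTTTT"
pattern <- "CGAAGTATCAAG(.+)CCGTAGCAAG"

# regexec() reports the full match first, then each parenthesised group.
m <- regmatches(read, regexec(pattern, read))[[1]]
full_match <- m[1]  # matched region, flanking sequences included
captured <- m[2]    # text inside the first pair of parentheses

print(full_match)
print(captured)
```

Only the text inside the first pair of parentheses is taken as the expected sequence; the flanking sequences serve to locate it.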

Value

A BarcodeObj object with each cell as a sample.

Details

Defining a regular expression that matches the barcode sequence can take some effort. An example is provided here for extracting barcodes from 10X Genomics scRNASeq results. The function can also be used to extract barcodes from other systems.

The function can process the barcodes in scRNASeq fastq files or in target-amplified fastq files. For a 10X scRNASeq fastq file, the cellular barcode is in the first 16 bp of read 1, the UMI is in the next 12 bp, and the lineage barcode is in read 2.
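The read 1 layout described above can be turned into positional patterns. The following sketch, again in base R and on an invented 28 bp read, shows one way to check that a pattern picks out the first 16 bp and the next 12 bp. The helper first_capture is hypothetical and not part of the package.

```r
# Hypothetical read 1: 16 bp cellular barcode followed by 12 bp UMI.
cell_bc <- "AAACCCAAGAAACACT"
umi <- "GGGTTTACGTAC"
read1 <- paste0(cell_bc, umi)

# Hypothetical helper: return the first capture group of the leftmost match.
first_capture <- function(pattern, x) {
  regmatches(x, regexec(pattern, x))[[1]][2]
}

# The leftmost match of "(.{16})" starts at position 1, so it captures bases 1-16.
got_cell_bc <- first_capture("(.{16})", read1)
# ".{16}" skips the cellular barcode; "(.{12})" then captures bases 17-28.
got_umi <- first_capture(".{16}(.{12})", read1)

print(got_cell_bc)
print(got_umi)
```

Because the patterns are not anchored, they rely on the regex engine returning the leftmost match, which for these patterns starts at the beginning of the read.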

A typical call looks like this:


bc_extract_sc_fastq(
   fq1 = "read1.fastq.gz",
   fq2 = "read2.fastq.gz",
   patternCellBarcode = "(.{16})",
   patternUMI = ".{16}(.{12})",
   patternBarcode = "CGAAGTATCAAG(.+)CCGTAGCAAG"
)