R/function-single_cell_seq.R
bc_extract_sc_fastq.Rd
bc_extract_sc_fastq
extracts cellular barcode, UMI, and lineage barcode
sequences from 10X Genomics scRNASeq fastq files. It can process
the barcodes in a scRNASeq fastq file or in target-amplified fastq files directly.
bc_extract_sc_fastq(
  fq1,
  fq2 = NULL,
  patternCellBarcode = NULL,
  patternUMI = NULL,
  patternBarcode = NULL
)
fq1: A string, the name of the fastq file that contains the cellular barcode and lineage barcode.
fq2: An optional string, the name of a second fastq file that contains the cellular barcode and lineage barcode. The two fastq files are concatenated for barcode extraction.
patternCellBarcode: A string, the regular expression that matches
the single-cell cellular barcode sequence. The expected sequence should be in
the first capture group. See the documentation of
bc_extract
and the example for more information.
patternUMI: A string, the regular expression that matches the UMI
sequence. The expected sequence should be in the first capture group. See the
documentation of bc_extract
and the example for more
information.
patternBarcode: A string, the regular expression that matches the lineage barcode. The
expected sequence should be in the first capture group. See the documentation of
bc_extract
and the example for more information.
A BarcodeObj object with each cell as a sample.
Defining a regular expression that matches the barcode sequence takes some effort. The example here extracts barcodes from 10X Genomics scRNASeq results; the function can also be used to extract barcodes from other systems.
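One way to keep such a pattern manageable is to assemble it from the constant flanking sequences and check it against a read whose barcode is known. The following is a minimal base R sketch, not part of the package: `make_flank_pattern` and the toy read are hypothetical, and the flanks are the ones used in the usage example of this page.

```r
# Hypothetical helper (not part of the package): build a pattern whose
# first capture group is the sequence between two constant flanks.
make_flank_pattern <- function(left, right, inner = ".+") {
  paste0(left, "(", inner, ")", right)
}

pattern <- make_flank_pattern("CGAAGTATCAAG", "CCGTAGCAAG")

# Made-up read with a known barcode ("ACGTACGTAC") between the flanks.
read <- "TTTTCGAAGTATCAAGACGTACGTACCCGTAGCAAGGGGG"

# regexec() reports the full match followed by each capture group,
# so element 2 is the first capture group.
hit <- regmatches(read, regexec(pattern, read))[[1]]
barcode <- hit[2]
print(barcode)
```

Under standard regular-expression semantics `.+` is greedy, so a read that contains the right flank more than once would yield the longest possible span. When the barcode length is known, a fixed-length inner pattern (for example `inner = ".{10}"`) may be a safer choice.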
The function can process the barcodes in a scRNASeq fastq file or in target-amplified fastq files. In a 10X scRNASeq fastq file, the cellular barcode is the first 16 bp of read 1, the UMI is the next 12 bp, and the lineage barcode is in read 2.
A typical call looks like this:
bc_extract_sc_fastq(
  fq1 = "read1.fastq.gz",
  fq2 = "read2.fastq.gz",
  patternCellBarcode = "(.{16})",
  patternUMI = ".{16}(.{12})",
  patternBarcode = "CGAAGTATCAAG(.+)CCGTAGCAAG"
)
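To see what each of the three patterns above captures, they can be tried on a few toy reads with base R before running the real extraction. This is only an illustration of the capture groups: the reads are made up, `first_group` is a hypothetical helper, and the sketch does not reproduce how bc_extract_sc_fastq reads fastq files or builds the BarcodeObj.

```r
# Patterns from the call above.
pattern_cell <- "(.{16})"
pattern_umi  <- ".{16}(.{12})"
pattern_bc   <- "CGAAGTATCAAG(.+)CCGTAGCAAG"

# Hypothetical helper: return the first capture group of `pattern` for
# each element of `x` (NA when a read does not match).
first_group <- function(pattern, x) {
  hits <- regmatches(x, regexec(pattern, x))
  vapply(
    hits,
    function(h) if (length(h) >= 2) h[2] else NA_character_,
    character(1)
  )
}

# Made-up read 1 sequences: 16 bp cell barcode + 12 bp UMI + tail.
read1 <- c(
  paste0("AAACCCAAGAAACACT", "GGGTTTAAACCC", "TTTT"),
  paste0("AAACCCAAGAAACCAT", "CCCAAATTTGGG", "TTTT")
)
# Made-up read 2 sequences: lineage barcode between the two flanks.
read2 <- c(
  "GGCGAAGTATCAAGACGTACGTACCCGTAGCAAGTT",
  "GGCGAAGTATCAAGTTGGCCAATTCCGTAGCAAGTT"
)

extracted <- data.frame(
  cell    = first_group(pattern_cell, read1),
  umi     = first_group(pattern_umi, read1),
  barcode = first_group(pattern_bc, read2),
  stringsAsFactors = FALSE
)
print(extracted)
```

A read that lacks either flank gives `NA` in the `barcode` column, which makes it easy to spot a pattern that is too strict for the data.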