This function merges UMIs based on their Hamming distance. If two UMIs have a Hamming distance of at most 1, only the UMI with more reads is kept.
seq_correct(
seq,
count,
count_threshold,
dist_threshold,
depth_fold_threshold = 1,
dist_method = 1L,
insert_cost = 1L,
delete_cost = 1L,
replace_cost = 1L
)
seq: A string vector.
count: An integer vector of counts, in the same order and of the same length as seq.
count_threshold: An integer, the count threshold above which a barcode is considered a true barcode; a barcode with a count higher than this threshold is not removed.
dist_threshold: An integer, the distance threshold below which two barcodes are considered related.
depth_fold_threshold: A numeric, the fold-change threshold between the major barcode and a potential contaminant that needs to be removed.
dist_method: An integer; if 2, the Levenshtein distance is used, otherwise the Hamming distance is used.
insert_cost: An integer, the insertion cost when the Levenshtein distance is used.
delete_cost: An integer, the deletion cost when the Levenshtein distance is used.
replace_cost: An integer, the replacement cost when the Levenshtein distance is used.
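To give a feel for how insertion, deletion and replacement costs change a Levenshtein distance, the sketch below uses base R's adist() from the utils package. It is an illustration only: it does not call seq_correct, and it is an assumption that the function weighs edits in the same way.

```r
# Illustration with base R's adist(), which computes a generalized
# Levenshtein distance with per-edit costs. Not a call into seq_correct.
a <- "ACGT"
b <- "ACGA"  # differs from `a` by a single substitution

# Unit costs: one substitution is the cheapest edit path.
d_unit <- adist(a, b,
                costs = c(insertions = 1, deletions = 1, substitutions = 1))[1, 1]

# Expensive substitution: deleting one base and inserting another
# (total cost 2) becomes cheaper than one substitution (cost 3).
d_weighted <- adist(a, b,
                    costs = c(insertions = 1, deletions = 1, substitutions = 3))[1, 1]

print(c(unit = d_unit, weighted = d_weighted))
```

Unlike the Hamming distance, this distance is also defined for sequences of different lengths.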
A list with two data.frames. seq_freq_tab: table of barcodes and their corrected sequence read counts; link_tab: table recording the clustering process, with the removed barcode in the first column and the majority barcode in the second column.
This function returns the corrected UMI list.
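The merging idea described above can be sketched in plain R. This is a simplified illustration, not the package's implementation: the helper names hamming and merge_by_hamming are hypothetical, the column names are assumptions, only the Hamming distance is handled, and count_threshold and depth_fold_threshold are ignored.

```r
# Hamming distance between two strings of equal length.
hamming <- function(a, b) {
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Hypothetical helper (not part of the package): merge each low-count
# sequence into the most abundant sequence within `dist_threshold`.
merge_by_hamming <- function(seq, count, dist_threshold = 1L) {
  ord <- order(count)            # process the least abundant sequences first
  seq <- seq[ord]
  count <- count[ord]
  keep <- rep(TRUE, length(seq))
  removed <- character(0)
  majority <- character(0)
  for (i in seq_along(seq)) {
    # Candidates: sequences still kept that have a strictly higher count.
    cand <- which(keep & count > count[i])
    d <- vapply(seq[cand], hamming, integer(1), b = seq[i])
    cand <- cand[d <= dist_threshold]
    if (length(cand) > 0) {
      target <- cand[which.max(count[cand])]
      count[target] <- count[target] + count[i]  # move reads to the majority
      keep[i] <- FALSE
      removed <- c(removed, seq[i])
      majority <- c(majority, seq[target])
    }
  }
  list(
    seq_freq_tab = data.frame(barcode = seq[keep], count = count[keep],
                              stringsAsFactors = FALSE),
    link_tab = data.frame(removed = removed, majority = majority,
                          stringsAsFactors = FALSE)
  )
}

# Usage: "AAAT" and "CCCG" are one mismatch away from abundant sequences.
res <- merge_by_hamming(
  seq   = c("AAAA", "AAAT", "CCCC", "CCCG", "GGGG"),
  count = c(100, 3, 50, 2, 10)
)
print(res$seq_freq_tab)
print(res$link_tab)
```

In this toy input the two low-count sequences are folded into their neighbours, while "GGGG" is kept because no more abundant sequence lies within one mismatch of it.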