Sample demultiplexing, read mapping and UMI (Unique Molecular Identifier) counting are usually the first steps in a pipeline for analysing single-cell RNA-seq (scRNA-seq) data. 10x Chromium is currently the dominant scRNA-seq platform in the field. Although several tools have been developed for quantifying data generated on this platform, Cell Ranger, developed by 10x Genomics, is currently the most widely used.
Over the last decade, we have created a toolkit covering many aspects of bulk RNA-seq data analysis. This includes the Subread aligner for read mapping and featureCounts for read quantification. Because of its speed and accuracy, our Rsubread package has been widely adopted by researchers around the world.
We have extended the functionality of Rsubread with a new program, called cellCounts, for scRNA-seq read mapping and quantification. Considerable thought has gone into making this program highly efficient. cellCounts takes the raw base-calling output from the sequencer directly as input, avoiding a separate, time-consuming sample-demultiplexing step. It then maps reads using the highly efficient seed-and-vote paradigm, and quantifies them using the very fast read-matching algorithm from our featureCounts program. In our tests, cellCounts ran faster than Cell Ranger while achieving comparable or higher accuracy.
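To give a flavour of the seed-and-vote idea, here is a minimal toy sketch in Python. It is not the Subread or cellCounts implementation: the function names (build_index, seed_and_vote), the seed length k, the seed spacing and the vote threshold are all assumptions chosen for illustration. Short seeds taken from the read are looked up in a hash index of the reference, each hit votes for the read start position it implies, and the position with the most votes is reported.

```python
from collections import Counter, defaultdict


def build_index(reference, k):
    """Map every k-mer in the reference to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def seed_and_vote(read, index, k, step=2, min_votes=2):
    """Return (start, votes) for the best-supported location, or None.

    Toy version: seeds are taken every `step` bases, and each exact seed hit
    votes for the read start it implies (hit position minus seed offset).
    """
    votes = Counter()
    for offset in range(0, len(read) - k + 1, step):
        seed = read[offset:offset + k]
        for pos in index.get(seed, ()):
            votes[pos - offset] += 1
    if not votes:
        return None
    start, n_votes = votes.most_common(1)[0]
    return (start, n_votes) if n_votes >= min_votes else None


reference = "ACGTTGCAAGGCTTACCGATAGGCTA"
index = build_index(reference, k=4)

# The read is reference[8:20] with one mismatch (T -> G at read offset 5).
read = "AGGCTGACCGAT"
print(seed_and_vote(read, index, k=4))
```

Seeds that overlap the mismatch simply fail to vote, while the remaining seeds still agree on the same start position, which is why voting tolerates sequencing errors without an exhaustive alignment at every candidate site.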
In this talk, I will present the details of the cellCounts program. I will describe how cellCounts demultiplexes the raw samples, maps reads, and counts and filters UMIs. I will also present a comparison of the speed and accuracy of cellCounts and Cell Ranger on real and simulated data.
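As a rough illustration of what UMI counting means, the sketch below counts distinct UMIs per cell barcode and gene, so that reads sharing the same barcode, UMI and gene are counted once. It is a simplified assumption rather than the cellCounts method: the function name count_umis and the input layout are hypothetical, and it collapses exact UMI matches only, with no error correction or filtering.

```python
from collections import defaultdict


def count_umis(assigned_reads):
    """Count distinct UMIs for each (cell barcode, gene) pair.

    `assigned_reads` is an iterable of (cell_barcode, umi, gene) tuples,
    one per read that has already been mapped and assigned to a gene.
    """
    umis = defaultdict(set)
    for cell_barcode, umi, gene in assigned_reads:
        umis[(cell_barcode, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


reads = [
    ("AAAC", "TTG", "GeneA"),
    ("AAAC", "TTG", "GeneA"),  # PCR duplicate: same cell, UMI and gene
    ("AAAC", "CCA", "GeneA"),
    ("GGGT", "TTG", "GeneA"),  # same UMI but a different cell
    ("AAAC", "TTG", "GeneB"),  # same UMI but a different gene
]
print(count_umis(reads))
```

Collapsing on the full (cell barcode, UMI, gene) combination is what lets the count reflect molecules rather than reads.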