Single-cell RNA-seq workflow

All authors wrote and approved the final manuscript. This version of the workflow contains a number of improvements based on the referees' comments. We have re-compiled the workflow using the latest packages from Bioconductor release 3. We have added a reference to the Bioconductor workflow page, which provides user-friendly instructions for installation and execution of the workflow.

We have also moved cell cycle classification before gene filtering as this provides more precise cell cycle phase classifications. Some minor rewording and elaborations have also been performed in various parts of the article.

Single-cell RNA sequencing (scRNA-seq) provides biological resolution that cannot be matched by bulk RNA sequencing, at the cost of increased technical noise and data complexity.

The differences between scRNA-seq and bulk RNA-seq data mean that the analysis of the former cannot be performed by recycling bioinformatics pipelines for the latter. Rather, dedicated single-cell methods are required at various steps to exploit the cellular resolution while accounting for technical noise.

This article describes a computational workflow for low-level analyses of scRNA-seq data, based primarily on software packages from the open-source Bioconductor project. It covers basic steps including quality control, data exploration and normalization, as well as more complex procedures such as cell cycle phase assignment, identification of highly variable and correlated genes, clustering into subpopulations and marker gene detection.


Analyses were demonstrated on gene-level count data from several publicly available datasets involving haematopoietic stem cells, brain-derived cells, T-helper cells and mouse embryonic stem cells. This will provide a range of usage scenarios from which readers can construct their own analysis pipelines.

Single-cell RNA sequencing (scRNA-seq) is widely used to measure the genome-wide expression profile of individual cells. This can be done using microfluidics platforms like the Fluidigm C1 (Pollen et al.). The number of reads mapped to each gene is then used to quantify its expression in each cell. Alternatively, unique molecular identifiers (UMIs) can be used to directly measure the number of transcript molecules for each gene (Islam et al.).
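To make the difference between read counts and UMI counts concrete, the following minimal sketch computes both from a list of mapped reads. The record layout (cell barcode, gene, UMI), the function name `count_reads_and_umis` and the toy values are assumptions made for illustration. Real pipelines also correct sequencing errors in UMIs, which is omitted here.

```python
from collections import defaultdict


def count_reads_and_umis(reads):
    """Return (read_counts, umi_counts), each keyed by (cell, gene).

    `reads` is an iterable of (cell_barcode, gene, umi) tuples, one per
    mapped read. This record layout is hypothetical.
    """
    read_counts = defaultdict(int)
    umis_seen = defaultdict(set)
    for cell, gene, umi in reads:
        read_counts[(cell, gene)] += 1      # every mapped read counts once
        umis_seen[(cell, gene)].add(umi)    # identical UMIs collapse to one
    umi_counts = {key: len(umis) for key, umis in umis_seen.items()}
    return dict(read_counts), umi_counts


# Toy input: the second read is an amplification duplicate of the first.
reads = [
    ("cellA", "geneX", "AACG"),
    ("cellA", "geneX", "AACG"),
    ("cellA", "geneX", "TTGC"),
    ("cellB", "geneX", "AACG"),
]
read_counts, umi_counts = count_reads_and_umis(reads)
print(read_counts[("cellA", "geneX")], umi_counts[("cellA", "geneX")])
```

In this toy input, the read count for geneX in cellA is inflated by the duplicate, whereas the UMI count reflects the two distinct molecules that were observed.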

Count data are analyzed to detect highly variable genes (HVGs) that drive heterogeneity across cells in a population, to find correlations between genes and cellular phenotypes, or to identify new subpopulations via dimensionality reduction and clustering. This provides biological insights at a single-cell resolution that cannot be achieved with conventional bulk RNA sequencing of cell populations.

One technical reason that scRNA-seq analysis differs from bulk analysis is that scRNA-seq data are much noisier than bulk data (Brennecke et al.). Unreliable capture of transcripts increases the frequency of drop-out events where none of the transcripts for a gene are captured. Dedicated steps are required to deal with this noise during analysis, especially during quality control. In addition, scRNA-seq data can be used to study cell-to-cell heterogeneity. This is simply not possible with bulk data, meaning that custom methods are required to perform these analyses.

This article describes a computational workflow for basic analysis of scRNA-seq data, using software packages from the open-source Bioconductor project release 3. Starting from a count matrix, this workflow contains the steps required for quality control to remove problematic cells; normalization of cell-specific biases, with and without spike-ins; cell cycle phase classification from gene expression data; data exploration to identify putative subpopulations; and finally, HVG and marker gene identification to prioritize interesting genes.
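Two of the steps listed above, removal of problematic cells and normalization of cell-specific biases, can be sketched in a few lines. This is an illustrative stand-in, not the method implemented in the Bioconductor packages. The rule of flagging cells more than 3 median absolute deviations (MADs) below the median log-library size, the library-size scaling, the function names and the toy counts are all assumptions.

```python
import math
from statistics import median


def low_library_outliers(lib_sizes, nmads=3.0):
    """Flag cells whose log10 library size is > `nmads` MADs below the median."""
    logs = [math.log10(size) for size in lib_sizes]  # assumes all sizes > 0
    med = median(logs)
    # Scale the MAD by 1.4826 so that it is comparable to a standard
    # deviation under normality.
    mad = 1.4826 * median([abs(x - med) for x in logs])
    cutoff = med - nmads * mad
    return [x < cutoff for x in logs]


def library_size_factors(lib_sizes):
    """Size factors proportional to library size, scaled to have mean 1."""
    mean_size = sum(lib_sizes) / len(lib_sizes)
    return [size / mean_size for size in lib_sizes]


# Toy counts: one list of gene counts per cell (hypothetical values).
cells = {
    "cell1": [500, 300, 200],
    "cell2": [600, 400, 200],
    "cell3": [450, 250, 200],
    "cell4": [550, 350, 200],
    "cell5": [5, 3, 2],  # very small library, e.g. a failed capture
}
names = list(cells)
lib_sizes = [sum(cells[name]) for name in names]

# Quality control: drop cells flagged as low-library outliers.
flags = low_library_outliers(lib_sizes)
kept = [name for name, bad in zip(names, flags) if not bad]

# Normalization: divide each remaining cell's counts by its size factor.
factors = library_size_factors([sum(cells[name]) for name in kept])
normalized = {
    name: [count / factor for count in cells[name]]
    for name, factor in zip(kept, factors)
}
print(kept)
```

Working on the log scale keeps a few very large libraries from dominating the spread estimate. The size factors are computed after filtering so that the discarded cell does not influence the scaling.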

The application of different steps in the workflow will be demonstrated on several public scRNA-seq datasets involving haematopoietic stem cells, brain-derived cells, T-helper cells and mouse embryonic stem cells, generated with a range of experimental protocols and platforms (Buettner et al.).

The aim is to provide a variety of modular usage examples that can be applied to construct custom analysis pipelines. To introduce most of the concepts of scRNA-seq data analysis, we use a relatively simple dataset from a study of haematopoietic stem cells (HSCs) (Wilson et al.). Single mouse HSCs were isolated into microtiter plates and libraries were prepared for 96 cells using the Smart-seq2 protocol. High-throughput sequencing was performed and the expression of each gene was quantified by counting the total number of reads mapped to its exonic regions.

Similarly, the quantity of each spike-in transcript was measured by counting the number of reads mapped to the spike-in reference sequences. For simplicity, we forego a description of the read processing steps required to generate the count matrix. These steps have been described in some detail elsewhere (Chen et al.). The only additional consideration is that the spike-in information must be included in the pipeline. Typically, spike-in sequences can be included as additional FASTA files during genome index building prior to alignment, while genomic intervals for both spike-in transcripts and endogenous genes can be concatenated into a single GTF file prior to counting.

RNA-Seq is a highly sensitive and accurate tool for measuring expression across the transcriptome. It is providing researchers with visibility into previously undetected changes occurring in disease states, in response to therapeutics, under different environmental conditions, and across a broad range of other study designs.
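The spike-in handling described above for read processing (extra FASTA sequences at index building, one combined GTF for counting) amounts to a file concatenation, sketched below. All file names and sequence identifiers are hypothetical placeholders. Whether plain concatenation is sufficient depends on the aligner and counting tool in use, so this should be read as an illustration only.

```python
import tempfile
from pathlib import Path


def concatenate_files(inputs, output):
    """Concatenate text files in order, ensuring each one ends with a newline."""
    with open(output, "w") as out:
        for path in inputs:
            text = Path(path).read_text()
            out.write(text)
            if text and not text.endswith("\n"):
                # Prevent the next file's first record from being fused
                # onto the last line of this one.
                out.write("\n")


# Demonstration with tiny placeholder FASTA files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "genome.fa").write_text(">chr1\nACGT")      # no trailing newline
    (tmp / "spikes.fa").write_text(">spike1\nGGCC\n")
    combined = tmp / "genome_with_spikes.fa"
    concatenate_files([tmp / "genome.fa", tmp / "spikes.fa"], combined)
    combined_text = combined.read_text()

print(combined_text)
```

The same helper could be applied to the gene and spike-in annotation files to produce the single GTF used for counting.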

RNA-Seq allows researchers to detect both known and novel features in a single assay, enabling the detection of transcript isoforms, gene fusions, single nucleotide variants, and other features without the limitation of prior knowledge. RNA sequencing can have far-reaching effects on research and innovation, transforming our understanding of the world around us. RNA-Seq with next-generation sequencing (NGS) is increasingly the method of choice for researchers studying the transcriptome.

It offers numerous advantages over gene expression arrays.

A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor

Study gene expression and transcriptome changes with cancer RNA-Seq. Analyze host-pathogen interactions or bacterial transcriptome signatures with microbial RNA-Seq. Study drug response RNA biomarkers. Sensitively and accurately quantify gene expression, identify known and novel isoforms in the coding transcriptome, detect gene fusions, and measure allele-specific expression.

Analyze gene expression in a focused set of genes of interest. Targeted RNA-Seq can be achieved via either enrichment or amplicon-based approaches. Use deep RNA-Seq to examine the signals and behavior of a cell in the context of its surrounding environment. This method is advantageous for biologists studying processes such as differentiation, proliferation, and tumorigenesis. Deeply sequence ribosome-protected mRNA fragments to gain a complete view of the ribosomes active in a cell at a specific time point, and predict protein abundance.

Accurately measure gene and transcript abundance and detect both known and novel features in coding and multiple forms of noncoding RNA. Isolate and sequence small RNA species, such as microRNA, to understand the role of noncoding RNA in gene silencing and posttranscriptional regulation of gene expression. Achieve cost-effective RNA exome analysis using sequence-specific capture of the coding regions of the transcriptome.

Ideal for low-quality samples or limited starting material.

RNA sequencing provides deeper insights for complex research.

This article is included in the Bioconductor gateway.

Single-cell RNA sequencing (scRNA-seq) is a powerful and promising class of high-throughput assays that enable researchers to measure genome-wide transcription levels at the resolution of single cells. To properly account for features specific to scRNA-seq, such as zero inflation and high levels of technical noise, several novel statistical methods have been developed to tackle questions that include normalization, dimensionality reduction, clustering, the inference of cell lineages and pseudotimes, and the identification of differentially expressed (DE) genes.

While each individual method is useful on its own for addressing a specific question, there is an increasing need for workflows that integrate these tools to yield a seamless scRNA-seq data analysis pipeline.

This is all the more true with novel sequencing technologies that allow an increasing number of cells to be sequenced in each run. The workflow described in Lun et al. In these workflows, single-cell expression data are organized in objects of the SCESet class allowing integrated analysis. However, these workflows are mostly used to prepare the data for further downstream analysis and do not focus on steps such as cell clustering and lineage inference.

Here, we propose an integrated workflow for downstream analysis, with the following four main steps: (1) dimensionality reduction accounting for zero inflation and over-dispersion, and adjusting for gene and cell-level covariates, using the zinbwave Bioconductor package; (2) robust and stable cell clustering using resampling-based sequential ensemble clustering, as implemented in the clusterExperiment Bioconductor package; (3) inference of cell lineages and ordering of the cells by developmental progression along lineages, using the slingshot R package; and (4) DE analysis along lineages.

Throughout the workflow, we use a single SummarizedExperiment object to store the scRNA-seq data along with any gene or cell-level metadata available from the experiment (see Figure 1).

This workflow is illustrated using data from a scRNA-seq study of stem cell differentiation in the mouse olfactory epithelium (OE) (Fletcher et al.). The olfactory epithelium contains mature olfactory sensory neurons (mOSN) that are continuously renewed in the epithelium via neurogenesis through the differentiation of globose basal cells (GBC), which are the actively proliferating cells in the epithelium. When a severe injury to the entire tissue happens, the olfactory epithelium can regenerate from normally quiescent stem cells called horizontal basal cells (HBC), which become activated to differentiate and reconstitute all major cell types in the epithelium.

The scRNA-seq dataset we use as a case study was generated to study the differentiation of HBC stem cells into different cell types present in the olfactory epithelium. The expression level of each gene in a given cell was quantified by counting the total number of reads mapping to it.

Cells were then assigned to different lineages using a statistical analysis pipeline analogous to that in the present workflow. Finally, results were validated experimentally using in vivo lineage tracing.

Details on data generation and statistical methods are available in Fletcher et al. In this workflow, we describe a sequence of steps to recover the lineages found in the original study, starting from the genes by cells matrix of raw counts publicly available on the NCBI Gene Expression Omnibus with accession GSE Note that in order to successfully run the workflow, we need the following versions of the Bioconductor packages scone 1.

We recommend running Bioconductor 3. To give the user an idea of the time needed to run the workflow, the function system. Computations were performed with 2 cores on a MacBook Pro early with a 2. The Bioconductor package BiocParallel was used to allow for parallel computing in the zinbwave function.

Users with a different operating system may change the package used for parallel computing and the NCORES variable below.

Before filtering, the dataset had cells and 28, detected genes i. Note that in the following, we assume that the user has access to a data folder located at.

Throughout the workflow, we use the class SummarizedExperiment to keep track of the counts and their associated metadata within a single object. The cell-level metadata contain quality control measures, sequencing batch ID, and cluster and lineage labels from the original publication Fletcher et al. Cells with a cluster label of -2 were not assigned to any cluster in the original publication.

See the scone vignette for details on the filtering procedure. Finally, for computational efficiency, we retain only the 1, most variable genes. This seems to be a reasonable choice for the illustrative purpose of this workflow, as we are able to recover the biological signal found in the published analysis (Fletcher et al.).
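Retaining only the most variable genes, as described above, can be sketched as a ranking by variance. The use of a log(1 + count) transform with the ordinary sample variance is an assumption made for this illustration, as are the function name `most_variable_genes` and the toy counts. The workflow's own filtering is performed with Bioconductor tools.

```python
import math
from statistics import variance


def most_variable_genes(counts, n_top):
    """Return the `n_top` genes with the largest variance of log1p counts.

    `counts` maps each gene name to its list of counts across cells.
    """
    scores = {
        gene: variance([math.log1p(count) for count in values])
        for gene, values in counts.items()
    }
    ranked = sorted(scores, key=lambda gene: scores[gene], reverse=True)
    return ranked[:n_top]


# Toy data (hypothetical): geneB is expressed at a nearly constant level.
counts = {
    "geneA": [0, 50, 0, 60],
    "geneB": [10, 11, 9, 10],
    "geneC": [5, 0, 6, 20],
}
top = most_variable_genes(counts, 2)
print(top)
```

The log transform keeps highly expressed genes from being selected simply because their raw counts are large.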


In general, however, we recommend care in selecting a gene filtering scheme, as an appropriate choice is dataset-dependent. Overall, after the above pre-processing steps, our dataset has 1, genes and cells. Metadata for the cells are stored in the slot colData from the SummarizedExperiment object. Cells were processed in 18 different batches. In the original work Fletcher et al.

Note that there is partial nesting of batches within clusters.
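Nesting of batches within clusters can be checked by cross-tabulating the two cell-level labels. The sketch below uses hypothetical batch and cluster labels and hypothetical helper names. In the workflow itself these labels would come from the cell-level metadata of the SummarizedExperiment object.

```python
from collections import Counter


def cross_tabulate(batches, clusters):
    """Count the cells for each (batch, cluster) combination."""
    return Counter(zip(batches, clusters))


def batches_confined_to_one_cluster(batches, clusters):
    """List the batches whose cells all carry the same cluster label."""
    clusters_per_batch = {}
    for batch, cluster in zip(batches, clusters):
        clusters_per_batch.setdefault(batch, set()).add(cluster)
    return sorted(
        batch for batch, seen in clusters_per_batch.items() if len(seen) == 1
    )


# Hypothetical labels for five cells.
batches = ["b1", "b1", "b2", "b2", "b3"]
clusters = [1, 2, 2, 2, 1]

table = cross_tabulate(batches, clusters)
nested = batches_confined_to_one_cluster(batches, clusters)
print(table[("b2", 2)], nested)
```

Batches that appear in only one cluster are confounded with that cluster, which matters when batch is later used as a covariate.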


For users favouring an R-based approach to read alignment and counting, we suggest using the methods in the Rsubread package (Liao et al.). Alternatively, expression can be rapidly quantified with alignment-free methods such as kallisto (Bray et al.). The first task is to load the count matrix into memory.

In this case, some work is required to retrieve the data from the Gzip-compressed Excel format.

Single-cell analyses can uncover cellular heterogeneity, not only per se, but also in response to viral infection.

Similarly, single-cell transcriptome analyses (scRNA-Seq) can highlight specific signatures, identifying cell subsets with particular phenotypes that are relevant to the understanding of virus-host interactions.

Identification of specific cell signatures using single-cell RNA-seq and phenotypic analyses. The whole analysis pipeline can be divided into five steps. Multiple transcriptomic analyses can be performed to inform about data structure and cell heterogeneity using dimensionality reduction plots, such as t-Distributed Stochastic Neighbor Embedding (t-SNE) or Principal Component Analysis (PCA).

Cell clustering and differential expression analysis can provide specific gene expression profiles. Finally, single-cell transcriptomes can be compared to cell population transcriptomes or to quantitative RT-qPCR analyses. These first three steps characterize the process for single-cell transcriptomic analysis, which mainly reveals the level of cellular heterogeneity in a cell population, that is, identifies one or multiple cell subpopulations. Phenotypic analyses include assessment of protein expression level and localization, using FACS staining, immunofluorescence or western blots, or assessment of protein activity, using assays quantifying enzymatic activity, transport activity (channels) or translocation assays (metabolites).

The analysis of specific phenotypes should also reveal single cell heterogeneity i.

Cristinelli S, Ciuffi A. Curr Opin Virol.

Cell populations are rarely homogeneous and synchronized in their characteristics. Single-cell RNA sequencing aims to uncover the transcriptome diversity in heterogeneous samples.

Recent advances in microfluidics and molecular barcoding have made the transcriptional profiling of tens of thousands of individual cells cost-effective and easy to interpret.

As an early adopter of these platforms, our optimized workflows, including pre-submission cryopreservation and post-submission dead cell removal, maximize project flexibility, speed, and data accuracy. This tech note describes how GENEWIZ scientists used optimized single-cell workflows, including dead cell removal, to overcome low viability and generate high-quality sequencing data. Standard RNA-Seq approaches are limited to reporting general expression levels, thus omitting minor subpopulation profiles.

This study highlights new single-cell RNA-sequencing capabilities for identifying rare cells, characterizing their transcriptomes, and discovering potential biomarkers. With our methods for cryopreservation and dead cell removal, we provide flexibility and convenience to scientists.

This presentation was featured at the Deep Sequencing Forum. Early adopter of 10x Genomics Chromium with optimized workflows that maximize project flexibility, speed, and data accuracy. Highest throughput sequencing platforms, including the Illumina NovaSeq, provide cost-effective single-cell solutions.

Proprietary cell freezing protocol maintains cell viability during transit and provides a convenient method to ship samples. Interactive analysis report provides an intuitive way to explore the data and find biological insights.

Michael I.

Bioconductor has many packages which support analysis of high-throughput sequence data, including RNA sequencing (RNA-seq).

The packages which we will use in this workflow include core packages maintained by the Bioconductor core team for working with gene annotations (gene and transcript locations in the genome, as well as gene ID lookup). We will also use contributed packages for statistical analysis and visualization of sequencing data. The packages used in this workflow are loaded with the library function and can be installed by following the Bioconductor package installation instructions.

The data used in this workflow are stored in the airway package, which summarizes an RNA-seq experiment wherein airway smooth muscle cells were treated with dexamethasone, a synthetic glucocorticoid steroid with anti-inflammatory effects (Himes et al.).

Glucocorticoids are used, for example, by people with asthma to reduce inflammation of the airways. In the experiment, four primary human airway smooth muscle cell lines were treated with 1 micromolar dexamethasone for 18 hours. For each of the four cell lines, we have a treated and an untreated sample. The value in the i-th row and the j-th column of the count matrix tells how many reads (or fragments, for paired-end RNA-seq) can be assigned to gene i in sample j.
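The layout of such a count matrix can be illustrated with a short sketch that tallies read-to-gene assignments into rows (genes) and columns (samples). The gene and sample names, the input format and the function name `build_count_matrix` are hypothetical.

```python
def build_count_matrix(assignments, genes, samples):
    """Tally (gene, sample) assignments into a genes-by-samples matrix.

    matrix[i][j] is the number of reads (or fragments) assigned to
    genes[i] in samples[j].
    """
    gene_index = {gene: i for i, gene in enumerate(genes)}
    sample_index = {sample: j for j, sample in enumerate(samples)}
    matrix = [[0] * len(samples) for _ in genes]
    for gene, sample in assignments:
        matrix[gene_index[gene]][sample_index[sample]] += 1
    return matrix


genes = ["geneA", "geneB"]
samples = ["untreated", "treated"]
# One entry per read (or fragment) that could be assigned to a gene.
assignments = [
    ("geneA", "untreated"),
    ("geneA", "treated"),
    ("geneA", "treated"),
    ("geneB", "untreated"),
]
matrix = build_count_matrix(assignments, genes, samples)
print(matrix)
```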

Analogously, for other types of assays, the rows of the matrix might correspond e. A previous version of this workflow including the published version demonstrated how to align reads to the genome and then count the number of reads that are consistent with gene models.

We now recommend a faster, alternative pipeline to genome alignment and read counting. This workflow will demonstrate how to import transcript-level quantification data, aggregating to the gene-level with tximport or tximeta.

Transcript quantification methods such as Salmon Patro et al.


After running one of these tools, the tximport (Soneson, Love, and Robinson) or tximeta (Love et al.) packages can be used to import the quantification data. A tutorial on how to use the Salmon software for quantifying transcript abundance can be found here. We recommend using the --gcBias flag, which estimates a correction factor for systematic biases commonly present in RNA-seq data (Love, Hogenesch, and Irizarry; Patro et al.).

Note that transcript abundance quantifiers skip the generation of large files which store read alignments, instead producing smaller files which store estimated abundances, counts, and effective lengths per transcript. For more details, see the manuscript describing this approach (Soneson, Love, and Robinson), and the tximport package vignette for software details.
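The idea of aggregating transcript-level quantifications to the gene level can be sketched as a sum over each gene's transcripts. This is a deliberately simplified illustration and not the tximport implementation, which also makes use of the abundances and effective lengths mentioned above. The transcript and gene identifiers, the mapping and the function name `summarize_to_gene` are hypothetical.

```python
from collections import defaultdict


def summarize_to_gene(tx_counts, tx2gene):
    """Sum estimated transcript counts over the transcripts of each gene.

    `tx_counts` maps transcript ID to estimated count for one sample;
    `tx2gene` maps transcript ID to gene ID.
    """
    gene_counts = defaultdict(float)
    for transcript, count in tx_counts.items():
        gene_counts[tx2gene[transcript]] += count
    return dict(gene_counts)


# Estimated counts are generally not integers.
tx_counts = {"tx1": 10.5, "tx2": 4.5, "tx3": 7.0}
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

gene_counts = summarize_to_gene(tx_counts, tx2gene)
print(gene_counts)
```

A transcript missing from the mapping raises a `KeyError` here, which makes annotation mismatches visible rather than silently dropping data.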

See the tximeta package vignette for more details. We will also discuss the various possible inputs into DESeq2, whether using tximport, tximeta, htseq (Anders, Pyl, and Huber), or a pre-computed count matrix.


As mentioned above, a short tutorial on how to use Salmon can be found here, so instead we will provide the code that was used to quantify the files used in this workflow.

