Alternative splicing

Reference
https://blog.csdn.net/dikuangzhong6068/article/details/101198262?utm_medium=distribute.pc_relevant.none-task-blog-2~default~baidujs_baidulandingword~default-0.highlightwordscore&spm=1001.2101.3001.4242.1

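Concept

Alternative splicing is a fundamental biological event in eukaryotes. After a gene is transcribed, the primary transcript (pre-mRNA) is produced first; alternative splicing then joins different combinations of exons, yielding different splice isoforms. In this way one gene can produce several distinct transcripts, each with its own specific expression and function at different stages of cell or organism differentiation and development and in different tissues. This greatly increases the variety and number of coding and non-coding RNAs, and in turn the complexity of the transcriptome and proteome.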

Types

1. Exon skipping (ES), also called cassette exon;

2. Intron retention (IR);

3. Mutually exclusive exons (MXE);

4. Alternative 5' splice site (A5SS);

5. Alternative 3' splice site (A3SS).

Single-cell tools

STARsolo: mapping, demultiplexing and gene quantification for single cell RNA-seq.

Summaries by others

My own search

  • scVelo - [Python] - scVelo is a scalable toolkit for RNA velocity analysis in single cells. It generalizes the concept of RNA velocity by relaxing previously made assumptions with a dynamical model. It allows to identify putative driver genes, infer a latent time, estimate reaction rates of transcription, splicing and degradation, and detect competing kinetics. Open question: how do I inspect its output, and does it contain isoform-specific information?

  • SingleSplice - [R, perl, C++] - A tool for detecting biological variation in alternative splicing within a population of single cells. See Welch et al. 2016. Requires ERCC spike-ins as a reference.

  • rMATS - [Python] - RNA-Seq Multivariate Analysis of Transcript Splicing. Published in 2014, with faster versions released later. At first glance it cannot be used for 3' data.

  • outrigger - [Python] - Outrigger is a program to calculate alternative splicing scores of RNA-Seq data based on junction reads and a de novo, custom annotation created with a graph database, especially made for single-cell analyses. It is part of Expedition and designed specifically for single cells. The paper uses C1 data.

  • ICGS - [Python] - Iterative Clustering and Guide-gene Selection (Olsson et al. Nature 2016). Identify discrete, transitional and mixed-lineage states from diverse single-cell transcriptomics platforms. Integrated FASTQ pseudoalignment/quantification (Kallisto), differential expression, cell-type prediction and optional cell cycle exclusion analyses. Specialized methods for processing BAM and 10X Genomics sparse matrix files. Associated single-cell splicing PSI methods (MultIPath-PSI). A part of the AltAnalyze toolkit along with accompanying visualization methods (e.g., heatmap, t-SNE, SashimiPlots, network graphs). Easy-to-use graphical user and commandline interfaces.

  • flotilla - [Python] - Reproducible machine learning analysis of gene expression and alternative splicing data

  • Sierra: discovery of differential transcript usage from polyA-captured single-cell RNA-seq data. Works with 3'-end data.

  • BRIE: not suitable for 3'-end sequencing data.

  • LeafCutter: not a single-cell method.

  • SpliZ: RNA splicing programs define tissue compartments and cell types at single-cell resolution.
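Several of the tools above (outrigger, MultIPath-PSI) score splicing from junction reads. As a rough illustration of what such a score looks like, here is a minimal sketch of a percent-spliced-in (PSI) value for a single cassette exon. The function name, the averaging of the two inclusion junctions, and the `min_reads` cutoff are assumptions made for this sketch; it is not the exact formula used by any of the listed tools.

```python
def exon_skipping_psi(inclusion_up, inclusion_down, skipping, min_reads=10):
    """Return PSI for one cassette exon, or None if coverage is too low.

    inclusion_up / inclusion_down: reads on the two junctions joining the
    cassette exon to its upstream and downstream flanking exons.
    skipping: reads on the junction joining the flanking exons directly.
    min_reads: hypothetical coverage cutoff (an assumption of this sketch).
    """
    if inclusion_up + inclusion_down + skipping < min_reads:
        return None
    # Two junctions support inclusion but only one supports skipping,
    # so average the inclusion junctions to keep both sides comparable.
    inclusion = (inclusion_up + inclusion_down) / 2
    return inclusion / (inclusion + skipping)


# Toy numbers, made up for illustration only.
print(exon_skipping_psi(30, 30, 10))
print(exon_skipping_psi(1, 1, 1))
```

With 3'-biased data most cells will fall below any reasonable coverage cutoff for most events, which is one reason the notes above flag several tools as unsuitable for 3' data.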

Analysis goal

Functional study of the two forms of Xbp1.

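The two splice forms of Xbp1: Xbp1 is classified here as alternative 3' splice site usage (A3SS). Note that the well-known Xbp1u/Xbp1s switch is an unconventional, IRE1-mediated excision of a 26-nt intron, so the two forms differ only by that short segment.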

Quantify the two isoforms and obtain the expression of each in the two groups (KO and WT).
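Once per-cell isoform counts exist (from whichever tool ends up being usable), the group-level summary described above can be sketched as follows. The input layout, the isoform labels `Xbp1_s` and `Xbp1_u`, and all numbers are hypothetical and only illustrate the shape of the computation.

```python
from collections import defaultdict


def mean_isoform_expression(counts, groups):
    """Average isoform counts per cell within each group.

    counts: {cell: {isoform: count}}
    groups: {cell: group label}, e.g. "ko" or "wt"
    Returns {group: {isoform: mean count per cell}}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    n_cells = defaultdict(int)
    for cell, iso_counts in counts.items():
        group = groups[cell]
        n_cells[group] += 1
        for isoform, value in iso_counts.items():
            sums[group][isoform] += value
    return {
        group: {iso: total / n_cells[group] for iso, total in isoforms.items()}
        for group, isoforms in sums.items()
    }


# Toy data, made up for illustration only.
counts = {
    "cell1": {"Xbp1_s": 4, "Xbp1_u": 2},
    "cell2": {"Xbp1_s": 0, "Xbp1_u": 6},
    "cell3": {"Xbp1_s": 1, "Xbp1_u": 1},
}
groups = {"cell1": "wt", "cell2": "wt", "cell3": "ko"}
result = mean_isoform_expression(counts, groups)
print(result)
```

A real comparison would also need normalization for sequencing depth and a statistical test; this sketch only covers the aggregation step.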

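Open questions
It is still unclear whether the files produced by scVelo contain quantification for the two isoforms. The output consists of two matrices, unspliced and spliced; there is no per-isoform layout.
If another method is needed, which of Sierra, ISOP and ICGS can be used?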

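ICGS: now distributed under the name AltAnalyze. Applicable to single-cell data, can be used to analyze 3'-end experiments, and appears to be widely used. I have not yet understood how it works.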
https://altanalyze.readthedocs.io/en/latest/

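From the review:
ISOP: can be used with 10x data. It is an R package for downstream differential-expression analysis. What is the input? The input is an isoform-by-sample matrix; upstream, Cufflinks is run on the aligned BAM files to produce two matrices. Requires full-length sequencing.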
https://academic.oup.com/bioinformatics/article/34/14/2392/4911530
https://github.com/nghiavtr/ISOP

Sierra: a method published in Genome Biology, discovery of differential transcript usage from polyA-captured single-cell RNA-seq data
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02071-7
https://github.com/VCCRI/Sierra

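It produces a matrix of peaks at read-alignment positions, identifying used polyadenylated sites in scRNA-seq data.

To look at the isoforms of a single gene, extract that gene's reads from the BAM file and inspect them.

To do: the inputs and outputs of the three methods, and which analyses each one supports.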
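Inspecting one gene's reads can be sketched as below, assuming the reads have first been exported as SAM text (for example with `samtools view file.bam chr:start-end`). The code counts reads per splice junction by parsing the `N` (skipped region) operations in each CIGAR string. The function names and the toy record are hypothetical; whether a short gap is reported as `N` or as a deletion depends on the aligner settings.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def junctions_from_sam_line(line):
    """Yield (chrom, gap_start, gap_end) for each N gap in one SAM record.

    Coordinates are 1-based and inclusive (first and last base of the gap).
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, cigar = fields[2], int(fields[3]), fields[5]
    if cigar == "*":  # unaligned record, no junctions
        return
    ref_pos = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            yield (chrom, ref_pos, ref_pos + length - 1)
        if op in "MDN=X":  # operations that consume reference bases
            ref_pos += length


def count_junctions(sam_lines):
    """Count supporting reads per junction, skipping SAM header lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue
        counts.update(junctions_from_sam_line(line))
    return counts


# Toy SAM record, made up for illustration only.
toy = "r1\t0\tchr1\t100\t255\t10M50N20M\t*\t0\t0\t" + "A" * 30 + "\t" + "I" * 30
print(count_junctions([toy, toy]))
```

For 10x data the cell barcode is usually stored in a per-read tag, so splitting these counts by cell or by group would need one extra grouping step on that tag.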

The 10X data were not equipped for alternative splicing analysis due to the 3′-bias (Figure 6C, Figure S8C). Nevertheless, 10X still detected non-negligible number of junctions, even though they only accounted for approximately 50% of those junctions detected by Smart-seq2. Although Smart-seq2 data were clearly much more suitable for alternative splicing studies [41], [42], the limited number of splicing junctions detected by 10X might be suitable for certain analyses that rely on junction-based characterization, such as the RNA velocity analysis [43].

Isoform identification from 10x data has been done:
“Isoform specificity in the mouse primary motor cortex”
https://github.com/pachterlab/BYVSTZP_2020/blob/master/analysis/notebooks/10xv3/final-10x_isoform.ipynb

Modular, efficient and constant-memory single-cell RNA-seq preprocessing
https://www.nature.com/articles/s41587-021-00870-2

A discriminative learning approach to differential expression analysis for single-cell RNA-seq
https://github.com/pachterlab/NYMP_2018/blob/master/10x_example-logR/10x_example_logR-TCC_notebook.ipynb
Generating the TCC matrix: https://github.com/pachterlab/scRNA-Seq-TCC-prep

https://www.nature.com/articles/s41592-018-0303-9#code-availability

kallisto
https://www.kallistobus.tools/kb_usage/kb_usage/
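bustools can generate a TCC (transcript compatibility counts) matrix, but it needs a BUS file as input, which is generated from the FASTQ files.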
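To make the TCC idea concrete, here is a minimal sketch that builds a cell-by-equivalence-class count table from BUS records in text form. It assumes tab-separated lines of barcode, UMI, equivalence class and count, as produced by a text dump of a sorted BUS file. It only illustrates the data structure and is not how bustools itself is implemented (there is no barcode correction and no handling of UMIs that span several classes).

```python
from collections import defaultdict


def tcc_from_bus_text(lines):
    """Return {barcode: {equivalence class: number of distinct UMIs}}.

    lines: iterable of tab-separated records
           barcode, UMI, equivalence class, count (layout is an assumption).
    """
    umis = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        barcode, umi, ec, _count = line.split("\t")[:4]
        # Each distinct UMI counts once per (cell, equivalence class).
        umis[(barcode, int(ec))].add(umi)
    tcc = defaultdict(dict)
    for (barcode, ec), umi_set in umis.items():
        tcc[barcode][ec] = len(umi_set)
    return dict(tcc)


# Toy records, made up for illustration only.
records = [
    "AAAC\tGGT\t5\t2",
    "AAAC\tGGT\t5\t1",
    "AAAC\tTTA\t5\t1",
    "AAAC\tGGT\t7\t1",
    "CCCG\tGGT\t5\t3",
]
print(tcc_from_bus_text(records))
```

An equivalence class is a set of transcripts a read is compatible with, so classes that separate the two isoforms of a gene are the ones that carry isoform-level signal.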

AltAnalyze
http://altanalyze.blogspot.com/2016/08/bye-bye-bed-files-welcome-bam.html

https://altanalyze.readthedocs.io/en/latest/RunningAltAnalyze/#selecting-the-rna-seq-analysis-method