Lecture 1 Introduction

Lecture 2 Dynamic Programming(动态规划)
Module 1: Aligning and modeling genomes

人的基因组1.5%编码20000多个基因,与其他物种高度保守,剩余序列不保守。
Introduction to sequence alignment
genomes change over time.

occam’s razor:找最短路径
比对的本质就是两个字符串相似性的比较,是否gaps allow, 找到longest common subsequence.
Formualation1. longest common substring(no gap)
Formualation2. longest common subsequnece( gaps allowed)
edit distance: number of change needed for S1-S2,uniform scoring function
Formualtion 3. Sequence alignment
碱基替换的方式有两种,一种allow gaps(fixed penalty),所有替换打分类似。另一种varing penalties for edit operations.包含transition(嘌呤和嘌呤,嘧啶和嘧啶),tranversion(嘌呤嘧啶随意转换) ,polymerase(不懂)。
Formulation 4. varing gap cost models
linear gap penality(用时短), affine gap penalty,general gap penalty,frame-aware gap penalty(蛋白比对,multiples of 3 disrupt coding regions), seek sulicated regions, rearrangements
Introduction to principles of dynical programming
由于找到最优序列比对是一个指数级的运算,所以省时省力,需要使用动态规划。(指数型到多项式)
斐波那契数列:
1 |
|
DP for sequence alignment
- Parameterization 参数化
- sub-problem space 划分亚空间
- traversal order 遍历的顺序
- recursion formula 递归公式
- trace-back
Apply dynamic programming to sequence alignment?
key: score is additive, smaller to larger
calculate maximun alignment score of longer sequences based on previously-computed to scores of shorter sequences.
