Reference: https://blog.csdn.net/sunchengquan/article/details/85176940
Downsampling to a fixed number of reads
http://seqanswers.com/forums/showthread.php?t=41983
https://bioinformatics.stackexchange.com/questions/402/how-can-i-downsample-a-bam-file-while-keeping-both-reads-in-pairs
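Method 1
    ## Requires downloading StreamSampler.jar separately and only works with certain BAM file versions, so it is not convenient to use
Method 2: sample 1000 reads per cell (this one works)
    ## Run directly in the shell. One caveat: the subsample does not contain exactly 1000 reads; it may contain 999. Because flagstat adds the two reads of each pair together, the target number has to be doubled.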
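One way to get close to a fixed read count is to turn the target into the fraction that `samtools view -s` expects. The sketch below is not the original script: the helper names `subsample_arg` and `downsample_bam`, the default seed 42, and the file names are assumptions. Because `-s` keeps or drops reads by hashing the read name, mates stay together but the final count is only approximately the target, which is consistent with getting 999 instead of 1000.

```shell
# Build the value for `samtools view -s`: integer part = seed, fractional part = fraction kept.
# Prints "all" when the target is not smaller than the total (nothing to subsample).
subsample_arg() {
    awk -v total="$1" -v target="$2" -v seed="${3:-42}" 'BEGIN {
        if (total <= 0 || target >= total) { print "all"; exit }
        frac = sprintf("%.6f", target / total)     # e.g. "0.050000"
        if (frac + 0 >= 1) { print "all"; exit }   # rounding pushed the fraction up to 1
        print seed substr(frac, 2)                 # e.g. "42.050000"
    }'
}

# Hypothetical wrapper: downsample a BAM file to roughly `target` alignment records.
# For paired-end data, pass twice the number of pairs, since both mates are counted.
downsample_bam() {
    in_bam="$1"; out_bam="$2"; target="$3"
    total=$(samtools view -c "$in_bam")            # number of alignment records
    arg=$(subsample_arg "$total" "$target")
    if [ "$arg" = "all" ]; then
        cp "$in_bam" "$out_bam"
    else
        samtools view -b -s "$arg" "$in_bam" > "$out_bam"
    fi
}

# Usage (assumed file names): 1000 read pairs = 2000 records
# downsample_bam in.bam subsampled.bam 2000
```

Counting with `samtools view -c` includes secondary and supplementary alignments, so the total may need a flag filter if only primary reads should count.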
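Downsampling to a fixed percentage of reads
    # -s 42.1: the integer part (42) is the random seed, the fractional part (.1) keeps about 10% of the reads
    samtools view -bs 42.1 in.bam > subsampled.bam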
Filtering out reads with low mapping quality and converting SAM to BAM
    samfile=out.sam
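The snippet above is cut off after the variable assignment, so the following is only a sketch of how the filtering step could look, not the original commands. It assumes a MAPQ threshold of 30 and an output name derived from `samfile`; `min_mapq`, `bamfile`, and `filter_mapq` are hypothetical names. `samtools view -q` skips alignments whose MAPQ is below the given value, and `-b` writes BAM.

```shell
samfile=out.sam                                   # input SAM, as in the snippet above
min_mapq=30                                       # assumed threshold; adjust as needed
bamfile="${samfile%.sam}.q${min_mapq}.bam"        # strip ".sam", append ".q30.bam"

# Keep alignments with MAPQ >= min_mapq and write them as BAM.
filter_mapq() {
    samtools view -b -q "$min_mapq" "$1" > "$2"
}

echo "would write: $bamfile"
# filter_mapq "$samfile" "$bamfile"
```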
Removing PCR duplicates
    # the input must be coordinate-sorted; rmdup takes input and output as positional arguments
    samtools rmdup in.bam rmdup.bam
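`samtools rmdup` is marked obsolete in current samtools releases, which point to `samtools markdup` instead. A minimal sketch of that route follows; the function name `dedup_bam` and the intermediate file names are assumptions, and the steps follow the usual name-sort, fixmate, position-sort, markdup order for paired-end data.

```shell
# Hypothetical wrapper around the samtools markdup workflow.
dedup_bam() {
    in_bam="$1"; out_bam="$2"
    prefix="${out_bam%.bam}"                                          # base name for intermediates
    samtools sort -n -o "$prefix.namesort.bam" "$in_bam" &&           # group mates by read name
    samtools fixmate -m "$prefix.namesort.bam" "$prefix.fixmate.bam" &&  # add mate score tags
    samtools sort -o "$prefix.possort.bam" "$prefix.fixmate.bam" &&   # back to coordinate order
    samtools markdup -r "$prefix.possort.bam" "$out_bam"              # -r removes duplicates
}

# Usage (assumed file names):
# dedup_bam in.bam rmdup.bam
```

Without `-r`, `markdup` only flags duplicates instead of removing them, which keeps the option of filtering later.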
Converting between BAM and SAM
    samtools view -bS in.sam > out.bam
    /samtools view -h -f 0x002 file.bam |\
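The `-f 0x002` filter in the command above keeps only alignments whose FLAG has the "properly paired" bit (0x2) set. The helper below, `is_proper_pair` (a hypothetical name), illustrates that bit test on a FLAG value; the commented lines show the BAM-to-SAM direction with assumed file names.

```shell
# Succeeds (exit status 0) when the SAM FLAG has bit 0x2 set ("properly paired").
is_proper_pair() {
    [ $(( $1 & 0x2 )) -ne 0 ]
}

is_proper_pair 99 && echo "99: properly paired"       # 99 = 1 + 2 + 32 + 64
is_proper_pair 73 || echo "73: not properly paired"   # 73 = 1 + 8 + 64

# BAM to SAM, keeping the header (-h):
# samtools view -h in.bam > out.sam
```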
Extracting reads with a specific tag from a BAM file
    cd /media/ggj/home/ggj/tmp/DarkReaction/barcode/nofilter
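The original commands are cut off after the `cd`, so what follows is a sketch rather than the author's script. It assumes the label is stored as an optional SAM tag such as `CB:Z:<barcode>`; the tag name, the barcode value, the file names, and the helper `filter_by_tag` are all hypothetical. The filter keeps header lines and any record that carries exactly the requested tag.

```shell
# Read SAM text on stdin; print header lines and records that carry the given tag.
# $1 = full tag string including type and value, e.g. "CB:Z:AAACCTG" (hypothetical).
filter_by_tag() {
    awk -v tag="$1" -F '\t' '
        /^@/ { print; next }                      # header lines pass through unchanged
        {
            # optional tags start at column 12 of a SAM record
            for (i = 12; i <= NF; i++)
                if ($i == tag) { print; next }
        }
    '
}

# Usage (assumed file names and barcode):
# samtools view -h in.bam | filter_by_tag "CB:Z:AAACCTG" | samtools view -b -o tagged.bam -
```

Matching on the whole tag field, rather than grepping the line, avoids false hits when the barcode string also occurs inside a read sequence or read name.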