samtool命令

参考https://blog.csdn.net/sunchengquan/article/details/85176940

下采样一定数量的reads

http://seqanswers.com/forums/showthread.php?t=41983
https://bioinformatics.stackexchange.com/questions/402/how-can-i-downsample-a-bam-file-while-keeping-both-reads-in-pairs

方法1

1
2
3
4
5
6
7
## 需要额外下载StreamSampler.jar,并且有bam文件版本限制,不好用
(
samtools view -H [bamfile];
samtools view -F 0x004 [bamfile] |
java -jar StreamSampler.jar [# of reads to sample] [total # reads]
) |
samtools -bS - > [sampled bam file]

方法2 sample 1000 reads/cell,可用

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
##在shell直接执行,存在一个问题就是subsample的不是正好1000,可能是999。因为flagstat里面有两个reads相加,所以需要double number。

frac=$( samtools flagstat $outbam | cut -f3 | awk 'BEGIN {total=0} {total += $1} END {frac=2000/total;{print frac}}')
int=${frac%.*}
if [ $int -eq 0 ];then
samtools view -bs $frac $outbam > $outfolder3$samplebam
fi


## 原始网页写法,但是运行terminal会闪退。

function SubSample {
## Calculate the sampling factor based on the intended number of reads:
FACTOR=$(samtools idxstats $1 | cut -f3 | awk -v COUNT=$2 'BEGIN {total=0} {total += $1} END {print COUNT/total}')

if [[ $FACTOR > 1 ]]
then
echo '[ERROR]: Requested number of reads exceeds total read count in' $1 '-- exiting' && exit 1
fi

samtools view -s $FACTOR -f bam -l 5 $1
}

## Usage example, selecting 100.000 reads:
SubSample in.bam 100000 > subsampled.bam

下采样一定百分比的reads

1
samtools view -bs 42.1 in.bam > subsampled.bam

过滤低mapping质量的reads,并且sam转bam

1
2
3
samfile=out.sam
outbam=filtered.bam
samtools view -h -q 10 $samfile | samtools view -bS > $outbam

过滤PCR duplication

1
samtools rmdup in.bam -o rmdup.bam

bam, sam的相互转化

1
2
samtools view -bS in.sam> out.bam
samtools view -h -o out.sam in.bam

1
2
/samtools view -h -f 0x002 file.bam |\
egrep "(^@|XT\:A\:U)"

从bam file中提取固定标签的文件

1
2
3
4
5
6
7
8
9
10
11
12
13
14
cd /media/ggj/home/ggj/tmp/DarkReaction/barcode/nofilter
BAM_FILE="/media/ggj/home/ggj/tmp/DarkReaction/barcode/nofilter/unaligned_tagged_filtered.bam"
filter_file="/media/ggj/home/ggj/tmp/DarkReaction/barcode/barcode_88K_tag.txt"
# Save the header lines
samtools view -H $BAM_FILE > SAM_header

# Filter alignments using filter.txt. Use LC_ALL=C to set C locale instead of UTF-8
samtools view $BAM_FILE | LC_ALL=C grep -F -f $filter_file > filtered_SAM_body

# Combine header and body
cat SAM_header filtered_SAM_body > filtered.sam

# Convert filtered.sam to BAM format
samtools view -b filtered.sam > filtered.bam