SampleReadsFromBamFiles

Downsampling to a fixed number of reads

http://seqanswers.com/forums/showthread.php?t=41983
https://bioinformatics.stackexchange.com/questions/402/how-can-i-downsample-a-bam-file-while-keeping-both-reads-in-pairs
Method 1

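(
# Emit the header first so the output remains a valid SAM stream
samtools view -H [bamfile];
# Stream mapped reads only (-F 0x004 drops unmapped reads) into the sampler
samtools view -F 0x004 [bamfile] |
java -jar StreamSampler.jar [# of reads to sample] [total # reads]
) |
samtools view -bS - > [sampled bam file]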
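
The internals of StreamSampler.jar are not shown here. Since it takes both the sample size and the total read count, one technique that fits this interface is selection sampling: each line is kept with probability (still needed) / (still remaining), which yields exactly the requested number of lines and preserves input order. A minimal awk sketch under that assumption follows; the function name `stream_sample` and its seed argument are hypothetical, not part of the original tool.

```shell
# stream_sample K N [SEED]: emit exactly K of the N lines on stdin, in input order.
# Assumes N is the true line count and K <= N.
stream_sample() {
  awk -v k="$1" -v n="$2" -v seed="${3:-1}" '
    BEGIN { srand(seed) }
    {
      remaining = n - NR + 1      # lines not yet decided, including this one
      # keep this line with probability (k - picked) / remaining
      if (rand() * remaining < k - picked) { print; picked++ }
    }'
}

# Demo on plain numbers instead of SAM records
seq 1 1000 | stream_sample 10 1000 7
```

Note that sampling line by line like this treats each read independently, so mates of a pair are not guaranteed to be kept together.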

Method 2

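function SubSample {
## Calculate the sampling factor based on the intended number of reads.
## Column 3 of idxstats is the mapped read count per reference; the BAM must be indexed.
FACTOR=$(samtools idxstats "$1" | cut -f3 | awk -v COUNT="$2" 'BEGIN {total=0} {total += $1} END {print COUNT/total}')

## Compare numerically with awk; the bash [[ ... > ... ]] operator compares strings.
if awk -v f="$FACTOR" 'BEGIN {exit !(f > 1)}'
then
## Report on stderr so the message does not end up in the redirected BAM output
echo '[ERROR]: Requested number of reads exceeds total read count in' "$1" '-- exiting' >&2 && exit 1
fi

sambamba view -s "$FACTOR" -f bam -l 5 "$1"

}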

## Usage example, selecting 100,000 reads:
SubSample in.bam 100000 > subsampled.bam
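
To make the factor calculation concrete, here is a small sketch that runs the same cut/awk arithmetic on mock data instead of a real BAM file. The helper names `mock_idxstats` and `sampling_factor` and the read counts are made up for illustration. The four columns mimic `samtools idxstats` output: reference name, length, mapped reads, unmapped reads.

```shell
# Hypothetical idxstats-style rows: name, length, mapped, unmapped
mock_idxstats() {
  printf 'chr1\t1000\t300\t0\n'
  printf 'chr2\t2000\t500\t0\n'
  printf '*\t0\t0\t200\n'
}

# sampling_factor COUNT: read an idxstats-style table on stdin and print
# COUNT divided by the sum of column 3 (mapped reads only).
sampling_factor() {
  cut -f3 | awk -v count="$1" '
    { total += $1 }
    END { if (total > 0) printf "%.6f\n", count / total }'
}

# 200 requested reads out of 300 + 500 mapped reads
mock_idxstats | sampling_factor 200
```

Because only column 3 is summed, the 200 unmapped reads in the mock table do not count toward the total, so the factor is relative to mapped reads.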

Downsampling to a fixed percentage of reads

samtools view -bs 42.1 in.bam > subsampled.bam
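
In `-s 42.1`, the integer part (42) is the random seed and the fractional part (.1) is the fraction of reads to keep, so the command above keeps roughly 10% of the reads. As a sketch, the value can be built from a seed and a percentage with a small helper; the function name `subsample_arg` is hypothetical, and the fixed four-decimal output is an assumption made for readability.

```shell
# subsample_arg SEED PERCENT: print the SEED.FRACTION value for samtools view -s.
# PERCENT must be strictly between 0 and 100.
subsample_arg() {
  awk -v seed="$1" -v pct="$2" 'BEGIN {
    if (pct <= 0 || pct >= 100) {
      print "percent must be between 0 and 100" > "/dev/stderr"
      exit 1
    }
    printf "%.4f\n", seed + pct / 100
  }'
}

subsample_arg 42 10
# Possible use (not run here):
# samtools view -b -s "$(subsample_arg 42 10)" in.bam > subsampled.bam
```

Reusing the same seed on the same input should reproduce the same subset, while changing the seed gives a different one.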