Variety of Formulae

1. Nucleotide Diversity

$$\hat{\pi} = \frac{n}{n-1} \sum_{i,j} x_i x_j \pi_{ij}$$

where $x_i$ and $x_j$ are the respective frequencies of the $i$th and $j$th sequences, $\pi_{ij}$ is the number of nucleotide differences per nucleotide site between the $i$th and $j$th sequences, and $n$ is the number of sequences in the sample.
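As a rough illustration, here is a minimal Python sketch of the sample estimate $\hat{\pi} = \frac{n}{n-1} \sum_{i,j} x_i x_j \pi_{ij}$, assuming the input sequences are already aligned to the same length and that $\pi_{ij}$ is the proportion of aligned sites at which two sequences differ. The function names and toy sequences are hypothetical, chosen only for this example.

```python
from collections import Counter
from itertools import product


def per_site_difference(a: str, b: str) -> float:
    """pi_ij: proportion of aligned sites at which sequences a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nucleotide_diversity(sequences: list[str]) -> float:
    """Sample estimate n/(n-1) * sum_ij x_i x_j pi_ij over distinct sequences."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    # x_i: frequency of each distinct sequence in the sample
    freqs = {seq: count / n for seq, count in Counter(sequences).items()}
    # Sum over all ordered pairs (i, j); pairs with i == j contribute 0
    total = sum(
        freqs[si] * freqs[sj] * per_site_difference(si, sj)
        for si, sj in product(freqs, repeat=2)
    )
    return n / (n - 1) * total


# Usage: three aligned toy sequences
print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))  # approximately 1/3
```

The $n/(n-1)$ factor makes the estimate equal to the mean per-site difference over all $n(n-1)/2$ pairs of sequences in the sample: in the toy example the three pairs differ at 1/4, 2/4 and 1/4 of sites, whose mean is 1/3.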

2. Variance


3. The Average k-mer Coverage

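If k = L (the read length), the formula gives Ckmer = 0. This is clearly wrong; the value should actually be 1, because every k-mer (here, each read itself) is covered at least once. In the extreme case k = L, only a small number of k-mers have a frequency greater than 1: apart from PCR duplication, which produces identical reads, the probability that two reads are exactly identical is very small.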
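The k = L argument can be checked empirically with a small k-mer counter. The Python sketch below counts every k-mer across a set of reads and reports the mean count per distinct observed k-mer; treating that as the quantity in question is an assumption. The function names and toy reads are hypothetical. With k equal to the read length and no duplicated reads, every k-mer is seen exactly once, so the mean is 1, while a single PCR-style duplicate pushes it above 1.

```python
from collections import Counter


def kmer_counts(reads: list[str], k: int) -> Counter:
    """Count every k-mer (substring of length k) across all reads."""
    counts: Counter = Counter()
    for read in reads:
        for start in range(len(read) - k + 1):
            counts[read[start:start + k]] += 1
    return counts


def mean_kmer_coverage(reads: list[str], k: int) -> float:
    """Average number of times each distinct observed k-mer was seen."""
    counts = kmer_counts(reads, k)
    if not counts:
        raise ValueError("k is longer than every read")
    return sum(counts.values()) / len(counts)


# Hypothetical toy reads of length L = 6, all distinct
reads = ["ACGTTG", "TTGCAA", "CAAGGC", "GGCTTA"]
print(mean_kmer_coverage(reads, k=6))               # 1.0: each read is one k-mer, seen once
print(mean_kmer_coverage(reads + ["ACGTTG"], k=6))  # 1.25: one PCR-style duplicate
```

Because this average is taken only over k-mers that were actually observed, it can never fall below 1.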

