summary of phylogenetic tree


2014-RAxML version 8  ->  2006-RAxML-VI-HPC  ->  2005-RAxML-III


=>  1981-Evolutionary Trees from DNA Sequences: A Maximum Likelihood Approach


=>  Maximum Likelihood Approach ->  statistics

Beavis effect

In a simulation study, William D. Beavis showed that the average estimates of phenotypic variances associated with correctly identified QTL were greatly overestimated if only 100 progeny were evaluated, slightly overestimated if 500 progeny were evaluated, and fairly close to the actual magnitude when 1000 progeny were evaluated.
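The overestimation can be reproduced with a toy simulation. The sketch below is not Beavis's original design: it assumes a single marker with two equally frequent genotype classes, unit-variance Gaussian noise, and a hypothetical significance cutoff on the t statistic. The function name `meanSignificantR2` and all of its parameters are made up for illustration. It averages the estimated variance explained (r^2) only over replicates that pass the cutoff, which is where the upward bias comes from.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Mean estimated variance explained (r^2), averaged only over replicates whose
// marker test passes the threshold. Model and parameters are illustrative
// assumptions: genotype is -1 or +1 with probability 0.5, noise is N(0, 1).
double meanSignificantR2(int nProgeny, double trueEffect, int nReps,
                         double tThreshold, unsigned seed) {
  assert(nProgeny > 2 && nReps > 0);
  std::mt19937 rng(seed);
  std::bernoulli_distribution genotype(0.5);
  std::normal_distribution<double> noise(0.0, 1.0);
  const double n = nProgeny;
  double sumR2 = 0.0;
  int nSignificant = 0;
  for (int rep = 0; rep < nReps; ++rep) {
    double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
    for (int i = 0; i < nProgeny; ++i) {
      const double g = genotype(rng) ? 1.0 : -1.0;
      const double y = trueEffect * g + noise(rng);
      sx += g; sy += y; sxx += g * g; syy += y * y; sxy += g * y;
    }
    const double cov = sxy - sx * sy / n;
    const double vx = sxx - sx * sx / n;
    const double vy = syy - sy * sy / n;
    if (vx <= 0.0 || vy <= 0.0) continue;  // degenerate replicate, skip it
    const double r2 = cov * cov / (vx * vy);
    if (r2 >= 1.0) continue;               // avoid division by zero below
    const double t = std::sqrt(r2 * (n - 2.0) / (1.0 - r2));
    if (t > tThreshold) { sumR2 += r2; ++nSignificant; }
  }
  return nSignificant > 0 ? sumR2 / nSignificant : 0.0;
}
```

With a true effect of 0.15 (true variance explained about 0.022) and a cutoff of 3.0, calling the function with 100 progeny returns an average well above the true value, because only replicates with r^2 above roughly 0.084 can pass the cutoff at that sample size. With 1000 progeny most replicates pass and the average stays close to 0.022.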


QTL Analysis

a) Quantitative trait locus (QTL) mapping requires parental strains (red and blue plots) that differ genetically for the trait, such as lines created by divergent artificial selection.

b) The parental lines are crossed to create F1 individuals (not shown), which are then crossed among themselves to create an F2, or crossed to one of the parent lines to create backcross progeny. Both of these crosses produce individuals or strains that contain different fractions of the genome of each parental line. The phenotype for each of these recombinant individuals or lines is assessed, as is the genotype of markers that vary between the parental strains.

c) Statistical techniques such as composite interval mapping evaluate the probability that a marker or an interval between two markers is associated with a QTL affecting the trait, while simultaneously controlling for the effects of other markers on the trait. The results of such an analysis are presented as a plot of the test statistic against the chromosomal map position, in recombination units (cM). Positions of the markers are shown as triangles. The horizontal line marks the significance threshold. Likelihood ratios above this line are formally significant, with the best estimate of QTL positions given by the chromosomal position corresponding to the highest significant likelihood ratio. Thus, the figure shows five possible QTL, with the best-supported QTL around 10 and 60 cM.
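To make the test statistic in (c) concrete, here is a minimal sketch of a LOD score at a single marker. It is plain single-marker regression for a backcross with two genotype classes, not composite interval mapping (no interval inference and no cofactor markers). The function name `markerLod` and the 0/1 genotype coding are assumptions made for this example. The score is (n/2) * log10(RSS0 / RSS1), where RSS0 is the residual sum of squares around the overall mean and RSS1 the residual sum of squares around the two genotype-class means.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// LOD score for one marker under single-marker regression.
// genotype[i] must be 0 or 1 (two backcross classes, an assumed coding).
double markerLod(const std::vector<int>& genotype,
                 const std::vector<double>& phenotype) {
  assert(genotype.size() == phenotype.size() && !phenotype.empty());
  const double n = static_cast<double>(phenotype.size());
  double sum[2] = {0.0, 0.0};
  double count[2] = {0.0, 0.0};
  for (std::size_t i = 0; i < phenotype.size(); ++i) {
    assert(genotype[i] == 0 || genotype[i] == 1);
    sum[genotype[i]] += phenotype[i];
    count[genotype[i]] += 1.0;
  }
  const double grandMean = (sum[0] + sum[1]) / n;
  // An empty class falls back to the grand mean (it contributes nothing anyway).
  const double classMean[2] = {
      count[0] > 0.0 ? sum[0] / count[0] : grandMean,
      count[1] > 0.0 ? sum[1] / count[1] : grandMean};
  double rss0 = 0.0;  // null model: one mean for everyone
  double rss1 = 0.0;  // marker model: one mean per genotype class
  for (std::size_t i = 0; i < phenotype.size(); ++i) {
    const double d0 = phenotype[i] - grandMean;
    const double d1 = phenotype[i] - classMean[genotype[i]];
    rss0 += d0 * d0;
    rss1 += d1 * d1;
  }
  if (rss0 <= 0.0) return 0.0;  // constant phenotype: nothing to explain
  if (rss1 <= 0.0) return std::numeric_limits<double>::infinity();
  return 0.5 * n * std::log10(rss0 / rss1);
}
```

Evaluating this at every marker and plotting the result against map position gives a curve of the kind described in (c); the significance threshold would typically come from permutations, which this sketch leaves out.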


plink --23file JPT-NA19001.snp JPT ID002 --out JPT-NA19001

plink --bfile JPT-NA19001 --exclude merge.missnp --make-bed --out new

plink --bfile source1 --bmerge source2_trial --make-bed --out merged_trial

plink --merge-list merge_list --make-bed --out merge


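1. The essence of object-oriented programming is that every kind of data carries its own operations, so users do not need to understand how to manipulate a complex data structure; they only need to learn that data type's interface.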


C++ templates and generic programming

"Generic programming aims to write code that is independent of data types." (C++ Primer Plus, 6th ed.)



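#include <iostream>
using namespace std;

// A fixed-capacity stack that works for any element type Nott.
template <class Nott>
class Stack{
    Nott arr[20];
    int num;
public:
    Stack();
    void push(const Nott& ele);
    void print();
};

template <class Nott>
Stack<Nott>::Stack(){
  num = 0;
}

template <class Nott>
void Stack<Nott>::push(const Nott& ele){
  if(num >= 20) return;  // stack is full: ignore the push
  arr[num] = ele;
  num ++;
}

template <class Nott>
void Stack<Nott>::print(){
  for(int i = 0;i < num;i ++)
    cout << arr[i] << " ";
  cout << endl;
}

int main(){
  Stack<char> nott;
  Stack<int> nottt;
  // The original calls were lost; these are a reconstruction that
  // reproduces the output shown below.
  for(int i = 0;i < 4;i ++) nottt.push(6);
  nottt.print();
  return 0;
}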

6 6 6 6

HOX gene

ref: 2013-The regulation of Hox gene expression during animal development


homeosis: the replacement of part of one segment of an insect or other segmented animal by a structure characteristic of a different segment, especially through mutation.
homeobox: any of a class of closely similar sequences that occur in various genes and are involved in regulating embryonic development in a wide range of species.



GATK caveat

1. Selection / filtering

VariantFiltration: Filter variant calls based on INFO and/or FORMAT annotations.
output: A filtered VCF in which passing variants are annotated as PASS and failing variants are annotated with the name(s) of the filter(s) they failed.
SelectVariants: Select a subset of variants from a VCF file.
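The PASS-or-filter-name behaviour of the output can be sketched in a few lines. This is not GATK code: `NamedFilter`, `filterField` and `hardFilter` are hypothetical names, and the two checks (QD below 2.0, FS above 60.0) with their labels `LowQD` and `HighFS` are only illustrative thresholds. The one firm convention used here is that multiple failed filter names in the VCF FILTER column are joined with semicolons.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One filter check: its label and whether the variant failed it.
struct NamedFilter {
  std::string name;
  bool failed;
};

// Build the FILTER column value: "PASS" if nothing failed, otherwise the
// names of all failed filters joined with ';'.
std::string filterField(const std::vector<NamedFilter>& checks) {
  std::string field;
  for (const NamedFilter& check : checks) {
    assert(!check.name.empty());
    if (!check.failed) continue;
    if (!field.empty()) field += ";";
    field += check.name;
  }
  return field.empty() ? "PASS" : field;
}

// Hypothetical hard filter on two annotations; thresholds are illustrative.
std::string hardFilter(double qd, double fs) {
  std::vector<NamedFilter> checks;
  checks.push_back({"LowQD", qd < 2.0});
  checks.push_back({"HighFS", fs > 60.0});
  return filterField(checks);
}
```

A variant that fails a filter is kept in the file and only labelled; removing such records is a separate selection step, which is where a tool like SelectVariants comes in.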



notes of ANNOVAR

1. Coordinate system: By default, a 1-based coordinate system is used.

2. Core programs:

3. Annotation types: gene-based (-geneanno), region-based (-regionanno) and filter-based (-filter) annotations.

4. Output:

a. The first file contains annotation for all variants, by adding two columns to the beginning of each input line.

b. The second output file contains the amino acid changes as a result of the exonic variant.

5. Key points:

What about a GFF3 file for a new species?

gff3ToGenePred
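Because the default coordinate system noted in item 1 is 1-based, intervals taken from a 0-based, half-open source such as a BED file need shifting before they are compared with ANNOVAR input. The sketch below shows only that arithmetic; the struct and function names are invented for this example and are not part of ANNOVAR.

```cpp
#include <cassert>
#include <string>

// A closed interval in 1-based coordinates: both start and end are included.
struct Interval1Based {
  std::string chrom;
  long start;
  long end;
};

// Convert a 0-based, half-open interval [start0, end0) into the equivalent
// 1-based, closed interval [start0 + 1, end0].
Interval1Based fromZeroBasedHalfOpen(const std::string& chrom,
                                     long start0, long end0) {
  assert(start0 >= 0 && end0 > start0);
  return {chrom, start0 + 1, end0};
}
```

For a single base, the 0-based interval 99 to 100 becomes start 100 and end 100 in 1-based terms: only the start moves, the end stays the same.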