summary of phylogenetic tree

Tools:

2014-RAxML version 8  ->  2006-RAxML-VI-HPC  ->  2005-RAxML-III

Methods:

=>  1981-Evolutionary Trees from DNA Sequences: A Maximum Likelihood Approach

Theories:

=>  Maximum Likelihood Approach ->  statistics

Beavis effect

In a simulation study, William D. Beavis showed that the average estimates of phenotypic variances associated with correctly identified QTL were greatly overestimated if only 100 progeny were evaluated, slightly overestimated if 500 progeny were evaluated, and fairly close to the actual magnitude when 1000 progeny were evaluated.

(http://www.genetics.org/content/165/4/2259)
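A toy simulation can make the selection bias concrete. The sketch below is not Beavis's design: it assumes a single marker in a backcross (genotype 0/1 with probability 1/2), phenotype = effect * genotype + N(0,1) noise, and a fixed F-statistic cutoff for declaring the marker "detected". The function name, the cutoff, and the effect size are all hypothetical choices for illustration.

```cpp
#include <cassert>
#include <random>

// Mean estimated R^2 (fraction of phenotypic variance explained), averaged
// only over replicates in which the marker passed the significance cutoff.
// Assumed model: genotype g in {0,1} with P = 1/2, phenotype y = effect*g + N(0,1).
double meanDetectedR2(int nProgeny, double effect, double fThreshold,
                      int nReplicates, unsigned seed) {
  assert(nProgeny > 2 && nReplicates > 0);
  std::mt19937 rng(seed);
  std::bernoulli_distribution genotype(0.5);
  std::normal_distribution<double> noise(0.0, 1.0);

  double sumR2 = 0.0;
  int detected = 0;
  for (int rep = 0; rep < nReplicates; ++rep) {
    // Running sums for the sample correlation between genotype and phenotype.
    double sg = 0, sy = 0, sgg = 0, syy = 0, sgy = 0;
    for (int i = 0; i < nProgeny; ++i) {
      double g = genotype(rng) ? 1.0 : 0.0;
      double y = effect * g + noise(rng);
      sg += g; sy += y; sgg += g * g; syy += y * y; sgy += g * y;
    }
    double n = nProgeny;
    double cov = sgy - sg * sy / n;
    double varG = sgg - sg * sg / n;
    double varY = syy - sy * sy / n;
    if (varG <= 0.0 || varY <= 0.0) continue;  // degenerate sample, skip
    double r2 = cov * cov / (varG * varY);
    double f = r2 * (n - 2.0) / (1.0 - r2);    // F statistic for one regressor
    if (f > fThreshold) { sumR2 += r2; ++detected; }
  }
  // 0.0 means "never detected" in this sketch.
  return detected > 0 ? sumR2 / detected : 0.0;
}
```

With effect = 0.3517 the true R^2 under this model is about 0.03. Because F > 11 with 100 progeny requires R^2 above roughly 0.10, every "detected" estimate at that sample size is necessarily inflated; comparing meanDetectedR2(100, ...) with meanDetectedR2(1000, ...) shows how the inflation shrinks as the sample grows.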

QTL Analysis

a) Quantitative trait locus (QTL) mapping requires parental strains (red and blue plots) that differ genetically for the trait, such as lines created by divergent artificial selection.

b) The parental lines are crossed to create F1 individuals (not shown), which are then crossed among themselves to create an F2, or crossed to one of the parent lines to create backcross progeny. Both of these crosses produce individuals or strains that contain different fractions of the genome of each parental line. The phenotype for each of these recombinant individuals or lines is assessed, as is the genotype of markers that vary between the parental strains.

c) Statistical techniques such as composite interval mapping evaluate the probability that a marker or an interval between two markers is associated with a QTL affecting the trait, while simultaneously controlling for the effects of other markers on the trait. The results of such an analysis are presented as a plot of the test statistic against the chromosomal map position, in recombination units (cM). Positions of the markers are shown as triangles. The horizontal line marks the significance threshold. Likelihood ratios above this line are formally significant, with the best estimate of QTL positions given by the chromosomal position corresponding to the highest significant likelihood ratio. Thus, the figure shows five possible QTL, with the best-supported QTL around 10 and 60 cM.

https://www.nature.com/scitable/topicpage/quantitative-trait-locus-qtl-analysis-53904
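The reading rule in (c), taking peaks of the likelihood-ratio profile that lie above the threshold line, can be sketched as a small function. This is only an illustration of how such a plot is read, not part of any mapping software; the function name and the input layout (parallel vectors of positions and test statistics) are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return the map positions (cM) at which the likelihood-ratio profile has a
// local maximum that exceeds the significance threshold.
std::vector<double> significantPeaks(const std::vector<double>& positionCm,
                                     const std::vector<double>& likelihoodRatio,
                                     double threshold) {
  assert(positionCm.size() == likelihoodRatio.size());
  std::vector<double> peaks;
  const std::size_t n = likelihoodRatio.size();
  for (std::size_t i = 0; i < n; ++i) {
    const double lr = likelihoodRatio[i];
    if (lr <= threshold) continue;  // below the horizontal line: not significant
    // A peak rises from its left neighbour and does not rise further to the right.
    const bool risesFromLeft = (i == 0) || (lr > likelihoodRatio[i - 1]);
    const bool fallsToRight = (i + 1 == n) || (lr >= likelihoodRatio[i + 1]);
    if (risesFromLeft && fallsToRight) peaks.push_back(positionCm[i]);
  }
  return peaks;
}
```

Given made-up statistics {2, 15, 6, 3, 9, 12, 20, 4} at positions 0, 10, ..., 70 cM and a threshold of 11.5, the function reports peaks at 10 and 60 cM; the value 12 at 50 cM is above the line but is only the shoulder of the 60 cM peak.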

plink

# Convert a SNP file (23andMe format) to bed/bim/fam
plink --23file JPT-NA19001.snp JPT ID002 --out JPT-NA19001

# Remove problematic SNPs
plink --bfile JPT-NA19001 --exclude merge.missnp --make-bed --out new

# Merge a single fileset
plink --bfile source1 --bmerge source2_trial --make-bed --out merged_trial

# Merge multiple filesets
plink --merge-list merge_list --make-bed --out merge

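Programming philosophy

1. The essence of object-oriented programming is that every kind of data carries its own operations, so users do not need to know how to manipulate a complex data structure; they only need to learn that data type's interface.

2. Generic programming lets an algorithm be written once and applied to many data types, so there is no need to overload the function again for each type.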
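Point 2 can be illustrated with a function template. The sketch below is a hypothetical helper (the name largest is not from any library): one definition serves every element type that supports operator<, where plain overloading would need a separate function per type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One algorithm, written once, usable for any element type T with operator<.
template <class T>
const T& largest(const std::vector<T>& items) {
  assert(!items.empty());  // the largest element of an empty list is undefined
  std::size_t best = 0;
  for (std::size_t i = 1; i < items.size(); ++i)
    if (items[best] < items[i]) best = i;
  return items[best];
}
```

Calling largest on a std::vector<int>, a std::vector<char>, or a std::vector<double> makes the compiler generate the matching version each time; the caller only needs to know the interface, not how the search is done.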

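C++ templates and generic programming

"Generic programming aims to write code that is independent of data types." (C++ Primer Plus, 6th ed.)

Implement one method that can be used with data of many types.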

#include<iostream>

using namespace std;

template <class Nott>
class Stack{
  private:
    Nott arr[20];
    int num;
  public:
    Stack();
    void push(const Nott& ele);
    void print();
};


template <class Nott>
Stack<Nott>::Stack(){
  num = 0;
}

template <class Nott>
void Stack<Nott>::push(const Nott& ele){
  if(num >= 20) return;  // stack is full: ignore the push rather than write past arr
  arr[num] = ele;
  num ++;
}

template <class Nott>
void Stack<Nott>::print(){
  for(int i = 0;i < num;i ++) 
    cout << arr[i] << " ";
  cout << endl;
}

int main(){
  Stack<char> nott;
  nott.push('N');
  nott.push('O');
  nott.push('T');
  nott.push('T');
  nott.print();
  Stack<int> nottt;
  nottt.push(6);
  nottt.push(6);
  nottt.push(6);
  nottt.push(6);
  nottt.print();
  return 0;
}

Output:

N O T T
6 6 6 6

HOX gene

ref: 2013-The regulation of Hox gene expression during animal development

 

homeosis: the replacement of part of one segment of an insect or other segmented animal by a structure characteristic of a different segment, especially through mutation.
homeobox: any of a class of closely similar sequences which occur in various genes and are involved in regulating embryonic development in a wide range of species.

 

 

GATK caveat

1. Selecting / filtering

VariantFiltration: Filter variant calls based on INFO and/or FORMAT annotations
output: A filtered VCF in which passing variants are annotated as PASS and failing variants are annotated with the name(s) of the filter(s) they failed.
SelectVariants:    Select a subset of variants from a VCF file.
output:
1. If a value is missing, VariantFiltration treats the record containing that value as passing the check, whereas SelectVariants treats the record as failing it.

2.foobar
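The missing-value caveat in item 1 can be sketched with two predicates that differ only in how they treat an absent annotation. This mimics the behaviour described in the note; it is not GATK code and does not use GATK's expression syntax. The Annotations type and both function names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for one VCF record: INFO annotations keyed by name.
typedef std::map<std::string, double> Annotations;

// Filtering view: the record fails only if the annotation is present AND
// below the cutoff. A missing annotation leaves the record passing.
bool passesFilter(const Annotations& info, const std::string& key, double minValue) {
  assert(!key.empty());
  Annotations::const_iterator it = info.find(key);
  if (it == info.end()) return true;   // missing -> treated as passing
  return it->second >= minValue;
}

// Selection view: the record is kept only if the annotation is present AND
// meets the cutoff. A missing annotation means the record is not selected.
bool isSelected(const Annotations& info, const std::string& key, double minValue) {
  assert(!key.empty());
  Annotations::const_iterator it = info.find(key);
  if (it == info.end()) return false;  // missing -> treated as failing
  return it->second >= minValue;
}
```

For a record that carries the annotation, both predicates agree. For a record without it, passesFilter returns true while isSelected returns false, so the same cutoff can keep a record in one step and drop it in the other.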

 

notes of ANNOVAR

1. Coordinate system: By default, a 1-based coordinate system is used.

2. Core program: annotate_variation.pl

3. Annotation types: gene-based (-geneanno), region-based (-regionanno) and filter-based (-filter) annotations.

4. Output:

a. The first file contains annotation for all variants, by adding two columns to the beginning of each input line.

b. The second output file contains the amino acid changes as a result of the exonic variant.

5. Key point:

What about a GFF3 file for a new species? (http://annovar.openbioinformatics.org/en/latest/user-guide/gene/)

gff3ToGenePred (http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/)
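On item 1 (coordinate system): input taken from a 0-based, half-open format such as BED has to be shifted before it matches a 1-based, inclusive convention. A minimal sketch of the conversion, with a hypothetical Interval struct and function names:

```cpp
#include <cassert>

// A genomic interval; which convention it follows is stated by the function
// that produces or consumes it.
struct Interval {
  long start;
  long end;
};

// 1-based, inclusive [start, end]  ->  0-based, half-open [start, end)
Interval oneBasedToZeroBased(Interval oneBased) {
  assert(oneBased.start >= 1 && oneBased.end >= oneBased.start);
  Interval z = { oneBased.start - 1, oneBased.end };
  return z;
}

// 0-based, half-open [start, end)  ->  1-based, inclusive [start, end]
Interval zeroBasedToOneBased(Interval zeroBased) {
  assert(zeroBased.start >= 0 && zeroBased.end > zeroBased.start);
  Interval o = { zeroBased.start + 1, zeroBased.end };
  return o;
}
```

Only the start coordinate moves: a single-base variant at 1-based position 100 is {100, 100} in the inclusive convention and {99, 100} in the half-open one.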