Python unit testing

unittest

https://docs.python.org/2/library/unittest.html

A testcase is created by subclassing unittest.TestCase. The three individual tests are defined with methods whose names start with the letters test. This naming convention informs the test runner about which methods represent tests.

The crux of each test is a call to assertEqual() to check for an expected result; assertTrue() or assertFalse() to verify a condition; or assertRaises() to verify that a specific exception gets raised. These methods are used instead of the assert statement so the test runner can accumulate all test results and produce a report.

The setUp() and tearDown() methods allow you to define instructions that will be executed before and after each test method. They are covered in more detail in the section Organizing test code.

The final block shows a simple way to run the tests. unittest.main() provides a command-line interface to the test script. When run from the command line, the script prints a short report: one character per test, the number of tests run with the elapsed time, and a final OK or FAILED status.
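The script these paragraphs describe is not reproduced in these notes. Here is a minimal sketch of the same shape, assuming Python 3 (the linked page documents Python 2). The class name, the sample strings, and the use of exit=False are my own choices for illustration.

```python
import unittest


class TestStringMethods(unittest.TestCase):
    def setUp(self):
        # Runs before every test method.
        self.text = "hello world"

    def test_upper(self):
        # assertEqual checks for an expected result.
        self.assertEqual(self.text.upper(), "HELLO WORLD")

    def test_isupper(self):
        # assertTrue / assertFalse verify a condition.
        self.assertTrue("FOO".isupper())
        self.assertFalse("Foo".isupper())

    def test_split(self):
        self.assertEqual(self.text.split(), ["hello", "world"])
        # assertRaises verifies that a specific exception is raised:
        # str.split rejects a non-string separator.
        with self.assertRaises(TypeError):
            self.text.split(2)


if __name__ == "__main__":
    # exit=False keeps the interpreter alive after the report is printed;
    # plain unittest.main() is the more common form in scripts.
    unittest.main(exit=False)
```

Saved as a file and run with the Python interpreter, this should report three tests; adding -v to the command line lists each test by name.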

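PCA: dimensionality reduction

Original data, 2-D: (3,4), (6,8)

New data, 2-D: (5,0), (10,0)

Finally reduced to 1-D: 5, 10

Geometrically, this is a rotation of the coordinate axes.

Why the reduction works here: all the points actually lie on the line y = (4/3)x.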
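The rotation can be checked numerically. The sketch below is not a full PCA: it assumes the direction of the line, (3, 4), is already known, whereas PCA would estimate it from the covariance of the centered data. The function name rotate_onto_line is made up for illustration.

```python
import math


def rotate_onto_line(points, direction):
    """Rotate 2-D points so that `direction` becomes the new x-axis."""
    dx, dy = direction
    length = math.hypot(dx, dy)
    cos_t, sin_t = dx / length, dy / length
    rotated = []
    for x, y in points:
        new_x = cos_t * x + sin_t * y     # coordinate along the line
        new_y = -sin_t * x + cos_t * y    # coordinate perpendicular to the line
        rotated.append((new_x, new_y))
    return rotated


points = [(3.0, 4.0), (6.0, 8.0)]
rotated = rotate_onto_line(points, direction=(3.0, 4.0))

# The second coordinate is zero (up to rounding) for every point,
# so keeping only the first one loses no information.
reduced = [round(new_x, 9) for new_x, _ in rotated]
print(rotated)
print(reduced)
```

Up to floating-point rounding, the two points land at 5 and 10 on the new axis, matching the values in the note.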

Related posts:

1. http://www.iro.umontreal.ca/~pift6080/H09/documents/papers/pca_tutorial.pdf

2. https://stats.stackexchange.com/questions/90331/step-by-step-implementation-of-pca-in-r-using-lindsay-smiths-tutorial

3. http://www.cnblogs.com/pangxiaodong/archive/2011/10/15/2212786.html

 

 

git

1. Commit and push

git add .

git commit -m "your comments about this submission"

git push origin master

2. Download

Numerical libraries

1. BLAS (Basic Linear Algebra Subprograms)

2. LINPACK

3. binutils

4.

eigensoft 7.2.1

## shrinkmode added. 3/15

EIGENSOFT version 7.2.1, 06/30/17 (for Linux only)

The EIGENSOFT package implements methods from the following 2 papers:

Patterson et al. 2006 PLoS Genet 2:e190 (population structure)

Price et al. 2006 Nat Genet 38:904–9 (EIGENSTRAT stratification correction)

 

1. Changelog

NEW features of EIGENSOFT version 7.2.0:

- shrinkmode

NEW features of EIGENSOFT version 6.1.4 include:

- pcaselection was omitted from 6.1.3 by accident

- Statically linked GSL/openblas

- Fixed memory allocation bug in pcaselection

- Some routines moved into nicklib

- Error message on allocate failure now prints length as "%ld", supporting long values.

 

NEW features of EIGENSOFT version 6.1.3 include:

- Restored script file extensions to .perl instead of .pl

- Added updated ploteig script that disappeared from the repository

NEW features of EIGENSOFT version 6.1.2 include:

- Updated license info to be GPL compliant, as required by linking the GSL

NEW features of EIGENSOFT version 6.1.1 include:

- Minor bug fix to correctly merge version 6.0.2 and version 6.1 changes.

- pcaselection operates on evec files. Added examples.

- Backported twtable.c/h from EIGENSOFT 7alpha

NEW features of EIGENSOFT version 6.1 include:

- The range finding step of PCA fastmode only scales the multiplied matrix, as orthogonalization is unnecessary. This appears to improve accuracy.

 

NEW features of EIGENSOFT version 6.0.2 include:

- Fixed Makefile and documentation to build eigenstrat properly

- Moved Tracy-Widom table into a header file for easier building

NEW features of EIGENSOFT version 6.0.1 include:

- Minor bug fix which prevents smartpca from trying to print out eigenvalues if fastmode is set.

NEW features of EIGENSOFT version 6.0.0beta included:

- New option fastmode which implements a very fast pca approximation. See POPGEN/README and Galinsky 2014 ASHG talk.

- Changes to external packages required. EIGENSOFT version 5.0.2 required lapack + blas. On the other hand, EIGENSOFT version 6.0beta requires GSL + lapack + OpenBLAS (but does not require the native version of blas). The Makefile has been changed accordingly.

EIGENSOFT version 6.0beta supports multi-threading. See POPGEN/README.

- Bug fix for ldregress option.

 

2. Where the documentation lives

See CONVERTF/README for documentation of programs for converting file formats.

See POPGEN/README for documentation of population structure programs.

See EIGENSTRAT/README for documentation of EIGENSTRAT programs.

 

Questions?

See https://www.hsph.harvard.edu/alkes-price/eigensoft-frequently-asked-questions/

https://github.com/DReichLab/EIG

For questions about building this software:

Matthew Mah <matthew_mah@hms.harvard.edu>

For questions about smartpca:

Nick Patterson <nickp@broadinstitute.org>

For questions about eigenstrat:

Alkes Price <aprice@hsph.harvard.edu>

3. Executables and source code

—————————-

All C executables are in the bin/ directory.

We have placed source code for all C executables in the src/ directory, for users who wish to modify and recompile our programs. For example, to recompile the eigenstrat program, type

cd src

make eigenstrat

mv eigenstrat ../bin

 

Note that some of our software will only compile if your system has the GSL + lapack + OpenBLAS packages installed.

On Mac OSX, you can install gsl and OpenBLAS with lapack using homebrew:

brew install gsl

brew install homebrew/science/openblas

If these packages are not in standard directories, you can specify the appropriate include and library directories with the CFLAGS and LDFLAGS make variables.

For example, on the Harvard Medical School O2 cluster, the command is:

make CFLAGS="-I/n/app/openblas/0.2.19/include -I/n/app/gsl/2.3/include" LDFLAGS="-L/n/app/openblas/0.2.19/lib -L/n/app/gsl/2.3/lib/"

On Mac OSX:

make CFLAGS="-I/usr/local/opt/openblas/include -I/usr/local/opt/gsl/include" LDFLAGS="-L/usr/local/opt/openblas/lib -L/usr/local/opt/gsl/lib"

If you have issues with missing lapacke symbols, for example "undefined reference to 'LAPACKE_dlange'", run make with the corresponding additional libraries linked:

make LDLIBS="-llapacke"

This has been encountered on Linux Mint 18.

 

If you have trouble compiling and running our code, try compiling and running the pcatoy program in the src directory:

cd src

make pcatoy

./pcatoy

If you are unable to run the pcatoy program successfully, please contact your system administrator for help, as this is a systems issue which is beyond our scope. Your system administrator will be able to troubleshoot your systems issue using this trivial program. [You can also try running the pcatoy program in the bin directory, which we have already compiled.]

To remake the entire package:

cd src

make clobber

make install

To remake EIG7.2 it is necessary to link to the OpenBLAS library. On orchestra, the path is /opt/openblas and should work automatically. On Broad Institute machines, the user should execute "use .openblas-0.2.8" and "use GCC-4.9" at the command prompt before attempting to remake. All other users should install OpenBLAS and set the variable OPENBLAS to the path at the make command line, e.g. "make install OPENBLAS=/usr/local/openblas"

 

—————————-

4. Acknowledgments

EIGENSOFT was written by Nick Patterson, Alkes Price, Samuela Pollack, Kevin Galinsky, Chris Chang, and Sasha Gusev.

We thank John Novembre and Mike Boursnell for code improvements, Matt Hanna for the first implementation of multi-threading, and Angela Yu for a bugfix.

 

—————————-

5. Copyright and license notice

This software and its documentation are copyright (2010) by Harvard University and The Broad Institute. All rights are reserved. This software is supplied without any warranty or guaranteed support whatsoever. Neither Harvard University nor The Broad Institute can be responsible for its use, misuse, or functionality. The software may be freely copied for non-commercial purposes, provided this copyright notice is retained.

 

PCA summary

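Methods:

1. Population-scale SNPs were used

2. EIGENSOFT 4.2

Interpretation of results:

Asian wild boars and domestic pigs cluster together; European wild boars, domestic pigs, and Berkshire pigs cluster together; the African warthog clusters with four wild pig species (open question: are these four wild species also African?).

Reference:

2014: Whole-genome sequencing of Berkshire (European native pig) provides insights into its origin and domestication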

BMC genomics — 2017 — Oreochromis niloticus (Nile Tilapia) — sex determination regions

Sex determination regions

The new O_niloticus_UMD1 assembly was used to study sequence differentiation across two sex-determining regions in tilapias. The first region is an XX/XY sex-determination region on LG1 found in many strains of tilapia [9, 34, 44–47]. We previously characterized this region by whole genome Illumina re-sequencing of pooled DNA from males and females [48]. We realigned these sequences to the new O_niloticus_UMD1 assembly and searched for variants that were fixed in the XX female pool and polymorphic in the XY male pool. Figure 4 shows the FST and the sex-patterned variant allele frequencies for the XX/XY O. niloticus comparison across the complete Orenil1.1 and O_niloticus_UMD1 assemblies, while Fig. 5 focuses on the highly differentiated ~9Mbp region on LG1 with a substantial number of sex-patterned variants, indicative of a reduction in recombination in a sex determination region that has existed for some time [48].

The second sex comparison is for a ZZ/WZ sex-determination region on LG3 in a strain of O. aureus [11,49]. This region has not previously been characterized using whole genome sequencing. For this comparison we identified variant alleles fixed in the ZZ male pool and polymorphic in the WZ female pool. Figure 6 shows the FST and the sex-patterned variant allele frequencies for this comparison across the whole O_niloticus_UMD1 assembly, while Fig. 7 focuses on the differentiated region on LG3. O. aureus LG3 contains a large ~50Mbp region of differentiated sex-patterned variants, also indicative of a reduction in recombination in the sex determination region. Figure 6 also shows this differentiation pattern on several other LGs (LG7, LG9, LG14, LG16, LG18, LG22 and LG23). It is possible that these smaller regions of sex-patterned differentiation are actually translocations in O. aureus relative to the O. niloticus genome assembly.

summary of phylogenetic tree

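Tools:

2014-RAxML version 8  ->  2006-RAxML-VI-HPC  ->  2005-RAxML-III

Methods:

=>  1981-Evolutionary Trees from DNA Sequences: A Maximum Likelihood Approach

Theories:

=>  Maximum Likelihood Approach ->  statistics

Maximum likelihood is an application of probability theory to statistics; it is one of the methods of parameter estimation.

Evolutionary model = substitution matrix

Concatenation: join the genes of interest end to end and analyze them together.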
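To make "maximum likelihood as parameter estimation" concrete, here is about the smallest case I can think of: estimating the evolutionary distance between two aligned sequences. It assumes the Jukes-Cantor (JC69) substitution model, which these notes do not mention, and the two sequences are invented. It is not how RAxML works internally, which optimizes richer models over whole trees.

```python
import math


def count_differences(seq_a, seq_b):
    """Number of aligned sites at which the two sequences differ."""
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def jc69_log_likelihood(distance, n_sites, n_diff):
    """Log-likelihood of a JC69 distance, up to an additive constant."""
    # Under JC69, the chance that a site differs after `distance`
    # expected substitutions per site:
    p_diff = 0.75 * (1.0 - math.exp(-4.0 * distance / 3.0))
    return n_diff * math.log(p_diff) + (n_sites - n_diff) * math.log(1.0 - p_diff)


def jc69_ml_distance(n_sites, n_diff):
    """Closed-form maximizer of the log-likelihood above."""
    p_hat = n_diff / n_sites
    if p_hat >= 0.75:
        raise ValueError("sequences are saturated; the estimate is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p_hat / 3.0)


# Invented toy alignment: 20 sites, 2 of which differ.
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGTACGAACGTACGTTCGT"
n_sites = len(seq_a)
n_diff = count_differences(seq_a, seq_b)
d_hat = jc69_ml_distance(n_sites, n_diff)
print(n_diff, round(d_hat, 4))
```

The estimate is the distance at which the observed fraction of differing sites is most probable; nudging d_hat up or down lowers jc69_log_likelihood.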

Beavis effect

In a simulation study, William D. Beavis showed that the average estimates of phenotypic variances associated with correctly identified QTL were greatly overestimated if only 100 progeny were evaluated, slightly overestimated if 500 progeny were evaluated, and fairly close to the actual magnitude when 1000 progeny were evaluated.

(http://www.genetics.org/content/165/4/2259)
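The effect is easy to reproduce in a toy setting. The sketch below is not Beavis's simulation: it uses a single marker with two equally sized genotype classes, tracks the estimated effect size rather than the phenotypic variance explained, and the true effect (0.2 residual SDs) and the detection threshold (z > 3) are arbitrary assumptions.

```python
import random
import statistics

TRUE_EFFECT = 0.2   # hypothetical additive effect, in units of the residual SD
Z_THRESHOLD = 3.0   # hypothetical detection threshold on the z statistic


def estimate_effect(n_progeny, rng):
    """Simulate one experiment and return the estimated effect."""
    half = n_progeny // 2
    carriers = [TRUE_EFFECT + rng.gauss(0.0, 1.0) for _ in range(half)]
    others = [-TRUE_EFFECT + rng.gauss(0.0, 1.0) for _ in range(half)]
    return (statistics.fmean(carriers) - statistics.fmean(others)) / 2.0


def mean_detected_effect(n_progeny, replicates=500, seed=1):
    """Average the estimate over only those replicates that pass the threshold."""
    rng = random.Random(seed)
    # Residual SD is 1 and the classes are balanced, so the standard
    # error of the estimate is 1 / sqrt(n_progeny).
    standard_error = 1.0 / n_progeny ** 0.5
    detected = []
    for _ in range(replicates):
        estimate = estimate_effect(n_progeny, rng)
        if estimate / standard_error > Z_THRESHOLD:
            detected.append(estimate)
    return statistics.fmean(detected)


results = {n: mean_detected_effect(n) for n in (100, 500, 1000)}
for n, value in results.items():
    print(n, round(value, 3))
```

With 100 progeny, only estimates above 0.3 can pass the threshold, so the average among "detected" QTL is inflated by construction. With 1000 progeny nearly every replicate passes and the average sits close to the true 0.2.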

QTL Analysis

a) Quantitative trait locus (QTL) mapping requires parental strains (red and blue plots) that differ genetically for the trait, such as lines created by divergent artificial selection.

b) The parental lines are crossed to create F1 individuals (not shown), which are then crossed among themselves to create an F2, or crossed to one of the parent lines to create backcross progeny. Both of these crosses produce individuals or strains that contain different fractions of the genome of each parental line. The phenotype for each of these recombinant individuals or lines is assessed, as is the genotype of markers that vary between the parental strains.

c) Statistical techniques such as composite interval mapping evaluate the probability that a marker or an interval between two markers is associated with a QTL affecting the trait, while simultaneously controlling for the effects of other markers on the trait. The results of such an analysis are presented as a plot of the test statistic against the chromosomal map position, in recombination units (cM). Positions of the markers are shown as triangles. The horizontal line marks the significance threshold. Likelihood ratios above this line are formally significant, with the best estimate of QTL positions given by the chromosomal position corresponding to the highest significant likelihood ratio. Thus, the figure shows five possible QTL, with the best-supported QTL around 10 and 60 cM.

https://www.nature.com/scitable/topicpage/quantitative-trait-locus-qtl-analysis-53904

researchers

1. pig

MIKAWA Satoshi (Dr. 美川智)

https://researchmap.jp/read0080334/

2. phylogeny

高芳銮

https://user.qzone.qq.com/58001704/main

http://blog.sciencenet.cn/home.php?mod=space&uid=460481

plink

# convert a .snp genotype file to bed/bim/fam
plink --23file JPT-NA19001.snp JPT ID002 --out JPT-NA19001

# remove problematic SNPs
plink --bfile JPT-NA19001 --exclude merge.missnp --make-bed --out new

# merge with a single fileset
plink --bfile source1 --bmerge source2_trial --make-bed --out merged_trial

# merge multiple filesets
plink --merge-list merge_list --make-bed --out merge
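The merge_list file for the last command has to be written by hand or by a script. The sketch below writes one and assembles the command without running it. It assumes that a line holding a single name is read as the prefix of a binary (bed/bim/fam) fileset, which is how I remember the PLINK 1.9 documentation; check that before relying on it. The extra sample prefixes and both helper names are hypothetical.

```python
import os
import tempfile


def write_merge_list(prefixes, path):
    """Write one fileset prefix per line, the layout assumed for --merge-list."""
    with open(path, "w", encoding="utf-8") as handle:
        for prefix in prefixes:
            handle.write(prefix + "\n")


def merge_command(list_path, out_prefix):
    """Build the argument list for the multi-file merge command."""
    return ["plink", "--merge-list", list_path, "--make-bed", "--out", out_prefix]


# Hypothetical fileset prefixes, for illustration only.
prefixes = ["JPT-NA19001", "JPT-NA19002", "JPT-NA19003"]
list_path = os.path.join(tempfile.mkdtemp(), "merge_list")
write_merge_list(prefixes, list_path)

command = merge_command(list_path, "merge")
print(" ".join(command))
```

The argument list can be handed to subprocess.run on a machine where plink is installed.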

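Programming philosophy

1. The essence of object-oriented programming is that each kind of data carries its own operations, so users need not understand how to manipulate a complex data structure; they only need to learn that data's interface.

2. Generic programming lets one algorithm be written once and applied to many types of data, so there is no need to overload the function again for each type.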
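Both points fit in a few lines of Python. This is only an illustration with made-up names (Stack, largest). Python has no function overloading to begin with, so the second half shows the idea with a type variable rather than the C++-style templates the note probably has in mind.

```python
from typing import Iterable, TypeVar

T = TypeVar("T")


class Stack:
    """Data that carries its own operations: callers use push and pop
    and never touch the list stored inside."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def __len__(self):
        return len(self._items)


def largest(items: Iterable[T]) -> T:
    """One algorithm, written once, for any element type that supports '>'."""
    iterator = iter(items)
    best = next(iterator)
    for item in iterator:
        if item > best:
            best = item
    return best


stack = Stack()
stack.push(1)
stack.push(2)
print(stack.pop(), len(stack))

print(largest([3, 1, 2]))
print(largest(["pear", "apple", "fig"]))
```

The same largest function handles integers and strings without a second definition, and code using Stack keeps working if the internal list is later swapped for another container.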