KEGG usage notes

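  1. The number of pathways for bta (Bos taurus) keeps growing, so mixing pathway data fetched in the past with data fetched now will cause errors;
  2. When batch-downloading images generated by KEGG Mapper, network problems can leave downloads incomplete. Always check that the number of files matches and that every image is complete;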
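The completeness check in note 2 can be automated. The Perl sketch below assumes the images were saved as PNG files named `<pathway ID>.png` in one directory (the naming scheme and the PNG format are my assumptions). It treats a file as complete only if it starts with the 8-byte PNG signature and ends with the fixed IEND trailer, which a truncated download usually lacks.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_END);
use File::Spec;

# Every PNG starts with this 8-byte signature ...
my $PNG_SIG  = "\x89PNG\r\n\x1a\n";
# ... and a complete one ends with the IEND chunk type plus its fixed CRC.
my $PNG_TAIL = "IEND\xAE\x42\x60\x82";

# True if $path exists and has both the PNG signature and the IEND trailer.
sub png_is_complete {
    my ($path) = @_;
    return 0 unless -f $path && -s $path >= 16;
    open(my $fh, '<:raw', $path) or return 0;
    read($fh, my $head, 8);
    seek($fh, -8, SEEK_END);
    read($fh, my $tail, 8);
    close($fh);
    return ($head eq $PNG_SIG && $tail eq $PNG_TAIL) ? 1 : 0;
}

# Return the expected IDs whose "<ID>.png" in $dir is missing or truncated.
sub find_bad_downloads {
    my ($dir, @ids) = @_;
    return grep { !png_is_complete(File::Spec->catfile($dir, "$_.png")) } @ids;
}
```

Usage: `my @redo = find_bad_downloads('kegg_png', @expected_ids);` returns the IDs to download again; the directory name and `@expected_ids` are hypothetical. Comparing against the expected ID list covers the "number of files" check, and the signature/trailer test covers the "image is complete" check.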

KEGG ORTHOLOGY (KO) Database

In KEGG, molecular-level functions are stored in the KO (KEGG Orthology) database and associated with ortholog groups in order to enable extension of experimental evidence in a specific organism to other organisms. Genome annotation in KEGG is ortholog annotation, assigning KO identifiers (K numbers) to individual genes in the GENES database. No updates are made to original data, such as gene names and descriptions given by RefSeq or GenBank, even if they are inconsistent with the KO assignment.

Major efforts have been initiated to associate each KO entry with experimental evidence of functionally characterized sequence data, now shown in the SEQUENCE subfield of the REFERENCE field. Furthermore, the genome-based collection of KEGG GENES (http://www.genome.jp/kegg/genes.html) has been expanded to allow individual protein data to be included in the addendum category. Eventually the KO database will cover all knowledge on functionally characterized protein sequences (see also KEGG Enzyme, http://www.genome.jp/kegg/annotation/enzyme.html).
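Because annotation means attaching K numbers to genes, genes from different organisms that receive the same K number fall into one ortholog group. The Perl sketch below groups genes that way. It assumes a two-column, tab-separated "gene ID, K number" file with the second column empty for unassigned genes (roughly the shape of BlastKOALA's assignment output, but treat that format as an assumption); the gene IDs in any example input are made up.

```perl
use strict;
use warnings;

# Read "gene<TAB>K number" lines (unassigned genes have no K number)
# and group gene IDs by K number, i.e. by KEGG ortholog group.
sub group_by_ko {
    my ($fh) = @_;
    my %genes_of;                       # K number => [gene IDs]
    while (my $line = <$fh>) {
        chomp $line;
        my ($gene, $ko) = split /\t/, $line;
        # Skip unassigned genes and anything that is not a K number.
        next unless defined $ko && $ko =~ /^K\d{5}$/;
        push @{ $genes_of{$ko} }, $gene;
    }
    return \%genes_of;
}
```

Any experimental evidence attached to one K number then applies, by inference, to every gene listed under it.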

In general, KO grouping of functional orthologs is defined in the context of KEGG molecular networks (KEGG pathway maps, BRITE hierarchies and KEGG modules), which are in fact represented as networks of nodes identified by K numbers. The relationships between KOs and the corresponding molecular networks are represented in the following KO system:

KEGG Orthology (KO)

The fact that functional information is associated with ortholog groups is a unique aspect of the KEGG resource. Sequence-similarity-based inference, as a generalization of a limited amount of experimental evidence, is predefined in KEGG. As implemented in BlastKOALA and other tools, a sequence similarity search against KEGG GENES is a search for the most appropriate K numbers. Once K numbers are assigned to genes in the genome, the KEGG pathway maps, BRITE hierarchies, and KEGG modules are automatically reconstructed, enabling biological interpretation of high-level functions.
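The reconstruction step works because pathway nodes are K numbers: once a genome's genes carry K numbers, each pathway can be scored by how many of its K numbers are present. The Perl sketch below assumes a KO-to-pathway link file with lines like `ko:K00844<TAB>path:map00010` (I believe this matches the KEGG REST `link` output, but check it against the current API); it is an illustration of the idea, not KEGG's own reconstruction algorithm.

```perl
use strict;
use warnings;

# Build KO => [pathway IDs] from lines like "ko:K00844\tpath:map00010".
sub read_ko_pathway_links {
    my ($fh) = @_;
    my %paths_of;
    while (my $line = <$fh>) {
        chomp $line;
        my ($ko, $path) = split /\t/, $line;
        next unless defined $path;
        $ko   =~ s/^ko://;              # strip database prefixes
        $path =~ s/^path://;
        push @{ $paths_of{$ko} }, $path;
    }
    return \%paths_of;
}

# Count how many distinct assigned K numbers fall in each pathway.
sub pathway_coverage {
    my ($paths_of, @assigned_kos) = @_;
    my (%seen, %count);
    for my $ko (grep { !$seen{$_}++ } @assigned_kos) {
        $count{$_}++ for @{ $paths_of->{$ko} || [] };
    }
    return \%count;
}
```

Duplicate K numbers (several paralogs with the same assignment) are counted once per pathway, since the question is which network nodes are present, not how many genes map to them.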

DAVID/DAVID-WS tips

DAVID-WS (web service) has been developed to automate user tasks by providing stateful web services to access DAVID programmatically without the need for human interaction. [1]

DAVID-WS is made stateful by keeping the state-related input of a user operation in a session context that can be accessed by subsequent user operations within the same session. Users can add lists, change background populations, select species and categories, and reset functional parameters for data analysis, as well as query all tools within the same session and format output as desired. [1]
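To make "stateful" concrete: each call stores its input in the session, and later calls read that stored state instead of receiving it again. The Perl class below is a purely hypothetical toy model of that idea; its name and methods are invented for illustration and are NOT the DAVID-WS API, which you would call through its SOAP interface.

```perl
use strict;
use warnings;

package ToySession;   # hypothetical toy model, NOT the DAVID-WS API

sub new {
    my ($class) = @_;
    return bless { lists => [], background => undef, species => undef }, $class;
}

# Each setter only records state in the session object.
sub add_list       { my ($self, @ids) = @_; push @{ $self->{lists} }, [@ids]; return scalar @{ $self->{lists} } }
sub set_background { my ($self, @ids) = @_; $self->{background} = [@ids]; return $self }
sub set_species    { my ($self, $sp)  = @_; $self->{species} = $sp; return $self }

# A later "analysis" call reuses the stored state without new arguments.
sub summary {
    my ($self) = @_;
    my $last = $self->{lists}[-1] || [];
    my $bg   = $self->{background} ? scalar @{ $self->{background} } : 0;
    return sprintf("%d genes vs %d background (%s)",
                   scalar @$last, $bg, $self->{species} // 'any');
}

package main;
```

In a stateless design, every analysis call would have to resend the gene list, background and species; keeping them server-side per session is what lets a script change one setting and rerun a tool.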

[1] Jiao, X., Sherman, B.T., Huang da, W., Stephens, R., Baseler, M.W., Lane, H.C., and Lempicki, R.A. (2012). DAVID-WS: a stateful web service to facilitate gene/protein list analysis. Bioinformatics 28, 1805–1806.

Perl notes and tips

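  1. Location information
    1. Where the (child) script file itself lives: /home/wangyu/
    use File::Spec;
    my $path_curf = File::Spec->rel2abs(__FILE__);
    my ($vol, $dirs, $file) = File::Spec->splitpath($path_curf);
    2. The directory the (main) script was launched from: /home/wangyu/code
    $ENV{'PWD'}
    3. Where the program has currently chdir'ed to: /lustre/Work
    `pwd` (the result ends with a newline; Cwd::getcwd() returns the same path without it)
    Explanation:
    1. a.pl calls b.pl, so a.pl is the main script and b.pl is the child script;
    2. a.pl is in /home/wangyu/code/perl, b.pl is in /home/wangyu;
    3. after a chdir to /lustre/Work, a.pl calls b.pl, and inside b.pl the path is determined with each of the three methods above.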
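  2. perl -d: enable the debugger
  3. On Windows, to point to a local file from HTML: "file:///path_to_the_file" (inside a Perl m// or s/// pattern the slashes must be escaped as file:\/\/\/);
  4. Before calling split on data read from a file, chomp it first;
    otherwise the trailing newline of each line ends up in the last field.
    A real consequence: if a variable $var contains a newline and I use it in system "gzip -d -c $var > filename", everything after $var in the command has no effect, because the newline acts like pressing Enter right after $var.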
  5. Installation:
    perl -MCPAN -e shell
    install SOAP::Lite
  6. "Your Perl is configured to link against libgdbm, but libgdbm.so was not found.": aptitude install libgdbm-dev
  7. Please tell me where I can find your apache src:
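  8. Rounding: int($number + 0.5), which is only correct for $number >= 0 because int() truncates toward zero
  9. Unquoted string ".." may clash with future reserved word
    I got this warning because my bareword filehandle was lowercase while warnings were on. Use uppercase for bareword filehandles, as Perl's developers intend, or avoid barewords altogether with a lexical filehandle: open(my $fh, '<', $file).
  10. $$: the process ID of the running script;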
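  11. Perl one-liner: edit file contents in place
     perl -p -i -e 's/from/to/' *.file

    -p: loop over every line and print it after running the code (-n: loop without printing)
    -i: edit the files in place; an optional suffix keeps a backup (e.g. -i.bak), and without a suffix the original file is overwritten
    -e: the Perl code to run; several statements can be separated by semicolons, and special variables such as the line counter $. are available
    *.file: the files to modify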

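  12. Back up and reinstall installed modules
    # write a snapshot bundle of all installed modules; here it was saved as
    # /home/nott/.cpan/Bundle/Snapshot_2017_03_10_00.pm
    perl -MCPAN -e autobundle

    # reinstall everything listed in that snapshot
    perl -MCPAN -e 'install Bundle::Snapshot_2017_03_10_00'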
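  13. Alternation in a regex: /(\.snp\.gz|\.snp\.tar\.gz|\.snp)/ matches any of the three suffixes, and the alternative that matched is saved in $1 (escape the dots: an unescaped . matches any character)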
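Item 1 can be collected into one small script. This is a sketch with my own choices: the helper names launch_dir and current_dir are made up, and getcwd() from the core Cwd module stands in for backticked `pwd`. The script directory is resolved once at load time, because rel2abs() resolves a relative __FILE__ against the current directory and would be wrong after a chdir.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Spec;
use File::Basename qw(dirname);

# (1) Directory holding this script file. Resolve it once, before any
#     chdir, since rel2abs() works relative to the current directory.
my $SCRIPT_DIR = dirname(File::Spec->rel2abs(__FILE__));

# (2) Directory the user launched from: the shell's PWD variable.
#     Perl's built-in chdir does not update it; child processes inherit it.
sub launch_dir { return $ENV{PWD} }

# (3) Where the process is right now; follows every chdir.
#     Unlike `pwd` in backticks, getcwd() has no trailing newline.
sub current_dir { return getcwd() }
```

This reproduces the three answers from the a.pl/b.pl example: inside b.pl, $SCRIPT_DIR is /home/wangyu, launch_dir() is /home/wangyu/code and current_dir() is /lustre/Work.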
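For item 4, a minimal sketch of both fixes: chomp before split, and, for the gzip case, passing the command as a list so no shell parses it. The sub names are hypothetical; the list form of open with '-|' is standard Perl, and it means a stray newline in the file name can no longer cut the command short.

```perl
use strict;
use warnings;

# Split one tab-separated input line; chomp first so the newline
# does not end up glued to the last field.
sub fields_of {
    my ($line) = @_;
    chomp $line;
    return split /\t/, $line;
}

# Decompress $gz into $out without building a shell command string.
sub gunzip_to {
    my ($gz, $out) = @_;
    open(my $in,  '-|', 'gzip', '-d', '-c', $gz) or die "cannot run gzip: $!";
    open(my $dst, '>', $out)                     or die "cannot write $out: $!";
    binmode($in);
    binmode($dst);
    print {$dst} $_ while <$in>;
    close($in) or die "gzip failed on $gz";
    close($dst);
}
```

Chomping the file name is still worthwhile, but the list form removes the whole class of problems caused by shell metacharacters in data.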
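Item 8's int($number + 0.5) breaks for negative numbers: int(-2.7 + 0.5) is int(-2.2), which truncates to -2 instead of -3. A small sketch that rounds halves away from zero for both signs follows; sprintf("%.0f", $n) is the other common choice, but it typically rounds exact halves to even.

```perl
use strict;
use warnings;

# Round to the nearest integer, halves away from zero.
# int() truncates toward zero, so handle negatives by symmetry.
sub round_half_away {
    my ($n) = @_;
    return $n >= 0 ? int($n + 0.5) : -int(-$n + 0.5);
}
```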
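The one-liner in item 11 has a script equivalent: localizing $^I turns on in-place editing for the <> operator, as in the perlfaq5 idiom. The sub name replace_in_files is hypothetical; it takes a compiled pattern so it behaves like s/from/to/ on each line and keeps a .bak backup, like -i.bak.

```perl
use strict;
use warnings;

# Script equivalent of: perl -p -i.bak -e 's/from/to/' FILES
sub replace_in_files {
    my ($re, $to, @files) = @_;
    return unless @files;               # empty @ARGV would make <> read STDIN
    local ($^I, @ARGV) = ('.bak', @files);
    while (my $line = <>) {
        $line =~ s/$re/$to/;
        print $line;                    # goes to the file being rewritten
        close ARGV if eof;              # reset $. for each file
    }
}
```

Usage: `replace_in_files(qr/from/, 'to', glob('*.file'));`. Setting $^I to '' instead of '.bak' overwrites the files without a backup, matching plain -i.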
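For item 13, a minimal sketch with the dots escaped and, as my own addition, the pattern anchored with $ so only a trailing suffix counts. The sub name snp_suffix is hypothetical; the longest alternative is listed first for readability, although with the anchor the order does not change the result.

```perl
use strict;
use warnings;

# Return the SNP-file suffix at the end of $name, or undef if none.
sub snp_suffix {
    my ($name) = @_;
    return $name =~ /(\.snp\.tar\.gz|\.snp\.gz|\.snp)$/ ? $1 : undef;
}
```

Without escaping, a name like s1_snp_gz would also match, because each unescaped . matches the underscore.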