Network connection basics

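  1. What is a subnet mask?
    1. A bit mask that indicates which bits of an IP address identify the subnet the host is on, and which bits identify the host itself.
  2. What is a default gateway?
    1. The IP address of the machine (usually the router) through which hosts on the LAN send and receive traffic to and from other networks.
  3. What is a DHCP server?
    1. A machine that automatically assigns IP addresses to the computers on the LAN, following the Dynamic Host Configuration Protocol (DHCP).
  4. What is NAT?
    1. Network Address Translation: the relay between the internal network and the external network, which translates internal (private) addresses into external (public) ones.
  5. What is DNS?
    1. The Domain Name System: a database that maps domain names to IP addresses and back.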
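To make the subnet-mask definition concrete, here is a minimal sketch using Python's standard `ipaddress` module. The address 192.168.1.37 with mask 255.255.255.0 is a hypothetical example, and `split_address` is an illustrative helper name. ANDing the address with the mask keeps the subnet bits; ANDing it with the inverted mask keeps the host bits.

```python
import ipaddress


def split_address(addr_with_mask: str) -> tuple:
    """Split 'IP/mask' into (network address, host number) using the subnet mask."""
    iface = ipaddress.ip_interface(addr_with_mask)
    ip = int(iface.ip)
    mask = int(iface.netmask)
    network = ipaddress.ip_address(ip & mask)  # bits covered by the mask: the subnet
    host = ip & ~mask                          # bits outside the mask: the host
    return str(network), host


# Hypothetical example address and mask.
print(split_address("192.168.1.37/255.255.255.0"))  # ('192.168.1.0', 37)
```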

DAVID — web service

  1. Data annotated with the tool (Perl version) can only be generated from the old database, not the new one (still unsolved as of 20160710).

Windows 10 installation procedure

  1. Go to https://www.microsoft.com/en-us/software-download/windows10
  2. Download the "Media Creation Tool".
  3. Run the Media Creation Tool.
  4. Select "Create installation media for another PC".
  5. Download the .iso file.

General procedure for reinstalling Windows

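  1. Install U深度 (UShenDu), a tool for making bootable USB drives: http://down.ushendu.com/20160514/UShenDu_STA_pe.exe
  2. Launch the program.
  3. Choose the configuration: smart mode, HDD-FAT32, compatibility mode, default settings.
  4. Select the USB flash drive and click the start button (the whole process may take 5 to 10 minutes).
  5. Put the previously downloaded system image into the GHO folder.
  6. Restart the computer and choose to boot from the USB drive.
  7. Select "Run U深度 Win03PE2013 Enhanced Edition" (运行U深度Win03PE2013增强版).
  8. After WinPE starts, the "U深度 Installation Tool" (U深度装机工具) dialog pops up automatically and detects the Ghost system image stored on the USB drive earlier. (A Ghost system is an image cloned from an installed operating system with Ghost, made by Symantec Corporation. Ghost is typically used to back up an operating system and to restore it when the system cannot boot normally.)
  9. Select the hard-disk partition to install to and click OK.
  10. The software then extracts the files to the selected partition and restarts the computer to begin the installation.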

LibreOffice Calc — common functions

  1. or: returns TRUE if at least one argument is TRUE
    or(logical value 1, logical value 2, ...)
  2. isna: returns TRUE if the value is the error value #N/A
    isna(value)
  3. isblank: returns TRUE if the value refers to an empty cell
    isblank(value)
  4. not: reverses (negates) the logical value of the argument
    not(logical value)
  5. na: "not available"; returns the error value #N/A
    na()

Download and install a Google Chrome extension (.crx file)

  1. Open http://yurl.sinaapp.com/crx.php
  2. Enter the 32-character extension ID; for example:
    To download the extension at "https://chrome.google.com/webstore/detail/proxy-switchyomega/padekgcemlokbadohgkifijomclgjgif", its 32-character extension ID is "padekgcemlokbadohgkifijomclgjgif".
  3. Drag the downloaded .crx file onto the Chrome extensions page.
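Step 2 amounts to taking the last path segment of the Web Store URL. A minimal Python sketch, assuming the ID is always that last segment and is made of 32 letters from a to p (the usual Chrome extension ID alphabet); `extension_id_from_url` is a hypothetical helper name:

```python
import re
from urllib.parse import urlparse

# Chrome extension IDs are 32 characters, each a lowercase letter from a to p.
_EXTENSION_ID = re.compile(r"[a-p]{32}")


def extension_id_from_url(url: str) -> str:
    """Return the extension ID, assumed to be the last path segment of a Web Store URL."""
    last_segment = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
    if not _EXTENSION_ID.fullmatch(last_segment):
        raise ValueError(f"no 32-character extension ID at the end of {url!r}")
    return last_segment


url = "https://chrome.google.com/webstore/detail/proxy-switchyomega/padekgcemlokbadohgkifijomclgjgif"
print(extension_id_from_url(url))  # padekgcemlokbadohgkifijomclgjgif
```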

Gene-Gene/Protein-Protein Interaction Network — Tools

  1. Cytoscape, with the Agilent Literature Search plugin
  2. STRING:
    The many functional partnerships and interactions that occur between proteins are at the core of cellular processing and their systematic characterization helps to provide context in molecular systems biology. However, known and predicted interactions are scattered over multiple resources, and the available data exhibit notable differences in terms of quality and completeness. The STRING database (http://string-db.org) aims to provide a critical assessment and integration of protein–protein interactions, including direct (physical) as well as indirect (functional) associations. The new version 10.0 of STRING covers more than 2000 organisms, which has necessitated novel, scalable algorithms for transferring interaction information between organisms. For this purpose, we have introduced hierarchical and self-consistent orthology annotations for all interacting proteins, grouping the proteins into families at various levels of phylogenetic resolution. Further improvements in version 10.0 include a completely redesigned prediction pipeline for inferring protein–protein associations from co-expression data, an API interface for the R computing environment and improved statistical analysis for enrichment tests in user-provided networks.
  3. 1

Ubuntu Desktop Entry

System-wide: /usr/share/applications

User-only: ~/.local/share/applications

[Desktop Entry]
Version=1.0
Name=TopCoder Arena
Exec=javaws /path_to_the_file/ContestAppletProd.jnlp
Terminal=false
Icon=/path_to_the_file/Topcoder_arena.png
Type=Application
Categories=Development
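If you create such entries often, they can be generated from a script. A minimal Python sketch that writes the same keys as the example entry; `write_desktop_entry` and all argument values are hypothetical, and it only writes the file (no validation against the desktop entry specification):

```python
from pathlib import Path


def write_desktop_entry(directory, file_name, name, exec_cmd, icon, categories="Development"):
    """Write a minimal .desktop file into `directory` and return its path."""
    lines = [
        "[Desktop Entry]",
        "Version=1.0",
        f"Name={name}",
        f"Exec={exec_cmd}",
        "Terminal=false",
        f"Icon={icon}",
        "Type=Application",
        f"Categories={categories}",
    ]
    target = Path(directory).expanduser() / file_name
    target.parent.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target
```

For a user-only entry, pass "~/.local/share/applications" as `directory` and a file name ending in .desktop, for example "topcoder-arena.desktop" (a hypothetical name).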

GNOME/Unity menu editor: menulibre

Linux — Installation of Oracle Java JDK/JRE

  1. Check the OS architecture (32-bit or 64-bit):

    file /lib/systemd/systemd

  2. Check whether Java is already installed:

    java -version

    If OpenJDK is installed, uninstall it first:

    sudo apt-get purge openjdk-\*

  3. Create a directory to hold the Oracle Java JDK binaries:

    sudo mkdir -p /usr/local/oracle-java

  4. Download Oracle Java JDK for Linux:
    link: http://www.oracle.com/technetwork/java/javase/downloads/index.html
  5. Copy the downloaded archive into the directory we created:

    sudo cp jdk-8u91-linux-x64.tar.gz /usr/local/oracle-java/
    cd /usr/local/oracle-java/

  6. Unpack the downloaded archive:

    sudo tar xvzf jdk-8u91-linux-x64.tar.gz

  7. Edit the system-wide profile file /etc/profile and add the following variables to the system path:

    sudo vim /etc/profile

    Add the following lines to the end of /etc/profile, then save and exit:

    JAVA_HOME=/usr/local/oracle-java/jdk1.8.0_91
    PATH=$PATH:$JAVA_HOME/bin
    export JAVA_HOME
    export PATH

  8. Tell the OS where the Oracle Java JDK is located:

    sudo update-alternatives --install "/usr/bin/java" "java" "/usr/local/oracle-java/jdk1.8.0_91/bin/java" 1
    sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/local/oracle-java/jdk1.8.0_91/bin/javac" 1
    sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/local/oracle-java/jdk1.8.0_91/bin/javaws" 1

  9. Tell the OS that Oracle Java JDK should be the default Java:

    sudo update-alternatives --set java /usr/local/oracle-java/jdk1.8.0_91/bin/java
    sudo update-alternatives --set javac /usr/local/oracle-java/jdk1.8.0_91/bin/javac
    sudo update-alternatives --set javaws /usr/local/oracle-java/jdk1.8.0_91/bin/javaws

  10. Reload the system-wide PATH from /etc/profile:

    source /etc/profile

  11. Check that Java is installed:

    java -version
    javac -version

  12. Done: Java is installed.
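The check in step 11 can also be scripted. A minimal Python sketch, assuming the usual `java -version` banner of the form `java version "1.8.0_91"` (OpenJDK prints `openjdk version "..."`); `parse_java_version` and `installed_java_version` are hypothetical helper names:

```python
import re
import subprocess


def parse_java_version(output: str):
    """Return the quoted version string (e.g. '1.8.0_91') from `java -version` output."""
    match = re.search(r'version "([^"]+)"', output)
    return match.group(1) if match else None


def installed_java_version():
    """Run `java -version` and return its version string, or None if java is not on PATH."""
    try:
        # `java -version` writes its banner to stderr, not stdout.
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    return parse_java_version(result.stderr + result.stdout)


if __name__ == "__main__":
    print(installed_java_version())  # None if java is not installed
```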

Common Linux permissions

Digit  Readable?  Writable?  Executable?
7      1          1          1
6      1          1          0
5      1          0          1
4      1          0          0
3      0          1          1
2      0          1          0
1      0          0          1
0      0          0          0

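4 = readable, 2 = writable, 1 = executable; each digit is the sum of the permissions it grants (e.g. 7 = 4 + 2 + 1).
1 = yes, 0 = no

Commonly used:

750, 755 (the three digits give the permissions of the owner, the group, and others, in that order)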
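The digit-to-permission table can be expressed as bit tests. A minimal Python sketch that turns a three-digit mode into the rwx form shown by `ls -l`; `mode_to_rwx` is a hypothetical helper name:

```python
def mode_to_rwx(mode: str) -> str:
    """Convert an octal mode such as '750' into its rwx form, e.g. 'rwxr-x---'."""
    parts = []
    for digit in mode:  # one digit each for owner, group, others
        value = int(digit, 8)  # raises ValueError for digits 8 and 9
        parts.append("r" if value & 4 else "-")  # 4 = readable
        parts.append("w" if value & 2 else "-")  # 2 = writable
        parts.append("x" if value & 1 else "-")  # 1 = executable
    return "".join(parts)


for mode in ("750", "755"):
    print(mode, mode_to_rwx(mode))  # 750 rwxr-x---, then 755 rwxr-xr-x
```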

Can a file be writable but not readable?

Common Ubuntu problems

  1. Garbled .txt files (text created on Windows and opened on Linux)
    Fix it with iconv.
    The iconv program reads in text in one encoding and outputs the text in
    another encoding. If no input files are given, or if it is given as a
    dash (-), iconv reads from standard input. If no output file is given,
    iconv writes to standard output.
    iconv [options] [-f from-encoding] [-t to-encoding] [inputfile] > [outputfile]
    iconv -f gb2312 -t utf8 [inputfile] -o [outputfile]
  2. Perl programs written on Windows do not run on Linux
    Fix it with dos2unix.
    The Dos2unix package includes utilities "dos2unix" and "unix2dos" to
    convert plain text files in DOS or Mac format to Unix format and vice
    versa.
    In DOS/Windows text files a line break, also known as newline, is a
    combination of two characters: a Carriage Return (CR) followed by a
    Line Feed (LF). In Unix text files a line break is a single character:
    the Line Feed (LF). In Mac text files, prior to Mac OS X, a line break
    was single Carriage Return (CR) character. Nowadays Mac OS uses Unix
    style (LF) line breaks.
    Besides line breaks Dos2unix can also convert the encoding of files. A
    few DOS code pages can be converted to Unix Latin-1. And Windows
    Unicode (UTF-16) files can be converted to Unix Unicode (UTF-8) files.
    dos2unix [options] [FILE ...] [-n INFILE OUTFILE ...]
  3. Network service discovery disabled
    Your current network has a .local domain, which is not recommended and incompatible with the Avahi network service discovery. The service has been disabled.

    sudo vim /etc/default/avahi-daemon

    Change the parameter below from 1 to 0:

    AVAHI_DAEMON_DETECT_LOCAL=0

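  4. Not enough space on /boot:
    /boot is where the kernels are stored, so the fix is to remove old kernels that are no longer needed. The procedure:
    1. Find the version of the kernel currently in use:

    uname -a

    2. List the kernels that have been installed:

    sudo dpkg --get-selections | grep linux-

    3. Remove the extra kernels:
    run sudo apt-get purge followed by two kinds of packages, "linux-headers" and "linux-image", which come in pairs. Do not remove the kernel currently in use.
    4. Clean up packages marked "deinstall" (this is a compound command: it first collects the names of packages marked deinstall, then purges them):

    sudo dpkg --purge `dpkg --get-selections | grep deinstall | cut -f 1`

    5. Update GRUB:

    sudo update-grub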

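  5. While user A is editing file.txt, user B can still append characters to file.txt successfully. The exact order in which the writes take effect remains to be investigated.
  6. Install the latest NVIDIA driver
    On 20170312 I tried to install NVIDIA-Linux-x86_64-375.39.run, but it failed; searching online, I found that someone had reported the same problem on 20170228. Installing the latest NVIDIA driver on Ubuntu can be problematic because Ubuntu and NVIDIA are separate organizations: Ubuntu cannot obtain NVIDIA's source code, so it can only make the NVIDIA driver compatible through modification and debugging, or let users use drivers developed by the Ubuntu community. The Software Center offers NVIDIA driver version 367.57.
  7. Change the default boot kernel
    sudo vim /etc/default/grub

    # 0 boots the first menu entry; to pin a specific kernel, give its menu path instead:
    GRUB_DEFAULT=0
    #GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 4.10.0-041000-generic"

    sudo update-grub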
  8. foobar
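Items 1 and 2 above (re-encoding with iconv and converting line breaks with dos2unix) can also be done in one pass from Python. A minimal sketch, assuming the Windows file is GB2312-encoded, as in the iconv example, and uses CRLF line breaks; it covers only those two conversions, not everything iconv and dos2unix support, and both function names are hypothetical:

```python
from pathlib import Path


def windows_text_to_unix(data: bytes, source_encoding: str = "gb2312") -> bytes:
    """Decode Windows-created text, turn CRLF line breaks into LF, and re-encode as UTF-8."""
    text = data.decode(source_encoding)  # like iconv -f gb2312
    text = text.replace("\r\n", "\n")    # like dos2unix, CRLF -> LF only
    return text.encode("utf-8")          # like iconv -t utf8


def convert_file(src: str, dst: str, source_encoding: str = "gb2312") -> None:
    """Write a converted copy of src to dst, leaving src untouched."""
    Path(dst).write_bytes(windows_text_to_unix(Path(src).read_bytes(), source_encoding))
```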

RNA-seq — wikipedia

RNA-seq (RNA sequencing), also called whole transcriptome shotgun sequencing (WTSS), uses next-generation sequencing (NGS) to reveal the presence and quantity of RNA in a biological sample at a given moment in time.

RNA-Seq is used to analyze the continually changing cellular transcriptome. Specifically, RNA-Seq facilitates the ability to look at alternative gene spliced transcripts, post-transcriptional modifications, gene fusion, mutations/SNPs and changes in gene expression. In addition to mRNA transcripts, RNA-Seq can look at different populations of RNA to include total RNA, small RNA, such as miRNA, tRNA, and ribosomal profiling. RNA-Seq can also be used to determine exon/intron boundaries and verify or amend previously annotated 5' and 3' gene boundaries.

Prior to RNA-Seq, gene expression studies were done with hybridization-based microarrays. Issues with microarrays include cross-hybridization artifacts, poor quantification of lowly and highly expressed genes, and the need to know the sequence in advance. Because of these technical issues, transcriptomics transitioned to sequencing-based methods. These progressed from Sanger sequencing of Expressed Sequence Tag libraries, to chemical tag-based methods (e.g., serial analysis of gene expression), and finally to the current technology, NGS of cDNA (notably RNA-Seq).

Ensembl and Entrez

Ensembl

The Ensembl project was started in 1999, some years before the draft human genome was completed. Even at that early stage it was clear that manual annotation of 3 billion base pairs of sequence would not be able to offer researchers timely access to the latest data. The goal of Ensembl was therefore to automatically annotate the genome, integrate this annotation with other available biological data and make all this publicly available via the web.

Entrez

The single search engine of NCBI. Entrez is not a gene database; it is the name of the NCBI infrastructure that provides access to all of the NCBI databases. One of those is the Gene database, so you would say "Entrez Gene".

There is not necessarily a one-to-one mapping between Entrez Gene and Ensembl Gene IDs.

IDs can be converted between the two by aligning protein or nucleotide sequences, but the result may not be unique, because several alignments may have very high similarity.

https://www.biostars.org/p/16505/

Ubuntu software

  1. Fcitx: an input method framework with extension support, which provides an interface for entering characters of different scripts in applications using a variety of mapping systems.
    sudo apt-get install fcitx-table-wbpy
  2. baobab (Disk Usage Analyzer) and synaptic (graphical package manager):
    sudo apt-get install baobab synaptic
  3. Upgrade Ubuntu:
    sudo update-manager -c -d
  4. Install Oracle Java:
    http://www.oracle.com/technetwork/java/javase/downloads/index.html
  5. Wine 1.8:
    sudo add-apt-repository ppa:ubuntu-wine/ppa
    sudo apt-get update
    sudo apt-get install wine1.8
  6. Drawing tool: KolourPaint
    sudo apt-get install kolourpaint4
  7. Process viewer: htop
    an interactive process viewer for Unix systems.

    sudo apt-get install htop
  8. Indicator-SysMonitor
    sudo add-apt-repository ppa:fossfreedom/indicator-sysmonitor
    sudo apt-get update
    sudo apt-get install indicator-sysmonitor
  9. Move launcher to bottom
    gsettings set com.canonical.Unity.Launcher launcher-position Bottom
  10. foobar

DAVID new version: 6.8

DAVID 6.8 (current beta release), May 2016

- The DAVID Knowledgebase completely rebuilt
- Entrez Gene integrated as the central identifier to allow for more timely updates, while still incorporating Ensembl and UniProt as integral data sources
- New GO category (GO Direct) provides GO mappings directly annotated by the source database (no parent terms included)
- New annotation categories
- New list identifier systems added for list uploading and conversion
- A few bugs fixed

Division of Medical Sciences at Harvard Medical School (DMS)

Bioinformatics and Integrative Genomics (BIG)

The Bioinformatics and Integrative Genomics (BIG) Program enrolls PhD students with exceptional training in quantitative sciences and strong interest in biomedical applications. Research areas encompass computational analysis and mathematical modeling of data generated by DNA sequence, gene expression, structural, proteomics, and metabolite-assaying technologies. In applied projects, they may also include integration of clinical and population data from electronic health records. Both bioinformatics and genomics are tightly linked to the mathematical and biophysical modeling of complex biological systems and experimental validation of computational predictions. Graduate students will conduct original research in the development of novel approaches and new technologies to address fundamental biological questions, and they will acquire the skills to be leaders in the field of bioinformatics and genomics. Students will be joint members of BIG and a "home program" chosen from one of the four DMS programs (BBS, Immunology, Neuroscience, Virology). BIG students will follow the curriculum and participate in activities of the home program, which will be supplemented with BIG programmatic and curricular offerings.

Terminology

1. Fold coverage

The theoretical "fold coverage" of a shotgun sequencing experiment:

<number of reads> * <read length> / <target size>
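A quick worked example of the formula in Python; the run (10 million 100-bp reads over a 5 Mb target) is hypothetical:

```python
def fold_coverage(num_reads: int, read_length: int, target_size: int) -> float:
    """Theoretical fold coverage: <number of reads> * <read length> / <target size>."""
    return num_reads * read_length / target_size


# Hypothetical run: 10 million 100-bp reads over a 5 Mb genome.
print(fold_coverage(10_000_000, 100, 5_000_000))  # 200.0
```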

2. Amplicon

An amplicon is a piece of DNA or RNA that is the source and/or product of natural or artificial amplification or replication events.

It can be formed using various methods, including polymerase chain reactions (PCR), ligase chain reactions (LCR), or natural gene duplication.

3. Whole genome mapping

A Whole Genome Map is a high-resolution, ordered, whole genome restriction map generated from single DNA molecules extracted from bacteria, yeast, or other fungi. Whole Genome Mapping is a novel technology with unique capabilities in the field of microbiology, with specific applications in the areas of Comparative Genomics, Strain Typing, and Whole Genome Sequence Assembly. Whole Genome Maps are generated de novo, independent of sequence information, require no amplification or PCR steps, and provide a comprehensive view of whole genome architecture. A Whole Genome Map is displayed in the MapCode pattern, where the vertical lines indicate the locations of restriction sites and the distance between the lines represents the restriction fragment size.

4. Radiation hybrid mapping

A theory is developed to predict marker retention and conditional retention or loss in radiation hybrids. Applied to multiple pairwise analysis of a human chromosome 21 data set, this theory fits much better than proposed alternatives and gives a physical map consistent with other evidence and robust with respect to errors of typing. Radiation hybrids have great promise to provide order and physical location at two levels of resolution, spanning the techniques of linkage and restriction fragments and not limited to polymorphic loci.

5. DNA barcoding

DNA barcoding is a taxonomic method that uses a short genetic marker in an organism's DNA to identify it as belonging to a particular species.

6. Metric space

In mathematics, a metric space is a set for which distances between all members of the set are defined. Those distances, taken together, are called a metric on the set.

7. Pseudometric space

In mathematics, a pseudometric space is a generalized metric space in which the distance between two distinct points can be zero.
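The two definitions can be made precise with the usual axioms for a distance function d on a set M, required for all x, y, z in M. A metric satisfies all four; a pseudometric drops the second, which is exactly what lets two distinct points be at distance zero. Non-negativity follows from the others, since 0 = d(x, x) ≤ d(x, y) + d(y, x) = 2d(x, y).

```latex
\begin{aligned}
&d(x, x) = 0                   && \text{(metric and pseudometric)}\\
&d(x, y) = 0 \implies x = y    && \text{(metric only)}\\
&d(x, y) = d(y, x)             && \text{(symmetry)}\\
&d(x, z) \le d(x, y) + d(y, z) && \text{(triangle inequality)}
\end{aligned}
```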

8. Pyrosequencing

Pyrosequencing is a method of DNA sequencing (determining the order of nucleotides in DNA) based on the "sequencing by synthesis" principle. It differs from Sanger sequencing in that it relies on the detection of pyrophosphate release on nucleotide incorporation, rather than chain termination with dideoxynucleotides. The desired DNA sequence can be determined from the light emitted upon incorporation of the next complementary nucleotide, because only one of the four possible nucleotides (A/T/C/G) is added and available at a time, so only one letter can be incorporated on the single-stranded template (the sequence to be determined). The intensity of the light determines whether there is more than one of these "letters" in a row. The previous nucleotide letter (one of the four possible dNTPs) is degraded before the next nucleotide letter is added for synthesis, which allows the next nucleotide(s) to be revealed through the resulting light intensity (if the nucleotide added was the next complementary letter in the sequence). This process is repeated with each of the four letters until the DNA sequence of the single-stranded template is determined.

9. n-gram (k-mer)

In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application. The n-grams typically are collected from a text or speech corpus.

 

DNA sequencing: base pairs

AGCTTCGA

1-mers: …, A, G, C, T, T, C, G, A, …

2-mers: …, AG, GC, CT, TT, TC, CG, GA, …

3-mers: …, AGC, GCT, CTT, TTC, TCG, CGA, …
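The sliding-window extraction in this example is a one-liner in Python; a sequence of length L has L - k + 1 k-mers. `kmers` is a hypothetical helper name:

```python
def kmers(seq: str, k: int) -> list:
    """Return all contiguous length-k substrings (k-mers) of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


for k in (1, 2, 3):
    print(k, kmers("AGCTTCGA", k))
# k = 3 gives ['AGC', 'GCT', 'CTT', 'TTC', 'TCG', 'CGA']
```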

10. Sequence space

In evolutionary biology, sequence space is a way of representing all possible sequences (for a protein, gene or genome).

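11. k-mer distance

1. l_i, l_j: the two sequences being compared

2. τ: a single k-mer, i.e. a subsequence of length k;

n_i(τ), n_j(τ): the number of times τ occurs among the k-mers of each of the two sequences.

3. k_{i,j}: the k-mer similarity of the two sequences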
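The notes above do not record the formula for k_{i,j}. One common choice, assumed here, is the fractional common k-mer count used for example by the MUSCLE aligner: k_{i,j} = Σ_τ min(n_i(τ), n_j(τ)) / (min(|l_i|, |l_j|) - k + 1), with distance 1 - k_{i,j}. A minimal Python sketch of that assumed form (function names are hypothetical):

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """n(τ): how many times each k-mer τ occurs in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_similarity(li: str, lj: str, k: int) -> float:
    """Assumed form: sum over shared τ of min(n_i(τ), n_j(τ)),
    divided by the number of k-mers in the shorter sequence."""
    if min(len(li), len(lj)) < k:
        raise ValueError("both sequences must be at least k long")
    ni, nj = kmer_counts(li, k), kmer_counts(lj, k)
    shared = sum(min(ni[t], nj[t]) for t in ni.keys() & nj.keys())
    return shared / (min(len(li), len(lj)) - k + 1)


def kmer_distance(li: str, lj: str, k: int) -> float:
    """Distance derived from the similarity: 1 - k_{i,j}."""
    return 1.0 - kmer_similarity(li, lj, k)
```

Note that two different sequences with identical k-mer counts get distance 0, so under this form the k-mer distance behaves like a pseudometric rather than a metric.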

12. Optical map (ordered restriction map)

Optical mapping is a technique for constructing ordered, genome-wide, high-resolution restriction maps from single, stained molecules of DNA, called "optical maps". By mapping the location of restriction enzyme sites along the unknown DNA of an organism, the spectrum of resulting DNA fragments collectively serves as a unique "fingerprint" or "barcode" for that sequence.

13. Restriction map

A restriction map is a map of known restriction sites within a sequence of DNA. Restriction mapping requires the use of restriction enzymes. In molecular biology, restriction maps are used as a reference to engineer plasmids or other relatively short pieces of DNA, and sometimes for longer genomic DNA.

14. Expressed sequence tag

An expressed sequence tag or EST is a short sub-sequence of a cDNA sequence. They may be used to identify gene transcripts, and are instrumental in gene discovery and gene sequence determination. The identification of ESTs has proceeded rapidly, with approximately 74.2 million ESTs now available in public databases (e.g. GenBank 1 January 2013, all species).

15. Multiple Sequence Alignment

A Multiple Sequence Alignment (MSA) is a sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. In many cases, the input set of query sequences is assumed to have an evolutionary relationship by which they share a lineage and are descended from a common ancestor. From the resulting MSA, sequence homology can be inferred and phylogenetic analysis can be conducted to assess the sequences' shared evolutionary origins. Visual depictions of the alignment illustrate mutation events such as point mutations (single amino acid or nucleotide changes) that appear as differing characters in a single alignment column, and insertion or deletion mutations (indels or gaps) that appear as hyphens in one or more of the sequences in the alignment. Multiple sequence alignment is often used to assess sequence conservation of protein domains, tertiary and secondary structures, and even individual amino acids or nucleotides.
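The column-wise reading described above (differing characters for point mutations, hyphens for indels) can be sketched in a few lines of Python. The toy alignment is hypothetical, and `classify_columns` only labels columns; it does not compute an alignment:

```python
def classify_columns(alignment: list) -> list:
    """Label each column of an MSA as 'conserved', 'substitution', or 'indel'."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    labels = []
    for column in zip(*alignment):
        if "-" in column:
            labels.append("indel")          # a gap in at least one sequence
        elif len(set(column)) == 1:
            labels.append("conserved")      # identical residue in every sequence
        else:
            labels.append("substitution")   # differing characters: point mutation
    return labels


# Hypothetical toy alignment of three sequences.
print(classify_columns(["ACGT-A", "ACGTTA", "AGGT-A"]))
```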

16. POA (Partial Order Alignment)

Partial order alignment (POA) has been proposed as a new approach to multiple sequence alignment (MSA), which can be combined with existing methods such as progressive alignment. This is important for addressing problems both in the original version of POA (such as order sensitivity) and in standard progressive alignment programs (such as information loss in complex alignments, especially surrounding gap regions).

17. Progressive Alignment

This approach begins with the alignment of the two most closely related sequences (as determined by pairwise analysis) and subsequently adds the next closest sequence or sequence group to this initial pair [37,7]. This process continues in an iterative fashion, adjusting the positioning of indels in all sequences. The major shortcoming of this approach is that a bias may be introduced in the inference of the ordered series of motifs (homologous parts) because of an overrepresentation of a subset of sequences.
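A toy Python sketch of only the ordering idea in this description: start from the closest pair, then repeatedly add the sequence closest to any sequence already included. The names and distances are hypothetical, and real progressive aligners usually build a guide tree (e.g. UPGMA or neighbor joining) and also perform the actual alignment at each step, which this sketch omits:

```python
from itertools import combinations


def progressive_order(names, dist):
    """Greedy order in which sequences would be added; dist(a, b) must be symmetric."""
    first = min(combinations(names, 2), key=lambda pair: dist(*pair))  # closest pair
    order = list(first)
    remaining = [n for n in names if n not in order]
    while remaining:
        # next: the sequence nearest to anything already in the growing alignment
        nxt = min(remaining, key=lambda n: min(dist(n, m) for m in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical pairwise distances between four sequences A-D.
D = {frozenset("AB"): 1, frozenset("AC"): 4, frozenset("AD"): 6,
     frozenset("BC"): 3, frozenset("BD"): 5, frozenset("CD"): 2}
print(progressive_order(["A", "B", "C", "D"], lambda a, b: D[frozenset((a, b))]))
```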

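18. Ribosomal small subunit (SSU)

The smaller of the two ribosomal subunits. Every ribosome is made up of one small subunit and one large subunit.[1] During translation, the small subunit is responsible for recognizing the information carried by the mRNA. The 70S ribosomes of prokaryotic cells, the 80S ribosomes in the cytoplasm of eukaryotic cells, and the mitochondrial ribosomes of eukaryotic cells each have a different small subunit: the 70S ribosome contains a 30S subunit, the 80S ribosome a 40S subunit, and the mitochondrial ribosome a 28S subunit.

Ribosome                      Large subunit                  Small subunit
Prokaryotic cells (70S)       50S (5S and 23S rRNA)          30S (16S rRNA)
Eukaryotic cytoplasm (80S)    60S (5S, 5.8S and 28S rRNA)    40S (18S rRNA)
Eukaryotic mitochondria       39S (16S rRNA, MT-RNR2)        28S (12S rRNA, MT-RNR1)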

19. Rare biosphere

The low-abundance, high-diversity group is what is now called the "rare biosphere".

20. Phred quality score

Phred quality scores were originally developed by the program Phred to help in the automation of DNA sequencing in the Human Genome Project. Phred quality scores are assigned to each nucleotide base call in automated sequencer traces.[1][2] Phred quality scores have become widely accepted to characterize the quality of DNA sequences, and can be used to compare the efficacy of different sequencing methods. Perhaps the most important use of Phred quality scores is the automatic determination of accurate, quality-based consensus sequences.
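The paragraph does not state the formula; the standard definition is Q = -10 log10(P), where P is the probability that the base call is wrong, so Q = 30 means a 1-in-1000 error chance. A minimal Python sketch, also decoding a FASTQ quality string under the assumption that it uses the common Phred+33 encoding (function names are hypothetical):

```python
import math


def phred_from_error(p: float) -> float:
    """Phred quality Q = -10 * log10(P), P = probability the base call is wrong."""
    return -10 * math.log10(p)


def error_from_phred(q: float) -> float:
    """Inverse: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def decode_phred33(quality_line: str) -> list:
    """Decode a FASTQ quality string, assuming the Phred+33 (Sanger) encoding."""
    return [ord(ch) - 33 for ch in quality_line]


print(round(phred_from_error(0.001)))  # 30
print(decode_phred33("II!#"))          # [40, 40, 0, 2]
```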

21. Base calling

Base calling is the process of assigning bases (nucleobases) to chromatogram peaks. One of the best computer programs for this job is Phred, which is currently the most widely used base-calling program in both academic and commercial DNA sequencing laboratories because of its high base-calling accuracy.

22. MIAME (Minimum Information About a Microarray Experiment)

MIAME describes the minimum information about a microarray experiment that is needed to interpret its results unambiguously and potentially to reproduce the experiment. The six elements are:

1. The raw data for each hybridisation.

2. The final processed data for the set of hybridisations in the experiment (study).

3. The essential sample annotation, including experimental factors and their values.

4. The experiment design, including sample data relationships.

5. Sufficient annotation of the array design.

6. Essential experimental and data processing protocols.