the the­o­ret­i­cal “fold-cov­er­age” of a shot­gun sequenc­ing exper­i­ment:

<num­ber of reads> * <read length> / <tar­get size>
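
As a quick worked example, the formula above can be evaluated directly. This is only a sketch: the function name and the example numbers (one million 100 bp reads against a 5 Mbp target) are illustrative assumptions, not values from a real experiment.

```python
def fold_coverage(num_reads: int, read_length: int, target_size: int) -> float:
    """Theoretical fold-coverage: (number of reads * read length) / target size."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    return num_reads * read_length / target_size


# Hypothetical run: 1,000,000 reads of 100 bp against a 5,000,000 bp target.
print(fold_coverage(1_000_000, 100, 5_000_000))  # 20.0
```

Real coverage is uneven along the target, so this number is an average expectation rather than a guarantee for any single position.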


2.Amplicon

An amplicon is a piece of DNA or RNA that is the source and/or product of natural or artificial amplification or replication events.

It can be formed using var­i­ous meth­ods includ­ing poly­merase chain reac­tions (PCR), lig­ase chain reac­tions (LCR), or nat­u­ral gene dupli­ca­tion.

3.Whole genome map­ping

A Whole Genome Map is a high-resolution, ordered, whole genome restriction map generated from single DNA molecules extracted from bacteria, yeast, or other fungi. Whole Genome Mapping is a technology with specific applications in microbiology, in the areas of comparative genomics, strain typing, and whole genome sequence assembly. Whole Genome Maps are generated de novo, independent of sequence information, require no amplification or PCR steps, and provide a comprehensive view of whole genome architecture. A Whole Genome Map is displayed in the MapCode pattern, where the vertical lines indicate the locations of restriction sites and the distance between adjacent lines represents the restriction fragment size.

4.Radiation hybrid map­ping

A theory is developed to predict marker retention and conditional retention or loss in radiation hybrids. Applied to multiple pairwise analysis of a human chromosome 21 data set, this theory fits much better than proposed alternatives and gives a physical map that is consistent with other evidence and robust with respect to typing errors. Radiation hybrids have great promise to provide order and physical location at two levels of resolution, spanning the techniques of linkage and restriction fragments, and are not limited to polymorphic loci.

5.DNA barcoding

DNA barcoding is a taxonomic method that uses a short genetic marker in an organism’s DNA to identify it as belonging to a particular species.

6.metric space

In math­e­mat­ics, a met­ric space is a set for which dis­tances between all mem­bers of the set are defined. Those dis­tances, tak­en togeth­er, are called a met­ric on the set.
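
Stated more precisely (this is the standard textbook definition, added here for reference), a metric on a set $X$ is a function $d$ that satisfies the following three conditions for all $x, y, z \in X$:

```latex
d : X \times X \to [0, \infty)
\qquad
\begin{aligned}
d(x, y) &= 0 \iff x = y                && \text{(identity of indiscernibles)} \\
d(x, y) &= d(y, x)                     && \text{(symmetry)} \\
d(x, z) &\le d(x, y) + d(y, z)         && \text{(triangle inequality)}
\end{aligned}
```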

7.Pseudometric space

In math­e­mat­ics, a pseudo­met­ric space is a gen­er­al­ized met­ric space in which the dis­tance between two dis­tinct points can be zero.
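
A k-mer count profile gives a concrete illustration. The sketch below (function names are illustrative assumptions) measures the distance between two sequences as the sum of absolute differences between their k-mer counts. Two different sequences can share the same counts, so the distance can be zero for distinct points, which is what makes it a pseudometric rather than a metric.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_profile_distance(a: str, b: str, k: int = 1) -> int:
    """Sum of absolute differences between the k-mer counts of a and b."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # A Counter returns 0 for missing keys, so absent k-mers count as zero.
    return sum(abs(ca[kmer] - cb[kmer]) for kmer in ca.keys() | cb.keys())


print(kmer_profile_distance("AG", "GA"))  # 0, although "AG" != "GA"
print(kmer_profile_distance("AG", "AA"))  # 2
```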


8.Pyrosequencing

Pyrosequencing is a method of DNA sequencing (determining the order of nucleotides in DNA) based on the “sequencing by synthesis” principle. It differs from Sanger sequencing in that it relies on the detection of pyrophosphate release on nucleotide incorporation, rather than chain termination with dideoxynucleotides.

Only one of the four possible nucleotides (A, T, C or G) is added and available at a time, so only that one letter can be incorporated on the single-stranded template (the sequence to be determined). Light is emitted when the added nucleotide is the next complementary letter, and the intensity of the light indicates whether more than one of these letters occurs in a row. The previous nucleotide (one of the four possible dNTPs) is degraded before the next one is added for synthesis. This process is repeated with each of the four letters until the DNA sequence of the single-stranded template is determined.
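
The cycle described above can be sketched as a toy simulation. This is an idealized model under stated assumptions, not how a real instrument processes signals: light intensity is taken to be exactly proportional to the number of identical nucleotides incorporated, the dispensation order "ACGT" is an arbitrary choice, and the names `pyrogram` and `read_sequence` are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def pyrogram(template: str, dispensation: str = "ACGT", max_cycles: int = 100):
    """Return a list of (nucleotide, intensity) pairs, one per dispensation.

    template is the single-stranded template, written in the order in which
    the polymerase reads it. Intensity 0 means the dispensed nucleotide was
    not complementary to the next template base, so no light was emitted.
    """
    to_synthesize = [COMPLEMENT[base] for base in template]
    pos = 0
    signals = []
    for _ in range(max_cycles):
        for nt in dispensation:
            if pos >= len(to_synthesize):
                return signals
            run = 0
            # A homopolymer run is incorporated in a single dispensation.
            while pos < len(to_synthesize) and to_synthesize[pos] == nt:
                run += 1
                pos += 1
            signals.append((nt, run))
    return signals


def read_sequence(signals) -> str:
    """Reconstruct the synthesized strand from the light signals."""
    return "".join(nt * intensity for nt, intensity in signals)


signals = pyrogram("CAAT")
print(signals)                 # [('A', 0), ('C', 0), ('G', 1), ('T', 2), ('A', 1)]
print(read_sequence(signals))  # GTTA
```

The double-intensity signal for T corresponds to the two adjacent A bases in the template.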


9.n-gram

In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application. The n-grams are typically collected from a text or speech corpus.


Example for DNA sequencing, where the unit is the base pair:

1-gram sequence: …, A, G, C, T, T, C, G, A, …

2-gram sequence: …, AG, GC, CT, TT, TC, CG, GA, …
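
The two rows above can be reproduced with a few lines of Python. This is a minimal sketch; the function name `ngrams` is an illustrative assumption, and the input is the fragment AGCTTCGA implied by the 1-gram row.

```python
def ngrams(items: str, n: int) -> list[str]:
    """Return all contiguous n-grams of a string, in order."""
    return [items[i:i + n] for i in range(len(items) - n + 1)]


seq = "AGCTTCGA"
print(ngrams(seq, 1))  # ['A', 'G', 'C', 'T', 'T', 'C', 'G', 'A']
print(ngrams(seq, 2))  # ['AG', 'GC', 'CT', 'TT', 'TC', 'CG', 'GA']
```

In sequence analysis the same objects are usually called k-mers.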


10.sequence space

In evo­lu­tion­ary biol­o­gy, sequence space is a way of rep­re­sent­ing all pos­si­ble sequences (for a pro­tein, gene or genome).
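
To give a feel for how large such a space is, the sketch below counts and enumerates sequences over a fixed alphabet. The function names are illustrative assumptions; the only facts used are that DNA has 4 letters and proteins have 20 standard amino acids, so a sequence of length L has 4^L or 20^L possibilities.

```python
from itertools import product


def sequence_space_size(alphabet_size: int, length: int) -> int:
    """Number of distinct sequences of the given length."""
    return alphabet_size ** length


def enumerate_sequences(alphabet: str, length: int) -> list[str]:
    """List every sequence of the given length (only feasible for small lengths)."""
    return ["".join(letters) for letters in product(alphabet, repeat=length)]


print(sequence_space_size(4, 2))           # 16
print(enumerate_sequences("ACGT", 2)[:5])  # ['AA', 'AC', 'AG', 'AT', 'CA']
print(sequence_space_size(20, 100))        # size of the space for a 100-residue protein
```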

11.k-mer dis­tance





12.Optical map (ordered restriction map)

Optical mapping is a technique for constructing ordered, genome-wide, high-resolution restriction maps, called “optical maps”, from single, stained molecules of DNA. By mapping the location of restriction enzyme sites along the unknown DNA of an organism, the spectrum of resulting DNA fragments collectively serves as a unique “fingerprint” or “barcode” for that sequence.

13.Restriction map

A restric­tion map is a map of known restric­tion sites with­in a sequence of DNA. Restric­tion map­ping requires the use of restric­tion enzymes. In mol­e­c­u­lar biol­o­gy, restric­tion maps are used as a ref­er­ence to engi­neer plas­mids or oth­er rel­a­tive­ly short pieces of DNA, and some­times for longer genomic DNA.
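
A restriction map of a known sequence can be computed by searching for the enzyme's recognition site. The sketch below assumes a linear molecule, looks at one strand only, and uses EcoRI as the example enzyme (recognition site GAATTC, cutting after the first G). The function names and the test sequence are illustrative assumptions.

```python
def restriction_sites(seq: str, site: str = "GAATTC") -> list[int]:
    """Return the 0-based start position of every occurrence of the site."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def fragment_sizes(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Fragment lengths after cutting a linear molecule at every site.

    cut_offset is the number of bases between the start of the site and the cut.
    """
    cuts = [p + cut_offset for p in restriction_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


seq = "AAGAATTCCCCGGGAATTCTT"
print(restriction_sites(seq))  # [2, 13]
print(fragment_sizes(seq))     # [3, 11, 7]
```

The ordered list of fragment sizes is the information that a restriction map, and likewise an optical map, records.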

14.Expressed sequence tag

An expressed sequence tag (EST) is a short sub-sequence of a cDNA sequence. ESTs may be used to identify gene transcripts, and are instrumental in gene discovery and gene sequence determination. The identification of ESTs has proceeded rapidly, with approximately 74.2 million ESTs available in public databases (e.g. GenBank, 1 January 2013, all species).

15.Multiple Sequence Alignment

A Multiple Sequence Alignment (MSA) is a sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. In many cases, the input set of query sequences is assumed to have an evolutionary relationship by which the sequences share a lineage and are descended from a common ancestor. From the resulting MSA, sequence homology can be inferred and phylogenetic analysis can be conducted to assess the sequences’ shared evolutionary origins. Visual depictions of the alignment illustrate mutation events such as point mutations (single amino acid or nucleotide changes), which appear as differing characters in a single alignment column, and insertion or deletion mutations (indels or gaps), which appear as hyphens in one or more of the sequences in the alignment. Multiple sequence alignment is often used to assess sequence conservation of protein domains, tertiary and secondary structures, and even individual amino acids or nucleotides.
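
As a small illustration of reading conservation off an existing alignment, the sketch below scores each column by the fraction of sequences that share its most common non-gap character. It does not compute an alignment. The scoring rule, the function name and the toy alignment are all assumptions made for this example.

```python
from collections import Counter


def column_conservation(alignment: list[str]) -> list[float]:
    """Per-column fraction of sequences sharing the most common non-gap character."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)  # column made of gaps only
            continue
        most_common_count = Counter(residues).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores


# Toy alignment: a point mutation in column 2 and an indel in column 4.
alignment = [
    "ACG-T",
    "ACGAT",
    "ATG-T",
]
for index, score in enumerate(column_conservation(alignment), start=1):
    print(index, round(score, 2))
```

Fully conserved columns score 1.0, the column with the C/T point mutation scores about 0.67, and the column where only one sequence has a base scores about 0.33.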

16.POA (Partial Order Alignment)

Par­tial order align­ment (POA) has been pro­posed as a new approach to mul­ti­ple sequence align­ment (MSA), which can be com­bined with exist­ing meth­ods such as pro­gres­sive align­ment. This is impor­tant for address­ing prob­lems both in the orig­i­nal ver­sion of POA (such as order sen­si­tiv­i­ty) and in stan­dard pro­gres­sive align­ment pro­grams (such as infor­ma­tion loss in com­plex align­ments, espe­cial­ly sur­round­ing gap regions).

17.Progressive Align­ment

This approach begins with the align­ment of the two most close­ly relat­ed sequences (as deter­mined by pair­wise analy­sis) and sub­se­quent­ly adds the next clos­est sequence or sequence group to this ini­tial pair [37,7]. This process con­tin­ues in an iter­a­tive fash­ion, adjust­ing the posi­tion­ing of indels in all sequences. The major short­com­ing of this approach is that a bias may be intro­duced in the infer­ence of the ordered series of motifs (homol­o­gous parts) because of an over­rep­re­sen­ta­tion of a sub­set of sequences.

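18.Ribosomal small subunit (SSU)

The ribosomal small subunit is the smaller of the two ribosomal subunits. Every ribosome is made up of one small subunit and one large subunit.[1] During translation, the small subunit is responsible for recognizing the message. The 70S ribosomes of prokaryotic cells, the 80S ribosomes in the cytoplasm of eukaryotic cells, and the mitochondrial ribosomes of eukaryotic cells each have a different small subunit: the 70S ribosome contains a 30S subunit, the 80S ribosome contains a 40S subunit, and the mitochondrial ribosome contains a 28S subunit.

Prokaryotic cells (70S ribosome)
  Large subunit: 50S (contains 5S rRNA and 23S rRNA)
  Small subunit: 30S (contains 16S rRNA)

Eukaryotic cells, cytoplasmic ribosome (80S ribosome)
  Large subunit: 60S (contains 5S rRNA, 5.8S rRNA and 28S rRNA)
  Small subunit: 40S (contains 18S rRNA)

Eukaryotic cells, mitochondrial ribosome
  Large subunit: 39S (contains 16S rRNA, MT-RNR2)
  Small subunit: 28S (contains 12S rRNA, MT-RNR1)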

19.rare bios­phere

The low-abundance, high-diversity group of organisms in a microbial community is what is now called the “rare biosphere”.

20.Phred qual­i­ty score

Phred qual­i­ty scores were orig­i­nal­ly devel­oped by the pro­gram Phred to help in the automa­tion of DNA sequenc­ing in the Human Genome Project. Phred qual­i­ty scores are assigned to each nucleotide base call in auto­mat­ed sequencer traces.[1][2] Phred qual­i­ty scores have become wide­ly accept­ed to char­ac­ter­ize the qual­i­ty of DNA sequences, and can be used to com­pare the effi­ca­cy of dif­fer­ent sequenc­ing meth­ods. Per­haps the most impor­tant use of Phred qual­i­ty scores is the auto­mat­ic deter­mi­na­tion of accu­rate, qual­i­ty-based con­sen­sus sequences.
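
A Phred score Q is related to the estimated probability P that a base call is wrong by Q = -10 log10(P), so Q = 30 corresponds to one expected error in 1,000 base calls. The sketch below converts between the two. The function names are illustrative, and the ASCII offset of 33 used for decoding a FASTQ quality string is an assumption that holds for the common Phred+33 encoding but not for every older file.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Estimated probability that a base call with quality q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality score for an estimated error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def decode_fastq_quality(qual: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 ASCII encoding by default."""
    return [ord(char) - offset for char in qual]


print(phred_to_error_prob(30))      # about 0.001
print(error_prob_to_phred(0.01))    # about 20
print(decode_fastq_quality("I5!"))  # [40, 20, 0]
```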

21.Base call­ing

Base calling is the process of assigning bases (nucleobases) to chromatogram peaks. A widely used program for this task is Phred, which has been adopted by both academic and commercial DNA sequencing laboratories because of its high base-calling accuracy.

22.MIAME (Minimum Information About a Microarray Experiment)

MIAME describes the minimum information about a microarray experiment that is needed to interpret the results of the experiment unambiguously and, potentially, to reproduce the experiment. It consists of six elements:

1. The raw data for each hybridisation.

2. The final processed data for the set of hybridisations in the experiment (study).

3. The essential sample annotation, including experimental factors and their values.

4. The experiment design, including sample data relationships.

5. Sufficient annotation of the array design.

6. Essential experimental and data processing protocols.


