boost — c++ libraries

What’s this?

Boost provides free peer-reviewed portable C++ source libraries. In a word, Productivity. Use of high-quality libraries like Boost speeds initial development, results in fewer bugs, reduces reinvention-of-the-wheel, and cuts long-term maintenance costs. And since Boost libraries tend to become de facto or de jure standards, many programmers are already familiar with them.

How to install Boost?



2../; ./b2 -j12 install --prefix=/path/to/your/directory (link=shared)


4. Library files: specify them with -L and -l;

5. Compile: gcc/g++ a.cpp -L/path/to/your/directory/lib -lxxx -o a
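
As a rough sketch of steps 4 and 5, the snippet below prints the compile command for a given source file and library, and extends LD_LIBRARY_PATH for a shared build. BOOST_PREFIX and boost_compile_cmd are hypothetical names used only for illustration, xxx remains a placeholder for the library, and the include/ and lib/ subdirectories assume the default layout that b2 install creates under --prefix.

```shell
# Hypothetical value: whatever was passed to --prefix when installing Boost.
BOOST_PREFIX=/path/to/your/directory

# Print the compile command for one source file linked against one library.
# $1 = source file, $2 = library name without the "lib" prefix, $3 = output name
boost_compile_cmd() {
    printf 'g++ -I%s/include %s -L%s/lib -l%s -o %s\n' \
        "$BOOST_PREFIX" "$1" "$BOOST_PREFIX" "$2" "$3"
}

# "xxx" is still a placeholder for the real library name.
boost_compile_cmd a.cpp xxx a

# With link=shared, the loader must also find the .so files at run time.
LD_LIBRARY_PATH="$BOOST_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH
```

The source file is placed before -lxxx because the linker resolves libraries in the order they appear on the command line.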

Install and Configure Ubuntu

Install Ubuntu:

1. Download a Linux operating system ISO file.
2. Change the firmware setting from UEFI to legacy, and the boot device from ‘boot from HDD’ to ‘boot from USB disk’;
3. Make a bootable USB disk for installation;
4. Set up the partitions manually:
 all logical partitions:
Name        Size
boot        500 MB
/           50 GB
/home       100 GB
swap space  8 GB
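boot: each kernel takes about 80 MB, so 250 MB holds at most 3 kernels. That is enough for roughly a month without deleting old kernels; the exact time depends on how often the kernel is updated.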
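
The estimate above is a simple integer division. A minimal sketch, assuming the rough 80 MB per kernel quoted in the note (kernels_that_fit is a made-up helper name):

```shell
# Number of whole kernels that fit on a boot partition.
# $1 = partition size in MB, $2 = approximate size of one kernel in MB
kernels_that_fit() {
    echo $(( $1 / $2 ))
}

kernels_that_fit 250 80   # prints 3
kernels_that_fit 500 80   # prints 6 (the 500 MB size listed in the table)
```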

Configure Ubuntu

1. install Adobe Flash Player for Firefox
   move “” to the Firefox plugin directory;
     cp ~/.mozilla/plugins/

Copy the files under the directory “usr” to /usr:
sudo cp -r ./usr/* /usr/

2. upgrade:
 sudo apt-get update;
 sudo apt-get upgrade;
3. enable multiple desktops in ubuntu
 system settings → appearance → behavior → enable workspaces
4. install necessary software
Flacon extracts individual tracks from one big audio file containing the entire album of music and saves them as separate audio files.
wondershaper wlan0 1000 1000
sudo apt-get install nethogs
g++ & gcc
First install mpc, gmp & mpfr, then add their lib directories to LD_LIBRARY_PATH;
/gcc_6.3.0/gcc-6.3.0/configure --disable-multilib --prefix=/gcc_6.3.0/ --with-gmp=/gcc_6.3.0/addition/gmp_6.1.0/ --with-mpfr=/gcc_6.3.0/addition/mpfr_3.1.4/ --with-mpc=/gcc_6.3.0/addition/mpc_1.0.3/ && 
make -j 24 && 
make -j 24 check && 
make -j 24 install
sudo apt-get install vim
5. mount windows partition
sudo blkid #to find the device name of the Windows partition
 sudo vim /etc/fstab
 Add "/dev/sda5 /media/nolan/WORK/ ntfs defaults 0 0"
 sudo mount -a
6. set dual monitors
xrandr #to find the names of the monitors, e.g. DP-1 and eDP-1
 xrandr --output DP-1 --left-of eDP-1 --auto
 #--left-of: set the relative position of the two monitors
 #--auto: enable the output at its preferred resolution
7. set environment variables (PATH and others)
set a local environment path: just add the path before the system path like this: PATH=/path/to/program/:$PATH
 set a variable: NAME=/path/of/the/program
8. install NVIDIA driver
GeForce: 840M notebook
NVIDIA Driver Download
9. PPA (Personal Package Archives)
location: /etc/apt/sources.list.d/

10. foo­bar
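
Item 7 above can be sketched as follows. This is only an illustration: prepend_path is a hypothetical helper, and the two paths are the placeholders used in item 7, not real locations. The export lines are an addition to the note; without export, a plain assignment is visible only in the current shell.

```shell
# Placeholder location, as in item 7.
PROGRAM_DIR=/path/to/program

# Prepend a directory to a colon-separated search path unless it is already there.
# $1 = directory to add, $2 = current value of the search path
prepend_path() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;            # already present: unchanged
        *)        printf '%s\n' "$1${2:+:$2}" ;;   # otherwise put it in front
    esac
}

# Directories earlier in PATH win, so the local program shadows the system one.
PATH=$(prepend_path "$PROGRAM_DIR" "$PATH")
export PATH

# Make the variable visible to child processes too.
NAME=/path/of/the/program
export NAME
```

Putting these lines in ~/.bashrc or ~/.profile makes them apply to new shells as well.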

Celera Assembler Terminology

  • An assembly is a set of scaffolds computed from reads.
  • A scaffold is an ordered and oriented set of one or more contigs with distances assigned to the gaps between contigs. In practice, each gap distance is computed from mate pairs that are anchored in neighbor contigs and span the gap. A scaffold implies a single sequence that possibly includes gaps.
  • A contig consists of a set of reads, a layout that includes all the reads and leaves no gaps, a multiple sequence alignment of the reads, and a consensus sequence. In practice contigs consist of one or more unitigs. Note the consensus may contain (small) gaps spanned by reads even though the layout includes no (0X) gaps.
  • A unitig is a special kind of contig. Ideally, it is fully consistent with all the data including reads, overlaps, and mate constraints. In practice, unitigs can only be consistent with most of the data. Conceptually, a unitig is a high-confidence contig. Maximal unitigs should contain either (1) unique sequence up to repeat boundaries, with less than a read-length of repeat on each end, or (2) nearly the full extent of a genomic repeat.

A Scaffold with a Surrogate

The Celera Assembler works with fragmentary sequences, their detected overlaps, and their given mate pairs. Often, the data are mutually contradictory, as shown here. Yet, Celera Assembler reduces the data to a linear sequence whenever that is justified.

(A) Sequence overlaps and mate pairs suggest several possible joins. Line segments represent fragments, vertical stacking represents overlaps, rectangles represent contigs, arrows represent links, and every element’s thickness correlates to the amount of supporting data.

(B) The assembler reduces the graph such that one contradiction remains. The sequence fragments were reduced to contigs based on overlaps. The mate pairs were reduced to contig links of various weights. Here, three contigs form a linear scaffold but the fourth contig is problematic.

(C) The assembler has reduced the graph to a linear sequence. Its final step was to insert the 4th contig twice. Called a multiply placed surrogate unitig, the 4th contig appears to represent over-collapse of fragments induced by a near-perfect repeat in the genome.


Assembly
(1) A layout and associated consensus sequence(s) and/or multi-alignment(s). In other words, we use this term to speak of a tentative reconstruction of segments of the target sequence and the locations from which the reads were sampled.
Branch Point
(1) A branch point is a position on a fragment and/or chunk that is known to represent the boundary of a repetitive element. The inference one would like to make is that one side of the branch point is unique sequence and the other is repetitive, but internal repeat boundaries of micro- and mini-satellites are also detected as branch points.
Consensus Sequence (or simply Consensus)
(1) Given a collection of overlapping reads that do not precisely match along their overlaps, a consensus sequence for the collection is, loosely speaking, one’s best guess at the sequence the reads were sampled from. Often people mean something more precise: the mathematical definition of consensus sequence is one for which the sum of the differences between the consensus sequence and each one of the reads is minimal.
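
To make the "sum of the differences" criterion concrete, here is a minimal sketch that scores candidate consensus sequences against a set of reads. It assumes the reads are already aligned, have equal length and contain no gaps, which is a simplification compared with a real multi-alignment; the function names and sequences are made up for illustration.

```shell
# Count mismatching positions between two equal-length strings.
differences() {
    a=$1 b=$2 count=0
    while [ -n "$a" ]; do
        # Compare the first character of each string, then drop it.
        [ "${a%"${a#?}"}" = "${b%"${b#?}"}" ] || count=$(( count + 1 ))
        a=${a#?} b=${b#?}
    done
    echo "$count"
}

# Sum of differences between a candidate consensus ($1) and every read (the rest).
total_differences() {
    candidate=$1; shift
    total=0
    for read_seq in "$@"; do
        total=$(( total + $(differences "$candidate" "$read_seq") ))
    done
    echo "$total"
}

# Three toy reads; the candidate with the smaller total is the better consensus.
total_differences ACGTAC ACGTAC ACGAAC ACGTTC   # prints 2
total_differences ACGAAC ACGTAC ACGAAC ACGTTC   # prints 3
```
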
Contig
(1) A maximal set of reads in a layout which in aggregate cover a contiguous interval.
(2) A contiguous join of unitigs. It consists of a multiple sequence alignment of reads plus a consensus sequence, although it also has an internal unitig structure. The consensus can have short gaps representing inserts in a minority of the underlying reads. The consensus can have regions of 0X read coverage when the consensus is due to a surrogate.
Degenerate
(1) A unitig that could not be combined into any scaffold. It is like a singleton but it has more than one read. Degenerates sometimes contain high-copy plasmid sequence. Degenerates can reflect biological phenomena that undermine the assumptions of Celera Assembler’s mathematical model.
Fragment
(1) Either a guide or a read. Unfortunately this term has a long history of different uses by different groups. In particular, one may actually be talking about inserts. Usually the intended meaning is clear from context, but when it isn’t and it’s important to understand the precise meaning, be sure to ask for clarification.
Guide (obsolete)
(1) A read-sized sequence of the relevant genome supplied from an external data source, e.g. an STS marker, a BAC-end, or a fabricated piece of a known BAC.
Insert
(1) A segment of the target genome placed into a vector and ultimately end-sequenced by us. For example, we are currently planning on sequencing the ends of a 4/1 mix of 2Kbp and 10Kbp inserts.
Layout
(1) A layout is a (partial) positioning of a set of reads with respect to each other subject to the one constraint that every pair of reads that overlap in the layout do so as defined under overlap in this glossary. The term layout is intended to specifically speak to the arrangement of the reads as opposed to their mutual connectivity (as in “contig”) or the sequence(s) the set models (as in “consensus”). A layout includes the orientation of the fragments and, in the case that reads are mate-linked, gives the estimated distance between contigs that contain each end of a mate pairing.
Mate-Pair or Mates
(1) A pair of reads taken from the two ends of a given insert.
Multi Alignment
(1) A multi-alignment of a set of overlapping fragments is a matrix in which a row is a possibly empty prefix of blanks, followed by the sequence of a fragment interspersed with dashes, followed by a possibly empty suffix of blanks. One generally seeks the multi-alignment of the fragments that exposes their similarity and supports the evidence for a particular consensus sequence. Indeed, any computation that produces a consensus either implicitly or explicitly computes a multi-alignment of the underlying reads.
Overlap
(1) A pair of sequences, say A and B, overlap if there is an interval of A and an interval of B that match to within a user-specified level of similarity. If the sequencing error rate is less than 2% then a match with fewer than 4% differences constitutes an overlap. Typically, one is also implying that the segments involved constitute either a suffix/prefix pair (a “dovetail overlap”) or all of one of the two sequences (a “containment overlap”). In pictures,
   A -------------------          or    A --------------------.
         ------------------- B                 ---------- B
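
The rule of thumb above amounts to a percentage comparison. A minimal sketch, assuming the aligned region and its number of differences are already known (is_overlap and its arguments are hypothetical names, and 4 is the example threshold from the text):

```shell
# Succeed if an aligned region has fewer than the allowed percentage of differences.
# $1 = number of differences in the aligned region
# $2 = length of the aligned region in bases
# $3 = allowed differences in percent
is_overlap() {
    [ "$2" -gt 0 ] && [ $(( $1 * 100 )) -lt $(( $3 * $2 )) ]
}

if is_overlap 3 100 4; then echo "3 differences in 100 bases: overlap"; fi
if ! is_overlap 4 100 4; then echo "4 differences in 100 bases: no overlap"; fi
```

The comparison is strict because the definition says "fewer than 4%"; integer arithmetic avoids rounding the percentage.
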
Read
(1) A single sequence read produced by an ABI 3700 by our internal production pipeline.
Rocks and Stones
(1) Unitigs that were used to fill a gap in a scaffold. They are usually short and repetitive. Rocks require higher confidence joins than stones. (An even lower confidence category, pebbles, was discontinued after its use in the Celera assembly of Drosophila.) Rocks and stones are “thrown” into gaps late in the scaffold building process. They are thrown in multiple iterations, with the loop count controlled by a run-time parameter.
Scaffold
(1) A maximal set of contigs in a layout that are connected together by mate-links.
(2) A linear ordering of contigs joined by mate pairs. A scaffold defines the order and orientation (DNA strand) for each component contig. There are two ways to measure scaffold length. “Scaffold bases” is the sum of contig lengths. “Scaffold span” is that plus the sum of gap lengths. Celera Assembler uses complex criteria to build scaffolds, but some generalizations apply. Every gap in a scaffold was spanned by at least two mate pairs. A gap with negative length means the sequence data and mate data disagree. Usually, negative gaps are small (20bp) and induced by low-quality sequence at the end of a read. In the FASTA representation of a scaffold, negative gaps are represented by a fixed number (20) of N’s.
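
The two length measures can be illustrated with a small calculation. The contig and gap lengths below are invented for the example, and sum_ints is a hypothetical helper:

```shell
# Add up a list of integers given as arguments.
sum_ints() {
    total=0
    for n in "$@"; do total=$(( total + n )); done
    echo "$total"
}

# Made-up scaffold: three contigs separated by two gaps (lengths in bp).
contigs="12000 8500 30000"
gaps="450 1200"

# Scaffold bases: contig lengths only. Scaffold span: contigs plus gaps.
scaffold_bases=$(sum_ints $contigs)
scaffold_span=$(( scaffold_bases + $(sum_ints $gaps) ))

echo "scaffold bases: $scaffold_bases"   # prints "scaffold bases: 50500"
echo "scaffold span:  $scaffold_span"    # prints "scaffold span:  52150"
```
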
Singleton
(1) A read that could not assemble. Singletons can represent contamination, unique sequence with no overlap due to the fluctuation of random coverage, or sequence with so many overlaps it could not be assembled efficiently. It can happen that a mate pair has two singletons, and in some contexts these pairs are called mini-scaffolds.
Singleton Unitig
(1) A unitig consisting of a single fragment.
Surrogate
(1) A unitig whose arrival rate statistic was beyond the expected range. Such unitigs are treated as collapsed repeats. Their consensus may get placed in one or more scaffolds. Some of their reads may get placed, by mates, late in the pipeline. When a repetitive unitig cannot be placed even once, it becomes a degenerate.
Unitig (also Chunk)
(1) A high-confidence contig seed. The end of a unitig is, by definition, a place where the overlap data shows multiple, mutually contradictory, paths. Unitigs are supposed to end at repeats.
(2) A uniquely assembleable subset of overlapping fragments. A unitig and/or chunk is an assembly of fragments for which there are no competing choices in terms of internal overlaps. This means that a chunk is either a correctly assembled portion of a contig or it is an overcompressed assembly of several high-fidelity copies of a repeat. Every fragment belongs to one chunk.
Unique Unitig
(1) A unitig with an arrival rate statistic (based on unitig length and read coverage) within the expected range. The uniqueness designation becomes important during the scaffold building stage. Only a unique unitig can seed a contig. Contigs can be extended by mates and overlaps from their unique unitigs only.