- rpm install putty;
- The data annotated with the tool (Perl version) can only be generated from the old database rather than the new one (unsolved as of 20160710).
Account: Yu Wang_nott
- iTunes on Windows 10 64-bit (unsolved since 20160726)
- download “media creation tool”;
- run “media creation tool”;
- Select Create installation media for another PC.
- download .iso file.
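- After booting into the WinPE system, the “U深度装机工具” installer dialog pops up automatically and detects the Ghost system image previously stored on the USB drive. (A Ghost system is a version cloned as an image from an installed operating system using Ghost, a product of Symantec Corporation. Ghost is usually used to back up an operating system and to restore it when the system cannot boot normally.)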
- or: returns true if any argument is true
or(logical value 1, logical value 2, …)
- isna: returns true if value equals #N/A
- isblank: returns true if value refers to an empty cell
- not: reverses the value of the argument
- na: not available. returns the error value #N/A
- Atom : characters
- Molecule : tokens
- Built-in Data Type : scalars, arrays (arrays of scalars) and hashes (hashes of scalars).
- input the 32-character extension ID; for example:
If we want to download the extension “https://chrome.google.com/webstore/detail/proxy-switchyomega/padekgcemlokbadohgkifijomclgjgif”, then its 32-character extension ID is “padekgcemlokbadohgkifijomclgjgif”.
- Drop the .crx file into the chrome extension page.
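The ID lookup in the step above can be sketched in Python. This is only an illustration: the function name `extension_id_from_url` is made up here, and the check that an ID consists of 32 letters in the range a to p is an assumption about how Chrome extension IDs look, not something stated in these notes.

```python
from urllib.parse import urlparse


def extension_id_from_url(url: str) -> str:
    """Return the last path segment of a Chrome Web Store detail URL."""
    path = urlparse(url).path.rstrip("/")
    candidate = path.rsplit("/", 1)[-1]
    # Assumption: extension IDs are 32 characters drawn from the letters a-p.
    if len(candidate) != 32 or not set(candidate) <= set("abcdefghijklmnop"):
        raise ValueError(f"no 32-character extension ID found in {url!r}")
    return candidate


url = ("https://chrome.google.com/webstore/detail/"
       "proxy-switchyomega/padekgcemlokbadohgkifijomclgjgif")
print(extension_id_from_url(url))
```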
- Cytoscape — Agilent Literature Search
The many functional partnerships and interactions that occur between proteins are at the core of cellular processing and their systematic characterization helps to provide context in molecular systems biology. However, known and predicted interactions are scattered over multiple resources, and the available data exhibit notable differences in terms of quality and completeness. The STRING database (http://string-db.org) aims to provide a critical assessment and integration of protein–protein interactions, including direct (physical) as well as indirect (functional) associations. The new version 10.0 of STRING covers more than 2000 organisms, which has necessitated novel, scalable algorithms for transferring interaction information between organisms. For this purpose, we have introduced hierarchical and self-consistent orthology annotations for all interacting proteins, grouping the proteins into families at various levels of phylogenetic resolution. Further improvements in version 10.0 include a completely redesigned prediction pipeline for inferring protein–protein associations from co-expression data, an API interface for the R computing environment and improved statistical analysis for enrichment tests in user-provided networks.
- wine QQ
Gnome/Unity Menu editor: menulibre
- check OS architecture: 32-bit or 64-bit?
- check if we have Java installed on our OS
We should uninstall OpenJDK if it is installed on our OS.
sudo apt-get purge openjdk-\*
- create a directory to hold our Oracle Java JDK binaries.
sudo mkdir -p /usr/local/oracle-java
- Download Oracle Java JDK for linux
- copy the file we download to the directory we create
sudo cp -r jdk-8u91-linux-x64.tar.gz /usr/local/oracle-java/
- unpack the compressed file we download
sudo tar xvzf jdk-8u91-linux-x64.tar.gz
- Edit the system PATH file /etc/profile and add the following system variables to your system path.
sudo vim /etc/profile
then add the following lines below to the end of the file profile (finally save and exit):
- Inform our OS where our Oracle Java JDK is located.
sudo update-alternatives --install "/usr/bin/java" "java" "/usr/local/oracle-java/jdk1.8.0_91/bin/java" 1
sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/local/oracle-java/jdk1.8.0_91/bin/javac" 1
sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/local/oracle-java/jdk1.8.0_91/bin/javaws" 1
- Inform our OS that Oracle Java JDK must be the default Java.
sudo update-alternatives --set java /usr/local/oracle-java/jdk1.8.0_91/bin/java
sudo update-alternatives --set javac /usr/local/oracle-java/jdk1.8.0_91/bin/javac
sudo update-alternatives --set javaws /usr/local/oracle-java/jdk1.8.0_91/bin/javaws
- Reload our system wide PATH /etc/profile
- test if we have Java installed
4=readable 2=writable 1=executable
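As a small sketch of how these three values combine, the Python below decodes each octal digit of a mode into its r/w/x flags. The function name `describe_mode` and the example mode `754` are illustrative choices, not part of the original notes.

```python
def describe_mode(octal_mode: str) -> str:
    """Turn an octal mode such as '754' into a string such as 'rwxr-xr--'."""
    flags = ((4, "r"), (2, "w"), (1, "x"))  # 4=readable 2=writable 1=executable
    parts = []
    for digit in octal_mode:
        value = int(digit, 8)
        # Each digit is the sum of the bits that are switched on.
        parts.append("".join(ch if value & bit else "-" for bit, ch in flags))
    return "".join(parts)


print(describe_mode("754"))  # rwxr-xr--
```

Here 7 = 4+2+1 (owner), 5 = 4+1 (group) and 4 (others).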
The iconv program reads in text in one encoding and outputs the text in
another encoding. If no input files are given, or if it is given as a
dash (-), iconv reads from standard input. If no output file is given,
iconv writes to standard output.
iconv [options] [-f from-encoding] [-t to-encoding] [inputfile] > [outputfile]
iconv -f gb2312 -t utf8 [inputfile] -o [outputfile]
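For comparison, the same GB2312 to UTF-8 conversion can be sketched with Python's built-in codecs. This is not how iconv itself works internally; the helper name `reencode` and the sample text are assumptions made for the example.

```python
def reencode(data: bytes, from_encoding: str = "gb2312",
             to_encoding: str = "utf-8") -> bytes:
    """Decode bytes from one encoding and encode them in another."""
    return data.decode(from_encoding).encode(to_encoding)


# Sample input: two Chinese characters stored as GB2312 bytes.
sample = "中文".encode("gb2312")
converted = reencode(sample)
print(converted.decode("utf-8"))
```

To convert a file, read it in binary mode, pass the bytes through `reencode`, and write the result back out in binary mode.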
The Dos2unix package includes utilities "dos2unix" and "unix2dos" to
convert plain text files in DOS or Mac format to Unix format and vice versa.
In DOS/Windows text files a line break, also known as newline, is a
combination of two characters: a Carriage Return (CR) followed by a
Line Feed (LF). In Unix text files a line break is a single character:
the Line Feed (LF). In Mac text files, prior to Mac OS X, a line break
was a single Carriage Return (CR) character. Nowadays Mac OS uses Unix
style (LF) line breaks.
Besides line breaks Dos2unix can also convert the encoding of files. A
few DOS code pages can be converted to Unix Latin-1. And Windows
Unicode (UTF-16) files can be converted to Unix Unicode (UTF-8) files.
dos2unix [options] [FILE ...] [-n INFILE OUTFILE ...]
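The line-break and encoding conversions described above can be sketched in a few lines of Python. This is a simplified illustration, not the dos2unix implementation; all function names are made up for the example.

```python
def dos_to_unix(data: bytes) -> bytes:
    """Replace CR LF line breaks with a single LF."""
    return data.replace(b"\r\n", b"\n")


def unix_to_dos(data: bytes) -> bytes:
    """Replace LF line breaks with CR LF."""
    # Normalize first so that existing CR LF pairs are not doubled.
    return dos_to_unix(data).replace(b"\n", b"\r\n")


def old_mac_to_unix(data: bytes) -> bytes:
    """Replace the lone CR line breaks used before Mac OS X with LF."""
    return data.replace(b"\r", b"\n")


def utf16_to_utf8(data: bytes) -> bytes:
    """Convert UTF-16 text (with a byte order mark) to UTF-8."""
    return data.decode("utf-16").encode("utf-8")


print(dos_to_unix(b"first\r\nsecond\r\n"))
```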
- Network service discovery disabled
Your current network has a .local domain, which is not recommended and incompatible with the Avahi network service discovery. The service has been disabled.
sudo vim /etc/default/avahi-daemon
Change the parameter below from 1 to 0
sudo dpkg --get-selections | grep linux-
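sudo apt-get purge is followed by two kinds of packages: “linux-headers” and “linux-image”. The two come in pairs. The kernel currently in use must not be removed.
4. Clean up deinstall entries (this is a combined command: it first gets the names of the packages marked deinstall, then purges them).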
dpkg --purge `dpkg --get-selections | grep deinstall | cut -f 1`
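To see what the inner part of that combined command selects, the filter can be sketched in Python on captured `dpkg --get-selections` output. The sample package names below are invented for the example, and unlike `grep deinstall` this sketch only looks at the status column.

```python
def packages_marked_deinstall(selections_text: str) -> list[str]:
    """Return package names whose selection state is 'deinstall'."""
    names = []
    for line in selections_text.splitlines():
        fields = line.split()  # columns: package name, selection state
        if len(fields) == 2 and fields[1] == "deinstall":
            names.append(fields[0])
    return names


# Hypothetical output of `dpkg --get-selections`.
sample = (
    "htop\t\t\t\tinstall\n"
    "linux-image-4.4.0-21-generic\t\tdeinstall\n"
    "linux-headers-4.4.0-21\t\t\tdeinstall\n"
)
print(packages_marked_deinstall(sample))
```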
sudo vim /etc/default/grub
GRUB_DEFAULT=0
#GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 4.10.0-041000-generic"
sudo update-grub
RNA-Seq (RNA sequencing), also called whole transcriptome shotgun sequencing (WTSS), uses next-generation sequencing (NGS) to reveal the presence and quantity of RNA in a biological sample at a given moment in time.
RNA-Seq is used to analyze the continually changing cellular transcriptome. Specifically, RNA-Seq facilitates the ability to look at alternative gene spliced transcripts, post-transcriptional modifications, gene fusion, mutations/SNPs and changes in gene expression. In addition to mRNA transcripts, RNA-Seq can look at different populations of RNA to include total RNA, small RNA, such as miRNA, tRNA, and ribosomal profiling. RNA-Seq can also be used to determine exon/intron boundaries and verify or amend previously annotated 5’ and 3’ gene boundaries.
Prior to RNA-Seq, gene expression studies were done with hybridization-based microarrays. Issues with microarrays include cross-hybridization artifacts, poor quantification of lowly and highly expressed genes, and the need to know the sequence in advance. Because of these technical issues, transcriptomics transitioned to sequencing-based methods. These progressed from Sanger sequencing of Expressed Sequence Tag libraries, to chemical tag-based methods (e.g., serial analysis of gene expression), and finally to the current technology, NGS of cDNA (notably RNA-Seq).
- Biostars https://www.biostars.org/
The Ensembl project was started in 1999, some years before the draft human genome was completed. Even at that early stage it was clear that manual annotation of 3 billion base pairs of sequence would not be able to offer researchers timely access to the latest data. The goal of Ensembl was therefore to automatically annotate the genome, integrate this annotation with other available biological data and make all this publicly available via the web.
Single search engine of NCBI. Entrez is not a gene database. It’s the name of the NCBI infrastructure which provides access to all of the NCBI databases. One of those is the Gene database, so you would say “Entrez Gene”.
There is not necessarily a one-to-one mapping between Entrez Gene and Ensembl Gene IDs.
- Fcitx: an input method framework with extension support, which provides an interface for entering characters of different scripts in applications using a variety of mapping systems.
sudo apt-get install fcitx-table-wbpy
- baobab: Disk Usage Analyzer; synaptic: package manager
sudo apt-get install baobab synaptic
- upgrade ubuntu:
sudo update-manager -c -d
- Install Oracle Java:
sudo add-apt-repository ppa:ubuntu-wine/ppa
sudo apt-get update
sudo apt-get install wine1.8
sudo apt-get install kolourpaint4
- Process Viewer: htop
an interactive process viewer for Unix systems.
sudo apt-get install htop
sudo add-apt-repository ppa:fossfreedom/indicator-sysmonitor
sudo apt-get update
sudo apt-get install indicator-sysmonitor
- Move launcher to bottom
gsettings set com.canonical.Unity.Launcher launcher-position Bottom
DAVID 6.8 (current beta release) May. 2016
– The DAVID Knowledgebase completely rebuilt
— Entrez Gene integrated as the central identifier to allow for more timely updates
while still incorporating Ensembl and Uniprot as integral data sources
— New GO category (GO Direct) provides GO mappings directly annotated by the source database (no parent terms included)
— New annotation categories
— New list identifier systems added for list uploading and conversion
— A few bugs fixed
The Bioinformatics and Integrative Genomics (BIG) Program enrolls PhD students with exceptional training in quantitative sciences and strong interest in biomedical applications. Research areas encompass computational analysis and mathematical modeling of data generated by DNA sequence, gene expression, structural, proteomics, and metabolite-assaying technologies. In applied projects, they may also include integration of clinical and population data from electronic health records. Both bioinformatics and genomics are tightly linked to the mathematical and biophysical modeling of complex biological systems and experimental validation of computational predictions. Graduate students will conduct original research in the development of novel approaches and new technologies to address fundamental biological questions, and they will acquire the skills to be leaders in the field of bioinformatics and genomics. Students will be joint members of BIG and a “home program” chosen from one of the four DMS programs (BBS, Immunology, Neuroscience, Virology). BIG students will follow the curriculum and participate in activities of the home program, which will be supplemented with BIG programmatic and curricular offerings.
the theoretical “fold-coverage” of a shotgun sequencing experiment:
<number of reads> * <read length> / <target size>
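A worked example of the formula, as a minimal Python sketch. The function name and the input numbers (two million reads of 150 bases against a 5 Mb target) are made up for illustration.

```python
def fold_coverage(number_of_reads: int, read_length: int, target_size: int) -> float:
    """Theoretical fold-coverage: reads * read length / target size."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    return number_of_reads * read_length / target_size


# Hypothetical experiment: 2,000,000 reads of 150 bases on a 5,000,000-base target.
print(fold_coverage(2_000_000, 150, 5_000_000))  # 60.0
```

All three quantities must use the same unit (bases) for the result to be a plain ratio.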
An amplicon is a piece of DNA or RNA that is the source and/or product of natural or artificial amplification or replication events.
It can be formed using various methods including polymerase chain reactions (PCR), ligase chain reactions (LCR), or natural gene duplication.
3.Whole genome mapping
A Whole Genome Map is a high-resolution, ordered, whole genome restriction map generated from single DNA molecules extracted from bacteria, yeast, or other fungi. Whole Genome Mapping is a novel technology with unique capabilities in the field of microbiology, with specific applications in the areas of Comparative Genomics, Strain Typing, and Whole Genome Sequence Assembly. Whole Genome Maps are generated de novo, independent of sequence information, require no amplification or PCR steps, and provide a comprehensive view of whole genome architecture. A Whole Genome Map is displayed in the MapCode pattern, where the vertical lines indicate the locations of restriction sites and the distance between the lines represents the restriction fragment size.
4.Radiation hybrid mapping
A theory is developed to predict marker retention and conditional retention or loss in radiation hybrids. Applied to multiple pairwise analysis of a human chromosome 21 data set, this theory fits much better than proposed alternatives and gives a physical map consistent with other evidence and robust with respect to errors to typing. Radiation hybrids have great promise to provide order and physical location at two levels of resolution, spanning the techniques of linkage and restriction fragments and not limited to polymorphic loci.
DNA barcoding is a taxonomic method that uses a short genetic marker in an organism’s DNA to identify it as belonging to a particular species.
In mathematics, a metric space is a set for which distances between all members of the set are defined. Those distances, taken together, are called a metric on the set.
In mathematics, a pseudometric space is a generalized metric space in which the distance between two distinct points can be zero.
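The difference between the two definitions can be checked on a small finite set. The sketch below assumes the usual axioms (zero self-distance, non-negativity, symmetry, triangle inequality), which the notes above do not spell out; the function names and the sample points are illustrative.

```python
from itertools import product


def is_pseudometric(points, d) -> bool:
    """Check d(x,x)=0, non-negativity, symmetry and the triangle inequality."""
    for x, y, z in product(points, repeat=3):
        if d(x, x) != 0:
            return False
        if d(x, y) < 0 or d(x, y) != d(y, x):
            return False
        if d(x, z) > d(x, y) + d(y, z):
            return False
    return True


def is_metric(points, d) -> bool:
    """A metric is a pseudometric in which distinct points have positive distance."""
    separates = all(d(x, y) > 0 for x, y in product(points, repeat=2) if x != y)
    return separates and is_pseudometric(points, d)


plane = [(0, 0), (0, 1), (2, 0), (2, 5)]
manhattan = lambda p, q: abs(p[0] - q[0]) + abs(p[1] - q[1])
first_coordinate_gap = lambda p, q: abs(p[0] - q[0])

print(is_metric(plane, manhattan))                   # True
print(is_pseudometric(plane, first_coordinate_gap))  # True
print(is_metric(plane, first_coordinate_gap))        # False
```

The last distance ignores the second coordinate, so the distinct points (0, 0) and (0, 1) are at distance zero: a pseudometric but not a metric.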
Pyrosequencing is a method of DNA sequencing (determining the order of nucleotides in DNA) based on the “sequencing by synthesis” principle. It differs from Sanger sequencing in that it relies on the detection of pyrophosphate release on nucleotide incorporation, rather than chain termination with dideoxynucleotides. The DNA sequence can be determined from the light emitted upon incorporation of the next complementary nucleotide, because only one of the four possible A/T/C/G nucleotides is added and available at a time, so that only one letter can be incorporated on the single-stranded template (which is the sequence to be determined). The intensity of the light indicates whether more than one of these “letters” occurs in a row. The previous nucleotide letter (one out of four possible dNTPs) is degraded before the next nucleotide letter is added for synthesis, so that the next nucleotide(s) can be revealed by the resulting intensity of light (if the nucleotide added was the next complementary letter in the sequence). This process is repeated with each of the four letters until the DNA sequence of the single-stranded template is determined.
In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application. The n-grams typically are collected from a text or speech corpus.
DNA sequencing: base pair
…, A, G, C, T, T, C, G, A, …
…, AG, GC, CT, TT, TC, CG, GA, …
…, AGC, GCT, CTT, TTC, TCG, CGA, …
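The three rows above (1-grams, 2-grams, 3-grams) can be reproduced with a sliding window. A minimal Python sketch, where the function name `ngrams` and the choice to treat the visible fragment `AGCTTCGA` as the whole sequence are assumptions for the example:

```python
def ngrams(sequence: str, n: int) -> list[str]:
    """Return all contiguous substrings of length n, in order."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


seq = "AGCTTCGA"
for n in (1, 2, 3):
    print(", ".join(ngrams(seq, n)))
```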
In evolutionary biology, sequence space is a way of representing all possible sequences (for a protein, gene or genome).
12.Optical map (ordered restriction map)
Optical mapping is a technique for constructing ordered, genome-wide, high-resolution restriction maps from single, stained molecules of DNA, called “optical maps”. By mapping the location of restriction enzyme sites along the unknown DNA of an organism, the spectrum of resulting DNA fragments collectively serves as a unique “fingerprint” or “barcode” for that sequence.
A restriction map is a map of known restriction sites within a sequence of DNA. Restriction mapping requires the use of restriction enzymes. In molecular biology, restriction maps are used as a reference to engineer plasmids or other relatively short pieces of DNA, and sometimes for longer genomic DNA.
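When the sequence is already known, a restriction map can be computed directly. The sketch below is an in-silico illustration only (optical mapping measures these positions without sequence information). It assumes a linear molecule and uses EcoRI, which recognizes GAATTC and cuts after the first G; the test sequence and function names are made up.

```python
def restriction_sites(sequence: str, recognition_site: str) -> list[int]:
    """Return the 0-based start positions of every recognition site."""
    sites = []
    start = sequence.find(recognition_site)
    while start != -1:
        sites.append(start)
        start = sequence.find(recognition_site, start + 1)
    return sites


def fragment_sizes(sequence_length: int, cut_positions: list[int]) -> list[int]:
    """Return fragment lengths for a linear molecule cut at the given positions."""
    bounds = [0, *sorted(cut_positions), sequence_length]
    return [b - a for a, b in zip(bounds, bounds[1:])]


seq = "AAGAATTCCCGGGAATTCTT"   # invented 20-base example
cut_offset = 1                 # EcoRI cuts G^AATTC, one base into the site
sites = restriction_sites(seq, "GAATTC")
cuts = [s + cut_offset for s in sites]
print(sites)                            # [2, 12]
print(fragment_sizes(len(seq), cuts))   # [3, 10, 7]
```

The ordered list of fragment sizes is the kind of “fingerprint” that the optical map entry describes.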
14.Expressed sequence tag
An expressed sequence tag or EST is a short sub-sequence of a cDNA sequence. ESTs may be used to identify gene transcripts, and are instrumental in gene discovery and gene sequence determination. The identification of ESTs has proceeded rapidly, with approximately 74.2 million ESTs now available in public databases (e.g. GenBank 1 January 2013, all species).
15.Multiple Sequence Alignment
A Multiple Sequence Alignment (MSA) is a sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. In many cases, the input set of query sequences is assumed to have an evolutionary relationship by which they share a lineage and are descended from a common ancestor. From the resulting MSA, sequence homology can be inferred and phylogenetic analysis can be conducted to assess the sequences’ shared evolutionary origins. Visual depictions of the alignment illustrate mutation events such as point mutations (single amino acid or nucleotide changes) that appear as differing characters in a single alignment column, and insertion or deletion mutations (indels or gaps) that appear as hyphens in one or more of the sequences in the alignment. Multiple sequence alignment is often used to assess sequence conservation of protein domains, tertiary and secondary structures, and even individual amino acids or nucleotides.
16.POA(Partial Order Alignment)
Partial order alignment (POA) has been proposed as a new approach to multiple sequence alignment (MSA), which can be combined with existing methods such as progressive alignment. This is important for addressing problems both in the original version of POA (such as order sensitivity) and in standard progressive alignment programs (such as information loss in complex alignments, especially surrounding gap regions).
Progressive alignment begins with the alignment of the two most closely related sequences (as determined by pairwise analysis) and subsequently adds the next closest sequence or sequence group to this initial pair [37,7]. This process continues in an iterative fashion, adjusting the positioning of indels in all sequences. The major shortcoming of this approach is that a bias may be introduced in the inference of the ordered series of motifs (homologous parts) because of an overrepresentation of a subset of sequences.
18.Ribosomal small subunit (SSU)
|Prokaryotic cell (70S ribosome)||Large subunit: 50S subunit (contains 5S rRNA and 23S rRNA)|
|Eukaryotic cell||Cytoplasmic ribosome (80S ribosome)||Large subunit: 60S subunit (contains 5S rRNA, 5.8S rRNA, and 28S rRNA)|
The low-abundance, high-diversity group is what is now called the “Rare Biosphere”.
20.Phred quality score
Phred quality scores were originally developed by the program Phred to help in the automation of DNA sequencing in the Human Genome Project. Phred quality scores are assigned to each nucleotide base call in automated sequencer traces. Phred quality scores have become widely accepted to characterize the quality of DNA sequences, and can be used to compare the efficacy of different sequencing methods. Perhaps the most important use of Phred quality scores is the automatic determination of accurate, quality-based consensus sequences.
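The paragraph above does not state how a score relates to accuracy. Under the standard definition, Q = -10 * log10(P), where P is the estimated probability that the base call is wrong. The Python sketch below uses that definition; the function names are illustrative.

```python
import math


def phred_to_error_probability(quality: float) -> float:
    """P = 10 ** (-Q / 10): probability that the base call is wrong."""
    return 10 ** (-quality / 10)


def error_probability_to_phred(probability: float) -> float:
    """Q = -10 * log10(P)."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -10 * math.log10(probability)


for q in (10, 20, 30):
    print(q, phred_to_error_probability(q))
```

So Q20 corresponds to about 1 error in 100 base calls and Q30 to about 1 in 1000.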
Base calling is the process of assigning bases (nucleobases) to chromatogram peaks. One of the best computer programs for accomplishing this job is Phred, which is currently the most widely used base-calling program in both academic and commercial DNA sequencing laboratories because of its high base-calling accuracy.
22.MIAME(Minimum Information About a Microarray Experiment)
MIAME describes the minimum information about a microarray experiment that is needed to interpret the results of the experiment unambiguously and potentially to reproduce the experiment.
1.The raw data for each hybridisation.
2.The final processed data for the set of hybridisations in the experiment (study)
3.The essential sample annotation, including experimental factors and their values
4.The experiment design including sample data relationships
5.Sufficient annotation of the array design
6.Essential experimental and data processing protocols
Lead discovery by DNA-encoded chemical libraries.