
III. Bioinformatics Software

1. Comprehensive packages

BCM Search Launcher: incorporates several protein sequence search tools, including BLAST, FASTA, Smith-Waterman, BEAUTY, PROSITE, and BLOCKS. It also contains other tools for sequence alignment, protein and DNA sequence analysis, protein secondary structure prediction, and more. In addition to the web interface, the Search Launcher Batch Client (SLBC) can be downloaded for UNIX, Mac, or PC to perform batch searches.
EMBOSS: contains over 100 programs for a wide variety of DNA and protein sequence analyses.
CuraTools® by CuraGen Co.: provides a simple interface for using the public and proprietary bioinformatics tools for data mining and sequence analysis. Results can be stored and easily accessed later. Registration is free for academic users.
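Smith-Waterman, listed above among the Search Launcher tools, is the exact local-alignment algorithm that BLAST and FASTA approximate heuristically. As a minimal sketch, the hypothetical Python function below computes only the best local alignment score with a simple linear gap penalty. The scoring values (match +2, mismatch -1, gap -2) are illustrative assumptions; real protein searches use substitution matrices such as BLOSUM62 and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best Smith-Waterman local alignment score of a and b.

    Each cell takes the maximum of 0 (start a new local alignment),
    a diagonal step (match or mismatch), or a gap in either sequence.
    Only two rows of the dynamic-programming matrix are kept in memory.
    """
    prev = [0] * (len(b) + 1)  # previous row of the matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best
```

For example, smith_waterman_score("ACGT", "TTACGTTT") returns 8: the four matching bases contribute 4 × 2, and the flanking bases are simply left out of the local alignment. Keeping only two rows keeps memory linear in the length of the second sequence, at the cost of not being able to trace back the alignment itself.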

2. Sequence Format Conversions

Readseq
Sequence format conversions: a collection of links to several sequence format conversion tools.
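Format conversion usually starts from a parser for a simple format such as FASTA. The sketch below assumes plain FASTA input (a ">" header line followed by sequence lines), reads it into (header, sequence) pairs, and writes the records back wrapped to a chosen width. The function names and the default width of 60 characters are assumptions, and none of the other formats that Readseq handles are covered.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text.

    Blank lines are skipped, and any lines before the first '>' header are ignored.
    """
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Format (header, sequence) pairs as FASTA text, wrapping sequences at `width` columns."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```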

3. Database Search

BLAST: the most powerful and fast tool for comparing one or more query sequences with a specified database, ranging from comprehensive databases such as nrdb and SwissProt to individual genome databases such as the human or E. coli genome database. Choose the right program (blastn, blastp, blastx, tblastn, or tblastx) according to whether the query and the database are nucleotide or protein sequences. Refer to the Program Selection Guide.
PSI-BLAST (Position-Specific Iterated BLAST): a profile-based protein sequence search; sensitive enough to find domains and remote homologs.
PHI-BLAST (Pattern Hit Initiated BLAST): a homology search for sequences that contain a specified protein motif (pattern) and are also similar to the query.
FASTA: uses the FASTA algorithm to perform functions similar to those of the BLAST package, but offers no profile- or motif-based searching.
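The BLAST program choice depends only on the molecule types of the query and the database. The hypothetical helper below encodes the standard pairings; the translate_nucleotides flag is an assumption added to separate blastn (DNA compared directly) from tblastx (both sides translated in six frames).

```python
def choose_blast_program(query: str, database: str,
                         translate_nucleotides: bool = False) -> str:
    """Pick a BLAST program from the query and database molecule types.

    query, database: "nucleotide" or "protein".
    translate_nucleotides: for nucleotide-vs-nucleotide searches, compare
    six-frame translations (tblastx) instead of the DNA itself (blastn).
    """
    valid = {"nucleotide", "protein"}
    if query not in valid or database not in valid:
        raise ValueError("query and database must be 'nucleotide' or 'protein'")
    if (query, database) == ("nucleotide", "nucleotide"):
        return "tblastx" if translate_nucleotides else "blastn"
    table = {
        ("protein", "protein"): "blastp",
        ("nucleotide", "protein"): "blastx",   # translated query vs protein database
        ("protein", "nucleotide"): "tblastn",  # protein query vs translated database
    }
    return table[(query, database)]
```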

4. DNA Sequence Analyses

a) DNA/Genome sequence viewer and annotation tool:

Artemis: a free genome viewer and annotation tool that allows visualization of sequence features and the results of analyses within the context of the sequence and its six-frame translation. It can read EMBL and GenBank database entries, GFF files, or sequences in FASTA format.
Apollo Genome Browser: for browsing and annotating large eukaryotic genomes. It is downloadable for UNIX or Windows.

b) Gene prediction:

GeneMark: one of the best programs currently available for predicting open reading frames (ORFs); it has a web interface.
Glimmer: another good program for predicting ORFs, and the standard gene prediction software used at TIGR. One drawback is that it often predicts TTG as a start codon.
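GeneMark and Glimmer score candidate genes with statistical (Markov) models of coding sequence, which a short example cannot reproduce. What can be sketched is the basic ORF scan that such predictors start from: in each forward reading frame, find a start codon and the next in-frame stop codon. In the hypothetical function below the start codons are a parameter, so admitting TTG is a one-argument change. The default minimum length of 100 codons is an arbitrary assumption, and the reverse strand is not scanned.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(seq: str, starts: Tuple[str, ...] = ("ATG",),
              min_codons: int = 100) -> List[Tuple[int, int, int]]:
    """Return (frame, start, end) for open reading frames on the forward strand.

    start is the 0-based position of the start codon and end is the position
    just past the in-frame stop codon. In each frame only the first start codon
    after the previous stop is used, so each ORF is the longest one ending at
    that stop. min_codons counts codons including the stop; frames that run
    off the end of the sequence without a stop are not reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        orf_start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None:
                if codon in starts:
                    orf_start = i
            elif codon in STOP_CODONS:
                if (i + 3 - orf_start) // 3 >= min_codons:
                    orfs.append((frame, orf_start, i + 3))
                orf_start = None
    return orfs
```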

c) Translation and back-translation tools:

Transeq from EMBOSS: offers a choice of several codon tables. Colored output indicates different properties of the amino acid residues.
The Protein Machine at EBI: similar to Transeq, but not as good.
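Translation tools such as Transeq, and the six-frame view in Artemis, look each codon up in a genetic code table. The sketch below hard-codes only the standard code (NCBI translation table 1) rather than the several tables Transeq offers, writes "*" for stop codons and "X" for codons containing ambiguous bases, and returns the three forward and three reverse-complement frames. All function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate from the first base; an incomplete trailing codon is dropped."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def reverse_complement(dna: str) -> str:
    """Reverse complement; non-ACGT characters are kept unchanged."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def six_frame_translation(dna: str) -> dict:
    """Translations of frames +1, +2, +3 and -1, -2, -3."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```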

d) Codon Usage:

Codon usage database
CodonW: for correspondence analysis of codon or amino acid usage and for calculating codon usage indices. The software is downloadable for UNIX or Windows.
EMBOSS: offers several other programs for codon analyses:

cai - CAI codon adaptation index

chips - Codon usage statistics

codcmp - Codon usage table comparison

cusp - Create a codon usage table

syco - Synonymous codon usage Gribskov statistic plot
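The cusp and cai programs rest on two simple calculations: counting codons in a reference set of genes, and scoring a gene by how often it uses that set's preferred codons. In the usual definition of the codon adaptation index, each codon's relative adaptiveness w is its reference count divided by the count of the most frequent synonymous codon, and CAI is the geometric mean of w over the gene's codons, excluding Met, Trp, and stop codons. The sketch below assumes the standard genetic code, and the pseudocount of 0.5 for codons missing from the reference is an assumption; EMBOSS cai may handle these details differently. The function names are hypothetical, not EMBOSS commands.

```python
import math
from collections import Counter

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_counts(seqs) -> Counter:
    """Count in-frame codons (from the first base) over one or more coding sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skip codons with ambiguous bases
                counts[codon] += 1
    return counts


def relative_adaptiveness(ref_counts: Counter, pseudocount: float = 0.5) -> dict:
    """w(codon) = count(codon) / largest count among its synonymous codons."""
    synonyms = {}
    for codon, aa in CODON_TABLE.items():
        synonyms.setdefault(aa, []).append(codon)
    w = {}
    for aa, codons in synonyms.items():
        if aa in "MW*":
            continue  # single-codon amino acids and stop codons are not scored
        # codons absent from the reference get the (assumed) pseudocount
        values = {c: ref_counts.get(c, 0) or pseudocount for c in codons}
        top = max(values.values())
        for c in codons:
            w[c] = values[c] / top
    return w


def cai(gene: str, w: dict) -> float:
    """Geometric mean of w over the scorable codons of a gene (NaN if there are none)."""
    gene = gene.upper()
    logs = [math.log(w[gene[i:i + 3]])
            for i in range(0, len(gene) - 2, 3)
            if gene[i:i + 3] in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")
```

For instance, with a reference set in which CTG appears three times and TTA once, w(TTA) is 1/3, and a gene consisting of CTG followed by TTA scores the square root of 1/3, about 0.58.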

e) Promoter predictions:

Promoter by Neural Network: offers choices for predicting eukaryotic or prokaryotic promoters.
Proscan: predicts eukaryotic Pol II promoters.
Promoter 2.0: predicts vertebrate Pol II promoters.

f) Restriction site analysis:

g) Primer designs and analysis:

Primer Design: a collection of PCR-related web sites. Reading the brief explanation below each link first will help in choosing the proper software.
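Primer design programs typically report GC content and an estimated melting temperature (Tm). The simplest estimate is the Wallace rule, Tm = 2 °C × (A + T) + 4 °C × (G + C), which is only a rough guide for short oligonucleotides; more accurate programs use nearest-neighbor thermodynamic models. The sketch below implements just the Wallace rule, and its function names are hypothetical.

```python
def gc_content(primer: str) -> float:
    """Fraction of G and C bases in a primer."""
    primer = primer.upper()
    if not primer:
        raise ValueError("empty primer")
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc
```

For a 20-mer with 10 A/T and 10 G/C bases, the rule gives 2 × 10 + 4 × 10 = 60 °C.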

h) Others:

Grasp-DNA: screens for DNA-protein binding sites and sequence repeats.
VecScreen: screens DNA sequences for vector contamination.

5. Protein Sequence Analyses

a) Sequence comparisons and alignments:

Clustal W: a web-based tool for multiple alignment of protein and nucleotide sequences.
Clustal X: a downloadable, user-friendly, window-based interface for Clustal W on Linux. The alignment can be saved in color as a PostScript file, which can be opened with GSview (distributed in the Aladdin package, downloadable from its web site) or Adobe Photoshop and edited with the software described below. The .aln file can be opened with MS Word.
BLAST 2 Sequences: ideal for pairwise comparison of two sequences.
WebLogo: the web interface to Sequence Logo, designed for displaying sequence conservation graphically. The total height of the letters at each position reflects how conserved that position is, and the height of each letter is proportional to its frequency there. To use WebLogo, select "Graphical view of postscript logo", paste protein or nucleotide sequences in FASTA format, click "generate", and save the PostScript file to your local computer. The result can be viewed with a PostScript viewer such as GSview.
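In a sequence logo, the height of the letter stack at each position is the information content of that alignment column: the maximum possible entropy (log2 of the alphabet size, so 2 bits for DNA and about 4.32 bits for proteins) minus the Shannon entropy of the residues observed there. The sketch below computes that quantity per column. It omits the small-sample correction that logo programs apply to alignments with few sequences, and ignoring gap characters is an assumption.

```python
import math
from collections import Counter
from typing import List


def column_information(alignment: List[str], alphabet_size: int = 4) -> List[float]:
    """Information content in bits of each column of a multiple alignment.

    R = log2(alphabet_size) - H, where H = -sum(p * log2(p)) over the
    residues observed in the column. Gap characters ('-') are ignored,
    and a column containing only gaps is given 0 bits.
    Use alphabet_size=4 for DNA/RNA and 20 for proteins.
    """
    if not alignment:
        raise ValueError("empty alignment")
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    max_entropy = math.log2(alphabet_size)
    info = []
    for col in range(length):
        counts = Counter(s[col].upper() for s in alignment if s[col] != "-")
        total = sum(counts.values())
        if total == 0:
            info.append(0.0)
            continue
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        info.append(max_entropy - entropy)
    return info
```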

b) Sequence alignments editors:

Jalview: a nice alignment editor. Its functions include customizing color schemes based on consensus residues, physico-chemical properties, or secondary structures; editing gaps; pairwise sequence comparison; calculating consensus sequences; and many more.
TeXshade: displays aligned nucleotide or amino acid residues and shades them in customized colors according to function. This software is commonly used in publications.
BoxShade: converts MSF and ClustalW alignment files into shaded output in many other formats. It is not as flexible as Jalview or TeXshade.

c) Domain and motif detection:

RPS-BLAST (Reverse Position-Specific BLAST): searches a query sequence against NCBI's Conserved Domain Database (a combination of Pfam and SMART).
BLOCKS
SPLASH (Structural Pattern Localization Analysis by Sequential Histogram)
See "Protein Domain and Motif Databases and Search Tools" above for other software.
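PROSITE-style patterns, which PHI-BLAST also accepts, use a compact syntax: x for any residue, [...] for a set of allowed residues, {...} for excluded residues, (n) or (n,m) for repeats, elements joined by "-", and < or > for the N- or C-terminus. The hypothetical converter below turns such a pattern into a Python regular expression. It is a sketch that ignores rarer cases such as ">" inside brackets, and the zinc-finger-like pattern in the usage note is only an illustration, not a quoted database entry.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE-style pattern such as 'C-x(2,4)-C' into a regex string."""
    pattern = pattern.strip().rstrip(".")  # patterns conventionally end with '.'
    parts = []
    for element in pattern.split("-"):
        anchor_start = element.startswith("<")  # N-terminus
        anchor_end = element.endswith(">")      # C-terminus
        element = element.strip("<>")
        # split off an optional repeat count such as (3) or (2,4)
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"cannot parse pattern element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            regex = "."
        elif residue.startswith("{"):
            regex = "[^" + residue[1:-1] + "]"  # excluded residues
        else:
            regex = residue  # single residue or [...] class is already valid regex
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if anchor_start else "") + regex + ("$" if anchor_end else ""))
    return "".join(parts)
```

For example, prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.") gives C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H, which can then be passed to re.search.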

d) Membrane topology:

HMMTOP: one of the best programs for predicting transmembrane regions in prokaryotes. The executable binary is free for academic users.
TMHMM: another of the best programs for predicting transmembrane regions in prokaryotes. It allows batch searches online, and the executable binary is free for academic users.
MemSat: possibly the best free program for predicting transmembrane regions in eukaryotic proteins, but it does not perform well on prokaryotic proteins. Its source code is available.
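HMMTOP, TMHMM, and MemSat rely on trained statistical models that a short example cannot reproduce. A much simpler and older screen is easy to sketch, though: average the Kyte-Doolittle hydropathy over a sliding window and report stretches where the average reaches a threshold. The window of 19 residues and threshold of 1.6 used below are the values commonly quoted for this method, unknown residues are scored as 0 by assumption, and the output is a rough list of candidate hydrophobic segments, not a topology prediction.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy scale
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> List[float]:
    """Mean hydropathy of each window; entry i covers residues i .. i+window-1."""
    values = [KD.get(aa, 0.0) for aa in protein.upper()]  # unknown residues -> 0 (assumption)
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_segments(protein: str, window: int = 19,
                          threshold: float = 1.6) -> List[Tuple[int, int]]:
    """Merge overlapping windows at or above threshold into (start, end) ranges.

    Positions are 0-based and end is exclusive.
    """
    segments = []
    for i, score in enumerate(hydropathy_profile(protein, window)):
        if score >= threshold:
            if segments and i <= segments[-1][1]:
                segments[-1] = (segments[-1][0], i + window)  # extend current segment
            else:
                segments.append((i, i + window))
    return segments
```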

e) Signal peptide predictions:

SignalP V2.0: version 2.0 combines a neural network (NN) and a hidden Markov model (HMM). The NN is sensitive in detecting signal peptidase cleavage sites, and the HMM can distinguish signal peptides from signal anchors. SignalP V2.0 is probably the most sensitive software so far; however, its false-positive rate can be as high as 18%.
PSORT: predicts subcellular localization and N-terminal signal sequences. It is not as good as SignalP V2.0.

f) Secondary structure predictions:

PSIpred: one of the most accurate programs to date (Q3 = 78%, meaning that 78% of residues are correctly assigned as helix, strand, or coil). An academic e-mail address is required. A text result is e-mailed to the user, along with a link to graphic output in three formats: PostScript, PDF, or JPEG.
Jpred2: its accuracy is close to that of PSIpred. No e-mail address is necessary.
WHAT: a convenient program for determining the hydrophobicity and amphipathicity of a protein. To predict amphipathicity, set the angle to 100° for an alpha-helix or 180° for a beta-sheet. The graphic output can be saved as a GIF file, and the Excel output can be copied into MS Excel to draw a customized diagram.
AveHAS: a modified version of WHAT that analyzes average hydrophobicity and amphipathicity based on a multiple sequence alignment.
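The angles entered in WHAT reflect how side chains are spaced around each structure: roughly 100° per residue in an alpha-helix and 180° in a beta-strand. A common way to quantify amphipathicity at a given angle is the hydrophobic moment, the length of the vector sum of each residue's hydrophobicity placed at n times that angle, here divided by the number of residues. The sketch below uses the Kyte-Doolittle scale as an assumed stand-in; WHAT's own scale and normalization may differ, and the function names are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy scale (an assumption; WHAT may use another scale)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(segment: str, angle_deg: float = 100.0) -> float:
    """Mean hydrophobic moment of a segment for a given angle per residue.

    Use angle_deg=100 for an alpha-helix and 180 for a beta-strand.
    Residues not in the scale are skipped.
    """
    delta = math.radians(angle_deg)
    residues = [aa for aa in segment.upper() if aa in KD]
    sin_sum = cos_sum = 0.0
    for n, aa in enumerate(residues):
        sin_sum += KD[aa] * math.sin(n * delta)
        cos_sum += KD[aa] * math.cos(n * delta)
    return math.hypot(sin_sum, cos_sum) / len(residues) if residues else 0.0


def mean_hydrophobicity(segment: str) -> float:
    """Average scale value over the recognized residues of a segment."""
    values = [KD[aa] for aa in segment.upper() if aa in KD]
    return sum(values) / len(values) if values else 0.0
```

An alternating segment such as LKLKLKLK has a large moment at 180°, because the hydrophobic and charged residues fall on opposite faces, whereas a uniform segment such as LLLLLLLL has a moment of essentially zero at that angle.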

g) 3D structure: search, view and modeling

VAST search (Vector Alignment Search Tool): a structure-structure similarity search tool provided by NCBI for identifying structural neighbors.
RasMol: free, downloadable software for viewing the structures of macromolecules.
Cn3D: another free, downloadable macromolecular structure viewer, provided by NCBI.

6. Phylogenetic Analyses:

Clustal X (see the Protein Sequence Analyses section).
PHYLIP: uses parsimony, distance matrix, and likelihood methods, including bootstrapping and consensus trees, to infer phylogenies.
Treeview: a program for viewing, editing, and printing phylogenetic trees. The tree can also be copied into a drawing program such as MS PowerPoint or Freehand for further editing.
MEGA (Molecular Evolutionary Genetics Analysis): a free, downloadable software package for sequence and phylogenetic analysis on Windows, or on other operating systems for which Windows emulators are available. It can estimate DNA or protein distances and generate trees using a variety of algorithms.
MEP (Molecular Evolution and Phylogenetics)
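Distance-based methods in PHYLIP and MEGA begin with a matrix of pairwise distances between aligned sequences. The sketch below computes the simplest version for DNA: the p-distance (the proportion of differing sites) and the Jukes-Cantor corrected distance d = -(3/4) ln(1 - 4p/3), which accounts for multiple substitutions at the same site. Skipping sites with a gap in either sequence (pairwise deletion) is an assumption and only one of several possible choices, and both programs offer more elaborate substitution models. The function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, ignoring positions with a gap in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(a: str, b: str) -> float:
    """Jukes-Cantor distance; infinite when p >= 0.75 (saturation)."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(seqs: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Jukes-Cantor distances for every pair of named, aligned sequences."""
    return {(n1, n2): jukes_cantor(s1, s2)
            for (n1, s1), (n2, s2) in combinations(seqs.items(), 2)}
```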

7. RNA Analyses:

RNA structure logo

8. Other Programs:

Qinhong Ma

qma@biomail.ucsd.edu

Last update: Dec. 27, 2002