III. Bioinformatics Software
1. Comprehensive packages
- BCM Search Launcher: incorporates several
protein sequence searching tools, including BLAST, FASTA, Smith-Waterman,
BEAUTY, PROSITE, and BLOCKS. It also contains other tools for sequence
alignment, protein and DNA sequence analysis, protein secondary structure
prediction, etc. In addition to a web interface, the Search Launcher Batch
Client (SLBC) can be downloaded for UNIX, Mac, or PC to perform batch
searches.
- EMBOSS: contains over 100
programs for a variety of purposes in DNA/protein sequence analyses.
- CuraTools® by CuraGen Co.:
provides a simple interface for using public and proprietary
bioinformatics tools for data mining and sequence analysis. Results can be
stored and easily accessed later. Registration is free for academic users.
2. Sequence Format Conversions
3. Database Search
- BLAST: a powerful and fast
tool for comparing one or more query sequences with a specified database,
ranging from comprehensive databases, such as nrdb and SwissProt, to
individual genome databases, such as the human or E. coli genome database.
Choose the appropriate program (blastn, blastp, blastx, tblastn, or
tblastx) based on the types of the database and query sequences. Refer to
the Program Selection Guide.
- PSI-BLAST (position-specific
iterative BLAST): a profile-based protein sequence search; sensitive for
finding domains and remote homologous sequences.
- PHI-BLAST (Pattern Hit
Initiated BLAST): sequence homology search using protein motifs.
- FASTA: uses the FASTA algorithm to perform functions
similar to those of the BLAST package; it has no profile- or motif-based
searching functions.
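
The program choice described in the BLAST entry above can be summarized as a small lookup. This is only a sketch: the function name and the "nucleotide"/"protein" labels are made up for illustration, and the Program Selection Guide remains the authoritative reference.

```python
# Candidate BLAST programs for each (query type, database type) pair.
# blastn compares nucleotide sequences directly; tblastx translates both the
# query and the database in six frames and compares them at the protein level.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): ["blastn", "tblastx"],
    ("protein", "protein"): ["blastp"],
    ("nucleotide", "protein"): ["blastx"],   # query is translated
    ("protein", "nucleotide"): ["tblastn"],  # database is translated
}


def choose_blast_programs(query_type, database_type):
    """Return the BLAST programs that accept this query/database combination.

    Both arguments are hypothetical labels: "nucleotide" or "protein".
    """
    key = (query_type.lower(), database_type.lower())
    if key not in BLAST_PROGRAMS:
        raise ValueError("types must be 'nucleotide' or 'protein'")
    return BLAST_PROGRAMS[key]
```

For example, `choose_blast_programs("protein", "nucleotide")` returns `["tblastn"]`, the program to use when searching a genome database with a protein query.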
4. DNA Sequence Analyses
a) DNA/Genome sequence viewer and annotation tool:
- Artemis: a free genome viewer and annotation tool that
allows visualization of sequence features and the results of analyses
within the context of the sequence, and its six-frame translation. It can
read EMBL and GenBank database entries, GFF format, or sequence in
FASTA format.
- Apollo Genome Browser: for browsing and
annotating large eukaryotic genomes. It is downloadable for UNIX or
Windows.
b) Gene prediction:
- GeneMark: one of the best
programs for predicting open reading frames available now. Web interface.
- Glimmer: another good
program for predicting ORFs. It is the standard gene prediction software
used by TIGR. A problem is that it often predicts TTG as a start codon.
c) Translation and backward translation tools:
- Transeq from EMBOSS: has
several options for codon tables. Color output indicates different
properties of the amino acid residues.
- The Protein Machine at EBI: similar to
Transeq, but not as good.
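
To illustrate what the translation tools above (and the six-frame view in a genome browser) compute, here is a minimal sketch using the standard genetic code only. It is not how Transeq is implemented; the function names are illustrative, and codons containing ambiguous bases are simply reported as "X".

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # first base T
    "LLLLPPPPHHQQRRRR"   # first base C
    "IIIMTTTTNNKKSSRR"   # first base A
    "VVVVAAAADDEEGGGG"   # first base G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, frame=0):
    """Translate one forward frame (0, 1, or 2); '*' marks a stop codon."""
    dna = dna.upper()
    residues = []
    for i in range(frame, len(dna) - 2, 3):
        residues.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(residues)


def six_frame_translation(dna):
    """Translate all three forward and all three reverse frames."""
    rc = reverse_complement(dna)
    frames = {}
    for f in range(3):
        frames["+%d" % (f + 1)] = translate(dna, f)
        frames["-%d" % (f + 1)] = translate(rc, f)
    return frames
```

Supporting the other codon tables offered by Transeq would amount to swapping in a different `AMINO_ACIDS` string.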
d) Codon Usage:
- Codon usage database
- CodonW: for correspondence
analysis of codon or amino acid usage as well as calculating the codon
usage indices. The software is downloadable for UNIX or Windows.
- EMBOSS: offers several
other programs for codon analyses:
cai - CAI codon adaptation index
chips - Codon usage statistics
codcmp - Codon usage table comparison
cusp - Create a codon usage table
syco - Synonymous codon usage Gribskov statistic plot
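
As a rough idea of what a codon usage table contains, the sketch below counts the codons of a single in-frame coding sequence and reports each count together with its frequency per thousand codons. It does not reproduce the output of cusp or CodonW; the function name is illustrative and the input is assumed to start at the first codon position.

```python
from collections import Counter


def codon_usage(cds):
    """Count codons in an in-frame coding sequence.

    Returns {codon: (count, frequency per 1000 codons)}.
    Trailing bases that do not form a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: (n, 1000.0 * n / total) for codon, n in counts.items()}
```

Summing such tables over all genes of a genome gives the kind of reference set against which indices such as CAI are calculated.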
e) Promoter predictions:
f) Restriction site analysis:
g) Primer designs and analysis:
- Primer Design: a collection of
PCR-related web sites. Reading the brief explanations of each program
below the links first will help in choosing the proper software to use.
h) Others:
- Grasp-DNA: screens for
DNA-protein binding sites and sequence repeats.
- VecScreen: screens DNA
sequences for vector contamination.
5. Protein Sequence Analyses
a) Sequence comparisons and alignments:
- Clustal W: a web-based tool
for multiple alignment of protein and nucleotide sequences.
- Clustal X: a downloadable,
Linux window-based, user-friendly interface for Clustal W. The alignment
can be saved in color as a PostScript file that can be opened with GSview
(distributed in the Aladdin package downloadable from the web site) or Adobe
Photoshop and edited with the software described below. The .aln file can
be opened with MS Word. Click for the help document.
- BLAST 2 Sequences: ideal for pairwise
comparison of two sequences.
- WebLogo: the web interface
of Sequence Logo, designed for
displaying sequence conservation graphically. The degree of conservation
of a residue at each position is reflected in the size of its letter. To
use WebLogo, select "Graphical view of postscript logo", paste protein or
nucleotide sequences in FASTA format, click "generate" and save the
PostScript file to your local computer. The result can be visualized with
a PostScript viewer such as GSview.
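
The relation between conservation and letter size in a sequence logo can be sketched as follows: each alignment column gets a total height equal to its information content in bits, and each letter takes a share of that height proportional to its frequency. This is a simplified illustration, not WebLogo's own code; it leaves out the small-sample correction, assumes a gap-free column, and the function name is made up.

```python
import math
from collections import Counter


def letter_heights(column, alphabet_size):
    """Return {residue: height in bits} for one alignment column.

    column        -- string with one residue per aligned sequence (no gaps)
    alphabet_size -- 4 for nucleotides, 20 for amino acids
    """
    counts = Counter(column)
    n = len(column)
    freqs = {residue: c / n for residue, c in counts.items()}
    # Shannon entropy of the column, in bits.
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    # Information content: maximum possible entropy minus observed entropy.
    information = math.log2(alphabet_size) - entropy
    return {residue: f * information for residue, f in freqs.items()}
```

A fully conserved nucleotide column such as "AAAA" yields a single letter 2 bits tall, while a column split evenly between two bases yields two letters of 0.5 bits each.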
b) Sequence alignment editors:
- Jalview: a nice alignment
editor. Functions include: customizing color schemes based on consensus
residues, physico-chemical properties, or secondary structures, editing
gaps, pairwise sequence comparison, calculating consensus, and many more.
- TeXshade: aligns nucleotide
or amino acid residues and shades them in customized colors with respect to
functions. This software is commonly used in publications.
- BoxShade: converts MSF and
ClustalW formats to many other formats. Not as flexible as Jalview and
TeXshade.
c) Domain and motif detection:
- RPS-BLAST (Reverse Position
Specific BLAST): searches against the Conserved Domain Database at NCBI (a
combination of Pfam and SMART).
- BLOCKS
- SPLASH (Structural
Pattern Localization Analysis by Sequential Histogram)
- See "Protein
Domain and Motif Databases and Search Tools" above for
other software.
d) Membrane topology:
- HMMTOP: one of the best
programs for predicting transmembrane regions in prokaryotes. The
executable binary is free for academic users.
- TMHMM: another one of the
best programs for predicting transmembrane regions in prokaryotes. Allows
batch search online. The executable binary is free for academic users.
- MemSat: might be the best
free program for predicting transmembrane regions in eukaryotic proteins,
but is not good for prokaryotic proteins. Click for the source code.
e) Signal peptide predictions:
- SignalP V2.0: version 2.0 combines a neural
network (NN) and a hidden Markov model (HMM). The NN is sensitive in
detecting signal peptidase cleavage sites, and the HMM can be used to
distinguish signal peptides from signal anchors. SignalP V2.0 is probably
the most sensitive software so far. However, the frequency of false
positives can be up to 18%.
- PSORT: predicts the
subcellular location and the N-terminal signal sequences. Not as good as
SignalP V2.0.
f) Secondary structure predictions:
- PSIpred: one of the most
accurate programs to date (Q3 = 78%). An academic e-mail address is required.
A text result will be e-mailed to the user, along with the link to obtain
graphic output in three different formats: PostScript, PDF, or JPEG.
- Jpred2: accuracy is close
to PSIpred. No e-mail address is necessary.
- WHAT: a convenient
program for determining the hydrophobicity and amphipathicity of a
protein. To predict the amphipathicity, set the angle to 100°
for alpha-helix or to 180° for beta-sheet. The graphic output
can be saved as a GIF file and the Excel output can be copied to MS Excel
for drawing a customized diagram.
- AveHAS: a modified
version of WHAT that allows analysis of the average hydrophobicity and
amphipathicity based on a multiple sequence alignment.
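
The role of the angle setting in the WHAT entry above can be illustrated with a hydrophobic moment calculation. The sketch below is a stand-in, not the algorithm WHAT itself uses, which is not described here: it assumes the Kyte-Doolittle hydropathy scale and the usual hydrophobic moment formula, and the function names are illustrative.

```python
import math

# Kyte-Doolittle hydropathy values (assumed scale; WHAT may use another).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(window):
    """Average hydropathy of the residues in a sequence window."""
    if not window:
        raise ValueError("window must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in window) / len(window)


def hydrophobic_moment(window, angle_deg):
    """Mean hydrophobic moment of a window for a given angle per residue.

    angle_deg -- 100 for an alpha-helix, 180 for a beta-strand
    """
    if not window:
        raise ValueError("window must not be empty")
    angle = math.radians(angle_deg)
    sin_sum = sum(KYTE_DOOLITTLE[aa] * math.sin(i * angle)
                  for i, aa in enumerate(window))
    cos_sum = sum(KYTE_DOOLITTLE[aa] * math.cos(i * angle)
                  for i, aa in enumerate(window))
    return math.hypot(sin_sum, cos_sum) / len(window)
```

A strand that alternates hydrophobic and charged residues, such as "LKLK", gives a large moment at 180°, whereas a uniformly hydrophobic stretch such as "LLLL" gives a moment close to zero. Sliding the window along a protein and plotting both values gives hydrophobicity and amphipathicity profiles of the kind described above.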
g) 3D structure: search, view and modeling
- VAST search (Vector Alignment Search
Tool): a structure-structure similarity search tool provided by NCBI for
identifying structural neighbors.
- RasMol: free downloadable
software for viewing the structures of macromolecules.
- Cn3D: another free downloadable
program, provided by NCBI, for viewing macromolecular structures.
6. Phylogenetic Analyses:
- Clustal X (see the Protein
Sequence Analyses section).
- PHYLIP: uses parsimony,
distance matrix, and likelihood methods, including bootstrapping and
consensus trees to infer phylogeny.
- Treeview: a program used for
viewing, editing, and printing phylogenetic trees. The tree can also be
copied to a drawing program such as MS PowerPoint or Freehand for further
editing.
- MEGA (Molecular
Evolutionary Genetics Analysis): a free downloadable software package for
sequence and phylogenetic analysis on Windows or other operating systems
for which Windows emulators are available. It can estimate DNA or protein
distances statistically and generate trees using a variety of algorithms.
- MEP (Molecular Evolution
and Phylogenetics)
7. RNA Analyses:
8. Other Programs:

Qinhong Ma
qma@biomail.ucsd.edu
Last update: Dec. 27, 2002