@redundancy remover
Redundancy remover is a program that removes redundant sequences from
FASTA files. In a large FASTA file, it is difficult to find identical
or near-identical sequences by manually comparing every character.
Redundancy remover helps here: it compares every sequence against the
remaining sequences and discards the redundant ones. The program takes
a PIR file as input. This PIR file has to be generated from an aligned
FASTA file.
The step-by-step procedure for using this utility is explained below:
1) Get the FASTA file and align it using Clustal software.
2) Go to the following link, input your alignment file, and convert
it into NBRF/PIR format:
http://bioweb.pasteur.fr/seqanal/interfaces/clustalw_convert.html
3) Input this PIR file to the utility. The program also prompts for
the maximum number of characters that may differ between two
sequences. For example, if you enter 5, the program compares two
sequences and checks whether the number of non-matching characters is
greater than 5. If it is not, the program lists the two similar
sequences in the command prompt and leaves the second sequence out of
the output PIR file (seq.pir).
A sample PIR file can be obtained here: FliF.pir
4) Take the output PIR file, go to the same link again, convert the
PIR file into a Clustal alignment file, and use it for your own
purposes from there.
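The comparison described in step 3 can be sketched roughly as below. This is only an illustration of the idea, not the source of the utility itself: the class name RedundancySketch, the method names, and the example sequences are all made up here, and the sketch assumes the aligned sequences have already been read from the PIR file into a map of name to sequence. It also assumes each sequence is compared against the sequences kept so far, which may differ from what the actual program does.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the code of the redundancy remover utility.
class RedundancySketch {

    // Count the positions at which two aligned sequences differ.
    // Assumes both strings come from the same alignment (equal length).
    static int countMismatches(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("aligned sequences must have equal length");
        }
        int mismatches = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    // Keep a sequence only if it differs from every already kept sequence
    // by more than maxMismatches characters; otherwise report the pair
    // and drop the second sequence.
    static Map<String, String> removeRedundant(Map<String, String> aligned, int maxMismatches) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> candidate : aligned.entrySet()) {
            boolean redundant = false;
            for (Map.Entry<String, String> earlier : kept.entrySet()) {
                if (countMismatches(earlier.getValue(), candidate.getValue()) <= maxMismatches) {
                    System.out.println("similar: " + earlier.getKey() + " and " + candidate.getKey());
                    redundant = true;
                    break;
                }
            }
            if (!redundant) {
                kept.put(candidate.getKey(), candidate.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up aligned sequences, for illustration only.
        Map<String, String> aligned = new LinkedHashMap<>();
        aligned.put("seqA", "MKT-LLVA");
        aligned.put("seqB", "MKT-LLVG"); // 1 mismatch against seqA
        aligned.put("seqC", "MRSQAAVG"); // 6 mismatches against seqA
        System.out.println(removeRedundant(aligned, 1).keySet());
    }
}
```

With a maximum of 1 non-matching character, seqB is reported as similar to seqA and dropped, while seqC is kept.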
The command to run this utility is
> java redundancy_c
I wrote this utility to remove redundant sequences during phylogenetic
analysis of certain proteins. Please do the file conversion only with
the link given above; I have tested the program only with files
converted on that website. Thank you!