download @redundancy remover

@redundancy remover: Redundancy remover is a small program that removes redundant sequences from FASTA files. In a large FASTA file it is difficult to find identical or nearly identical sequences by manually comparing every character. Redundancy remover compares each sequence against the remaining sequences and removes the redundant ones. The program takes a PIR file as input, and this PIR file has to be generated from an aligned FASTA file.
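For orientation, an NBRF/PIR entry consists of a header line such as ">P1;name", a description line (which may be empty), and then the sequence, terminated by "*". The following is a minimal sketch of reading such entries in Java. It is not the reader used by this utility: the class and method names (PirSketch, parsePir) are hypothetical, and it assumes that the entry name is the text after the semicolon and that gaps are written as "-".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of an NBRF/PIR reader; not the code of redundancy_c.
class PirSketch {

    // Parse PIR text into name -> aligned sequence, preserving file order.
    static Map<String, String> parsePir(String text) {
        Map<String, String> entries = new LinkedHashMap<>();
        String[] lines = text.split("\\R");
        int i = 0;
        while (i < lines.length) {
            String line = lines[i].trim();
            if (!line.startsWith(">")) {
                i++; // skip anything that is not an entry header
                continue;
            }
            // Header looks like ">P1;name"; keep the part after the semicolon.
            int semi = line.indexOf(';');
            String name = (semi >= 0 ? line.substring(semi + 1) : line.substring(1)).trim();
            i += 2; // skip the header line and the description line
            StringBuilder seq = new StringBuilder();
            while (i < lines.length) {
                String part = lines[i].trim();
                if (part.startsWith(">")) {
                    break; // next entry started without a "*" terminator
                }
                i++;
                if (part.endsWith("*")) {
                    seq.append(part, 0, part.length() - 1); // drop the terminator
                    break;
                }
                seq.append(part);
            }
            entries.put(name, seq.toString().replace(" ", ""));
        }
        return entries;
    }

    public static void main(String[] args) {
        String pir = ">P1;seqA\nexample entry\nMKT-A\nILV*\n>P1;seqB\n\nMKT-AILI*\n";
        System.out.println(parsePir(pir));
    }
}
```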

The step by step procedure to use this utility is explained below:

1) Get the FASTA file and align it using Clustal software.

2) Go to the following link, input your alignment file, and convert it into NBRF/PIR format.

http://bioweb.pasteur.fr/seqanal/interfaces/clustalw_convert.html

3) Input this PIR file to the utility. The program also prompts for a number: the maximum number of characters that may differ between two sequences. For example, if you input 5, the program compares two sequences and checks whether the number of non-matching characters is greater than 5. If it is not, the program lists the two similar sequences in the command prompt and removes the second sequence from the output PIR file (seq.pir).

A sample PIR file can be obtained here: FliF.pir

4) Take the output PIR file, go back to the link above, convert the PIR file into a Clustal alignment file, and use that file for your further work.
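The comparison described in step 3 can be sketched as follows. This is only an illustration in Java, not the source of redundancy_c: the class and method names (RedundancySketch, countMismatches, removeRedundant) are hypothetical, and it assumes that gap characters count as ordinary characters and that each sequence is compared only against the sequences already kept.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the mismatch-threshold filter; not the code of redundancy_c.
class RedundancySketch {

    // Count aligned positions at which the two sequences differ.
    // Sequences from the same alignment are expected to have equal length.
    static int countMismatches(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("aligned sequences must have equal length");
        }
        int mismatches = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    // Keep a sequence only if it differs from every already-kept sequence
    // by more than maxMismatches positions; otherwise report the pair and drop it.
    static Map<String, String> removeRedundant(Map<String, String> aligned, int maxMismatches) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> candidate : aligned.entrySet()) {
            boolean redundant = false;
            for (Map.Entry<String, String> earlier : kept.entrySet()) {
                if (countMismatches(earlier.getValue(), candidate.getValue()) <= maxMismatches) {
                    System.out.println(earlier.getKey() + " is similar to " + candidate.getKey());
                    redundant = true;
                    break;
                }
            }
            if (!redundant) {
                kept.put(candidate.getKey(), candidate.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> aligned = new LinkedHashMap<>();
        aligned.put("seqA", "MKT-AILV");
        aligned.put("seqB", "MKT-AILI"); // differs from seqA at one position
        aligned.put("seqC", "MRSQAGGV"); // differs from seqA at five positions
        System.out.println(removeRedundant(aligned, 1).keySet());
    }
}
```

With a threshold of 1, seqB is treated as redundant with seqA and dropped, while seqC is kept because it differs at more than one position.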

The command to run this utility is

> java redundancy_c

I wrote this utility to remove redundant sequences during phylogenetic analysis of certain proteins. Please do the file conversion only through the link given above; I have tested the program only with files converted on that website. Thank you!