Blast-ClustalW workflow


Purpose
Execute blastn against the DDBJ database with a given DNA sequence, then compare the alignment regions of the highly similar sequences using ClustalW.

Composition
The workflow is composed of the steps shown in the following activity diagram. An activity diagram is one of the UML diagram types and is used for system analysis. Click here for details.


Target
Any nucleotide sequence such as 16S rRNA

Procedure of workflow
1. Execute BLAST against BCT division
Execute the Blast:searchParam service with any nucleotide sequence, such as 16S rRNA. The reference database is ddbjbct (bacteria), and the specified options are "-F F -e 0.001 -m 8" (query filtering off, E-value threshold 0.001, tabular output).
2. Retrieve the sequences of the alignment regions of the subject entries
2.1 Get each subject entry listed in the BLAST result. There are two ways to get an entry:
a. Use GetEntry:getDDBJEntry to get an entry.
b. Use EMBL Fetch as an alternative service from the Meta-database to get an entry.
2.2 Extract the DNA sequence of the alignment region from the entry obtained in step 2.1.
3. Execute ClustalW
Execute ClustalW:analyzeSimple with step 2's result.
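
The data handling between the three steps above can be sketched in Java as follows. This is a minimal illustration, not the code from the downloadable sources: the class and method names are hypothetical, the calls to the web services are left out, and it assumes that the BLAST result is available as plain "-m 8" tabular text (12 tab-separated columns, with the subject ID in column 2 and the subject start and end in columns 9 and 10). Reverse-complementing minus-strand hits is also an assumption; the original programs may treat them differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: all names here are hypothetical and not part of the DDBJ sources. */
final class BlastRegionSketch {

    /** One row of BLAST tabular ("-m 8") output, reduced to the fields used here. */
    static final class Hit {
        final String subjectId;
        final int subjectStart; // 1-based, inclusive
        final int subjectEnd;   // 1-based, inclusive; smaller than subjectStart on the minus strand

        Hit(String subjectId, int subjectStart, int subjectEnd) {
            this.subjectId = subjectId;
            this.subjectStart = subjectStart;
            this.subjectEnd = subjectEnd;
        }
    }

    /** Step 1 result: parse the tab-separated rows, skipping blank, comment and short lines. */
    static List<Hit> parseTabular(String blastOutput) {
        List<Hit> hits = new ArrayList<>();
        for (String line : blastOutput.split("\\R")) {
            if (line.trim().isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] f = line.split("\t");
            if (f.length < 12) {
                continue; // not a complete tabular row
            }
            hits.add(new Hit(f[1].trim(),
                    Integer.parseInt(f[8].trim()),
                    Integer.parseInt(f[9].trim())));
        }
        return hits;
    }

    /** Step 2.2: cut the aligned region out of the full entry sequence. */
    static String extractRegion(String entrySequence, int start, int end) {
        int lo = Math.min(start, end);
        int hi = Math.max(start, end);
        String region = entrySequence.substring(lo - 1, hi); // convert to 0-based, end-exclusive
        return start <= end ? region : reverseComplement(region);
    }

    /** Assumption: minus-strand regions are reverse-complemented; output is lowercase. */
    static String reverseComplement(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            sb.append(complement(dna.charAt(i)));
        }
        return sb.toString();
    }

    static char complement(char base) {
        switch (Character.toLowerCase(base)) {
            case 'a': return 't';
            case 't': return 'a';
            case 'c': return 'g';
            case 'g': return 'c';
            default:  return 'n'; // ambiguity codes are collapsed to 'n' in this sketch
        }
    }

    /** Step 3 input: one FASTA record per extracted region. */
    static String toFasta(String id, String sequence) {
        return ">" + id + "\n" + sequence + "\n";
    }

    public static void main(String[] args) {
        // Made-up row and sequence, for illustration only.
        String row = "query1\tSUBJ1\t100.00\t4\t0\t0\t1\t4\t3\t6\t1e-05\t8.0";
        String entrySequence = "aacgtt"; // would come from GetEntry or EMBL Fetch
        StringBuilder fasta = new StringBuilder();
        for (Hit hit : parseTabular(row)) {
            fasta.append(toFasta(hit.subjectId,
                    extractRegion(entrySequence, hit.subjectStart, hit.subjectEnd)));
        }
        System.out.print(fasta); // multi-FASTA text to pass on to ClustalW
    }
}
```

In a real run, the entry sequence for each hit would be taken from the entry returned in step 2.1, and the concatenated FASTA text would be the input to step 3.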

Introduction of program (GetEntry version)
The program is available in a Java version and a Perl version. Both need a FASTA file of 16S rRNA as an argument.
Java source: tar.gz, zip
Perl source: tar.gz, zip

Introduction of program (EMBL Fetch version)
Java source: tar.gz, zip

How to use
Java:
java RunBlastClustalW test.txt 
Perl:
perl BlastClustalW.pl test.txt
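
Both commands take the path of a FASTA file (test.txt here) as their only argument. As a rough sketch of how such an argument might be read in Java, the class below parses the first record of a FASTA file into a header and a sequence. The class name FastaArgumentSketch and its methods are hypothetical, and using only the first record is an assumption; the downloadable programs may handle the file differently.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Sketch only: hypothetical helper, not taken from the DDBJ sources. */
final class FastaArgumentSketch {

    /** Returns {header, sequence} for the first FASTA record in the given text. */
    static String[] parseFirstRecord(String fastaText) {
        String header = null;
        StringBuilder sequence = new StringBuilder();
        for (String raw : fastaText.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith(">")) {
                if (header != null) {
                    break; // a second record starts here; this sketch stops
                }
                header = line.substring(1).trim();
            } else if (header != null) {
                sequence.append(line); // sequence lines are joined without line breaks
            }
        }
        if (header == null || sequence.length() == 0) {
            throw new IllegalArgumentException("no FASTA record found");
        }
        return new String[] { header, sequence.toString() };
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java FastaArgumentSketch <fasta-file>");
            System.exit(1);
        }
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.US_ASCII);
        String[] record = parseFirstRecord(text);
        System.out.println(record[0] + ": " + record[1].length() + " bases");
    }
}
```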

Construct with Taverna
The following image was generated with the Taverna GUI.

The XML file of this workflow for Taverna is here.
The execution result is here. (Last update: Feb. 23, 2010.)