How to use field search with ARSA


The searchByXMLPath method of ARSA provides field search.
The field to search is specified as an XML tag (XML path).
searchByXMLPath takes the following four parameters.

1. queryPath: The XML path(s) to query. (More than one can be specified.)
2. returnPath: The XML path(s) to return. (The values in the result are separated by tabs.)
3. offset: The position of the first result to return.
4. count: The number of results to return. (Maximum 1000)

When you call the REST service, specify the queryPath, returnPath, offset and count parameters.
A sample Perl program that uses the REST service appears at the end of this page.
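
Because count is limited to 1000, a larger result set has to be fetched in several requests. The following is a minimal sketch of how the offset and count values for those requests could be computed. It assumes that offset is 1-based (the sample program on this page uses offset=1) and that the next request starts at offset + count; page_windows is a hypothetical helper name, not part of ARSA.

```perl
use strict;
use warnings;

# Split a request for $total results into (offset, count) windows.
# Assumptions: offset is 1-based, and the next window starts at offset + count.
sub page_windows {
    my ($total, $page_size) = @_;
    # Never ask for more than the documented maximum of 1000 per request.
    $page_size = 1000 if !defined $page_size || $page_size > 1000;
    die "page size must be positive\n" if $page_size < 1;

    my @windows;
    for (my $offset = 1; $offset <= $total; $offset += $page_size) {
        my $remaining = $total - $offset + 1;
        my $count = $remaining < $page_size ? $remaining : $page_size;
        push @windows, [ $offset, $count ];
    }
    return @windows;
}

# Print the offset/count pairs needed to fetch 2500 results.
for my $window (page_windows(2500, 1000)) {
    printf "offset=%d&count=%d\n", @$window;
}
```

For 2500 results this prints three pairs: offset=1 with count=1000, offset=1001 with count=1000, and offset=2001 with count=500.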

WSDL URL for SOAP is as follows.
http://xml.nig.ac.jp/wsdl/ARSA.wsdl

By using XML paths in the query, you can run a detailed search that specifies individual items.
By using XML paths in the return path, you can also retrieve various items in the results (not only the accession number but also the definition, organism and so on).

Items that can be specified for each database

See the following links for the items that can be specified for each database.
Each page describes the items and gives examples of their XML paths.

Sequence Libraries
    DDBJ DAD PRF
    UniProt/Swiss-Prot UniProt/TrEMBL IMGT/LIGM-DB
Sequence Related
    PROSITE PROSITEDOC BLOCKS
    PRINTS PFAMA PFAMB
    PRODOM ENZYME
Protein 3D Structures
    PDB HSSP FSSP
Metabolic Pathways
    KEGG PATHWAY LENZYME LCOMPOUND

You can also see XML path examples by using Advanced Search or Cross Search in the ARSA system.
e.g.,
/ENTRY/DDBJ/definition = 'Nostoc sp\. \'Peltigera canina\' sample 13 trnL gene, intron sequence'
In the Web API, however, remove every escape sequence except the one before a single quotation mark.
e.g.,
/ENTRY/DDBJ/definition = 'Nostoc sp. \'Peltigera canina\' sample 13 trnL gene, intron sequence'
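
The conversion between the two examples above can be sketched in Perl as follows. This is only an illustration: to_web_api_form is a hypothetical helper name, and the sketch assumes the backslash is the only escape character in the copied query. It removes every backslash that is not directly followed by a single quotation mark.

```perl
use strict;
use warnings;

# Convert a query copied from Advanced Search or Cross Search into the form
# expected by the Web API: drop every backslash except one placed directly
# before a single quotation mark.
sub to_web_api_form {
    my ($query) = @_;
    $query =~ s/\\(?!')//g;
    return $query;
}

# q{...} keeps the backslashes exactly as written.
my $copied = q{/ENTRY/DDBJ/definition = 'Nostoc sp\. \'Peltigera canina\' sample 13 trnL gene, intron sequence'};
print to_web_api_form($copied), "\n";
```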

QueryPath

A query path is written in the following form.
[XPath] [Relational Operators] [Query] [Logical Operators] [XPath] [Relational Operators] [Query]...

Relational Operators

For word searches, there are partial matches and perfect matches.

Type           Relational operator  Description
Partial match  =                    Returns entries in which the keyword exists in the content of the XPath element
               !=                   Returns entries in which the keyword does not exist in the content of the XPath element
Perfect match  ==                   Returns entries in which the keyword and the content of the XPath element match perfectly
               !==                  Returns entries in which the keyword and the content of the XPath element do not match perfectly

For value searches, there are matches and magnitude comparisons.

Type                  Relational operator  Description
Match                 =                    Returns entries in which the entered numerical value and the content of the XPath element match
                      !=                   Returns entries in which the entered numerical value and the content of the XPath element do not match
Magnitude comparison  <, <=, >, >=         Compares the content of the XPath element with the entered numerical value

Logical Operators

Logical operators are used to connect keywords when specifying multiple keywords for a search.

Type         Logical operator  Description
Conjunction  AND               Connects keywords with the word "AND". Capital letters and surrounding spaces are required, as in " AND ".
Disjunction  OR                Connects keywords with the word "OR". Capital letters and surrounding spaces are required, as in " OR ".
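
As a small illustration of the spacing and capitalization rule, the sketch below joins conditions with " AND " or " OR ". join_conditions is a hypothetical helper name, and the two conditions are only examples built from paths used elsewhere on this page.

```perl
use strict;
use warnings;

# Connect conditions with a logical operator written in capital letters
# and surrounded by single spaces, e.g. " AND ".
sub join_conditions {
    my ($operator, @conditions) = @_;
    die "operator must be AND or OR\n"
        unless $operator eq 'AND' || $operator eq 'OR';
    return join " $operator ", @conditions;
}

my $query = join_conditions(
    'AND',
    "/ENTRY/DDBJ/definition = '16S ribosomal RNA'",
    '/ENTRY/DDBJ/length >= 1000',
);
print "$query\n";
```

The helper rejects lowercase operators such as "and" so that a malformed query is caught before it is sent.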

Query

You can specify any keyword.
For a word search, such as a search on DEFINITION, enclose the keyword in single quotation marks (').
For example, to get the DDBJ entries that include '16S ribosomal RNA' in DEFINITION:
/ENTRY/DDBJ/definition = '16S ribosomal RNA'
For a numeric search, such as a search on sequence length, do not enclose the value in single quotation marks.
For example, to get the DDBJ entries whose sequence is longer than 10000 bp and shorter than 20000 bp:
/ENTRY/DDBJ/length > 10000 AND /ENTRY/DDBJ/length < 20000
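
The quoting rule can be sketched as two small helpers, one for word searches and one for numeric searches. The names word_condition and numeric_condition are hypothetical, and the operator lists simply mirror the relational operator tables on this page. The sketch assumes the keyword contains no single quotation mark; such keywords need the escape sequence described under the restrictions on this page.

```perl
use strict;
use warnings;

# Operators listed in the relational operator tables.
my %WORD_OPS    = map { $_ => 1 } ('=', '!=', '==', '!==');
my %NUMERIC_OPS = map { $_ => 1 } ('=', '!=', '<', '<=', '>', '>=');

# Word search: the keyword is enclosed in single quotation marks.
sub word_condition {
    my ($path, $op, $keyword) = @_;
    die "unsupported word operator: $op\n" unless $WORD_OPS{$op};
    return "$path $op '$keyword'";
}

# Numeric search: the value is written without quotation marks.
sub numeric_condition {
    my ($path, $op, $value) = @_;
    die "unsupported numeric operator: $op\n" unless $NUMERIC_OPS{$op};
    die "not a number: $value\n" unless $value =~ /\A-?\d+(?:\.\d+)?\z/;
    return "$path $op $value";
}

print word_condition('/ENTRY/DDBJ/definition', '=', '16S ribosomal RNA'), "\n";
print numeric_condition('/ENTRY/DDBJ/length', '>', 10000), "\n";
```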

Queries are subject to three restrictions.

Query word length

A query word must have two or more characters.

Upper limit of query number

Up to 20 queries can be specified at a time.

Add Escape sequence

Add an escape sequence (a backslash) before every single quotation mark in your query.
For example, if your query contains single quotation marks, as in "'Peltigera canina'", write it as "\'Peltigera canina\'".
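
A minimal Perl sketch of the length and escape rules follows. escape_keyword is a hypothetical helper name. It assumes the keyword does not already contain escape sequences, so running it twice on the same text would escape the quotation marks twice.

```perl
use strict;
use warnings;

# Check the minimum keyword length and put a backslash before every
# single quotation mark in the keyword.
sub escape_keyword {
    my ($keyword) = @_;
    die "a query word needs two or more characters\n"
        if length($keyword) < 2;
    $keyword =~ s/'/\\'/g;
    return $keyword;
}

my $keyword = escape_keyword("Nostoc sp. 'Peltigera canina' sample 13");
print "/ENTRY/DDBJ/definition = '$keyword'\n";
```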

ReturnPath

You can get more than one item by separating the XML paths with a comma (,).
e.g., /ENTRY/DDBJ/primary-accession,/ENTRY/DDBJ/moltype
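
The sketch below shows one way to build the returnPath value and to split a tab-separated result back into named fields. It assumes that each entry is returned on its own line with the values in the same order as the return paths; that layout is an assumption, not something stated on this page. return_path and parse_rows are hypothetical helper names, and the sample text is made up for illustration rather than taken from a real response.

```perl
use strict;
use warnings;

# Join several XML paths into one returnPath value.
sub return_path {
    return join ',', @_;
}

# Split a tab-separated result into one hash per line, keyed by return path.
# Assumption: one entry per line, values in the order of the return paths.
sub parse_rows {
    my ($text, @paths) = @_;
    my @rows;
    for my $line (split /\r?\n/, $text) {
        next if $line eq '';
        my @values = split /\t/, $line, -1;   # -1 keeps trailing empty fields
        my %row;
        @row{@paths} = @values;
        push @rows, \%row;
    }
    return @rows;
}

my @paths = ('/ENTRY/DDBJ/primary-accession', '/ENTRY/DDBJ/moltype');
print return_path(@paths), "\n";

# Made-up sample text, only to show the parsing step.
my $sample = "ACC00001\tDNA\nACC00002\tRNA\n";
for my $row (parse_rows($sample, @paths)) {
    print join(' | ', map { $row->{$_} } @paths), "\n";
}
```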

Search example

For details of the XML paths, see the item pages for each database.

Access from Perl

To get the entries whose feature key is 'rRNA', whose qualifier name is 'product' and whose qualifier value contains '16S ribosomal RNA', specify the query as follows.
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# URL-encode a parameter value: percent-encode everything except word
# characters and spaces, then turn spaces into '+'.
sub url_encode {
    my ($value) = @_;
    $value =~ s/([^\w ])/'%'.unpack('H2', $1)/eg;
    $value =~ tr/ /+/;
    return $value;
}

my $ua = LWP::UserAgent->new;

# make request
my $req = HTTP::Request->new(POST => 'http://xml.nig.ac.jp/rest/Invoke');
$req->content_type('application/x-www-form-urlencoded');

# set parameters (the query and the return path must be URL-encoded)
my $query = "/ENTRY/DDBJ/feature-table/feature/f_key=='rRNA' AND ";
$query .= "(/ENTRY/DDBJ/feature-table/feature{/f_key=='rRNA' AND ";
$query .= "/f_quals/qualifier{/q_name=='product' AND /q_value='16S ribosomal RNA'}})";
$query = url_encode($query);

my $return = url_encode("/ENTRY/DDBJ/primary-accession,/ENTRY/DDBJ/definition");

my $offset = "1";
my $count = "100";

$req->content("service=ARSA&method=searchByXMLPath&queryPath=$query&returnPath=$return&offset=$offset&count=$count");

# send request and get response.
my $res = $ua->request($req);
# For a large result, it is better to write the response directly to a file.
# my $res = $ua->request($req,'file_name.txt');

# show response, or report the HTTP error.
die $res->status_line, "\n" unless $res->is_success;
print $res->content;