Queries to multiple databases

From: Tony Linde <Tony.Linde-at-leicester.ac.uk>
Date: Fri, 27 May 2005 20:23:17 +0100


This is an issue which I raised during the discussions of ADQL at Kyoto. I don't think I explained myself very well, nor understood Bob's response properly. I've since talked to the AstroGrid Science Advisory Group (AG-SAG), which met last week, and I (think I) understand it better now, so let's try again.

I initially asked that we discuss whether we need to be able to formulate queries in ADQL using UCDs rather than just column names. The reason for this was to enable one query to be formulated and sent to more than one database, without the user having to reframe the query with a new set of column names each time.

I did say (or meant to say) that the astronomer might want to formulate the query in generic terms (UCDs are all we have now; data models in future), then specify criteria for database selection (presumably including whether the databases include those UCDs). An agent (portal or workflow or client software or ...) would then determine which databases meet those criteria, using the metadata in the registry, and submit the query to them without reference back to the user. The results would be returned along with metadata explaining which databases were chosen and why (or, in the case of a workflow, the metadata would simply be logged while the results are used in downstream apps).

Bob shot down this idea, saying that astronomers would never want to submit queries to unknown databases. The AG-SAG, however, agreed that this, though not a crucial and immediate requirement, would be required in the future in some form. I think I explained better at that meeting that the user would set, up front, the criteria against which the databases are chosen, including quality criteria (perhaps using the validationLevel / validatingOrganisation metadata added to the registry during the interop).

Does this now work for you, Bob? Have I explained it better?

As I see it then, ADQL needs to support the construction of:

  1. a single _conceptual_ query, restated manually by the user (to name known columns) for each selected database;
  2. a single query stated in generic terms and sent to multiple known and pre-selected databases;
  3. a single query stated in generic terms and sent to an unknown set of databases selected on the basis of user-specified criteria (as part of a workflow): the user only knows which databases were used by inspecting the metadata returned from the workflow.

Although it is not the job of VOQL (though in the context of SkyNode, maybe it is) to specify how queries are constructed, we could imagine the following ways in which a portal or query-builder could operate:

  1. user selects databases; query builder only allows query to be built using the UCDs common to them; if any UCD has multiple instances in a database, user is asked to resolve which column name is meant.
  2. user enters query using UCDs; query builder allows user to select from list of databases which support the UCDs used; again, if any UCD has multiple instances in a database, user is asked to resolve which column name is meant.
  3. user enters query using UCDs; user enters criteria for selecting databases (based on registry metadata for SkyNodes); queries are automatically sent to databases which use their own methods of resolving multiply-allocated UCDs, and the metadata returned with results reflects this reasoning.
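
To make the selection step in these scenarios concrete, here is a minimal sketch in Python. It is not taken from any SkyNode or registry specification: the Resource record, the integer validation_level, the survey names and the column names are all made up for illustration, and a real agent would read the equivalent metadata from the registry.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical registry entry for one database (not a real registry schema)."""
    name: str
    validation_level: int  # assumed to be a simple integer, higher is better
    columns_by_ucd: dict = field(default_factory=dict)  # UCD -> list of column names


def select_databases(resources, required_ucds, min_validation_level=0):
    """Return one report entry per database that meets the user's criteria.

    A database qualifies if it reaches the minimum validation level and has
    at least one column for every UCD used in the query. UCDs carried by
    more than one column are reported so that the user (or the database
    itself) can resolve them.
    """
    report = []
    for res in resources:
        if res.validation_level < min_validation_level:
            continue
        if any(ucd not in res.columns_by_ucd for ucd in required_ucds):
            continue
        ambiguous = {
            ucd: res.columns_by_ucd[ucd]
            for ucd in required_ucds
            if len(res.columns_by_ucd[ucd]) > 1
        }
        report.append({
            "database": res.name,
            "validation_level": res.validation_level,
            "ambiguous_ucds": ambiguous,
        })
    return report


# Illustrative data only: three made-up databases.
registry = [
    Resource("survey_a", 3, {"pos.eq.ra": ["ra"], "pos.eq.dec": ["dec"], "phot.mag": ["mag_r"]}),
    Resource("survey_b", 2, {"pos.eq.ra": ["RAJ2000"], "pos.eq.dec": ["DEJ2000"], "phot.mag": ["Bmag", "Vmag"]}),
    Resource("survey_c", 3, {"pos.eq.ra": ["ra"], "pos.eq.dec": ["dec"]}),
]

chosen = select_databases(registry, ["pos.eq.ra", "phot.mag"], min_validation_level=2)
```

The returned report plays the role of the metadata "explaining which databases were chosen and why"; in the third scenario it would be logged by the workflow rather than shown to the user before the query is sent.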

I daresay there are other ways of approaching the issue.

I would also say that in the above cases (1 and 2 anyway) the portal or query-builder could simply translate the single UCD-based query into separate ones using column names before sending them.
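
A rough sketch of that translation step, again in Python. The square-bracket notation for marking a UCD in the query text is purely an illustrative assumption (it is not ADQL syntax), as are the column names and the "catalogue" table name; the table name is left untouched here, though in practice it would also differ per database.

```python
import re

# Hypothetical per-database metadata: UCD -> column names carrying that UCD.
SURVEY_A = {"pos.eq.ra": ["ra"], "pos.eq.dec": ["dec"], "phot.mag": ["mag_r"]}
SURVEY_B = {"pos.eq.ra": ["RAJ2000"], "pos.eq.dec": ["DEJ2000"], "phot.mag": ["Bmag", "Vmag"]}

# Illustrative marker only: a UCD is written as [ucd] in the generic query.
UCD_TOKEN = re.compile(r"\[([^\]]+)\]")


def translate(generic_query, columns_by_ucd, choices=None):
    """Rewrite a UCD-based query into one that uses a database's column names.

    `choices` maps a UCD to the column the user picked when a database has
    several columns with that UCD. Unknown or unresolved UCDs raise
    LookupError rather than being guessed.
    """
    choices = choices or {}

    def resolve(match):
        ucd = match.group(1)
        candidates = columns_by_ucd.get(ucd, [])
        if not candidates:
            raise LookupError(f"no column carries UCD {ucd!r}")
        if len(candidates) == 1:
            return candidates[0]
        if choices.get(ucd) in candidates:
            return choices[ucd]
        raise LookupError(f"UCD {ucd!r} is ambiguous: {candidates}")

    return UCD_TOKEN.sub(resolve, generic_query)


generic = "SELECT [pos.eq.ra], [pos.eq.dec] FROM catalogue WHERE [phot.mag] < 20"

query_a = translate(generic, SURVEY_A)
query_b = translate(generic, SURVEY_B, choices={"phot.mag": "Vmag"})
```

Calling translate(generic, SURVEY_B) without the choices argument raises LookupError, which is the point at which a query builder would ask the user which column is meant.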

But at the moment there is a conflict between the SkyNode and ADQL specs. In SkyNode, we have the AcceptsUCDs flag which indicates whether a node can accept queries formulated using UCDs, but there is no way in ADQL of indicating that a given term in the query is a UCD rather than a column name (or is there?).

Cheers,
Tony.

-- 
Tony Linde
Phone:  +44 (0)116 223 1292      Mobile: +44 (0)7753 603356
Fax:    +44 (0)116 252 3311      Email:  Tony.Linde-at-leicester.ac.uk
Skype:  callto:tonylinde (home)  callto:tonylinde2 (work)
Post:   Department of Physics & Astronomy, University of Leicester
        Leicester, UK   LE1 7RH
Web:    http://www.star.le.ac.uk/~ael
            
Project Manager, EuroVO VOTech   http://eurovotech.org 
Programme Manager, AstroGrid     http://www.astrogrid.org 
Co-Director,
 Leicester e-Science Centre      http://www.e-science.le.ac.uk/ 
Received on 2005-05-27Z21:23:44