Re: Architecture of IVOA version 0.4 - ADQL & Registries

From: Martin Hill <mchill-at-dial.pipex.com>
Date: Thu, 20 May 2004 10:23:50 +0100


Robert Hanisch wrote:

I agree with your comments about most astronomers. For adoption to work, I expect we need to make sure that our systems provide:

  1. APIs for existing commonly-used tools (e.g. IDL, FORTRAN)
  2. Decent UIs so that astronomers don't have to wade through VOTables, VOResources, ADQLs, etc.

> Hi Clive.
>
> On the ADQL topic, I agree with you -- we should work toward a query
> language that can work against all VO databases: catalogs, observation logs,
> registries. I do not see anything so unique about these databases that they
> cannot be queried via the same language.

You can only use the same query language if it can handle the various data forms. Catalogs are relational tables; registries are hierarchical; images are just plain difficult. It *is* possible (I think) to use a common language to query these, but:

  1. It will require a more flexible attitude to ADQL; a willingness to make it something other than just an XML reflection of SQL.
  2. Given we want to do different things with them (we want to query them in different ways, and do different things with the results), what is 'driving' the desire to have one query language? *If* we can query a Registry using only XPath, say, then why mangle a different language so it can do the same thing?
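
As a rough illustration of point 2, here is a minimal Python sketch that walks a hierarchical registry record with path expressions and filters a flat catalog with SQL, using only the standard library. The record layout, element names, table and column names are all invented for the example; none of them come from VOResource, ADQL or any real schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical registry record: element names are invented for illustration.
REGISTRY_XML = """
<registry>
  <resource>
    <title>Example Survey Catalog</title>
    <coverage><waveband>Optical</waveband></coverage>
  </resource>
  <resource>
    <title>Example Radio Archive</title>
    <coverage><waveband>Radio</waveband></coverage>
  </resource>
</registry>
"""


def titles_for_waveband(xml_text, waveband):
    """Walk the hierarchy with path expressions (ElementTree's XPath subset)."""
    root = ET.fromstring(xml_text)
    return [
        res.findtext("title")
        for res in root.findall("resource")
        if res.findtext("coverage/waveband") == waveband
    ]


def make_catalog():
    """Build a tiny in-memory relational catalog (hypothetical columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE catalog (name TEXT, ra REAL, dec REAL, mag REAL)")
    conn.executemany(
        "INSERT INTO catalog VALUES (?, ?, ?, ?)",
        [("src1", 10.5, -3.2, 11.0), ("src2", 150.1, 22.7, 14.5)],
    )
    return conn


def brighter_than(conn, max_mag):
    """Filter rows with plain SQL, the natural fit for a flat table."""
    cur = conn.execute(
        "SELECT name FROM catalog WHERE mag < ? ORDER BY name", (max_mag,)
    )
    return [row[0] for row in cur]
```

Each query is short in the language that suits its data form; expressing the nested `coverage/waveband` lookup in SQL would first need the hierarchy flattened into tables.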

> On the interfaces to local relational databases, as I have stated many times
> previously, I think this is best negotiated via follow-on (e.g.
> OpenSkyNode-type) queries once a user has determined that a database or
> collection is of interest based on higher level metadata. Comparisons
> between complex databases are not going to be made trivial, at least not in
> our lifetimes. The best we can do is expose - on request - the DB
> structures to knowledgable users, and provide them with the tools to do the
> cross-correlations and comparisons of interest.
<snip>
> But when data and service
> providers see that registering a resource requires many hours, if not days,
> of work, and when someone has to review what they've done to make sure it is
> correct, the system will break down.

Which rather implies that we deny interoperability to non-knowledgeable users. It *can* be a pain for data owners to go and find out all the details about UCDs, units, etc. for some of the columns in their databases that they haven't used for a while. But they're going to have to do this anyway, whether for the 'follow on' low level metadata in your preferred model, or just to make sure the right things are in the results' VOTable header.

> The AG approach reminds me of the
> folders you find in cheap hotels sometimes, which include menus of nearby
> restaurants. The restaurants are probably all still there, serving the same
> type of cuisine at the same address and same phone number. The menus have
> probably all changed, and thus what you see in the hotel room is partly
> useful, and mainly (in terms of the bulk of information) useless. Yes, you
> can ping them regularly to find out their latest menus, but if they have
> given up complying with your standard for publishing their menu because it
> is too complicated, then you have only the essential information -- where to
> go to get some food, and when the restaurant is open. The menu does you no
> good -- out of date or not -- if you arrive at 2 am and the restaurant
> closed at 9pm.

This is a rather good analogue. Using modern technology ;-) we can keep these folders up to date within minutes of a change at source.

How are the data providers going to 'give up' complying? Once they have complied, the job is done. If the 'fine grained' standard changes, registries can provide a versioning interface: tools expecting old or new versions of the metadata standard can use the registry, which transforms from whichever version the datacenter provides. Without it, every tool has to handle every form at every datacenter. Such m x n connections *do not work* over distributed systems, as has been shown again and again.
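
A minimal Python sketch of that versioning idea, assuming two made-up metadata versions ("v1" and "v2") that differ only in one field name. The versions, field names and function names are hypothetical and chosen purely for illustration; no real registry interface is implied.

```python
# Hypothetical metadata versions: "v1" calls the field "name", "v2" calls it "title".

def read_v1(record):
    """Convert a v1 record into a common internal form."""
    return {"title": record["name"], "unit": record["unit"]}


def read_v2(record):
    """Convert a v2 record into a common internal form."""
    return {"title": record["title"], "unit": record["unit"]}


def write_v1(common):
    """Emit the common form as a v1 record."""
    return {"name": common["title"], "unit": common["unit"]}


def write_v2(common):
    """Emit the common form as a v2 record."""
    return {"title": common["title"], "unit": common["unit"]}


READERS = {"v1": read_v1, "v2": read_v2}
WRITERS = {"v1": write_v1, "v2": write_v2}


def registry_transform(record, provided, wanted):
    """Translate from the version a datacenter provides to the one a tool wants."""
    common = READERS[provided](record)
    return WRITERS[wanted](common)


def connection_counts(n_tools, n_datacenters):
    """Return (links via a registry hub, direct tool-to-datacenter links)."""
    via_registry = n_tools + n_datacenters
    direct = n_tools * n_datacenters
    return via_registry, direct
```

With 10 tools and 20 datacenters, `connection_counts(10, 20)` gives 30 links through the registry against 200 direct ones, and each new metadata version costs one reader and one writer rather than a change in every tool.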

When I'm sitting in my cheap hotel it's extremely useful to know more than just the restaurant name and telephone number: the prices, the selection, whether they do vegetarian (something for me to avoid!); perhaps I feel like a duck*. Leafing through a single form, rather than having to ring each restaurant and communicate with each one in its own language, will be much easier for your ordinary astronomer. Even better is being able to press the 'I want duck' button and getting a list back with prices and pointers to reviews where available.
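
The 'I want duck' button could look something like the following Python sketch: a single search over column-level metadata held in one common form. The registry entries, dictionary layout and function name are invented for the example, and the UCD strings are used only as illustrative labels.

```python
# Hypothetical registry contents: each resource lists its columns with a UCD and unit.
REGISTRY = [
    {
        "name": "Catalog A",
        "columns": [
            {"name": "ra", "ucd": "pos.eq.ra", "unit": "deg"},
            {"name": "vmag", "ucd": "phot.mag", "unit": "mag"},
        ],
    },
    {
        "name": "Archive B",
        "columns": [
            {"name": "ra", "ucd": "pos.eq.ra", "unit": "deg"},
        ],
    },
]


def resources_with_ucd(registry, ucd):
    """List (resource, column, unit) for every column tagged with the given UCD."""
    return [
        (entry["name"], col["name"], col["unit"])
        for entry in registry
        for col in entry["columns"]
        if col["ucd"] == ucd
    ]
```

Asking for `"phot.mag"` returns only the resources that actually hold magnitudes, together with the column name and unit, without contacting each datacenter in turn.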

We're not asking any of the restaurants/datacenters to do more than they would anyway:

>
> My point is, higher level metadata is the key. Detailed metadata about the
> resource should be provided on request, from the resource provider.

...we're just asking them to provide this detailed metadata in a common form.

Yes, we need to provide 'wizards' to help data owners write & publish metadata. If the data changes, we need to make sure they only need to change the relevant bits. Datacenters are, after all, in the business of publishing their data; there is nothing new in the work they have to do to describe that data sufficiently so that people can use it.

Ideally, we want to make it *easier* to describe data using VO tools than having to write up a load of web page prose and keep that up to date.

Cheers,

Martin

*Quack

-- 
Martin Hill
www.mchill.net
07901 55 24 66
Received on 2004-05-20Z09:24:02