RE: Architecture of IVOA version 0.4

From: Clive Page <cgp-at-star.le.ac.uk>
Date: Thu, 20 May 2004 10:28:05 +0100 (BST)


On 19 May 2004, Bob Hanisch wrote:

> I strongly support your suggestion of having a document that conveys to
> the working astronomer what the VO is all about. The IVOA architecture
> document is not the thing. The purpose of this is to make sure that we
> are covering all bases, and that connections are understood.

OK, I had hoped that we could have one document which told both us and the outside world what we were doing, but maybe that's just too ambitious. Two documents, one in English, one in Computerese, are obviously what everyone (except me) wants.

A much more serious issue is the scope of the registry and how to satisfy metadata queries. The current architecture draft does not discuss this adequately: there is no point at all in sweeping the problems under the carpet, especially if this is just the jargon-ridden document for our internal use. I think we need to sort this out very urgently, say by the end of next week.

I have been frustrated by observing interminable discussions about how registries harvest and replicate information to/from other registries, while ignoring the vital issue of how on earth we are going to get the stuff into _any_ registry in the first place. The Arch Doc blithely says "people can publish to a registry by filling in web forms". This really isn't realistic, even for a coarse-grained registry.

On Thu, 20 May 2004, Tony Linde wrote:

> As I've consistently said, we need a registry spec that allows NVO to
> implement its coarse-grained registry and AstroGrid to implement its
> fine-grained registry.
>
> If NVO also doesn't want to use the registry as a channel for apps to get
> the richer resource metadata but for every app to interface directly with
> the resource themselves then, again, let's design the spec so that both
> approaches are accommodated.

I agree entirely but, unless I missed something, this has fallen into the crack between the Query Language and Registry groups. As I was trying to point out with my query involving proper motions, almost any query beyond a simple cone-search will require knowledge of the units of the columns involved. At the very least the astronomer has to be told the units in order to pose a meaningful query, but it would be much better to have the "system" know the units so that automatic conversion is possible, or at the very least so that it can warn of possible inconsistency. Automatic conversion is essential for any distributed querying, such as is theoretically possible using SkyNode.
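
To make the units point concrete, here is a rough sketch in Python of what "the system knows the units" could mean for a proper-motion constraint. The unit strings, the conversion table and the function name are my own inventions for illustration, not part of any agreed standard: known units are converted, anything else produces a warning rather than a silently wrong answer.

```python
import warnings

# Hypothetical unit strings for proper motion, each with its factor to mas/yr.
TO_MAS_PER_YEAR = {
    "mas/yr": 1.0,
    "arcsec/yr": 1000.0,
    "arcsec/century": 10.0,
}


def convert_proper_motion(value, from_units, to_units):
    """Convert value between two unit strings found in the table above.

    If either unit string is unknown, warn about the possible
    inconsistency and return the value unchanged.
    """
    if from_units == to_units:
        return value
    if from_units not in TO_MAS_PER_YEAR or to_units not in TO_MAS_PER_YEAR:
        warnings.warn(
            f"cannot convert {from_units!r} to {to_units!r}; "
            "query may be inconsistent"
        )
        return value
    return value * TO_MAS_PER_YEAR[from_units] / TO_MAS_PER_YEAR[to_units]
```

So a threshold typed in as 2 arcsec/yr could be rewritten as 2000 mas/yr before being sent to a table whose column is in mas/yr.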

We need a specification which supports queries such as:

What is the units string of column C of table T of database D at site S? (I'll call that a level 3 query, see context below).

I think that we need lots of other metadata queries as well (data-types, UCDs, data ranges, nullability, presentation formats, etc.) but the units one is fundamental and can serve as a model.

We should specify this in a way which means that it can be executed: (a) by the fine-grained registry if one exists; (b) by the data access layer of the DBMS otherwise.
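
As a minimal sketch of that dispatch rule (Python, all names hypothetical): a metadata source is just something that takes (site, database, table, column) and returns a units string or nothing. I have also assumed that a registry with no entry for the column falls through to the data access layer, which goes slightly beyond what I said above.

```python
def make_dict_source(entries):
    """Build a toy metadata source backed by a dictionary.

    entries maps (site, database, table, column) to a units string.
    """
    def lookup(site, database, table, column):
        return entries.get((site, database, table, column))
    return lookup


def column_units(site, database, table, column, registry, data_access_layer):
    """Answer a level 3 query for the units string of one column.

    (a) Ask the fine-grained registry if one exists (registry is not None).
    (b) Otherwise, or if the registry has no entry (an assumption of this
        sketch), ask the data access layer of the DBMS.
    """
    if registry is not None:
        units = registry(site, database, table, column)
        if units is not None:
            return units
    return data_access_layer(site, database, table, column)
```

The caller sees the same answer either way, which is the whole point: the query is specified once and the routing is an implementation detail.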

Bob feels that keeping information in a fine-grained registry up-to-date will be unsupportable in the long run (thank you Bob for sharing with us your obviously painful experiences of satisfying the inner man in strange motel rooms). I entirely accept this point - unless we can arrange for automatic harvesting of DBMS information by the fine-grained registry. This, of course, can use the interface of type (b) above; so that becomes rather fundamental. And automatic harvesting would be highly desirable for a registry of whatever granularity.

I agree that we need higher-level metadata queries as well, e.g.

 What are the column names of table T of database D at site S? (level 2)
 What are the names of the tables of database D at site S? (level 1)
 What are the names of the databases at site S? (level 0)

If we implement all these levels, the coarse-grained registry can populate itself using the first couple of levels; the fine-grained registry can populate itself by going all the way to level 3 (or further?). If the fine-grained registry populates itself regularly in this way, one can regard it as a sort of metadata cache, which might make it easier to sell the concept to data centre managers, since I foresee problems in getting the fine-grained registry widely adopted otherwise. This regular self-population is, of course, just what internet search engines do so well. In our case the job will be so much simpler as we will have a small set of top-level URLs to start from, and a well-defined query interface.
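
A rough Python sketch of that self-population, under the assumption that each site exposes the four query levels through some interface like the toy class below (class, method and parameter names are all made up for illustration):

```python
class DataAccessLayer:
    """Toy stand-in for one site's metadata query interface."""

    def __init__(self, catalogue):
        # catalogue has the shape {database: {table: {column: units}}}
        self._catalogue = catalogue

    def databases(self):                        # level 0
        return sorted(self._catalogue)

    def tables(self, database):                 # level 1
        return sorted(self._catalogue[database])

    def columns(self, database, table):         # level 2
        return sorted(self._catalogue[database][table])

    def units(self, database, table, column):   # level 3
        return self._catalogue[database][table][column]


def harvest(sites, max_level=3):
    """Walk every site top-down and return a flat metadata cache.

    sites maps a site name to its DataAccessLayer. With max_level=1 the
    walk stops at table names (coarse-grained registry); with max_level=3
    it descends to the units of every column (fine-grained registry).
    """
    cache = {}
    for site, dal in sites.items():
        for db in dal.databases():
            cache[(site, db)] = None
            if max_level < 1:
                continue
            for table in dal.tables(db):
                cache[(site, db, table)] = None
                if max_level < 2:
                    continue
                for col in dal.columns(db, table):
                    units = dal.units(db, table, col) if max_level >= 3 else None
                    cache[(site, db, table, col)] = units
    return cache
```

Re-running harvest on a schedule and replacing the old cache is all the "keeping up to date" that would be needed in this picture; nobody fills in a web form.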

Actually, if we implement all of these metadata queries, we have the makings of what we might call a "VO Explorer", along the lines of Windows Explorer, which would allow users to browse the VO from any web browser. I think this might be quite a popular service.

> And of the data centre managers here in the UK that I've spoken to or heard
> of, they all want to get their metadata into VObs compliant format so that
> the data *can* be used with other datasets: all they want is to be told what
> that format is.

Absolutely. There are two ways of satisfying that need:

(1) Decide upon a way of representing metadata within a DBMS, so that a suitable extended-VOQL query can extract it; or
(2) Decide upon a metadata query interface, and leave data centres to implement it as best they can (most will knock together a bunch of Perl scripts, is my guess; some will get properly debugged in time).

The problem with (1) is that it is rather DBMS-dependent; the problem with option (2) is that it leaves hard-pressed data centre managers to do the work. I think we should try to help them with this; I'm sure Tony will disagree.

Bob said:

> > On the ADQL topic, I agree with you -- we should work toward
> > a query language that can work against all VO databases:
> > catalogs, observation logs, registries. I do not see
> > anything so unique about these databases that they cannot be
> > queried via the same language.

Agreed. That gives the VOQL group something substantial to get its teeth into. Let me see if I can come up with a proposal by Monday. But I won't be at all upset if your Architecture Group meeting on Sunday solves the whole problem in some other way.

Regards

-- 
Clive Page
Dept of Physics & Astronomy,
University of Leicester,    Tel +44 116 252 3551
Leicester, LE1 7RH,  U.K.   Fax +44 116 252 3311
Received on 2004-05-20Z09:28:23