Re: TAP information schema

From: Patrick Dowler <patrick.dowler-at-nrc-cnrc.gc.ca>
Date: Thu, 11 Oct 2007 09:40:04 -0700


On 2007-10-10 06:10, Keith Noddle wrote:
> Cases so dictate. Finally, it was made abundantly clear to us in Beijing
> - and it remains the case - that the priority for TAP V1.0 is to define
> how we handle ADQL querying. Period. No arguments.

I agree with this 100%. We all agree that TAP 1.0 should be a minimal spec we can move forward with and at the core this means doing ADQL querying.

As for metadata, one really does need more than tables and columns in the general case. Specifically, some RDBMSs (DB2, e.g.) require that the SQL contain the schema name on the front of every table name. I do not think that ADQL requires this (and maybe it shouldn't), but as a site using such a database I need to be able to tell people what the schema name is. Now, I could stretch the table name to include it (e.g. mySchema.myTable), but that actually throws a lot of information away: the fact that I use different schemata for different versions, that I would like to describe what each schema means, and that the schema as a whole may implement some data model (as would likely be the case, since few data models can be sensibly stored in a single table).
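To make the point about lost information concrete, here is a minimal sketch in Python. The class and field names (Schema, Table, data_model, "caom_v1", "caom_v2", "CAOM") are purely illustrative assumptions, not anything from TAP, ADQL or VOResource. It only shows that a schema kept as a first-class entry can carry its own description, while a dotted table name cannot.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Table:
    name: str
    description: str = ""


@dataclass
class Schema:
    # Hypothetical structure: a schema that carries its own metadata.
    name: str
    description: str = ""
    data_model: Optional[str] = None  # data model the schema as a whole implements
    tables: List[Table] = field(default_factory=list)

    def qualified(self, table: Table) -> str:
        # Form required by databases that want the schema name on every table.
        return f"{self.name}.{table.name}"


# Two versions of the same content kept in separate schemata (illustrative values).
v1 = Schema("caom_v1", "first release", "CAOM", [Table("myTable")])
v2 = Schema("caom_v2", "second release", "CAOM", [Table("myTable")])

# Stretching the table name keeps the string but drops everything else.
flat_name = v2.qualified(v2.tables[0])
schema_part, table_part = flat_name.split(".", 1)
```

From flat_name alone a client can recover the two name parts, but not the description or the data model attached to the schema.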

That's not a big deal right now, but if we ignore it and force services and apps to ignore schema names, then in future we could have some problems when we try to expose them. The same goes for the metadata that tells people how to write more complex queries with joins etc. We probably should not standardise that now, but whatever we do now needs to be done in a way that still lets the future detailed metadata be the definitive metadata.

So, my gut feeling right now is that basic resource discovery in the registry is going to use VOResource (or some specialisation of that) and users need to be able to see what the content is (tables and columns) for that task. We should aim to support that task only -- suitable content discovery -- and we should not try very hard to make that VOResource description the way to actually formulate queries (just "accidentally on purpose", as a friend used to say :-)

What I am thinking is this: the "suitable content discovery" will describe content, which effectively means tables and columns. Even assuming there was detailed metadata for building queries elsewhere, you still need to ask for it, so the VOResource needs to have the schema (namespace) and table names. And because people will be looking for things via utype and/or ucd of columns, the only thing not really needed for discovery that we can stick in so people can write queries is the actual column names*. Once we have a detailed metadata system for TAP 1.1 we could deprecate the column names in the VOResource, or not if no one cares enough.

Summary: VOResource describes tables and columns (maybe namespaces, aka schemata) aimed at "suitable content discovery", but we stick in column names and units for completeness/symmetry with the table description. The service emits this document via the standard service method. This is good enough for full ADQL queries of single tables, with joins reserved for users who actually know the target schema or care to learn it via documentation.
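As a rough illustration of that summary, the sketch below models a discovery-level description in Python and shows how the column names alone are enough to write a single-table query. Everything here is an assumption for illustration: the Column and TableDesc classes, the helper functions, and the example table are hypothetical and are not the VOResource schema or any TAP interface. Whether the query text carries the schema prefix is also just an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Column:
    name: str                    # only needed for writing queries
    ucd: Optional[str] = None    # used for discovery
    utype: Optional[str] = None  # used for discovery
    unit: Optional[str] = None


@dataclass
class TableDesc:
    schema: str  # namespace the table lives in
    name: str
    columns: List[Column]


def find_tables_by_ucd(tables: Iterable[TableDesc], ucd: str) -> List[TableDesc]:
    # Discovery step: which tables have a column with this UCD?
    return [t for t in tables if any(c.ucd == ucd for c in t.columns)]


def single_table_query(table: TableDesc, ucds: Iterable[str]) -> str:
    # Query step: map the wanted UCDs to column names; no joins involved.
    wanted = set(ucds)
    names = [c.name for c in table.columns if c.ucd in wanted]
    return f"SELECT {', '.join(names)} FROM {table.schema}.{table.name}"


# Hypothetical table description, as a service might publish it.
catalogue = [
    TableDesc("mySchema", "myTable", [
        Column("ra", ucd="pos.eq.ra", unit="deg"),
        Column("dec", ucd="pos.eq.dec", unit="deg"),
        Column("obs_id"),
    ]),
]

hits = find_tables_by_ucd(catalogue, "pos.eq.ra")
query = single_table_query(hits[0], ["pos.eq.ra", "pos.eq.dec"])
```

Nothing in this description says how tables relate to each other, which is why joins stay with users who know the schema from other documentation.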

This would be "good enough" and not shut off any future development.

-- 

Patrick Dowler
Tel/Tél: (250) 363-6914                  | fax/télécopieur: (250) 363-0045
Canadian Astronomy Data Centre   | Centre canadien de donnees astronomiques
National Research Council Canada | Conseil national de recherches Canada
Government of Canada                  | Gouvernement du Canada
5071 West Saanich Road               | 5071, chemin West Saanich
Victoria, BC                                  | Victoria (C.-B.)
Received on 2007-10-11Z18:38:29