Re: Draft TAP document inputs

From: Kona Andrews <kea-at-roe.ac.uk>
Date: Mon, 23 Apr 2007 13:37:29 +0100


Dear all,

Some follow-up on the recent posts regarding TAP, starting with a couple of general questions and followed by some more specific comments.

General question: Support for multiple catalogs


Here I am using "catalog" to mean the contents of a single database in a DBMS, "table" to mean a table within a database, and "column" to mean a column within a table. The term "dataset" seems to be used sometimes to mean a table and sometimes to mean a catalog...

For convenience, a data provider might want to install a single data service and use it to export the tables from multiple databases (where the data service implementation itself supports this).

Trivially, the service could publish multiple TAP endpoints, one per catalog, so that the hardwired stem of the TAP access URL would indicate which catalog/database was being requested. In the case of the simple parameter-based TAP, I can't think of a compelling reason why the user should need to get at columns from multiple catalogs in one query.
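As a minimal sketch of the one-endpoint-per-catalog option, assuming hypothetical path stems (`/tap/catalogA`, `/tap/catalogB`) rather than any agreed URL layout, a service could select the catalog purely from the hardwired stem of the access URL:

```python
from urllib.parse import urlparse

# Hypothetical stems: one published TAP endpoint per catalog (database).
ENDPOINT_STEMS = {
    "/tap/catalogA": "catalogA",
    "/tap/catalogB": "catalogB",
}


def catalog_for_request(url):
    """Return the catalog selected by the hardwired stem of a TAP access URL."""
    path = urlparse(url).path
    for stem, catalog in ENDPOINT_STEMS.items():
        # Match whole path segments, so "/tap/catalogA2" is not read as "/tap/catalogA".
        if path == stem or path.startswith(stem + "/"):
            return catalog
    raise LookupError(f"no TAP endpoint is published at {path!r}")
```

Because each endpoint is bound to one catalog, a query sent to it can only see that catalog's tables: fine for simple parameter-based TAP, but it rules out cross-catalog queries.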

However, in the case of ADQL-TAP, there are advantages to the user being able to access tables from different catalogs - notably, the ability to do a crossmatch or cross-database join in a single ADQL query (where the underlying DBMS supports cross-database operations).

We need to consider this use case and make sure we don't preclude it in TAP (or the ADQL specification!). For example, the TAP metadata methods need to be able to return metadata for more than one catalog, and clients need to be able to accept this multi-catalog metadata.
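To illustrate what multi-catalog metadata might need to carry, here is a small sketch in which the metadata is a list of catalogs, each holding tables and columns, and a client checks a fully qualified column reference against it. The catalog, table and column names and the three-part `catalog.table.column` syntax are assumptions for illustration only; the real qualified-name syntax would be for the ADQL specification to settle.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    columns: list[str]


@dataclass
class Catalog:
    name: str
    tables: list[Table] = field(default_factory=list)


# Hypothetical metadata returned by a single service exposing two catalogs.
METADATA = [
    Catalog("surveyA", [Table("source", ["ra", "dec", "mag_r"])]),
    Catalog("surveyB", [Table("detection", ["ra", "dec", "mag_k"])]),
]


def resolve(metadata, qualified_name):
    """Check a 'catalog.table.column' reference against multi-catalog metadata."""
    parts = qualified_name.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected catalog.table.column, got {qualified_name!r}")
    catalog_name, table_name, column_name = parts
    for catalog in metadata:
        if catalog.name != catalog_name:
            continue
        for table in catalog.tables:
            if table.name == table_name and column_name in table.columns:
                return catalog.name, table.name, column_name
    raise KeyError(qualified_name)
```

A cross-catalog crossmatch would reference columns from both catalogs (e.g. `surveyA.source.ra` and `surveyB.detection.ra`), which a client can only validate if the service's metadata describes both catalogs.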

General question: indirection in catalog, table and column names


It is quite feasible for a service to publish descriptions of its catalogs, tables and columns using names that differ from the "real" names of the databases, tables and columns in the back-end DBMS (I'm calling this "name indirection"). With name indirection, incoming queries (simple parameter-based TAP or full ADQL TAP) would contain the published (indirected) names; the service itself would need to translate these into the back-end database, table and column names when extracting the requested data from the DBMS.
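A minimal sketch of name indirection, assuming the service already holds a parsed query (a published table name plus a list of published column names) rather than raw query text; a real ADQL service would apply the same mapping inside its query parser. The table and column names below are hypothetical.

```python
# Hypothetical mapping from published (indirected) names to back-end DBMS names,
# e.g. for a mirror whose local table and column names had to be customised.
TABLE_MAP = {"sources": "mirror_src_v2"}
COLUMN_MAP = {
    ("sources", "ra"): "ra_deg",
    ("sources", "dec"): "dec_deg",
}


def to_backend(table, columns):
    """Translate a published table name and column names into back-end names."""
    try:
        backend_table = TABLE_MAP[table]
        backend_columns = [COLUMN_MAP[(table, column)] for column in columns]
    except KeyError as missing:
        raise ValueError(f"unknown published name: {missing}") from None
    return backend_table, backend_columns


def build_select(table, columns):
    """Build the SQL actually sent to the DBMS from a query in published names."""
    backend_table, backend_columns = to_backend(table, columns)
    return f"SELECT {', '.join(backend_columns)} FROM {backend_table}"
```

The client only ever sees `sources`, `ra` and `dec`; a direct SQL query written with those names (`SELECT ra FROM sources`) would reach the DBMS untranslated and fail, which is the directSQL problem in point 3.2.4.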

Name indirection could be useful in a number of situations, including (a) running a mirror service for an already-published dataset where, for administrative reasons, the names of the database and/or tables in the mirror DB have to be locally customised, and (b) publishing a "data-model view" of a dataset (by publishing metadata containing the "logical" column names under some data model, rather than the real column names in the DB).

Generally, name indirection would be hidden from the client / astronomer; the client simply formats the query based on the supplied metadata in the normal way. However, it does impinge on point 3.2.4 (the directSQL query method), because a direct SQL query written against metadata with indirected table and column names would fail when passed unchanged to the back-end DBMS.

TAP needs to allow for name indirection (implicitly or explicitly); however, I don't think that the *client* should need to know whether or not the metadata it receives is indirected. For simplicity, services that use indirection should *not* support the directSQL method; in other words, the directSQL method should be an *optional* method in TAP rather than a compulsory one. Services not supporting the directSQL method should return an appropriate error code if they receive a directSQL query.

In general, I doubt that DB administrators would be happy with a compulsory directSQL method anyway, so TAP needs to allow system administrators to switch off the directSQL part of TAP.
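The following sketch shows one way a service could treat directSQL as optional: it is advertised only when the service does not use name indirection and the administrator has not switched it off, and requests for it are otherwise refused. The method names other than directSQL, and the use of HTTP 501 Not Implemented as the "appropriate error code", are assumptions for illustration, not anything TAP has agreed.

```python
from http import HTTPStatus


class TapService:
    """Toy dispatcher in which directSQL is an optional query method."""

    def __init__(self, uses_name_indirection, admin_allows_direct_sql=True):
        # Hypothetical names for the compulsory methods.
        self.methods = {"param", "ADQL"}
        # directSQL only makes sense when published names are the real DBMS names,
        # and only if the administrator has not switched it off.
        if admin_allows_direct_sql and not uses_name_indirection:
            self.methods.add("directSQL")

    def handle(self, method, query):
        """Return an (HTTP status, message) pair for a query request."""
        if method == "directSQL" and method not in self.methods:
            # Assumed error code for an optional method this service does not offer.
            return HTTPStatus.NOT_IMPLEMENTED, "directSQL is not supported by this service"
        if method not in self.methods:
            return HTTPStatus.BAD_REQUEST, f"unknown query method {method!r}"
        return HTTPStatus.OK, f"{method} query accepted: {query}"
```

Keeping the decision in the constructor means the same rule drives both what the service advertises and what it accepts, so the two cannot drift apart.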

COMMENTS RE ESAVO'S DOCUMENT


Point re using "204 No Content" for empty results



Is this the most useful behaviour? Consider someone running a fairly automated series of jobs: an empty VOTable can be quite useful there, because the scripting doesn't fall over looking for a file that doesn't exist, and can therefore be simpler. I'm not sure what is best here; I'm just flagging up that empty VOTables can sometimes be useful.
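For comparison with a 204 response, here is a minimal sketch of the empty-VOTable alternative: the result keeps its column descriptions but has no rows, so a script can parse it like any other result. It uses Python's standard `xml.etree.ElementTree`; the column names are hypothetical, and the output is a bare-bones VOTable 1.1 document rather than a complete one.

```python
import xml.etree.ElementTree as ET

VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.1"


def empty_votable(fields):
    """Serialise a VOTable with column descriptions but no data rows.

    fields: list of (name, datatype) pairs describing the result columns.
    """
    root = ET.Element("VOTABLE", {"version": "1.1", "xmlns": VOTABLE_NS})
    table = ET.SubElement(ET.SubElement(root, "RESOURCE"), "TABLE")
    for name, datatype in fields:
        ET.SubElement(table, "FIELD", {"name": name, "datatype": datatype})
    data = ET.SubElement(table, "DATA")
    ET.SubElement(data, "TABLEDATA")  # no <TR> rows: the result set is empty
    return ET.tostring(root, encoding="unicode")
```

A driver script can then treat "zero rows" as an ordinary outcome (for example by counting `TR` elements) instead of special-casing a missing file.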

Points re metadata



Ray has posted some useful "VOResource Schemas and Example Instances" here: http://www.ivoa.net/twiki/bin/view/IVOA/RegUpgradeToV10#VOResource_Schemas_and_Example_I

I imagine that the VODataService schema is the correct one to use for describing metadata in VOResource format, and in particular the Catalog element - see Ray's example here:
  http://www.ivoa.net/internal/IVOA/RegUpgradeToV10/catalog.xml

Are we suggesting that the plain getCapabilities method will return a VOResource, while getCapabilities&table=results will return a description in the custom tabular format suggested by Doug?

Also, as I understand it, a data service might have other kinds of capabilities besides its tabular metadata, and indeed Doug points out the distinction between service metadata and dataset metadata (and I associate getCapabilities more with service metadata). It's a minor point, but would it therefore be clearer to give the TAP metadata method a different name (e.g. "getMetadata" or similar)?

COMMENTS RE DOUG'S DOCUMENT


stageData:



Provision of the stageData capability should be optional; some service providers don't mind providing a query interface, but don't want to commit significant system resources to a temporary staging area.

Some kind of deliver-to-VOSpace method should additionally be supported as an alternative to staging for large-resultset delivery (so that instead of being staged at the service, the data is streamed directly to storage owned by the user).

metadata methods



I believe that the standard getMetadata method (whatever it ends up being called) should return the *full* dataset metadata. I feel strongly that more complex "querying on the metadata", e.g. ADQL querying on it, should *not* be a compulsory part of TAP (though it may be an optional extension).

It should be *compulsory* for a service to return dataset metadata in VOResource format (though it need not be the default format). That way, the issue of coarse-grained vs. fine-grained registry does not become a barrier to interoperability (because VOResource-format dataset metadata is available for each service, whether or not that metadata is kept in a registry).

Cheers,
Kona

Received on 2007-04-23Z14:37:45