Dear all,
Some follow-up on the recent posts regarding TAP, starting with a couple of general questions and followed by some more particular comments.
General question: Support for multiple catalogs
Here I am using "catalog" as the contents of a single database in a DBMS, "table" as a table within a database, and "column" as a column within a table. It seems like the term "dataset" sometimes means table and sometimes means catalog...
For convenience, a data provider might want to install a single data service and use it to export the tables from multiple databases (where the data service implementation itself supports this).
Trivially, the service could publish multiple different TAP endpoints, one per catalog - so the hardwired stem of the TAP access URL would indicate which catalog/database was being requested. In the case of the simple parameter-based TAP, I can't think of a compelling reason why the user should need to get at columns from multiple different catalogs in one query.
However, in the case of ADQL-TAP, there are advantages to the user being able to access tables from different catalogs - notably, in effect doing some kind of crossmatch / cross-DB join as a single ADQL query (where the underlying DBMS supports cross-database operations).
We need to consider this use case and make sure we don't preclude it in TAP (or the ADQL specification!). For example, the TAP metadata methods need to be able to return metadata for more than one catalog, and clients need to be able to accept this multi-catalog metadata.
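As a rough illustration of what "multi-catalog metadata" could look like on the client side, here is a minimal Python sketch. The nested catalog/table/column layout and all the names in it (catalog_a, stars, qualified_columns, etc.) are my own assumptions for illustration; nothing here is taken from the TAP drafts.

```python
# Hypothetical metadata as a client might hold it after parsing a
# multi-catalog response: catalog -> table -> list of column names.
metadata = {
    "catalog_a": {"stars": ["id", "ra", "dec"]},
    "catalog_b": {"galaxies": ["id", "ra", "dec", "z"]},
}


def qualified_columns(meta):
    """Return every column as a fully qualified catalog.table.column name."""
    names = []
    for catalog, tables in meta.items():
        for table, columns in tables.items():
            for column in columns:
                names.append(f"{catalog}.{table}.{column}")
    return names
```

The point is only that a client keyed on catalog as well as table can tell catalog_a.stars.ra from catalog_b.galaxies.ra, which a flat table-to-columns structure could not do if two catalogs shared a table name.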
General question: indirection in catalog, table and column names
It is quite feasible for a service to publish descriptions of its
catalogs, tables and columns using names that are different from the
"real" names of the databases, tables and columns in the back-end
DBMS (I'm calling this "name indirection"). In name indirection,
incoming queries (simple parameter-based TAP or full ADQL TAP) would
contain the published (indirected) names; the service itself would need
to perform the necessary name transformations from published to back-end
database, table and column names when extracting the requested data from
the DBMS.
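To make the idea concrete, here is a minimal Python sketch of name indirection for a simple parameter-style request. Everything in it is hypothetical: the published names, the back-end names, the two lookup tables and the to_backend_sql helper are assumptions for illustration, and a real ADQL service would need a proper query parser rather than this kind of lookup.

```python
# Hypothetical lookup tables from published names to back-end DBMS names.
TABLE_MAP = {"survey.sources": "db_mirror_03.src_tbl"}
COLUMN_MAP = {
    ("survey.sources", "ra"): "ra_deg",
    ("survey.sources", "mag"): "mag_v",
}


def to_backend_sql(published_table, published_columns):
    """Translate a published table/column request into back-end SQL."""
    if published_table not in TABLE_MAP:
        raise KeyError(f"unknown published table: {published_table}")
    backend_table = TABLE_MAP[published_table]
    select_items = []
    for col in published_columns:
        backend_col = COLUMN_MAP[(published_table, col)]
        # Alias back to the published name so the result columns
        # match the metadata the client was given.
        select_items.append(f"{backend_col} AS {col}")
    return f"SELECT {', '.join(select_items)} FROM {backend_table}"
```

With these tables, a request for columns ra and mag from survey.sources would be run against db_mirror_03.src_tbl, but the client only ever sees the published names.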
Name indirection could be useful in a number of situations, including
(a) running a mirror service for some already-published dataset where,
for administrative reasons, the names of the database and/or tables in
the mirror DB had to be locally customised, and (b) publishing a
"data-model view" of some dataset (by publishing metadata containing the
"logical" column names under some data model, rather than the real column
names represented in the DB).
Generally, name indirection would be hidden from the client / astronomer; the client simply formats the query based on the supplied metadata in the normal way. However, it does impinge on point 3.2.4 (the directSQL query method), because a direct SQL query based on metadata with indirected table and column names would fail.
TAP needs to allow for name indirection (implicitly or explicitly); however, I don't think that the *client* should need to know whether or not the metadata it receives is indirected. For simplicity, services that use indirection should *not* support the directSQL method; in other words, the directSQL method should be an *optional* method in TAP rather than a compulsory one. Services not supporting the directSQL method should return an appropriate error code if they receive a directSQL query.
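A minimal Python sketch of a service treating directSQL as optional might look like the following. The dispatch function, the "adqlQuery" method name and the choice of HTTP 501 as the error status are all my assumptions; the drafts do not (as far as I know) fix which error code should be used.

```python
# Methods this hypothetical service has enabled. directSQL is left out
# on purpose, e.g. because the service uses name indirection.
SUPPORTED_METHODS = {"getCapabilities", "adqlQuery"}


def dispatch(method, params):
    """Route a request, rejecting methods the service has not enabled."""
    if method not in SUPPORTED_METHODS:
        # 501 is an assumed choice of status for an unsupported method.
        return {"status": 501, "error": f"method not supported: {method}"}
    return {"status": 200, "method": method, "params": params}
```

A client sending a directSQL request to this service gets a clear "not supported" response rather than a confusing DBMS error about unknown table names.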
In general, I don't think the DB administrators would be happy with a compulsory directSQL method anyway, so we need to allow for system administrators wanting to switch off the directSQL bit of TAP.
COMMENTS RE ESAVO'S DOCUMENT
Point re using "204 No Content" for empty results
Points re metadata
I imagine that the VODataService schema is the correct one to use for
describing metadata in VOResource format, and in particular the Catalog
element - see Ray's example here:
http://www.ivoa.net/internal/IVOA/RegUpgradeToV10/catalog.xml
Are we suggesting that the plain getCapabilities method will return a VOResource, and the getCapabilities&table=results will return a description in the custom tabular format suggested by Doug?
Also, as I understand it, a data service might have other kinds of capabilities
than just its tabular metadata, and indeed Doug points out the distinction
between service metadata and dataset metadata (and I associate getCapabilities
more with service metadata). It's a minor point, but would it therefore
be clearer to give the TAP metadata method a different name (e.g.
"getMetadata" or whatever)?
COMMENTS RE DOUG'S DOCUMENT
stageData:
Some kind of deliver-to-VOSpace method should additionally be supported as an alternative to staging for large-resultset delivery (so that instead of being staged at the service, the data is streamed direct to storage owned by the user).
metadata methods:
It should be *compulsory* for a service to return dataset metadata in VOResource format (though it need not be the default format). That way, the issue of coarse-grained vs. fine-grained registry does not become a barrier to interoperability (because VOResource-format dataset metadata is available for each service, whether or not that metadata is kept in a registry).
Cheers,
Kona
Received on 2007-04-23Z14:37:45