Re: Draft TAP document inputs

From: Doug Tody <dtody-at-nrao.edu>
Date: Mon, 23 Apr 2007 12:56:01 -0600 (MDT)


On Mon, 23 Apr 2007, Kona Andrews wrote:

> General question: Support for multiple catalogs
> =================
>
> Here I am using "catalog" as the contents of a single database
> in a DBMS, "table" as a table within a database, and "column" as a
> column within a table. It seems like the term "dataset" sometimes
> means table and sometimes means catalog...

In common usage within astronomy, "catalog" and "table" are used interchangeably. For example, a catalog of objects is a source catalog. I think what you are talking about here is a catalog of database tables (since a database is a collection of tables). A primary TAP "dataset" is a Table, which may also be a catalog of something.

In the scheme I described in my earlier analysis, a simple queryData operation returns a catalog of the tables managed by a service, with standard metadata describing each table. This is the direct analogue of SIA/SSA etc., where the dataset type being operated upon is Table. The only thing new about TAP is the need to be able to query the fields (columns) of a single table.
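
To make the two levels concrete, here is a minimal sketch assuming a hypothetical in-memory set of tables. `query_data` returns one metadata record per table (the table catalog), and `query_fields` returns the columns of a single named table. The function names and metadata fields are illustrative only, not taken from any TAP draft.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    datatype: str
    unit: str = ""


@dataclass
class Table:
    name: str
    description: str
    columns: list = field(default_factory=list)


# Hypothetical set of tables managed by one service.
TABLES = {
    "sources": Table("sources", "Source catalog",
                     [Column("ra", "double", "deg"), Column("dec", "double", "deg")]),
    "images": Table("images", "Image index",
                    [Column("obs_id", "char"), Column("exptime", "double", "s")]),
}


def query_data():
    """Return the catalog of tables: one metadata record per table."""
    return [{"name": t.name, "description": t.description, "ncols": len(t.columns)}
            for t in TABLES.values()]


def query_fields(table_name):
    """Return the fields (columns) of one table; this is the part new to TAP."""
    table = TABLES[table_name]
    return [{"name": c.name, "datatype": c.datatype, "unit": c.unit}
            for c in table.columns]
```

A service with only one table is just the degenerate case where TABLES has a single entry.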

> We need to consider this use case and make sure we don't preclude it in
> TAP (or the ADQL specification!).

I think TAP should support services which can access multiple tables. Of course, a simple service may only provide access to a single table.

> In general, I don't think the DB administrators would be happy with a
> compulsory directSQL method anyway, so we need to allow for system
> administrators wanting to switch off the directSQL bit of TAP.

Perhaps what is needed instead is a way to specify the content model of a query string, including the version used. This would default to "ADQL-1.0" or whatever is the most recent version, but could be anything, if supported as an optional capability by the service. Then we don't need directSQL any longer, and have a more general mechanism.
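
A rough sketch of how a service might handle such a parameter. The parameter names LANG and QUERY, and the set of supported languages, are assumptions for illustration. The language defaults to ADQL, and anything else is accepted only if the site has chosen to enable it.

```python
# Hypothetical default: the most recent ADQL version the service knows about.
DEFAULT_LANG = "ADQL-1.0"

# Query languages this (hypothetical) service supports; anything beyond
# ADQL is an optional capability the site may leave out.
SUPPORTED_LANGS = {"ADQL-1.0", "SQL-92"}


def parse_query_request(params):
    """Return (lang, query) from request parameters, defaulting the language."""
    lang = params.get("LANG", DEFAULT_LANG)
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported query language: {lang}")
    query = params.get("QUERY")
    if not query:
        raise ValueError("missing QUERY parameter")
    return lang, query
```

A site that does not want to expose native SQL simply leaves it out of SUPPORTED_LANGS, so no separate switch for directSQL is needed.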

> Point re using "204 No Content" for empty results
> ------------------
> Is this the most useful thing to do? What about if somebody is running
> a fairly automated series of jobs - an empty votable can be quite
> useful (because the scripting doesn't fall over looking for a file that
> doesn't exist, and can therefore be simpler...) I'm not sure what is
> best here, just flagging up that empty votables can sometimes be useful.

There is a more general problem here, which is compatibility with the other DAL protocols. Standard practice elsewhere in DAL is that it is perfectly legal (not an error) for a query to return no results. Likewise, errors should be reported at the TAP level where possible (as a VOTable or other XML response), so that TAP-specific error information can be returned; HTTP errors alone are not sufficient. For SSAP, we ended up using VOTable for all service error returns, noting that more fundamental protocol errors could return an HTTP-level error instead.
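
A minimal sketch of the two cases, modeled on the QUERY_STATUS INFO element used by SIA and SSA. An empty result is a normal "OK" VOTable with zero rows, and an error is a VOTable whose status is "ERROR" with a message. XML namespace declarations are omitted for brevity, and the function itself is hypothetical.

```python
import xml.etree.ElementTree as ET


def votable_response(status, message="", fields=(), rows=()):
    """Build a minimal VOTable carrying a DAL-style QUERY_STATUS.

    status is "OK" or "ERROR".  An empty result is status "OK" with no rows,
    not an error.  (XML namespace declarations are omitted for brevity.)
    """
    votable = ET.Element("VOTABLE", version="1.1")
    resource = ET.SubElement(votable, "RESOURCE", type="results")
    info = ET.SubElement(resource, "INFO", name="QUERY_STATUS", value=status)
    info.text = message
    if status == "OK":
        table = ET.SubElement(resource, "TABLE")
        # FIELD definitions must precede the DATA element.
        for name, datatype in fields:
            ET.SubElement(table, "FIELD", name=name, datatype=datatype)
        tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
        for row in rows:
            tr = ET.SubElement(tabledata, "TR")
            for value in row:
                ET.SubElement(tr, "TD").text = str(value)
    return ET.tostring(votable, encoding="unicode")
```

Because an empty result still yields a well-formed VOTable, an automated script can parse it the same way as a non-empty one.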

> stageData:
> ----------
> Provision of stageData capability should be optional; some service-providers
> don't mind providing a querying interface, but don't want to provide
> significant system resources for a temporary staging area.

Yes, of course, stageData and all async activities should be optional.

> Some kind of deliver-to-VOSpace method should additionally be supported as
> an alternative to staging for large-resultset delivery (so that instead of
> being staged at the service, the data is streamed direct to storage owned
> by the user).

This is staging: data can be staged either to local server storage or directly to any client-specified VOSpace.
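
A sketch of that idea under stated assumptions: the destination is either the literal "local" or a vos:// URI. The staging directory and the `push_to_vospace` helper are hypothetical, and the VOSpace transfer is left as a stub. The point is only that both cases are the same operation with a different target.

```python
import shutil
from pathlib import Path
from urllib.parse import urlparse


def stage_data(result_path, destination, local_area=Path("/tmp/tap-staging")):
    """Stage a result file either locally or to a client-specified VOSpace.

    destination is "local" or a vos:// URI; local_area is a hypothetical
    staging directory on the server.
    """
    if destination == "local":
        local_area.mkdir(parents=True, exist_ok=True)
        target = local_area / Path(result_path).name
        shutil.copy(result_path, target)
        return str(target)
    if urlparse(destination).scheme == "vos":
        return push_to_vospace(result_path, destination)
    raise ValueError(f"unsupported staging destination: {destination}")


def push_to_vospace(result_path, vos_uri):
    """Hypothetical placeholder for a VOSpace transfer."""
    raise NotImplementedError("VOSpace transfer not implemented in this sketch")
```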

> metadata methods
> ----------------
> I believe that the standard getMetadata method (whatever it ends up being
> called) should return the *full* dataset metadata. I feel strongly that
> more complex "querying on the metadata", e.g. ADQL querying on it, should
> *not* be a compulsory part of TAP (though it may be an optional extension).

Certainly use of ADQL for metadata queries should be optional, if it is permitted at all. It is not clear at this point if we even need to permit this, although it might not be hard to do if the service already supports ADQL.

> It should be *compulsory* for a service to return dataset metadata in
> VOResource format (though it need not be the default format). That way,
> the issue of coarse-grained vs. fine-grained registry does not become a
> barrier to interoperability (because VOResource-format dataset metadata
> is available for each service, whether or not that metadata is kept in a
> registry).

If people really think this is necessary, we could of course do so. One could also argue, however, that so long as the information content is the same, it is pretty trivial for a smart client to accept table metadata in either format. Making this feature "recommended" (i.e., required for a fully compliant service) would mean that it would be supported by all production-grade services, but might not be offered by minimally compliant services run by small sites.
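
To illustrate the "smart client" argument, here is a sketch of a client that accepts column metadata either as VOTable FIELD elements or in a simplified VOResource-style `<table><column><name>...</name></column></table>` layout. That layout is an assumption, not the exact VODataService schema, and namespaces are ignored. Both inputs are normalized to the same list.

```python
import xml.etree.ElementTree as ET


def columns_from_votable(xml_text):
    """Column metadata from VOTable FIELD elements."""
    root = ET.fromstring(xml_text)
    return [{"name": f.get("name"), "unit": f.get("unit", "")}
            for f in root.iter("FIELD")]


def columns_from_voresource(xml_text):
    """Column metadata from a simplified VOResource-style <column> list."""
    root = ET.fromstring(xml_text)
    return [{"name": c.findtext("name"), "unit": c.findtext("unit", "")}
            for c in root.iter("column")]


def table_columns(xml_text):
    """Accept either format and return the same normalized column list."""
    if ET.fromstring(xml_text).tag == "VOTABLE":
        return columns_from_votable(xml_text)
    return columns_from_voresource(xml_text)
```

If the information content is the same, the client ends up with identical metadata regardless of which format the service returned.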

Received on 2007-04-23Z20:56:25