Hi All -
As Pedro mentioned, I have performed an analysis of how a TAP protocol might look when approached as a member of the family of (second generation) DAL protocols.
Two documents presenting this high-level analysis are attached:
o A strawman TAP protocol based on an analysis of the standard
DAL service profile and how this might function when applied
to table data.
o A set of worked-out use cases showing how the proposed protocol
would be used for simple cone-search-like queries, more general
discovery queries posed against multiple catalogs, submission
of batch jobs to process large queries, and so forth.
Note that these have been revised somewhat since the first draft. The documents are not polished and are intended only as a basis for discussion. I hope the basic approach proposed comes across.
My starting point for this is a desire to have TAP be consistent with the planned second generation of DAL protocols (SSAP, SIA V2, etc.), unless there is a compelling technical reason to diverge. From the DAL perspective this is essentially a requirement. This is important both for consistency in what is delivered to the community and to maximize code sharing in implementations of multiple services. It will also greatly ease the eventual addition of ADQL support to the other DAL services.
Other goals which drove this analysis include the following:
o At the most basic level, a simple TAP query should not be
much more complicated than the current cone search.
That is, a simple parameter-based query is issued against a
single table, and table data comes back in VOTable format.
A simple ADQL-based query should be equally simple, e.g.,
passing the ADQL expression as a URL-encoded string.
In reading the documents please bear this point in mind!
While I talk about advanced capabilities as well, nonetheless
a simple query mechanism is also provided.
o It must also be possible to query table metadata.
This includes listing the tables managed by the service,
providing uniform metadata for each; and for each data table,
listing the table fields, including providing standard metadata
describing each field.
o In a more advanced service implementation, it must (optionally)
be possible to support large queries. This includes a query
against a single very large table which returns a large amount
of data, and evaluation of more complex SQL expressions, e.g.,
a join against two tables. Large datasets can be returned in a
variety of ways, e.g., by a simple streaming GET (in principle
this can return an arbitrarily large dataset), or via more
sophisticated techniques such as VOSpace. A large query may
require advance execution planning, to adjust the size of the
job to be submitted, and support for asynchronous execution.
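
To make the simple case concrete, here is a minimal sketch in Python of how the two simple query forms described above might be built as GET URLs. The endpoint, the table name, and all parameter names (REQUEST, FROM, POS, SIZE, QUERY, FORMAT) are assumptions made for illustration only; they are not taken from the attached documents.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical service endpoint, for illustration only.
BASE_URL = "http://example.org/tap"


def build_param_query(table, ra, dec, radius):
    """Cone-search-like query against a single table.

    All parameter names here are assumed, not defined by any specification.
    """
    params = {
        "REQUEST": "queryData",
        "FROM": table,
        "POS": f"{ra},{dec}",
        "SIZE": str(radius),
        "FORMAT": "votable",
    }
    return f"{BASE_URL}?{urlencode(params)}"


def build_adql_query(adql):
    """Pass an ADQL expression as a single URL-encoded string."""
    params = {"REQUEST": "queryData", "QUERY": adql, "FORMAT": "votable"}
    return f"{BASE_URL}?{urlencode(params)}"


def decode_query(url):
    """Recover the parameters from a URL, as a service would see them."""
    return {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}


if __name__ == "__main__":
    print(build_param_query("photo", 180.0, -30.5, 0.1))
    print(build_adql_query("SELECT * FROM photo WHERE mag < 20"))
```

Either URL could then be fetched with an ordinary HTTP GET, with the table data coming back in VOTable format. Both forms share the same request shape, so a client only swaps the parameter set for the ADQL string.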
The analysis attempts to address all these points. I think it is important to consider all of them in the design phase, to ensure that our overall design will support the full range of planned capabilities. It is probably not necessary, however, to fully specify all the advanced capabilities in the initial version of the specification.
It would be good to have some discussion, hopefully leading to a consensus within the TEG on the high-level TAP design. Not every detail has to be worked out at this stage, and probably should not be in any case until we iterate once within the broader VOQL and DAL WGs. Pedro and I still have an action (on behalf of both the VOQL and DAL WGs) to try to follow up with an initial TAP specification in advance of the interop. While this need not be complete at this stage, it would be good if we could at least present the overall design in sufficient detail to evaluate how it would work for typical use cases.