Re: Response to TAP presentations

From: Doug Tody <dtody-at-nrao.edu>
Date: Wed, 16 May 2007 18:03:34 -0600 (MDT)


I think it could help advance the TAP discussions enormously if we could reach agreement on the following key issues:

  1. Parameterized queries (POS, SIZE, etc.)

There are two main reasons why we might want to include support for parameterized queries:

    o   To "lower the bar" for simple services.  TAP is mainly about ADQL,
        but ADQL is complex.  A minimal TAP service could still be useful
        if it provided only a cone-search-like capability (POS, SIZE).

    o   To better support simple client use cases.  A typical use case is
        to display (or otherwise analyze) a region of an image, and "cut out"
        an equivalent region of one or more source catalogs to overlay on
        the same region.  The parameter mechanism provides a simple and
        uniform way to support this common use case.

The UTYPE-based ADQL query, with a REGION syntax, can do the same thing and is ultimately much more powerful, but is also much more complex.
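To make the comparison concrete, here is a minimal Python sketch of the same region cut expressed both ways. It assumes a hypothetical service URL (http://example.org/tap), a hypothetical table "cat" with "ra"/"dec" columns, and that SIZE is an angular width in degrees as in SIA/SSA. The ADQL text uses the ADQL 2.0 CONTAINS/POINT/CIRCLE functions, not any particular REGION syntax under discussion here.

```python
from urllib.parse import urlencode


def param_query_url(base_url, ra, dec, size):
    """Parameterized form: POS=ra,dec and SIZE, all in decimal degrees.

    base_url is hypothetical; urlencode escapes the comma in POS as %2C.
    """
    return base_url + "?" + urlencode({"POS": f"{ra},{dec}", "SIZE": size})


def adql_cone(table, ra, dec, size):
    """ADQL form of the same cut (ADQL 2.0 geometry functions assumed).

    SIZE is treated as a width, so the search circle has radius SIZE/2.
    The table name and the 'ra'/'dec' column names are hypothetical.
    """
    radius = size / 2.0
    return (
        f"SELECT * FROM {table} "
        f"WHERE 1 = CONTAINS(POINT('ICRS', ra, dec), "
        f"CIRCLE('ICRS', {ra}, {dec}, {radius}))"
    )


print(param_query_url("http://example.org/tap", 180.0, 2.0, 0.5))
# http://example.org/tap?POS=180.0%2C2.0&SIZE=0.5
print(adql_cone("cat", 180.0, 2.0, 0.5))
```

A server can answer the first form by splitting two parameters, while the second needs a full query parser; that difference in implementation cost is what the "lower the bar" argument rests on.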

Since both table data and table metadata are tabular, it is possible to use the same mechanism to query both. This simplifies things for the client and will promote code-sharing at all levels; it is also more flexible if we wish to describe more than just tables/columns in the future (e.g., views, indexes, etc.).
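As an illustration of the code-sharing point, the sketch below (Python, standard library only) reads the FIELD names and TABLEDATA rows from any VOTable, so one function serves both a data query and a metadata query. The metadata table layout (one row per column, with "column_name" and "datatype") is a hypothetical example, not a defined TAP format, and only the TABLEDATA serialization is handled.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip an XML namespace: '{ns}FIELD' -> 'FIELD'."""
    return tag.rsplit("}", 1)[-1]


def read_votable(xml_text):
    """Return (column_names, rows) for the first TABLE in a VOTable.

    Only TABLEDATA is handled, and every cell is returned as a string.
    """
    root = ET.fromstring(xml_text)
    tables = [el for el in root.iter() if _local(el.tag) == "TABLE"]
    if not tables:
        raise ValueError("response contains no TABLE element")
    table = tables[0]
    names = [el.get("name") for el in table.iter() if _local(el.tag) == "FIELD"]
    rows = [
        [(td.text or "").strip() for td in tr if _local(td.tag) == "TD"]
        for tr in table.iter()
        if _local(tr.tag) == "TR"
    ]
    return names, rows


NS = 'xmlns="http://www.ivoa.net/xml/VOTable/v1.1"'

# A made-up data response: one source near the query position.
DATA_RESPONSE = f"""<VOTABLE {NS}><RESOURCE><TABLE>
<FIELD name="ra" datatype="double"/><FIELD name="dec" datatype="double"/>
<DATA><TABLEDATA><TR><TD>180.1</TD><TD>2.05</TD></TR></TABLEDATA></DATA>
</TABLE></RESOURCE></VOTABLE>"""

# A hypothetical metadata response: one row per column of the table.
META_RESPONSE = f"""<VOTABLE {NS}><RESOURCE><TABLE>
<FIELD name="column_name" datatype="char" arraysize="*"/>
<FIELD name="datatype" datatype="char" arraysize="*"/>
<DATA><TABLEDATA>
<TR><TD>ra</TD><TD>double</TD></TR>
<TR><TD>dec</TD><TD>double</TD></TR>
</TABLEDATA></DATA>
</TABLE></RESOURCE></VOTABLE>"""

print(read_votable(DATA_RESPONSE))  # (['ra', 'dec'], [['180.1', '2.05']])
print(read_votable(META_RESPONSE))
# (['column_name', 'datatype'], [['ra', 'double'], ['dec', 'double']])
```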

In an HTTP-based protocol there are many ways to do things like return errors, report null queries, or request the format in which data is to be returned. However, it makes little sense to do this differently for every service protocol, especially if they are all doing much the same thing. For SSAP, for example, rather than reinvent the wheel, we adopted much of the HTTP semantics from the OpenGIS protocols, which are also HTTP-based and very similar in form to the IVOA protocols. There are only minor differences; e.g., we retained VOTable for error responses, whereas the OpenGIS protocols use their own small XML status-return packet.
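For comparison with signalling through HTTP status codes, here is a minimal sketch of how a client might read a VOTable error response. It assumes the QUERY_STATUS convention of the existing DAL services (an INFO element with name="QUERY_STATUS", value "OK" or "ERROR", and the message in its text); the sample documents and the error text are made up. Unlike a bare HTTP 204, the INFO text can say why a query returned nothing or failed.

```python
import xml.etree.ElementTree as ET


def query_status(xml_text):
    """Return (status, message) from a VOTable query response.

    Looks for <INFO name="QUERY_STATUS" value="..."> anywhere in the
    document; a response without one is treated as "OK" (an assumption).
    """
    root = ET.fromstring(xml_text)
    for el in root.iter():
        is_info = el.tag.rsplit("}", 1)[-1] == "INFO"
        if is_info and el.get("name") == "QUERY_STATUS":
            return el.get("value", "OK"), (el.text or "").strip()
    return "OK", ""


# Made-up error response: the reason travels with the status.
ERROR_RESPONSE = """<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.1">
<RESOURCE type="results">
<INFO name="QUERY_STATUS" value="ERROR">POS: declination out of range</INFO>
</RESOURCE></VOTABLE>"""

# Made-up successful but empty ("null") query response.
EMPTY_RESPONSE = """<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.1">
<RESOURCE type="results"><INFO name="QUERY_STATUS" value="OK"/></RESOURCE>
</VOTABLE>"""

status, message = query_status(ERROR_RESPONSE)
print(status, "-", message)  # ERROR - POS: declination out of range
print(query_status(EMPTY_RESPONSE))  # ('OK', '')
```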

Roy has a good point that asynchronous data staging is complex, and supporting this initially expands the scope of TAP and may delay the standard. In particular, we will probably need an HTTP/REST-based version of VOSpace, and we probably cannot manage the general issue of data staging without also solving the problem of authentication. We will need to specify and prototype all of these related Grid technologies before we can integrate them into the data services (not just TAP). Nonetheless, as many have pointed out, we need these capabilities if we are ultimately to deliver robust, capable data services.

---
On Tue, 15 May 2007, Roy Williams wrote:


> This is my response to a double presentation this morning on the TAP protocol
> evolution, one from Tody, the other from Stebe/Osuna. I believe that this
> project has gained unacceptable mission-creep from its original conception as
> SIMPLE table access. I was VERY disturbed when Osuna joked today that this be
> renamed "General Access Protocol".
>
> Roy Williams
> -----------------------------------------
>
> (1) It was my impression at the beginning of the discussion a year ago that
> TAP would be a protocol to allow ADQL and/or SQL queries to be sent to a
> database, and a response to be obtained in table form.
>
> (2) The parameterized queries presented this morning should not be part of the
> TAP protocol. The language is not well defined, and would be difficult to
> implement. If a cone search is wanted, then the IVOA already has that
> specification, and TAP is not a replacement for cone search. It is no easier
> to write x="2.5/" than "select * where x>2.5", but it adds a considerable burden
> in implementation, and the requirement to implement an open-ended
> parameter-based "language" that is not well-defined.
>
> (3) The presentation this morning suggested that error responses from TAP be
> encoded into the HTTP transport, for example HTTP 204 means "No Content". This
> is problematic on several grounds. First, no other IVOA protocol is bound to
> HTTP in this way, so we have a new concept where, I believe, other ways
> already exist. Second, why should we bind ourselves to a particular transport
> layer like this? Third, the HTTP messages leave no room for elaboration -- for
> example *why* is there no content? I propose that errors can be reported as
> with other IVOA protocols: a VOTable with an INFO element.
>
> (4) I very much like the suggestion of three classes of query: the ADQL, the
> Utype query, the NativeSQL. The Utype method is a natural expression of the
> Source Catalog Data Model, so that the same query can be sent to many
> databases, and the NativeSQL allows an extremely easy implementation of TAP
> for some providers. I believe that none of these three should be mandatory.
> Thus each column of the table can have two names: the arbitrary one in the
> database table or view itself, and the IVOA standard Utype name.
>
> (5) Tables and table metadata are both tables, and the IVOA has adopted a
> standard representation for tables, it is called VOTable. I believe VOTable
> should be the principal way that relational schema and the table data should
> be returned from TAP, although other formats may be offered by implementers.
> Because tables and table metadata are unified, it means that querying the
> table metadata is no different from querying table data itself.
>
> (6) Please note that the IVOA Recommendation VOResource does NOT include a
> mechanism for expressing table metadata; rather it is VODataService that
> suggests this, and that is not yet defined even as a Working Draft. It would be
> unwise to make TAP dependent on a controversial suggestion that is not yet
> even defined or documented. Obviously this could in the future be an optional
> expression of table metadata, but I suggest the IVOA should stick with the
> standards it has already ratified.
>
> (7) The asynchronous query mechanisms presented were rather different, and
> neither was satisfactory to me. Tody suggested a warping of what has been
> drafted (but not implemented) for other DAL services. Stebe/Osuna suggested a
> mechanism with no monitoring or notification that did not seem well thought
> out. I would like to suggest leaving asynchronous TAP for a future version,
> and concentrate on getting the plain, simple, synchronous version to
> Recommendation.
>
> (8) Neither presentation considered TAP queries on private data; how the query
> protocol can include an authentication token so that only a select group of
> people can launch queries. This is just as important as batch jobs on public
> data. I believe that this too should be handled in the next version of TAP.
>
>
Received on 2007-05-17Z02:04:19