Re: On clarification for getCapabilities and stageData

From: Doug Tody <dtody-at-nrao.edu>
Date: Wed, 25 Apr 2007 10:20:25 -0600 (MDT)


Hi Kona -

On Wed, 25 Apr 2007, Kona Andrews wrote:
>> These are still mostly meta-discussions; very little of substance has been
>> said here on the TAP design.
>
> Yes indeed, and to that end perhaps we should continue with some more
> focussed discussion on the prototype TAP document sections submitted
> by ESAVO - so that we can at least get some of the simpler aspects
> of TAP agreed?

The problem with this is that the draft document from ESAVO was written before we had these discussions about the need for compatibility of TAP with the other DAL interfaces; much of what is in this earlier draft is effectively ruled out if we agree on the importance of a common approach for the second generation DAL interfaces (SSAP, SIAV2, TAP, etc.). I think we should start with the basic DAL service profile and see how this looks for table access, as has already been done in the analysis I performed. This was done after reading the ESAVO document (I adopted some of the approach related to queryData) and after the sub-group of us met and discussed this at ESAC. Why are we ignoring this more recent and detailed analysis?

To simplify this (as Pedro indicated earlier, an abstract analysis can be hard to follow), I suggest a good place to start would be with some simple examples of what such an interface might look like, for example as illustrated in the mail which I posted yesterday. I copy this here again below for reference (with some typos fixed). I suspect that if we consider a few sample queries, things will rapidly become clearer.

I suggest that these sample queries are quite simple, and already meet most of the requirements of the basic TAP interface. In addition, since they are consistent with the more detailed analysis I posted earlier, this approach is extensible to support advanced capabilities such as large queries, asynchronous data staging, multiple table operations, and so forth.


>From dtody-at-nrao.edu Tue Apr 24 12:31:58 2007
Date: Tue, 24 Apr 2007 12:29:57 -0600 (MDT)
From: Doug Tody <dtody-at-nrao.edu>
To: VOQL-TEG <voql-teg-at-ivoa.net>
Subject: Re: VOQL-TEG Meeting #5

On Tue, 24 Apr 2007, Doug Tody wrote:
>
> For TAP, the key issue would appear to be the scope of queryData,
> and whether this can be used as a uniform mechanism to return both
> table data and metadata (this does not mean we would have to permit
> ADQL for metadata queries). In the other DAL interfaces, it is
> used for discovery, to return dataset metadata, and to propose/plan
> data products (often virtual) which can be computed and returned.
> All of these functions return the query results as a table, hence
> it is reasonable to use the same mechanism to directly query a data
> table as well. In the simplest case of a queryData on a single data
> table, this reduces to what is essentially a cone search, but with
> a somewhat generalized set of input parameters, plus an ADQL option.

So if we had a uniform query mechanism based on queryData, what might this look like? Some thoughts on this follow. - Doug

In general, a service may provide access to multiple tables. We want to be able to query both table data and metadata. We want a simple query to reduce to something not much more complex than a cone search. Taking SQL information schema as a guide (much simplified, as we probably don't want to expose the physical DBMS) we might have something like this:

     queryData

 	table = { "schema.tables", "schema.columns", <data-table-name>}
 	    schema.tables = table (or catalog) of tables
 	    schema.columns = table of all columns in all tables [1]

 	query = <query string, probably URL-encoded if parameter>
 	    e.g., query="SELECT+*+FROM+a+WHERE+foo%3D2" [2]

 	queryType = <"ADQL", "ADQL-1.1", "nativeSQL", etc.> [3]

 	format = {votable, csv, xml, etc.} (default votable)

 	POS,SIZE,BAND,TIME, etc. (data-model based query) [4]

 	verbose = {1, 2, 3} (optional)

     [1] This is how SQL information schema does it.  If we depart
 	from this, we might instead have some other syntax like
 	"<data-table>.columns", or an additional tableName parameter
 	of some sort.

     [1] Not clear what the default should be here.  For a service with
 	a single data table a data query would be a reasonable default,
 	but this does not work for services with multiple data tables.

     [2] Large queries might require that a POST be used instead
     [3] Defaults to ADQL.
     [4] Parameter constraints probably not permitted if a query string
         is used.
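
One possible reading of the defaults and of footnote [4] can be sketched in Python. This is only an illustration under stated assumptions: the function name normalize_request is hypothetical, the queryType default is applied only when a query string is present, and the "probably not permitted" rule of footnote [4] is treated as a hard error, which the text above does not decide.

```python
# Data-model based parameters named in the sketch above.
MODEL_PARAMS = {"POS", "SIZE", "BAND", "TIME"}


def normalize_request(params):
    """Return a copy of params with defaults filled in (hypothetical helper).

    Assumptions, not settled design:
      - queryType defaults to "ADQL" only when a query string is given [3]
      - format defaults to "votable"
      - parameter constraints are rejected when a query string is used [4]
    """
    out = dict(params)
    if "query" in out:
        out.setdefault("queryType", "ADQL")
        clash = sorted(MODEL_PARAMS & out.keys())
        if clash:
            raise ValueError(
                "parameter constraints not permitted with a query string: "
                + ", ".join(clash)
            )
    out.setdefault("format", "votable")
    return out


if __name__ == "__main__":
    print(normalize_request({"table": "foo", "POS": "180,0", "SIZE": "0.2"}))
    print(normalize_request({"query": "SELECT * FROM a WHERE foo=2"}))
```

Whether a mixed request should be rejected or the parameters silently ignored is exactly the kind of point the footnote leaves open.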

Example data queries

     <baseURL>?REQUEST=queryData&TABLE=foo&POS=180,0&SIZE=0.2
     <baseURL>?REQUEST=queryData&query=SELECT+*+FROM+a+WHERE+foo%3D2

Example metadata queries

     <baseURL>?REQUEST=queryData&TABLE=schema.tables
     <baseURL>?REQUEST=queryData&TABLE=schema.tables&POS=180,0&SIZE=0.2
     <baseURL>?REQUEST=queryData&TABLE=schema.columns&TableName=foo
     <baseURL>?REQUEST=queryData&TABLE=schema.columns&TableName=foo&FORMAT=xml
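
The example URLs above could be assembled by a client along the following lines. This is a minimal sketch, not part of any agreed interface: the base URL is a made-up placeholder, build_query_url is a hypothetical helper, and "*" and "," are deliberately left unescaped so that the output matches the form of the examples.

```python
from urllib.parse import urlencode

# Hypothetical service endpoint, standing in for <baseURL>.
BASE_URL = "http://example.org/tap"


def build_query_url(base_url, **params):
    """Build a queryData GET URL from keyword parameters (hypothetical helper)."""
    query = {"REQUEST": "queryData", **params}
    # safe="*," keeps "*" and "," literal; spaces become "+" and "=" becomes %3D.
    return base_url + "?" + urlencode(query, safe="*,")


if __name__ == "__main__":
    # Data queries
    print(build_query_url(BASE_URL, TABLE="foo", POS="180,0", SIZE="0.2"))
    print(build_query_url(BASE_URL, query="SELECT * FROM a WHERE foo=2"))
    # Metadata queries
    print(build_query_url(BASE_URL, TABLE="schema.tables"))
    print(build_query_url(BASE_URL, TABLE="schema.columns", TableName="foo"))
```

A long ADQL string would presumably be sent by POST instead, as noted in footnote [2]; the sketch covers only the GET form.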

In a query for metadata describing an entire table, we could return table metadata (number of rows, columns, etc.), dataset metadata (dataID, Curation, Char, etc.), access metadata (acref, size, cost, etc.), and so forth.

Received on 2007-04-25Z18:21:03