Re: VOQL-TEG Meeting #5

From: Doug Tody <dtody-at-nrao.edu>
Date: Tue, 24 Apr 2007 12:29:57 -0600 (MDT)


On Tue, 24 Apr 2007, Doug Tody wrote:

>

> For TAP, the key issue would appear to be the scope of queryData,
> and whether this can be used as a uniform mechanism to return both
> table data and metadata (this does not mean we would have to permit
> ADQL for metadata queries). In the other DAL interfaces, it is
> used for discovery, to return dataset metadata, and to propose/plan
> data products (often virtual) which can be computed and returned.
> All of these functions return the query results as a table, hence
> it is reasonable to use the same mechanism to directly query a data
> table as well. In the simplest case of a queryData on a single data
> table, this reduces to what is essentially a cone search, but with
> a somewhat generalized set of input parameters, plus an ADQL option.

So if we had a uniform query mechanism based on queryData, what might this look like? Some thoughts on this follow. - Doug

In general, a service may provide access to multiple tables. We want to be able to query both table data and metadata. We want a simple query to reduce to something not much more complex than a cone search. Taking the SQL information schema as a guide (much simplified, as we probably don't want to expose the physical DBMS), we might have something like this:

     queryData

 	table = { "schema.tables", "schema.columns", <data-table-name>}
 	    schema.tables = table (or catalog) of tables
 	    schema.columns = table of all columns in all tables [1]

 	query = <query string, probably URL-encoded if parameter>
 	    e.g., query="SELECT+*+FROM+a+WHERE+foo%3D2" [2]

 	queryType = <"ADQL", "ADQL-1.1", "nativeSQL", etc.> [3]

 	format = {votable, csv, xml, etc.} (default votable)

 	POS,SIZE,BAND,TIME, etc. (data-model based query) [4]

 	verbose = {1, 2, 3} (optional)

     [1] This is how SQL information schema does it.  If we depart
 	from this, we might instead have some other syntax like
 	"<data-table>.columns", or an additional tableName parameter
 	of some sort.

     [1] Also, it is not clear what the default for table should be.
 	For a service with a single data table, a data query would be
 	a reasonable default, but this does not work for services with
 	multiple data tables.

     [2] Large queries might require that a POST be used instead.
     [3] Defaults to ADQL.
     [4] Parameter constraints probably not permitted if a query string
         is used.
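
To make the defaults and notes above concrete, here is a minimal sketch of how a service might normalize an incoming queryData request. Only the parameter names come from the outline; the class name, the function name, the case-insensitive handling of parameter names, and the choice to reject a missing TABLE outright are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Data-model based constraint parameters named in the outline (note [4]).
CONSTRAINT_PARAMS = ("POS", "SIZE", "BAND", "TIME")


@dataclass
class QueryDataRequest:
    """Hypothetical normalized form of a queryData request."""
    table: str
    query: Optional[str] = None
    query_type: str = "ADQL"    # note [3]: defaults to ADQL
    format: str = "votable"     # outline: default votable
    constraints: dict = field(default_factory=dict)


def parse_query_data(params: dict) -> QueryDataRequest:
    """Turn raw request parameters into a normalized request."""
    # Assumption: parameter names are matched case-insensitively.
    p = {key.upper(): value for key, value in params.items()}
    if "TABLE" not in p:
        # The default table is still an open question, so this sketch
        # simply requires the parameter.
        raise ValueError("TABLE is required in this sketch")
    constraints = {k: p[k] for k in CONSTRAINT_PARAMS if k in p}
    query = p.get("QUERY")
    if query is not None and constraints:
        # Note [4]: constraints probably not permitted with a query string.
        raise ValueError("query string and parameter constraints are exclusive")
    return QueryDataRequest(
        table=p["TABLE"],
        query=query,
        query_type=p.get("QUERYTYPE", "ADQL"),
        format=p.get("FORMAT", "votable"),
        constraints=constraints,
    )
```

Rejecting the mixed case is only one reading of note [4]; a service could equally choose to ignore the constraints when a query string is present.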

Example data queries

     <baseURL>?REQUEST=queryData&TABLE=foo&POS=180,0&SIZE=0.2
     <baseURL>?REQUEST=queryData&TABLE=foo&query=SELECT+*+FROM+a+WHERE+foo%3D2
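
As a rough client-side illustration, the two data queries above could be assembled with the Python standard library as follows. The function name and the base URL are placeholders, not part of any proposal; '*' and ',' are left unescaped only so that the result matches the examples as written.

```python
from urllib.parse import urlencode


def query_data_url(base_url: str, table: str, **params: str) -> str:
    """Build a queryData GET URL; parameter values are URL-encoded."""
    pairs = {"REQUEST": "queryData", "TABLE": table, **params}
    # Keep '*' and ',' literal so the output matches the examples above.
    return base_url + "?" + urlencode(pairs, safe="*,")


BASE = "http://example.org/tap"  # placeholder standing in for <baseURL>

print(query_data_url(BASE, "foo", POS="180,0", SIZE="0.2"))
print(query_data_url(BASE, "foo", query="SELECT * FROM a WHERE foo=2"))
```

The second call yields query=SELECT+*+FROM+a+WHERE+foo%3D2, the same encoding as in the example. For a large query, the same encoded string could be sent as a POST body instead, per note [2].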

Example metadata queries

     <baseURL>?REQUEST=queryData&TABLE=schema.tables
     <baseURL>?REQUEST=queryData&TABLE=schema.tables&POS=180,0&SIZE=0.2
     <baseURL>?REQUEST=queryData&TABLE=schema.columns&TableName=foo
     <baseURL>?REQUEST=queryData&TABLE=schema.columns&TableName=foo&FORMAT=xml
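
On the service side, the two metadata tables could be answered from a simple catalog of the data tables. The sketch below is an assumption-laden toy: the catalog contents, the result column names (table_name, n_columns, column_name), and the function names are all invented for illustration, and CSV is used only because it is one of the formats listed earlier.

```python
import csv
import io

# Hypothetical catalog: data-table name -> list of column names.
CATALOG = {
    "foo": ["ra", "dec", "mag"],
    "bar": ["id", "flux"],
}


def metadata_rows(table, table_name=None):
    """Return (header, rows) for schema.tables or schema.columns."""
    if table == "schema.tables":
        # One row per data table.
        return ["table_name", "n_columns"], [
            [name, len(cols)] for name, cols in CATALOG.items()
        ]
    if table == "schema.columns":
        # One row per column of every table, optionally filtered
        # in the manner of the TableName parameter.
        rows = [
            [name, col]
            for name, cols in CATALOG.items()
            for col in cols
            if table_name is None or name == table_name
        ]
        return ["table_name", "column_name"], rows
    raise KeyError(f"not a metadata table: {table}")


def to_csv(header, rows):
    """Serialize a result table as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


print(to_csv(*metadata_rows("schema.columns", table_name="foo")))
```

Because both metadata tables come back as ordinary result tables, the same serialization path serves data and metadata queries alike, which is the point of routing everything through queryData.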

In a query for metadata describing an entire table, we could return table metadata (number of rows, columns, etc.), dataset metadata (dataID, Curation, Char, etc.), access metadata (acref, size, cost, etc.), and so forth.

Received on 2007-04-24Z20:30:14