Re: ADQL WD v1.05

From: Jeff Lusted <jl99-at-star.le.ac.uk>
Date: Tue, 04 Jul 2006 12:25:21 +0100


Hi Yuji!

These are my initial comments on proposed v1.05 document. There are thirteen points in all, detailed below. In a nutshell, however, my gravest concerns are (a) The the ADQL/x schema should be as rich as the ADQL/s BNF description. (b) That the Core specification seems extremely limited (eg: see point 3 in particular). (c) That we have creepitis concerning implementation specifics (my experience has shown this to be a fundamental problem with RDBMS and one that I thought the ADQL xml approach had done a lot to eliminate).

(1) ADQL/x and ADQL/s.

There are now two ways of specifying ADQL: BNF and XML schema. Although you state "ADQL/s and ADQL/x are translatable to each other without loss of information", does this apply to the specifications themselves? This is extremely important to my work on the Astrogrid QueryBuilder and also to work on our server side implementation, both of which depend on the XML schema for ADQL/x. The QueryBuilder automatically adjusts to the schema, for example, to give the possibles comparison operators in a comparison. But this is just one example. The elimination of all the enumerations from the schema (but their inclusion in BNF, by the looks), would mean rewriting parts of the QueryBuilder and relying upon a human reading of the BNF specification. This looks like a step backwards to me.

My suggestion is that both specifications should have the same degree of richness. They should be kept in step. (With my developers hat on, has anyone thought of trying to generate one from the other, ie: ADQL/bnf spec from ADQL/schema, or vice-versa?).

"[T]his document is mainly devoted to describe syntax of the ADQL/s and the way of mapping from ADQL/s to ADQL/x is described briefly." I'm sorry, I couldn't find this mapping anywhere.

(2) Core and Extensions.

The description of Core and Extensions I find somewhat misleading. There appears to be no extension mechanism here. Rather, the specifications represent some idea of core and some idea of full. I believe the ideas of "extension" and "full" are somewhat different. A proper extensions mechanism would surely be more flexible. I think to be consistent, if we go along with this idea, one should talk of Core and Full.

An example:
"Selected data are either column values or the number of selected rows. An expression like a+b is not supported in the core syntax, which is supported as an extension."

(3) Core: From clause.

"Exactly one table SHALL be specified in the FROM clause. A table is specified by a table name followed by an alias name. The table alias name MUST be supplied."
I'm sorry I need clarification here. Are you suggesting only ONE table can be selected from in any one query?! In my opinion this is far too restrictive even for a Core specification.

(4) Core: Comparisons.

"<comparison_op> supported in the Core syntax is basic comparison operators listed in table 2."
Is this implying the Full specification may allow others?

(5) Core: Region Condition.

"Regional condition SHOULD be supported for a table that has a set of columns representing a position in a two dimensional space. Those columns SHOULD have metadata related to their coordinate frame." This implies there must be some way of retrieving metadata on a table in BOTH Core and Full versions.

(6) Core: Region and Frames.

"<frame> is a frame name defined in the STC specification". Will the STC specification be a part of ADQL? In the previous XML schema, it was quoted as an import. I found it of considerable use in driving region queries.

"A search table that supports regional search SHALL accept at least one of the frame names, which SHALL be provided through a metadata query. A list of all the supported frame names SHOULD also be provided through a metadata query."

This again implies metadata queries as a part of any Core implementation.

(7) Core: Functions.

I cannot see a good reason for restriction the Core specification to such a small subset of Aggregate functions (ie: only Count()).

You also state: "A function support is not mandatory in core ADQL specification, however RECOMMENDED to support basic functions list in table 5."
I'm sorry but I find this somewhat strange. The Core specification has no mandatory support, but recommends a range of functions? This is hardly an aid to interoperability.

(8) Full: INTO.

"INTO construct is defined as an extension to specify the VOSpace location where the result is stored. The exact syntax of the VOSpace location is defined in a separate specification.

SELECT g.* INTO VOS:/JHU/gal FROM galaxy g WHERE g.redshift > 3.5"

Ouch! I appreciate the problem here, but I wonder whether quoting an instance is the way forward. Aren't we here mixing up a query, hopefully aimed at the relational model and therefore abstracted from any implementation, with instances of files which are part of an implementation? I can think of a number of disadvantages to doing this:
(a) The file may or may not already exist. Implementations may differ as
to how they handle these situations! This is not explained, nor I think should it be, as the details should not be exposed in this fashion. (b) Sharing a query may get more difficult (we now have personal details embedded).(c) Running a query in more than one context would be problematical (eg: in a workflow, maybe in a loop,), but again the implementation specifics are here the bugbear.

My own comment... I think this reflects another problem: How can we fold variables into a query instance? Think of executing a script. No one wants to edit a complex script each time the script is run. That is one of the things that appears lacking almost completely in the versions of ADQL I've examined. What we need is a marker for things that can vary from one execution to another that abstracts implementation details.

(9) Full: Upload.

"#uploadUPLOAD keyword MAY be used at a FROM clause to represents votables. Using this syntax, table join between internal tables and external votables can be described. A votable name, which is an attribute of a TABLE element, may be followed to distinguish the multiple tables in votables. A syntax to refer to the votable and it example are:

UPLOAD [ <votable_name> ] [ AS ] <alias> FROM galaxy g, UPLOAD name1 vot1,
UPLOAD name2 vot2"

The same comments to my previous ones on INTO apply, so won't repeat them here: I'll take them as understood. However, I'd like to make a further emphasis. The idea of UPLOAD introduces an element of workflow within a query which is out of place. I've nothing against a completely new verb called UPLOAD at the same level as SELECT. But mixing the two makes for an added degree of complexity.

(10) Table Name and Service Identifier.

"Table name qualified by a service identifier MAY be supported to specify a table that belongs to another SkyNode service. A short name of the service MAY be specified, however note that it does not guaranty the uniqueness in the VO.
[( <service_identifier> | <short_name> ) : ] <table_name>
e.g. ivo://jvo/sxds:tableName
e.g. sxds:tableName"

I'm not sure about the differences between service identifier and short name. I've nothing against distributed queries, ie: quoting tables from different archives/databases, but are we here specifying a unique service that controls a table? This means that I could not automatically use elsewhere a query targetted at a specific copy of a database. Again, I'm against mixing relational and implementation specifics.

(11) XPath.

I need some further explanation of why we aim to support XPath expressions. On the face of it, it doesnt appear to be warranted in making astronomical queries. Is this something to do with interrogating the registry?

(12) Extensions.

"Supported extensions SHOULD be provided through a metadata query using extension IDs that are listed in table 1."

The table provides a list of 20 areas where metadata calls can be made to see whether a particular feature is supported by the underlying database.

Some use of metadata querying seems sensible. But I'm surprised we need such a wide range of constructs delimited by making metadata calls. It is obvious this is going to give us interoperability problems. (Perhaps we've thrown that dream away?) I'm also pretty sure an astute use of XSLT and workflow can overcome most of the restrictions. Surfacing implementation specifics in query building has got to be bad news.

(13) Data Types.

I'll probably need to keep studying this section for some time. It is obviously a critical part of the specification. What I want to know is whether the ADQL/x schema will supply this degree of richness, and how?

"Every column SHALL be assigned one of the Core data types, the extension data types or the service specific data types. The basic binary and unary operators shown in the table SHALL be supported."

How do we propose to guarantee this?

Regards
Jeff

---
Jeff Lusted                tel: +44 (0)116 252 5358
AstroGrid Project          mob: +44 (0)7973 492290
Dept Physics & Astronomy   email: jl99-at-star.le.ac.uk
University of Leicester    web: http://www.astrogrid.org
Received on 2006-07-04Z13:17:26