From: Jeff Lusted <jl99-at-star.le.ac.uk>
Subject: Re: ADQL WD v1.05
Date: Tue, 04 Jul 2006 12:25:21 +0100
> Hi Yuji!
>
> These are my initial comments on proposed v1.05 document. There are
> thirteen points in all, detailed below. In a nutshell, however, my
> gravest concerns are (a) The the ADQL/x schema should be as rich as the
> ADQL/s BNF description. (b) That the Core specification seems extremely
> limited (eg: see point 3 in particular). (c) That we have creepitis
> concerning implementation specifics (my experience has shown this to be
> a fundamental problem with RDBMS and one that I thought the ADQL xml
> approach had done a lot to eliminate).
>
> (1) ADQL/x and ADQL/s.
>
> There are now two ways of specifying ADQL: BNF and XML schema. Although
> you state "ADQL/s and ADQL/x are translatable to each other without loss
> of information", does this apply to the specifications themselves?
Yes.
> This
> is extremely important to my work on the Astrogrid QueryBuilder and also
> to work on our server side implementation, both of which depend on the
> XML schema for ADQL/x. The QueryBuilder automatically adjusts to the
> schema, for example, to give the possibles comparison operators in a
> comparison. But this is just one example. The elimination of all the
> enumerations from the schema (but their inclusion in BNF, by the looks),
> would mean rewriting parts of the QueryBuilder and relying upon a human
> reading of the BNF specification. This looks like a step backwards to me.
If we put enumerations to the schema, It will become difficult to introduce new operator without updating the schema. I don't like to change the schema so frequently just for introducing the new operators.
> My suggestion is that both specifications should have the same degree of
> richness. They should be kept in step. (With my developers hat on, has
> anyone thought of trying to generate one from the other, ie: ADQL/bnf
> spec from ADQL/schema, or vice-versa?).
>
> "[T]his document is mainly devoted to describe syntax of the ADQL/s and
> the way of mapping from ADQL/s to ADQL/x is described briefly." I'm
> sorry, I couldn't find this mapping anywhere.
I was trying to write, but there was not enough time. I will write.
> (2) Core and Extensions.
>
> The description of Core and Extensions I find somewhat misleading. There
> appears to be no extension mechanism here. Rather, the specifications
> represent some idea of core and some idea of full. I believe the ideas
> of "extension" and "full" are somewhat different. A proper extensions
> mechanism would surely be more flexible. I think to be consistent, if we
> go along with this idea, one should talk of Core and Full.
I don't understand why you say "no extension mechanis".... The FULL was used before the Victoria meeting, but it was decided to use a word "Extesion" for syntax beyond the Core.
> An example:
> "Selected data are either column values or the number of selected rows.
> An expression like a+b is not supported in the core syntax, which is
> supported as an extension."
> (3) Core: From clause.
> "Exactly one table SHALL be specified in the FROM clause. A table is
> specified by a table name followed by an alias name. The table alias
> name MUST be supplied."
> I'm sorry I need clarification here. Are you suggesting only ONE table
> can be selected from in any one query?! In my opinion this is far too
> restrictive even for a Core specification.
Yes. only one table.
Core spec is defined so that it apply all the enviroment. Some data might be stored on a text file, and query is implemented by just using a script. Some data might be store on a XML db, the main query is made by XPath but it allows an ADQL query. I don't think XML DB support table join thing.
> (4) Core: Comparisons.
>
> "<comparison_op> supported in the Core syntax is basic comparison
> operators listed in table 2."
> Is this implying the Full specification may allow others?
Yes. All the supported functions SHOULD be exposed as metadata.
> (5) Core: Region Condition.
>
> "Regional condition SHOULD be supported for a table that has a set of
> columns representing a position in a two dimensional space. Those
> columns SHOULD have metadata related to their coordinate frame."
> This implies there must be some way of retrieving metadata on a table in
> BOTH Core and Full versions.
Yes.
> (6) Core: Region and Frames.
>
> "<frame> is a frame name defined in the STC specification". Will the STC
> specification be a part of ADQL? In the previous XML schema, it was
> quoted as an import. I found it of considerable use in driving region
> queries.
ADQL schema import STC schema, but only a small part of the STC schema is used.
> "A search table that supports regional search SHALL accept at least one
> of the frame names, which SHALL be provided through a metadata query. A
> list of all the supported frame names SHOULD also be provided through a
> metadata query."
>
> This again implies metadata queries as a part of any Core implementation.
Yes.
> (7) Core: Functions.
>
> I cannot see a good reason for restriction the Core specification to
> such a small subset of Aggregate functions (ie: only Count()).
Count() is very easy to implemnet in any enviroment. If DBMS is used, it is to support all the aggregate function, but it is not assumed.
> You also state: "A function support is not mandatory in core ADQL
> specification, however RECOMMENDED to support basic functions list in
> table 5."
> I'm sorry but I find this somewhat strange. The Core specification has
> no mandatory support, but recommends a range of functions? This is
> hardly an aid to interoperability.
Well, I should have made a function support to an extension. I will move it to a extension, if no strong objection.
> (8) Full: INTO.
> "INTO construct is defined as an extension to specify the VOSpace
> location where the result is stored. The exact syntax of the VOSpace
> location is defined in a separate specification.
>
> SELECT g.* INTO VOS:/JHU/gal FROM galaxy g
> WHERE g.redshift > 3.5"
>
> Ouch! I appreciate the problem here, but I wonder whether quoting an
> instance is the way forward. Aren't we here mixing up a query, hopefully
> aimed at the relational model and therefore abstracted from any
> implementation, with instances of files which are part of an
> implementation?
VOS:/JHU/gal does not represent a file, just a kind of URL. Therefore it is not a part of implementation.
> I can think of a number of disadvantages to doing this:
> (a) The file may or may not already exist. Implementations may differ as
> to how they handle these situations! This is not explained, nor I think
> should it be, as the details should not be exposed in this fashion.
ADQL just definse how to write a query to store the result on a storage. How to handle such situation should be defined on an interface spec.
> (b) Sharing a query may get more difficult (we now have personal details
> embedded).
> (c) Running a query in more than one context would be
> problematical (eg: in a workflow, maybe in a loop,), but again the
> implementation specifics are here the bugbear.
> My own comment... I think this reflects another problem: How can we fold
> variables into a query instance? Think of executing a script. No one
> wants to edit a complex script each time the script is run. That is one
> of the things that appears lacking almost completely in the versions of
> ADQL I've examined. What we need is a marker for things that can vary
> from one execution to another that abstracts implementation details.
You can make a function to generate an ADQL from a set of parameters. Then call the function in the script. Does it not work ?
> (9) Full: Upload.
>
> "#uploadUPLOAD keyword MAY be used at a FROM clause to represents
> votables. Using this syntax, table join between internal tables and
> external votables can be described. A votable name, which is an
> attribute of a TABLE element, may be followed to distinguish the
> multiple tables in votables. A syntax to refer to the votable and it
> example are:
>
> UPLOAD [ <votable_name> ] [ AS ] <alias>
> FROM galaxy g, UPLOAD name1 vot1,
> UPLOAD name2 vot2"
>
> The same comments to my previous ones on INTO apply, so won't repeat
> them here: I'll take them as understood. However, I'd like to make a
> further emphasis. The idea of UPLOAD introduces an element of workflow
> within a query which is out of place. I've nothing against a completely
> new verb called UPLOAD at the same level as SELECT. But mixing the two
> makes for an added degree of complexity.
I don't understand what is the problem.
The query using the UPLOAD keyword is generated by the workflow system during the execution of distributed queries. The initial query may be desribed like:
select t1.a, t2.b
from ivo://jvo/subaru:table1 t1, ivo://jhu/sdss:table2 t2
The query exeucter divides it two queries:
select t1.a
from tabl1 t1
select t1.a, t2.b
from UPLOAD t1, table2 t2
Firust query is send to ivo://jvo/subaru, and the second query is to ivo://jhu/sdss. When sending a second query, VOTable that is a result of the first query is sent together.
What is the problem on your workflow system. JVO system can do the query without any problem.
> (10) Table Name and Service Identifier.
>
> "Table name qualified by a service identifier MAY be supported to
> specify a table that belongs to another SkyNode service. A short name of
> the service MAY be specified, however note that it does not guaranty the
> uniqueness in the VO.
> [( <service_identifier> | <short_name> ) : ]
> <table_name>
> e.g. ivo://jvo/sxds:tableName
> e.g. sxds:tableName"
>
> I'm not sure about the differences between service identifier and short
> name. I've nothing against distributed queries, ie: quoting tables from
> different archives/databases, but are we here specifying a unique
> service that controls a table? This means that I could not automatically
> use elsewhere a query targetted at a specific copy of a database. Again,
> I'm against mixing relational and implementation specifics.
Short name is a alias name to of the data resource. That is a part of resource metadata. Automatic service selection should be made before sending the query. User may specify a target service in an abstract way, ant client application finds a appropriate service from a registry, then create an ADQL for the service.
> (11) XPath.
>
> I need some further explanation of why we aim to support XPath
> expressions. On the face of it, it doesnt appear to be warranted in
> making astronomical queries. Is this something to do with interrogating
> the registry?
It is required in desribing the registry query.
> (12) Extensions.
>
> "Supported extensions SHOULD be provided through a metadata query using
> extension IDs that are listed in table 1."
>
> The table provides a list of 20 areas where metadata calls can be made
> to see whether a particular feature is supported by the underlying database.
>
> Some use of metadata querying seems sensible. But I'm surprised we need
> such a wide range of constructs delimited by making metadata calls. It
> is obvious this is going to give us interoperability problems. (Perhaps
> we've thrown that dream away?) I'm also pretty sure an astute use of
> XSLT and workflow can overcome most of the restrictions. Surfacing
> implementation specifics in query building has got to be bad news.
Interoperabiltiy is made by using Core ADQL spec. Extensions are not mainly for interoperability, rather for allowing each service to implement a richer query syntax appropriate for the service.
> (13) Data Types.
>
> I'll probably need to keep studying this section for some time. It is
> obviously a critical part of the specification. What I want to know is
> whether the ADQL/x schema will supply this degree of richness, and how?
A literal expression of all the non numeric data type is expressed by a "NonNumericLiteral" tag which has an attribute "dataType".
> "Every column SHALL be assigned one of the Core data types, the
> extension data types or the service specific data types. The basic
> binary and unary operators shown in the table SHALL be supported."
>
> How do we propose to guarantee this?
Sorry, I don't catch your last question...
Yuji Shirasaki.
>
> Regards
> Jeff
> ---
> Jeff Lusted tel: +44 (0)116 252 5358
> AstroGrid Project mob: +44 (0)7973 492290
> Dept Physics & Astronomy email: jl99-at-star.le.ac.uk
> University of Leicester web: http://www.astrogrid.org
>
>
Received on 2006-07-04Z18:03:33