RE: ADQL WD v1.05

From: Tony Linde <Tony.Linde-at-leicester.ac.uk>
Date: Tue, 4 Jul 2006 14:08:29 +0100


I'm in the middle of report writing and have not had time to read the full spec - this is only in response to Jeff's comments so I hope they're not inappropriate. I'll get to the spec later this week.

T.

> The description of Core and Extensions I find somewhat
> misleading. There appears to be no extension mechanism here.
> ... one should talk of Core and Full.

I don't agree - Full implies a single final spec. The idea of Extensions was to ensure that different implementations could support extensions which were more specific to the type of data while ensuring that broadcast queries in the Core format were supported by all.

I agree that the Extension mechanism ought to be specified. The spec should include how these extensions are registered and accessed. OR the spec simply leaves out everything about full or extensions and simply delivers the specification for the Core ADQL.

> I'm sorry I need clarification here. Are you suggesting only
> ONE table can be selected from in any one query?! In my
> opinion this is far too restrictive even for a Core specification.

I think I agree. As long as we leave out any sense of getting data from more than one *resource* or dataset then accessing multiple tables within a single database should not be a problem.

It would mean though that any dataset which cannot support queries from more than one table (i.e. a flat-file dataset) would need to register each table as a separate resource. Is this a problem?

> This implies there must be some way of retrieving metadata on
> a table in BOTH Core and Full versions.

All metadata for a table should be in the resource description of that table or the dataset it is contained within. Such a description must either be in the registry, placed there by the resource owner, or retrievable by the standard GetMetadata method (or whatever it is called).

> I'm sorry but I find this somewhat strange. The Core
> specification has no mandatory support, but recommends a
> range of functions? This is hardly an aid to interoperability.

I would agree. As much as possible in the Core should be mandatory.

> (8) Full: INTO.
> "INTO construct is defined as an extension to specify the
> VOSpace location where the result is stored. The exact syntax
> of the VOSpace location is defined in a separate specification.

If it isn't part of the Core spec, it should not be in this document.

> "Table name qualified by a service identifier MAY be
> supported to specify a table that belongs to another SkyNode

This doc should not refer to SkyNode. Is this part of Core ADQL? If not, it should be dropped.

> -----Original Message-----
> From: owner-voql-at-eso.org [mailto:owner-voql-at-eso.org] On
> Behalf Of Jeff Lusted
> Sent: 04 July 2006 12:25
> To: Yuji SHIRASAKI
> Cc: voql-at-ivoa.net; AstroGrid Engineers
> Subject: Re: ADQL WD v1.05
>
> Hi Yuji!
>
> These are my initial comments on proposed v1.05 document.
> There are thirteen points in all, detailed below. In a
> nutshell, however, my gravest concerns are (a) The the ADQL/x
> schema should be as rich as the ADQL/s BNF description. (b)
> That the Core specification seems extremely limited (eg: see
> point 3 in particular). (c) That we have creepitis concerning
> implementation specifics (my experience has shown this to be
> a fundamental problem with RDBMS and one that I thought the
> ADQL xml approach had done a lot to eliminate).
>
> (1) ADQL/x and ADQL/s.
>
> There are now two ways of specifying ADQL: BNF and XML
> schema. Although you state "ADQL/s and ADQL/x are
> translatable to each other without loss of information", does
> this apply to the specifications themselves? This is
> extremely important to my work on the Astrogrid QueryBuilder
> and also to work on our server side implementation, both of
> which depend on the XML schema for ADQL/x. The QueryBuilder
> automatically adjusts to the schema, for example, to give the
> possibles comparison operators in a comparison. But this is
> just one example. The elimination of all the enumerations
> from the schema (but their inclusion in BNF, by the looks),
> would mean rewriting parts of the QueryBuilder and relying
> upon a human reading of the BNF specification. This looks
> like a step backwards to me.
>
> My suggestion is that both specifications should have the
> same degree of richness. They should be kept in step. (With
> my developers hat on, has anyone thought of trying to
> generate one from the other, ie: ADQL/bnf spec from
> ADQL/schema, or vice-versa?).
>
> "[T]his document is mainly devoted to describe syntax of the
> ADQL/s and the way of mapping from ADQL/s to ADQL/x is
> described briefly." I'm sorry, I couldn't find this mapping anywhere.
>
> (2) Core and Extensions.
>
> The description of Core and Extensions I find somewhat
> misleading. There appears to be no extension mechanism here.
> Rather, the specifications represent some idea of core and
> some idea of full. I believe the ideas of "extension" and
> "full" are somewhat different. A proper extensions mechanism
> would surely be more flexible. I think to be consistent, if
> we go along with this idea, one should talk of Core and Full.
>
> An example:
> "Selected data are either column values or the number of
> selected rows.
> An expression like a+b is not supported in the core syntax,
> which is supported as an extension."
>
> (3) Core: From clause.
> "Exactly one table SHALL be specified in the FROM clause. A
> table is specified by a table name followed by an alias name.
> The table alias name MUST be supplied."
> I'm sorry I need clarification here. Are you suggesting only
> ONE table can be selected from in any one query?! In my
> opinion this is far too restrictive even for a Core specification.
>
> (4) Core: Comparisons.
>
> "<comparison_op> supported in the Core syntax is basic
> comparison operators listed in table 2."
> Is this implying the Full specification may allow others?
>
> (5) Core: Region Condition.
>
> "Regional condition SHOULD be supported for a table that has
> a set of columns representing a position in a two dimensional
> space. Those columns SHOULD have metadata related to their
> coordinate frame."
> This implies there must be some way of retrieving metadata on
> a table in BOTH Core and Full versions.
>
> (6) Core: Region and Frames.
>
> "<frame> is a frame name defined in the STC specification".
> Will the STC specification be a part of ADQL? In the previous
> XML schema, it was quoted as an import. I found it of
> considerable use in driving region queries.
>
> "A search table that supports regional search SHALL accept at
> least one of the frame names, which SHALL be provided through
> a metadata query. A list of all the supported frame names
> SHOULD also be provided through a metadata query."
>
> This again implies metadata queries as a part of any Core
> implementation.
>
> (7) Core: Functions.
>
> I cannot see a good reason for restriction the Core
> specification to such a small subset of Aggregate functions
> (ie: only Count()).
>
> You also state: "A function support is not mandatory in core
> ADQL specification, however RECOMMENDED to support basic
> functions list in table 5."
> I'm sorry but I find this somewhat strange. The Core
> specification has no mandatory support, but recommends a
> range of functions? This is hardly an aid to interoperability.
>
> (8) Full: INTO.
> "INTO construct is defined as an extension to specify the
> VOSpace location where the result is stored. The exact syntax
> of the VOSpace location is defined in a separate specification.
>
> SELECT g.* INTO VOS:/JHU/gal FROM galaxy g WHERE g.redshift > 3.5"
>
> Ouch! I appreciate the problem here, but I wonder whether
> quoting an instance is the way forward. Aren't we here mixing
> up a query, hopefully aimed at the relational model and
> therefore abstracted from any implementation, with instances
> of files which are part of an implementation? I can think of
> a number of disadvantages to doing this:
> (a) The file may or may not already exist. Implementations
> may differ as to how they handle these situations! This is
> not explained, nor I think should it be, as the details
> should not be exposed in this fashion. (b) Sharing a query
> may get more difficult (we now have personal details
> embedded).(c) Running a query in more than one context would
> be problematical (eg: in a workflow, maybe in a loop,), but
> again the implementation specifics are here the bugbear.
>
> My own comment... I think this reflects another problem: How
> can we fold variables into a query instance? Think of
> executing a script. No one wants to edit a complex script
> each time the script is run. That is one of the things that
> appears lacking almost completely in the versions of ADQL
> I've examined. What we need is a marker for things that can
> vary from one execution to another that abstracts
> implementation details.
>
> (9) Full: Upload.
>
> "#uploadUPLOAD keyword MAY be used at a FROM clause to
> represents votables. Using this syntax, table join between
> internal tables and external votables can be described. A
> votable name, which is an attribute of a TABLE element, may
> be followed to distinguish the multiple tables in votables. A
> syntax to refer to the votable and it example are:
>
> UPLOAD [ <votable_name> ] [ AS ] <alias> FROM galaxy g,
> UPLOAD name1 vot1, UPLOAD name2 vot2"
>
> The same comments to my previous ones on INTO apply, so won't
> repeat them here: I'll take them as understood. However, I'd
> like to make a further emphasis. The idea of UPLOAD
> introduces an element of workflow within a query which is out
> of place. I've nothing against a completely new verb called
> UPLOAD at the same level as SELECT. But mixing the two makes
> for an added degree of complexity.
>
> (10) Table Name and Service Identifier.
>
> "Table name qualified by a service identifier MAY be
> supported to specify a table that belongs to another SkyNode
> service. A short name of the service MAY be specified,
> however note that it does not guaranty the uniqueness in the VO.
> [( <service_identifier> | <short_name> ) : ] <table_name>
> e.g. ivo://jvo/sxds:tableName e.g. sxds:tableName"
>
> I'm not sure about the differences between service identifier
> and short name. I've nothing against distributed queries, ie:
> quoting tables from different archives/databases, but are we
> here specifying a unique service that controls a table? This
> means that I could not automatically use elsewhere a query
> targetted at a specific copy of a database. Again, I'm
> against mixing relational and implementation specifics.
>
>
> (11) XPath.
>
> I need some further explanation of why we aim to support
> XPath expressions. On the face of it, it doesnt appear to be
> warranted in making astronomical queries. Is this something
> to do with interrogating the registry?
>
> (12) Extensions.
>
> "Supported extensions SHOULD be provided through a metadata
> query using extension IDs that are listed in table 1."
>
> The table provides a list of 20 areas where metadata calls
> can be made to see whether a particular feature is supported
> by the underlying database.
>
> Some use of metadata querying seems sensible. But I'm
> surprised we need such a wide range of constructs delimited
> by making metadata calls. It is obvious this is going to give
> us interoperability problems. (Perhaps we've thrown that
> dream away?) I'm also pretty sure an astute use of XSLT and
> workflow can overcome most of the restrictions. Surfacing
> implementation specifics in query building has got to be bad news.
>
> (13) Data Types.
>
> I'll probably need to keep studying this section for some
> time. It is obviously a critical part of the specification.
> What I want to know is whether the ADQL/x schema will supply
> this degree of richness, and how?
>
> "Every column SHALL be assigned one of the Core data types,
> the extension data types or the service specific data types.
> The basic binary and unary operators shown in the table SHALL
> be supported."
>
> How do we propose to guarantee this?
>
>
> Regards
> Jeff
> ---
> Jeff Lusted tel: +44 (0)116 252 5358
> AstroGrid Project mob: +44 (0)7973 492290
> Dept Physics & Astronomy email: jl99-at-star.le.ac.uk
> University of Leicester web: http://www.astrogrid.org
>
>
Received on 2006-07-04Z15:09:04