RE: Latest ADQL BNF

From: Jeff Lusted <jl99-at-star.le.ac.uk>
Date: Wed, 05 Dec 2007 10:11:28 +0000


Hi Colleagues!

I'm fairly fortunate in that my work on ADQL has been done against a background of developing a parser, so I get to see the intricacies of the BNF as used in SQL92, which is occasionally not straightforward. I also have the technology right at hand to engineer an ADQL solution (and to test out its weaknesses), at least at the level of syntax and, to some extent, semantics.

I've come to the task of ADQL with the supposition that the language would be based upon SQL92 (at the moment a subset) with astronomical extensions. My colleagues within Astrogrid, I'm sure, expect to see something astronomical in the language, and I would be disappointed if it remained just a subset of SQL92; in that case it might be difficult to justify the appellation ADQL. Region seems to be the most significant astronomical construct that we have considered.

Regarding region, my personal preference is for the more abstract operator notation that Pat describes. But an examination of Inaki's draft of 14th November shows that it also accommodates region as a type and region as a user-defined function. I don't see why we should not stretch ourselves to cover all of these possibilities, though it is more work.

As the developer of a parser, I look upon all the constructs within the BNF as requiring translation, even the so-called user-defined functions. As far as I am concerned, ADQL is not a straight pass-through to SQL. For Astrogrid at least, there is a stage where a parser is operating, and within the data center implementation we attempt to emit the SQL dialect that is acceptable to the local RDBMS.

I see this as essential, as I do not think that the subset of SQL92 we have selected will suffice for too long. I can think, for instance, of bit manipulation, which we have not so far accommodated and which seems to be addressed syntactically somewhat differently by each SQL manufacturer. And that is not the only area. I think we need an approach which does not rule out extending beyond a subset of SQL92, and which does not see the language as a straight pass-through to the underlying RDBMS.
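
To illustrate the bit manipulation point, here is a minimal sketch of how a translation stage might emit a bitwise AND for different back ends. It assumes the parser has already reduced the two operands to SQL text; the function name emit_bit_and and the dialect keys are hypothetical and not part of any ADQL draft. The only facts relied upon are that Oracle spells the operation as the BITAND function, while PostgreSQL, MySQL and SQL Server accept the infix & operator.

```python
# Dialects that accept the infix "&" operator for bitwise AND.
# The dialect keys are hypothetical labels chosen for this sketch.
_INFIX_DIALECTS = {"postgresql", "mysql", "sqlserver"}


def emit_bit_and(dialect: str, left: str, right: str) -> str:
    """Return SQL text for a bitwise AND of two already-translated operands."""
    key = dialect.lower()
    if key == "oracle":
        # Oracle provides a function rather than an operator.
        return f"BITAND({left}, {right})"
    if key in _INFIX_DIALECTS:
        return f"({left} & {right})"
    raise ValueError(f"no bitwise AND mapping known for dialect {dialect!r}")


# Example use; the column name "flags" is made up for illustration.
print(emit_bit_and("oracle", "flags", "4"))
print(emit_bit_and("postgresql", "flags", "4"))
```

Whatever surface syntax ADQL might eventually choose for such an operation, a per-dialect table of this kind is only possible if there is a parsing and translation step in between.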

Admittedly, this is more work, but with some effort it does free us from the constraints of choosing a subset of SQL which is the lowest common denominator. I don't see this as a step backwards. When seen from this viewpoint, treating Region as an explicit construct requiring parsing and translation (into some underlying implementation-specific details) is a natural extension of developing a parser.
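
As a rough sketch of what "parsing and translation" of a region construct could look like, the following parses a CIRCLE expression in the form Pat gives in the quoted message below and emits a plain SQL predicate for "circle CONTAINS position". Several things here are assumptions made only for illustration: the names Circle, parse_circle and emit_contains are hypothetical; the position is assumed to be stored as two columns in degrees; only a quoted literal coordinate system and simple arguments are handled; and the target RDBMS is assumed to provide RADIANS, DEGREES, ACOS, SIN and COS. The predicate uses the spherical law of cosines, which is one possible mapping and not necessarily what any data center would actually deploy.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Circle:
    """One parsed CIRCLE(<coordsys>, <longitude>, <latitude>, <radius>)."""
    coordsys: str
    lon: str
    lat: str
    radius: str


# One argument: a number or column reference without commas, quotes or parentheses.
_ARG = r"\s*([^,()']+?)\s*"
_CIRCLE = re.compile(
    r"^\s*CIRCLE\s*\(\s*'([^']*)'\s*," + _ARG + "," + _ARG + "," + _ARG + r"\)\s*$",
    re.IGNORECASE,
)


def parse_circle(text: str) -> Circle:
    """Parse a CIRCLE expression with a literal coordinate system."""
    match = _CIRCLE.match(text)
    if match is None:
        raise ValueError(f"not a CIRCLE expression this sketch understands: {text!r}")
    return Circle(*match.groups())


def emit_contains(circle: Circle, lon_col: str, lat_col: str) -> str:
    """Emit SQL that is true when (lon_col, lat_col) lies within the circle.

    All angles are assumed to be in degrees; the coordinate system is
    carried along but not converted in this sketch.
    """
    return (
        "DEGREES(ACOS("
        f"SIN(RADIANS({lat_col})) * SIN(RADIANS({circle.lat})) + "
        f"COS(RADIANS({lat_col})) * COS(RADIANS({circle.lat})) * "
        f"COS(RADIANS(({lon_col}) - ({circle.lon})))"
        f")) <= {circle.radius}"
    )


# Example use; sources.ra and sources.dec are assumed column names.
circle = parse_circle("CIRCLE('ICRS', 12, 34, 0.5)")
print("WHERE " + emit_contains(circle, "sources.ra", "sources.dec"))
```

A site with a native spatial index would replace emit_contains with a call into its own implementation; the point is only that the ADQL text and the emitted SQL need not resemble each other.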

Regards
Jeff

On Mon, 2007-12-03 at 15:14 -0500, Alex Szalay wrote:
> I respectfully disagree. I think what I have shown is that we need to do a few
> MINOR changes to get it fully compliant with SQL-92. It does allow a trivial
> implementation, and I did build one.
>
> What you suggest here is to go back to something where we need a totally
> ADQL parser that has nothing to do with SQL syntactically. I really think
> this would be a major step backwards, where we were 6 months ago.
>
> --Alex
>
> -----Original Message-----
> From: owner-voql-teg-at-eso.org [mailto:owner-voql-teg-at-eso.org] On Behalf Of
> Patrick Dowler
> Sent: Monday, December 03, 2007 3:10 PM
> To: voql-teg-at-ivoa.net
> Subject: Re: Latest ADQL BNF
>
>
> Well, it has been a little quiet but I have been working and thinking and
> going over my notes from the Cambridge Interop... I have also drafted most of
> the English text to describe region in the ADQL document.
>
> In Cambridge Alex showed that it is not so simple to specify language
> constructs that map directly to a function in the target database, especially
> with the variable number of arguments for polygon. Thus, it appears that
> implementors of services that understand ADQL will have to perform some
> translation of ADQL for their implementation. In ADQL we should try to make
> the language expressive and unambiguous so this is as easy as possible.
>
> Originally, we weighed the more abstract operator notation for region-related
> constructs against the more pragmatic functional notation and were persuaded
> to adopt the functional notation because it would allow for the trivial
> implementation. But, as Alex has shown, this is not really the case and my
> feeling now is that there is no good reason left to adopt a functional
> notation for region. With that in mind, I feel we would be better served by
> the operator notation since it is more abstract and clear, expressed as
> follows.
>
> This first part also incorporates the concept of a point/position as I recall
> general consensus in Cambridge that it was desirable to complete the picture.
>
> <point> ::= <longitude> <comma> <latitude>
>
> <position_expression> ::= POSITION
> <left_paren> <coordsys> <comma> <point> <right_paren>
>
> eg POSITION('ICRS', 12, 34)
>
> <circle_expression> ::= CIRCLE
> <left_paren>
> <coordsys> <comma> <point> <comma> <radius>
> <right_paren>
>
> eg CIRCLE('ICRS', 12, 34, 5)
>
> <polygon_expression> ::= POLYGON
> <left_paren> <coordsys> <comma>
> <point> <comma> <point> <comma> <point>
> [ <comma> <point> ...]
> <right_paren>
>
> eg POLYGON('ICRS',
>
> So polygon has 3+ vertices, all in the same coordinate system. Rectangle was a
> specific case of polygon with issues, so I propose we drop it entirely. The
> <coordsys> construct is just a string, literal or column reference (not
> defined here). I opted to keep circle simple rather than use position,radius;
> for longitude, latitude, and radius it really should be any numeric value
> (expression or column) as there is no need to introduce extra symbols.
>
> As in Inaki's email dated Nov 14, we have symbols like:
>
> <region> ::= <region_expression> | <region_value>
> <region_expression> ::= <circle> | <polygon>
> <region_value> ::= <column_reference> | <user_defined_function>
>
> and the same for <position> (at least) since a position is not a region. Now
> on to the operators:
>
> <contains> ::= <region> CONTAINS <position>
>
> <intersects> ::= <region> INTERSECTS <region>
>
> If it is not clear, either of the arguments to the operator can come from
> database column(s) and either can be constants (expressions) as described
> above.
>
> Due to the issues with standardising UDFs, I am not sure what we should do
> about the DISTANCE function. We could specify it thus:
>
> <distance> ::= DISTANCE
> <left_paren>
> <position> <comma> <position>
> <right_paren>
>
>
> and implementors would have to map that to however they compute such things
> (or not if they do not implement it). However, use outside the select clause
> is very non-scalable (performance-wise) and I for one would be hesitant to
> implement this myself. I would prefer to just leave it out and hopefully a
> separate document can be developed to standardise UDFs within the community.
>
>
> That is basically it: I propose we go back to the more abstract and clear
> operator notation for ADQL 1.0 predicates.
>
> Pat
>
> Examples for the WHERE clause, using the latter circle definition:
>
> WHERE observations.shape CONTAINS POSITION('ICRS', 123, 45)
>
> WHERE observations.shape CONTAINS sources.position
>
> WHERE CIRCLE('ICRS', 12, 34, 0.5) CONTAINS sources.position
>
> WHERE CIRCLE('ICRS', 123, 45, 0.5) INTERSECTS observations.shape
>
> WHERE CIRCLE(sources.csys, sources.ra, sources.dec, sources.err) CONTAINS
> POSITION('ICRS', 12, 34)
>
>

-- 
Jeff Lusted               tel: +44 (0)116 252 3581
Astrogrid Project         mob: +44 (0)7973 492290
Dept Physics & Astronomy  email: jl99-at-star.le.ac.uk
University of Leicester   web: http://www.astrogrid.org
Received on 2007-12-05Z11:51:54