RE: Latest ADQL BNF

From: Alex Szalay <szalay-at-jhu.edu>
Date: Mon, 3 Dec 2007 15:14:18 -0500


I respectfully disagree. I think what I have shown is that we need to make a few MINOR changes to get it fully compliant with SQL-92. It does allow a trivial implementation, and I did build one.

What you suggest here is to go back to something where we need a totally separate ADQL parser that has nothing to do with SQL syntactically. I really think this would be a major step backwards, to where we were 6 months ago.

--Alex

-----Original Message-----
From: owner-voql-teg-at-eso.org [mailto:owner-voql-teg-at-eso.org] On Behalf Of Patrick Dowler
Sent: Monday, December 03, 2007 3:10 PM
To: voql-teg-at-ivoa.net
Subject: Re: Latest ADQL BNF

Well, it has been a little quiet but I have been working and thinking and going over my notes from the Cambridge Interop... I have also drafted most of the English text to describe region in the ADQL document.

In Cambridge Alex showed that it is not so simple to specify language constructs that map directly to a function in the target database, especially with the variable number of arguments for polygon. Thus, it appears that implementors of services that understand ADQL will have to perform some translation of ADQL for their implementation. In ADQL we should try to make the language expressive and unambiguous so this is as easy as possible.

Originally, we weighed the more abstract operator notation for region-related constructs against the more pragmatic functional notation and were persuaded to adopt the functional notation because it would allow for the trivial implementation. But, as Alex has shown, this is not really the case and my feeling now is that there is no good reason left to adopt a functional notation for region. With that in mind, I feel we would be better served by the operator notation since it is more abstract and clear, expressed as follows.

This first part also incorporates the concept of a point/position, as I recall general consensus in Cambridge that it was desirable to complete the picture.

<point> ::= <longitude> <comma> <latitude>

<position_expression> ::= POSITION
	<left_paren> <coordsys> <comma> <point> <right_paren>

eg POSITION('ICRS', 12, 34)

<circle_expression> ::= CIRCLE
	<left_paren>
	<coordsys> <comma> <point> <comma> <radius>
	<right_paren>

eg CIRCLE('ICRS', 12, 34, 5)

<polygon_expression> ::= POLYGON
	<left_paren> <coordsys> <comma>
	<point> <comma> <point> <comma> <point>
	[ <comma> <point> ...]
	<right_paren>

eg POLYGON('ICRS',

So polygon has 3+ vertices, all in the same coordinate system. Rectangle was a specific case of polygon with issues, so I propose we drop it entirely. The <coordsys> construct is just a string, literal or column reference (not defined here). I opted to keep circle simple rather than use position, radius. Longitude, latitude, and radius really should each be any numeric value (expression or column), as there is no need to introduce extra symbols.
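
A minimal sketch of how a client might build these region expressions as text, written in Python purely for illustration. The helper names circle() and polygon() are hypothetical and not part of any proposal; the sketch assumes <coordsys> is given as a string literal (a column reference would need unquoted handling), and the polygon vertices in the usage note are made-up values.

```python
def circle(coordsys, lon, lat, radius):
    """Render a CIRCLE expression: coordsys, then the point, then the radius."""
    return f"CIRCLE('{coordsys}', {lon}, {lat}, {radius})"


def polygon(coordsys, *points):
    """Render a POLYGON expression from (longitude, latitude) pairs.

    The grammar requires at least three points, all in the one
    coordinate system named by coordsys.
    """
    if len(points) < 3:
        raise ValueError("a polygon needs at least 3 vertices")
    for point in points:
        if len(point) != 2:
            raise ValueError("each vertex must be a (longitude, latitude) pair")
    # Flatten the vertices into the comma-separated list the grammar expects.
    body = ", ".join(f"{lon}, {lat}" for lon, lat in points)
    return f"POLYGON('{coordsys}', {body})"
```

For example, circle('ICRS', 12, 34, 5) renders the CIRCLE example given above, and polygon('ICRS', (10, 10), (20, 10), (20, 20)) renders a three-vertex polygon; passing only two vertices raises ValueError.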

As in Inaki's email dated Nov 14, we have symbols like:

<region> ::= <region_expression> | <region_value>
<region_expression> ::= <circle> | <polygon>
<region_value> ::= <column_reference> | <user_defined_function>

and the same for <position> (at least) since a position is not a region. Now
on to the operators:

<contains> ::= <region> CONTAINS <position>

<intersects> ::= <region> INTERSECTS <region>

If it is not clear, either of the arguments to the operator can come from database column(s) and either can be constants (expressions) as described above.
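
To illustrate the kind of translation an implementor would perform, here is a rough Python sketch that splits an operator-notation predicate at its top-level CONTAINS or INTERSECTS keyword and rewrites it as a function call. The backend function names region_contains and region_intersects, and the "= 1" comparison, are assumptions invented for this example; a real service would map to whatever its database provides, and a real translator would use a proper parser rather than this scan.

```python
OPERATORS = ("CONTAINS", "INTERSECTS")

# Hypothetical backend function names, for illustration only.
BACKEND_FUNCTIONS = {
    "CONTAINS": "region_contains",
    "INTERSECTS": "region_intersects",
}


def split_predicate(predicate):
    """Split '<lhs> OP <rhs>' at the operator found outside parentheses and quotes."""
    depth = 0
    in_string = False
    for i, ch in enumerate(predicate):
        if ch == "'":
            in_string = not in_string
        elif in_string:
            continue
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif depth == 0 and ch.isspace():
            # An operator must be a whole word delimited by whitespace.
            for op in OPERATORS:
                end = i + 1 + len(op)
                if (predicate[i + 1:end].upper() == op
                        and end < len(predicate)
                        and predicate[end].isspace()):
                    return predicate[:i].strip(), op, predicate[end:].strip()
    raise ValueError("no top-level CONTAINS or INTERSECTS found")


def to_functional(predicate):
    """Rewrite operator notation as a call to a (hypothetical) backend function."""
    lhs, op, rhs = split_predicate(predicate)
    return f"{BACKEND_FUNCTIONS[op]}({lhs}, {rhs}) = 1"
```

Under these assumptions, to_functional("CIRCLE('ICRS', 12, 34, 0.5) CONTAINS sources.position") gives "region_contains(CIRCLE('ICRS', 12, 34, 0.5), sources.position) = 1"; the arguments themselves would still need translating for the target database.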

Due to the issues with standardising UDFs, I am not sure what we should do about the DISTANCE function. We could specify it thus:

<distance> ::= DISTANCE
	<left_paren>
	<position> <comma> <position>
	<right_paren>


and implementors would have to map that to however they compute such things (or not if they do not implement it). However, use outside the select clause is very non-scalable (performance-wise) and I for one would be hesitant to implement this myself. I would prefer to just leave it out and hopefully a separate document can be developed to standardise UDFs within the community.
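
For reference, one way an implementor might compute such a distance is the haversine formula for great-circle separation. The Python sketch below is only an illustration under stated assumptions: both positions are in the same coordinate system, all angles are in degrees, and the function name angular_distance is made up for this example. Nothing here is part of the proposed grammar.

```python
import math


def angular_distance(lon1, lat1, lon2, lat2):
    """Great-circle separation in degrees between two positions (haversine).

    Assumes both positions share a coordinate system and all inputs are
    in degrees.
    """
    l1, b1, l2, b2 = map(math.radians, (lon1, lat1, lon2, lat2))
    h = (math.sin((b2 - b1) / 2) ** 2
         + math.cos(b1) * math.cos(b2) * math.sin((l2 - l1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))
```

Evaluating this per row in a WHERE clause is exactly the kind of use that does not scale, since it cannot use an index without further work by the implementor.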

That is basically it: I propose we go back to the more abstract and clear operator notation for ADQL 1.0 predicates.

Pat

Examples for the WHERE clause, using the latter circle definition:

WHERE observations.shape CONTAINS POSITION('ICRS', 123, 45)

WHERE observations.shape CONTAINS sources.position

WHERE CIRCLE('ICRS', 12, 34, 0.5) CONTAINS sources.position

WHERE CIRCLE('ICRS', 123, 45, 0.5) INTERSECTS observations.shape

WHERE CIRCLE(sources.csys, sources.ra, sources.dec, sources.err) CONTAINS POSITION('ICRS', 12, 34)

-- 

Patrick Dowler
Tel/Tél: (250) 363-6914                  | fax/télécopieur: (250) 363-0045
Canadian Astronomy Data Centre   | Centre canadien de donnees astronomiques
National Research Council Canada | Conseil national de recherches Canada
Government of Canada                  | Gouvernement du Canada
5071 West Saanich Road               | 5071, chemin West Saanich
Victoria, BC                                  | Victoria (C.-B.)
Received on 2007-12-03Z21:14:53