RE: REGION

From: Alex Szalay <szalay-at-jhu.edu>
Date: Sat, 5 May 2007 03:40:36 -0400


This is a good start, but I think that we need a much clearer focus. Also, after reading this I am still confused about what a REGION datatype is. I will try to keep my comments short.

In a typical spatial framework there are several different spatial datatypes (POINTSET, LINESET, POLYGON). These datatypes are typically not simple; even the description of a point can be quite complex (see STC), not to mention a complex region. Of course these can be serialized into a string, but I would not want to put the coordinates into "ra dec" strings.

Of course, here Pat and Benjamin also want to extend this to even more abstract concepts like time and energy intervals, which none of the GIS systems do, although for intervals I think the BETWEEN clause (or several of them for a more complex interval set) might just do the job.
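A minimal sketch of the "several BETWEEN clauses" idea, assuming the column holds a single value per row (a point on the time or energy axis, not an extended interval); the column name and values are made up, and the numbers are formatted straight into the SQL, so this is not meant for untrusted input:

```python
from typing import List, Tuple

def interval_set_condition(column: str, intervals: List[Tuple[float, float]]) -> str:
    """Build an SQL condition selecting rows whose `column` value lies in any
    of the closed intervals [lo, hi], one BETWEEN clause per interval."""
    if not intervals:
        raise ValueError("need at least one interval")
    clauses = []
    for lo, hi in intervals:
        if lo > hi:
            raise ValueError(f"empty interval: [{lo}, {hi}]")
        clauses.append(f"{column} BETWEEN {lo!r} AND {hi!r}")
    # A single interval needs no parentheses; several are OR-ed together.
    if len(clauses) == 1:
        return clauses[0]
    return "(" + " OR ".join(clauses) + ")"
```

For example, `interval_set_condition("mjd", [(1.0, 2.0), (5.0, 6.5)])` gives `(mjd BETWEEN 1.0 AND 2.0 OR mjd BETWEEN 5.0 AND 6.5)`. If the column itself stores an interval, BETWEEN alone is no longer enough and an overlap test is needed instead.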

One can then define various RELATIONS and various OPERATIONS between them. The relations (CONTAINS, TOUCHES, DISJOINT, INTERSECT, ...) can be understood as an enumerated return value from an operation between two different spatial objects.
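To make "an enumerated return value" concrete, here is a sketch for the simplest case, two closed 1-D intervals. WITHIN (the converse of CONTAINS) is an addition of mine so that the result is never order-dependent; the classification order is one reasonable choice, not a standard:

```python
from enum import Enum
from typing import Tuple

class Relation(Enum):
    DISJOINT = "DISJOINT"
    TOUCHES = "TOUCHES"      # share only a boundary point
    CONTAINS = "CONTAINS"    # first interval contains the second
    WITHIN = "WITHIN"        # first interval lies inside the second (added here)
    INTERSECT = "INTERSECT"  # partial overlap

def relate(a: Tuple[float, float], b: Tuple[float, float]) -> Relation:
    """Classify the relation between closed intervals a = [lo, hi] and b."""
    a_lo, a_hi = a
    b_lo, b_hi = b
    if a_hi < b_lo or b_hi < a_lo:
        return Relation.DISJOINT
    if a_hi == b_lo or b_hi == a_lo:
        return Relation.TOUCHES
    if a_lo <= b_lo and b_hi <= a_hi:
        return Relation.CONTAINS
    if b_lo <= a_lo and a_hi <= b_hi:
        return Relation.WITHIN
    return Relation.INTERSECT
```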

One can also have OPERATIONS among spatial objects (INTERSECTION, UNION, DIFFERENCE), which form a Boolean algebra, with some restrictions. These return another spatial object.
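The operations are easiest to see on 1-D interval sets such as the time and energy intervals mentioned above. A sketch, assuming sets are stored as sorted lists of half-open intervals [lo, hi); that convention is my choice here, picked because the difference of two such sets is again a set of the same kind (with closed intervals the boundaries would become open):

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # half-open [lo, hi)

def normalize(ivs: List[Interval]) -> List[Interval]:
    """Sort, drop empty intervals, and merge overlapping or adjacent ones."""
    out: List[Interval] = []
    for lo, hi in sorted(iv for iv in ivs if iv[0] < iv[1]):
        if out and lo <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], hi))
        else:
            out.append((lo, hi))
    return out

def union(a: List[Interval], b: List[Interval]) -> List[Interval]:
    return normalize(a + b)

def intersection(a: List[Interval], b: List[Interval]) -> List[Interval]:
    a, b = normalize(a), normalize(b)
    out: List[Interval] = []
    i = j = 0
    # Two-pointer sweep over both sorted lists.
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

def difference(a: List[Interval], b: List[Interval]) -> List[Interval]:
    a, b = normalize(a), normalize(b)
    out: List[Interval] = []
    for lo, hi in a:
        cur = lo  # start of the part of [lo, hi) not yet removed
        for b_lo, b_hi in b:
            if b_hi <= cur or b_lo >= hi:
                continue  # this piece of b does not touch what is left
            if b_lo > cur:
                out.append((cur, b_lo))
            cur = max(cur, b_hi)
            if cur >= hi:
                break
        if cur < hi:
            out.append((cur, hi))
    return out
```

Each function returns another normalized interval list, which is the 1-D version of "these return another spatial object".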

If we restrict ourselves to only POINTSETS (our catalogs) and POLYGONS (say, REGIONS), there are still many different things we might want to do. These are all questions that SDSS users have been asking of the database as part of their research:

(1) Give me all the POINTS within a REGION from a certain set of tables
(2) Give me all the POINTS which are within 10 arcsec of a REGION (errors)
(3) Tell me if this POINT is within this REGION
(4) Which REGIONS in the database contain this POINT (is it in the photo footprint but not in the spectro, for example)
(5) What is the distance of this point to the boundary
(6) What percent of this point's 30" neighborhood is inside the survey footprint
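A rough sketch of queries (1) to (5), assuming a small region projected onto a flat tangent plane so that planar geometry is good enough; real survey footprints would need spherical polygons, and query (6) is left out. All function names are hypothetical:

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # planar (x, y), e.g. tangent-plane degrees

def point_in_polygon(p: Point, poly: Sequence[Point]) -> bool:
    """Query (3): even-odd ray casting; points exactly on an edge are ambiguous."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does a horizontal ray going right from p cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def _dist_to_segment(p: Point, a: Point, b: Point) -> float:
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    len2 = dx * dx + dy * dy
    # Parameter of the closest point on the segment, clamped to [0, 1].
    t = 0.0 if len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def distance_to_boundary(p: Point, poly: Sequence[Point]) -> float:
    """Query (5): distance from p to the nearest polygon edge."""
    n = len(poly)
    return min(_dist_to_segment(p, poly[i], poly[(i + 1) % n]) for i in range(n))

def points_in_region(points: List[Point], poly: Sequence[Point]) -> List[Point]:
    """Query (1): all points inside the region."""
    return [p for p in points if point_in_polygon(p, poly)]

def points_near_region(points: List[Point], poly: Sequence[Point], tol: float) -> List[Point]:
    """Query (2): points inside, or within tol of the boundary (same units)."""
    return [p for p in points
            if point_in_polygon(p, poly) or distance_to_boundary(p, poly) <= tol]

def regions_containing(p: Point, regions: Dict[str, Sequence[Point]]) -> List[str]:
    """Query (4): names of stored regions that contain p."""
    return [name for name, poly in regions.items() if point_in_polygon(p, poly)]
```

For query (2) with coordinates in degrees, `tol` would be `10 / 3600`. Note that (4) is the inverse lookup: the regions, not the points, come from the database.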

One can also think of storing REGION (POLYGON) data in the database, and performing operations on those plus the incoming user-defined regions. This is a very complex task, and to do it efficiently one typically needs a binary representation inside the DB, i.e. an object-oriented or an object-relational DB. I do not want to go there, since my one page is up.

I think this is a very hard problem and requires further discussion.

--Alex

-----Original Message-----
From: owner-voql-teg-at-eso.org [mailto:owner-voql-teg-at-eso.org] On Behalf Of Patrick Dowler
Sent: Friday, May 04, 2007 5:33 PM
To: VOQL-TEG
Subject: REGION

note: I had to violate my one-screen email limit on this one, but it is a "report" :)

Benjamin and I exchanged a few emails off-line about region, and came up with this preferred format for expressing a condition:

   something OVERLAPS REGION("...")

where something is a column name or alias from the table, OVERLAPS is an operator, and REGION("...") is thus a literal value. REGION is a reserved word used to form literals (above) and to declare the type of "something". That is, a TAP service would say that there is a column of type REGION and that tells the user exactly how to formulate the condition.
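A small sketch of how a service might recognize this form and use its declared column types to reject misuse. The column metadata dictionary is hypothetical, and the literal text is passed through unparsed since its content would be defined by STC; the regular expression does not allow a double quote inside the literal:

```python
import re
from typing import Dict, Tuple

# Hypothetical column metadata a TAP service might publish: name -> declared type.
COLUMNS: Dict[str, str] = {
    "position": "REGION",
    "bounds": "REGION",
    "obs_id": "VARCHAR",
}

# column OVERLAPS REGION("...") ; keywords matched case-insensitively as in SQL.
_CONDITION = re.compile(r'^\s*(\w+)\s+OVERLAPS\s+REGION\("([^"]*)"\)\s*$', re.IGNORECASE)

def parse_overlaps(condition: str, columns: Dict[str, str]) -> Tuple[str, str]:
    """Split 'col OVERLAPS REGION("...")' into (column, literal text)."""
    m = _CONDITION.match(condition)
    if m is None:
        raise ValueError(f"not an OVERLAPS condition: {condition!r}")
    column, literal = m.group(1), m.group(2)
    # The declared type is what tells the user (and the service) the condition is legal.
    if columns.get(column) != "REGION":
        raise ValueError(f"column {column!r} is not declared as type REGION")
    return column, literal
```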

We considered other reserved words for the operator (INTERSECT, IN) but discounted IN because it implies complete inclusion, which we thought is not the general meaning when both the column and the literal are extended regions (rather than points). INTERSECT in SQL is used to mean "set intersection" (if I recall), so this would not be so bad if you think of a region as the "set of all points" within a boundary. Using INTERSECT would mean overloading the meaning (i.e. it means something special if the arguments are regions). We nominally adopted OVERLAPS (the term does at least appear in the SIA 1.0 document). In geometry, I think INTERSECT is the general term one would use, and it has all the correct implications whether you are talking about points, lines/segments, curves, or arbitrary shapes. We also looked at, but rejected, the PostgreSQL overlaps operator && as being obtuse.
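The IN versus OVERLAPS distinction can be shown with closed 1-D intervals; a sketch with made-up function names:

```python
from typing import Tuple

Interval = Tuple[float, float]  # closed interval [lo, hi]

def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

def is_inside(a: Interval, b: Interval) -> bool:
    """True if a lies completely inside b (the reading IN suggests)."""
    return b[0] <= a[0] and a[1] <= b[1]
```

An exposure spanning (100, 200) overlaps a query window (150, 300) but is not inside it, so the two operators give different answers for extended regions. For a point, i.e. the degenerate interval (160, 160), both return True, which is why the difference only shows up once both sides are extended.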

Since I prefer the verb form with the trailing S, OVERLAPS seems slightly better than INTERSECT. Some other reserved word might be better, but OVERLAPS is suitably general (it also appears in the SIA 1.0 doc and means the same thing there as here).

As for STC, it is just the (one?) way to specify the REGION literal. That is, STC says what to put in the string "...".

REGION is a datatype and literals are REGION("...") where ... is specified by STC. We add an operator OVERLAPS that is used between two REGIONs (typically a column of type REGION and a literal). It should work for columns of energy and time or whatever else is in STC. A TAP service declares (logical) columns of type REGION to say exactly where/how the OVERLAPS operator can be used with no ambiguity.

You can have multiple REGION columns in a table (in theory) and there is no need to say that 2 or more columns go together (e.g. ra and dec): you just have a column like "position" of type REGION. In an observation catalog you could in principle have columns like "bounds" and "center" and "target_position", all of type REGION and all with different values.

A TAP service could in principle have columns of type REGION (for output) and yet not support the OVERLAPS operator. I think it is good to decouple these, since all DBs can store REGION values but not all can do decent spatial querying. It is up to the TAP service to decide.
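A sketch of that decoupling as two separate capability checks; the `TapService` descriptor and both function names are hypothetical:

```python
from dataclasses import dataclass
from typing import FrozenSet

@dataclass(frozen=True)
class TapService:
    """Hypothetical description of what a TAP service offers for REGION columns."""
    region_columns: FrozenSet[str]  # columns declared with type REGION
    supports_overlaps: bool         # can the backend evaluate OVERLAPS at all?

def can_select(svc: TapService, column: str) -> bool:
    # Any declared REGION column can be returned in query output.
    return column in svc.region_columns

def can_filter(svc: TapService, column: str) -> bool:
    # OVERLAPS needs both a REGION column and backend support for spatial querying.
    return column in svc.region_columns and svc.supports_overlaps
```

A service built on a plain relational DB would set `supports_overlaps=False` and still return REGION values; one with spatial indexing would set it to True.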

I realised (but didn't express to Benjamin, so he hasn't seen this) that this actually works as is for the energy and time axes that STC also covers. You can declare a column named energy (for example) of type REGION, and then use STC to write the literal (interval or single value) and thus form a condition that is valid. Thus, one should be able to use

   energy OVERLAPS REGION("<serialised STC energy region>")

as well. The column metadata (utype) would indicate what kind of literal (which STC coordinate axis) to use.
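A sketch of how the column metadata could steer evaluation on a 1-D axis. The axis labels stand in for real utypes, and the literal syntax ("value" or "lo hi") is made up for illustration; it is not STC:

```python
from typing import Dict, Tuple

# Hypothetical column metadata: column name -> coordinate axis it holds.
# A real service would convey this through the column utype.
COLUMN_AXIS: Dict[str, str] = {"energy": "spectral", "t_obs": "time"}

def parse_1d_literal(text: str) -> Tuple[float, float]:
    """Parse a made-up 1-D literal, 'value' or 'lo hi', into a closed interval."""
    parts = [float(tok) for tok in text.split()]
    if len(parts) == 1:
        return parts[0], parts[0]  # a single value is a degenerate interval
    if len(parts) == 2 and parts[0] <= parts[1]:
        return parts[0], parts[1]
    raise ValueError(f"bad 1-D literal: {text!r}")

def overlaps_1d(column: str, stored: Tuple[float, float], literal: str) -> bool:
    """Evaluate `column OVERLAPS REGION(literal)` for one row's stored interval."""
    if COLUMN_AXIS.get(column) not in ("spectral", "time"):
        raise ValueError(f"{column!r} is not a 1-D REGION column")
    lo, hi = parse_1d_literal(literal)
    return stored[0] <= hi and lo <= stored[1]
```

The same OVERLAPS test covers both literal forms, because a single value is treated as an interval of zero width.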

Alex mentioned a few things, 3 of which fit in fine; the 4th -- expressing unions and intersections and such -- we thought was maybe too much for the query language, but it could be discussed.

-- 

Patrick Dowler
Tel/Tél: (250) 363-6914                  | fax/télécopieur: (250) 363-0045
Canadian Astronomy Data Centre   | Centre canadien de données astronomiques
National Research Council Canada | Conseil national de recherches Canada
Government of Canada                  | Gouvernement du Canada
5071 West Saanich Road               | 5071, chemin West Saanich
Victoria, BC                                  | Victoria (C.-B.)
Received on 2007-05-05Z09:40:54