RE: REGION

From: Alex Szalay <szalay-at-jhu.edu>
Date: Tue, 8 May 2007 09:25:47 -0400


I have an other concern about how we set out to do REGION. In the WHERE clause of SQL every item is a Boolean expression which evaluates to true or false. These have the form of value OPERATOR (value|set), the latter for the case of IN. In any case, the left hand side of a where clause is always a scalar value evaluated from the next tuple. In our current REGION syntax we have a list of columns on the left hand side, not a scalar, quite different from anything else in the SQL WHERE.

After being negative, here is a suggestion, that came up today in a discussion with Benjamin. He did not quite like it, but I think it is worth mentioning it. This may have come up before I was involved in the VOQL, and discarded, but let me try it anyway. These are of course rough examples.

Consider a boolean built-in REGION function, with possibly a variable number of arguments. We can then write a spatial query like

	SELECT objid, ra,dec FROM Catalog c
	WHERE REGION.CONTAINS('CIRCLE ICRS 185 32.5 0.5','ICRS',c.ra, c.dec)
or	
	SELECT objid, ra,dec FROM Catalog c
	WHERE REGION.CONTAINS('CIRCLE ICRS 185 32.5 0.5','ICRS',c.x,c.y,c.z)
or 
	SELECT objid, gl,gb FROM Catalog c
	WHERE REGION.CONTAINS('CIRCLE ICRS 185 32.5 0.5','GII',c.gl,c.gb)

One can also do the "get all regions which contain the point" query. Assume that we have a table of regions. Then

	SELECT regionid FROM regionTable r
	WHERE REGION.CONTAINS(r.regionDef,'ICRS',185,32.5)

We can also have an intersect function, which can say if two regions intersect, like

	SELECT regionid FROM regionTable
	WHERE REGION.INTERSECTS(regionDef,'CIRCLE ICRS 185 32.5 0.5')

These are all automatically valid SQL statements, provided that the mapping of the actual parameters on to the precise definition of the function is correct. One can even think of implementing this directly... and extending it later is trivial (REGION.DISJOINT, etc).

The . notation is also arising in the INFORMATION_SCHEMA.* cloud, so it is not unlike existing stuff. By the way, in SDSS we have 30,000 regions stored, and I believe that Gretchen has over 100K in her HST exposures database. So I think we will store regions in the database.

Let me know what you think. Sorry for the two page size.

--Alex

-----Original Message-----
From: owner-voql-teg-at-eso.org [mailto:owner-voql-teg-at-eso.org] On Behalf Of Patrick Dowler
Sent: Monday, May 07, 2007 2:06 PM
To: 'VOQL-TEG'
Subject: Re: REGION

Many small comments interspersed...

On Saturday 05 May 2007 00:40, Alex Szalay wrote:
> This is a good start, but I think that we need a much clearer focus.
> Also, after reading this I still feel confused what a REGION datatype is.
> I will try to keep my comments short.
>
> In a typical spatial framework there are several different spatial
> datatypes
>
> (POINTSET, LINESET, POLYGON). These datatypes are typically not simpole,
> even
> the description of a point can be quite complex (see STC), not to mention
a
> complex region. Of course these can be serialized into a string. But I
> would not want to put the coordinates into "ra dec" strings.

REGION is the base class of all these things you mention. If that is not the

correct word (from STC or otherwise) we can pick a different one, but the idea is that it is the base class, which makes it the most general and we can
do quite a lot now and in future without adding more reserved words (eg without changing ADQL).

Yes, they are complex (in STC) but I don't see any alternative to that. I have
not looked at STC closely and considered it more or less pre-ordained that we
should use whatever VO-standard coordinate system spec is available. I don't

want to put a huge chunk of XML inside the REGION("...")...

> Of course here Pat and Benjamin also want to
> extend this to even more abstract concepts like time and energy intervals,
> that none of the GIS systems do, although for intervals I think the
BETWEEN
> clause (or several for a more complex interval set) might just do the job.
>
> One can then define various RELATIONS and various OPERATIONS between them.
> The relations can be (CONTAINS, TOUCHES, DISJOINT, INTERSECT,...)
> understood
> as an enumerated return value from an operation between two different
> spatial objects.

Yes, all these operations are possible with geometry. However, I think OVERLAPS comparison (aka INTERSECT) covers 95% of the uses without blocking future enhancements (eg if a future version of ADQL introduces CONTAINS it would keep OVERLAPS and the meaning would not change). I am happy enough to re-use the INTERSECT reserved word if others are happy with that.

> One can also have OPERATIONS among spatial objects, these are
> (INTERSECTION, UNION, DIFFERENCE) which form a Boolean algebra,
> with some restrictions. These return another spatial object.

IMO, too fancy for ADQL 1.0

> If we only restrict ourselves to POINTSETS (our catalogs) and POLYGONS
(say
> =REGIONS) there are still many different things we might want to do. These
> are all questions that the SDSS users have neen asking from the database
as
> part of their research
>
> (1) Give me all the POINTS within a REGION from a certain set of tables

SELECT * FROM <table>
WHERE <position column> OVERLAPS REGION("...")

I don't see any problem treating points as regions (with 0 area). Some geometry libraries do this, others do not.

> (2) Given me all the POINTS which are within 10 arcsec to a REGION
(errors)

This is just a different region, the client software can (should) implement this feature.

> (3) Tell me if this POINT is within this REGION

I don't see how this differs from #1. Is it just that both args to OVERLAPS are literals? That should be OK.

> (4) Which REGIONS in the database contain this POINT (is it in the photo
> footprint but not in the spectro, for example)

SELECT * FROM <table>
WHERE <bounds column> OVERLAPS REGION("123 45 ICRS")

(for example, I don't know if that is the correct string version of a point).

> (5) What is the distance of this point to the boundary

IMO, too fancy for ADQL 1.0

> (6) What percent of this points 30" neighborhood is inside the survey
> footprint

IMO, too fancy for ADQL 1.0

> One can also think of storing REGION (POLYGON) data in the database, and
> perform operations on those plus the incoming user defined regions. This
is
> a
> very complex task and to do this efficiently, one typically needs a binary
> representation inside the DB, i.e. an object oriented or an object
> relational DB. I do not want to go there, since my one page is up.

I agree technically but conceptually it is quite straightforward to define the
basic comparison operator (overlaps) that is applicable to the base type (region). That is what we have done and it covers ~95% of the ADQL use cases

and is "future proof" in the sense that further advanced features do not make
it incorrect or out of place (at least, that is a concern and I think the proposal satisfies it).

Also, despite there being a column of type REGION, I don't envision people actually storing regions in the DB... it is intended to be a logical declaration only.

-- 

Patrick Dowler
Tel/Tél: (250) 363-6914                  | fax/télécopieur: (250) 363-0045
Canadian Astronomy Data Centre   | Centre canadien de donnees astronomiques
National Research Council Canada | Conseil national de recherches Canada
Government of Canada                  | Gouvernement du Canada
5071 West Saanich Road               | 5071, chemin West Saanich
Victoria, BC                                  | Victoria (C.-B.)
Received on 2007-05-08Z15:28:39