Dear all,
The idea is simply to be able to create clients that can access distributed data using ADQL (VOQL) queries, without caring whether the data are in an RDBMS, an ODBMS, an ASCII file on a filesystem, a VOTable, or whatever other format they might be stored in.
This is the reason why, using this approach, the only thing that the server-side (data provider) is obliged to do is to build a translation:
From To
---- --
For instance, the CDS have their own language (why not?), called ASU_QUERY, to access their catalogues (whether in a DB, in files or whatever). What we've done is help them create an ADQL server with a layer that translates from ADQL to their ASU_QUERY, extracting the relevant Source Catalogue Data Model metadata, thereby creating a server that understands "Source Catalogue Data Model-aware ADQL queries".
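To make the idea of a provider-side translation layer concrete, here is a minimal sketch of my own, not the actual CDS implementation. It assumes the provider keeps a simple lookup from data-model column names to its native column names; the names in `MODEL_TO_NATIVE`, and the functions `translate_columns` and `translate_cone_query`, are hypothetical and only for illustration.

```python
# Hypothetical mapping from data-model column names to one provider's
# native column names; every name here is illustrative only.
MODEL_TO_NATIVE = {
    "ra": "RAmdeg",
    "dec": "DEmdeg",
    "mag": "VTmag",
}


def translate_columns(model_columns, mapping=MODEL_TO_NATIVE):
    """Return the native column names for the requested model columns.

    Raises KeyError naming the first model column the provider has not mapped.
    """
    native = []
    for name in model_columns:
        if name not in mapping:
            raise KeyError(f"no native column mapped for model column {name!r}")
        native.append(mapping[name])
    return native


def translate_cone_query(model_columns, ra_deg, dec_deg, radius_deg):
    """Describe a cone query in native terms, ready for the provider's own engine."""
    return {
        "columns": translate_columns(model_columns),
        "center": (ra_deg, dec_deg),
        "radius_deg": radius_deg,
    }
```

A provider would then render the returned description in its own query language; that last step is deliberately left out here because it is specific to each data centre.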
Once we have n servers speaking this "VO Source Catalogue Data Model-aware" language (just VOQL over a specific data model), any client will be able to call them (with a cone search, for instance), get results and handle those results.
One of the applications that could "handle" those results is a cross-match (certainly not the only application that will consume VO data through the IVOA VOQL language, I hope). This would allow different clients to do cross-matches without imposing a specific cross-match algorithm.
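As an illustration of what a client-side cross-match could look like once two servers have returned their cone-search results, here is a minimal sketch. It is only one possible choice (a brute-force, purely positional nearest-neighbour match), not a recommended algorithm; the function names and the 'id'/'ra'/'dec' keys are assumptions made for this example.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp against rounding errors before taking the arcsine.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def nearest_neighbour_match(sources_a, sources_b, max_sep_deg):
    """For each source in A, find the closest source in B within max_sep_deg.

    Each source is a dict with 'id', 'ra' and 'dec' (degrees).
    Brute force: cost grows as len(A) * len(B), so only suitable for
    the modest result sets returned by a cone search.
    """
    matches = []
    for a in sources_a:
        best = None
        best_sep = max_sep_deg
        for b in sources_b:
            sep = angular_separation_deg(a["ra"], a["dec"], b["ra"], b["dec"])
            if sep <= best_sep:
                best, best_sep = b, sep
        if best is not None:
            matches.append((a["id"], best["id"], best_sep))
    return matches
```

Since the matching function lives entirely in the client, another client could swap in a match that uses positional errors or magnitudes without any change on the server side.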
This would demonstrate one facet of the applicability of the IVOA VOQL that we (the IVOA in general) are after, which should in the end be a query language to access any type of VO-compatible data, in particular data following specific IVOA data models, regardless of their storage mechanism.
(By the way, the system also allows clients to get data which have not been "translated" to the Source Catalogue DM. The only problem with those is that clients do not necessarily know "what" exactly they are (not what "type of" data they are, but "what exactly" they are).)
Regarding cross-match of whole catalogues, we are more interested in cone-search type cross-match (identifying sources within a certain region), but we also have in mind a mechanism to handle big cross-matches, whose feasibility we are currently studying.
With respect to specific implementation details, I'd suggest any of those questions be sent outside the list, again to not annoy people unnecessarily.
I hope this clarifies some of the issues in the different mails. In any case, and although I will not be on holidays (bad boy :-(), I would suggest people enjoy theirs and resume discussions back in January. Holidays are holy, aren't they?
Cheers,
P.
On Thu, 2005-12-22 at 08:39 -0500, Maria A. Nieto-Santisteban wrote:
> Hi,
>
> I'm on "vacation" with a very limited internet connection, so I have tried
> to summarize my comments in one single mail. After Dec 28th I will
> be back on a fast link, in case my comments generate rivers of bits that I
> cannot respond to :-)
>
> Cheers
>
> Maria
>
> Legend:
>
> M - Maria
> P - Pedro
> MT - Mark Taylor
> C - Clive
>
> >M - What does the Catalogue Data Model used look like, especially the
> >M common set of attributes and the associated metadata?
>
> >P The point is in the (Source) Catalogue Data Model, with emphasis on the
> >P "Source" part. This one is the one I showed on behalf of the Catalogue
> >P DM subgroup at our last interop meeting here at ESAC. I attach a pdf
> >P with the initial proposal, but please use it only for temporary
> >P reference, as the whole document will be changed (according to
> >P requirements from Jonathan after the interop meeting).
>
> >M Unfortunately, I'm on a dial-up connection and I cannot get the 6.6MB pdf,
> >M but from Patricio's email and what I remember from the last IVOA I can imagine.
> >M Being more specific, what I am interested in knowing is how the mapping
> >M "original catalog - SCDM" is done for its two aspects: scientific and technical.
>
> >M By scientific I mean: How did you map USNOB and Tycho-2 columns into the model?
> >M I'm very interested in seeing this mapping. This is the very first step to
> >M have mechanisms that allow for common queries. If all columns are called the same
> >M and represent the same thing, running engines asking the same ADQL question
> >M is trivial.
>
> >M By technical: Do the original catalogs remain the same and you compute the
> >M new columns on the fly? I assume some "original-model" relationships will not be
> >M direct. I personally would create new columns and pre-compute the transformations
> >M to make things faster, but probably not all catalog providers are willing to do so.
>
>
> >M - What are the plans about registration? Will these nodes (Basic?) be
> >M registered and therefore accessible through Open SkyQuery? How many?
>
> >P yes, they will. How many, I don't know. In Strasbourg, Inaki and
> >P Aurelien worked on a couple of them, Tycho-2 and USNOB, but the CDS
> >P colleagues will work on more.... Francois will answer to this question
> >P at some point I presume.
>
> >M This is good but brings two issues:
>
> >M - 1) If many Basic SkyNodes are going to be registered, we need to plan
> >M how to do it.
>
> >M - 2) Having a second USNOB skynode which is not exactly the same USNOB as
> >M the one currently working.
>
> >M Both issues, how to deal with many skynodes and how to deal with "mirrors", have
> >M been "avoided", but it is about time we start attacking the problem.
>
> >P n-catalogue cross-match is what we are trying to get at; it will be a
> >P client based cross-match, and therefore the cross-match function will be
> >P designed and run at the client side (i.e., servers do not need to worry
> >P about implementing one specific cross-match or the other).
>
> >M The client based cross-match is a good idea. You cannot be disappointed with
> >M your own specific cross-match. However, I wonder what the plan is
> >M to cross-match your own "big" source catalog (let's say 700.000 rows
> >M as Mark mentions) against USNOB's 1000 million rows (if I remember correctly).
> >M If your objects are in a region, I can see making 1 query and getting all objects
> >M inside a region or a few, but without that ... I hope the idea is not to make
> >M 700.000 ADQL queries.
>
> >P At the current status, the client sends an ADQL query to the server to discern
> >P which type of cross-match it can do with it (whether only positional,
> >P positional with errors, etc.), and takes the corresponding action.
>
>
> >M Let's see, ADQL is the language. In principle, an ADQL query will not
> >M tell you what cross-match can be performed. You can use ADQL to gather the
> >M information you are thinking of, like ra, dec, ra_err, dec_err, only
> >M if the SkyNodes (databases) contain tables with this type of metadata. I hope
> >M the proposal to make this mandatory is successful and publishers actually follow
> >M it. In any case, what is mandatory are the Tables and Columns methods, which
> >M should give you this information, but that is not ADQL. It is a call to a Web
> >M service interface.
>
>
> >MT STILTS provides this functionality from a command-line
> >MT tool (tmatch2), but a public java API is also available for
> >MT programs that want access to it within a JVM.
>
> >M What would be worth a try is using Mark's library to set up a server that
> >M does the cross-match when providers don't want to use a DBMS, because as
> >M Clive mentioned "if the data are already in a relational DBMS
> >M then by far the simplest way to do the cross-match, and in many cases
> >M also the fastest, is to use R-tree indexing and a spatial join."
>
> >M I will not get into the R-tree indexing, HTM, Zones, Healpix debate now, but
> >M without question, if the data is already in a database then it will probably
> >M be less of a burden for the system to do the job than to answer millions of
> >M individual queries. This is the MyDATA SkyNode approach which, putting aside
> >M the problem of uploading big tables, is much more efficient.
>
> >M However, I'm kind of interested (probably easier than working on writing my thesis ;-))
> >M in this other debate
>
> >C Support for spatial indexing is now included in or readily available for
> >C DB2, Oracle, Informix, Sybase, MySQL, and Postgres, i.e. just about all
> >C the DBMS widely used in astronomy (with perhaps just one exception,
> >C which Jim can tell you about :-).
>
> >M It would be nice to know what exactly "widely" means.
> >M So I volunteer to keep an inventory (catalog :-) ) with information
> >M about
>
> > Catalog Name, Access point (URL), Default position, DBMS, Host Organization
>
> >M This could give us an excellent test bed to compare data access and
> >M cross-match functionalities provided by different DBMS and
> >M organizations.
> >M So if you guys send me a list with those 4 data points,
> >M I will collect and make public the information. Since I'm a database girl,
> >M please send me a file in CSV format if you have many catalogs and
> >M I will import the data into a database.
>
> >J But, getting objects into a node dominates all other costs (moving
> >J stuff thru xml is expensive).
>
> >C Indeed that is a very serious problem. I wonder if we can't solve this
> >C by using, instead of XML, some more efficient data format, e.g. one
> >C which holds tabular data in binary form with just the metadata in plain
> >C text.
> >C There's something called the "FITS table" with exactly these properties
> >C which perhaps astronomers should investigate :-)
>
> >M I do agree something needs to be done about this as well.
>
>
>
--
Pedro Osuna Alcalaya
ESA Science Archives System Engineer
Science Archive Team
European Space Astronomy Centre (ESAC/ESA)
e-mail: Pedro.Osuna-at-esa.int
Tel + 34 91 8131314
---------------------------------
European Space Astronomy Centre
European Space Agency
P.O. Box 50727
E-28080 Villafranca del Castillo
MADRID - SPAIN

Received on 2005-12-22Z17:43:02