Re: VOTable session @ Interop.Moscow

From: Bob Mann <rgm-at-roe.ac.uk>
Date: Thu, 21 Sep 2006 14:32:34 +0100 (BST)

> On Thu, 21 Sep 2006, Pierre Didelon wrote:
>> It could be used, for example, to "materialise" in a generic
>> VOTable way the cross-identification between two catalogues (row by row),
>> or to allow any kind of link from row to row.
>> Does it make sense?

On Thu, 21 Sep 2006, Mark Taylor wrote:
> I'd like to see a concrete use case for this kind of thing

         How about the following use case, which is fairly concrete in that I had a PhD student doing essentially this a year or so ago.

         An astronomer wants to cross-match sources in catalogue A with objects in catalogue B using an algorithm which cannot be executed inside either database or as part of an ADQL query. She runs a conesearch (or equivalent) on each database to extract the entries from each lying in the area of sky in which she is interested, and obtains two VOTables, votA and votB. She passes these both to her cross-matching service, which returns a VOTable recording the pairs of entries in votA and votB which her algorithm judges to be safe matches.
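         To make the shape of that service concrete, here is a minimal sketch. It assumes each VOTable has already been read into a list of dicts with hypothetical keys 'id', 'ra' and 'dec' (degrees). Because her actual algorithm is not described, a plain nearest-neighbour match within a fixed radius is swapped in as a stand-in. The point is the output: pairs of identifiers, not row positions.

```python
import math

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def cross_match(vot_a, vot_b, radius_arcsec=2.0):
    """Return (id_a, id_b) pairs: for each A row, the nearest B row within
    radius_arcsec. A stand-in for a real matching algorithm, not a
    reproduction of one.

    vot_a, vot_b: lists of dicts with (hypothetical) keys 'id', 'ra', 'dec'.
    """
    radius_deg = radius_arcsec / 3600.0
    pairs = []
    for a in vot_a:
        best, best_sep = None, radius_deg
        for b in vot_b:
            sep = angular_sep_deg(a["ra"], a["dec"], b["ra"], b["dec"])
            if sep <= best_sep:  # keep the closest candidate seen so far
                best, best_sep = b, sep
        if best is not None:
            pairs.append((a["id"], best["id"]))
    return pairs
```

         With made-up sources A-17 at (150.0000, 2.0000) and B-903 at (150.0002, 2.0001), about 0.8 arcsec apart, cross_match returns [("A-17", "B-903")], and unmatched rows simply drop out.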

         As Mark points out, it would be perfectly possible for the final VOTable to have entries of the form (row N in votA, row M in votB), but this would limit the future utility of the cross-matches. If votA and votB have been extracted by simple conesearches (or any equivalent query that does not have an "order by" clause) then the ordering of the rows in votA and votB is arbitrary, in the sense that re-running those queries will not necessarily yield the same ordering.

         This means that the cross-match pairs can only be used by people having access to votA and votB, not their functional equivalents generated by running the same conesearch queries at a later date. It would be much nicer if the IDs for the rows in votA and votB referred to some (assumed persistent) identifier for rows in A and B, since then the cross-match pairs could be re-used with any data extracted from A and B, and not just with the particular VOTables votA and votB.
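         A toy illustration of the difference, with hypothetical identifiers: the same three rows come back from a re-run of the query in a different order (simulated here by reversing the list). A match recorded as "row 0 of votA" now lands on the wrong source, whereas a match recorded by catalogue identifier still resolves correctly.

```python
# Rows from the original query, and from a later re-run of the same query
# that happens to return them in a different order (no ORDER BY clause).
first_run = [{"id": "A-17", "mag": 18.2},
             {"id": "A-42", "mag": 19.5},
             {"id": "A-88", "mag": 17.1}]
later_run = list(reversed(first_run))

# One match recorded two ways (both values are illustrative).
match_by_index = 0       # "row 0 of votA"
match_by_id = "A-17"     # "row A-17 of catalogue A" (assumed persistent ID)

def resolve_index(rows, index):
    """Positional reference: depends on the order of this particular file."""
    return rows[index]

def resolve_id(rows, ident):
    """Identifier reference: works with any extract from the same catalogue."""
    lookup = {row["id"]: row for row in rows}
    return lookup.get(ident)

print(resolve_index(first_run, match_by_index)["id"])  # A-17
print(resolve_index(later_run, match_by_index)["id"])  # A-88, the wrong source
print(resolve_id(later_run, match_by_id)["id"])        # A-17
```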

         Now, Mark may counter - and I apologise for putting words into his mouth - that the VOTable format should not be influenced by wider concerns like that, and should only be concerned with how these particular files votA and votB are used. He may be right in saying that, but since in practice VOTables often (usually?) contain subsets of larger datasets held in databases, I think it would be very useful to have a mechanism whereby rows in VOTables could be identified by references to rows in their parent database.
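         As a rough sketch of what the cross-match output might look like under such a scheme: a minimal VOTable, built with Python's standard xml.etree.ElementTree, whose two columns hold parent-catalogue identifiers rather than row numbers. FIELD, PARAM, TABLEDATA and the UCD meta.id are ordinary VOTable and UCD vocabulary. The parent_a/parent_b PARAMs that name the source catalogues are a hypothetical convention invented for illustration; they are not a standard mechanism.

```python
import xml.etree.ElementTree as ET

def pairs_to_votable(pairs, parent_a="A", parent_b="B"):
    """Serialise (id_a, id_b) cross-match pairs as a minimal VOTable string.

    Each column carries an identifier from the parent catalogue. The
    parent_a/parent_b PARAMs are a hypothetical convention, not standard.
    """
    votable = ET.Element("VOTABLE", version="1.1")
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE", name="crossmatch")
    # Record which catalogue each identifier column refers to.
    for side, parent in (("a", parent_a), ("b", parent_b)):
        ET.SubElement(table, "PARAM", name=f"parent_{side}",
                      datatype="char", arraysize="*", value=parent)
    # One identifier column per catalogue.
    for side in ("a", "b"):
        ET.SubElement(table, "FIELD", name=f"id_{side}",
                      datatype="char", arraysize="*", ucd="meta.id")
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for id_a, id_b in pairs:
        tr = ET.SubElement(tabledata, "TR")
        ET.SubElement(tr, "TD").text = id_a
        ET.SubElement(tr, "TD").text = id_b
    return ET.tostring(votable, encoding="unicode")
```

         Anyone holding a fresh extract of A or B can join on id_a or id_b, whatever row order that extract happens to have.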

         cheers

         Bob

Received on 2006-09-21Z15:35:42