I just had a brief discussion with Jonathan and we may have a solution
that is a variation on Ray's, but generalized and formalized.
The problem really is how we can ensure that associations that are unique within a document remain unique when elements are copied to, or concatenated into, a new document.
Let's assume, for the sake of argument, that we are using ID/IDREF pairs, though that is not essential (as I said before, the issue is that the association needs to be unambiguous, not what the particular datatype is).
If we require that all IVOA documents contain a document URI, assigned
by the publisher, then we can solve the problem by setting a rule that
all ID and IDREF tags, when extracted from the document, should
receive the document URI as a prefix.
Another way of putting it is that all tags should be URIs, but that
the common root may be omitted, provided that it is presented in a
document URI.
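Read as code, the rule might look like the following Python sketch. The function names are hypothetical, and the "/" separator is an assumption taken from the example in this message rather than from any specification.

```python
def expand_tag(tag: str, document_uri: str) -> str:
    """Return the full URI form of a tag (hypothetical helper).

    A tag that already looks like a full URI is returned unchanged;
    a local tag gets the document URI as a prefix.
    """
    if "://" in tag:
        return tag
    return document_uri.rstrip("/") + "/" + tag


def abbreviate_tag(tag: str, document_uri: str) -> str:
    """Drop the common root when it matches the document URI (hypothetical helper)."""
    root = document_uri.rstrip("/") + "/"
    return tag[len(root):] if tag.startswith(root) else tag


doc = "ivo://ncsa/MyResource"
full = expand_tag("UTC-FK5-TOPO", doc)
print(full)
print(abbreviate_tag(full, doc))
```

Inside the document the short form is enough; the full form is only needed once the element leaves the document.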
So, the STCResourceProfile from this document:
  <MyResource ... documentURI="ivo://ncsa/MyResource">
    ...
    <STCResourceProfile>
      <AstroCoordSystem xlink:type="simple"
                        xlink:href="ivo://STClib/CoordSys#UTC-FK5-TOPO"
                        id="UTC-FK5-TOPO"/>
      <AstroCoordArea coord_system_id="UTC-FK5-TOPO">
        <AllSky/>
      </AstroCoordArea>
    </STCResourceProfile>
    ...
gets extracted into the registry as:
  <STCResourceProfile>
    <AstroCoordSystem xlink:type="simple"
                      xlink:href="ivo://STClib/CoordSys#UTC-FK5-TOPO"
                      id="ivo://ncsa/MyResource/UTC-FK5-TOPO"/>
    <AstroCoordArea coord_system_id="ivo://ncsa/MyResource/UTC-FK5-TOPO">
      <AllSky/>
    </AstroCoordArea>
  </STCResourceProfile>
We believe that this would be a global and general solution for all associations, in the registry and elsewhere.
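As a rough illustration of the extraction step, here is a Python sketch using the standard xml.etree.ElementTree module. It assumes that id and coord_system_id are the only attributes carrying associations, leaves out the xlink attributes to avoid namespace handling, and uses a made-up second resource (ivo://example/OtherResource) and a made-up <Records> wrapper to show that two records using the conventional id no longer clash after concatenation.

```python
import xml.etree.ElementTree as ET

# Assumption: these are the only attributes that carry associations.
ID_ATTRS = ("id", "coord_system_id")

PROFILE = """<STCResourceProfile>
  <AstroCoordSystem id="UTC-FK5-TOPO"/>
  <AstroCoordArea coord_system_id="UTC-FK5-TOPO"><AllSky/></AstroCoordArea>
</STCResourceProfile>"""


def qualify(profile_xml, document_uri):
    """Parse a profile and prefix its local tags with the document URI."""
    root = ET.fromstring(profile_xml)
    prefix = document_uri.rstrip("/") + "/"
    for elem in root.iter():
        for name in ID_ATTRS:
            value = elem.get(name)
            # Leave values alone if they are already full URIs.
            if value is not None and "://" not in value:
                elem.set(name, prefix + value)
    return root


def concatenate(records):
    """Merge (document_uri, profile_xml) pairs under one hypothetical wrapper element."""
    merged = ET.Element("Records")
    for document_uri, profile_xml in records:
        merged.append(qualify(profile_xml, document_uri))
    return merged


merged = concatenate([
    ("ivo://ncsa/MyResource", PROFILE),
    ("ivo://example/OtherResource", PROFILE),  # hypothetical second resource
])
ids = [e.get("id") for e in merged.iter() if e.get("id") is not None]
refs = [e.get("coord_system_id") for e in merged.iter()
        if e.get("coord_system_id") is not None]

# Every id is unique in the merged document, and every reference resolves.
assert len(ids) == len(set(ids))
assert set(refs) <= set(ids)
```

One caveat: the prefixed values contain ":" and "/", which are not allowed in xs:ID, so the sketch treats them as plain strings, in line with the point above that the particular datatype is not essential.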
Ray Plante wrote:
> Hi RWGers,
>
> So we have a bit of a crisis to contend with regarding our use of STC
> within a VOResource record which is standing in the way of our upgrade
> to RI v1.0.  To catch folks up, I'm going to summarize the problem and
> review some useful input that others have made, and then try to
> conclude with our current set of alternatives.
>
> I.  The Problem
>
> We use the Space-Time Coordinates schema (STC) to describe a resource's
> coverage of the sky, time, and frequency.  In STC, this is done by first
> defining "coordinate systems" for each of these things and then listing
> how the resource maps onto those systems.  A single, simple instance looks
> like this:
>
>   <stc:STCResourceProfile
>        xmlns="http://www.ivoa.net/xml/STC/stc-v1.30.xsd">
>
>     <AstroCoordSystem xlink:type="simple"
>                       xlink:href="ivo://STClib/CoordSys#UTC-FK5-TOPO"
>                       id="UTC-FK5-TOPO"/>
>
>     <AstroCoordArea coord_system_id="UTC-FK5-TOPO">
>       <AllSky/>
>     </AstroCoordArea>
>
>   </stc:STCResourceProfile>
>
> The <AstroCoordSystem> defines a system on the sky by referring to a
> "standard system", via the xlink attributes.  The <AstroCoordArea>
> describes the actual coverage on that system.  The two are linked through
> the id value, "UTC-FK5-TOPO", which by convention, matches the local
> identifier part of the xlink:href attribute.
>
> An STC description may require multiple coordinate systems to describe its
> coverage, so it needs a way to uniquely connect a particular coverage
> description to a single coordinate system.  This is done with a little
> XML magic by making <AstroCoordSystem>'s id of type xs:ID and
> <AstroCoordArea>'s coord_system_id of type xs:IDREF.  For this to work,
> there must be only one id="UTC-FK5-TOPO" in the entire document.
>
> This is easily satisfied when we have single VOResource records; however,
> the problem comes when we concatenate records into a single document.
> If every record follows the conventional choice, there will be many
> occurrences of id="UTC-FK5-TOPO".  We could change this convention;
> however, we have to realize that the individual VOResource records are
> created independently, so some coordination is needed to ensure
> uniqueness.
>
> Concatenation of VOResource records happens in two cases in the Registry
> Interface, within a harvesting response and within a search query
> response.  As Paul Harrison has pointed out, there is an analogous problem
> with VOEvent's use of STC, so this is likely to be a more general problem.
>
> II.  Discussion
>
> Paul Harrison posted this very useful summary of suggested alternatives:
>
> On Tue, 5 Dec 2006, Paul Harrison wrote:
> > As I see it, there are several solutions to this,
> >
> > 1. The registry always rewrites the id and coord_system_id within a
> >    single record with unique values - e.g. ascending integers for a
> >    particular harvest set - this is relatively simple to implement, but
> >    is rather a shame to lose the "human readable" ids, however the
> >    document will be xml valid.
> >
> > 2. Gather all of the AstroCoordSystem definitions into a special
> >    record and retain their human readable IDs and then do not emit the
> >    individual AstroCoordSystem elements in the individual records -
> >    though for a normal query to the registry (returning one record), it
> >    must remember to insert the appropriate AstroCoordSystem(s) from the
> >    special record. This would be an extra level of complexity in the
> >    registry's housekeeping that it has not had to deal with so far
> >    though.
> >
> > 3. Change the STC schema so that it does not use xs:ID and xs:IDREF
> >    types for the cross referencing, but use xs:unique and xs:keyref
> >    constraints to ensure integrity of the ids and references - this has
> >    the advantage that the scope of the uniqueness can be defined rather
> >    than it having to be global to the XML document, so that the ids
> >    could be scoped to be unique just within each registry record. This
> >    solution seems best to me as it retains XML parser checking of id
> >    uniqueness, allows "human readable" ids within each record, and
> >    requires no special processing by the registries.
>
> Here are a few comments about these alternatives:
>
> 1.  Rewriting IDs.
>
>     This would have to be done at both publishing time and harvesting
>     time since the IDs would have to be unique within the entire
>     registry.  Note that you can't just take another registry's id
>     when you harvest; consider:
>
>       o  you have to make sure that the remote registry's locally unique
>          id doesn't clash with yours.
>       o  when you reharvest a record, you don't know what has changed or
>          been added, so every id must be at least examined and perhaps
>          updated.
>
>     This might be made easier if we augment the id with the registry's
>     IVOA ID; e.g: id="nvo.ncsa/registry/5:UTC-FK5-TOPO".  In this case, we
>     would only need to set the ID at publishing time; subsequent rewriting
>     is not necessary.  Note that the ID part does not need to refer to the
>     registry; it could be the ID of the resource itself.  If you used the
>     resource id, then you shouldn't need the additional "/5".
>
>     My biggest misgivings are:
>
>       o  this requires special processing for a special subset of records
>       o  we have to explain how (and why) to do this to publishers.  It's
>          not simple.
>
>     These are not insurmountable.
>
> 2.  Restructure the records.
>
>     I believe Paul included this for completeness and for further
>     illustrating the problem.  Nevertheless, this would require
>     significant processing by both the sender and receiver to combine and
>     then split the records.  So (unless I've misunderstood something),
>     this is not particularly appealing.
>
> 3.  Changing STC to use xs:keyref and xs:unique.
>
>     In principle this is possible because these types allow you to say
>     that combinations of values--e.g. STC id and VOResource
>     identifier--must be unique.  However, this would require coordination
>     across these two schemas, which would break their respective designs.
>     Any use of xs:keyref within just STC (I believe) would inevitably
>     encounter the same problem.
>
> III.  Current Options
>
>     We need a solution pretty much right away as this problem is standing
>     in the way of our registry upgrade work.  I think the simplest
>     solution available is Paul's suggestion #1, with the variation I
>     suggest to incorporate the registry's (or the resource's) IVOA ID.
>
>     Arnold could, in principle, change the STC schema not to use the
>     xs:ID/IDREF types.  It could retain the data model, but impose rules
>     of uniqueness that are outside the capabilities of an XML
>     Schema-aware parser to check; this would require an
>     application-specific validator to check.  This is not unprecedented as
>     we have this in VOResource now.  However, I'm not sure this is
>     practical on a short timescale, and if the #1 solution above is
>     viable, then changing the STC schema may not be wise or worth the
>     extra validator development required.
>
>     If we assume #2 and #3 above are not viable (especially given our
>     schedule), the only other option is to drop the use of STC altogether
>     from VOResource until a solution can be found.  We still have the
>     ability to point to a footprint service.  Personally, I'm not ready to
>     go here, yet.  I'm not about to propose an alternate schema to STC
>     (for one, this is not a quick solution).  More importantly, I'm not
>     ready to drop an important set of metadata--coverage--recommended by
>     the RM because of a technical glitch in STC.
>
>     In conclusion, if you guys agree that solution #1 is the way to go,
>     then we will need to get out (quickly) a concise, unambiguous
>     description of how to form and use these IDs.
>
> cheers,
> Ray
>
--------------------------------------------------------------------------
Arnold H. Rots                                Chandra X-ray Science Center
Smithsonian Astrophysical Observatory                tel:  +1 617 496 7701
60 Garden Street, MS 67                              fax:  +1 617 495 7356
Cambridge, MA 02138                          arots-at-head.cfa.harvard.edu
USA                                     http://hea-www.harvard.edu/~arots/
--------------------------------------------------------------------------
Received on 2006-12-14Z20:28:34