Hi RWGers,
So we have a bit of a crisis to contend with regarding our use of STC within a VOResource record which is standing in the way of our upgrade to RI v1.0. To catch folks up, I'm going to summarize the problem and review some useful input that others have made, and then try to conclude with our current set of alternatives.
I. The Problem
We use the Space-Time Coordinates schema (STC) to describe a resource's coverage of the sky, time, and frequency. In STC, this is done by first defining "coordinate systems" for each of these things and then listing how the resource maps onto those systems. A single, simple instance looks like this:
    <stc:STCResourceProfile
        xmlns="http://www.ivoa.net/xml/STC/stc-v1.30.xsd">
      <AstroCoordSystem xlink:type="simple"
                        xlink:href="ivo://STClib/CoordSys#UTC-FK5-TOPO"
                        id="UTC-FK5-TOPO"/>
      <AstroCoordArea coord_system_id="UTC-FK5-TOPO">
        <AllSky/>
      </AstroCoordArea>
    </stc:STCResourceProfile>
The <AstroCoordSystem> element defines a system on the sky by referring to a "standard system" via the xlink attributes. The <AstroCoordArea> element describes the actual coverage in that system. The two are linked through the id value "UTC-FK5-TOPO", which, by convention, matches the local identifier part (the fragment after "#") of the xlink:href attribute.
An STC description may require multiple coordinate systems to describe its coverage, so it needs a way to connect each coverage description unambiguously to a single coordinate system. This is done with a little XML magic: <AstroCoordSystem>'s id attribute is of type xs:ID, and <AstroCoordArea>'s coord_system_id attribute is of type xs:IDREF. For this to work, there must be only one id="UTC-FK5-TOPO" in the entire document, because XML requires every xs:ID value to be unique across the whole document, not just within the element that contains it.
This is easily satisfied when we have a single VOResource record; the problem arises when we concatenate records into a single document. If every record follows the conventional choice, there will be many occurrences of id="UTC-FK5-TOPO". We could change this convention; however, the individual VOResource records are created independently, so some coordination is needed to ensure uniqueness.
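To make the collision concrete, here is a minimal Python sketch (standard library only) that wraps two independently written records in one response document and counts the id values. The record template, the <Records>/<Resource> wrapper names, the ivo://example.org identifiers, and the omission of namespaces are hypothetical simplifications, not the actual VOResource or RI schemas.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, namespace-free stand-in for a VOResource record;
# only the STC part matters for this illustration.
RECORD = """<Resource>
  <identifier>{ident}</identifier>
  <STCResourceProfile>
    <AstroCoordSystem id="UTC-FK5-TOPO"/>
    <AstroCoordArea coord_system_id="UTC-FK5-TOPO"><AllSky/></AstroCoordArea>
  </STCResourceProfile>
</Resource>"""

def concatenate(records):
    """Wrap independently created records in one (hypothetical) response element."""
    root = ET.Element("Records")
    for text in records:
        root.append(ET.fromstring(text))
    return root

def duplicate_ids(root):
    """Return every id value that occurs more than once (illegal for xs:ID)."""
    counts = Counter(el.get("id") for el in root.iter() if el.get("id") is not None)
    return sorted(value for value, n in counts.items() if n > 1)

doc = concatenate([RECORD.format(ident="ivo://example.org/a"),
                   RECORD.format(ident="ivo://example.org/b")])
print(duplicate_ids(doc))  # ['UTC-FK5-TOPO']
```

Each record on its own is fine; the duplicate only appears once the two are placed in the same document.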
Concatenation of VOResource records happens in two places in the Registry Interface: in a harvesting response and in a search query response. As Paul Harrison has pointed out, VOEvent's use of STC has an analogous problem, so the issue is likely to be more general.
II. Discussion
Paul Harrison posted this very useful summary of suggested alternatives:
On Tue, 5 Dec 2006, Paul Harrison wrote:
> As I see it, there are several solutions to this,
>
>
>
Here are a few comments about these alternatives:
1. Make the IDs unique.
This would have to be done at both publishing time and harvesting time, since the IDs would have to be unique within the entire registry. Note that you can't simply reuse another registry's id when you harvest; consider:
o you have to make sure that the remote registry's locally unique id doesn't clash with yours.
o when you reharvest a record, you don't know what has been changed or added, so every id must at least be examined and perhaps updated.
This might be made easier if we augment the id with the registry's IVOA ID, e.g. id="nvo.ncsa/registry/5:UTC-FK5-TOPO". In this case, we would only need to set the ID at publishing time; subsequent rewriting is not necessary. Note that the ID part does not need to refer to the registry; it could be the ID of the resource itself. If you used the resource ID, then you shouldn't need the additional "/5".
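A minimal sketch of the publishing-time step, assuming the prefix is derived from the publisher's IVOA ID. One wrinkle worth noting: xs:ID values must be XML NCNames, which may not contain "/" or ":", so a literal value like nvo.ncsa/registry/5:UTC-FK5-TOPO would not validate as an xs:ID. The sketch therefore maps disallowed characters to "_" and joins prefix and local id with "__"; both choices, and the helper names id_prefix and qualify_ids, are hypothetical. It also rewrites only the id and coord_system_id attributes from the example above; a real STC instance may carry other IDREF attributes that would need the same treatment.

```python
import re
import xml.etree.ElementTree as ET

def id_prefix(ivoa_id):
    """Derive an NCName-safe prefix from an IVOA identifier (hypothetical scheme)."""
    local = ivoa_id[len("ivo://"):] if ivoa_id.startswith("ivo://") else ivoa_id
    # xs:ID values are NCNames: no '/' or ':', so replace anything unusual with '_'.
    prefix = re.sub(r"[^A-Za-z0-9._-]", "_", local)
    # An NCName must start with a letter or an underscore.
    if not re.match(r"[A-Za-z_]", prefix):
        prefix = "_" + prefix
    return prefix

def qualify_ids(record, ivoa_id):
    """Prefix each id and each coord_system_id in one record, in place."""
    prefix = id_prefix(ivoa_id)
    for el in record.iter():
        for attr in ("id", "coord_system_id"):
            value = el.get(attr)
            if value is not None:
                el.set(attr, f"{prefix}__{value}")
    return record

rec = ET.fromstring(
    '<Resource><AstroCoordSystem id="UTC-FK5-TOPO"/>'
    '<AstroCoordArea coord_system_id="UTC-FK5-TOPO"/></Resource>')
qualify_ids(rec, "ivo://nvo.ncsa/registry/5")
print(rec.find("AstroCoordSystem").get("id"))
# nvo.ncsa_registry_5__UTC-FK5-TOPO
```

Because the prefix comes from an identifier the publisher already owns, a harvester could copy such ids verbatim without re-examining them.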
My biggest misgivings are:
o this requires special processing for a special subset of records.
o we have to explain how (and why) to do this to publishers. It's not simple.
These are not insurmountable.
2. Restructure the records.
I believe Paul included this for completeness and to further illustrate the problem. Nevertheless, this would require significant processing by both the sender and the receiver to combine and then split the records. So (unless I've misunderstood something), this is not particularly appealing.
3. Changing STC to use xs:keyref and xs:unique.
In principle this is possible, because these identity constraints let you declare that combinations of values (e.g. the STC id and the VOResource identifier) must be unique. However, this would require coordination across these two schemas, which would break their respective designs. Any use of xs:keyref within just STC would (I believe) inevitably encounter the same problem.
III. Current Options
We need a solution pretty much right away, as this problem is standing in the way of our registry upgrade work. I think the simplest solution available is Paul's suggestion #1, with my suggested variation of incorporating the registry's (or the resource's) IVOA ID.
Arnold could, in principle, change the STC schema not to use the xs:ID/IDREF types. It could retain the data model but impose uniqueness rules that are outside the capabilities of an XML Schema-aware parser to check; this would require an application-specific validator. This is not unprecedented, as we have this situation in VOResource now. However, I'm not sure this is practical on a short timescale, and if solution #1 above is viable, then changing the STC schema may not be wise or worth the extra validator development it would require.
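The alternative above moves the uniqueness rule out of XML Schema and into code. The rule itself isn't specified here, so this sketch assumes one plausible version: ids only need to be unique within a single record, and each coord_system_id must resolve to an <AstroCoordSystem> in that same record. The element names (Records, Resource), the namespace-free XML, and the check_record/check_document helpers are hypothetical simplifications.

```python
import xml.etree.ElementTree as ET

def check_record(record):
    """Return a list of problems found in one record (empty means it passes).

    Hypothetical rule: ids need only be unique within a record, and every
    coord_system_id must name an AstroCoordSystem in that same record.
    """
    problems = []
    ids = set()
    for cs in record.iter("AstroCoordSystem"):
        cs_id = cs.get("id")
        if cs_id is None:
            problems.append("AstroCoordSystem without an id")
        elif cs_id in ids:
            problems.append(f"duplicate id {cs_id!r}")
        else:
            ids.add(cs_id)
    # Resolve references only against ids declared in this record.
    for el in record.iter():
        ref = el.get("coord_system_id")
        if ref is not None and ref not in ids:
            problems.append(f"unresolved coord_system_id {ref!r}")
    return problems

def check_document(root, record_tag="Resource"):
    """Validate each record on its own, so different records may reuse ids."""
    report = {}
    for index, record in enumerate(root.iter(record_tag)):
        problems = check_record(record)
        if problems:
            report[index] = problems
    return report

doc = ET.fromstring("""<Records>
  <Resource><AstroCoordSystem id="UTC-FK5-TOPO"/>
    <AstroCoordArea coord_system_id="UTC-FK5-TOPO"/></Resource>
  <Resource><AstroCoordSystem id="UTC-FK5-TOPO"/>
    <AstroCoordArea coord_system_id="UTC-FK5-TOPO"/></Resource>
  <Resource><AstroCoordArea coord_system_id="UTC-ICRS-TOPO"/></Resource>
</Records>""")
print(check_document(doc))
# {2: ["unresolved coord_system_id 'UTC-ICRS-TOPO'"]}
```

The first two records reuse the same id, which an xs:ID-based schema rejects once they share a document, but this per-record rule accepts; only the third record, whose reference has no matching system, is reported.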
If we assume #2 and #3 above are not viable (especially given our schedule), the only other option is to drop the use of STC altogether from VOResource until a solution can be found. We would still have the ability to point to a footprint service. Personally, I'm not ready to go there yet. I'm not about to propose an alternate schema to STC (for one, this is not a quick solution). More importantly, I'm not ready to drop an important set of metadata (coverage) recommended by the RM because of a technical glitch in STC.
In conclusion, if you guys agree that solution #1 is the way to go, then we will need to quickly put out a concise, unambiguous description of how to form and use these IDs.
cheers,
Ray
Received on 2006-12-14Z15:57:29