Jonathan, hello.
On 2007 Sep 18, at 13:38, Jonathan McDowell wrote:
> I have prepared a note on my view of the role and syntax of UTYPEs
I think this is an excellent plan -- it feels as if UTYPEs (or utypes, or UTypes, ...) have long been the magic spell that's going to solve the VO's modelling problems, always _next_ year, so some explicitness is probably overdue.
I prepared a Note proposing a UType syntax earlier this year <http://www.ivoa.net/Documents/latest/utype-uri.html>. That Note proposed a UType syntax as well as discussing some of the benefits this would bring, so it may have tried to do too many things at once.
Here are a few comments on your UType proposal.
Section 1.2, Namespacing
It seems impossible for there _not_ to be namespacing support, since without namespacing, we can have no versioning, profiling or extension. Omitting namespacing requires that the first version of a data model be perfect, and that astronomy will not change thereafter.
Yes, namespaces should be URLs (and dereferenceable). XML is the well-known example of this, but the notion is much more general.
Section 1.1, syntax
Although in section 1.6 you make the point that syntax and semantics are separate things (and I heartily agree with you), the discussion in section 1.1 seems to me to crush the two things together.
Defining the UType as a structured string implies, and indeed imposes, a hierarchical structure on the UTypes. The potential for cropping and the case-insensitivity mean that an application has to do some normalisation before two UTypes can be compared. The fact that a UType has any structure at all means that applications will be obliged to parse it to some extent, which means they'll get it wrong sometimes, and who's to clear up the mess? The parsing is not complicated, of course, but it's more trouble than just using the string as-is. This section doesn't discuss why these extra costs are necessary -- it's not as if users will routinely be typing these things in (surely).
I propose that a UType be, in principle, simply a URI with fragment. The part before the hash acts as the namespace (and when dereferenced could give human- or machine-readable documentation for the namespace in question), and the opaque sequence of characters after it is the within-namespace part of it.
You never know: you might be able to get the UType spec onto a single page!
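As a minimal sketch of the URI-with-fragment idea, the following Python splits a UType at the hash and compares two UTypes by plain string equality. The function names are hypothetical, and the example URI is the illustrative one used elsewhere in this message, not a real namespace.

```python
from urllib.parse import urldefrag


def split_utype(utype_uri):
    """Split a UType URI into (namespace, local part) at the '#'.

    Hypothetical helper: the namespace keeps its trailing '#', so that
    namespace + local part gives back the original URI.
    """
    base, fragment = urldefrag(utype_uri)
    if not fragment:
        raise ValueError("UType URI has no fragment: %r" % utype_uri)
    return base + "#", fragment


def same_utype(a, b):
    """Compare two UTypes as opaque strings: no parsing, no normalisation."""
    return a == b


# Illustrative URI only; example.edu is not a real UType namespace.
ns, local = split_utype("http://example.edu/myns/1.0#utype")
print(ns)
print(local)
```

The point of the sketch is that an application only ever needs the split for documentation lookup; comparison itself never looks inside the string.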
How that URI appears in a document would depend on the serialisation. In the case of any XML document it could use the XML namespace mechanisms:
<element xmlns:xx='http://example.edu/myns/1.0#'>
<subelem utype='xx:utype'/>
</element>
VOTable might have a different mechanism, and the same UType might appear in FITS as:
TUTNS4='http://example.edu/myns/1.0#'
TUTYP4='utype'
That is, although the UType is a URI for specification purposes, it need never appear as such in a real VO document. This means the namespacing mechanism barely needs specifying at all (less syntax to specify and get wrong), and the required processing can be handled by any language which can do string concatenation (which even includes Fortran).
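The "string concatenation" claim can be sketched in a few lines of Python. This is an illustration under stated assumptions: the prefix map and the header dictionary below are hand-written stand-ins for what an XML or FITS reader would supply, and the function names are hypothetical.

```python
def expand_utype(namespace, local):
    """Rebuild the full UType URI: nothing more than concatenation."""
    return namespace + local


def expand_prefixed(value, prefixes):
    """Expand a 'prefix:local' attribute value using a prefix -> namespace map."""
    prefix, _, local = value.partition(":")
    return expand_utype(prefixes[prefix], local)


# Stand-in for the xmlns:xx declaration in the XML example (assumed input).
xml_prefixes = {"xx": "http://example.edu/myns/1.0#"}
from_xml = expand_prefixed("xx:utype", xml_prefixes)

# Stand-in for the two FITS header cards in the example (assumed input).
fits_header = {"TUTNS4": "http://example.edu/myns/1.0#", "TUTYP4": "utype"}
from_fits = expand_utype(fits_header["TUTNS4"], fits_header["TUTYP4"])

# Both serialisations name the same UType.
print(from_xml == from_fits)
```

Each serialisation only has to say where the namespace and the local part are stored; the UType itself is the same URI in both cases.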
I'm not suggesting that the within-namespace UType be a completely opaque blob of characters. It would be wise for a UType spec to give some very firm guidance about the format of UType strings -- for example indicating that they should reflect any hierarchical structure within the data model. This makes it easier for DM maintainers and application authors to manage or generate them. But I see no need for DM authors to be second-guessed by having the syntax mandated in advance in the UType spec.
1.3 and 1.4, combining models and introducing UFIs
I feel that the notational complexity of these two sections comes about from defining syntax and semantics at the same time.
The two examples in these sections are 'the thing which is the Resolution.PosAngle.Value of a RedshiftFrame.CustomRefPos.Coordinate' and 'the Char.CharacterizationAxis which has UCD X'. These two examples are instances of the same problem: how do you take a complicated idea and turn it into a sequence of characters? Sections 1.3 and 1.4 are two separate ad-hoc solutions, each of which is about as complicated as it could get, and both of which present separate parsing challenges.
A bold solution to this problem is to straightforwardly rewrite the two quoted sentences above in formal terms (yes, in RDF, that's what it's for), which can be as expressive and as extensible as we will ever need, and then, as a conceptually separate step, discuss how to serialise that in an XML, VOTable, or FITS file. The first step is well understood, and the second problem is already solved.
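To make the two-step idea concrete, here is a rough sketch of 'the Char.CharacterizationAxis which has UCD X' written as RDF triples and then serialised as N-Triples, using plain Python tuples rather than an RDF library. Everything specific here is an assumption for illustration: the two example.edu namespaces, the hasUCD property name, and the helper functions are all hypothetical. Only the rdf:type URI is the standard one.

```python
# Hypothetical namespaces, invented for this illustration.
DM = "http://example.edu/char/1.0#"
EX = "http://example.edu/terms#"
# Standard RDF 'type' property.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(uri):
    """Write a URI in N-Triples form."""
    return "<" + uri + ">"


def literal(text):
    """Write a plain string literal in N-Triples form."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


# Step 1: the statement itself, as (subject, predicate, object) triples.
axis = "_:axis"  # blank node standing for "the thing"
triples = [
    (axis, iri(RDF_TYPE), iri(DM + "CharacterizationAxis")),
    (axis, iri(EX + "hasUCD"), literal("X")),  # hasUCD is a made-up property
]


# Step 2: one possible serialisation, kept separate from the modelling step.
def to_ntriples(statements):
    return "\n".join("%s %s %s ." % s for s in statements)


print(to_ntriples(triples))
```

The same list of triples could equally be written out in another form; the modelling in step 1 does not change when the serialisation in step 2 does.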
OK: that would make the spec two pages long, but who's counting?
Best wishes (see you next week!),
Norman
-- ------------------------------------------------------------
Norman Gray : http://nxg.me.uk
eurovotech.org : University of Leicester, UK
Received on 2007-09-22Z19:19:42