Re: A suggested revision for UCDs

From: Norman Gray <norman-at-astro.gla.ac.uk>
Date: Wed, 22 Oct 2003 18:43:48 +0100 (BST)

Greetings, all, and Tom in particular.

On Tue, 21 Oct 2003, Thomas McGlynn wrote:

>
> A few minutes ago I uploaded a version of my suggested revised
> proposal for UCDs to the Twiki. This is just a Word version since
> I don't have a PDF generator handy. The URL is
> http://www.ivoa.net/internal/IVOA/IvoaUCD/UCD-1.9.9b.doc

I've appended a (longish) set of comments below. I've just noticed that Bob has forwarded a long set of comments to the list. I haven't read those yet.

By the way, I notice that this announcement/discussion has been posted to no fewer than _three_ lists, namely ucd, dm and dal. It would be at the very least neater if it were on only one -- ucd-at-ivoa.net is the obvious one. What do folk think -- are there folk on the other two lists who have an interest in this and aren't on the ucd-at-ivoa.net list?

I'm sure I'm not the only one to find Tom's proposal very thought-provoking. The suggestions bring up several new use-cases; and the idea of the `local' atom in particular is valuable, and addresses a gap in the 1.9.9 proposals (though I'd put it in a different place). I think there are very likely several places in the 1.9.9 proposals which are underspecified, and some where I personally would probably explain things slightly differently from Roy and Sebastien, but these are editorial matters.

I have a few difficulties with some aspects of Tom's proposal, however, which I'll discuss here, and add a few more general remarks at the end. I'm speaking for myself of course, rather than the group of authors, and thus it's probable that my opinion and interpretation of some 1.9.9 points is at variance with others in the group, or goes beyond what the document aims to say (which would be a useful datapoint).

Most urgent, I think, is Tom's discussion, in his section 4.5, of the distinction between his proposals and the 1.9.9 ones. This discussion is crucial, since its criticisms are what would ultimately justify replacing the 1.9.9 proposals with Tom's more complicated ones.

In the 1.9.9 proposals, the function of a word is always the same: some things such as `src' are concepts (and only concepts), and every other word names a property. The distinction is that concepts can't have a value, but can have properties; and a property always has a value. Now, the property;concept _pair_ also names a concept, which can therefore have properties in turn (in principle this has the same potential as Tom's proposals for generating long UCDs, but that is probably very unlikely in practice). There will doubtless be some rather formal language which makes this cast-iron, but it's actually fairly intuitive once you get the property/concept dichotomy and read `;' as `of a' or something like that.
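
To make the `of a' reading concrete, here is a minimal sketch in Python. The function name `gloss` and the example UCD `phot.flux;src` are illustrative assumptions rather than anything taken from the 1.9.9 text; the only thing the sketch relies on is that words are separated by `;'.

```python
def gloss(ucd: str) -> str:
    """Read a UCD aloud, treating each ';' as 'of a'.

    Illustrative only: this is one informal reading of the syntax,
    not a definition from the 1.9.9 document.
    """
    words = [w.strip() for w in ucd.split(";")]
    if not all(words):
        raise ValueError(f"empty word in UCD: {ucd!r}")
    return " of a ".join(words)


# Hypothetical example: a flux property attached to the `src' concept.
print(gloss("phot.flux;src"))
```

Under these assumptions the example prints `phot.flux of a src`, which is the whole of the intuition: the left-most word names the property that carries the value.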

Section 3.1 in the 1.9.9 proposals -- the crucial section of the document, for which everything else is to some extent just scaffolding, and without which the rest of the document makes rather less sense -- is what attempts to describe this. Perhaps that explanation needs work. At any rate, I do not believe that one has to sign up to the (basically ontology-inspired) language in that section in order to use the UCDs thus justified. Indeed, it might be useful for that section to be split into two, one to communicate the underlying idea to folk who simply want to _use_ UCDs, and another to reexpress it more formally for the ontology enthusiasts.

In his section 4.5, Tom also remarks that ``Indeed I'm not sure that any string of words can be determined to be illegal in the old scheme''. I'd probably agree in outline: there are significantly fewer rules necessary in the 1.9.9 proposals than in Tom's proposals. The only place a base concept can go is in the right-most position, and thus you can't have a concept sitting on its own, since the left-most position is the name of the property, the value of which is the number/column/whatever which has been annotated by this UCD (the syntactic mechanism for making that annotation is outside of scope for the UCD proposals, I'd think). Also, there are some property-concept pairs that make no sense, such as stat.err;src. But that's about it -- you don't need any more rules than that.
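
Those few rules are small enough to write down directly. The following is a sketch under stated assumptions: the set `BASE_CONCEPTS` contains only `src' (the one pure concept named in this message), `NONSENSE_PAIRS` is a hypothetical blacklist seeded with the one example above, and `check_ucd` is an invented name, not part of any proposal.

```python
# Assumption: `src' is the only pure concept named in this message;
# a real vocabulary would list more.
BASE_CONCEPTS = {"src"}

# Assumption: a hand-maintained list of property;concept pairs that make no sense.
NONSENSE_PAIRS = {("stat.err", "src")}


def check_ucd(ucd):
    """Return a list of rule violations; an empty list means the UCD passes."""
    words = ucd.split(";")
    problems = []

    # A concept has no value, so it cannot annotate anything on its own.
    if len(words) == 1 and words[0] in BASE_CONCEPTS:
        problems.append(f"concept {words[0]!r} cannot stand on its own")

    # A base concept may appear only in the right-most position.
    for word in words[:-1]:
        if word in BASE_CONCEPTS:
            problems.append(f"base concept {word!r} is not in the right-most position")

    # Some adjacent pairs are simply meaningless.
    for left, right in zip(words, words[1:]):
        if (left, right) in NONSENSE_PAIRS:
            problems.append(f"pair {left};{right} makes no sense")

    return problems


print(check_ucd("phot.flux;src"))
print(check_ucd("stat.err;src"))
```

The first call returns an empty list and the second reports the nonsensical pair. Note that the checker needs a vocabulary lookup only to reject UCDs, never to split them into parts.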

Tom constructs an `arith.diff;arith.sum;phot.flux;...' UCD. That does look unwieldy (but note there's no need for parentheses in the 1.9.9 proposals), but I get the impression that the `arith' UCD tree was to some extent a kite being flown, and I for one would be surprised if it made it much beyond this version, partly because it would seem to encourage such odd-looking UCDs. Also, there's no tying of one table to another in the 1.9.9 proposals -- I'd think that was out of scope for UCD (and quite properly so: I'll mention this below).

The 1.9.9 proposals allow no ambiguity in the way that UCDs are written: properties queue up in front of the single base concept, and ordering matters, so that stat.max;stat.err;phot.flux is different from stat.err;stat.max;phot.flux.
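
One way to see why the ordering matters is to fold a UCD from the right into nested (property, concept) pairs, so that each pair is itself the concept for the next property to its left. This is a sketch of one interpretation only; the function name `nest` and the right-to-left folding are assumptions about how the pairing composes.

```python
def nest(ucd):
    """Fold a UCD right-to-left into nested (property, concept) pairs."""
    words = ucd.split(";")
    concept = words[-1]  # the right-most word is where the nesting starts
    for prop in reversed(words[:-1]):
        concept = (prop, concept)  # a property;concept pair names a new concept
    return concept


a = nest("stat.max;stat.err;phot.flux")
b = nest("stat.err;stat.max;phot.flux")
print(a)       # ('stat.max', ('stat.err', 'phot.flux'))
print(a == b)  # False
```

The two structures differ because the first is the maximum of an error and the second is the error of a maximum.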

More specific points in Tom's proposal, in document order rather than any other (section references are to Tom's document):

Section 4.1: Bringing the number of terms up to three -- concept, attribute and modifier -- reminds me of the qualifier/modifier idea that was in previous versions of the draft, which I still think is an unstable distinction, and which Roy and Sebastien thankfully managed to get rid of by simplifying the syntax down to just concept plus properties (but see below). Also, there's no syntactic distinction between modifiers and attributes, so in order to apply the extra ordering rules for those, or even to break the UCD into its three parts, you have to know which words are of which type. That is, you can't do it at parse time.

Section 4.1.2 (not an important point, I don't think): I'm puzzled at the requirement that words in the non-standard namespace must be distinct from all words in the IVOA namespace. The point of having a namespace is to make this possible, or (since such duplication would surely be condemned as bad practice) at least not an error. The rule also means that if a new word were added to the IVOA namespace which happened to match a word in a private namespace, the namespaced UCDs would thereby suddenly become invalid, with no change in the spec.

Section 4.2.2: The `intent' modifier has no corresponding notion in the 1.9.9 proposals, and it's not clear to me where in those proposals it would fit in; I think this is a _problem_ for the 1.9.9 proposals. I can see how it would fit into what I take the underlying 1.9.9 model to be, but not into the serialisation of that model that the 1.9.9 syntax represents. I can see three approaches to this problem within the general framework of the 1.9.9 proposals.

(i) Rule it out of scope: it's not UCD's problem to talk about what values are intended to be, since UCDs are only for data discovery, and are not required to be capable of driving analysis, so that if this `intent' distinction matters to you, you're going to have to understand the utype somehow.

(ii) Add modifiers like this to the 1.9.9 model and syntax: that's potentially quite a lot of work, since it would require thinking very clearly about just what the distinction is between modifiers and properties, _and_ working out a usable syntax for adding them in -- they _have_ to be distinguishable at parse time.

(iii) Think about it more and discover a way they can be viewed as properties in a principled way.

The point isn't just about this `intent' modifier: if we can convince ourselves that there are things like `intent' (and that they're in scope) which are in principle qualitatively distinct from properties (and I would at least dispute that `em' and `frame' count here), then that has to be dealt with. Perhaps this example will help us find the stable distinction between `qualifiers' and `modifiers' that escaped us in earlier versions.

Section 4.2.3: The `value', `vector', `instance' and `multiplet' attributes seem overly complicated. The `value' attribute is not required in the 1.9.9 proposals because all properties have a value, namely the value they're being used to annotate. The other three seem artefacts of the `complex UCDs' which Tom is introducing in these proposals.

These complex UCDs seem problematic to me because they seem tightly bound to VOTable. That destroys the orthogonality of the UCD and VOTable specs (the W3C has had _terrible_ trouble with non-orthogonal specs, tying itself in knots trying to resolve their dependencies on each other), and makes it harder to use UCDs in other contexts, such as queries.

I feel that UCDs should be seen as annotating a `thing', whether that `thing' be a value, a column, a group, or a query `phrase', and it should be the responsibility of whatever defines the syntax of that annotation (that is, VOTable or SIA) to define precisely what the thing is that the annotation applies to. Thus, VOTable might say that when a UCD appears in a <field> then it indicates a set of relationships between the corresponding entries of the table; when it appears in a <group> it means something different; and so on.

Dealing with the typing and complexity issues of this in a general way within the UCD spec would surely make it impossibly unwieldy and limit its scope. This is also a general worry for all of Tom's Section 5; I really think this should be out of scope for UCD, to the extent that Tom's ``The grouping does not describe the semantics of the relationship. That is the role of UCDs'' would be much better as ``The grouping describes (some of?) the semantics of the relationship. That is not the role of UCDs''. This is a can of worms.

Section 4.2.3 (local): I agree this is a gap in the 1.9.9 proposals. Another way of dealing with it would be to say that a UCD <word> `local.X' meant exactly the same as the <word> `X', but was not comparable with it.
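
As a sketch of what that alternative might look like in code: the prefix string `local.` comes from the suggestion above, but the function names `meaning` and `comparable`, and the choice that a local word is comparable with nothing at all, are assumptions made purely for illustration.

```python
LOCAL_PREFIX = "local."


def meaning(word):
    """Return the word whose meaning a possibly-local word shares."""
    if word.startswith(LOCAL_PREFIX):
        return word[len(LOCAL_PREFIX):]
    return word


def comparable(a, b):
    """Assumed rule: two words may be compared only if neither is local."""
    return not (a.startswith(LOCAL_PREFIX) or b.startswith(LOCAL_PREFIX))


print(meaning("local.phot.flux"))                  # phot.flux
print(comparable("local.phot.flux", "phot.flux"))  # False
```

A client could then display or describe a `local.X' value exactly as it would an `X' value, while declining to join or match on it across datasets.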

More general points:

Tom's document seems to discuss his proposals in object terms. However, the property-concept parts of the UCD proposal are _not_ an object model, and if you cram them into an object model, they won't fit, and the result will inevitably look like a mess, and look backwards. The model is simpler than this: things which are purely concepts (such as `src') don't have values. Concepts do have properties though, and these properties have numeric values, namely the numeric values we're trying to annotate with this UCD.

As regards ordering, yes, as Tom said, it doesn't fundamentally matter, and it's just a matter of syntax, rather than of the model. However, having the property first seems natural, since it's this which possesses the numerical value which is being annotated, and so it's this which I would have thought would best be shown up-front.

Now, there is a _vague_ object model implicit in the construction of the UCD words like `pos.eq.ra', but this is only because, along with the replacement of underscores with dots, came the explicit freedom to crop each word at a dot from the right, and use the result as a UCD word also. This prompts a natural perception of the words as hierarchical, or object-oriented if you must. The actual words are basically little changed from the original UCDs, though there's a review of these under way. These words weren't the main point of the UCD2 proposals.
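
The cropping freedom is easy to illustrate. In this minimal sketch the function name `croppings` is invented, and whether every cropped form is actually a sensible UCD word is a matter for the vocabulary review, not something the code decides.

```python
def croppings(word):
    """List a UCD word followed by each form got by cropping at a dot from the right."""
    parts = word.split(".")
    return [".".join(parts[:n]) for n in range(len(parts), 0, -1)]


print(croppings("pos.eq.ra"))  # ['pos.eq.ra', 'pos.eq', 'pos']
```

Each cropped form is a prefix of the one before it, which is what makes the words look hierarchical.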

At present these words are those mined from the column names actually occurring in the databases in the CDS collection; they are thus unprincipled. Whether this is a good or a bad thing is an open question. I'm sure it is this which causes some people (I'm thinking of Gerard Lemson and Pat Dowler) to gasp and, in their poster, pick out pos_eq_ra for special deprecation as incoherent. If you believe that principled generation of UCD words would be a Good Thing (and that would probably be my prejudice), then I suspect that paths in (say) Gerard and Pat's model would be a good way to do it (do Gerard and Pat claim that every UCD word is thus expressible?). If you believe, on the other hand, that the mined nature of the words is of primary importance (and I can see the force of that, too), then they might need little more than a review or tidy-up, to make sure that the `croppability' is reasonable in fact, and that the implications, or suggestions, of the words chosen do in fact fit in with a properties-based model (or whatever we end up with).

Phew! I think that's probably quite enough for just now -- I should let someone else get a word in.

All the best,

Norman

-- 
---------------------------------------------------------------------------
Norman Gray                        http://www.astro.gla.ac.uk/users/norman/
Physics and Astronomy, University of Glasgow, UK     norman-at-astro.gla.ac.uk
Received on 2003-10-22Z17:49:21