Hi Norman,
Thanks for your comments. I've got responses
to some of them, and I'm glad you found some good in this!
In keeping with your suggestion
I've only sent this to the UCD group. I sent
the original proposal to the other groups
since I thought the use of complex UCDs is relevant,
but I'm sure we can live without three copies and hopefully
they have subscribed so they can hear this interesting(!?)
debate.
Tom
Norman Gray wrote:
...
>
> Most urgent, I think, is Tom's discussion, in his section 4.5,
... That was really meant to be at the end of section 4, rather than
part of 4.5...
>... of the
Well there are two distinct issues here.
First lexically how can I tell what is a property and what is a concept? In 1.9.9 there is no way to tell. The only lexical statement it makes is that the first word is a property. What about the second word, or the third? The 1.9.9 proposal allows properties that modify other properties.
The second issue is the semantic confusion.
The second word can be (according to section 4.2 in 1.9.9)
a concept referred to
another property referred to
information related to the primary word.
In section 3.4 in defining the error in RA of a galaxy we have the phrase
We identify the central property as "error", and the concept as "right ascension", with a subsidiary word about "galaxy".
So in fact version 1.9.9 has all three of concept, property and modifier, but just tries to hide the fact and doesn't give you any way of telling which is which... Suppose I give one a UCD of
word1;word2;word3
Does word3 modify word2, or do both of them modify word1? No way to tell.
stat.error;phot.flux;em.optical (word3 modifies word2) or
phot.flux;em.optical;src.galaxy (word3 modifies word1)
There is an explicit statement that some things (pos.eq.ra) are sometimes concepts and sometimes secondary words and the rules make it trivial to build UCDs that are simply incomplete semantically.
stat.error
is a perfectly valid UCD but it has no rooted semantic content. You may say
"Well what about meta.id, isn't that the same?" Not really because
I can suggest (and indeed I do suggest) that UCDs need to be
interpreted in the context of what the table is about. So for a
source table meta.id refers to the id for a source, for an observation
table meta.id refers to the id for an observation. But what does
stat.error refer to? The error in the source? That doesn't make
sense.
>
> Section 3.1 in the 1.9.9 proposals -- the crucial section of the document,
> for which everything else is to some extent just scaffolding, and without
> which the rest of the document makes rather less sense -- is what attempts
> to describe this. Perhaps that explanation needs work. At any rate, I do
> not believe that one has to sign up to the (basically ontology-inspired)
> language in that section in order to use the UCDs thus justified.
> Indeed, it might be useful for that section to be split into two, one to
> communicate the underlying idea to folk who simply want to _use_ UCDs,
> and another to reexpress it more formally for the ontology enthusiasts.
>
> In his section 4.5, Tom also remarks that ``Indeed I'm not sure that any
> string of words can be determined to be illegal in the old scheme''.
> I'd probably agree in outline: there are significantly fewer rules
> necessary in the 1.9.9 proposals than in Tom's proposals. The only place
> a base concept can go is in the right-most position, and thus you can't
> have a concept sitting on its own, since the left-most position is the
> name of the property, the value of which is the number/column/whatever
> which has been annotated by this UCD (the syntactic mechanism for making
> that annotation is outside of scope for the UCD proposals, I'd think).
> Also, there are some property-concept pairs that make no sense, such
> as stat.err;src. But that's about it -- you don't need any more rules
> than that.
>
> Tom constructs an `arith.diff;arith.sum;phot.flux;...' UCD. That does
> look unwieldy (but note there's no need for parentheses in the 1.9.9
> proposals), but I get the impression that the `arith' UCD tree was to
> some extent a kite being flown, and I for one would be surprised if it
> made it much beyond this version, partly because it would seem to encourage
> such odd-looking UCDs. Also, there's no tying of one table to another
> in the 1.9.9 proposals -- I'd think that was out of scope for UCD (and
> quite properly so: I'll mention this below).
>
Sorry that's a typo... Should have said tying of one column to another.
> The 1.9.9 proposals allow no ambiguity in the way that UCDs are
> written: properties queue up in front of the single base concept, and
> ordering matters, so that stat.max;stat.err;phot.flux is different
> from stat.err;stat.max;phot.flux.
What I call attributes and the properties you specify here are indeed largely unambiguous in both cases. However what I call modifiers and what 1.9.9 calls either subsidiary words or 'information related to the primary word' are less clear. E.g., suppose I'm detecting circularly polarized light in the radio. The natural UCD for this would be:
phot.flux;em.radio;em.polarized;circular
or is it
phot.flux;em.polarized;circular;em.radio
or do we have to multiply the size of the vocabulary to add polarized and polarized.circular (and the other variants) to every wavelength spec we have? That seems silly...
So we need to fix that.
There's a similar problem with arith (another nail for the coffin perhaps): is it arith.sum;property1;property2 or arith.sum;property2;property1?
And this general idea that properties can refer to other properties in an uncontrolled way...
Here's a UCD describing the flux of galaxies...
phot.flux;em.optical;src.galaxy
or is it
phot.flux;src.galaxy;em.optical
?
E.g., suppose I have a column that is the maximum flux from any of three wavebands. Can I write
stat.max;phot.flux;em.optical;phot.flux;em.xray;phot.flux;em.radio
I hope not, but the document seems to encourage it. This would be illegal in the revision since it includes three base concepts.
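To make the "only one base concept" rule concrete, here is a minimal Python sketch of such a check. The BASE_CONCEPTS set and the function names are made up for illustration; a real checker would take the list of concepts from whatever vocabulary we agree on.

```python
# Hypothetical list of base-concept words, for illustration only.
BASE_CONCEPTS = {"phot.flux", "pos.eq.ra"}

def count_base_concepts(ucd):
    """Count how many words in a ';'-separated UCD are base concepts."""
    return sum(1 for word in ucd.split(";") if word in BASE_CONCEPTS)

def has_single_base_concept(ucd):
    """True if the UCD contains exactly one base concept."""
    return count_base_concepts(ucd) == 1

# The "maximum over three wavebands" UCD repeats the concept three times,
# so it fails the check.
bad = "stat.max;phot.flux;em.optical;phot.flux;em.xray;phot.flux;em.radio"
print(count_base_concepts(bad), has_single_base_concept(bad))
```

A UCD such as phot.flux;em.optical passes, since it names the concept once.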
>
>
>
>
> More specific points in Tom's proposal, in document order rather than
> any other (section references are to Tom's document):
>
> Section 4.1: Bringing the number of terms up to three -- concept,
> attribute and modifier -- reminds me of the qualifier/modifier idea
> that was in previous versions of the draft, which I still think is an
> unstable distinction, and which Roy and Sebastien thankfully managed
> to get rid of by simplifying the syntax down to just concept plus
> properties (but see below).
... but they haven't, they have just not told you about the difference. The words stat.max, em.optical, and phot.flux have distinct grammar rules in how they are used in 1.9.9 but you have no way to tell that.
I.e.,
phot.flux can appear as the initial word or any subsidiary word but can never appear before stat.max or any other word of the class I would call attribute.
em.optical can appear after words of the same class as phot.flux and possibly after words of the same class as itself.
stat.max can appear anywhere in a UCD but it really should appear before either a word of the class of phot.flux or a word of its own class.
There are three kinds of words and we should just recognize that in the grammar.
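Here is a rough Python sketch of what recognizing the three classes could look like, using the 1.9.9 word order. The WORD_CLASS table is a made-up sample, and the three checks are only my reading of the rules above (a concept never precedes an attribute; a modifier directly follows a concept or another modifier; an attribute directly precedes a concept or another attribute), not text from either proposal.

```python
# Hypothetical word-class table; a real one would come from the vocabulary.
WORD_CLASS = {
    "phot.flux": "concept",
    "em.optical": "modifier",
    "em.radio": "modifier",
    "stat.max": "attribute",
    "stat.error": "attribute",
}

def check_ucd(ucd):
    """Return a list of rule violations; an empty list means none were found."""
    words = ucd.split(";")
    classes = [WORD_CLASS.get(w) for w in words]
    problems = []
    for i, (word, cls) in enumerate(zip(words, classes)):
        if cls is None:
            problems.append(f"{word}: unknown word")
            continue
        prev_cls = classes[i - 1] if i > 0 else None
        next_cls = classes[i + 1] if i + 1 < len(words) else None
        if cls == "concept" and "attribute" in classes[i + 1:]:
            problems.append(f"{word}: concept appears before an attribute")
        elif cls == "modifier" and prev_cls not in ("concept", "modifier"):
            problems.append(f"{word}: modifier must follow a concept or modifier")
        elif cls == "attribute" and next_cls not in ("concept", "attribute"):
            problems.append(f"{word}: attribute must precede a concept or attribute")
    return problems

print(check_ucd("stat.max;phot.flux;em.optical"))
print(check_ucd("phot.flux;stat.max"))
```

The point is only that once each word carries a class, checks like these become a table lookup plus a few comparisons.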
Also, there's no syntactic distinction
> between modifiers and attributes, so in order to apply the extra
> ordering rules for those, or even to break the UCD into its three
> parts, you have to know which words are of which type. That is, you
> can't do it at parse time.
Sure you can. At least if the number of modifiers remains small. Note that table writers should have access to appropriate documentation when writing their tables (or writing the software that writes tables) so even if it gets more complex the writers have no problems and the readers don't care. See my response to Bob on this issue.
I've suggested that all modifiers be put in the frame tree, though largely
to address this issue.
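If every modifier does live under frame, then telling modifiers apart at parse time needs no word list at all, only a prefix test. A minimal Python sketch, assuming that layout; the word frame.em.optical is a hypothetical example, not a word from either proposal.

```python
def is_modifier(word):
    """Assumption: every modifier is 'frame' itself or lives under 'frame.'."""
    return word == "frame" or word.startswith("frame.")

def split_modifiers(ucd):
    """Split a ';'-separated UCD into (other words, modifier words)."""
    words = ucd.split(";")
    others = [w for w in words if not is_modifier(w)]
    modifiers = [w for w in words if is_modifier(w)]
    return others, modifiers

# 'frame.em.optical' is a made-up word used only to show the split.
print(split_modifiers("phot.flux;frame.em.optical"))
```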
>
> Section 4.1.2 (not an important point, I don't think): I'm puzzled at
> the requirement that words in the non-standard namespace must be
> distinct from all words in the IVOA namespace. The point of having a
> namespace is to make this possible, or (since such duplication would
> surely be condemned as bad practice) at least not an error. The rule
> also means that if a new word were added to the IVOA namespace which
> happened to match a word in a private namespace, the namespaced UCDs
> would thereby suddenly become invalid, with no change in the spec.
>
This idea is copied from the previous proposal. I think the idea is that we don't want proliferation of new uncontrolled UCDs. I put this in a separate section, but I believe the content is the same as the previous proposal. I leave it to others to decide which is right.
> Section 4.2.2: The `intent' modifier has no corresponding notion in
> the 1.9.9 proposals, but it's not clear to me where in those proposals
> this would fit in, and I think this is a _problem_ for the 1.9.9
> proposals. I can see how it would fit in to what I take the
> underlying 1.9.9 model to be, but not into the serialisation of that
> model that the 1.9.9 syntax represents. I can see three approaches to
> this problem within the general framework of the 1.9.9 proposals. (i)
> Rule it out of scope: it's not UCD's problem to talk about what values
> are intended to be, since they're only for data discovery, and are not
> required to be capable of driving analysis, so that if this `intent'
> distinction matters to you, you're going to have to understand the utype
> somehow.
That's not acceptable for observation tables. We frequently have multiple columns in a table which differ only in intent (proposed and actual exposure times, predicted and actual times of events, predicted and actual fluxes) and we need to know which to use for various purposes. Spectral fitting will be poorly served if we can't distinguish the calculated and actual spectra. What happens when we want to compare simulated and real data?
(ii) Add modifiers like this to the 1.9.9 model and syntax:
> that's potentially quite a lot of work, since it would require
> thinking very clearly about just what the distinction is between
> modifiers and properties, _and_ working out a usable syntax for adding
> them in -- they _have_ to be distinguishable at parse time.
I'm happy to trade intent for frame.human and put all the modifiers in frame.
(iii)
> Think about it more and discover a way they can be viewed as
> properties in a principled way. The point isn't just about this
> `intent' modifier: if we can convince ourselves that there are things
> like `intent' (and that they're in scope) which are in principle
> qualitatively distinct from properties (and I would at least dispute
> that `em' and `frame' count here), then that has to be dealt with.
> Perhaps this example will help us find the stable distinction between
> `qualifiers' and `modifiers' that escaped us in earlier versions.
Personally I take a modifier as something that limits the context of
a concept.
>
> Section 4.2.3: The `value', `vector', `instance' and `multiplet'
> attributes seem overly complicated. The `value' attribute is not
> required in the 1.9.9 proposals because all properties have a value,
> namely the value they're being used to annotate.
The word value is the price I pay for making sure attributes and concepts are distinct. Personally I think it's worth it.
The other three seem
> artefacts of the `complex UCDs' which Tom is introducing in these
> proposals.
Vector is not... It's simply to warn the user that the column is a vector. While VOTables have an array attribute that does this, I don't want to tie this proposal to VOTables... More on that below.
.These complex UCDs seem problematic to me because they
> seem tightly bound to VOTable. That destroys the orthogonality of the
> UCD and VOTable specs (the W3C has had _terrible_ trouble with
> non-orthogonal specs, tying itself in knots trying to resolve their
> dependencies on each other), and makes it harder to use UCDs in other
> contexts, such as queries. I feel that UCDs should be seen as
> annotating a `thing', whether that `thing' be a value, a column, a
> group, or a query `phrase', and it should be the responsibility of
> whatever defines the syntax of that annotation (that is, VOTable or
> SIA) to define precisely what the thing is that the annotation applies
> to. Thus, VOTable might say that when a UCD appears in a <field> then
> it indicates a set of relationships between the corresponding entries
> of the table; when it appears in a <group> it means something
> different; and so on. Dealing with the typing and complexity issues
> of this in a general way within the UCD spec would surely make it
> impossibly unwieldy and limit its scope. This is also a general worry
> for all of Tom's Section 5; I really think this should be out of scope
> for UCD, to the extent that Tom's ``The grouping does not describe the
> semantics of the relationship. That is the role of UCDs'' would be
> much better as ``The grouping describes (some of?) the semantics of
> the relationship. That is not the role of UCDs''. This is a can of
> worms.
I think this is completely wrong. The grouping proposal has no special
relationship to VOTables other than that they happen to support it.
[Or they may soon!] Any other
structure that supports groupings of tables would do just as well.
This is a fairly natural attribute of object relational as well as
hierarchical databases. It's just that VOTables have finally decided
to enable the natural abilities that XML's hierarchical structure
supports.
>
> Section 4.2.3 (local): I agree this is a gap in the 1.9.9 proposals.
> Another way of dealing with it would be to say that a UCD <word>
> `local.X' meant exactly the same as the <word> `X', but was not
> comparable with it.
>
That's essentially what my proposal does, modulo the order difference.
>
>
>
>
> More general points:
>
> Tom's document seems to discuss his proposals in object terms.
> However the property-concept parts of the UCD proposal are _not_ an
> object model, and if you cram them into an object model, they won't
> fit, and the result will inevitably look like a mess, and look
> backwards. The model is simpler than this, however: things which are
> purely concepts (such as `src') don't have values. Concepts do have
> properties though, and these properties have numeric values, namely
> the numeric values we're trying to annotate with this UCD.
Sounds like objects and attributes to me...
What's the difference here?
But the old proposal doesn't agree anyway!
Is phot.flux a concept? Seems like it to me. But it's
a property in 1.9.9. Sometimes... In the UCD phot.flux.
But it's sort of a concept in stat.err;phot.flux
Or is it a property there? I'm not sure and there is no
way to tell!
By specifying a value attribute I've cleared away this confusion.
>
> As regards ordering, yes, as Tom said, it doesn't fundamentally
> matter, and it's just a matter of syntax, rather than of the model.
> However having the property first seems natural, since it's this
> which possesses the numerical value which is being annotated, and
> so it's this which I would have thought it best would be shown
> up-front.
This is not critical, but since I believe the model is
analogous to the object/attribute relationship, using the
same order that has conventionally been used there is helpful.
>
> Now, there is a _vague_ object model implicit in the construction of
> the UCD words like `pos.eq.ra', but this is only because, along with
> the replacement of underscores with dots, came the explicit freedom to
> crop each word at a dot from the right, and use the result as a UCD
> word also. This prompts a natural perception of the words as
> hierarchical, or object-oriented if you must.
Well I don't have to but I sure would like to!
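For what it's worth, the cropping rule you describe is easy to state in code. A small Python sketch (the function name is mine, not from either proposal) that lists a word together with every word obtained by cropping it at a dot from the right:

```python
def croppings(word):
    """Return the word followed by each shorter word cropped at a dot from the right."""
    parts = word.split(".")
    return [".".join(parts[:n]) for n in range(len(parts), 0, -1)]

print(croppings("pos.eq.ra"))  # ['pos.eq.ra', 'pos.eq', 'pos']
```

Whether each cropped word is actually meaningful is a separate question for the vocabulary review.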
.. The actual words are
> basically little changed from the original UCDs, though there's a
> review of these under way. These words weren't the main point of the
> UCD2 proposals.
>
> At present these words are those mined from the column names actually
> occurring in the databases in the CDS collection; they are thus
> unprincipled. Whether this is a good or a bad thing is an open question.
> I'm sure it is this which causes some people (I'm thinking of Gerard
> Lemson and Pat Dowler) to gasp and, in their poster, pick out pos_eq_ra
> for special deprecation as incoherent. If you believe that principled
> generation of UCD words would be a Good Thing (and that would probably
> be my prejudice), then I suspect that paths in (say) Gerard and Pat's
> model would be a good way to do it (do Gerard and Pat claim that every
> UCD word is thus expressible?). If you believe, on the other hand, that
> the mined nature of the words is of primary importance (and I can see
> the force of that, too), then they might need little more than a review
> or tidy-up, to make sure that the `croppability' is reasonable in fact,
> and that the implications, or suggestions, of the words chosen do in
> fact fit in with a properties-based model (or whatever we end up with).
>
>
As in 1.9.9 I didn't build a complete list but I agree that most words will transfer between the two proposals.
Received on 2003-10-22Z20:20:11