Re: Draft list of ucd-words

From: Norman Gray <norman-at-astro.gla.ac.uk>
Date: Wed, 29 Jun 2005 18:32:14 +0100

Greetings,

On 2005 Jun 29, at 16.47, Rob Seaman wrote:

>> If a table is published in ApJ that defines the "altitude" of
>> their observatory (pos.earth.altitude), it is not for us to say
>> the writer was a fool, it is for us to make sure there is a UCD
>> for that concept.
>
> Right - and I agree with Steve that this should rather be called
> "height".

The Gene Ontology folk have described a few key ideas that helped them create a big and very successful ontology. Though GO has problems -- some quite fundamental -- non-ontologists do actually use it to do actual science. Some of these `rules' are rather obvious, such as `involve users', but Rob's message is a hook for a couple of the non-obvious ones.

GO uses opaque labels for every concept, so that pos.earth.altitude and pos.earth.height would both be ucd12345 or something. They do this (i) to avoid arguments about which noun it should be, (ii) so they can version them easily (when are we going to have a fight about replacing pos.earth.height with pos.earth.height2? whereas no-one has a sentimental attachment to ucd12345), (iii) so users aren't tempted to use their intuition about what the labels mean, and are forced to use the project's tools to discover and resolve labels, and (iiia) to make it transparently clear to everyone that the label really doesn't matter -- it's just a string which keys to a careful description elsewhere.
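To make the opaque-label idea concrete, here is a minimal sketch in Python of a lookup where both nouns resolve to the same concept. Everything in it is an assumption for illustration: the `Concept` class, the `REGISTRY` dictionary, the `resolve` function and the definition text are hypothetical, and this is not how GO or any UCD tool actually stores things.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    label: str         # opaque key, e.g. "ucd12345"
    names: tuple       # human-readable synonyms, none of them authoritative
    definition: str    # the careful description that the label keys to


# Hypothetical registry; the definition text is made up for illustration.
REGISTRY = {
    "ucd12345": Concept(
        label="ucd12345",
        names=("pos.earth.altitude", "pos.earth.height"),
        definition="Illustrative text only: a vertical position on or "
                   "near the Earth, with no datum specified.",
    ),
}


def resolve(name_or_label):
    """Return the Concept for an opaque label or for any of its synonyms."""
    if name_or_label in REGISTRY:
        return REGISTRY[name_or_label]
    for concept in REGISTRY.values():
        if name_or_label in concept.names:
            return concept
    raise KeyError(name_or_label)


# Both nouns land on the same concept, so the choice of noun stops mattering.
same = resolve("pos.earth.altitude") is resolve("pos.earth.height")
```

The only thing a user can rely on here is the definition reached through `resolve`, which is the behaviour points (iii) and (iiia) are after.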

I'm not suggesting replacing all the UCDs with numbers (don't worry), but suggesting that it might be best to avoid `obvious' names, and even that it might be a virtue rather than a vice to have names which are mnemonic but still vague enough to force folk to look up careful definitions _before_ they use them.

> But this begs the question of whether we need to provide UCDs for
> *both* height above the Earth's surface *and* distance from its
> center.

Agreeing with what I take to be Rob's argument, I think this is settled by an observation that these distinct concepts do or do not appear in multiple published tables, where `multiple' represents some vague threshold number less than ten but definitely more than one.

> And dozens of other questions for this one little identifier such
> as how (or whether) to express the shape of the earth such that
> "height" from one table means even approximately the same thing as
> "altitude" from another.

However I think any discussion about the geoid implies an attempt to push UCDs well beyond what I believe to be their valuably constrained scope. Thus the description of pos.earth.height should carefully _not_ say which datum it is relative to, and draw attention to the fact that it is not saying this!

This is because...

> By attaching the same UCD to values from two different tables we
> are asserting that for [all | most | many | some | any] purposes
> that these values represent identical "things".

It seems to me that the purpose in question is `find tables I might be interested in', and that the equivalence relation here is not one of identity. My perception is that the UCD word set has been successful, and should remain successful, to the extent that it abides by two principles: that the word list is small enough and mnemonic enough that folk are likely to remember a respectable proportion of the words, and that it serves those use-cases where false positives are good. That set includes searching for tables, and excludes driving processing.
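A small sketch of the `find tables' use-case, in Python, may make the point about false positives clearer. The table names, the `CATALOGUE` structure, the word example.other and the `find_tables` function are all hypothetical; only pos.earth.height comes from this thread.

```python
# Hypothetical catalogue: table name -> {UCD on a column: what the column
# actually holds in that particular table}.
CATALOGUE = {
    "table_a": {"pos.earth.height": "height above a local datum"},
    "table_b": {"pos.earth.height": "height above some other reference surface"},
    "table_c": {"example.other": "an unrelated quantity"},
}


def find_tables(ucd, catalogue=CATALOGUE):
    """Return the names of tables that have at least one column tagged with ucd."""
    return sorted(name for name, columns in catalogue.items() if ucd in columns)


# Both height tables come back even though their columns are not directly
# comparable. For discovery that is the desired result: the reader inspects
# each table's own description before doing anything numerical with it.
hits = find_tables("pos.earth.height")
```

The search deliberately stops at the shared label. Nothing in it tries to decide whether the two columns are the same quantity, which is the part I would keep out of scope.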

It might be a bit late to edit things now, but I have said before that I feel the UCD document would benefit from a bolder statement of its scope, and of its non-goals.

> The way to extend a robust UCD process into the future is to
> provide an explicit namespace mechanism from the start. This will
> allow capturing controversial or peripheral identifiers such as
> VOConcept, but will also allow us to wipe the etch-a-sketch clean
> with new version(s) of subject dependent list(s) of UCDs. If later
> we decide that v1:pos.earth.height should indeed have been
> v2:pos.earth.altitude, there needs to be a lightweight way to make
> the transition.

Hear, hear. One of the other GO maxims -- up there with `involve the users' -- was `version the ontology'. It seems the GO is in more-or-less continuous flux, but this simply doesn't matter, because they recognised at the outset that they were going to have to do this, and so designed their formalism and tools to cope/help with it.
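As a rough illustration of what a lightweight transition might look like, here is a Python sketch that takes Rob's v1:/v2: prefix notation at face value. The `SUCCESSORS` table and the `split_ucd` and `upgrade` helpers are assumptions of mine, not anything in the UCD document or the GO tools.

```python
# Hypothetical successor table: a retired identifier -> its replacement.
SUCCESSORS = {
    "v1:pos.earth.height": "v2:pos.earth.altitude",
}


def split_ucd(ucd):
    """Split 'v1:pos.earth.height' into ('v1', 'pos.earth.height').

    An identifier with no namespace prefix gets an empty version string.
    """
    version, sep, word = ucd.partition(":")
    if not sep:
        return "", ucd
    return version, word


def upgrade(ucd, successors=SUCCESSORS):
    """Follow the successor chain until reaching an identifier still in use."""
    seen = set()
    while ucd in successors:
        if ucd in seen:
            raise ValueError("cycle in successor table at " + ucd)
        seen.add(ucd)
        ucd = successors[ucd]
    return ucd


current = upgrade("v1:pos.earth.height")
```

Old tables keep the identifier they were published with, and a tool applies `upgrade` when it reads them, so a change of mind costs one line in the table rather than an edit to every published file.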

I hope this helps. All the best,

Norman

-- 
----------------------------------------------------------------------
Norman Gray  :  Physics & Astronomy, Glasgow University, UK
http://www.astro.gla.ac.uk/users/norman/  :  www.starlink.ac.uk
Received on 2005-06-29Z19:32:41