Re: Draft list of ucd-words

From: Rob Seaman <seaman-at-noao.edu>
Date: Wed, 29 Jun 2005 08:47:17 -0700


Hi Roy,

> I see questions on this list about precise definitions of UCD, and
> I do not think it is relevant. We are not building a self-
> consistent data model of astronomy, but rather we are pragmatically
> taking the list of quantities that real astronomers have used in
> 5000 real tables at Vizier.

That's precisely my point. If the idea is merely to construct a complete (and indeed, consistent) list of the minimal number of identifiers necessary to capture some large (and growing) set of tables from the astronomical literature - well, then, what is the point of discussion at all? Harvest all the identifiers, correct typos, remove synonyms and you're done. Rather, we appear to be chartered to do something different - to construct a poor man's ontology. Certainly UCDs are often offered up as the pragmatic path out of ontological jungles.

> If a table is published in ApJ that defines the "altitude" of
> their observatory (pos.earth.altitude), it is not for us to say the
> writer was a fool, it is for us to make sure there is a UCD for
> that concept.

Right - and I agree with Steve that this should rather be called "height". But this raises the question of whether we need to provide UCDs for *both* height above the Earth's surface *and* distance from its center. It also raises dozens of other questions for this one little identifier, such as how (or whether) to express the shape of the Earth so that "height" from one table means even approximately the same thing as "altitude" from another.

> Similarly, the STC is irrelevant here. If astronomical tables
> contain RA and Dec, then there should be pos.eq.ra and pos.eq.dec.
> It is not for us to say that RA and Dec do not always have
> unambiguous definitions.

But naming has the implication of defining. By attaching the same UCD to values from two different tables, we are asserting that, for [all | most | many | some | any] purposes, these values represent identical "things". Certainly VO users will take it this way. Why else provide common class identifiers other than to assert a common class?

We are clearly seeking some semantic middle ground between instantiating a separate UCD for each column of each table (e.g., Hertzsprung.1947.fig_1.pos.eq.ra) and the evanescent dream of a full astronomical ontology. If that is not the case, I would argue that, rather than seating a committee to decide these issues, we should seek some purely mechanical process to convert the column headings from each month's new batch of tables from the literature into UCDs.

I think we are precisely building a simple kind of data model of astronomy. It is indeed intended to be self-consistent, but certainly not to be complete (either in extent or expressive ability). Ideally we will understand the holes in the data model as well as the consensus descriptors. In general, holes should be left where simple UCD expressions don't suffice. We will fail if we aim too close to completeness. We will also fail if the UCD "data model" (or whatever folks want to call it) diverges too far from a coherent expression of the domain of astronomy.

The quickest way to wrap up the current exercise is to severely limit the list of UCDs to only contain the least controversial. If columns from 5000 tables have left holes in the list of UCDs, one might expect that this reflects underlying difficulties of astronomical semantics that we will not more easily overcome.

The way to extend a robust UCD process into the future is to provide an explicit namespace mechanism from the start. This will allow capturing controversial or peripheral identifiers such as VOConcept, but will also allow us to wipe the etch-a-sketch clean with new version(s) of subject-dependent list(s) of UCDs. If later we decide that v1:pos.earth.height should indeed have been v2:pos.earth.altitude, there needs to be a lightweight way to make the transition. This is also the way to ensure a speedy process in the future. The alternative is a heavyweight process such as FITS standardization - talk about it for ten years and still leave some folks unhappy and eager to violate the standard.

Better an incomplete list of UCDs than an incorrect one. Better two lists than one.

Rob Seaman
NOAO

Received on 2005-06-29Z17:48:06