UCDs vs ontologies?

From: Rob Seaman <seaman-at-noao.edu>
Date: Wed, 1 Jun 2005 08:32:36 -0700


On Jun 1, 2005, at 6:54 AM, Pierre Didelon wrote:

> But do/must UCD be the solution to (all?) natural language parsing?

It would be premature to begin the work of the board by willy-nilly reassigning identifiers such as "em". It seems likely that we will need to be able to express the concepts of both "electromagnetism" and "emission". If there is emission, there is absorption. And sometimes we may want to express the combination of "electromagnetic emission" at the same time.

On the other hand, we're told that the role of UCDs is distinct from that of ontologies. An ontology is an attempt at expressing the complete range of some knowledge domain. Astronomy is a big subject - its ontology will be big. Perhaps by analogy we can view an ontology as the unabridged dictionary for some subject, whereas UCDs are simply one way to build a glossary for a specific purpose. Glossaries are often small enough to be appended to a brief document.

Personally, I think the VO community will need to develop several separate ontologies over time, as well as several separate glossaries of UCDs or UCD-like constructs. It is not obvious that a glossary of UCDs for tabular convenience is equivalent to a glossary of UCDs for VOEvent convenience. An ontology can afford to be large and unwieldy to reach its goal of being complete and accurate. A UCD-style glossary, on the other hand, will eventually reach an optimum size, beyond which its utility passes a point of diminishing returns. Too much precision engenders confusion. The availability of too many options results in overlapping shades of meaning.

I gather the current list of UCDs was generated by looking at actual tables in the literature. This is just how the unabridged OED was created from words sieved from millions of quotations. Just like a dictionary, the work of maintaining the list of UCDs will continue indefinitely as new tabular usage is coined.

I would suggest that the creation of this new list of UCD-like entities to describe astronomical "concepts" is fundamentally a different exercise. We may not be trying to generate a complete ontology with all interrelationships clearly drawn between all concepts, but we are trying to be complete in the sense of not leaving any gaps in the web of concepts. "Star" and "galaxy" will clearly make the final cut. "Star.white_dwarf" and "galaxy.spiral" most likely, too. But it won't take many levels to exhaust the utility of compiling such a list. I expect the final list to have hundreds of entries, not tens of thousands.
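To make the "no gaps, few levels" idea concrete, here is a minimal Python sketch. It assumes dotted, lowercase names and uses only the four terms named above; the helper names (`depth`, `missing_parents`, `count_by_depth`) are hypothetical and do not come from any official UCD tooling.

```python
from collections import defaultdict

# Hypothetical concept list: only the four terms mentioned in the text.
CONCEPTS = ["star", "galaxy", "star.white_dwarf", "galaxy.spiral"]

def depth(term):
    """Number of dot-separated levels in a term."""
    return len(term.split("."))

def missing_parents(terms):
    """Return parent terms implied by the dotted names but absent from the list."""
    known = set(terms)
    gaps = set()
    for term in terms:
        parts = term.split(".")
        # Check every proper prefix, e.g. "star" for "star.white_dwarf".
        for i in range(1, len(parts)):
            parent = ".".join(parts[:i])
            if parent not in known:
                gaps.add(parent)
    return sorted(gaps)

def count_by_depth(terms):
    """Count how many terms sit at each level of the hierarchy."""
    counts = defaultdict(int)
    for term in terms:
        counts[depth(term)] += 1
    return dict(counts)
```

A check like `missing_parents` is one way to read "not leaving any gaps": every refined concept should have its broader concept in the list. Counting terms per level would show where adding further levels stops paying off.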

One final point. The nature of this board is to participate in the process of certifying an official list of terms. I think the true utility of both glossaries and dictionaries will be achieved when facilities are available for creating and maintaining *unofficial* lists. For VOEvent, for instance, it seems likely that each project publishing events will adopt its own glossary pertinent to its own instrumentation and observations. We should support these activities and provide a framework for project-specific glossaries. They will spring into existence whether or not we do so. At least if we support the creation of project-specific glossaries, we can have some say in controlling a common semantic structure and a standardized distribution mechanism. This might also naturally lead to the next step of layering UCD glossaries on top of our emerging ontology(ies). A glossary, after all, is nothing but a well-chosen list of words out of the dictionary. It is the dictionary that provides etymology, synonyms and antonyms, classification by part of speech, tenses and gender, pronunciation, ...
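The layering of a project glossary on top of a shared dictionary could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Glossary` class, the `unknown_terms` helper, the project name, and the term "transient.flare" are all hypothetical, and the shared dictionary is reduced to a bare set of terms.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical shared dictionary, reduced here to a bare set of terms.
DICTIONARY: Set[str] = {"star", "galaxy", "star.white_dwarf", "galaxy.spiral"}

@dataclass
class Glossary:
    """A project-specific glossary: a chosen list of words from the dictionary."""
    project: str
    terms: List[str] = field(default_factory=list)

def unknown_terms(glossary: Glossary, dictionary: Set[str]) -> List[str]:
    """Return glossary terms that the shared dictionary does not define."""
    return [t for t in glossary.terms if t not in dictionary]

# Hypothetical project glossary; "transient.flare" is an invented term.
demo = Glossary("demo_project", ["star", "galaxy.spiral", "transient.flare"])
print(unknown_terms(demo, DICTIONARY))
```

Terms reported by `unknown_terms` are the ones a project has coined on its own. A common framework could flag them as candidates for the shared dictionary rather than reject them.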

Sorry for the cross-posting. If we can't restrain ourselves from generating all these mailing lists, I'm not sure what hope we have for a coherent set of UCD lists :-)

Rob Seaman
NOAO

Received on 2005-06-01Z17:33:23