Re: Draft draft 0.04

From: Rob Seaman <seaman-at-noao.edu>
Date: Thu, 7 Feb 2008 14:38:59 -0700


> I've put a vocabularies-0.04 at <http://www.astro.gla.ac.uk/users/norman/ivoa/vocabularies>,
> with the corresponding issues list at <http://www.astro.gla.ac.uk/users/norman/ivoa/vocabularies/issues>.
> The latter has been extended by a couple of issues that seemed
> to arise in the last few days, and has the first two issues
> [masterformat-1] and [distformat-2] marked as provisionally resolved.

Let's see if my comments here can resolve an issue [dowereallywanttodefineeverything-1] before it can be added to the list :-)

> Question: Do you agree that I've included adequate mention of the
> issues of the last couple of days? If there's an issue you thought
> was live but which isn't acknowledge here, please do say (I've an
> uncomfortable feeling I've forgotten something important).

Looks good!

Ok, first a general statement. Humans layer language with multiple meanings. Computers are completely literal. We aren't the only community to face the dilemma of resolving this. We are, however, the only community with our particular mix of use cases. Our resolution of the issue of multiple definitions may differ from other communities.

The nature of astronomical issues will often simplify the problem. Our users are unlikely to make a fuss about distinguishing between focal plane images and FITS images, or between the coma of a telescope and the coma of a comet - to reference Ed's examples. Most of these straightforward cases will sort themselves out in the hierarchies of UCDs or in the separation of the problem into different vocabularies for solar system objects, instrumentation and data formats, for example.

On the other hand, facilities such as VOEvent (and technologies such as ontologies) exist precisely to help people discern and convey subtle distinctions. There is a hard version of the issue of synonyms that I don't think we can wish away. Ed's dissection of the Wikipedia definition of GRB does a good job of indicating where those boojums come in.

The v0.04 text includes this statement, "The purpose of a thesaurus is to guide both the indexer and the searcher to select the same preferred term[s]". I think this need not be spelled out in the document. The roles of indexer and searcher don't map onto all the roles of interest to the VO community - and the notion that this is the sole purpose of a thesaurus is trivial to discount not just by appealing to Roget's authority (i.e., his "say so" or "countenance"), but also simply since different users may well be drawing from different vocabularies.

Rather, a key interest in thesauri is to provide mappings from source documents back to whatever controlled vocabulary is of interest for a particular use case. That is - precisely to separate the terms selected by the indexer from those chosen by the searcher. For example, I don't want to have to understand a VOEvent publisher's preferred term - and with the right thesaurus I may not ever become aware of what term the publisher actually used.
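
As a rough illustration of that separation, here is a minimal Python sketch. Every term, concept identifier and label in it is hypothetical and not taken from any real vocabulary; the only point is that the searcher asks in their own words and never needs to see the publisher's term.

```python
# Hypothetical thesaurus: maps whatever term a publisher used to a shared concept.
# None of these identifiers come from a real vocabulary.
PUBLISHER_TO_CONCEPT = {
    "GRB": "concept:gammaRayBurst",
    "gamma-ray burst": "concept:gammaRayBurst",
    "SN Ia": "concept:typeIaSupernova",
}

# Hypothetical searcher-side vocabulary: the label this particular user prefers.
SEARCHER_LABELS = {
    "concept:gammaRayBurst": "gamma-ray burster",
    "concept:typeIaSupernova": "thermonuclear supernova",
}


def translate(publisher_term):
    """Return the searcher's preferred label for a publisher's term, or None."""
    concept = PUBLISHER_TO_CONCEPT.get(publisher_term)
    if concept is None:
        return None  # the thesaurus has no mapping for this term
    return SEARCHER_LABELS.get(concept)
```

Two publishers using "GRB" and "gamma-ray burst" both reach the searcher under the same label, and an unmapped term simply yields nothing rather than a guess.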

Today's brouhaha was about definitions. The Turtle #spiralGalaxy example from v0.04 has:

        skos:definition """A galaxy having a spiral structure."""@en;

This is a good example itself of my point about multiple definitions. The brouhaha was about one meaning of the word "definition" - an extended description, or as Marcus Aurelius says:

	"Ask yourself, what is this thing in itself, by its own special
	constitution? What is it in substance, and in form, and in matter?
	What is its function in the world? For how long does it subsist?"

(Also Hannibal Lecter to Agent Starling in The Silence of the Lambs.)

The meaning in SKOS, however, is more like "a brief description to distinguish one token from another".

To be blunt, I don't give a kangaroo rat's ass about the latter issue. (NB - a "kangaroo rat" is neither a "kangaroo", nor a "rat".) I find a definition that merely restates the obvious (a #spiralGalaxy is a spiral galaxy) to be without utility - but perhaps others disagree. (Also, isn't distinguishing terms the point of mappings like narrower, broader and related?)

Regarding the substantive issue of providing detailed, scientifically precise, consensus definitions for these thousands of terms, we simply do not have standing. For some limited set of terms, e.g., "planet", some commission or other of the IAU has spoken. For others, any random VO user may have more authority than this WG to speak to a term's definition. (See below under section 2.2.)

Specific comments:

Title: ok

version: bump it up to something like 0.94 or nobody will take it seriously

authors: good job guys!

editors: we need to identify these ASAP, the document doesn't need much work IMHO

Abstract: good and to the point

Status: let's promote this to a public working draft ASAP

TOC: didn't verify

Intro: excellent analysis of the problem

1.1: ok

1.2:

Replace "<what/> element" with "<Why/> and <What/> elements".

Replace "Gamma Ray Burst" with "gamma-ray burst". Need to identify and follow a general policy for terminology, especially in a document about vocabularies :-)

Delete the word "modishly". No need to be snarky about folksonomies.

Delete ", with its systematising instincts, and aware of the benefits of standardisation,"

Replace "supernova1a" with "type 1a supernova".

BTW, how are plurals handled in SKOS? Meaning - I understand we're supposed to pick a coherent naming scheme, either plural or singular, but is there some way to recognize SNe as referring to multiple SN? Or is my question ill-posed?

SIMBAD is sometimes SIMBAD and sometimes Simbad. Which is it?

1.3: ok

2: good

2.1:

Delete "NOTE: The purpose..." or perhaps expand the following description.

I don't think "a vocabulary (SKOS or otherwise)..." needs to be bolded.

2.2:

Shouldn't be shy about announcing a preferred format (XML versus Turtle).

Further discussion about definitions:

A SKOS entry may contain a "definition for the concept, where one exists in the original vocabulary". This notion permits the many-to-many mapping we need. A particular vocabulary may include the limited "I am a spiral galaxy" type of definition. A separate vocabulary may include a concept with a more fleshed out definition, perhaps including (or simply plagiarizing) links covering all of what astronomical science has to say about the birth, life and death of spiral galaxies. Cross-linking the concepts provides enough slack to say whatever one wants.

2.3:

I remain a bit unclear about where the equivalences are going to live. What document will contain the example "iau93:#SPIRALGALAXY map:exactMatch ivoat:#spiralGalaxy"? I suggest the need for some document external to both iau93 and ivoat to make this equivalence. How are multiway equivalences conveyed?
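
To make the multiway question concrete, here is a minimal Python sketch of a mapping document kept outside both vocabularies. It assumes, without support from the draft, that exactMatch can be treated as symmetric and transitive; the third vocabulary ("other:") is invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical third-party mapping document: pairwise exactMatch statements.
# The first pair is the draft's example; the "other:" term is made up.
EXACT_MATCHES = [
    ("iau93:#SPIRALGALAXY", "ivoat:#spiralGalaxy"),
    ("ivoat:#spiralGalaxy", "other:#spiral"),
]


def equivalence_classes(pairs):
    """Group terms linked by any chain of pairwise matches.

    Assumes the match relation is symmetric and transitive.
    """
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    classes = []
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from this term.
        group = set()
        stack = [start]
        while stack:
            term = stack.pop()
            if term in group:
                continue
            group.add(term)
            stack.extend(neighbours[term] - group)
        seen |= group
        classes.append(frozenset(group))
    return classes
```

Under that assumption, two pairwise statements are enough to tie three vocabularies together, which is one way a multiway equivalence might be conveyed without any single vocabulary having to know about all the others.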

3: ok

3.1: ok

3.2:

#5 - typo "defintions" (saw a similar typo somewhere else in the document)

#8 - Again, does the publisher of iau93 map to ivoat, or does ivoat map to iau93, or both, or one or more third parties?

4: ok (meaning looks like you guys did a vast amount of work)

Appendices: ok

Bibliography:

If the UCD authors are listed, so should the VOEvent authors.

Issues doc: Good job! Seem to be well in hand.

Rob

Received on 2008-02-07Z22:39:22