RE: Format of tokens

From: Alasdair Gray <agray-at-dcs.gla.ac.uk>
Date: Wed, 14 Nov 2007 13:05:25 -0000


Hi Rick, All,

Again, comments preceded by an [AG].

-----Original Message-----
From: owner-semantics-at-eso.org [mailto:owner-semantics-at-eso.org] On Behalf Of Frederic V. Hessman
Sent: 14 November 2007 11:24
To: IVOA semantics
Subject: Re: Format of tokens

On 14 Nov 2007, at 11:24 am, Alasdair Gray wrote:
>> Number of TopConcepts: 1325
> I do not agree with this figure (see next comment).
>
>> Thus, you can't assume that the BT's and NT's are all present in
>> the original (trex.txt). Alasdair's figure of 512 top concepts
>> assumed that the IAU thesaurus was reasonably complete and self-
>> consistent.
> I cannot claim to have looked closely at the BT/NT relationships in
> the original (trex.txt) file. However, the IAU thesaurus also
> issues a hierarchy file (hierlist.txt). This file gives the
> hierarchy of the original thesaurus and it is this that has 516 top
> level concepts. Rick has assumed that a top level concept is one
> that does not have a broader term. For the IVOAT I would agree with
> this as it should result in a less confusing hierarchy that matches
> users' expectations. However, for the IAU93 this is wrong as it
> results in a different number of top level concepts (although I
> would have thought that it would have been fewer than 516, since some
> of these terms appear as narrower terms of other concepts) and thus
> a different hierarchy from the original version of the thesaurus.
Aha! I'm sure I'll leave this fine point to the experts, but I would have thought that a "TopConcept" is one which is at the top of a connection-hierarchy (after being chastened, I won't say "ontological"). If there is a concept "gummi bears" but no "BT candy" then the authors of the vocabulary have obviously left "candy" out for some reason, making "gummi bears" pretty top-level to me.

[AG] In principle I agree that top level concepts should not have a broader term. However, we are trying to model the IAU 1993 thesaurus accurately, not correct it. Thus, the top level concepts should be those that appear at the top level of its hierarchy list, not what we feel should be a top level concept. We can correct this for the IVOAT :)
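A minimal sketch contrasting the two rules under discussion. It assumes the thesaurus has already been parsed into a hypothetical dict mapping each term to its set of broader terms (BT), and the hierarchy list into hypothetical (depth, term) pairs where depth 0 marks a top-level entry; the real layouts of trex.txt and hierlist.txt are not reproduced here, and the toy data is invented.

```python
def top_by_missing_bt(broader):
    """Rule 1: a term is top-level if it has no broader term."""
    return {term for term, bts in broader.items() if not bts}


def top_by_hierarchy_list(entries):
    """Rule 2: a term is top-level if it sits at depth 0 in the hierarchy list.
    `entries` is a hypothetical list of (depth, term) pairs."""
    return {term for depth, term in entries if depth == 0}


# Toy data (not from the IAU thesaurus): "liquorice" has no BT recorded,
# yet the hierarchy list places it under "candy".
broader = {
    "candy": set(),
    "gummi bears": {"candy"},
    "liquorice": set(),
}
hierarchy = [(0, "candy"), (1, "gummi bears"), (1, "liquorice")]

print(sorted(top_by_missing_bt(broader)))        # ['candy', 'liquorice']
print(sorted(top_by_hierarchy_list(hierarchy)))  # ['candy']
```

Whenever the BT links and the hierarchy list disagree, the two rules yield different sets of top concepts, which is why the counts differ.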

Or is my naïveté showing? I assumed that hierlist.txt was simply their best attempt back when all of this was much more painful (yes, this project has now forced me to learn lots of Python, as I intended, but at least I'm not doing this on paper or with an Intel 286 under DOS).

[AG] They used a thesaurus management system (LEXICON) to generate their files. Maybe I'm showing my naivety in trusting the output of this software. (Just because it is old doesn't mean that it is wrong.) We are merely trying to produce a mapping from the output generated by that tool into the new SKOS format.
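As a rough illustration of the mapping step, the sketch below serialises an already-parsed thesaurus as SKOS in Turtle syntax. The SKOS namespace and the terms `skos:ConceptScheme`, `skos:Concept`, `skos:hasTopConcept`, `skos:broader`, `skos:prefLabel` and `skos:inScheme` are standard; the scheme URI, the base URI and the term-to-URI naming rule are hypothetical choices made only for this example.

```python
from urllib.parse import quote

SKOS = "http://www.w3.org/2004/02/skos/core#"


def concept_uri(base, term):
    # Hypothetical naming rule: spaces become underscores, then percent-encode.
    return f"<{base}{quote(term.replace(' ', '_'))}>"


def turtle_literal(text):
    # Escape backslashes and double quotes for a Turtle string literal.
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_turtle(scheme, base, top_terms, broader):
    """top_terms: set of top-level terms, chosen by whichever rule is agreed.
    broader: dict mapping each term to its set of broader terms."""
    lines = [f"@prefix skos: <{SKOS}> .", "", f"<{scheme}> a skos:ConceptScheme ."]
    for term in sorted(top_terms):
        lines.append(f"<{scheme}> skos:hasTopConcept {concept_uri(base, term)} .")
    for term in sorted(broader):
        c = concept_uri(base, term)
        lines.append(f"{c} a skos:Concept ; skos:prefLabel {turtle_literal(term)}@en ;"
                     f" skos:inScheme <{scheme}> .")
        for bt in sorted(broader[term]):
            lines.append(f"{c} skos:broader {concept_uri(base, bt)} .")
    return "\n".join(lines)


print(to_turtle("http://example.org/iau93", "http://example.org/iau93#",
                {"candy"}, {"candy": set(), "gummi bears": {"candy"}}))
```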

Dropping the TopConcept links to entries with no NT's is trivial - is this the general consensus?
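The filter described above might look like the following sketch. It assumes narrower-term (NT) relations are recovered by inverting a hypothetical term-to-broader-terms dict; if trex.txt lists NTs directly, those could be used instead. The toy data is invented.

```python
def drop_childless_top_concepts(top_terms, broader):
    """Keep only top concepts that are the broader term of at least one other term,
    i.e. that have at least one NT (NTs derived by inverting the BT relation)."""
    has_narrower = {bt for bts in broader.values() for bt in bts}
    return {term for term in top_terms if term in has_narrower}


broader = {"candy": set(), "gummi bears": {"candy"}, "liquorice": set()}
print(sorted(drop_childless_top_concepts({"candy", "liquorice"}, broader)))  # ['candy']
```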

[AG] Alasdair

Alasdair J G Gray
Research Associate: Explicator Project
http://explicator.dcs.gla.ac.uk
Computer Science, University of Glasgow
0141 330 6292

Received on 2007-11-14Z14:05:49