RE: Format of tokens

From: Alasdair Gray <agray-at-dcs.gla.ac.uk>
Date: Mon, 12 Nov 2007 15:40:51 -0000


Hi,  

I would like to follow up on my mail earlier today. I have done a more detailed analysis of the IAU93 Thesaurus, this time going back to the original source files and going through the terms one by one. (Yes, I'm a bit cross-eyed now.) This has resulted in the following:  

Number of             IAU original files    IAU93 Rick's SKOS Model
Terms [1]             2950                  (No equivalent in SKOS)
Top level concepts    516                   1720
Concepts              2552 [2]              2947
Alternative labels    398                   858 [3]

[1] This includes terms that become concepts and those which become alternative labels.

[2] This total does not include those terms which declare a Use relationship. This is because these terms should appear in the SKOS model as skos:altLabel.

[3] This total probably includes declared synonyms and it probably is less important for them to be the same.  

(Note: I have not been able to do a full analysis of the relationships, because of the format in which the IAU Thesaurus is available and the limited time.)  
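
The figures for the original files are at least internally consistent: 2552 concepts plus 398 alternative labels gives the 2950 terms. A minimal sketch of that tally follows, assuming a hypothetical in-memory form in which each term maps either to None or to the preferred term named by its Use relationship. The `entries` layout, the sample terms, and the `tally_terms` helper are all illustrative and are not the format of the IAU files.

```python
def tally_terms(entries):
    """Split terms into concepts and alternative labels.

    `entries` maps each term to the preferred term named by its Use
    relationship, or to None when the term declares no Use relationship.
    (Hypothetical layout; the real IAU files would need their own parser.)
    """
    alt_labels = sum(1 for use_target in entries.values() if use_target is not None)
    concepts = len(entries) - alt_labels
    return {"terms": len(entries), "concepts": concepts, "alt_labels": alt_labels}


# Invented sample entries, for illustration only.
sample = {
    "STARS": None,
    "GALAXIES": None,
    "NEBULAE": None,
    "STELLAR OBJECTS": "STARS",  # declares Use, so it becomes a skos:altLabel
}

counts = tally_terms(sample)
print(counts)

# The same identity holds for the figures quoted for the original files.
assert 2552 + 398 == 2950
```

A term that declares a Use relationship is counted only as an alternative label, which matches the way the concept total above excludes such terms.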

My concern is that there is a discrepancy between Rick's SKOS model generated by his script and the original files. My feeling is that the SKOS model representing the IAU Thesaurus that is to be published by the IVOA should be an accurate model. If we cannot produce an accurate SKOS model but claim that we have, then people will not trust the IVOAT or any of the semantics work involving vocabularies and ontologies.  

Specific issues that need to be addressed in the SKOS model:

Please see the appropriate thread in the semantics list for a full discussion of this issue.  

Once we have agreement on these issues, then the results can be applied to the IVOAT.  

Another issue that my analysis of the IAU Thesaurus has shown up today is that there are minor discrepancies between the text files distributed from http://www.aao.gov.au/lib/thesaurus.html and the web version available from http://msowww.anu.edu.au/library/thesaurus/english/. We should probably decide which we are using as the definitive version for our work. (This only affects at most half a dozen entries.)  

I'll make the lists of

available, once I've had a chance to put them into an appropriate format.  

Cheers (I think I'm going to go for a long drink to recover from this),  

Alasdair    

Alasdair J G Gray <http://www.dcs.gla.ac.uk/~agray/>

Research Associate: Explicator Project

http://explicator.dcs.gla.ac.uk

Computer Science, University of Glasgow

0141 330 6292  

From: owner-semantics-at-eso.org [mailto:owner-semantics-at-eso.org] On Behalf Of Frederic V. Hessman
Sent: 12 November 2007 14:37
To: IVOA semantics; IVOA VOEvent List
Subject: Re: Format of tokens    

On 12 Nov 2007, at 11:18 am, Alasdair Gray wrote:

        I think there might be a slight problem with your script that generates the vocabularies. If you look at any relationship, you will find that it points to itself rather than another concept.

Whoops! Fortunately, an easy fix. Everything should be ok now (only affected the RDF files).
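
A check along the following lines would catch this kind of self-reference. It is a minimal sketch that assumes the model has already been loaded as plain (subject, predicate, object) tuples; the `ex:` identifiers and the `find_self_references` helper are hypothetical, and this is not the script that generates the vocabularies.

```python
# Relationship predicates that should always link two different concepts.
SKOS_RELATIONS = {"skos:broader", "skos:narrower", "skos:related"}


def find_self_references(triples):
    """Return every relationship triple whose subject and object are the same."""
    return [(s, p, o) for (s, p, o) in triples if p in SKOS_RELATIONS and s == o]


# Invented triples, for illustration only.
triples = [
    ("ex:star", "skos:broader", "ex:star"),    # points to itself: reported
    ("ex:star", "skos:related", "ex:galaxy"),  # links two concepts: fine
    ("ex:star", "skos:prefLabel", "Star"),     # not a relationship: ignored
]

bad = find_self_references(triples)
for s, p, o in bad:
    print(f"{s} {p} {o}")
```

An empty result does not show that the relationships are right, only that none of them loops back on its own concept.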

I have done a quick analysis of the IAU93 and IVOAT vocabularies.

Number of                    IAU93    IVOAT
Top level concepts           1720     1203
Concepts                     2947     2892
Broader relationships        1716     2307
Narrower relationships       1716     2307
Associative relationships    7647     8040  

The major difference between IAU93 and IVOAT (other than the deletion of some errors and inclusion of new concepts) is that 1) many top level concepts in IAU93 were removed by moving things to aliases or noting the obvious BTs and NTs and 2) the number of BTs perfectly matches the number of NTs.  
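
Equal BT and NT totals are what one would expect if every broader link has a matching narrower link in the opposite direction, though equal totals alone do not prove it. A minimal sketch of a pairwise check, assuming the vocabulary is available as plain (subject, predicate, object) tuples; the `ex:` identifiers and the `unmatched_hierarchy_links` helper are hypothetical.

```python
def unmatched_hierarchy_links(triples):
    """Find broader/narrower links that lack their inverse.

    Returns two sorted lists:
      - (child, parent) pairs with skos:broader but no parent-to-child skos:narrower
      - (parent, child) pairs with skos:narrower but no child-to-parent skos:broader
    """
    broader = {(s, o) for (s, p, o) in triples if p == "skos:broader"}
    narrower = {(s, o) for (s, p, o) in triples if p == "skos:narrower"}
    missing_narrower = sorted(
        (child, parent) for (child, parent) in broader if (parent, child) not in narrower
    )
    missing_broader = sorted(
        (parent, child) for (parent, child) in narrower if (child, parent) not in broader
    )
    return missing_narrower, missing_broader


# Invented triples, for illustration only.
triples = [
    ("ex:binary_star", "skos:broader", "ex:star"),
    ("ex:star", "skos:narrower", "ex:binary_star"),
    ("ex:spiral_galaxy", "skos:broader", "ex:galaxy"),  # inverse is missing
]

missing_narrower, missing_broader = unmatched_hierarchy_links(triples)
print("broader without narrower:", missing_narrower)
print("narrower without broader:", missing_broader)
```

If both lists come back empty, the BT and NT counts are necessarily equal; the converse does not hold, which is why the check compares pairs and not totals.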

The number of top level concepts in IVOAT could easily be halved, or reduced even further, with a modest bit of editing effort.  

Rick

Received on 2007-11-12Z16:41:12