Re: Vocabularies: next steps

From: Frederic V. Hessman <hessman-at-astro.physik.uni-goettingen.de>
Date: Tue, 27 Nov 2007 12:04:50 +0100


>> Set: a-Z, 0-9
> You're quite right. I meant the concept URI: the concept fragment
> should I believe/agree, be drawn from [a-z0-9], though I wouldn't
> push very hard against [a-zA-Z0-9]. The prefLabel and altLabel
> fields should be Unicode.
>
> [AG] I would probably argue for [a-zA-Z0-9]

"a-Z,0-9" was meant to mean exactly this. By now, I think we can all agree on this.

>>> The number of top concepts in the IAU thesaurus
>> Huh? The IAU thesaurus is the IAU thesaurus. If "top concepts"
>> are defined either as 1) not having a BT or 2) having a NT, then
>> the number is already fixed. Basta.
> [AG] I still feel that for the IAU93 Thesaurus we should adopt the list
> of tokens given in the web version. However, I agree with Norman that
> the top concepts are there to aid the navigation and for no other
> reason. When it comes to the IVOAT, I would think that the top concepts
> are those that do not have a BT.

For simplicity and consistency, I would argue that we define "top concepts" as those not having a BT.
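
As a rough sketch of what the "no BT" definition means in practice (an illustration only, not an agreed implementation), the following Python computes the top concepts from a table of BT/NT links and also lists BT links that lack the matching NT link. The dictionary layout, the function names and the four example tokens are assumptions made up for this sketch; none of them come from IVOAT or IAU93.

```python
# Hypothetical toy thesaurus: token -> {"BT": [...], "NT": [...]}.
# The tokens are invented for illustration and are not taken from IVOAT or IAU93.
concepts = {
    "Star": {"BT": [], "NT": ["VariableStar"]},
    "VariableStar": {"BT": ["Star"], "NT": ["Cepheid"]},
    "Cepheid": {"BT": ["VariableStar"], "NT": []},
    "Telescope": {"BT": [], "NT": []},
}


def top_concepts(concepts):
    """Return the tokens that have no broader term (BT), sorted by name."""
    return sorted(token for token, links in concepts.items() if not links.get("BT"))


def missing_reciprocal_links(concepts):
    """List (narrower, broader) pairs whose BT link has no matching NT link."""
    missing = []
    for token, links in concepts.items():
        for broader in links.get("BT", []):
            if token not in concepts.get(broader, {}).get("NT", []):
                missing.append((token, broader))
    return missing


print(top_concepts(concepts))              # ['Star', 'Telescope']
print(missing_reciprocal_links(concepts))  # []
```

With this definition the set of top concepts follows mechanically from the BT links, so nothing has to be maintained by hand unless the maintainers deliberately want a shorter list.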

This should be part of the IVOA vocabulary guidelines, e.g. (here's my first cut):

  1. A single SKOS document defines the vocabulary and must be publicly available at some URI, preferably at the central IVOA vocabulary repository http://www.ivoa.net/????? at least as a copy.

  2. A concept token has the form

                        {URI-root}{vocabulary-name}#{token}

        where the token should consist only of the letters a-z, A-Z, and the digits 0-9. The URI root and vocabulary name should be set centrally and not in the definition of each token. For example, if a nominal concept is

                        http://www.ivoa.net/Thesauri/Food#Apple

        (root="http://www.ivoa.net/Thesauri/", name="Food", token="Apple"), then the SKOS definition begins with

                        <skos:Concept rdf:about="#Apple">

  3. One is encouraged to use human-readable forms for the tokens with some obvious connection to the preferred labels, e.g. conversion from the label via dropping characters not included in the above list and sub-token separation via capitalization (e.g. "My favorite idea-label #42" -> "MyFavoriteIdeaLabel42").

  4. Vocabulary entries should be singular unless based on previously determined sources where the conversion to singular forms would impair the usefulness of the vocabulary.

  5. Thesaurus entries (BT/NT/RT) are encouraged but not required.

  6. If thesaurus entries are included, they should be complete (all BT links are reflected in corresponding NT links in the referenced entries).

  7. "TopConcept" entries should normally be those not having a BT reference, but the maintainers of a vocabulary can decide to restrict the choice of TopConcepts if appropriate.

  8. Use of standard SKOS documentation is encouraged but not required: e.g.

	scopeNote		to clarify usage
	historyNote		to identify when the vocabulary entry was created
	changeNote		to identify changes in already created entries

  9. The maintainers of a vocabulary should provide on-line documentation permitting the easy perusal of labels and any thesaurus and usage information. The IVOA will try to maintain a list of links to known vocabularies and may choose to provide its own consistent on-line documentation based on the SKOS files alone.

 10. The maintainers of a vocabulary should attempt to cross-reference their vocabulary with one or more IVOA-supported vocabularies, e.g. UCD1 and/or IVOAT.

Anything else? Having just Ten Commandments would be nice.
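
To make the token rules above a bit more concrete, here is a minimal Python sketch of one way to derive a token from a preferred label and to build the full concept URI. The function names are hypothetical, and the choice to split the label at every character outside [A-Za-z0-9] is my reading of "dropping characters" plus "sub-token separation via capitalization", not something we have agreed on.

```python
import re

# Runs of the allowed token characters: letters a-z, A-Z and digits 0-9.
ALLOWED_RUN = re.compile(r"[A-Za-z0-9]+")


def label_to_token(label):
    """Derive a token: drop disallowed characters, capitalize each sub-token, join."""
    parts = ALLOWED_RUN.findall(label)
    # Only the first character of each part is upper-cased; the rest is kept as-is.
    return "".join(part[0].upper() + part[1:] for part in parts)


def is_valid_token(token):
    """True if the token is non-empty and uses only the allowed characters."""
    return ALLOWED_RUN.fullmatch(token) is not None


def concept_uri(root, name, token):
    """Build {URI-root}{vocabulary-name}#{token}, rejecting invalid tokens."""
    if not is_valid_token(token):
        raise ValueError(f"invalid token: {token!r}")
    return f"{root}{name}#{token}"


print(label_to_token("My favorite idea-label #42"))
# MyFavoriteIdeaLabel42
print(concept_uri("http://www.ivoa.net/Thesauri/", "Food", "Apple"))
# http://www.ivoa.net/Thesauri/Food#Apple
```

Note that this sketch treats non-ASCII letters as disallowed, so they are dropped from the token; the prefLabel itself stays untouched and can remain Unicode.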

>>> The grammatical number of the concept names (singular or plural)
>> Singular, please! - it's a real pain to use the formal system of
>> singular concepts and plural countables and I agree that singular
>> should make the vocabulary simple to use
> I think this is also a non-issue. If a term is plural in the
> vocabulary we're adapting (IAU93 and A&A use this convention) then it
> should remain plural in the SKOS version, otherwise we're making
> gratuitous changes; if it's singular in the original vocabulary
> (AOIM) then it should remain singular in SKOS, for the same reason.
>
> [AG] The issue raises its head when it comes to the IVOAT. However,
> since this is based on the IAU93 thesaurus we could, as I believe is the
> case, just adopt the IAU93 practice.

No, in fact I want to remove the plural terms from IVOAT as soon as possible (I finally got to this point in my list of things to do). Any complaints?

External vocabularies like IAU93, AOIM and A&A are pre-defined and so are what they are. With IVOAT, we can choose to have what we want.

>>> I wouldn't want to bet which of the vocabularies will end up the
>>> most useful in the end...

Well, the whole purpose of IVOAT is to create something useful. If we're already failing, please tell me so I can stop now...... :-(

>> Interrelationships:
>>
>> Tricky question: we don't want to refer too much to IAU93,
>> because the suggestion will be that it's useful (which it really
>> isn't) and UCD1 really doesn't cover very many concepts contained
>> in the above vocabularies. Stationary targets like the first list
>> are admittedly much easier to do, but I've already started to
>> connect IVOAT and UCD1, which is a good exercise since they are
>> only partially matchable. IAU93 and IVOAT are so closely related -
>> even with the syntactic and content cleanups - that one could
>> automate that connection without too much trouble.
>
> I'm with you on the potential for trickiness. However, it might be
> simpler than this. Perhaps we should just declare as many
> correspondences as we can, and see if a reasoner agrees the result is
> consistent.

Sounds like a good idea to me: we stick in whatever we can manage and see if anybody notices/benefits. This is why I would like to test the UCD1<-->IVOAT connection, so that one can ask questions like "I've got a UCD1 label in my VOTable - is there an IVOAT entry which would enable me to put it into a more general context?" or "I've got something easily described by an IVOAT token - can I trivially put this in a VOTable using some UCD1 label?". Andeas is interested in making the A&A vocabulary convertible to some other vocabulary in order to demonstrate, e.g., that the MNRAS or ApJ vocabularies are equivalent at some stage - the question is only which intermediary vocabularies are usable (we've been praying that IVOAT, as the replacement of the SV, would be this medium, since the others are not good/extensive enough).
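
The two questions above amount to a lookup in each direction over a table of declared correspondences. A minimal Python sketch, under the assumption that the correspondences are kept as a simple one-to-many table: the UCD-style strings, the IVOAT tokens and the pairings between them are all invented here for illustration and do not come from any published mapping.

```python
# Hypothetical correspondences, UCD1-style label -> list of IVOAT tokens.
# Every entry below is a placeholder invented for this sketch.
UCD1_TO_IVOAT = {
    "phot.mag": ["Magnitude"],
    "pos.eq.ra": ["RightAscension"],
    "pos.eq.dec": ["Declination"],
}


def invert(mapping):
    """Build the reverse table: each target points back to all of its sources."""
    inverse = {}
    for source, targets in mapping.items():
        for target in targets:
            inverse.setdefault(target, []).append(source)
    return inverse


IVOAT_TO_UCD1 = invert(UCD1_TO_IVOAT)


def ivoat_for_ucd(ucd):
    """IVOAT tokens declared for a UCD1 label; empty list if none is declared."""
    return UCD1_TO_IVOAT.get(ucd, [])


def ucd_for_ivoat(token):
    """UCD1 labels declared for an IVOAT token; empty list if none is declared."""
    return IVOAT_TO_UCD1.get(token, [])


print(ivoat_for_ucd("phot.mag"))       # ['Magnitude']
print(ucd_for_ivoat("Declination"))    # ['pos.eq.dec']
print(ucd_for_ivoat("Cepheid"))        # []
```

Returning an empty list for unmatched entries reflects the fact that the two vocabularies are only partially matchable; in a real SKOS file the same information would presumably be carried by mapping statements rather than a hand-written table.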

Rick

Received on 2007-11-27Z12:05:09