Brian, hello.
On 2007 Nov 21, at 15:09, Brian Thomas wrote:
> On Wednesday 21 November 2007, Norman Gray wrote:
>> 2. The grammatical number of the concept names (singular or plural)
>>
>> It seems that english-language thesauri `traditionally' have concepts
>> labelled with plurals, whereas French and German ones typically have
>> concepts labelled with singular terms.
>
> With all due respect, which thesaurus are you looking at? The one on my
> shelf, "The Random House Thesaurus", published 1984, has only singular
> terms for nouns (concepts). I did a quick 8 or 9 page random survey,
> then I started looking for astronomical terms...all of the following
> appear singular:
Ah yes, I was quoting there, and using 'thesaurus' in the information-retrieval sense (a controlled vocabulary used for indexing and searching), rather than in the sense of a book of synonyms. The source is British Standard BS 8723-2 (which overlaps with ISO 2788 and ISO 5964), section 6.4.1:
> Different traditions exist in different languages concerning the
> use of singulars or plurals. Indexers in some language communities,
> for example French and German, tend to prefer the singular form so
> that the user can approach the thesaurus or index in the same way
> as a dictionary. In English-speaking countries, however, it is
> usual to base the choice on whether a particular term is a count
> noun or a non-count noun. The latter convention helps to
> distinguish between a process such as painting, which can only be
> expressed in the singular, and the product of the same process, in
> this case paintings.
My argument here is not `there's a standard so we should follow it' (though I'm always a sucker for that sort of argument), but that there is a set of best practices (amongst them this singular/plural thing) well established by the folk who index things for a living, and I'm quoting these standards not as Standards, but as fairly precise expressions of these practices. In other words: where we have a choice, it seems sensible to follow these standards, if only on a principle of least surprise.
In other other words: I don't claim to be advancing an overwhelmingly strong positive argument here; I am instead disagreeing with your counterargument.
Interestingly, that same standard defines 'thesaurus' as a:
> controlled vocabulary in which concepts are represented by
> preferred terms, formally organized so that paradigmatic
> relationships between the concepts are made explicit, and the
> preferred terms are accompanied by lead-in entries for synonyms or
> quasi-synonyms
> NOTE The purpose of a thesaurus is to guide both the indexer and
> the searcher to select the same preferred term or combination of
> preferred terms to represent a given subject.
(ISO-5964 Sect. 3.16 has a briefer, but compatible, definition.)
This is interesting because, in its introduction, that document says:
> Whereas in the past thesauri were designed for information
> professionals trained in indexing and searching, today there is a
> demand for vocabularies that untrained users will find to be
> intuitive. There is also a need for search aids in contexts where
> “full text” is not available, such as museum collections and image
> databases. As the Internet and other networks allow simultaneous
> searching across resource collections that have been indexed using
> different vocabularies, there is a need to have the means of
> “translating” search queries across boundaries.
That is, here and implicitly throughout these various documents, the focus is on thesauri as tools for _search_ and for human-machine interaction. That matches the actual uses of the A&A vocabulary (where, also, all the concrete nouns are plural) and the AOIM one (singular), and the intended use of the IAU vocabulary (plural).
What thesaurus terms are _not_ about is machine understanding, and their semantics doesn't really help with that. This is why the notion of broader/narrower has an operational definition ('all items returned by a query on a term will also be returned by a query on a related broader term') rather than being a logical subclass relation.
> how do you label a single instance of a concept (for example,
> for later use in ontologies, creation of individuals becomes
> difficult)?
I think the answer is that you don't. Thesaurus/SKOS concepts are individuals, not classes, so that the vocabulary term 'stars' refers to the 'concept of stars' rather than the class of stars. Thus
> Lets try another.."find all concepts which are stars which have
> coordinates" :
>
> PREFIX ivo: <http://ivoa.net/vocab>
> describe $s where { $s a ivo:star . $s ivo:ra $ra . $s ivo:dec $dec . }
...should return nothing at all, because the 'concept of stars' doesn't have an RA and Dec (only stars have those, and the 'concept of stars' is not itself a star). Thus the sort of query you can imagine is
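    prefix rm: <...registry_metadata#>
    prefix vocab: <...vocabulary#>
    prefix skos: <http://www.w3.org/2004/02/skos/core#>
    select $resource
    where {
      # resources tagged directly with the concept 'stars'
      { $resource rm:keyword vocab:stars }
      UNION
      # ...or tagged with a concept that has 'stars' as a broader concept
      { $resource rm:keyword ?kw .
        ?kw skos:broader vocab:stars .
      }
    }
(in the context of some rule that makes skos:broader transitive).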
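To make the role of that transitivity rule concrete, here is a minimal sketch in plain Python (not SPARQL, and not anything taken from a real registry or vocabulary). It follows skos:broader links upwards from each keyword, which is what the rule would supply to the query, and then checks the operational reading of broader/narrower quoted earlier. Every concept and resource name in it is invented for illustration.

```python
from collections import deque

# skos:broader links: concept -> its direct broader concepts.
# All concept names here are hypothetical.
BROADER = {
    "whiteDwarfs": {"compactStars"},
    "compactStars": {"stars"},
    "variableStars": {"stars"},
    "spiralGalaxies": {"galaxies"},
}

# rm:keyword links: resource -> concepts it is tagged with (hypothetical).
KEYWORDS = {
    "catalogueA": {"stars"},
    "catalogueB": {"whiteDwarfs"},
    "catalogueC": {"spiralGalaxies"},
    "catalogueD": {"variableStars", "spiralGalaxies"},
}


def ancestors(concept):
    """Return every concept reachable from `concept` via skos:broader."""
    seen = set()
    queue = deque(BROADER.get(concept, ()))
    while queue:
        parent = queue.popleft()
        if parent not in seen:  # also guards against accidental cycles
            seen.add(parent)
            queue.extend(BROADER.get(parent, ()))
    return seen


def search(term):
    """Resources tagged with `term` itself or with any narrower concept."""
    return {
        resource
        for resource, tags in KEYWORDS.items()
        if any(tag == term or term in ancestors(tag) for tag in tags)
    }


print(sorted(search("stars")))
# Operational reading: hits for a term are also hits for its broader terms.
print(search("whiteDwarfs") <= search("compactStars") <= search("stars"))
```

Note that nothing here treats 'stars' as a class with members: the concepts are just labelled nodes in a graph, and the only thing the search relies on is the keyword and broader links between them.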
This is, I realise, returning to the older discussion of vocabulary vs. ontology.
You also said:
> I feel that by defining terms in the plural, we would be crippling any
> machine use of the document (which is my understanding of its primary
> reason for being)
But that's what an ontology's for, not a thesaurus. It's clear that an ontology for a subject area would allow different and valuable functionality, and it's clear that there would be close links between an astronomy ontology and, for example, the IAU thesaurus (in whichever version...). But it is also clear that a thesaurus addresses a separate, human-centred, and relatively simple problem, as distinct from the machine-centred problem that would need a full ontology.
>
>> That's according to ISO-5964.
>
> The copy I found is "Guidelines for the establishment and development
> of multilingual thesauri" which I checked out at:
> http://www.collectionscanada.ca/iso/tc46sc9/standard/5964e.htm#3.
> I tried looking at this for insight, but a quick read didn't reveal any
> information germane to the plural vs singular concept definition issue.
> Can you give a better pointer?
ISO-5964 Sect. 11.1.3 talks about this (but appears not to be included in that excerpt; bah! I wish ISO didn't seek to profit from their blasted standards, and sell them for what appears to be £2/$4 per page!).
[There's a quite separate potential problem here in that UCDs are not really concepts, but types. I wonder if this will get us into trouble in the future....]
All the best,
Norman
--
------------------------------------------------------------
Norman Gray : http://nxg.me.uk
eurovotech.org : University of Leicester, UK
Received on 2007-11-21Z18:33:36