Re: handling metadata with multiple values

From: Alberto Micol <Alberto.Micol-at-eso.org>
Date: Wed, 13 Aug 2003 11:53:45 +0200

Going extreme ...
And, for sure, I will regret this.

Naive question: What's wrong with the following syntax

<UCDList>

<x1/>
<x2/>

</UCDList>

where x1 and x2 are two UCDs?
After all, the UCDs are a well-known set of words, and nobody is allowed to invent his or her own UCDs, right?
I don't know enough about SAX and DOM, so I would appreciate it if the gurus could explain to me the pros and cons of this.
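
For concreteness, here is a minimal sketch of how a DOM-style consumer might read this syntax, using Python's standard xml.etree.ElementTree. The set KNOWN_UCDS and the function read_ucd_list are hypothetical names made up for illustration; a real list would come from the agreed UCD vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary (assumption): the real UCD names
# would be loaded from the agreed list, not hard-coded like this.
KNOWN_UCDS = {"x1", "x2"}

def read_ucd_list(xml_text):
    """Return the UCD names found as child tags, rejecting unknown ones."""
    root = ET.fromstring(xml_text)
    if root.tag != "UCDList":
        raise ValueError("expected a UCDList root element")
    # With UCDs as tags, the tag name itself carries the vocabulary word.
    names = [child.tag for child in root]
    unknown = [name for name in names if name not in KNOWN_UCDS]
    if unknown:
        raise ValueError("unknown UCDs: " + ", ".join(unknown))
    return names

print(read_ucd_list("<UCDList><x1/><x2/></UCDList>"))  # ['x1', 'x2']
```

In this sketch the vocabulary check lives in the application. With UCDs as element names, a DTD or schema could in principle do the same check, at the cost of declaring every UCD as an element.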

And with such a scheme, I can also add a value to each UCD:

<UCDList>

   <x1> val1 </x1>
   <x2> val2 </x2>

   <!-- e.g.: -->

   <FLUX>
      <WAVELENGTH unit="nm"> 555 </WAVELENGTH>
   </FLUX>

</UCDList>

(
  Now I have moved the multiple-values problem one level down ... but I think that we can live with:
  <UCDList>

           <x1> val11 </x1>
           <x1> val12 </x1>
           <x2> val2 </x2>

  </UCDList>
)
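
A rough sketch of how a SAX-style (event-driven) consumer could cope with repeated tags, assuming the flat layout shown above, with values one level below UCDList. It uses Python's standard xml.sax module; the class UCDValueHandler and the function collect_values are hypothetical names for illustration. Each repeated tag simply appends to a list, so multiple values need no special syntax.

```python
import xml.sax
from collections import defaultdict

class UCDValueHandler(xml.sax.ContentHandler):
    """Collect the text of each direct child of the root, keyed by tag name."""

    def __init__(self):
        super().__init__()
        self.values = defaultdict(list)
        self._depth = 0      # 1 = root element, 2 = a UCD element
        self._buffer = []    # text chunks of the UCD element being read

    def startElement(self, name, attrs):
        self._depth += 1
        if self._depth == 2:
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._depth == 2:
            self._buffer.append(content)

    def endElement(self, name):
        if self._depth == 2:
            self.values[name].append("".join(self._buffer).strip())
        self._depth -= 1

def collect_values(xml_text):
    handler = UCDValueHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return dict(handler.values)

doc = "<UCDList><x1> val11 </x1><x1> val12 </x1><x2> val2 </x2></UCDList>"
print(collect_values(doc))  # {'x1': ['val11', 'val12'], 'x2': ['val2']}
```

This sketch deliberately ignores anything nested deeper than one level, such as WAVELENGTH inside FLUX; handling that would need a stack of open elements rather than a single depth counter.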

A criticism will probably be that this might be OK for UCDs, but there are other metadata out there which are not UCDs, and those other metadata could still have multiple values, so we are back to the original problem.

On the contrary, and that's where I go extreme, I think that all metadata should be part of a well-defined dictionary. Not only that, but even metadata values should be part of the same dictionary!

For example, when I go to the pub and, over a beer, we start a nice conversation about WFPC2, we never refer to it as "instrument wfpc2"; that is, we never use the syntax:

<INSTRUMENT> WFPC2 </INSTRUMENT> (or whatever is the UCD for instrument_name)

but we always use

<WFPC2/>

because WFPC2 is part of our dictionary.

I'm claiming that WFPC2 should become a sort of UCD, at least within the HST context. (One could imagine the existence of specialised sets of context-dependent UCDs managed by each individual project.) The same is true for all other metadata.
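
One possible way to picture context-dependent dictionaries is a shared core vocabulary plus per-project additions, with the project context consulted first. The sketch below is only an illustration: the names CORE_TERMS, CONTEXT_TERMS and resolve, and the roles "quantity" and "instrument", are all assumptions and not part of any existing UCD definition.

```python
# Hypothetical layered dictionaries (assumption): a shared core
# plus specialised sets managed by each individual project.
CORE_TERMS = {"FLUX": "quantity", "WAVELENGTH": "quantity"}
CONTEXT_TERMS = {
    "HST": {"WFPC2": "instrument"},
}

def resolve(term, context=None):
    """Return the role of a term, checking the context dictionary first."""
    if context is not None and term in CONTEXT_TERMS.get(context, {}):
        return CONTEXT_TERMS[context][term]
    if term in CORE_TERMS:
        return CORE_TERMS[term]
    raise KeyError(term + " is not in the dictionary")

print(resolve("WFPC2", context="HST"))  # instrument
print(resolve("FLUX", context="HST"))   # quantity
```

Outside the HST context, resolve("WFPC2") raises KeyError in this sketch, which mirrors the idea that a word is only understood where its dictionary is shared.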

You will claim that elevating UCDs and other metadata info to the level of XML tags is not a flexible approach. In elementary school, when I misspelled a noun, or invented one, I did not get a good mark for my flexibility, nor for my imagination. It is the price to pay to be understood.

You will claim that it could be hard to build such a huge dictionary, and maybe even harder to use it.
But without a dictionary I will not know how to formulate queries like:

Select
   service_name, service_description, service_rowcount
from
   Registry
where
   service_category = "CATALOG BROWSER"
   and data_class = "OBJECT CATALOG"
   and subject = "GALAXY"
   and querable_parameter = "GALAXY DISTANCE"
   and output_parameter = "GALAXY NAME"

Both the left and right values must be known to both client and server; otherwise the query cannot be formulated by the client, nor interpreted by the server. The dictionary is the place to list all those "words". When we speak our language we typically do not use more than a couple of thousand words, even though my dictionary at home lists more than 100,000 words. A perfect case for the 90/10 rule, I suppose; let's make those 90 happy.
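
The point about left and right values can be sketched as a check that either side could run before sending or executing a query. This is a minimal illustration under stated assumptions: DICTIONARY and check_constraints are hypothetical names, and the entries are just the words from the example query, not an actual registry vocabulary.

```python
# Hypothetical shared dictionary (assumption): each known field name
# (left value) maps to the set of values (right values) it may take.
DICTIONARY = {
    "service_category": {"CATALOG BROWSER"},
    "data_class": {"OBJECT CATALOG"},
    "subject": {"GALAXY"},
    "querable_parameter": {"GALAXY DISTANCE"},
    "output_parameter": {"GALAXY NAME"},
}

def check_constraints(constraints):
    """Return a list of problems; an empty list means every word is known."""
    problems = []
    for field, value in constraints.items():
        if field not in DICTIONARY:
            problems.append("unknown field: " + field)
        elif value not in DICTIONARY[field]:
            problems.append("unknown value for " + field + ": " + value)
    return problems

print(check_constraints({"subject": "GALAXY", "data_class": "OBJECT CATALOG"}))  # []
print(check_constraints({"subject": "GALAXI"}))  # ['unknown value for subject: GALAXI']
```

A misspelled or invented word is rejected on either side, which is exactly the price of being understood: the check is only possible because both client and server hold the same dictionary.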

Enough provocations for today ...

Alberto

Received on 2003-08-13Z11:54:03