Sebastien's 'UCD: Status and Perspectives' gives a very clear summary of the recent discussions, responding on the basis of the considerable experience of CDS. Thanks very much - I actually enjoyed reading it, especially the ontology section. It raises some issues on which it would help to make decisions, or at least to reach a working agreement for the immediate future. Maybe Sebastien (or anyone) could suggest what most urgently needs discussion at the IVOA registry meeting on Wednesday 19 Mar - is anyone planning a presentation? This is what occurs to me:
'Status and Perspectives'
discusses relevant issues in '3. UCDs and data models'. We need
general UCDs which describe an entire data set ("Header metadata"),
e.g. for which wavelength regime (Radio, IR, Optical etc) - where
such UCDs currently exist, they are often parts or parents, rather
than ones which are instantiated. See e.g.
[[http://www.stsci.edu/~hanisch/NVO/ResourceServiceMetadataV6.doc][Bob Hanisch ResourceServiceMetadataV6.doc]] or
[[http://www.stsci.edu/~hanisch/NVO/ResourceServiceMetadataV6.pdf][ResourceServiceMetadataV6.pdf]]
and
[[http://wiki.astrogrid.org/bin/view/Astrogrid/RegistrySchema][AstroGrid draft registry schema]]
We also need a means to associate header data with column labels. For example, if all the data in one catalogue are at 1.4 GHz and all the data in another are at 1.6 GHz, this currently requires two separate column UCDs, PHOT_FLUX_RADIO_1.4G and PHOT_FLUX_RADIO_1.6G - dozens of UCDs in all. Using the atom idea across the header and the columns, the same information could be carried by one column UCD, PHOT_FLUX, plus header UCDs for the wavelength regime (radio) and the nominal frequency. This does require that we evaluate the data corresponding to UCDs, see below.
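To make the atom idea concrete, here is a minimal sketch of how one of the existing frequency-specific UCDs could be split into a generic column UCD plus header entries. The function name, the header key WAVELENGTH_REGIME and the suffix table are my own assumptions for illustration only; OBS_FREQUENCY is borrowed from the discussion below, and the sketch only handles names of the form PHOT_FLUX_<REGIME>_<frequency>.

```python
# Hypothetical sketch: split a frequency-specific UCD into atoms.
# WAVELENGTH_REGIME and the suffix table are assumptions, not agreed UCDs.

# Divisors converting a suffixed frequency into GHz (assumed convention).
SUFFIX_TO_GHZ_DIVISOR = {"G": 1.0, "M": 1000.0}


def split_flux_ucd(ucd):
    """Return (column_ucd, header) for names like PHOT_FLUX_RADIO_1.4G."""
    parts = ucd.split("_")
    if len(parts) != 4 or parts[:2] != ["PHOT", "FLUX"]:
        raise ValueError("not of the form PHOT_FLUX_<REGIME>_<freq>: " + ucd)
    regime, freq = parts[2], parts[3]
    divisor = SUFFIX_TO_GHZ_DIVISOR.get(freq[-1:])
    if divisor is None:
        raise ValueError("unknown frequency suffix in " + ucd)
    header = {
        "WAVELENGTH_REGIME": regime.lower(),
        "OBS_FREQUENCY": float(freq[:-1]) / divisor,  # nominal frequency, GHz
    }
    return "PHOT_FLUX", header


column_a, header_a = split_flux_ucd("PHOT_FLUX_RADIO_1.4G")
column_b, header_b = split_flux_ucd("PHOT_FLUX_RADIO_1.6G")
```

Both catalogues then share the single column UCD PHOT_FLUX and differ only in their header values.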
2) Evaluating data identified by UCDs
Many people think of UCDs as primarily for selecting catalogues whose contents a human will then view. This is their primary function as far as the Registry is concerned, but for actually executing queries we need to compare values within catalogues and possibly associate the results with a new UCD (e.g. compare flux densities to derive a colour).
Until recently, UCDs were only evaluated routinely (e.g. via Vizier) in one of the most difficult cases - coordinate conversion. Otherwise, the user said 'give me a list of catalogues containing information about x' and the service converted x into UCDs and returned a list of catalogues containing columns corresponding to x. This is now extended to photometry but only for special formats (see above or the AVO demo). We now need UCDs which enable selection by evaluating the data they identify. For example, if I want flux density measurements between 1.4 and 1.6 GHz I should be able to access data at 1.5 GHz too: the registry search sees that there is a catalogue containing radio flux densities, and the query execution finds OBS_FREQUENCY 1.5 GHz - in the header, or as a column heading for a collection of measurements at a range of frequencies - as well as a column PHOT_FLUX.
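The flux density example could be sketched roughly as follows. This is only an illustration under my own assumptions: catalogues are represented as plain dictionaries with a 'header' and 'columns' keyed by UCD, which is not how any existing registry or service stores them, and the function names are hypothetical.

```python
# Hypothetical sketch: select catalogues by evaluating OBS_FREQUENCY,
# whether it sits in the header or is itself a column of values.

def frequencies_ghz(catalogue):
    """Return the observing frequencies (GHz) a catalogue declares."""
    header = catalogue.get("header", {})
    if "OBS_FREQUENCY" in header:
        return [header["OBS_FREQUENCY"]]
    return list(catalogue.get("columns", {}).get("OBS_FREQUENCY", []))


def select_flux_catalogues(catalogues, lo_ghz, hi_ghz):
    """Names of catalogues with PHOT_FLUX data inside [lo_ghz, hi_ghz]."""
    selected = []
    for cat in catalogues:
        if "PHOT_FLUX" not in cat.get("columns", {}):
            continue  # registry-level test: no flux densities at all
        if any(lo_ghz <= f <= hi_ghz for f in frequencies_ghz(cat)):
            selected.append(cat["name"])
    return selected


catalogues = [
    {"name": "single_1.5", "header": {"OBS_FREQUENCY": 1.5},
     "columns": {"PHOT_FLUX": [0.2, 0.4]}},
    {"name": "single_5.0", "header": {"OBS_FREQUENCY": 5.0},
     "columns": {"PHOT_FLUX": [0.1]}},
    {"name": "multi_freq", "header": {},
     "columns": {"OBS_FREQUENCY": [0.4, 1.5, 5.0],
                 "PHOT_FLUX": [1.0, 0.6, 0.3]}},
    {"name": "positions_only", "header": {"OBS_FREQUENCY": 1.5},
     "columns": {"POS_EQ_RA": [12.3]}},
]

matches = select_flux_catalogues(catalogues, 1.4, 1.6)
```

Here the 1.5 GHz catalogue and the multi-frequency one are both found, even though neither is labelled with a 1.4 or 1.6 GHz UCD.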
This may be a problem where one UCD describes several columns e.g. TIME_DATE for the start and stop of observations, but we want to evaluate both e.g. for error bars on a proper motion measurement. Or do we just need an algorithm which says 'if there are two TIME_DATEs and they are different, take the difference'?
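The 'take the difference' rule could be as simple as the sketch below. The row layout (a list of values alongside a parallel list of column UCDs, with dates as ISO strings) and the function name are assumptions of mine, and it deliberately gives up whenever there are not exactly two distinct TIME_DATE values.

```python
# Hypothetical sketch of the 'two TIME_DATEs, take the difference' rule.
from datetime import datetime


def time_date_span(row, column_ucds):
    """Return the interval between two TIME_DATE values in a row, or None."""
    dates = [
        datetime.fromisoformat(value)
        for value, ucd in zip(row, column_ucds)
        if ucd == "TIME_DATE"
    ]
    if len(dates) == 2 and dates[0] != dates[1]:
        return abs(dates[1] - dates[0])
    return None  # ambiguous: zero, one, identical or more than two dates


column_ucds = ["TIME_DATE", "TIME_DATE", "PHOT_FLUX"]
row = ["2001-01-01T00:00:00", "2003-01-01T00:00:00", 1.2]
span = time_date_span(row, column_ucds)
```

This only works if the two columns really are start and stop times; with a degenerate UCD the software cannot tell, which is the underlying problem.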
3) Who reads UCDs?
The document asks 'how do I find the proper UCD to describe my data set'. At present, this is done automatically if you are contributing a 'simple' table e.g. out of a paper; the average astronomer is not _forced_ to be exposed to UCDs, and a user doing a search certainly isn't (although they might want to use them directly). Data providers of major archives may need to investigate them to check they are allocated correctly or to suggest new ones, as may VO workers. Thus they should be reasonably human-readable and available to anyone, but not restricted to the lowest common denominator of astronomical understanding.
We could increase the present number tenfold or even 100-fold (still <10^5) - I am not saying we should, but we need not be afraid of it. For example, if I am using the SED tool in Aladin, I really do not want to know that an optical data set has UCDs for photometric zero point and colour corrections (although an optical astronomer might and I had to learn...) but fortunately the Aladin prototype SED tool knows how to use these to plot magnitudes in Jy. Conversely, an optical astronomer does not want to know the parameters associated with radio visibility data, but if s/he wants to extract an off-centre image at a certain resolution from the MERLIN archive, they need the extraction tool to know about baseline lengths, visibility integration times etc. in order to choose a data set with an appropriate field of view and resolution.
Thus, completeness of UCDs for their purpose is more important than economy, and in fact the savings in going to an atomic structure would probably mean we could add UCDs for specific errors, and remove the degeneracy of things currently under 'TIME_DATE' or 'NUMBER' etc., without making UCDs unmanageable. As the document says, the creation of new UCDs should certainly be restricted to defined bodies (e.g. via a central monitoring panel) to avoid duplication and maintain consistency at a functional level. This probably means UCDs will proliferate slowly, acquire some clumsy ones, and then be pruned or refactored over a cycle of months or years.
Best wishes
Anita