Re: Pre-meeting UCD question

From: Anita Richards <amsr-at-jb.man.ac.uk>
Date: Fri, 2 May 2003 15:43:21 +0100 (BST)

Ray Plante's 'Requirements for the Future' is a very neat summary, and I am especially glad he has put on record the need for
* Accessible documentation and
* Backward compatibility
- and I think it is a good basis for the Cambridge discussion.

There are a couple of things which have come up previously:

The use of UCDs other than for column content descriptors - e.g. for the ResourceMetadata which summarises the content of a dataset for the Registry.
Should these be taken from exactly the same set of UCDs? Or should these embody their context (e.g. by reserving superclass UCDs, such as the first element if we use Guy Rixon's atoms...)?

How to qualify UCDs?
At the moment we have some 'degenerate' UCDs, e.g. SPEC_WAVELENGTH in the IDHA model for both the high and low bandpass limits. This means you have to evaluate both if you just want data above a certain frequency (say). Then we have some which are overspecified, e.g. all the PHOT ones for U B V R I, RADIO_1.6, _1.4 etc.

In some cases this might be solved just by adding MAX/MIN or equivalent (LONG and SHORT for the bandpass, as MAX/MIN is ambiguous unless you know if it is freq. or wavelength). However, with XML we can be more intelligent, as has been pointed out, and give them properties or attributes or values.

How do we cope with the cases where the required information (i.e. another UCD?) is elsewhere in the same data set, e.g. in another column or in the header? For example, for associating cumulative errors with a data point, or for realising that an entire catalogue is at 1.4 GHz or in the Cousins photometry convention?

That is, we need not only to be able to select on the basis of UCDs, but to be able to interpret their properties at the Registry level.

In some cases we might need to evaluate the data they describe at the Registry (as in the present bandpass example I gave), but perhaps that can be avoided, or should be, if possible, so that you only have to dive into the dataset itself when you are answering the query in detail?
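To make the bandpass example concrete, here is a rough sketch of how a Registry might answer "data above a certain frequency" from qualified descriptors alone, without opening the dataset. It is only an illustration: the ColumnDescriptor class, the qualifier field and its SHORT/LONG values, and the idea of storing the limit in metres are all assumptions of mine, not part of any agreed scheme.

```python
from dataclasses import dataclass

C_M_PER_S = 299_792_458.0  # speed of light in m/s


@dataclass
class ColumnDescriptor:
    """Hypothetical Registry record for one described quantity."""
    ucd: str         # e.g. "SPEC_WAVELENGTH"
    qualifier: str   # assumed qualifier: "SHORT" or "LONG" bandpass edge
    value_m: float   # bandpass limit in metres, held as metadata


def max_frequency_hz(descriptors):
    """Highest frequency covered, taken from the SHORT wavelength edge."""
    for d in descriptors:
        if d.ucd == "SPEC_WAVELENGTH" and d.qualifier == "SHORT":
            return C_M_PER_S / d.value_m
    return None  # no qualified limit recorded


def covers_above(descriptors, freq_hz):
    """True if the resource may hold data at or above freq_hz.

    A resource with no recorded limit is kept, so that the Registry
    errs on the side of inclusion.
    """
    fmax = max_frequency_hz(descriptors)
    return fmax is None or fmax >= freq_hz


# Example: a catalogue whose bandpass runs from 0.20 m to 0.22 m.
catalogue = [
    ColumnDescriptor("SPEC_WAVELENGTH", "SHORT", 0.20),
    ColumnDescriptor("SPEC_WAVELENGTH", "LONG", 0.22),
]
```

With the qualifier in place, only the SHORT entry needs to be looked at for a lower frequency cut, instead of evaluating both SPEC_WAVELENGTH values to find out which is which.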

This can be summed up as saying that we want UCDs to describe data but we should avoid as much as possible using UCDs as data.

Thoughts on some of David's comments:

Units and accuracy

When we are using UCDs (assuming this includes the ResourceMetadata) in the Registry, we do not need high accuracy as long as we err on the side of inclusion (so a catalogue spanning 10 - 3001 GHz would be both radio and IR). If we want catalogues with photometry, we do not need to know what units the flux is measured in. However, if we want a certain level of accuracy we do need units (at least if we are to avoid quality factors etc., which will be very arbitrary). Here, though, the Registry could standardise everything to one sort of unit for each quantity, since the conversion does not need to be accurate (e.g. noise < 0.0001 Jy derived from a certain limiting magnitude) as long as it is rounded down/up as appropriate. This is analogous to Ray's squinting.
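As a small illustration of erring on the side of inclusion, the sketch below labels a frequency range with every waveband it touches. The band edges are placeholder values chosen only so that the 10 - 3001 GHz example comes out as described; they are an assumption for this sketch, not a proposed standard, and the names BANDS_GHZ and bands_for are hypothetical.

```python
# Illustrative band edges in GHz. These are assumptions for this sketch,
# not agreed definitions of where "radio" ends and "IR" begins.
BANDS_GHZ = {
    "radio": (0.0, 3000.0),
    "IR": (3000.0, 400000.0),
}


def bands_for(lo_ghz, hi_ghz):
    """Return every band the range [lo_ghz, hi_ghz] touches.

    Edges are treated inclusively, so a range that only just reaches a
    band is still listed under it.
    """
    return [
        name
        for name, (band_lo, band_hi) in BANDS_GHZ.items()
        if lo_ghz <= band_hi and hi_ghz >= band_lo
    ]


# A catalogue spanning 10 - 3001 GHz is listed under both bands.
labels = bands_for(10.0, 3001.0)
```

A query for IR resources would then return this catalogue even though almost all of its coverage is radio; the detailed check is left to the stage where the dataset itself is examined.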

I think it is crucial that we do not tie UCDs to certain units or whatever, as we then make it difficult to interconvert. Any conversion errors should be carried in proper UCDs for errors - thus, if you ask for x-ray photometry in counts, you will get a result which will probably have greater accuracy than if you ask for it in Jy - but the latter is vital if you want to compare it with data at other wavelengths.

Cheers
a

Received on 2003-05-02Z14:44:51