Re: UCD for SIAP

From: Doug Tody <dtody-at-nrao.edu>
Date: Tue, 17 Jun 2003 22:17:36 -0600 (MDT)


Roy -

Certainly we can replace the VOX: namespace UCDs with global UCDs, so long as we are willing to make up new UCDs where needed. This might be a good short-term solution.

The key problem I see with trying to use existing UCDs is that historically UCDs have been used primarily as fuzzy tags to link similar fields in catalogs. In data access metadata such as that introduced in SIA, we are using UCDs to identify the fields of a formal data model. Here the tag is not a fuzzy one linking similar fields of unrelated catalogs; rather, it is a link to a field of a formally defined data model. Precision is important for these data models - we are precisely defining attributes of the data model.

We should formally define data models such as spectralBandpass or WCS and define, as part of the data model, the UCD tag used to identify an attribute of the data model. When we represent a data model as a set of related columns in a table, or as an entity struct in XML (as in IDHA or HDX), we will use the UCDs to formally type the data model attributes so that programs can use them unambiguously, so that we can use XML Schemas for automated validation, and so forth.
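As a rough illustration of this idea (a sketch only, in Python; the class, the helper functions, and the attribute names "id" and "unit" are hypothetical and not part of SIA or any UCD specification), a data model could carry its own UCD tag for each attribute, so that a table column is typed by an exact lookup rather than a fuzzy match:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """One attribute of a formal data model, identified by exactly one UCD."""
    name: str
    ucd: str
    description: str


# Hypothetical spectralBandpass data model. Only these two UCDs are taken
# from the SIA discussion; the attribute names are made up for illustration.
SPECTRAL_BANDPASS = (
    Attribute("id", "VOX:BandPass_ID", "bandpass name, e.g. 'V' or 'SDSS_U'"),
    Attribute("unit", "VOX:BandPass_Unit", "units used for spectral values"),
)


def build_ucd_index(model):
    """Build a UCD -> attribute map, rejecting any UCD that is used twice."""
    index = {}
    for attr in model:
        if attr.ucd in index:
            raise ValueError(f"UCD {attr.ucd!r} is defined twice in the data model")
        index[attr.ucd] = attr
    return index


BANDPASS_BY_UCD = build_ucd_index(SPECTRAL_BANDPASS)


def type_columns(column_ucds):
    """Map each column UCD that belongs to the data model to its attribute name."""
    return {
        ucd: BANDPASS_BY_UCD[ucd].name
        for ucd in column_ucds
        if ucd in BANDPASS_BY_UCD
    }
```

Because the index refuses duplicate UCDs, each tag identifies at most one attribute of the model, which is the one-to-one property a program needs to use the columns unambiguously.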

This one-to-one mapping of UCDs to formal data models is a concept that does not currently exist in UCDs. If we try to take a more classical UCD-like approach and use UCDs to associate "similar" fields of different data models, then we no longer have a precisely defined data model. This association of "similar" fields of different data models should occur in the data model definitions, where data models may define attributes in terms of more fundamental data models or quantities.

Some more specific comments based on your proposed UCDs follow. I haven't tried to be complete; these are only examples.

> example is there a reason for VOX:Image_AccessReference as a new UCD, why
> can't we simply use DATA_LINK from the existing set?

This would work if we have only one DATA_LINK in an interface such as SIA, and we define that within this particular interface, DATA_LINK means the formally defined SIA Image_AccessReference. If for some reason we have two DATA_LINK attributes then we are in trouble, as the type is then overloaded and the meaning is ambiguous. The problem with what you suggest is that we are inherently overloading the type. It might work for a while, but will cause a problem in the future if we apply the same logic to a similar attribute. Since the attribute is precisely defined we gain nothing by using a fuzzy tag.
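To make the overloading problem concrete, here is a small sketch in Python of what a client has to do when it looks up a field by UCD. The function and the field names ("url", "image_url", "preview_url") are hypothetical; only the DATA_LINK tag comes from the discussion above.

```python
def find_field(fields, ucd):
    """Return the single field tagged with `ucd`.

    Raises LookupError when the tag is missing or appears more than once,
    since in either case the client cannot tell which field is meant.
    """
    matches = [f for f in fields if f["ucd"] == ucd]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one field with UCD {ucd!r}, found {len(matches)}"
        )
    return matches[0]


# One DATA_LINK: the lookup is well defined within this interface.
one_link = [{"name": "url", "ucd": "DATA_LINK"}]

# Two DATA_LINK fields: the tag alone no longer identifies the access reference.
two_links = [
    {"name": "image_url", "ucd": "DATA_LINK"},
    {"name": "preview_url", "ucd": "DATA_LINK"},
]
```

With `one_link` the call `find_field(one_link, "DATA_LINK")` succeeds; with `two_links` it raises, and the client has no principled way to recover.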

> VOX:Image_Naxes
> ** new: POS_TRANSF_WCS_NAXES
> specifying the number of image axes.
>
> VOX:Image_Naxis
> ** new: POS_TRANSF_WCS_NAXIS
> NOTE: Can a UCD refer to an array like this?
> with the array value giving the length in pixels of each image axis.

This sort of mapping of the WCS data model onto "standard" UCDs (if newly defined) is certainly possible, so long as we define these tags as part of the WCS data model.

However, the geometry of an image (NAXES, NAXIS) is not really part of the WCS - these are image attributes (in FITS they existed long before WCS). A WCS is associated with an image. CDELT, FRAME, etc., are part of the WCS.

Do we need an image geometry data model? A general image attributes data model? NAXES, NAXIS are clearly (or shall we say, clearly should be) precisely defined terms of some formal image data model.

> VOX:BandPass_ID
> ** existing: INST_FILTER_CODE
> identifying the bandpass by name (e.g., "V", "SDSS_U", "K", "K-Band", etc.).

BandPass is a more general concept than "instrument" or "filter". Filters and instruments are examples of specific entities that have a bandpass. "bandpass" should be a formal data model with "bandpass"-specific attributes (UCDs).

> VOX:BandPass_Unit
> ** existing: UNITS -- but should be GROUPED with Bandpass
> identifying the units used to represent spectral values, selected from
> "meters", "hertz", and "keV".

What do you do once there are two data models both of which need to define their units? GROUPED above, whatever that is, may address this, but it is better to have an explicit, unambiguous attribute. The units have to be precisely defined. The data model could define a default if this parameter is absent.
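A minimal sketch of such an explicit units attribute with a model-defined default, in Python. The allowed values are the three quoted above; the choice of "meters" as the default and the function name are assumptions made only for this example.

```python
ALLOWED_UNITS = ("meters", "hertz", "keV")

# Assumption for illustration: the data model declares "meters" as its default.
DEFAULT_UNIT = "meters"


def bandpass_unit(record):
    """Return the bandpass unit of a metadata record keyed by UCD.

    Falls back to the data model's default when the attribute is absent,
    and rejects any value outside the defined set.
    """
    unit = record.get("VOX:BandPass_Unit", DEFAULT_UNIT)
    if unit not in ALLOWED_UNITS:
        raise ValueError(f"unsupported bandpass unit: {unit!r}")
    return unit
```

Since the attribute is looked up under its own bandpass-specific tag, a second data model with its own units attribute would not collide with it.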

> VOX:Image_PixFlags
> ** existing: CODE_MISC
> specifying the type of processing done by the image service to produce an
> output image pixel. The string value should be formed from some
>
> combination of the following character codes: C, F, X, Z, V

What do you do as soon as there are two fields with UCD=CODE_MISC in the metadata? In the current SIA, the implication is that Image_PixFlags is an attribute of some sort of "image" data model used in SIA, hence it has a precise definition.

Similar comments apply to other UCDs where we confuse UCD associations with data model terms.

I think it would help a lot if we just took one of these simple data models, e.g., spectralBandpass, and formally defined it, with UCDs assigned to identify attributes. Then we could use the same approach to define all the other SIA image attributes, grouped by data model. Later perhaps we can show how to define data models in terms of other data models or ultimately Quantities, to fully define complex data objects via a hierarchy of formal definitions.

Received on 2003-06-18Z04:20:54