Sebastien -
Good - we agree I think. This is exactly the point I was trying to make about data models and UCDs. The attributes of formal data models need to be defined precisely and unambiguously. The attributes of different data models need to be uniquely identified by some means, e.g., a globally unique name or reference (e.g., a form of UCD), a namespace (e.g., our temporary VOX namespace), or some hierarchical structure as in IDHA. The attributes of different data models, although they need to be distinguished from one another, may well share the same fundamental type, and a UCD could be used to express this.
Using different approaches to naming data model attributes and types (UCDs), as I think you are suggesting below, is one way to solve the problem. This provides both the precision required to identify DM attributes, and the means to associate elements of different data models for interoperability.
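To make the two-tag idea concrete, here is a minimal sketch. It assumes a hypothetical attribute record in which the globally unique identity is built from a namespace, a model name and an attribute name, and a separate UCD carries the shared fundamental type. The "vox" namespace, the "sed" model and the UCD string "WAVELENGTH" are placeholders, not entries from any real model or from the UCD list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DMAttribute:
    """One attribute of a formal data model (hypothetical structure)."""
    namespace: str  # e.g. a temporary namespace such as "vox"
    model: str      # data model name
    name: str       # attribute name, unique within its model
    ucd: str        # fundamental type, may be shared across models

    @property
    def qualified_name(self) -> str:
        # Precise, globally unique identity: namespace:model.attribute
        return f"{self.namespace}:{self.model}.{self.name}"


def associated(a: DMAttribute, b: DMAttribute) -> bool:
    """Distinct attributes are associated when they share a fundamental type."""
    return a.qualified_name != b.qualified_name and a.ucd == b.ucd


# Illustrative attributes; model names and UCD strings are placeholders.
bandpass_center = DMAttribute("vox", "spectralBandpass", "centralWavelength", "WAVELENGTH")
sed_point = DMAttribute("vox", "sed", "wavelength", "WAVELENGTH")
```

The qualified names keep the two attributes distinct, while the shared UCD is what lets a program recognize that they can be compared.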
The only problem I see with this is that we would like flexibility in how we represent data models and metadata. Mapping DM attributes into the columns of a flat table, as in SIA or in a FITS header, is convenient and can simplify representations, up to a point. If datasets get complex enough then eventually one needs more structure and an approach such as IDHA or HDX may be called for. In many cases the simpler representation is adequate. It would be good if the underlying mechanisms, such as UCDs and how we define data models, were flexible enough to permit a variety of such representations.
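As a rough illustration of the flat-versus-structured trade-off, the sketch below maps a nested data model instance onto flat, dot-separated column names and back. The instance and its field names are hypothetical, and the scheme assumes that no key contains a "." and that there are no repeated (list-valued) elements. Repeated elements are exactly where a flat table stops being adequate and something like IDHA or HDX becomes necessary.

```python
def flatten(instance: dict, prefix: str = "") -> dict:
    """Map a nested data-model instance onto flat, dot-separated column names."""
    columns = {}
    for key, value in instance.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into nested structure, extending the column name.
            columns.update(flatten(value, name))
        else:
            columns[name] = value
    return columns


def unflatten(columns: dict) -> dict:
    """Rebuild the nested structure from dot-separated column names."""
    instance = {}
    for name, value in columns.items():
        node = instance
        *parents, leaf = name.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return instance


# Hypothetical instance combining two data models.
instance = {
    "spectralBandpass": {"center": 5.5e-7, "width": 8.8e-8},
    "wcs": {"crval1": 10.68, "crval2": 41.27},
}
columns = flatten(instance)
```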
If we map the attributes of a DM into table columns and we do NOT use the UCD to identify the DM attribute, then we need another tag of some sort for this purpose. This would be no problem in XML, but we would have the nuisance of carrying along an additional tag separate from the UCD. In VOTable this would give us NAME, ID, UCD, plus a new tag for the formal DM attribute association (conceivably ID could be used for this purpose, but it already has other uses). In a representation such as FITS (e.g., if we try to represent VO data in FITS) it is harder. In this case one might want to use the comment field of a FITS keyword to contain something like a UCD: keyword = value / UCD. I am not saying we necessarily want to do this, but it is an example of representation flexibility, and it would be good if our scheme could extend to this level.
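A minimal sketch of the same column rendered both ways. The VOTable FIELD uses its standard name, ID, ucd and datatype attributes, plus a hypothetical "dmattr" attribute standing in for the new DM tag; "dmattr" is not part of VOTable. The FITS card follows the fixed-format layout (keyword in columns 1-8, "= " in 9-10, value right-justified to column 30, then " / " and the comment). The value is passed in already formatted as a FITS literal, and the keyword and UCD strings are placeholders.

```python
import xml.etree.ElementTree as ET


def votable_field(name, ucd, dm_attribute, datatype="double"):
    """Build a VOTable FIELD carrying both a UCD and a separate DM-attribute tag."""
    field = ET.Element("FIELD", {
        "name": name,
        "ID": name,
        "ucd": ucd,                # fundamental type, as UCDs are used today
        "datatype": datatype,
        "dmattr": dm_attribute,    # hypothetical extra tag, not part of VOTable
    })
    return ET.tostring(field, encoding="unicode")


def fits_card(keyword, value_text, comment):
    """Lay out a fixed-format FITS header card: keyword = value / comment."""
    keyword = keyword.upper()
    if len(keyword) > 8:
        raise ValueError("FITS keywords are at most 8 characters")
    # Keyword in columns 1-8, "= " in 9-10, value right-justified in 11-30,
    # then " / " and the comment; the card is padded or cut to 80 characters.
    card = f"{keyword:<8}= {value_text:>20} / {comment}"
    return card[:80].ljust(80)


print(votable_field("BAND_C", "WAVELENGTH", "vox:spectralBandpass.center"))
print(fits_card("BAND_C", "5.5E-07", "WAVELENGTH"))
```

Note that the FITS card has only the comment field to spare, so it can hold one UCD-like tag but not both.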
If we DO use the UCD to carry this additional meaning, then the global UCD namespace could include both formal DM attribute names, and the more fundamental types used to associate different data elements as at present. UCDs would then provide a global naming index, with a single name (the UCD) being sufficient to carry all this meaning. Given the UCDs and an understanding of the associated DM (stored separately) we would then be able to recognize that different metadata elements (table columns in this case) are associated, define and use an XML schema to verify the integrity of the DM subset in these columns, use semantic relationships for inference, and so forth.
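If the UCD itself names the DM attribute, a reader holding a separately stored DM definition can group a table's columns by data model and check that each model's subset is complete. The sketch below assumes a hypothetical global index from DM-attribute UCDs to (model, attribute) pairs, and uses a simple required-attribute check as a stand-in for real XML Schema validation. All UCD strings, model contents and column names are illustrative.

```python
from collections import defaultdict

# Hypothetical global UCD index: a DM-attribute UCD names (model, attribute).
UCD_INDEX = {
    "vox:spectralBandpass.center": ("spectralBandpass", "center"),
    "vox:spectralBandpass.width": ("spectralBandpass", "width"),
    "vox:wcs.crval1": ("wcs", "crval1"),
    "vox:wcs.crval2": ("wcs", "crval2"),
}

# Separately stored DM definitions: the attributes each model requires.
REQUIRED = {
    "spectralBandpass": {"center", "width"},
    "wcs": {"crval1", "crval2"},
}


def check_table(columns: dict) -> dict:
    """Group columns (name -> UCD) by data model; report missing required attributes."""
    found = defaultdict(set)
    for ucd in columns.values():
        if ucd in UCD_INDEX:  # columns with non-DM UCDs are ignored
            model, attribute = UCD_INDEX[ucd]
            found[model].add(attribute)
    # Only models that actually appear in the table are checked.
    return {model: REQUIRED[model] - attrs for model, attrs in found.items()}


report = check_table({
    "BAND_C": "vox:spectralBandpass.center",
    "RA": "vox:wcs.crval1",
    "DEC": "vox:wcs.crval2",
})
```

Here the WCS subset is complete while the bandpass subset is missing its width attribute.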
In this case what we would do is use the UCD tag in a representation to convey the data model attribute name, uniquely identifying both the data model and the attribute of the data model. The formal definition of the DM would then define each attribute of the DM, ** giving for each attribute the UCD type of the attribute **. If this UCD type is elemental then we would have the desired interoperability, and the means to associate and compare similar data elements. UCDs would thus provide the metadata "glue" to link related concepts such as fundamental quantities and data models, making possible a uniform representation for both.
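A minimal sketch of the "glue" step, assuming hypothetical DM definitions that give each attribute its elemental UCD type. Columns from two unrelated tables, each tagged with a DM-attribute UCD, are resolved to their elemental types and paired when the types match. The model names, attribute names and UCD strings are placeholders.

```python
# Hypothetical DM definitions: each attribute is given its elemental UCD type.
DM_DEFINITIONS = {
    "vox:spectralBandpass": {"center": "WAVELENGTH", "width": "WAVELENGTH"},
    "vox:sed": {"wavelength": "WAVELENGTH", "flux": "FLUX_DENSITY"},
}


def elemental_type(dm_ucd: str) -> str:
    """Resolve a DM-attribute UCD such as 'vox:sed.flux' to its elemental type."""
    model, _, attribute = dm_ucd.rpartition(".")
    return DM_DEFINITIONS[model][attribute]


def comparable_columns(table_a: dict, table_b: dict) -> list:
    """Pair columns of two tables whose DM-attribute UCDs share an elemental type."""
    return [
        (name_a, name_b)
        for name_a, ucd_a in table_a.items()
        for name_b, ucd_b in table_b.items()
        if elemental_type(ucd_a) == elemental_type(ucd_b)
    ]


bandpass = {"BAND_C": "vox:spectralBandpass.center", "BAND_W": "vox:spectralBandpass.width"}
sed = {"LAMBDA": "vox:sed.wavelength", "FNU": "vox:sed.flux"}
pairs = comparable_columns(bandpass, sed)
```

Both bandpass columns pair with LAMBDA because they share the elemental type. The elemental type alone is therefore deliberately loose, and the DM-attribute UCD is what still tells the center and width apart.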
To summarize, UCDs or something like them can play a key role to structure and link fundamental metadata and data models. The issue has already come up in interfaces like SIA and IDHA. Can we come up with something which is sufficiently powerful and general to provide both types of representations?
On Fri, 20 Jun 2003, Sebastien Derriere wrote:
> Doug Tody wrote:
> >
> > The key problem I see with trying to use existing UCDs is that historically
> > UCDs have been used primarily as fuzzy tags to link similar fields in
> > catalogs. In data access metadata such as is introduced in SIA we are
> > using UCDs to identify the fields of a formal data model. Here the tag
> > is not fuzzy at all, linking similar fields of unrelated catalogs, rather
> > it is a link to a field of a formally defined data model. Precision is
> > important for these data models - we are precisely defining attributes
> > of the data model.
> >
> > We should formally define data models such as spectralBandpass or WCS
> > and define, as part of the data model, the UCD tag used to identify an
> > attribute of the data model. When we represent a data model as a set of
> > related columns in a table, or as an entity struct in XML (as in IDHA or
> > HDX), we will use the UCDs to formally type the data model attributes so
> > that programs can use them unambiguously, so that we can use XML Schemas
> > for automated validation, and so forth.
>
> Hello,
>
> The primary goal of UCDs is to ensure interoperability between
> heterogeneous datasets. That's why they have been defined to some
> "reasonable" level of precision (what you call fuzziness).
> Internal attributes of a formally defined data model can be defined
> at any level of precision, and have their own names. But you can
> have *in addition* a UCD attached to every attribute (see the case
> of the IDHA model). Those UCD can ensure interoperability between
> different data models, and between data models and datasets.
> The names of the attributes cannot a priori ensure this,
> because nothing prevents the same concept from being named
> differently in different models.
>
> Sebastien.
Received on 2003-06-20Z18:55:58