data modeling issues

From: Gerard <gerard.lemson-at-mpe.mpg.de>
Date: Thu, 28 Feb 2008 18:19:36 +0100


Hi
Below is a longish list of issues that we have been discussing. I'd like to send these off to the dm and theory mailing lists. It may not be complete; in particular, I have no more time to spend on discussing the issue
of "level of normalisation" in the SNAP model at the end. Maybe one of you can fill that in.

Please send your comments on whether this is a fair summary of what we have discussed, and please propose additions or changes. Also, can we send this email in one go, or is it too long?

Thanks

Gerard

Dear colleagues

In the theory group we have been discussing the SNAP data model in a small group and
have come across a couple of "issues" that we want to present to the larger community.
Some of these are aimed at theorists, who hopefully will use the model in some form,
others at the data modellers in the DM working group who were not involved in these
discussions.
We would like to start discussions on these, though we may already make some decisions to keep us going until the discussions are concluded.

  1. UML as normative data model representation. We had already agreed in the past (Victoria) that a UML model would be the normative representation of the SNAP data model. It is also the agreed-upon form (Cambridge 2003) in which DM WG data models should be represented. We want to push this so far that the UML should be complete, in the sense that all other required products can be derived from it. This includes, amongst others:
    • all elements must have descriptions
    • all attributes must have datatypes and multiplicities (0..1 or 1)
    • more ...
  2. Subset of UML syntax. To make the entrance to UML as simple as possible, and to restrict the possible modeling choices as far as is allowable, we want to settle on a subset of the UML modeling elements. UML2 allows one to define a so-called profile, which formalises these choices and can be used by tools (such as MagicDraw) to adjust the environment. Such a profile can include stereotypes for detailing certain syntax types further, standard tags to be added to elements for application or other purposes, or, for example, a standard set of primitive datatypes to be used by us all.
  3. XMI as standard serialisation of the UML document. We propose to use the XMI (XML Metadata Interchange, http://www.omg.org/technology/documents/formal/xmi.htm) serialisation of UML as the standard representation of our UML data models. So far we have been using the community version of MagicDraw (14.0) and the XMI it generates, so that we can all actually work on the diagrams. It is doubtful whether other tools will be able to use these documents directly, even though that was the intended goal of XMI.
  4. Standardized and automated mapping from UML to XML schema. In Cambridge (2003) we (the DM WG) decided that at least an XML schema should also accompany the UML diagram as the product of a data modeling effort in the IVOA. Use is obvious (?). We propose that such a schema should be generated from the UML automatically. This requires a set of mapping rules from the proposed subset of UML to XML schema. This set of rules can be implemented in an XSLT script that, working on the XMI representation, can generate appropriate XSD files. This may be generalised to other representations such as a relational model, Java classes etc. There are some open issues with this, in particular how to map shared associations/references. We can bring these up during the discussion. It will also implicitly determine what style of XML schema we write. The Registry and VOTable groups have settled on a suggestion for XML schema style that was originally derived from precisely such a mapping. On the other hand, STC's schema, for example, deviates from that style. Of course, it was not precisely derived from a UML model either.
  5. Repository for storing these results. Currently we store SNAP products, such as XMI documents generated by MagicDraw and XML schemas ("generated by hand" from the UML), on the theory SNAP DM wiki pages. There has been some discussion of moving this to a proper source repository. We have started using one provided by Google and already used by Norman Gray in his semantics project. There was some discussion on repositories independent from the IVOA during the TCG telecon the other day, but I missed the conclusions.
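
To make items 1 and 4 a bit more concrete, here is a minimal sketch of what a completeness check and an automated UML-to-XSD mapping could look like. It is only an illustration under stated assumptions: it uses Python on a small in-memory model rather than the XSLT-on-XMI approach proposed above, and the class names, the example "Snapshot" class with its attributes, the primitive type table and the mapping rule (one complexType per class, one element per attribute) are all hypothetical, not agreed rules.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

XS = "http://www.w3.org/2001/XMLSchema"

# Assumed (hypothetical) set of primitive datatypes and their XSD equivalents.
PRIMITIVES = {
    "string": "xs:string",
    "integer": "xs:integer",
    "real": "xs:double",
    "boolean": "xs:boolean",
}


@dataclass
class Attribute:
    name: str
    datatype: str
    multiplicity: str  # restricted to "1" or "0..1", as in item 1
    description: str = ""


@dataclass
class UmlClass:
    name: str
    description: str = ""
    attributes: list = field(default_factory=list)


def completeness_errors(cls):
    """Return a list of messages for everything item 1 would flag as incomplete."""
    errors = []
    if not cls.description:
        errors.append(f"{cls.name}: missing description")
    for attr in cls.attributes:
        if not attr.description:
            errors.append(f"{cls.name}.{attr.name}: missing description")
        if attr.datatype not in PRIMITIVES:
            errors.append(f"{cls.name}.{attr.name}: unknown datatype {attr.datatype!r}")
        if attr.multiplicity not in ("1", "0..1"):
            errors.append(f"{cls.name}.{attr.name}: bad multiplicity {attr.multiplicity!r}")
    return errors


def to_xsd(classes):
    """Map each class to an xs:complexType and each attribute to an xs:element."""
    ET.register_namespace("xs", XS)
    schema = ET.Element(f"{{{XS}}}schema")
    for cls in classes:
        ctype = ET.SubElement(schema, f"{{{XS}}}complexType", name=cls.name)
        seq = ET.SubElement(ctype, f"{{{XS}}}sequence")
        for attr in cls.attributes:
            ET.SubElement(
                seq,
                f"{{{XS}}}element",
                name=attr.name,
                type=PRIMITIVES[attr.datatype],
                # Multiplicity 0..1 becomes an optional element.
                minOccurs="0" if attr.multiplicity == "0..1" else "1",
            )
    return ET.tostring(schema, encoding="unicode")


# Hypothetical example class, for illustration only.
snapshot = UmlClass(
    name="Snapshot",
    description="State of a simulation at one time",
    attributes=[
        Attribute("time", "real", "1", "Simulation time of the snapshot"),
        Attribute("comment", "string", "0..1", "Free-text remark"),
    ],
)

if __name__ == "__main__":
    problems = completeness_errors(snapshot)
    print(problems if problems else to_xsd([snapshot]))
```

Note that the sketch says nothing about shared associations/references, which is exactly the open issue mentioned in item 4.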

The above issues were mainly aimed at the community at large (dm, registry, ivoa). Now some issues with the SNAP data model itself.

6. Normalisation
...

7. Need for semantic vocabularies
...

8. more?

Best regards

Gerard Lemson for
Laurent Bourges, Norman Gray, Rick Wagner

Received on 2008-02-28Z18:18:10