RE: XML Schema for the Simulation Data Model

From: Gerard <gerard.lemson-at-mpe.mpg.de>
Date: Tue, 12 Feb 2008 14:56:39 +0100


Hi Rick
Thanks for posting your work on the theory mailing list. Please let's keep discussing the work here and not go offline too soon.

I am trying to understand the relation of your schema to the SNAP data model and schema we are working on. To this end I have created a UML version of your schema, which I have attached as a JPG. I have also attached three JPGs of the SNAP model as updated today on the theory wiki. The updates are not very involved; mainly some details were refined and cleaned up.

Comparing your model with this latest version of the SNAP DM, I think the following correspondences can be made (I am ignoring detailed differences in attributes etc.):

Rick's model                SNAP model
--------------------------  ----------------------------------
ProgramType                 SNAPProtocol, SNAPSimulator
SimulationType              SNAPProject (1 below)
RunType                     SNAPSimulation (1)
CharacterisationAxisType    Property (of ObjectType) (2 below)
CharacterisationType        Characterisation
ParameterType               InputParameter+ParameterSetting
inputSnapshot               InputDataset

Comments and questions:

  1. You have a SimulationType and a RunType. The latter seems to correspond to a SNAPSimulation, as it contains the collection of input parameters and snapshots and has its own reference to a ProgramType. At first I assumed that your SimulationType corresponded to a number of SNAPSimulation-s, all with the same program and characterization. But from the example instance document you sent around I guess it is actually more like the SNAPProject. Is this correct? Your SimulationType has a reference to ProgramType as well. Is this supposed to model a kind of pipeline?
  2. The concept of ObjectType is missing from your model. This makes it impossible to have multiple explicitly defined types of objects inside a single simulation. In the SNAP DM each object type is defined explicitly with its own set of properties. Note that my choice of the name Property has been contested by some of the Characterisation DM people, who wanted me to use Axis. I see you have chosen their side ;). I feel that for many simulations the ObjectType is definitely and explicitly present. For example, some of the SPH simulations I have access to here have dark matter, star and gas particles, each with its own properties. I can see though that when someone's database is only ever going to contain one type of simulation, one might want to remove the extra "indirection" of the ObjectType. Obviously related to this is the absence of ObjectCollection. In the SNAP model this is the anchor that ties a list of characterizations to the properties of a particular object type. If you remove one, you can remove the other. Note that only today I added a ChildObject to the model. This is the outcome of an offline (sorry!) discussion, mainly with Laurent Bourges and Herve Wozniak. They model galaxies as being built from disks and bulges, each with their own properties.
  3. I assume Group and GroupedQuantity are borrowed from the Spectrum data model's XSD serialization? Because of single inheritance you have a problem with ProgramType, which now cannot be a Resource. If instead you had made Resource a Group (impossible of course in the IVOA context), you could have ProgramType be a Resource as well. I must say I don't like Group very much. ID and IDREF are useful only when the element being referenced exists in the same XML document. This, I think, will often not be the case. I see it as an example of inheritance run wild.
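
As a rough illustration of the ID/IDREF point in comment 3, here is a minimal Python sketch. The element and attribute names (program, run, id, programRef) are made up for the example and are not taken from either schema. It only checks whether a reference can be resolved inside the document that contains it, which is the scope within which ID/IDREF works.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents; names are illustrative only.
SINGLE_DOC = """
<simulation>
  <program id="prog1"/>
  <run programRef="prog1"/>
</simulation>
"""

# The same run, but stored in a document that does not contain the program.
RUN_ONLY_DOC = """
<runs>
  <run programRef="prog1"/>
</runs>
"""


def resolve_refs(xml_text):
    """Return (resolved, dangling) programRef values within one document."""
    root = ET.fromstring(xml_text)
    # Collect every id declared anywhere in this document.
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    # Collect every reference made in this document.
    refs = [el.get("programRef") for el in root.iter()
            if el.get("programRef") is not None]
    resolved = [r for r in refs if r in ids]
    dangling = [r for r in refs if r not in ids]
    return resolved, dangling


print(resolve_refs(SINGLE_DOC))
print(resolve_refs(RUN_ONLY_DOC))
```

In the second document the reference is left dangling, so a run that points at a program registered elsewhere would need some other kind of reference, for example a URI.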

Btw, I have for a while wanted to remove the inheritance from Resource in the SNAP data model, and have done so in today's update. I find it too restrictive. I think one can take a SNAP model instance and turn it into a Resource if one wants to register it, but that does not mean it "is a" resource in our model. There are more flexible ways of using existing models than always using inheritance. In particular the Content of Resource is very cumbersome. The SNAP model is supposed to describe the Content already.
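
The difference between "is a" Resource and "can be turned into" a Resource can be sketched in a few lines of Python. This is only an illustration under assumed names: the classes SnapSimulation and Resource and all of their fields are hypothetical stand-ins, not the actual SNAP or VOResource definitions.

```python
from dataclasses import dataclass


@dataclass
class SnapSimulation:
    """Hypothetical stand-in for a SNAP model instance (no Resource base class)."""
    name: str
    description: str


@dataclass
class Resource:
    """Hypothetical stand-in for a registry record."""
    title: str
    content_description: str


def to_resource(sim: SnapSimulation) -> Resource:
    """Build a registry record from a simulation only when registration is wanted."""
    return Resource(title=sim.name, content_description=sim.description)


sim = SnapSimulation("run-1", "example simulation")
record = to_resource(sim)
print(record)
```

The simulation class stays free of the registry model's constraints, and the mapping to a Resource lives in one function that can change independently.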

4. You have InputParameter and ParameterSetting merged into one, ParameterType. Note that I have added an attribute "value", representing the "xsd:string" inheritance in your ParameterType. In an earlier version of the SNAP DM I had made the same choice for simplicity. However, Franck LePetit, for example, argued that redefining the list of parameters for his simulation types would be very costly.
If one runs parameter studies with lists of 100s of parameters, it is better to have the parameters defined once on the Protocol (where they really belong), and only add the parameter settings on the experiment. The problem is that in XML this is often more involved, as one needs to somehow reference a parameter that may not exist in the same XML document (so IDREF will not work), etc.
Again, in one's particular database I can well see people choosing one or the other. For the SNAP DM I have now chosen the more correct way.
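
The split described in comment 4 can be sketched as follows. This is a minimal Python illustration, not the SNAP schema itself: the class layout, the field names and the example parameters (box_size, n_particles) are assumptions made for the example. Parameters are defined once on the protocol, and each experiment stores only settings that refer to a definition by name rather than by an XML IDREF.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class InputParameter:
    """Definition of a parameter; lives on the protocol and is written once."""
    name: str
    datatype: str
    description: str = ""


@dataclass
class Protocol:
    name: str
    parameters: Dict[str, InputParameter] = field(default_factory=dict)

    def define(self, param: InputParameter) -> None:
        self.parameters[param.name] = param


@dataclass(frozen=True)
class ParameterSetting:
    """Value chosen for one run; refers to the definition by name."""
    parameter_name: str
    value: str


@dataclass
class Experiment:
    protocol: Protocol
    settings: List[ParameterSetting] = field(default_factory=list)

    def assign(self, name: str, value: object) -> None:
        # Reject settings for parameters the protocol never defined.
        if name not in self.protocol.parameters:
            raise KeyError(f"{name!r} is not defined on {self.protocol.name!r}")
        self.settings.append(ParameterSetting(name, str(value)))


# Hypothetical protocol with two parameters, defined once.
protocol = Protocol("hypothetical-code")
protocol.define(InputParameter("box_size", "float", "comoving box length"))
protocol.define(InputParameter("n_particles", "int"))

# A small parameter study: each run carries settings only, not definitions.
runs = []
for box in (50.0, 100.0):
    run = Experiment(protocol)
    run.assign("box_size", box)
    run.assign("n_particles", 128)
    runs.append(run)
```

With hundreds of parameters and many runs, the definitions (datatype, description) are stored once, while the merged ParameterType approach would repeat them in every run.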

5. You do not have TargetObjectType, TargetProcess, Algorithm, Physics, SNAPWebService. These were all introduced explicitly to support discovery (first 4) and execution (SNAPWebService) in the SNAP protocol.

All in all it seems though that the models are pretty compatible, with the SNAP model being more general and comprehensive, as one should expect for a model that needs general application.
For now I see your model as an alternative representation of (a subset of) the information in the full model, one that can have its own particular application area. In that respect it would be similar to the models from, for example, Patrizia Manzato and the Horizon team (see the links in the "Existing data models..." paragraph in
http://www.ivoa.net/twiki/bin/view/IVOA/IVOATheorySimulationDatamodel )



From: owner-theory-at-eso.org [mailto:owner-theory-at-eso.org] On Behalf Of Rick Wagner
Sent: Tuesday, February 12, 2008 2:08 AM
To: theory-at-ivoa.net
Subject: XML Schema for the Simulation Data Model

Hi,

After working to understand the current SNAP Data Model (in particular the current proposed XML Schema), I decided to distill it into a single document with fewer types. I've had some success, so I've posted the Schema and a sample instance document on the Twiki, attached to the IVOATheorySimulationDatamodel page:

Schema
http://www.ivoa.net/internal/IVOA/IVOATheorySimulationDatamodel/Simulation.xsd

Sample Instance
http://www.ivoa.net/internal/IVOA/IVOATheorySimulationDatamodel/SimulationInstance.xml

At the bottom of the page there are links to screenshots of the elements and data types, which help to show their relations.

This schema keeps the same method of characterization as the SNAP model, by defining the axes up front (or, at the top), but is less abstract. It treats a simulation and its data as the results of running a program with defined input parameters, and does not try to describe everything about the method and numerical representation. To me, these are things defined by the program (the software), and could be handled by defining a separate VOResource for "Program" or "Software Project".

If this work looks interesting to anyone, I would be glad to write up a fuller description, and even put some documentation in the Schema and instance documents.

I plagiarized heavily from both the SNAP Data Model and the Spectral Schema, so any credit should go to Gerard and the DAL, Data Modeling group, and any annoyed comments should be sent my way.

--Rick



Rick Wagner, Graduate Student Researcher
UCSD Physics
9500 Gilman Drive
La Jolla, CA 92093-0424
Email: rwagner-at-physics.ucsd.edu
WWW: http://lca.ucsd.edu/projects/rpwagner
Phone: (858) 822-4784

Measuring programming progress by lines of code is like measuring aircraft building progress by weight. --Bill Gates

RickWagner.jpg SNAP__postprocessing.jpg SNAP__simulation.jpg SNAPDataModel.jpg
Received on 2008-02-12Z14:55:19