[SEMANTICS] Simulation Categories and Standard Vocab words

From: Laurie Shaw <lds-at-ast.cam.ac.uk>
Date: Fri, 21 Jul 2006 19:47:41 +0100 (BST)


Dear theory group,

I would just like to make a few (rather belated) comments on the Semantics discussion, especially the emails by Franck, Herve and Miguel. I've listed my points by topic.

UCDs vs Standard Vocabulary
---------------------------

There seems to be some confusion about whether we are trying to define new UCDs or words for the Standard Vocabulary. Initially, on the twiki page
(http://www.ivoa.net/twiki/bin/view/IVOA/TheorySemanticVocabulary), we outlined lists of words under various categories (e.g. physical process, subject, algorithm) describing different types of simulation property. In his email, Franck Le Petit then defined a specific set of categories that must be filled in to adequately describe the purpose and operation of a piece of simulation code to be published in the VO.

As I understand it, we will need a UCD to describe each of the categories that we eventually come up with. However, the words used to populate a category will come from the IVOA standard vocabulary, which encompasses the set of words that can be used as UCDs plus all standard terms currently in use in astronomy, written in UCD syntax.

So, once we have decided on a set of categories, we will need to propose a UCD for each category, plus the standard vocabulary words needed within those categories to ensure that most astrophysical simulation codes can be adequately represented.

Simulation Categories
---------------------

The categories proposed by Franck, which must be filled in to fully describe a simulation, are as follows:

1 - Name of the code
2 - Name of the developer / team / contact
3 - Version of the code
4 - Description of the code (ASCII text)
5 - Physical processes
6 - Subject
7 - Algorithm
8 - Time evolution
9 - Type of results
10 - Results format

So far, no one seems to have any problems with the first four, for each of which I think a UCD already exists (meta.id, meta.curation, meta.version, meta.note). I don't think we need to define any new standard vocabulary (SV) words for these, as they are all fairly specific to each simulation, although it could perhaps be argued that 'Name' should be part of the SV, since simulated datasets may point to this tag.
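To make the split between "a UCD per category" and "SV words as values" concrete, here is a minimal Python sketch mapping each of Franck's ten categories to the UCD that would describe it. Only the four metadata UCDs (meta.id, meta.curation, meta.version, meta.note) come from this email; the dictionary keys and the use of None for "UCD still to be proposed" are illustrative assumptions, not an agreed scheme.

```python
from typing import Optional

# Hypothetical category keys; each value is the UCD describing that category.
# None marks a category for which a UCD would still have to be proposed.
CATEGORY_UCDS: dict[str, Optional[str]] = {
    "name": "meta.id",
    "developer": "meta.curation",
    "version": "meta.version",
    "description": "meta.note",
    "physical_processes": None,
    "subject": None,
    "algorithm": None,
    "time_evolution": None,
    "type_of_results": None,
    "results_format": None,
}


def categories_needing_ucd(mapping: dict[str, Optional[str]]) -> list[str]:
    """Return the categories with no UCD assigned yet, in insertion order."""
    return [category for category, ucd in mapping.items() if ucd is None]


print(categories_needing_ucd(CATEGORY_UCDS))
```

Listing the gaps this way gives a simple checklist of the UCDs a proposal would have to cover.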

The next four categories, on the other hand, seem to be the most important (at least in terms of defining new SV words), as together they describe what the code is trying to do and how it does it.

Taking the 'Subject' category first: it describes the astrophysical objects that the code primarily deals with. Looking at the IVOA standard vocabulary as it is now
(http://www.ivoa.net/internal/IVOA/IvoaUCD/VO-standard-vocabulary_8.pdf), many of these objects seem to be roughly accounted for already (I've only had a quick glance at this, though). The only things I can see that are missing are dark matter halo (and subhalo) and volume of space. Note that we can still use the UCD syntax and structure for SV words, so 'stellar cluster' becomes 'stars.globular.cluster' or 'stars.cluster'.

The 'physical process' and 'algorithm' categories are by far the most complicated, because of the sheer range of physical processes that are modelled, the level of detail at which they are modelled (approximately or exactly), their relative impact on the results (does a process such as stellar feedback have a major or minor effect on whatever is being simulated?) and how they are incorporated.

It seems to me that 'physical process' could mean two things here:

  1. the overall phenomenon that is being simulated (e.g. galaxy *formation*, stellar *evolution*)
  2. the physics that is accounted for in doing so, (e.g. radiative transfer, GR, etc)

We could either have two separate categories, one for 'process' and one for 'physics', or keep a single 'physical process' category with multiple entries that together define what is going on and the physics that makes it happen. Note that there is already a 'process' field in the SV (e.g. process.accretion, process.emission), so I guess we would have to propose those that are missing (e.g. process.radTransfer, process.evolution).

So for stellar evolution we might have 'stars' in the Subject category, and 'process.evolution' and 'process.radTransfer' in the Physical Processes category (or, alternatively, the last two in separate Process and Physics categories).

I'm wondering whether this approach gets around the 'Process or Subject?' category problem that Herve pointed out for 'stellar evolution' and 'stellar population synthesis'.
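The stellar evolution example can be written out in both layouts to check that they carry the same information. This is a minimal sketch: the dictionary keys are hypothetical, and process.evolution and process.radTransfer are proposed words, not yet part of the SV.

```python
# Option 1: a single 'physical process' category with several entries.
combined = {
    "subject": ["stars"],
    "physical_processes": ["process.evolution", "process.radTransfer"],
}

# Option 2: separate 'process' (phenomenon) and 'physics' categories.
split = {
    "subject": ["stars"],
    "process": ["process.evolution"],
    "physics": ["process.radTransfer"],
}


def merge_process_physics(entry: dict[str, list[str]]) -> dict[str, list[str]]:
    """Convert an Option 2 entry into the Option 1 layout."""
    merged = {k: list(v) for k, v in entry.items() if k not in ("process", "physics")}
    merged["physical_processes"] = entry.get("process", []) + entry.get("physics", [])
    return merged


print(merge_process_physics(split) == combined)  # True
```

If the conversion is lossless like this, the choice between one and two categories is mainly about how searches and forms are presented, not about what can be expressed.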

Algorithm
---------

As others have pointed out, we have to be careful not to be too specific in this category, otherwise the SV list for 'comp.algorithm' (or whatever it ends up being called) will end up with millions of words. I think the words required for this category should refer only to the top-level algorithm. Looking at the list on the twiki, some of the suggested words are almost code names or physical processes: I don't think 'collisionless' or 'Fuel consumption theorem' are algorithms (although I could be wrong). Furthermore, tpm, pppm, pm, pp, etc. are all types of N-body code, so the word for tpm might be 'comp.alg.Nbody.tpm', whilst I'm guessing 'adaptive mesh refinement' would be 'comp.alg.mesh.adaptive-refinement', or something like that.

I'm thinking that in the near term most of the entries for algorithm will fall under 'mesh', 'hydro' (including SPH) or 'nbody', with a few extras. If we get more specific than this, the number of words needed to describe different simulations to the same degree will increase exponentially. Anyone who needs more detail could always look at the 'Description' category, or even a paper that it points to, having first narrowed the search down to, e.g., tree-SPH codes.
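Narrowing a search down from a top-level word works naturally if algorithm words are hierarchical and matched by dotted prefix. A minimal sketch, assuming a comp.alg.family.variant word shape loosely based on the examples above; the code names and the comp.alg.hydro.sph word are invented placeholders.

```python
# Hypothetical catalogue: code name -> algorithm SV words.
CODES = {
    "codeA": ["comp.alg.nbody.tpm"],
    "codeB": ["comp.alg.nbody.pppm", "comp.alg.hydro.sph"],
    "codeC": ["comp.alg.mesh.adaptive-refinement"],
}


def matches(word: str, prefix: str) -> bool:
    """True if word equals prefix or sits below it in the dotted hierarchy."""
    return word == prefix or word.startswith(prefix + ".")


def search(catalogue: dict[str, list[str]], prefix: str) -> list[str]:
    """Return codes with at least one algorithm word under the given prefix."""
    return sorted(
        name for name, words in catalogue.items()
        if any(matches(w, prefix) for w in words)
    )


print(search(CODES, "comp.alg.nbody"))  # ['codeA', 'codeB']
print(search(CODES, "comp.alg.hydro"))  # ['codeB']
```

Matching on the dot boundary (prefix + ".") rather than a bare string prefix keeps, say, a hypothetical 'comp.alg.nbodyx' from being picked up by a search for 'comp.alg.nbody'. A query at the top level still finds codes described with more specific words, which may ease the worry about the list growing too deep.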

Time Evolution
--------------

Do we really need this? Methods like leapfrog seem to belong in the Algorithm category (and are a very specific detail at that). I would have thought that the temporal resolution, included in some kind of Parameters category for individual simulated datasets, might be more relevant. I agree with Franck that this should at most be a YES/NO flag indicating whether or not the code is time-dependent.

Type of Results and Results Format
----------------------------------

I totally agree with Franck's suggestion here.

Code language and parallelism
-----------------------------

I also propose a new category (Language?) for the language in which the code is written (C++, Fortran, etc.) and whether it is designed for use on multi-processor machines or clusters. I guess these could also be two separate categories. Parallelism could be indicated either by protocol (MPI, OpenMP) or by a flag (yes/no).
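The two options for parallelism need not be exclusive: if protocols are recorded, the yes/no flag can be derived from them. A minimal sketch with a hypothetical record type; the class and field names are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class CodeEnvironment:
    """Hypothetical record for the proposed Language/parallelism categories."""
    languages: list[str] = field(default_factory=list)           # e.g. ["C++", "Fortran"]
    parallel_protocols: list[str] = field(default_factory=list)  # e.g. ["MPI", "OpenMP"]

    @property
    def is_parallel(self) -> bool:
        """The yes/no flag, derived from whether any protocol is listed."""
        return bool(self.parallel_protocols)


serial = CodeEnvironment(languages=["Fortran"])
hybrid = CodeEnvironment(languages=["C++"], parallel_protocols=["MPI", "OpenMP"])
print(serial.is_parallel, hybrid.is_parallel)  # False True
```

Recording protocols keeps the more detailed information while still answering the simple yes/no question.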

Single or Multiple Entries in each Category
-------------------------------------------

I think that we should not limit any of the categories (except perhaps Version and Time Evolution) to a single entry. Doing so could cause problems down the road as simulations grow in size and, especially, in scope.
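One way to allow multiple entries everywhere while keeping a few exceptions is to store every category as a list and check only the single-valued ones. A minimal sketch, assuming Version and Time Evolution are the only exceptions (as tentatively suggested); the keys and the version string "2.1" are placeholders.

```python
# Hypothetical rule: only these categories are restricted to one entry.
SINGLE_VALUED = {"version", "time_evolution"}


def check_entry_counts(entry: dict[str, list[str]]) -> list[str]:
    """Return the single-valued categories that were given more than one value."""
    return sorted(
        category for category, values in entry.items()
        if category in SINGLE_VALUED and len(values) > 1
    )


description = {
    "version": ["2.1"],
    "time_evolution": ["yes"],
    "physical_processes": ["process.evolution", "process.radTransfer"],
    "subject": ["stars"],
}
print(check_entry_counts(description))  # []

bad = dict(description, version=["2.1", "2.2"])
print(check_entry_counts(bad))  # ['version']
```

Because the restriction lives in one small set, relaxing it later would not change the shape of existing descriptions.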

It would be great to hear people's thoughts on these topics, and to make some decisions so that I can write a proposal that we all agree on!

Sorry for the long email,

Cheers,

Laurie

Received on 2006-07-21Z20:48:31