Problems about the Spectrum Data Model from the view of a Web Service programmer

From: Dobos, Laszlo <dobos-at-pha.jhu.edu>
Date: Thu, 14 Sep 2006 16:53:29 +0200


Hello DM group,

For those who don't know me, I'm Alex Szalay's and Tamas Budavari's student at the JHU, and I have been working on the Spectrum Service (http://voservices.net/spectrum) for the last three years. I'm studying physics at the Eotvos University, Hungary, and before starting university I worked for years as a database and distributed application programmer for commercial companies.

I just ran through the document quickly, because I was interested in the Web Service issues. I'm pretty sure that from the scientific point of view the data model is very well designed, and your work on collecting and organizing the enormous number of different metadata fields is appreciated, but I've found some problems that may cause trouble in a web-service-aware application.

Here are my observations:

	<Wavelength>
		<Unit>nm</Unit>
		<Value>700</Value>
	</Wavelength>

or store in attributes, like

        <Wavelength Unit="nm" Value="700" />

But the following is not correct:

        <Wavelength Unit="nm">700</Wavelength>

We should keep in mind that even though this third version looks nicer in a text editor, XML is not meant to be read by humans but to be used in web services, so I suggest following the SOAP standard instead of making good-looking documents.

The problem is the following: if one creates program classes in Java or C# or whatever and wants to expose his/her functions as a web service, he/she might also want to add the optional fields to his/her data model. The trouble lies with the simple (value) data types. Let's consider an everyday double variable. It can have the value of 0, but the SOAP serializer is going to write it into the XML anyway. To keep a value out of the XML, the variable would have to be set to null, but only reference types (pointers) can be null. In the Spectrum Service I'm developing I used the following approach: I stored the header fields as usual, in classes that form a hierarchical set of variables, and stored the actual data in simple arrays of doubles (or ints). Because arrays are reference types, they can have the value of null, so if I don't want to use one of the optional axes, I simply set it to null.

If I had used an array of structs (i.e. simple data types grouped together), this trick would not work: I would have to set each unused field in each instance of the struct to null, but the fields of the struct are simple types and thus cannot be null.

So if we want to create a data model that supports web services and still want to keep the XML file size small, we should consider storing the actual data in simple arrays instead of arrays of structs. This is also going to help in the future, when binary web service protocols become available!
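
The difference between the two layouts can be illustrated with a minimal Java sketch. This is not the actual Spectrum Service code: the class, field and method names (SpectrumSketch, ColumnSpectrum, Point, columnsToXml, pointsToXml) are hypothetical, the element names are made up for illustration, and the hand-written serializer is only a toy stand-in for a real SOAP stack, assumed here to skip null references and always write primitive values.

```java
import java.util.StringJoiner;

// Hypothetical illustration only; not the Spectrum Service implementation.
final class SpectrumSketch {

    /** Layout A: one array per axis. Arrays are reference types, so an unused axis can be null. */
    static final class ColumnSpectrum {
        double[] wavelength;  // required axis
        double[] flux;        // required axis
        double[] fluxError;   // optional axis, left null when not provided
    }

    /** Layout B: one struct-like object per data point. Primitive fields cannot be null. */
    static final class Point {
        double wavelength;
        double flux;
        double fluxError;     // defaults to 0.0, indistinguishable from a real zero
    }

    /** Toy serializer for layout A: an axis is written only if its array is non-null. */
    static String columnsToXml(ColumnSpectrum s) {
        StringBuilder sb = new StringBuilder("<Spectrum>");
        appendArray(sb, "Wavelength", s.wavelength);
        appendArray(sb, "Flux", s.flux);
        appendArray(sb, "FluxError", s.fluxError);
        return sb.append("</Spectrum>").toString();
    }

    private static void appendArray(StringBuilder sb, String name, double[] values) {
        if (values == null) {
            return;           // optional axis not used: nothing is written
        }
        StringJoiner joined = new StringJoiner(" ");
        for (double v : values) {
            joined.add(Double.toString(v));
        }
        sb.append('<').append(name).append('>')
          .append(joined.toString())
          .append("</").append(name).append('>');
    }

    /** Toy serializer for layout B: every primitive field is written for every point. */
    static String pointsToXml(Point[] points) {
        StringBuilder sb = new StringBuilder("<Spectrum>");
        for (Point p : points) {
            sb.append("<Point>")
              .append("<Wavelength>").append(p.wavelength).append("</Wavelength>")
              .append("<Flux>").append(p.flux).append("</Flux>")
              .append("<FluxError>").append(p.fluxError).append("</FluxError>")
              .append("</Point>");
        }
        return sb.append("</Spectrum>").toString();
    }

    public static void main(String[] args) {
        ColumnSpectrum columns = new ColumnSpectrum();
        columns.wavelength = new double[] {700.0, 701.0};
        columns.flux = new double[] {1.5, 1.6};
        // columns.fluxError stays null, so no FluxError element is produced.
        System.out.println(columnsToXml(columns));

        Point p = new Point();
        p.wavelength = 700.0;
        p.flux = 1.5;
        // p.fluxError was never set, but it is still serialized as 0.0.
        System.out.println(pointsToXml(new Point[] {p}));
    }
}
```

In the array-per-axis layout the unused error axis disappears from the output entirely, while in the struct-per-point layout a FluxError element with the value 0.0 is written for every single point.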

I've attached a document that I wrote a year ago but that somehow never reached the DM group.

I hope this helps to eliminate these Web Service problems. Unfortunately I cannot go to the Moscow meeting, but most of you will be at ADASS, so if you're interested, I can show you the actual Spectrum Service code and explain the programming tricks that bridge the gaps between the complicated data model and the web service standard.

-Laszlo

Received on 2006-09-14Z17:44:10