Is UCD out of control?

From: Roy Williams <roy-at-cacr.caltech.edu>
Date: Wed, 22 Oct 2003 18:02:10 -0700


I must say that I find Tom's version of the UCD paper has a number of definite improvements, such as the importance given to Groups, with a child inheriting its UCD from its parent.

However, I find the suggested syntax confusing and muddled. It seems to go back to the old model of "base + other stuff" that we discussed in Cambridge. What I do not understand is how a machine would parse the other stuff: the modifiers, the attribute properties and so on. I cannot even tell which is a modifier and which is an attribute. The reason we went with the new scheme was that we couldn't imagine writing code to disentangle the Cambridge scheme.

In the 1.9.9 document, the first word of the UCD corresponds to the thing that has the units. In "stat.variance; phys.length", we know that the unit is L*L (it's a variance). The second word is the concept to which this relates.
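
A minimal Python sketch of this unit rule, assuming two hypothetical lookup tables that are not part of the 1.9.9 document: CONCEPT_DIMENSIONS gives the dimension of a concept word (only phys.length here), and PROPERTY_POWER says how a property word transforms that dimension (variance squares it, standard deviation keeps it). It only illustrates that the property word alone decides the units.

```python
# Hypothetical table: concept word -> dimension exponents.
# Only phys.length is filled in; a real table would be far larger.
CONCEPT_DIMENSIONS = {
    "phys.length": {"L": 1},
}

# Hypothetical rules: the power a property word raises its concept's unit to.
PROPERTY_POWER = {
    "stat.variance": 2,  # a variance carries the square of the unit
    "stat.stdev": 1,     # a standard deviation keeps the unit
}


def dimension_of(ucd: str) -> dict:
    """Return the dimension of a UCD such as 'stat.variance; phys.length'."""
    words = [w.strip() for w in ucd.split(";") if w.strip()]
    if len(words) == 1:
        # A single word is the concept itself, with its own unit.
        return dict(CONCEPT_DIMENSIONS[words[0]])
    if len(words) != 2:
        raise ValueError(f"expected one or two words, got {len(words)}: {ucd!r}")
    prop, concept = words
    power = PROPERTY_POWER[prop]
    return {dim: exp * power for dim, exp in CONCEPT_DIMENSIONS[concept].items()}


print(dimension_of("stat.variance; phys.length"))  # {'L': 2}
```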

Everything in UCD2 should be of the form "The <property> of the <concept>".

Forget the attempts to justify three words. Leave that for UCD3.

Every UCD has at most two words. Keep It Simple!

In the 1.9.9 document, we tried to stay as close as we could to the metadata mines: the 3000 tables of VizieR from which all this comes. We thought that had more validity than somebody (anybody) sitting down and inventing structure. Look at the problems we get when we move away from mining real metadata: Tom thinks that "error" belongs in a tree called "measurement", and the earlier version put it in a tree called "statistics". There is no right or wrong here, just opinion. I pointed this out in the earlier document concerning the "equinox" concept, but that has been deleted. We must make every attempt to follow what 3000 published papers have done, not push our own opinions.

In Tom's paper, there seem to be lots of new attributes (value, vector, multiplet, local, human, soft) that further stretch the scope of UCD. If there are multiple values in a table cell, the VOTable will indicate this in other ways. Perhaps Tom can put in a few more attributes so we can find out whether the data quantity is a float or an integer? UCD is about *semantic type*, not all this other stuff. What *real* tables use the "human" section? Are humans base, attribute, or modifier?

I think we can all agree that UCD as currently formulated cannot express the complexity inherent in its task. What is really needed is a well-thought-out RDF vocabulary of predicates and objects, and that is the idea of UCD3. The intention of UCD2 is to provide a stopgap that will be backward compatible when UCD3 arrives; for now we use only one predicate, "propertyOf". But Tom has chosen to remove all the discussion of why and what we are doing and where we are going, and has instead driven down a road that tries to put a lot of complexity into this string representation. The result is something terribly complicated and not very understandable.
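
To illustrate how a two-word UCD2 could carry over into an RDF vocabulary, here is a minimal Python sketch that maps "property; concept" onto a (subject, predicate, object) triple with the single predicate "propertyOf". Plain tuples stand in for a real RDF library, and putting the property word in the subject position is an assumption made for illustration, not something fixed by the 1.9.9 document.

```python
from typing import Optional, Tuple

Triple = Tuple[str, str, str]


def to_triple(ucd: str) -> Optional[Triple]:
    """Map 'property; concept' to (property, 'propertyOf', concept).

    A one-word UCD names a bare concept with no property, so it yields
    no triple; anything longer than two words is rejected.
    """
    words = [w.strip() for w in ucd.split(";") if w.strip()]
    if len(words) == 1:
        return None
    if len(words) != 2:
        raise ValueError(f"UCD2 allows at most two words: {ucd!r}")
    prop, concept = words
    return (prop, "propertyOf", concept)


print(to_triple("stat.variance; phys.length"))
# ('stat.variance', 'propertyOf', 'phys.length')
```

Because every UCD2 reduces to at most one such triple, a later UCD3 reader would only need to add predicates, not reinterpret existing strings.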

Of course the proof is in the pudding. As usual in the VO, we are making a language that is very expressive, and then hoping to eventually write the code that understands it. So let's think it through now. How do I construct code that "understands" something like "phot.flux; em.optical; intent.calculated; value"? I want to know what kind of data structure can be created from this, how to compare UCDs, and how to convert a UCD into a human-readable description of what it represents. I know how to do these things with the two-word property/concept style, but not with this grab-bag of attributes and modifiers.
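
A minimal Python sketch of the three operations for the two-word style: a data structure, a comparison, and a "The <property> of the <concept>" description. The GLOSS dictionary of English words is hypothetical and hand-written here; a real system would take glosses from the UCD word list. By design, parse rejects anything longer than two words, including the four-term example above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical English glosses for a few UCD words (illustration only).
GLOSS = {
    "stat.variance": "variance",
    "stat.error": "error",
    "phys.length": "length",
    "phot.flux": "flux",
}


@dataclass(frozen=True)
class UCD:
    """A two-word UCD: an optional property applied to a concept."""
    concept: str
    prop: Optional[str] = None

    @classmethod
    def parse(cls, text: str) -> "UCD":
        words = [w.strip() for w in text.split(";") if w.strip()]
        if len(words) == 1:
            return cls(concept=words[0])
        if len(words) == 2:
            return cls(concept=words[1], prop=words[0])
        raise ValueError(f"expected one or two words, got {len(words)}: {text!r}")

    def same_concept(self, other: "UCD") -> bool:
        """True if both UCDs describe the same underlying concept."""
        return self.concept == other.concept

    def describe(self) -> str:
        """Render as 'The <property> of the <concept>'."""
        concept = GLOSS.get(self.concept, self.concept)
        if self.prop is None:
            return f"The {concept}"
        return f"The {GLOSS.get(self.prop, self.prop)} of the {concept}"


a = UCD.parse("stat.variance; phys.length")
b = UCD.parse("phys.length")
print(a.describe())       # The variance of the length
print(a.same_concept(b))  # True
print(a == b)             # False
```

Full equality (the generated `==`) and concept-only matching are kept separate so that a search for "length" can still find its variance column.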

In conclusion, here is my IF ... ELSE clause:

IF {

we cannot find a killer app for UCD2, and if we cannot write code to understand UCD2 strings, then we should stick with UCD1, which has been improved and groomed over the last few months. Then next year we can make UCD3.

} ELSE {

I like simplicity. I want to turn every table cell into "<property> of the <concept>" so that every UCD2 has at most two words.

}



Caltech Center for Advanced Computing Research roy-at-cacr.caltech.edu
626 395 3670

Received on 2003-10-23Z01:07:36