Dear VOTablers,
I have an issue concerning the VOTable format. This has been touched on by the group to some extent before, but no real consensus was reached (at least, no action was taken).
Summary
The Problem
It is common for VOTables to contain columns which represent numeric values as strings (more precisely, as one-dimensional datatype="char" or "unicodeChar" arrays). The most important cases in astronomy are the following:
there are other examples; an exhaustive listing is probably neither possible nor desirable.
There is currently no formal way for a VOTable to describe how to map these strings to numbers, which means that software can't reliably do anything with them apart from display their values as strings. Software would often like to be able to do something with them which requires their numerical values, for instance use them as coordinates in a plot, define ranges, interpolate between them, order values etc.
Various software hacks are possible to work around the current situation and determine the intended numeric values from such encoded string-valued columns, for instance:
This contravenes the mandated use of the "units" attribute,
which the VOTable standard states must be composed as described
at http://vizier.u-strasbg.fr/doc/catstd-3.2.htx.
UCDs are really about semantics not form, which makes this a
philosophically unattractive solution. There are related
practical problems in that you might have multiple choices of
representation for a given quantity - e.g. it would prevent you
from saying that a right ascension is represented as
degrees:minutes:seconds. Also, it's not all that easy to
determine programmatically from a UCD whether it is likely to
be represented in one string form or another.
Depending on your processing model, this is likely to be
inefficient (if you examine all data) or error-prone (if you
examine only some). Or maybe both, in the case of string
representations which don't have very distinctive formats.
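To illustrate the data-examination hack, here is a minimal sketch in Python. The function name, the regular expression and the sampling parameter are all illustrative assumptions, not taken from any existing VOTable software:

```python
import re

# Hypothetical pattern for sexagesimal strings such as "12:34:56.7" or "-05 23 01".
# What really counts as a valid instance is exactly what is currently undefined.
SEXAGESIMAL = re.compile(r"^[+-]?\d{1,3}[: ]\d{1,2}[: ]\d{1,2}(\.\d*)?$")

def looks_sexagesimal(values, sample_size=None):
    """Guess whether a string column holds sexagesimal values.

    With sample_size=None every value is examined (slow for large tables);
    otherwise only the first sample_size values are examined (may guess wrong).
    Blank cells are ignored.
    """
    sample = values if sample_size is None else values[:sample_size]
    non_blank = [v.strip() for v in sample if v.strip()]
    return bool(non_blank) and all(SEXAGESIMAL.match(v) for v in non_blank)

column = ["12:34:56.7", "01:02:03", "n/a"]
full_scan = looks_sexagesimal(column)                   # examines all three cells
partial_scan = looks_sexagesimal(column, sample_size=2) # misses the "n/a" cell
```

The full scan rejects the column because of the third cell, while the partial scan accepts it. Even when the pattern matches, nothing in the data says whether "12:34:56" means hours, degrees or a time of day.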
These hacks, and possibly others, may have a fair chance of working in practice. However, as well as the individual problems listed above, they all operate outside of the VOTable standard. They therefore rely on an informal understanding between the VOTable provider and consumer, and different data or software might encode or decode this information differently, or not at all.
I therefore think that we need some way of expressing in a datatype="char"/"unicodeChar" FIELD element that a certain value representation format is being used. As usual, the same applies to PARAMs.
Proposed Solution
I suggest one of the following:
P1. Introduce a new attribute "representation" (or some other name)
for FIELD/PARAM to contain a special string indicating how values
in that column are to be interpreted. The special values
"hms", "dms" and "iso8601" (any more?) would be initially noted
along with rules for what counts as valid instances of those
representations. Such "noting" could be in the VOTable standard
itself or in some more dynamic form like a wiki page, or both.
VOTable producers would also be free to introduce other values
for private use. Such introduced values might possibly be
noted in the standard at a later date if it's agreed they are
useful.
P2. Modify the definition of the "units" attribute so that it is
permitted to contain either a catstd-format unit string as at
present, or a special representation string as in P1.
We could possibly say that for numeric-valued columns it works
as at present, but for string-valued columns it has the new sense.
However one could conceive of a case where you want both
representation and units (though I can't think of any actual
examples), which would be problematic for this scheme.
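As a rough sketch of what a consumer might do with such a representation string, the following Python maps the three initially suggested values to decoders. Everything here is an assumption made for illustration: the function names, the accepted string formats (the rules for valid instances are still to be agreed), and the choice to return "hms" values in degrees:

```python
from datetime import datetime

def sexagesimal_to_float(text):
    """Convert 'A:B:C' (colon or space separated) to A + B/60 + C/3600.

    The sign is taken from the leading character so that "-00:30:00"
    correctly yields a negative value.
    """
    text = text.strip()
    sign = -1.0 if text.startswith("-") else 1.0
    parts = text.lstrip("+-").replace(":", " ").split()
    a, b, c = (float(p) for p in parts)
    return sign * (a + b / 60.0 + c / 3600.0)

# Hypothetical mapping from representation string to decoder.
DECODERS = {
    "hms": lambda s: 15.0 * sexagesimal_to_float(s),  # assumed: hours -> degrees
    "dms": sexagesimal_to_float,                      # degrees
    # fromisoformat accepts only a subset of ISO 8601 in older Python versions
    "iso8601": datetime.fromisoformat,
}

def decode(value, representation):
    """Return a numeric/date value, or the original string if the
    representation is unknown (e.g. a private value)."""
    decoder = DECODERS.get(representation)
    return value if decoder is None else decoder(value)

ra_deg = decode("12:00:00", "hms")
dec_deg = decode("-00:30:00", "dms")
epoch = decode("2006-04-20T18:08:50", "iso8601")
other = decode("abc", "x-private")
```

Unknown values fall through unchanged, so software that does not recognise a privately introduced representation can still display the column as strings, as it does today.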
P1 is the cleaner of the two, since it avoids overloading one attribute with two meanings and the associated problems of inadvertently trying to interpret a catstd-format unit string as a representation type and vice versa (though in practice such collisions are not very likely). P2 is basically a fudge which has the advantage that it requires no change (apart from comments) to the schema, and moreover that there are VOTables out there which are already using this solution. I generally favour P1, though could perhaps be swayed to P2 by arguments of pragmatism.
Comments about this on the list are welcome; perhaps we can also discuss and hopefully reach a decision in Victoria.
Mark
--
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
m.b.taylor@bris.ac.uk   +44-117-928-8776   http://www.star.bris.ac.uk/~mbt/

Received on 2006-04-20Z18:08:50