Re: String representations of numeric values

From: Thomas McGlynn <tam-at-lheapop.gsfc.nasa.gov>
Date: Thu, 20 Apr 2006 14:04:14 -0400


While Roy and Dave may have hinted at this, let me say very plainly that I think this is a bad idea. Coordinates in VOTables should be represented in decimal degrees, and any other usage should be strongly discouraged. Allowing other formats makes the job of software that reads and writes VOTables much harder and more error-prone.

User interfaces must support sexagesimal coordinates, but they will need to be able to convert between decimal values and sexagesimal formats in any case, so nothing is saved there. The number of sexagesimal formats used in the community is large:

   hh mm, hh mm.f, hh mm ss, hh:mm, hhHmm, hhHmm.f, ...

and this raises the whole issue of validation: XML can validate that a number is a number, but unless we do a lot of work it's going to be hard to validate XML documents that claim to contain sexagesimal coordinates.
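To make the validation point concrete, here is a rough Python sketch that accepts only a handful of the hour-angle forms above. The names `_HMS` and `hours_to_degrees` are made up for illustration, and the set of accepted variants is an assumption. Every further variant needs another branch in the pattern, plus range checks that a plain decimal-degree column never needs beyond parsing a float.

```python
import re

# Illustrative pattern for a few hour-angle variants:
# "hh mm", "hh mm.f", "hh mm ss", "hh:mm", "hhHmm", "hhHmm.f", "hh:mm:ss.s".
# Whitespace inside [...] is kept even in VERBOSE mode.
_HMS = re.compile(
    r"""^\s*
    (?P<h>\d{1,2})                  # hours
    [ :Hh]                          # separator: space, colon or H
    (?P<m>\d{1,2})                  # whole minutes
    (?:
        (?P<mfrac>\.\d+)            # either fractional minutes ...
      | [ :Mm](?P<s>\d{1,2}(?:\.\d+)?)[Ss]?  # ... or seconds
    )?
    \s*$""",
    re.VERBOSE,
)


def hours_to_degrees(text: str) -> float:
    """Convert a sexagesimal hour-angle string to decimal degrees (1 h = 15 deg)."""
    match = _HMS.match(text)
    if match is None:
        raise ValueError(f"unrecognised hour-angle string: {text!r}")
    hours = int(match["h"])
    minutes = float(match["m"] + (match["mfrac"] or ""))
    seconds = float(match["s"] or 0.0)
    # The regex checks the shape only; ranges must be checked separately.
    if hours >= 24 or minutes >= 60 or seconds >= 60:
        raise ValueError(f"field out of range: {text!r}")
    return 15.0 * (hours + minutes / 60.0 + seconds / 3600.0)


for s in ("23 04", "23H04.5", "23:04:46.5"):
    print(s, "->", hours_to_degrees(s))
```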

Bury it... Bury it deep!!

        Tom

P.S. For times there is an ISO standard (ISO 8601), and we should allow that. It may already be supported by the XML standard, but I can't recall whether that's true.

Mark Taylor wrote:

> Dear VOTablers,
>
> I have an issue concerning the VOTable format. This has been touched on
> by the group to some extent before, but no real consensus was reached
> (at least, no action was taken).
>
>
> Summary
> --------
>
> - VOTable doesn't allow you to mark columns which are essentially
> numeric but are formatted as strings (e.g. sexagesimal angles) as such
> - For some purposes such a facility would be useful
> - We should modify the VOTable standard accordingly
>
>
> The Problem
> -----------
>
> It is common for VOTables to contain columns which represent
> numeric values as strings (more precisely, as one-dimensional
> datatype="char" or "unicodeChar" arrays). The most important
> cases in astronomy are the following:
>
> 1. Sexagesimal angle as hours:minutes:seconds (e.g. "23:04:46.5")
> 2. Sexagesimal angle as degrees:minutes:seconds (e.g. "+15:12:19")
> 3. Epoch as ISO-8601 (e.g. "2001-08-16T21:16:51.5")
>
> There are other examples; an exhaustive listing is probably neither
> possible nor desirable.
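For illustration, a minimal Python sketch of what a consumer has to do to turn the three example strings above into numbers, assuming it already knows which representation each column uses. The helper names are hypothetical. The sign is split off before splitting on ":" so that a value like "-00:30:00" keeps its sign, and the epoch is converted to Modified Julian Date (MJD 0 = 1858-11-17T00:00) with no time zone or time-scale handling.

```python
from datetime import datetime

# Modified Julian Date zero point: MJD 0.0 = 1858-11-17T00:00:00.
MJD_ZERO = datetime(1858, 11, 17)


def sexagesimal_to_number(text: str) -> float:
    """Convert '[+-]a:b[:c]' to a + b/60 + c/3600, applying the leading sign."""
    text = text.strip()
    # Split off the sign first so that "-00:30:00" comes out negative.
    sign = -1.0 if text.startswith("-") else 1.0
    parts = text.lstrip("+-").split(":")
    return sign * sum(float(p) / 60.0 ** i for i, p in enumerate(parts))


def iso8601_to_mjd(text: str) -> float:
    """Convert 'YYYY-MM-DDThh:mm:ss[.f]' (no zone, no time scale) to MJD."""
    fmt = "%Y-%m-%dT%H:%M:%S.%f" if "." in text else "%Y-%m-%dT%H:%M:%S"
    when = datetime.strptime(text, fmt)
    return (when - MJD_ZERO).total_seconds() / 86400.0


ra_deg = 15.0 * sexagesimal_to_number("23:04:46.5")  # hours -> degrees
dec_deg = sexagesimal_to_number("+15:12:19")
epoch_mjd = iso8601_to_mjd("2001-08-16T21:16:51.5")
print(ra_deg, dec_deg, epoch_mjd)
```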
>
> There is currently no formal way for a VOTable to describe how to map
> these strings to numbers, which means that software can't reliably
> do anything with them apart from display their values as strings.
> Software would often like to be able to do something with them
> which requires their numerical values, for instance use them as
> coordinates in a plot, define ranges, interpolate between them,
> order values etc.
>
> Various software hacks are possible to work around the current
> situation and determine the intended numeric values from such
> encoded string-valued columns, for instance:
>
> - If the "units" attribute looks like "hms"/"dms"/"iso-8601" then
> the corresponding format is assumed.
>
> This contravenes the mandated use of the "units" attribute,
> which the VOTable standard states must be composed as described
> at http://vizier.u-strasbg.fr/doc/catstd-3.2.htx.
>
> - If the "ucd" attribute looks like an RA/Dec/epoch then
> hh:mm:ss/dd:mm:ss/iso-8601 is assumed.
>
> UCDs are really about semantics not form, which makes this a
> philosophically unattractive solution. There are related
> practical problems in that you might have multiple choices of
> representation for a given quantity - e.g. it would prevent you
> from saying that a right ascension is represented as
> degrees:minutes:seconds. Also, it's not all that easy to
> determine programmatically from a UCD whether it is likely to
> be represented in one string form or another.
>
> - You can trawl through some or all of the data in the column -
> if all the string values you look at appear to be valid
> sexagesimal/ISO8601 strings, then assume that's what they are.
>
> Depending on your processing model, this is likely to be
> inefficient (if you examine all data) or error-prone (if you
> examine only some). Or maybe both, in the case of string
> representations which don't have very distinctive formats.
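A rough Python sketch of the three heuristics above, applied in the order listed. The UCD words (`pos.eq.ra`, `pos.eq.dec`, `time.epoch`), the returned labels and the function name are assumptions made for illustration, not part of any standard. Note that the trawl step cannot tell hours:minutes:seconds from degrees:minutes:seconds by form alone, and that inspecting only the first `sample_size` values can accept a column whose later rows do not match.

```python
import itertools
import re
from typing import Iterable, Optional

# Loose shape tests for the trawl heuristic.
_SEXAGESIMAL_LIKE = re.compile(r"^[+-]?\d{1,3}:\d{2}(?::\d{2}(?:\.\d*)?)?$")
_ISO8601_LIKE = re.compile(r"^\d{4}-\d{2}-\d{2}(?:T\d{2}:\d{2}(?::\d{2}(?:\.\d*)?)?)?$")

# Assumed mappings; none of these are defined by the VOTable standard.
_UNIT_HINTS = {"hms": "hms", "dms": "dms", "iso-8601": "iso8601"}
_UCD_HINTS = {"pos.eq.ra": "hms", "pos.eq.dec": "dms", "time.epoch": "iso8601"}


def guess_representation(unit: Optional[str], ucd: Optional[str],
                         values: Iterable[str], sample_size: int = 100) -> Optional[str]:
    """Guess how a string column encodes numbers, trying each hack in turn."""
    # Hack 1: a representation smuggled into the unit attribute.
    if unit in _UNIT_HINTS:
        return _UNIT_HINTS[unit]
    # Hack 2: infer form from semantics (a UCD may hold several ';'-separated words).
    for word in (ucd or "").split(";"):
        if word in _UCD_HINTS:
            return _UCD_HINTS[word]
    # Hack 3: trawl the first sample_size values, ignoring blanks.
    sample = [v.strip() for v in itertools.islice(values, sample_size) if v.strip()]
    if not sample:
        return None
    if all(_ISO8601_LIKE.match(v) for v in sample):
        return "iso8601"
    if all(_SEXAGESIMAL_LIKE.match(v) for v in sample):
        return "hms-or-dms"  # the form alone cannot tell hours from degrees
    return None
```

With the default sample size, a column of 100 values like "10:00" followed by "not a time" is still classified as sexagesimal, which is exactly the error-prone case described above.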
>
> These hacks, and possibly others, may have a fair chance of working
> in practice, but as well as the individual problems listed above
> they operate outside of the VOTable standard and hence rely on an
> informal understanding between the VOTable provider and consumer,
> and different data/software might encode/decode this information
> differently, or not at all.
>
> I therefore think that we need some way of expressing in a
> datatype="char"/"unicodeChar" FIELD element that a certain value
> representation format is being used. As usual, the same applies to
> PARAMs.
>
>
> Proposed Solution
> -----------------
>
> I suggest one of the following:
>
> P1. Introduce a new attribute "representation" (or some other name)
> for FIELD/PARAM to contain a special string indicating how values
> in that column are to be interpreted. The special values
> "hms", "dms" and "iso8601" (any more?) would be initially noted
> along with rules for what counts as valid instances of those
> representations. Such "noting" could be in the VOTable standard
> itself or in some more dynamic form like a wiki page, or both.
> VOTable producers would also be free to introduce other values
> for private use. Such introduced values might possibly be
> noted in the standard at a later date if it's agreed they are
> useful.
>
> P2. Modify the definition of the "units" attribute so that it is
> permitted to contain either a catstd-format unit string as at
> present, or a special representation string as in P1.
> We could possibly say that for numeric-valued columns it works
> as at present, but for string-valued columns it has the new sense.
> However one could conceive of a case where you want both
> representation and units (though I can't think of any actual
> examples), which would be problematic for this scheme.
>
> P1 is the cleanest since it avoids overloading one variable with two
> meanings and the associated problems of inadvertently trying to
> interpret a catstd-format unit string as a representation type and
> vice versa (though in practice such collisions are not very likely).
> P2 is basically a fudge which has the advantage that it requires no
> change (apart from comments) to the schema, and moreover that there
> are VOTables out there which are already using this solution.
> I generally favour P1, though I could perhaps be swayed to P2 by
> arguments of pragmatism.
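As a sketch of how a consumer might handle both options, the Python below reads a `representation` attribute (the hypothetical P1 name) and falls back to the P2 reading of the unit attribute, which the VOTable schema spells `unit`. The P2 branch only honours a reserved set of tokens, so an ordinary catstd unit string is never taken as a representation. Namespaces are ignored to keep it short; a real VOTable document would need them.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Representation tokens this consumer recognises (the initial P1 list; assumed).
KNOWN_REPRESENTATIONS = {"hms", "dms", "iso8601"}


def field_representation(field: ET.Element) -> Optional[str]:
    """Return the value representation declared on a string FIELD/PARAM, if any."""
    if field.get("datatype") not in ("char", "unicodeChar"):
        return None
    # P1: a dedicated attribute; any value, including private ones, is passed on.
    rep = field.get("representation")
    if rep is not None:
        return rep
    # P2: the unit attribute, read as a representation only if it is a reserved
    # token, so a catstd unit string such as "km/s" is never misread.
    unit = field.get("unit")
    return unit if unit in KNOWN_REPRESENTATIONS else None


table = ET.fromstring("""<TABLE>
  <FIELD name="RA"   datatype="char" arraysize="*" representation="hms"/>
  <FIELD name="DE"   datatype="char" arraysize="*" unit="dms"/>
  <FIELD name="Vrad" datatype="char" arraysize="*" unit="km/s"/>
  <FIELD name="Vmag" datatype="float" unit="mag"/>
</TABLE>""")
print([field_representation(f) for f in table.findall("FIELD")])
# prints ['hms', 'dms', None, None]
```

Under P2 alone, the freedom P1 gives producers to introduce private values is lost: an unrecognised token in `unit` cannot be distinguished from a unit string.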
>
> Comments about this on the list are welcome; perhaps we can also
> discuss and hopefully reach a decision in Victoria.
>
> Mark
>
Received on 2006-04-20Z20:04:40