Re: String representations of numeric values

From: Thomas McGlynn <tam-at-lheapop.gsfc.nasa.gov>
Date: Thu, 20 Apr 2006 14:44:15 -0400


Mike Fitzpatrick wrote:
...

> Well, the USNO-B1 service from Flagstaff adds another twist and writes the
> RA/DEC as a 3x1 array of doubles (e.g. "12 34 56.7") and not with the
> traditional colon-delimiters. I can't find another example quickly just now,
> but I know I've seen sexagesimal in other data sets. Note the registry
> validation level is 2 and the ConeSearch validator only issues a warning that
> the 'units' should be in degrees (they're listed as "hh mm ss", is/would that
> be different than "hh:mm:ss"?).
>

The validator doesn't check the format of the numbers, but I don't think this should be considered a valid VOTable.
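
To make the problem concrete, here is a minimal Python sketch of what a reader faces with a cell like "12 34 56.7". It assumes the value is a right ascension in hours (as the 'hh mm ss' units suggest), and the function names are purely illustrative, not part of any VOTable library.

```python
def parses_as_single_double(text: str) -> bool:
    """True if Python's float() accepts the whole string as one number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def sexagesimal_hours_to_degrees(text: str) -> float:
    """Convert 'hh mm ss.s' (or 'hh:mm:ss.s') to decimal degrees.

    Assumption: the value is an RA in hours, so 1 hour = 15 degrees.
    """
    hours, minutes, seconds = (float(p) for p in text.replace(":", " ").split())
    return 15.0 * (hours + minutes / 60.0 + seconds / 3600.0)
```

A reader expecting one double per cell rejects the string outright; only a reader that already knows the sexagesimal convention can turn it into degrees.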

It does point out a serious error/omission in the VOTable definition though, in Section 7. For the normal <TABLEDATA> representations that are almost exclusively used, the definitions of the primitives for types short, long, float, double and probably bit are all incorrect. We tried to define how the data would be stored on the computer in terms of what it gets parsed into, but if you don't know how to parse it, that's not very helpful. The data are not binary; they are an ASCII representation, and there is very little to specify how the transformation to binary is to be done.

E.g., is leading white space legal? It is not for character data (i.e., I believe it's supposed to be part of the string). But we generally permit it for numeric data, and I'm not at all sure that we are consistent about white space before strings.

If we have a bit value in TABLEDATA, do we encode it as a string of 0's and 1's, or do we follow the spec and encode bits into bytes -- even though that might give us invalid XML if we happened to generate a byte that had the bit pattern of '<'?
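
A small Python sketch of the hazard, assuming bits are packed most-significant-bit first and padded with zeros to a whole byte (that ordering is my assumption for the illustration; pack_bits is a made-up helper):

```python
def pack_bits(bits: str) -> bytes:
    """Pack a string of '0'/'1' characters into bytes, MSB first.

    The final byte is padded on the right with zero bits (assumed convention).
    """
    n_bytes = -(-len(bits) // 8)          # ceiling division
    padded = bits.ljust(n_bytes * 8, "0")
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))


# The bit pattern 00111100 is 0x3C, which is the ASCII code for '<'.
packed = pack_bits("00111100")
```

Here packed is the single byte '<', so dropping the packed form straight into a <TD> element would produce markup rather than data.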

How is a NaN to be written in table data? Using the string 'NaN'? Are NAN or nan allowed? What about the infinities?
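
For comparison, here is how one particular parser, Python's built-in float(), treats a few of these spellings. This only illustrates that every language makes its own choice; it says nothing about what VOTable intends, and parse_special is just an illustrative name.

```python
import math


def parse_special(text: str):
    """Return float(text), or None if Python rejects the string."""
    try:
        return float(text)
    except ValueError:
        return None


# Python accepts these spellings case-insensitively.
nan_spellings = ["NaN", "NAN", "nan"]
inf_spellings = ["inf", "Inf", "Infinity", "-inf", "+INF"]

all_nan = all(math.isnan(parse_special(s)) for s in nan_spellings)
all_inf = all(math.isinf(parse_special(s)) for s in inf_spellings)
```

Other parsers are stricter or accept different spellings, which is exactly why the standard has to say which ones are legal.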

Are spaces allowed between signs and digits? What exponential notations are permitted? E.g., is the Fortran 1.D10 allowed or only 1.E10? What about 1.e10? Do we allow exponents at all? Do we use Java syntax? What about numbers that go outside the range supported by singles and doubles? How are they supposed to be represented? NaN? Inf?
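
As one data point, this sketch runs several of the forms above through Python's float(). The results describe Python's rules only, not anything the VOTable document specifies; try_float and the sample list are illustrative.

```python
def try_float(text: str):
    """Return float(text), or None if Python raises ValueError."""
    try:
        return float(text)
    except ValueError:
        return None


samples = [" 1.5", "1.E10", "1.e10", "1.D10", "- 1", "1 5", "1e400"]
results = {s: try_float(s) for s in samples}

# In Python: leading white space and both 'E' and 'e' exponents are accepted;
# the Fortran 'D' exponent, a space after the sign, and an embedded space are
# rejected; an out-of-range literal such as 1e400 silently becomes infinity.
```

A Fortran or Java reader would give a different answer for some of these strings, so two compliant-looking readers can disagree about the same table.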

What about spaces inside numbers? Fortran allowed those to be treated as 0's. Do VOTables support that?

If X is a double, are the two values

  1. and 1.000000000000000000000000000001

the same? IEEE doubles can't distinguish them (assuming I put in enough 0's). Is a VOTable reader that enables a user to distinguish these non-compliant?
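
The difference between the two kinds of reader can be sketched in Python: one parses into an IEEE double, the other keeps the decimal text exactly via the standard decimal module. The function names are illustrative.

```python
from decimal import Decimal


def same_as_double(a: str, b: str) -> bool:
    """Compare two numeric strings after rounding each to an IEEE double."""
    return float(a) == float(b)


def same_as_decimal(a: str, b: str) -> bool:
    """Compare two numeric strings exactly, without binary rounding."""
    return Decimal(a) == Decimal(b)


x = "1."
y = "1.000000000000000000000000000001"
doubles_agree = same_as_double(x, y)     # both round to exactly 1.0
decimals_agree = same_as_decimal(x, y)   # the exact values differ
```

The first reader sees one value and the second sees two, and nothing in the current text says which behavior is correct.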

Even though I'm nominally one of the VOTable authors, I'm not sure I know what the answers to these questions are.

Since the tabledata are clearly not IEEE data, we probably shouldn't say that they are, but we need to have some rules for how numbers are represented in <TABLEDATA>.

As a byproduct this would have made it clear that the USNO VOTable is wrong, but this is a more general failing of the standard.

        Tom

Received on 2006-04-20Z20:44:37