On Tue, 20 Jun 2006, Edward J. Sabol wrote:
> Mark Taylor wrote:
> > I've had the following misleading wording in the VOTable specification
> > pointed out to me.
> >
> > In section 4.6 it says:
> >
> > In the TABLEDATA data representation, the default representation of
> > a ``null'' value is an empty column (i.e. <TD></TD>); for fields
> > containing arrays, individual ``null'' elements of the array can
> > be specified either by the value specified in the null attribute,
> > or by the "NaN" or "nan" text in place of the expected numeric value.
> >
> > This sentence only applies to certain datatypes; in particular it is
> > not true of integer types (unsignedByte, short, int, long), which fact
> > is made explicit in the relevant paragraphs of section 6.
> >
> > I suggest that (at such time as the next version of the VOTable
> > document is released) the sentence above is withdrawn, or modified to
> > make it clear that it applies only to certain datatypes.
>
> I have actually been meaning to propose a change to the VOTable specification
> here that would extend the validity of "<TD></TD>" to include integers, so I
> am in agreement with Mark and would strongly favor withdrawing this
> restriction on integer nulls entirely. The only alternative method for
> specifying null integers is to determine some integer value that does not
> exist in the range of valid values for that column and specify that as the
> null value. This is rather impractical and makes it impossible to dump the
> table data a row at a time (such as when querying from a database) in a
> stream-like fashion. Basically, you have to read or scan the whole table (or
> at least just the integer columns) before being able to dump the table in
> VOTable format. Also, I am aware of at least two VOTable implementations that
> ignore this restriction, so perhaps a case could be made that the VOTable
> standard should reflect the reality of existing VOTable implementations. When
> I first came across this restriction on how to encode integer nulls in
> TABLEDATA representation some months ago, I spent some time searching the
> mailing list archive and wiki trying to determine why this restriction was
> written into the standard. I could not find any such discussion, though
> perhaps I just did not look hard enough. Could someone here explain or
> make a positive case for it or point me to some historical discussion of the
> issue?
I don't know if the rationale for this decision is written down anywhere, but I have always assumed it to be as follows: there is no way to represent a null integer value in the BINARY or FITS variants of VOTable, since every bit pattern represents a valid integer value. If you allow empty TD elements for integers, then you can't necessarily transform any TABLEDATA-format VOTable into an equivalent BINARY- or FITS-format one. Furthermore, it may no longer be possible to perform a given FITS->VOTable->FITS round trip without loss of information (since FITS BINTABLE has no way to do the equivalent of an empty TD), and lossless round tripping is an explicit goal of the standard (see sec 2.3 of the spec).
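To make the integer case concrete, here is a minimal Python sketch, assuming the BINARY stream holds an int field as 4 big-endian bytes. The helper name encode_int_cell and its null_value parameter are hypothetical; null_value stands in for whatever integer the field declares in its null attribute. An empty TD can only be written out if such a sentinel exists, because there is no spare bit pattern to fall back on.

```python
import struct
from typing import Optional


def encode_int_cell(text: str, null_value: Optional[int]) -> bytes:
    """Encode one TABLEDATA int cell as 4 big-endian bytes.

    Hypothetical helper for illustration only. null_value stands for
    the integer declared in the field's null attribute, if any.
    """
    text = text.strip()
    if text == "":
        if null_value is None:
            # All 2**32 bit patterns are ordinary ints, so nothing is
            # left over to mean "null" unless a sentinel was declared.
            raise ValueError("empty <TD> for int field with no null value declared")
        return struct.pack(">i", null_value)
    return struct.pack(">i", int(text))
```

Note that even b"\xff\xff\xff\xff" simply decodes as -1, which is the whole problem in a nutshell.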
This argument does not apply to floating point types, since the IEEE representation specifies certain bit patterns which represent Not-A-Number values. The empty TD element for floating point values is just a convenience notation equivalent to <TD>NaN</TD>.
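For floating point fields no sentinel is needed. A minimal Python sketch, using a hypothetical helper and assuming 8-byte big-endian IEEE 754 doubles: an empty cell (or "NaN"/"nan" text) becomes NaN, and that value survives the trip through the binary form and back.

```python
import math
import struct


def encode_float_cell(text: str) -> bytes:
    """Encode one TABLEDATA double cell as 8 big-endian bytes.

    Hypothetical helper: an empty cell is treated as NaN, which is how
    the spec lets floating point fields express null.
    """
    text = text.strip()
    # Python's float() accepts "NaN" and "nan" case-insensitively.
    value = math.nan if text == "" else float(text)
    return struct.pack(">d", value)


# Round trip: the empty cell comes back as NaN, i.e. still a null.
decoded = struct.unpack(">d", encode_float_cell(""))[0]
print(math.isnan(decoded))  # True
```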
Having said all that, I'm well aware of the fact that it would often be much easier to be able to use empty TD elements, especially, as you say, when streaming out TABLEDATA-format VOTables. My STIL parser, like many (all?) others, will accept empty TD elements as nulls if it comes across them, and certainly many VOTables out there use this convention despite the fact that it is not legal. So there is an argument for relaxing the current ideologically respectable position and allowing empty TD elements for all datatypes (though it would probably have the effect of marginalising the BINARY and FITS variants even more than at present). I'll leave the discussion, if any, of whether that would be a good thing to this list - I don't have a strong opinion either way.
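The streaming convenience is easy to see in a writer that emits TABLEDATA rows as they arrive, e.g. from a database cursor. This is a hypothetical Python sketch, not STIL code: it writes None as <TD></TD> for every datatype (the lenient convention discussed above), so no integer column has to be scanned in advance to find an unused value. Under the current wording its output is not strictly legal for integer fields.

```python
from typing import Iterable, Iterator, Optional, Sequence
from xml.sax.saxutils import escape


def stream_tabledata(rows: Iterable[Sequence[Optional[object]]]) -> Iterator[str]:
    """Yield one TABLEDATA <TR> line per input row, without lookahead.

    Hypothetical sketch: None is written as an empty <TD></TD> whatever
    the column's datatype; other values are stringified and XML-escaped.
    """
    for row in rows:
        cells = "".join(
            "<TD></TD>" if v is None else "<TD>" + escape(str(v)) + "</TD>"
            for v in row
        )
        yield "<TR>" + cells + "</TR>"
```

For example, list(stream_tabledata([(1, 2.5), (None, None)])) gives one <TR> per row, with the second row consisting of two empty cells.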
Mark
-- 
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
m.b.taylor@bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/

Received on 2006-06-21Z09:34:55