CfA VO Group comments on the VOTable document

From: Ian Evans <evans_i-at-head-cfa.harvard.edu>
Date: Thu, 02 May 2002 12:55:18 -0400

To: VOTable Working Group
From: Ian Evans for the CfA VO Group
Subject: Comments on the VOTable document

These comments are divided into two groups. First, those comments that are specifically and clearly related to sections of the existing VOTable document, and second, those comments that are more general in nature. The two groups are not in principle mutually exclusive, but individual comments that might fit in both groups are not duplicated.

Specific comments relate to VOTable document version 0.99, dated 09 April 2002.

Specific comments by document section:


Section 2.1: We note that Table 1 restricts primitive datatypes to those that can be represented in FITS. It is not clear that this limitation is appropriate, given the diverse nature of the data that the VO must support. In particular, we note that the possibility of extending FITS support for unsigned int and unsigned long has been discussed intensively, clearly identifying a perceived need for these datatypes in the community. We do not see any good reason to exclude the 16- and 32-bit unsigned integer types and would urge that they be included in the list of allowed primitive data types. In addition, given the expected life of the VO data formats and the fact that 64-bit machines are now quite commonplace, some of us feel that provision should also be made to support unsigned and signed 64-bit integer and 128-bit real datatypes. We note that 64-bit integer datatypes are already in common use for spacecraft orbital ephemerides, for example.
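To illustrate the range problem, here is a minimal sketch using Python's standard struct module. The helper name fits_in is ours, chosen for illustration only, and the sample values are arbitrary; nothing here is taken from the VOTable document.

```python
import struct


def fits_in(fmt: str, value: int) -> bool:
    """Return True if value can be packed with the given struct format."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False


# 40000 is a valid unsigned 16-bit value but overflows a signed 16-bit field.
print(fits_in(">h", 40000))    # False (signed 16-bit)
print(fits_in(">H", 40000))    # True  (unsigned 16-bit)

# 2**40 needs a 64-bit integer; it does not fit in 32 bits.
print(fits_in(">i", 2 ** 40))  # False (signed 32-bit)
print(fits_in(">q", 2 ** 40))  # True  (signed 64-bit)
```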

Section 4.5: We note that the mechanism for locating the DTD has several implications for how VO analysis can and will be performed, and these seem to merit further discussion. Choosing where you want your DTD to be located (embedded, local, or internet) is not a simple matter. Although at the office you may prefer VOTables with non-local DTD references, if you then download the same dataset to your laptop to continue analysis while flying over the Atlantic, then non-local references are clearly not appropriate. The notion that the VO is a transparent computational resource suggests that the responsibility for resolving this issue should not be placed on the user, who would then have to know enough to (a) copy the extra files locally, and (b) modify the VOTable XML itself to continue working. Building software to resolve this (possibly by deciding intelligently when to copy DTDs locally, and to fall back on the local copy when the primary DTD referenced in the VOTable cannot be resolved) is one option that would contribute to VO transparency, but is problematic in its own right.
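The fall-back behaviour mentioned above could look roughly like the following sketch. The function name resolve_dtd, the cache directory, and the fetch callable are all hypothetical and introduced only for illustration; a real implementation would also have to decide when a cached copy is stale, which is part of what makes this option problematic.

```python
from pathlib import Path
from typing import Callable, Optional


def resolve_dtd(system_id: str,
                cache_dir: Path,
                fetch: Callable[[str], bytes]) -> Optional[bytes]:
    """Return the DTD bytes for system_id, preferring the remote copy.

    fetch is any callable that retrieves a URL and raises OSError on
    failure (an assumption of this sketch, not a VOTable requirement).
    """
    # Cache file is named after the last path component of the reference.
    cached = cache_dir / system_id.rsplit("/", 1)[-1]
    try:
        data = fetch(system_id)
    except OSError:
        # Reference cannot be resolved: fall back on the local copy, if any.
        return cached.read_bytes() if cached.exists() else None
    # Remote copy resolved: refresh the local cache for later offline use.
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(data)
    return data
```

With this arrangement the user never edits the VOTable itself; the same non-local reference works both at the office and offline, provided the file was opened at least once while connected.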

Section 5: The mechanism for defining "null" values does not always work well for primitive types such as byte, int, etc. if the actual data space spans the entire range of allowable data values. For example, if all of the integer data values from 0 through 255 may actually be present for a byte valued quantity, then null="nn" does not work. Such cases are quite common in our experience, especially when dealing with instrumental telemetry that defines ancillary parameters (for example, temperature, pressure) that may be needed for calibrations. Because of this, and also our comments regarding section 7 below, we feel that an alternate mechanism to define null values (for example, having "null" reference a data quality column that records the validity of the data) would be a useful addition to the existing definition of null values.
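A minimal sketch of the data quality column idea follows. The function apply_quality and the column names are hypothetical; the point is only that validity is carried alongside the data, so that every value from 0 through 255 remains available as real data.

```python
from typing import List, Optional


def apply_quality(values: List[int], valid: List[bool]) -> List[Optional[int]]:
    """Replace each value whose companion quality flag is False with None."""
    if len(values) != len(valid):
        raise ValueError("value and quality columns differ in length")
    return [v if ok else None for v, ok in zip(values, valid)]


# A byte-valued telemetry column that legitimately uses the full 0..255 range,
# so no in-band value is free to act as the null marker.
temperature_raw = [0, 255, 17, 128]
temperature_ok = [True, True, False, True]  # hypothetical quality column

print(apply_quality(temperature_raw, temperature_ok))  # [0, 255, None, 128]
```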

Section 6.1.1: The last paragraph describing the example does not seem to match the example listed.

Section 6.1.3: We feel that the restriction that variable length fields are permitted only after all fixed length fields merits further discussion. We note that the binary data format wisely does not require knowledge of the number of records in a data table at the start of the table, in order to allow streaming of data. Many queries will result in data being streamed from several sources, each of which might have a somewhat different set of attributes. For example, a query might return RA, Dec, and V magnitude from all of the star catalogs, but only some will return information about colors and proper motions. Requiring knowledge of which fields will be variable length when the table is defined seems to defeat many of the advantages of data streaming, in the same way that having to know the number of records ahead of time does. A flexible and efficient representation of binary tables for cases such as these would clearly be a desirable and necessary enhancement to the existing binary table definition.

Section 6.3: It is not clear from the description how the "rights" attribute works. There are numerous possible authentication methods that variously require differing authentication information which must then be encoded in a variety of ways. Some clarification and examples would be helpful here.

Section 7: The "null" value of a logical datatype is indicated by a space. However, this does not interact well with an embedded TABLEDATA in the case of a variable-length array. Consider the following example:

    ...
    <FIELD ID="Logarray" datatype="boolean" arraysize="*"/>
    <DATA>
      <TABLEDATA>
      <TR>
        <TD>0 0 1 0 1 1</TD>
      </TR>
      <TR>
        <TD>0   1 0 1 1</TD>
      </TR>
    ...

The second row contains a "null" for the second logical value; however, the most likely outcome is that the parser will erroneously interpret this as a row with 5 logical values rather than 6. A solution may be to accept in the VOTable the (case-insensitive) string "null", as is commonly used in many other contexts.
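The failure mode can be reproduced with a short sketch. Both function names are ours, and the whitespace-splitting parser is an assumption about how a typical consumer would tokenize the cell, not a description of any particular VOTable parser.

```python
from typing import List, Optional


def parse_logicals(cell: str) -> List[str]:
    """Naive parser: split the cell content on runs of whitespace."""
    return cell.split()


def parse_logicals_with_null(cell: str) -> List[Optional[bool]]:
    """Parser for the suggested convention: the token 'null' (in any case)
    marks a missing value; '0' and '1' are the logical values."""
    out: List[Optional[bool]] = []
    for token in cell.split():
        if token.lower() == "null":
            out.append(None)
        elif token in ("0", "1"):
            out.append(token == "1")
        else:
            raise ValueError(f"unexpected token {token!r}")
    return out


# A space used as the null marker vanishes among the separators.
print(len(parse_logicals("0   1 0 1 1")))          # 5

# An explicit token survives tokenization.
print(parse_logicals_with_null("0 NULL 1 0 1 1"))  # [False, None, True, False, True, True]
```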

Section 7: The document specifies that single and double precision floating point numbers "shall consist of ANSI/IEEE-754 ... floating point numbers." IEEE-754 does not specify a required byte ordering of the encoding (even though the bit ordering of the formatted - 32, 64, 128 bit etc. - numeric item is defined). If a specific byte ordering (for example, so-called big-endian or network byte ordering) of the encoding is required, then that should be defined in the document; if not, then a mechanism for identifying the byte ordering is required.
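The ambiguity is easy to demonstrate with Python's standard struct module. This is only an illustration of why the byte ordering must be stated; the value 1.0 is an arbitrary choice.

```python
import struct

value = 1.0

# The same IEEE-754 double, serialized in the two common byte orders.
big = struct.pack(">d", value)     # big-endian (network) order
little = struct.pack("<d", value)  # little-endian order

print(big.hex())     # 3ff0000000000000
print(little.hex())  # 000000000000f03f

# A reader that assumes the wrong order silently obtains a different number.
misread = struct.unpack("<d", big)[0]
print(misread == value)  # False
```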

Section 10: The DESCRIPTION element in the schema diagram has a "+" symbol, implying sub-structure. However, this element is not expanded anywhere in the diagram.

General Comments:


(1) We note from our preliminary analysis and discussions regarding the VO data model that there are cases where rows, columns, and individual cells of tables represent different types of datamodel objects. In other words, the requirement that all of the entries within a single column be the same type of beast is too restrictive. This flexibility could be achieved using the VOTable structure if a new type - a pointer to an arbitrary structure - is defined as a VOTable primitive. In this case, all of the individual cells within a column could be pointers, even though the objects that are referenced by these pointers are non-homogeneous. It is possible that this mechanism could serve as a generalized solution to the problem noted above (section 6.1.3) with regard to variable length items, although further study is required to determine whether this is the most efficient solution to that problem.

(2) Is the name VOTable too narrowly focused? Nomenclature is important, as language shapes the way we think, and can thus add to or detract from our clarity of thought.

The VOTable document will largely shape the VO lexicon for years to come. Perhaps we should not repeat the FITS experience (i.e., of embedding "image" into the format name when the format quickly grew to be much more than just an image transport mechanism), or ignore the lessons of software engineering by not mapping concepts to objects more generally.

The term "VOObject" is more encompassing, and would immediately set the tone that images need to be included as we enhance the format definition. This would also leave room for future extensions into other structures (trees, lists, etc.) that are not merely "... an unordered set of rows, each of a uniform format, as specified in the table ..." and "... derived from the Astrores format, itself modeled on the FITS table format" as is stated in the first paragraph of the VOTable document.

Generalizing the name does not detract from the VO mission of having a usable prototype by year-end, but rather avoids sacrificing nomenclature for expediency. Wouldn't the only alternative be to invent separate VOImage, VOTree, VOList, etc. standards for other potential formats, each of which would largely overlap with VOTable?

(3) In the FITS world the wish has often been expressed that column and keyword be treated as the same kind of beast (i.e., that a keyword be interpreted as a column with a constant value; or, conversely, that a constant column can be replaced by a single keyword). Something along these lines is incorporated in various sets of conventions. It is also explicitly part of the Aips++ table class.

We would like to suggest that this equivalency principle be built into the VOTable from the start. One way to do this would be to interpret FIELD as a general object, that can optionally have a value. In a sense, the collection of FIELDs would make up a virtual table that is the true object represented by the VOTable. The physical table that is contained in the document is the subset of columns that need to be enumerated and can be made up by having a collection of COLUMN definitions that simply refer to some of the FIELDS.

One could go a few steps further and allow a FIELD to contain a mathematical expression, optionally including other FIELDs as parameters or variables; or to contain references to (parts of) other tables (note that this is different from allowing cells to contain references).
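The equivalence principle suggested in comment (3) can be sketched as follows: FIELDs that carry a single value are expanded on the fly so that they behave as constant columns of the virtual table. The function rows, the dictionary layout, and the example field names are all hypothetical and serve only to illustrate the idea.

```python
from typing import Any, Dict, Iterator, List


def rows(constants: Dict[str, Any],
         columns: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield rows of the virtual table.

    constants: fields with one value (keyword-like), repeated in every row.
    columns:   fields whose values are enumerated in the physical table.
    """
    lengths = {len(col) for col in columns.values()}
    if len(lengths) > 1:
        raise ValueError("enumerated columns differ in length")
    n_rows = lengths.pop() if lengths else 0
    for i in range(n_rows):
        row = dict(constants)  # constant fields appear as ordinary cells
        row.update({name: col[i] for name, col in columns.items()})
        yield row


table = list(rows({"EQUINOX": 2000.0},
                  {"RA": [10.5, 11.0], "DEC": [-3.2, 4.1]}))
print(table[0])  # {'EQUINOX': 2000.0, 'RA': 10.5, 'DEC': -3.2}
```

A consumer of such rows cannot tell whether EQUINOX was stored once or repeated per row, which is the point of treating keyword and column as the same kind of object.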

The CfA Virtual Observatory Group includes:

  Alice Argon, Mark Cresitello-Dittmar, Ian Evans, Janet DePonte
  Evans, Pepi Fabbiano, Michael Kurtz, Jonathan McDowell, Robin
  McGary, Doug Mink, Michael Noble, and Arnold Rots

Comments from Mark Cresitello-Dittmar, Ian Evans, Michael Noble, and Arnold Rots.

-- 
Dr. Ian Evans                               Email:    ievans-at-cfa.harvard.edu
Smithsonian Astrophysical Observatory       Phone:    +1 (617) 496-7846
60 Garden Street, MS-29                     Fax:      +1 (617) 495-7040
Cambridge,  MA  02138,  USA                 Cellular: +1 (617) 699-5152
Received on 2002-05-13Z07:01:22