Re: Comments on V1.1 - Future of VOTable (flame bait sigh)

From: Doug Tody <dtody-at-aoc.nrao.edu>
Date: Wed, 14 Apr 2004 11:46:34 -0600 (MDT)


Didn't we just have this same discussion a while back?

There are some significant advantages to a generic table mechanism.

    o	By definition a generic table mechanism should work well for storing
	tabular data.  General table management software can be written to
	implement the table abstraction, then this software can be reused
	throughout a data analysis system.  Having factored off the table
	abstraction as common software it is worthwhile investing effort
	in such software, e.g., to efficiently handle bulk data.

    o	Data stored in a general table format can be manipulated with
	generic table tools.  This has been very successful in the past
	with FITS tables and we are seeing it again now with VOTable.

    o	Compatibility with existing astronomical software, much of which
	is table-based.  As Clive mentions, it is easy to modify such
	software to read VOTable as well as FITS ASCII and binary tables,
	text tables, etc.  Integration of VO and legacy astronomical formats
	such as FITS binary table is much easier if both implement the
	table abstraction.

    o	Compatibility with non-astronomical software which is also table
    	based, e.g., databases, spreadsheets, statistical analysis tools.

    o	Any approach which uses a general container mechanism is likely
	to be more open than one which uses a custom schema designed for a
	single class of data.  One can represent the core elements of the
	data model in the container, and extract them from the container
	later to manipulate the object in class-specific code.	But other
	information can be stored in the generic container as well.
	This flexibility is important to allow data representation to
	evolve, or to adapt to subclasses of data.
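
To make the last point concrete, here is a minimal sketch in Python.
It is only an illustration: the names Table, read_text_table, Spectrum
and spectrum_from_table are hypothetical and do not correspond to any
existing VOTable or FITS library.  It shows a generic table being
filtered with a generic tool, a class-specific object being extracted
from it, and the extra column and metadata surviving in the container.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical generic table: named columns plus free-form metadata."""
    columns: list
    rows: list
    meta: dict = field(default_factory=dict)

    def column(self, name):
        """Return all values of one column, looked up by name."""
        idx = self.columns.index(name)
        return [row[idx] for row in self.rows]

    def select(self, predicate):
        """Generic row filter; knows nothing about what the table holds."""
        keep = [r for r in self.rows if predicate(dict(zip(self.columns, r)))]
        return Table(self.columns, keep, dict(self.meta))


def read_text_table(text, meta=None):
    """Reader for a comma-separated text table with numeric cells.
    A VOTable or FITS reader would return the same Table type (assumed)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [[float(v) for v in row] for row in reader]
    return Table(header, rows, meta or {})


@dataclass
class Spectrum:
    """Class-specific view holding only the core data model elements."""
    wavelength: list
    flux: list


def spectrum_from_table(table):
    """Extract the core elements; other columns stay in the container."""
    return Spectrum(table.column("wavelength"), table.column("flux"))


text = "wavelength,flux,quality\n500.0,1.2,0\n501.0,1.4,1\n502.0,1.1,0\n"
table = read_text_table(text, meta={"observer": "unknown"})
good = table.select(lambda r: r["quality"] == 0)  # generic tool
spec = spectrum_from_table(good)                  # class-specific code
```

The "quality" column and the "observer" entry are not part of the
Spectrum model, yet nothing had to change in Table to carry them along.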

If all one wants to do is serialize a single data model in XML then I agree the simplest thing to do is to define a custom schema specifically for that data model. While simple, this is very restrictive. Anything which does not fit into the predefined schema is either disallowed or awkward to handle via the schema approach. As soon as we try to model complex datasets by aggregating multiple component data models (as we do in the real world all the time) the schema-based approach starts to break down. In general the schema approach only works at the level of individual, well-defined data models.

Perhaps we should try both approaches. Any tabular data, be it a catalog or a 1D spectrum, can be reasonably expressed using a generic table mechanism, permitting use of generic tools and providing scalability to very large datasets. For any self-contained, well defined data model it is natural to define an XML schema. In cases where the data model is simple enough we can implement a Web service which is schema-based. More complex cases are probably better addressed using a generic, flexible/open document-centric approach such as VOTable or FITS.

Received on 2004-04-14Z19:47:11