Re: Comments on V1.1 - Future of VOTable (flame bait sigh)

From: martin hill <mchill-at-dial.pipex.com>
Date: Fri, 16 Apr 2004 12:15:27 +0100


Mark Taylor wrote:
> On Wed, 14 Apr 2004, martin hill wrote:
>>Nor do we save any effort in using VOTable to represent SEDs than in writing a
>>new SED schema (except maybe for a few people to learn schemas - but this is no
>>harder than learning VOTable and it's a good skill to have!).
>
> An important bit of effort which is saved is coming up with facilities for
> dealing with, crudely, columns of numbers. Schemas are not designed to
> represent tables, and though there are plenty of industry-standard tools
> for doing computer-science-type tasks with them (validation, web service
> specification, searching), the same is not true (as far as I'm aware)
> for astronomy-type tasks which rely on the tabular nature of the data
> (plotting columns against each other, converting to a FITS table or other
> tabular format for use with legacy applications, calculating statistics).

I suspect I haven't understood this right. You can use schemas to describe tabular data very easily; the VOTable schema is an example of this. And I don't see too much wrong with using VOTable (or even, <spit>, CSV!) for representing tables. My issue is that using a generic tabular form (of any sort) is not the right way to communicate between web services; we should be explicitly 'typing' the data and its natural structure, which is *not* purely tabular. A fair amount of astronomy data is in *relational* tables because that has been the only suitable storage mechanism available, and VOTable does *not* represent relational tables - at least, not without an extra VO-specific layer to resolve the relationships. It's just as easy to write a new schema as it is to design and agree a new VOTable flavour, and you get many advantages from doing so.

One of the advantages of using tables is, as you say, that you can do statistical analysis/plots/etc. Good stuff. Let's use VOTable for that - as well as CSV and any other commonly used formats - as inputs to these tools. Transformation sheets that take, e.g., SEDs and create CSVs to be imported into Excel (or your favourite non-Microsoft spreadsheet) would be very useful.

> An important related point is that VOTable provides ways of storing and
> transmitting very large data sets for which raw XML is not well suited
> in terms of bandwidth and/or processing efficiency.

Of course, VOTable/XML *is* XML... Any difference in size between VOTable/XML and XML with longer element names is irrelevant; if size is a problem we should not be using VOTable/XML in the first place! And there are many unresolved issues with using VOTable/FITS, mostly to do with ensuring that the VOTable is correct for the wrapped FITS file. Using this mechanism you can submit FITS catalogues to SExtractor... We are blinding our web services to each others' capabilities, something we want to avoid as part of our 'metadata-rich' VO. There are other possible ways of using binary XML-like structures without having to go through ASCII.

> If a SED is passed around as a VOTable then the application programmer
> can use an existing VOTable processing library to turn it into something
> which looks like a column of numbers for further processing without
> further ado, or the astronomer can use a VOTable-aware tool to do
> something tabularly-generic with it such as plot.

But we don't want to have to build special VOTable processing libraries in all the various languages that astronomers use. We should be using standard XML tools (below) to move data from one existing tool to another.

> If it's passed
> around using a SED-specific schema this functionality has to be
> rewritten from scratch for SEDs (and the same applies for any other
> specific formats that we want to define new schemas for).
>

With VOTable the same thing is going to have to be done 'from scratch' for each flavour of VOTable - i.e. converting from an SED VOTable to SED visualisation tools. This is 'hidden' because we can lump it all under 'developing a VOTable tool'.

Our data formats should only be storage/transportation mechanisms; we should not be trying to build our own toolsets around them. There are plenty of existing tools in the astronomical world for visualising and analysing (and producing) data. We need to be able to transform the outputs of one into the inputs of another. VOTable provides an illusory 'many to one to many' interface format, but the 'one' in the middle is of many flavours, and yet the flavours are not explicitly specified in any standard way. Further, we are finding we have to build a huge *code*set around it, which needs to be installed and run somewhere (i.e. a new toolmaker is going to have to do so).

Instead we should have XML message formats that correctly represent the data, and transformation sheets that understand the source data type so they can produce the right 'native' format for the target tool. XML transformation sheets are an industry-standard conversion mechanism available on many different platforms, so people who want to introduce new tools to the VO can use them rather than find and learn some VO-specific tools.

Building specialised schemas and associated sheets is *not that hard*! With an SED-specific schema, we add a transformation sheet to create (say) VOTables and CSVs for things like spreadsheets, and other transformers as required for existing SED visualisation tools. This means that if you run an SED visualisation tool *it will only accept SED data*. It means you can construct the right input document, with help from XML-building tools. You can't throw a sky catalogue at it, and you will know before you try.
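
A minimal sketch of the idea, under stated assumptions: the <sed> and <point> elements and their attributes are invented for illustration and do not come from any agreed SED schema. Python's standard library has no XSLT processor, so ElementTree stands in here for a real transformation sheet; the shape of the conversion is the same.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical SED message. The element and attribute names are
# assumptions made for this example, not part of any standard.
SED_XML = """
<sed target="NGC 1068">
  <point wavelength="0.55" wavelengthUnit="um" flux="1.2e-13"/>
  <point wavelength="2.2" wavelengthUnit="um" flux="3.4e-13"/>
</sed>
"""


def sed_to_csv(xml_text):
    """Convert a (hypothetical) SED document into CSV text."""
    root = ET.fromstring(xml_text)
    # Refuse anything that is not an SED before doing any work,
    # e.g. a sky catalogue sent to an SED-only tool.
    if root.tag != "sed":
        raise ValueError(f"expected <sed> document, got <{root.tag}>")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["wavelength", "unit", "flux"])
    for point in root.findall("point"):
        writer.writerow([
            point.get("wavelength"),
            point.get("wavelengthUnit"),
            point.get("flux"),
        ])
    return out.getvalue()


if __name__ == "__main__":
    print(sed_to_csv(SED_XML))
```

The root-element check is only a crude stand-in for schema validation; with a real schema, a validating parser would reject the wrong document type before any tool-specific code runs.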

>>I have been assuming that VOTable is for representing 2d tables (I realise it
>>can now hold tables that include tables).
>
> If I understand this statement correctly, I don't think it's true.
>

Someone said recently you can store arbitrary data structures in VOTable? I don't believe this (and I shy from what an example would look like - but go on, show me!). The structure of VOTable is based around <TD> and <TR> elements. There is not even a way of setting up (XML-based) relationships between rows or cells. To do this we again have to set up VO-specific interpreters over the top of standard XML ones to do what we need.
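
For reference, a rough sketch of how a TABLEDATA-style VOTable looks to a generic XML tool. The sample document is stripped down (no namespace or version attributes, which real files may carry) and the field names are invented. A cell's meaning comes only from its position relative to the FIELD declarations; nothing in the markup links one row or cell to another.

```python
import xml.etree.ElementTree as ET

# Simplified VOTable fragment; the field names and values are illustrative.
VOTABLE_XML = """
<VOTABLE>
  <RESOURCE>
    <TABLE>
      <FIELD name="wavelength" datatype="double"/>
      <FIELD name="flux" datatype="double"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>0.55</TD><TD>1.2e-13</TD></TR>
          <TR><TD>2.2</TD><TD>3.4e-13</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""


def read_rows(xml_text):
    """Return each <TR> as a dict keyed by the FIELD names."""
    table = ET.fromstring(xml_text).find("RESOURCE/TABLE")
    names = [field.get("name") for field in table.findall("FIELD")]
    rows = []
    for tr in table.findall("DATA/TABLEDATA/TR"):
        cells = [td.text for td in tr.findall("TD")]
        # Cells are matched to fields purely by their order in the row.
        rows.append(dict(zip(names, cells)))
    return rows


if __name__ == "__main__":
    for row in read_rows(VOTABLE_XML):
        print(row)
```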

In summary: we need VOTable (for many of the reasons discussed). But we should not be trying to use it as the sole representation of all astronomical data for either transport or storage. It is not designed for it, we don't have the tools for it, we shouldn't be spending effort making tools for it, and we will be shooting ourselves in the foot trying to do so.

MC

-- 
Martin Hill
Software Engineer
AstroGrid @ ROE
Tel: +44 7901 55 24 66
www.astrogrid.org
Received on 2004-04-16Z13:16:02