RE: Definitive version of the VOTable schema for web services

From: Linde, A.E. <ael13-at-leicester.ac.uk>
Date: Wed, 30 Jul 2008 23:11:47 +0100


VOSpace will certainly help keep resultsets off the desktop, Alex, but we need to make it *easy* to build, deploy, register, find and execute server-side applications which operate on that data so that processing is offloaded as well as the data.

t.



From: Alex Szalay [szalay-at-jhu.edu]
Sent: 30 July 2008 20:39
To: 'Linde, A.E.'; grid-at-ivoa.net; votable-at-ivoa.net
Subject: RE: Definitive version of the VOTable schema for web services

Quite right, but all is not lost ... This is what the VOSpace framework should provide -- we can (and hopefully will) build a very nice asynch messaging over the VOSpace.

--Alex

-----Original Message-----
From: Linde, A.E. [mailto:ael13-at-leicester.ac.uk]
Sent: Wednesday, July 30, 2008 3:03 PM
To: grid-at-ivoa.net; votable-at-ivoa.net
Subject: RE: Definitive version of the VOTable schema for web services

I think one of the failures of the whole VO effort has been the inability to take processing of huge datasets off the user's desktop/laptop and onto servers where it could be performed more efficiently. We've developed great ways of getting results from a diverse range of databases but it all still comes back to the user. Maybe the next phase of the VO ought to focus more on this issue.

t.



From: Dave Morris [dave-at-ast.cam.ac.uk]
Sent: 30 July 2008 16:34
To: Anita M. S. Richards
Cc: Grid_Ivoa_List; IVOA VOTable
Subject: Re: Definitive version of the VOTable schema for web services

As Guy pointed out, putting the data in the SOAP response causes enough problems for the software experts, and is setting a trap for astronomers who just want to write a simple program to connect to a service and get at the data.

One of the drivers for the VO was to create tools that astronomers could use to access and process data from large data sets without requiring expert programming knowledge. Relying on Moore's law is not an option. However much memory you can pack into a laptop/desktop, it will not be able to keep up with the rapidly growing data sets held by the data archives. It is not beyond imagining that a valid science query to a large data set could return 21G bytes of data.

One of the reasons for using SOAP/WSDL is that non-expert programmers should be able to use a generic toolkit to connect to a service and process the response automagically, based on the structure defined in the service WSDL.
If the WSDL defines the data as a string, then the toolkit will treat it as just that - a single string. So if Anita used a Python SOAP library to call one of our services, it would hand back the results as a single string .... all 21G bytes of it, in one large anonymous BLOB, probably melting her laptop in the process. In which case, we might as well return the results as base64 encoded FITS and be done with it.

One of the advantages of using XML is that it should be possible to process it as a stream of elements, allowing the client to process the data one row at a time.

Ray mentioned avoiding the WSDL generated classes and treating the response as a document. Using the SOAP toolkit to return the contents as an array of DOM elements would still mean that the client would have to hold all 21G bytes of data in memory, albeit split into a tree of thousands of tiny DOM elements. However, the more recent SOAP toolkits (e.g. Axis2 and XFire in Java) can process the XML one element at a time, without building the entire tree. These tools would allow the client to process the data one row at a time, without holding the entire data set in memory.
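
A minimal sketch of the row-at-a-time approach in Python, using the standard library's xml.etree.ElementTree.iterparse rather than a SOAP toolkit. It assumes the response body is available as a file-like stream containing a VOTable with TABLEDATA serialisation; the function name iter_rows and the sample document are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def iter_rows(stream):
    """Yield each TR as a list of TD text values, one row at a time."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        # Strip any namespace so the sketch works with or without one.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "TR":
            yield [td.text for td in elem]
            # Free the cells of the row just handed out. The emptied TR
            # shell stays attached to its parent; a production version
            # would detach it as well.
            elem.clear()


# Hypothetical two-row response, standing in for a service reply.
SAMPLE = b"""<VOTABLE><RESOURCE><TABLE>
<FIELD name="ra" datatype="double"/><FIELD name="dec" datatype="double"/>
<DATA><TABLEDATA>
<TR><TD>10.5</TD><TD>-3.2</TD></TR>
<TR><TD>11.0</TD><TD>4.7</TD></TR>
</TABLEDATA></DATA></TABLE></RESOURCE></VOTABLE>"""

rows = list(iter_rows(io.BytesIO(SAMPLE)))
```

In real use the caller would loop over iter_rows(response_stream) directly instead of collecting the rows into a list, so that only the current row is held at any time.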

Matthew mentioned using XSLT to process the response. A good example of this would be an XSLT processor that parsed the data one row at a time, kept the few 'interesting' rows and threw away the rest. If data contained one 'interesting' row in a thousand, a simple parser on an ordinary laptop could process the 21G byte stream and return the 21M bytes of 'interesting' rows (network bandwidth allowing).
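
The same filtering idea can be sketched in Python instead of XSLT. This is only an illustration: keep_interesting, fake_rows and the "bright"/"faint" flag are invented names, and the synthetic generator stands in for rows arriving from a service. The point is that the filter never holds more than one row, so at one 'interesting' row in a thousand a 21G byte stream reduces to roughly 21G / 1000 = 21M bytes of output.

```python
from typing import Callable, Iterable, Iterator, List

Row = List[str]


def keep_interesting(rows: Iterable[Row],
                     is_interesting: Callable[[Row], bool]) -> Iterator[Row]:
    """Pass through only the rows the predicate accepts, one at a time."""
    for row in rows:
        if is_interesting(row):
            yield row


def fake_rows(n: int) -> Iterator[Row]:
    """Synthetic stand-in for a service response: one row in a thousand is flagged."""
    for i in range(n):
        yield [str(i), "bright" if i % 1000 == 0 else "faint"]


kept = list(keep_interesting(fake_rows(10_000), lambda row: row[1] == "bright"))
```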

Whatever we replace/update VOTable with, it should be easy to process the service response as a stream of rows, without requiring the client to hold the entire data set in memory. If not for the programmer/astronomers, then as a software developer I know it would make my life a lot easier.
My job is to give Anita a Python package she can use in her scripts that calls a service, processes the results, and returns a simple Python object with two methods, hasNext() and getNext(), without requiring her to upgrade her laptop with 30G of memory and a multi-core CPU.
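
A rough sketch of what such an object might look like, assuming the rows come from some underlying iterator (for example a streaming XML parser). The class name RowCursor is hypothetical; only the two method names come from the description above. It keeps a single row of look-ahead, so memory use does not grow with the size of the result.

```python
class RowCursor:
    """Wrap any row iterator behind hasNext() / getNext()."""

    _DONE = object()  # sentinel marking the end of the stream

    def __init__(self, rows):
        self._rows = iter(rows)
        self._pending = self._DONE
        self._advance()

    def _advance(self):
        # Pull exactly one row ahead so hasNext() can answer without consuming it.
        self._pending = next(self._rows, self._DONE)

    def hasNext(self):
        return self._pending is not self._DONE

    def getNext(self):
        if not self.hasNext():
            raise IndexError("no more rows")
        row = self._pending
        self._advance()
        return row


# Usage with a small in-memory stand-in for a service response.
cursor = RowCursor(iter([["10.5", "-3.2"], ["11.0", "4.7"]]))
seen = []
while cursor.hasNext():
    seen.append(cursor.getNext())
```

A Python-native alternative would be to hand the script a plain iterator and let it use a for loop; the explicit cursor is shown only because it matches the two-method interface described above.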

Dave

Anita M. S. Richards wrote:

>
> On Wed, 30 Jul 2008, Ray Plante wrote:
>
>>> astronomer
>>
>>
>> I believe it was implicit in the discussion that by "astronomer" we
>> meant the "scripting astronomer", one who has enough scripting
>> ability to use, say, a Python module to access a web service.
>>
>> cheers,
>> Ray
>>
>
> That is fine for e.g. the VO expert in a large project or students
> whose project has a major fraction working on these sorts of data,
> but it excludes the majority of astronomers; whereas most astronomers
> do use VOTables although most are not aware of it. If the minority
> are your target audience, fine. Regarding the list of packages from
> fortran to IDL... most astronomers will learn one or two, they will
> _not_ learn the whole lot. Currently, to make best use of VOs, people
> need a bit of SQL plus _one_ scripting language out of python, perl or
> IDL, in most cases. That is about as much as we can expect.
>
>
> cheers
> a
Received on 2008-07-31Z00:16:14