TAP and large resultsets

From: Kona Andrews <kea-at-roe.ac.uk>
Date: Tue, 9 Jan 2007 15:43:42 +0000


Greetings colleagues,

Happy New Year and I hope you have had a restful and productive holiday. Mine was productive of two colds and a mild case of flu, which just goes to show what a bad idea it is ever to stop working ;-)

Prior to our telecon next week, I wanted to raise a point about the TAP protocol, in particular about having a paged interface for large queries (so the user can bring back the query results in small ordered chunks), as we briefly discussed last time.

First, some background.

In AstroGrid, we have a deployment-oriented remit whereby part of our goal is to get our software deployed in "third-party" institutions (i.e. by people outside our own team/locations), and ideally in *all* UK institutions. Two things that deployers emphasise as critical to them are:

  1. Components should have a low installation/maintenance cost in human time (and I acknowledge we still have much work to do here!)
  2. Components should have a low resource requirement

(In other words, "we'll deploy it as long as we don't have to do very much and it doesn't require any additional hardware; we have no time and no money." Etc. Fair enough.)

In the case of the AstroGrid DataSet Access (DSA) component, the architecture was very carefully designed to be fully streaming (partly to reduce resource requirements, and partly to ensure an architecture that scaled to the very large queries envisaged as a normal event in the VO). In other words, in the course of processing a query, the query results never need to be cached in memory or on disk. This means, for example, that a DSA running in a Tomcat instance with (e.g.) 64 MB of memory and no additional "scratch disk" resources can successfully return multi-*gigabyte* query results files to VOSpace, if requested.
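To illustrate the streaming idea, here is a minimal sketch (not DSA code) in which rows are pulled one at a time from a database cursor and written straight to an output stream, so memory use does not grow with the size of the result. It assumes Python's built-in sqlite3 as a stand-in database and CSV as a stand-in output format (a real service would emit e.g. VOTable); the function names `stream_rows` and `stream_to` are hypothetical.

```python
import csv
import io
import sqlite3
from typing import Iterable, Iterator, TextIO


def stream_rows(conn: sqlite3.Connection, sql: str) -> Iterator[tuple]:
    """Yield result rows one at a time straight from the database cursor."""
    cursor = conn.execute(sql)
    for row in cursor:  # rows are stepped lazily; no full result list is built
        yield row


def stream_to(out: TextIO, rows: Iterable[tuple]) -> int:
    """Write each row to the output as soon as it arrives; return the row count."""
    writer = csv.writer(out)
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


# Demo: an in-memory table and a StringIO standing in for a network stream.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE sources (id INTEGER, ra REAL, dec REAL)")
demo.executemany(
    "INSERT INTO sources VALUES (?, ?, ?)",
    ((i, i * 0.1, -i * 0.05) for i in range(1000)),
)
sink = io.StringIO()
written = stream_to(sink, stream_rows(demo, "SELECT id, ra, dec FROM sources ORDER BY id"))
print(written)  # 1000
```

Nothing in this pipeline ever holds more than one row at a time, which is what lets a small-memory server return arbitrarily large results.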

This fully-streamed approach has additional benefits, in that the component is not vulnerable to disk caches filling up and there is no disk-maintenance load (flushing old files, managing quotas etc). However, the streamed approach has implications for offering results paging as part of TAP - namely that, since the results are not cached anywhere, each time a page is requested in a TAP query, the full query must be (re-)run and only the relevant subset of results returned to the user.
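A minimal sketch of what cache-free paging implies, assuming sqlite3 as a stand-in database and a hypothetical `fetch_page` helper: every page request re-executes the whole query, discards the rows before the requested page, and stops reading once the page is full. It only works if the query has a deterministic ORDER BY; otherwise successive pages may overlap or miss rows.

```python
import sqlite3
from itertools import islice
from typing import List


def fetch_page(conn: sqlite3.Connection, sql: str, page: int, page_size: int) -> List[tuple]:
    """Return page `page` (0-based) of the query results with no server-side cache."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size > 0")
    cursor = conn.execute(sql)  # (re-)run the whole query for every page
    start = page * page_size
    # Skip the rows of earlier pages, then take at most page_size rows.
    rows = list(islice(cursor, start, start + page_size))
    cursor.close()  # remaining rows are never fetched
    return rows


demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE sources (id INTEGER)")
demo.executemany("INSERT INTO sources VALUES (?)", [(i,) for i in range(10)])
query = "SELECT id FROM sources ORDER BY id"
print(fetch_page(demo, query, page=1, page_size=4))  # [(4,), (5,), (6,), (7,)]
```

The cost of this approach is visible in the sketch: fetching page k reads (k + 1) * page_size rows, and the database may still have to evaluate and sort the full result before the first row appears, so paging through a large result repeats that work once per page.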

While inefficient, this is obviously not impossible to implement, and we can certainly implement paging as part of our TAP support. However, I am strongly opposed to making the paged interface *compulsory*.

Our observation with "real deployers" of AG software has been that, if an AstroGrid component starts to hammer too heavily/obviously on an institution's resources, then the institution responds by wanting to disable it (perhaps I should have added a point 3 above: "Give us any trouble and you're outta here..."). For example, some AstroGrid deployers have specifically disabled the conesearch interface on their DSAs until conesearch efficiency improvements are in place [mea culpa].

If paged support in TAP is *optional*, then we can provide a mechanism to selectively disable it. Then, if an institution finds that paged querying is clogging up the database because of the repetition of intensive queries, they can switch the *paging function* of TAP off (or limit/throttle it in some way), but still support e.g. simpler unpaged queries. However, if paging is compulsory in TAP, then they may just switch the whole TAP interface off - or maybe the whole component - to the greater detriment of the users who then can't run queries at all.
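One way a deployer-side switch could look, as a sketch only: a hypothetical `PagingPolicy` object (names and settings invented for illustration, not part of TAP or DSA) that always admits unpaged queries, rejects paged ones when paging is switched off, and otherwise applies an optional per-minute limit over a sliding 60-second window.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional


@dataclass
class PagingPolicy:
    """Hypothetical deployer settings for an optional paging feature."""
    paging_enabled: bool = True
    max_paged_requests_per_minute: Optional[int] = None  # None means unthrottled
    _recent: Deque[float] = field(default_factory=deque, repr=False)

    def admit(self, paged: bool, now: Optional[float] = None) -> bool:
        """Decide whether to accept a request; unpaged queries are always admitted."""
        if not paged:
            return True
        if not self.paging_enabled:
            return False
        limit = self.max_paged_requests_per_minute
        if limit is None:
            return True
        now = time.monotonic() if now is None else now
        # Forget paged requests that fall outside the 60-second sliding window.
        while self._recent and now - self._recent[0] >= 60.0:
            self._recent.popleft()
        if len(self._recent) >= limit:
            return False
        self._recent.append(now)
        return True


policy = PagingPolicy(paging_enabled=False)
print(policy.admit(paged=False))  # True: simple unpaged queries still run
print(policy.admit(paged=True))   # False: paging switched off by the deployer
```

The point of the design is that the expensive feature can be throttled or turned off on its own, leaving basic querying available.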

I realise that it may seem that I'm driving the interface protocol spec based on a particular implementation (our streamed DSA component). However, I do honestly believe that a streamed architecture for querying is the only sensible choice for scalability (handling arbitrarily large results and arbitrarily large numbers of simultaneous queries); anything based on disk caching is always going to hit the limits of the available disk cache at some point - sooner rather than later if deployers are stingy with resources.

All the best,
Kona

Received on 2007-01-09Z16:43:57