On Monday 29 January 2007 05:14, Kona Andrews wrote:
> We have a hard time already in persuading third-party adopters
> to install our components, even though they can be installed with
> minimal space usage and locked-down read-only JDBC access to their
> datasets. If we tell them they need to provide either a (potentially
> very) large disk cache or (even worse!) write access to their DBMS
> for temporary tables, many of them simply won't install the component -
> so no TAP at all.
I think Kona mentioned this earlier but the other way to do paging is to do the query again and add some extra monkeying around to get the right batch of rows... but providers who don't want to see any impact are not likely to be too happy with this approach either.
In general, streaming puts the lowest strain on the server-side resources (query once, usually only need a modest buffer in server-side app, DBMS manages worktables as it sees fit, more or less easy to release resources if the client connection is closed).
Chunked returns provide an easier (obvious) way to recover from failures and still (eventually) get the complete result. Now that we are routinely delivering data files in the 100s of GB range, we are seeing quite alot of cases where user downloads fail and are looking at making resumable downloads work where possible.
Note: by using TOP and ORDER BY judiciously, the client can arrange to do the chunking without any support in the service, especially easy if there is a known unique key:
select top N * from some_table
[ where uid > biggest_id_I_have_seen ]
order by uid
If the uid column (or timestamp, or whatever) is always increasing as new data in added, then this works even if the table is getting new content while you are looping to get results. So, I don't see any particular need to specify chunked results in the TAP per se unless they can accomplish something more than this... maybe table metadata could hint which column(s) have the right properties.
> On Sun, Jan 28, 2007 at 10:15:13PM -0700, Doug Tody wrote:
> > I agree that a fully streamed query could be a powerful way to deal
> > with large queries, and we should consider supporting this. However,
> > a fully streamed query is not fully general (e.g., no ORDER BY or
> > anything else which requires management of the full result set on
Not sure I follow this. Sure, the DBMS has to make worktables in some cases where one uses ORDER BY, but the result set from the db can be transformed to XML and written to the output in a stream. Although technically one can put a large amount of content into a DB with modest memory and CPU, that is usually not the case and usually a poor overall system architecture. DBMS vendors and best practices recommend certain balances between disk space, memory, CPU power, etc... I have done alot of crazy stuff to our database and although ORDER BY can add (often, but not always) considerable latency to a query, server side resource usage has never been large (or even medium) problem.
Yeah, the lights blink like mad and the load shoots up, but that's what they are supposed to do :-)
-- Patrick Dowler Tel/Tél: (250) 363-6914 | fax/télécopieur: (250) 363-0045 Canadian Astronomy Data Centre | Centre canadien de donnees astronomiques National Research Council Canada | Conseil national de recherches Canada Government of Canada | Gouvernement du Canada 5071 West Saanich Road | 5071, chemin West Saanich Victoria, BC | Victoria (C.-B.)Received on 2007-01-29Z20:23:29