Re: Asynchronous querying and tabular data

From: Doug Tody <dtody-at-nrao.edu>
Date: Wed, 2 May 2007 12:07:50 -0600 (MDT)


On Wed, 2 May 2007, Patrick Dowler wrote:

> Ok, I see where you are coming from. I think the disconnect is that everyone
> else (me included) sees TAP as a single step process which can be sync or
> async; the service would decide which to support.
>
> I think estimation is more or less pointless - even the RDBMS with all kinds
> of internal knowledge and statistics has a hard time choosing a good query
> plan, and none of the 4-5 I have used has an estimator built in. There is
> good old "select count(*)" but that is only faster if the query cost is dominated
> by delivering the rows, which is not always the case. It is more often than
> not dominated by the cost of joins (including using an index and then looking
> up a bunch of rows in the table - which has cost that scales just like a
> key-join).

I also suspect that query estimation could be quite difficult in general, although for really large queries where significant resources are required it is probably necessary.

I don't see any reason why we couldn't have a way to submit a query directly for execution as an asynchronous operation. For TAP this may be all that is required. A simple way to do this might be to just skip the queryData and issue a stageData instead, with all the query and staging information contained directly in the job description.
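As a rough illustration of submitting the query directly as a staged job, the sketch below builds a single stageData request that carries the query text itself. Only the operation name stageData comes from the discussion; the parameter names (REQUEST, QUERY, FORMAT), the base URL, and the table name are hypothetical placeholders, not part of any agreed interface.

```python
from urllib.parse import urlencode

def build_stage_request(base_url, adql, output_format="votable"):
    """Return a URL that submits the query itself as a staged (async) job.

    All parameter names here are assumptions made for illustration.
    """
    params = {
        "REQUEST": "stageData",   # skip queryData, go straight to staging
        "QUERY": adql,            # the query travels in the job description
        "FORMAT": output_format,  # staging information, also hypothetical
    }
    return base_url + "?" + urlencode(params)

url = build_stage_request("http://example.org/tap", "SELECT ra, dec FROM catalog")
print(url)
```

The point is only that one request can hold both the query and the staging details, so no prior queryData round trip is needed.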

A single service could support both: queryData for synchronous DM and ADQL-based queries, and optionally stageData for async/staged execution. The client would then either have to guess which to use, or try a few smaller synchronous queries first to determine what to do, and then resubmit a larger query as a batch job.
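The "try a few smaller synchronous queries first" approach could look something like the sketch below. It assumes the client has timed a small synchronous probe and scales that time linearly with the number of rows wanted, which is a crude guess given that query cost is often dominated by joins rather than row delivery. The function name, the 30-second budget, and the linear scaling are all assumptions for illustration.

```python
def pick_operation(probe_seconds, probe_rows, wanted_rows, sync_budget_s=30.0):
    """Choose between a synchronous and a staged request.

    Scales the probe's run time linearly to the full row count
    (an assumption) and compares it with a time budget.
    """
    if probe_rows <= 0:
        # No usable probe result: fall back to the staged path.
        return "stageData"
    projected = probe_seconds * (wanted_rows / probe_rows)
    return "queryData" if projected <= sync_budget_s else "stageData"

# A 100-row probe that took 0.5 s, extrapolated to two target sizes.
small = pick_operation(0.5, 100, 1_000)
large = pick_operation(0.5, 100, 1_000_000)
print(small, large)
```

With these numbers the smaller request projects to 5 seconds and stays synchronous, while the larger one projects to 5000 seconds and would be resubmitted as a batch job.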

Received on 2007-05-02Z20:16:52