TAP Protocol Use-Cases
D. Tody
April 2007

This analysis is a companion to the description of the proposed TAP protocol. To illustrate how the proposed TAP interface would work, we have worked out some typical use cases.

1. Simple Queries

This is the simplest case, and a minimally compliant service might implement only case 1.1. In the minimal case, this is basically a cone search, although a simple service might support a few more parameters than are defined for the simple cone search.

1.1 Simple model-based query of a single catalog

Assuming that this is not a "large" query (so that immediate mode can be used), this is a lot like cone search, except that a richer set of parameters is available to refine the query. Everything is done with a single queryData operation. By model we mean POS/SIZE, BAND, TIME, SNR (or other limiting flux measure), REDSHIFT, TARGETCLASS, etc. (possibly this should be further refined for catalog data, but what we already have is a useful start for source catalogs). If POS is generalized to permit, e.g., rectangular as well as circular regions (as for SIA), then this would provide a simple, uniform way to retrieve "cutouts" from source catalogs, e.g., to overlay on an image.

1.2 Parameter or ADQL-based query of a single catalog

This is the same as the previous case, except that the query is posed in terms of the table fields, UCDs, or possibly model terms, and can be expressed in ADQL (much as in the V0.1 spec). The only difference is how the query is posed. A prior table metadata query will be required to discover the table fields (possibly a more general UCD-based query could be posed instead).

2. Multiple Catalogs

This case need only be supported by larger data centers.

2.1 Simple model-based query of multiple catalogs

This is the same as 1.1 except that immediate mode is not used; instead, each row of the output table describes a single catalog. General dataset metadata (DataID, Curation, Char, etc.) is returned, as well as some catalog-specific metadata (catalog size, number of fields, catalog type, etc.). In the case of small catalogs, a simple acref can be used to retrieve the entire catalog. In the case of larger catalogs, the acref will return a "cutout" (subset) of the table; this acref could be a case of query 1.1, for example, although the client need not know that.

In a typical scenario, an application would query a large data center which provides access to many catalogs, and pose a general model-based query to discover catalogs of interest. The application would list these for the user, who would select the catalogs of interest (in an application such as Aladin, for example, these could be displayed as overlay layers). In other scenarios, such a query might merely be used to retrieve metadata describing catalogs matching the query.

2.2 Parameter or ADQL-based query of multiple catalogs

As for 2.1, except that the query is based on UCDs (or possibly model parameters), and can be expressed in ADQL. The only difference is how the query is posed.

3. Large Queries

By "large query" we mean a query too large to be completed in a single synchronous operation. An intermediate case occurs when the output table can be computed quickly and streamed back to the client, in which case very large tables can be queried and transferred in a single operation.

3.1 Single Output Table

A standard non-immediate queryData is performed against a single catalog. The output table contains a single row describing the query operation to be performed, and the output table which would result (if there are output options, e.g., multiple output formats, the response table could contain several rows). Describing the query operation means things like estimating the compute time required, and the size of the output table dataset to be returned.
Describing the output table which would be returned is similar to what is done for any other virtual data described by queryData in a DAL service (the table size, characterization, etc. would be described using standard metadata).

An acref is provided which can be used to compute and retrieve the table dataset. If the query is small enough (as indicated in the query response metadata), this can be used to directly retrieve the dataset with a GET. If the query is a large operation, this acref is used as input to stageData to initiate a job to compute, and possibly transfer, the table. The client can issue a sequence of queryData operations to refine the query before initiating the actual data query, e.g., to reduce the cost of the query to an acceptable level.

The stageData operation is a POST, used to initiate an asynchronous batch job on the server. The data uploaded by the POST is TBD, but could be parameterized, based on the UWS model, an XML document, etc. Acrefs, returned earlier by a queryData, are used to refer to the dataset(s) to be generated.

During execution, a GET operation of some sort (TBD) can be used to poll for job status. Alternatively, a streaming GET could be used to hold open a connection, allowing the service to send message events back to the client to monitor job status in real time. Using the job ID returned by the initial stageData operation, multiple clients can query the job status if desired. Generated data can be cached on the server and retrieved with a simple standard getData (streaming GET), or it could be delivered to a local or remote VOSpace. (Probably all of this is the same for any DAL service which supports grid capabilities.)

3.2 Multiple Output Tables

This is the same as 3.1 except that the query is posed against multiple data tables. Using stageData, it is possible to initiate multiple large queries which can execute in parallel, e.g., on a large cluster or grid system.

4. Metadata Queries

4.1 The standard queryData operation returns generic dataset metadata describing the available output tables (which may be virtual data), much as for any other DAL service.

4.2 When posed against multiple tables, 4.1 also serves to list the tables available via a service.

4.3 To query the fields of a single data table, a queryData operation is posed against a single table, using either a parameter or the table name syntax (TBD) to indicate that metadata describing the table fields should be returned. The output is a table, each row of which describes one table field, providing standard metadata to describe each field. This metadata is TBD, but could be based on the description of a tabular resource in the registry. The output table could be returned in a variety of formats (VOTable, XML, CSV, etc.; this option could be provided for the output of any other queryData operation as well).

5. Service Capabilities

5.1 Service capabilities are defined by the service metadata as returned by the getCapabilities operation. Simple client queries probably do not need to deal with service metadata at all. More advanced queries will need to examine the service capabilities to determine whether the service can query multiple tables or perform large queries, what ADQL features are supported, and so forth.
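
Illustrative sketch

To make the client-side flow of use cases 1.1 and 3.1 more concrete, the following Python sketch shows one way a client might build a model-based queryData request, choose between a direct GET and stageData from the response metadata, and poll a staged job. It is only a sketch under stated assumptions, not part of the proposal: the service URL, the REQUEST=queryData parameter, the "ra,dec" form of POS, the est_seconds field, the 30-second threshold, and the job phase names are all hypothetical, since the text leaves these details TBD. Only the operation names, the model parameters, and the acref concept come from the use cases above.

```python
import time
from urllib.parse import urlencode


def build_query_url(base_url, **params):
    """Build a queryData GET URL from model parameters (use case 1.1)."""
    # REQUEST=queryData is an assumed convention, not defined in the text.
    query = {"REQUEST": "queryData"}
    query.update(params)
    return base_url + "?" + urlencode(query)


def choose_retrieval(row, max_sync_seconds=30.0):
    """Pick a retrieval path for one query response row (use case 3.1).

    `row` is a dict with a hypothetical `est_seconds` cost estimate and
    the `acref` returned by queryData.
    """
    if row["est_seconds"] <= max_sync_seconds:
        # Small query: fetch the dataset directly with a GET on the acref.
        return ("GET", row["acref"])
    # Large query: pass the acref to stageData to start a batch job.
    return ("stageData", row["acref"])


def poll_job(get_status, interval=0.0, max_polls=100):
    """Poll a staged job until it reaches a terminal phase.

    `get_status` is any callable returning the current phase; the phase
    names used here are hypothetical.
    """
    for _ in range(max_polls):
        phase = get_status()
        if phase in ("COMPLETED", "ERROR"):
            return phase
        time.sleep(interval)
    return "TIMEOUT"


if __name__ == "__main__":
    # Hypothetical service endpoint and parameter values.
    print(build_query_url("http://example.org/tap", POS="180.0,-30.0", SIZE="0.5"))
    print(choose_retrieval({"est_seconds": 3600.0, "acref": "job-input-1"}))
```

Passing the status check in as a callable keeps the sketch independent of how job status is actually retrieved (a polling GET or a streaming connection), which the text leaves open.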