FYI - Item 3 below contains more detailed information on how queryData and stageData could interact to implement large queries in TAP.
Hi Roy -
Some comments on the draft IVOA Assessment. These comments don't necessarily indicate that changes to the document are required, rather they are comments on the issues raised more briefly in the document.
In particular, I comment on issues that are high priority for the DAL services we are working on now, e.g., how to handle getCapabilities and asynchronous activities in DAL services.
(1) GetCapabilities method for Services
DAL and Registry are addressing this issue now, and we should have an agreement by the time of the Interop.
The current thinking is that:
1) a service should be able to function stand-alone, including describing its own capabilities;
2) the service capabilities are *cached* in the registry, in the VOResource;
3) as far as possible, the information content and format of this data (XML) will be common to the service and the registry (an exact match cannot be guaranteed, e.g., due to version differences);
4) consistency between the service and registry capability descriptions will be the responsibility of the people writing the service specification, so a service implementor will not need to know anything about the registry or how the registry (or any other external software) uses this information.
The service specification will define the service metadata that the getCapabilities operation returns. "Service metadata" describes a service instance and does not include other information, e.g., dataset metadata describing data returned by the service. The getCapabilities operation is thought to satisfy the VOSI requirements as well.
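To make the division of labour concrete, here is a minimal sketch, assuming hypothetical element names (`capabilities`, `standardID`, `version`, `maxRecords`) that are not taken from any IVOA schema: the service describes itself stand-alone, and a registry merely harvests and caches that description without the service knowing anything about it.

```python
import xml.etree.ElementTree as ET

# Hypothetical service metadata; element names are illustrative only,
# not taken from any IVOA schema.
SERVICE_METADATA = {
    "standardID": "ivo://example/hypothetical-sia",
    "version": "1.0",
    "maxRecords": "10000",
}

def get_capabilities(metadata):
    """Service side: describe this service instance as XML, stand-alone."""
    root = ET.Element("capabilities")
    for key, value in metadata.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")

class RegistryCache:
    """Registry side: keep a cached copy of each service's capabilities."""

    def __init__(self):
        self._cache = {}

    def harvest(self, service_url, capabilities_xml):
        # The registry only parses what the service returns; the service
        # needs no knowledge of how this copy is stored or used.
        root = ET.fromstring(capabilities_xml)
        self._cache[service_url] = {child.tag: child.text for child in root}

    def lookup(self, service_url):
        return self._cache.get(service_url)

xml_doc = get_capabilities(SERVICE_METADATA)
registry = RegistryCache()
registry.harvest("http://example.org/sia", xml_doc)
print(registry.lookup("http://example.org/sia")["maxRecords"])  # 10000
```

Because the cached copy is derived from the same XML the service returns, the two descriptions agree by construction; version skew (point 3) would show up as a stale cache entry rather than as a format mismatch.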
(2) SOAP and REST
My suggestion is that the major VO service interfaces, which will be widely supported by astronomy-oriented client toolkits (e.g., ACR, VOClient) regardless of the interface technology, are better implemented with a simple GET/POST interface (following either an object model or REST, depending on the circumstances). These services also tend to have a fairly coarse-grained interface with a limited number of operations, and mostly generate document-style output, which suits a GET/POST approach well.
The automated interface-building capabilities of SOAP are an advantage when exposing arbitrary (e.g., application-level) interfaces as Web services. Such an interface may contain many simple function-like operations, and there may be many such applications, often exposed as one-off services. In this case custom toolkits are much less likely to exist, and the automated interface-building capabilities of SOAP/WSDL may provide the only practical way to access such applications from multiple client environments. (UWS provides a possible alternative for exposing arbitrary application functionality, but it is really a third category, providing a "task" interface rather than a "function" interface.)
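As an illustration of why a coarse, document-style operation needs no generated stubs, the sketch below encodes a query as a plain HTTP GET URL using only the standard library. The base URL is hypothetical, and the POS/SIZE/FORMAT parameters are written in the style of SIA purely as an example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_query_url(base_url, **params):
    """Encode DAL-style query parameters as a simple HTTP GET URL."""
    # Upper-case keys in the DAL style (POS, SIZE, FORMAT, ...).
    query = urlencode({key.upper(): value for key, value in params.items()})
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + query

url = build_query_url("http://example.org/sia",
                      pos="180.0,-30.0", size="0.2", format="image/fits")
print(url)
# Round trip: any HTTP client or server can recover the parameters.
print(parse_qs(urlsplit(url).query))
```

The whole "interface" here is a URL plus a document in the response, which is the contrast with SOAP/WSDL drawn above: the client side is a few lines in any language, so automatic stub generation buys little.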
(3) Asynchronous Services
All of the DAL services are potentially asynchronous. For example, SIA may create large mosaic images, compute sophisticated cutouts from large data cubes, perform on-the-fly imaging of radio data, or compute 10000 cutouts from survey images when scaling up to large problems. Likewise, with the upcoming TAP interface, large table queries may require asynchronous execution, and SNAP may access large amounts of data or compute complex data products on demand. In general, any data access service may require asynchronous execution, especially when scalability is considered.
Basically, what we have with asynchronous services is some sort of batch execution mechanism running on a server. This can be anything (CASJOBS, ROME, PBS, GLOBUS, CONDOR, Sun Grid Engine, etc.). UWS will provide a general tasking interface to front-end such job execution mechanisms, allowing anything that can be made to look like a task with parameters to be run.
An asynchronous DAL operation is similar to UWS, and in fact probably wants to use much of the same mechanism. The chief difference is that instead of having a generic task-parameter interface, DAL provides an object-oriented interface designed for data access-oriented operations upon a specific class of data. Whenever one does a queryData in a DAL service, and gets back an object descriptor including an access reference describing a virtual data product which can be generated, one has a descriptor (the acref) for a potential batch job. This acref completely describes the proposed job to be run, in terms which the service which generated it can understand. The "job" can be arbitrarily large or small; the query response can include "cost" metadata to allow the client to decide if it wants to run the job, or iteratively refine the size of the job to be run. A single query can describe any number of such possible "jobs".
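The idea that a query response is a menu of potential jobs, each with cost metadata, can be sketched as follows. The record fields (`acref`, `est_size_mb`) and the greedy cheapest-first selection are assumptions made for illustration; a real client could use any policy, or re-query with a smaller region to refine the job size.

```python
from dataclasses import dataclass

@dataclass
class QueryRecord:
    """One row of a hypothetical queryData response."""
    acref: str          # access reference describing a virtual data product
    est_size_mb: float  # hypothetical "cost" metadata supplied by the service

def select_jobs(records, budget_mb):
    """Greedily pick the cheapest virtual products that fit a size budget."""
    chosen, total = [], 0.0
    for rec in sorted(records, key=lambda r: r.est_size_mb):
        if total + rec.est_size_mb <= budget_mb:
            chosen.append(rec.acref)
            total += rec.est_size_mb
    return chosen

response = [
    QueryRecord("http://example.org/sia/getData?id=m1", 800.0),  # large mosaic
    QueryRecord("http://example.org/sia/getData?id=c1", 5.0),    # small cutout
    QueryRecord("http://example.org/sia/getData?id=c2", 12.0),   # small cutout
]
print(select_jobs(response, budget_mb=100.0))
```

Nothing has been computed at this point: each acref is only a description the service can later execute, which is what lets a single query describe any number of possible jobs.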
The DAL service profile (common to all DAL services) provides for both synchronous and asynchronous execution. Where possible, the simpler synchronous operations are used. Asynchronous capabilities are provided, not by a different service interface, but by an optional "stageData" operation. If we compare this to UWS, basically what happens is that the queryData/acref mechanism replaces the task-parameter interface in UWS. By invoking the "stageData" operation, the client submits a batch job to the service (telling it to run a number of jobs identified by their acrefs) and gets back some sort of jobid. Probably much of the rest of the UWS mechanism can be common, e.g., poll requests or messaging used to monitor job status. Once such a job is started, one probably doesn't even need the DAL interface any longer to monitor job status. When eventually the job completes, a conventional DAL "getData" can be used to retrieve the data product, or optional delivery methods (e.g., VOSpace) can be specified in the stageData request when the job is initiated.
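The flow described above (queryData yields acrefs, stageData submits them and returns a job id, getData retrieves the products) can be mocked in memory. Every method name below is hypothetical, the phase strings are borrowed from UWS-style naming as an assumption, and `run_pending` merely stands in for whatever batch system the server actually uses.

```python
import itertools

class MockDalService:
    """In-memory stand-in for a DAL service with an optional stageData operation.

    Flow: queryData -> acrefs -> stage_data(acrefs) -> job id -> get_data(acref).
    """

    def __init__(self):
        self._ids = itertools.count(1)
        self._jobs = {}      # job id -> {"acrefs", "phase", "delivery"}
        self._products = {}  # acref -> generated data product

    def stage_data(self, acrefs, delivery=None):
        """Submit a batch job for the given acrefs; return a job id."""
        job_id = f"job{next(self._ids)}"
        self._jobs[job_id] = {"acrefs": list(acrefs), "phase": "QUEUED",
                              "delivery": delivery}  # e.g. a VOSpace target
        return job_id

    def run_pending(self):
        """Stand-in for the server's batch system executing queued jobs."""
        for job in self._jobs.values():
            if job["phase"] == "QUEUED":
                for acref in job["acrefs"]:
                    self._products[acref] = f"<data for {acref}>".encode()
                job["phase"] = "COMPLETED"

    def status(self, job_id):
        return self._jobs[job_id]["phase"]

    def get_data(self, acref):
        """Conventional synchronous getData once the product exists."""
        return self._products[acref]

svc = MockDalService()
job = svc.stage_data(["acref-1", "acref-2"])
print(svc.status(job))  # QUEUED
svc.run_pending()
print(svc.status(job))  # COMPLETED
print(svc.get_data("acref-1"))
```

Note that the job is parameterized entirely by acrefs the service itself generated, which is how the queryData/acref mechanism replaces a generic task-parameter interface.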
Monitoring job status during execution could be provided via a simple REST-type interface used to poll for status (<baseURL>/<jobId>/status), or via an AJAX-like streaming GET, which would provide a simple real-time messaging mechanism for delivery of asynchronous events. During job execution the service would send periodic "percent done" type messages (which also serve to avoid timeouts), and real event messages when some sort of actual asynchronous event occurs, such as completion of a data product or job, or an error of some sort. Again, aside from the queryData and acref, most of this can probably be common with something like UWS.
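A client-side sketch of both monitoring styles follows. It assumes terminal phases named `COMPLETED` and `ERROR` and a one-line "kind payload" message format for the streamed events; both are illustrative assumptions. The HTTP call is injected as a plain function so the sketch runs without a network.

```python
import time

def status_url(base_url, job_id):
    """REST-style polling endpoint of the form <baseURL>/<jobId>/status."""
    return f"{base_url.rstrip('/')}/{job_id}/status"

def poll_until_done(fetch, url, interval_s=0.0, max_polls=100):
    """Call fetch(url) until it reports a terminal phase; return that phase.

    fetch is any callable returning a phase string; in practice it would
    wrap an HTTP GET, injected here so the sketch runs without a network.
    """
    for _ in range(max_polls):
        phase = fetch(url)
        if phase in ("COMPLETED", "ERROR"):
            return phase
        time.sleep(interval_s)
    raise TimeoutError(f"job at {url} did not finish after {max_polls} polls")

def parse_event(line):
    """Split one streamed message, e.g. 'progress 40' or 'done acref-1'."""
    kind, _, payload = line.strip().partition(" ")
    return kind, payload

replies = iter(["EXECUTING", "EXECUTING", "COMPLETED"])
url = status_url("http://example.org/sia/async", "job1")
print(url)                                            # http://example.org/sia/async/job1/status
print(poll_until_done(lambda u: next(replies), url))  # COMPLETED
print(parse_event("progress 40"))                     # ('progress', '40')
```

With a streaming GET, the periodic "progress" lines double as keep-alives, so the same `parse_event` loop handles both the heartbeat and the real events (product ready, job done, error).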
(6) Data Models and UTYPEs
There is also the related issue of defining a data model as an abstraction, independently of implementation. Usually when we have "data models" for which there are no UTYPEs, it is because the data models are actually merely data structures implemented in some specific technology such as XML and XML Schema. If the data model is defined as an abstraction, e.g., RM, Spectrum, SSAP, etc., then it is natural to parameterize the data model as a collection of specific data model fields with associated UTYPE.
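One way to read "a data model as a collection of fields with associated UTYPE" is shown below: the abstract model is just a table keyed by UTYPE, independent of whether instances are later serialized as VOTable, XML, or anything else. The UTYPE strings are written in the style of the SSA/Spectrum model and, like the `ModelField` and `bind` names, are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelField:
    utype: str      # identifies the data model field, independent of format
    unit: str
    datatype: type

# A tiny abstract model: a handful of fields keyed by UTYPE.
# The UTYPE strings are illustrative, in the style of the SSA/Spectrum model.
MODEL = {f.utype: f for f in [
    ModelField("Target.Name", "", str),
    ModelField("Char.SpatialAxis.Coverage.Location.Value", "deg", tuple),
    ModelField("Char.SpectralAxis.Coverage.Bounds.Start", "m", float),
]}

def bind(values):
    """Validate a record (UTYPE -> value) against the abstract model."""
    for utype, value in values.items():
        field = MODEL.get(utype)
        if field is None:
            raise KeyError(f"unknown UTYPE: {utype}")
        if not isinstance(value, field.datatype):
            raise TypeError(f"{utype} expects {field.datatype.__name__}")
    return values

record = bind({"Target.Name": "M31",
               "Char.SpatialAxis.Coverage.Location.Value": (10.68, 41.27),
               "Char.SpectralAxis.Coverage.Bounds.Start": 3.5e-7})
print(sorted(record))
```

Any concrete representation (a VOTable FIELD carrying a `utype` attribute, an XML element, a database column) can then be mapped back to the same field by its UTYPE, which is what a data structure defined only by an XML Schema does not give you.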
(8) Regions of the Sky
I agree that this is an issue; we will need to address this in order to define a more general POS/SIZE type region specification for future versions of the DAL interfaces, such as SIA V2 and TAP.
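For reference, the simple POS/SIZE box that a more general region specification would extend can be tested as below. This sketch assumes POS is "ra,dec" and SIZE is "w" or "w,h", all in degrees, and defines the box (an assumption, not a statement of any DAL specification) as a rectangle on the tangent plane centred on POS, using the standard gnomonic projection.

```python
import math

def parse_pos_size(pos, size):
    """Parse POS='ra,dec' and SIZE='w' or 'w,h' (all in degrees)."""
    ra, dec = (float(x) for x in pos.split(","))
    parts = [float(x) for x in size.split(",")]
    width, height = (parts[0], parts[0]) if len(parts) == 1 else (parts[0], parts[1])
    return ra, dec, width, height

def in_region(ra, dec, pos, size):
    """True if (ra, dec) falls inside the POS/SIZE box.

    Assumed definition: a width x height rectangle on the tangent plane
    centred on POS (gnomonic projection), which handles RA wrap-around.
    """
    ra0, dec0, width, height = parse_pos_size(pos, size)
    a, d, a0, d0 = map(math.radians, (ra, dec, ra0, dec0))
    cos_c = math.sin(d0) * math.sin(d) + math.cos(d0) * math.cos(d) * math.cos(a - a0)
    if cos_c <= 0:  # point lies on the far hemisphere from the box centre
        return False
    # Standard (tangent-plane) coordinates, in radians.
    xi = math.cos(d) * math.sin(a - a0) / cos_c
    eta = (math.cos(d0) * math.sin(d)
           - math.sin(d0) * math.cos(d) * math.cos(a - a0)) / cos_c
    return (abs(math.degrees(xi)) <= width / 2
            and abs(math.degrees(eta)) <= height / 2)

print(in_region(180.0, -30.0, "180.0,-30.0", "0.2"))  # True
print(in_region(180.5, -30.0, "180.0,-30.0", "0.2"))  # False
print(in_region(359.98, 0.0, "0.02,0.0", "0.2"))      # True (across RA = 0)
```

Even this simple case needs an explicit convention for what "size" means on the sphere; a general region syntax (circles, polygons, unions) would have to pin such conventions down for every shape.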
Applications messaging and client interfaces are also hot topics; however, I will comment on those elsewhere, as this message is already long enough.
Received on 2007-04-22Z22:07:24