Hi Francois -
I have been pondering your email from yesterday for a while, as I am not sure I understand what the issue is. In the discussions you refer to, Pedro and I were talking about service metadata and data staging, which is not a data query issue.
On Wed, 25 Apr 2007, Francois Ochsenbein wrote:
> I share Pedro's worries -- in the original IVOA design VOQL/ADQL was
> presented as a language which would be generally used to locate/select
> the available data in the various IVOA components (directly from servers
> or through registries). The various DAL realisations seem to include now
> querying facilities -- I would however call these 'filtering capabilities'
> since these are used to restrict the interesting set of 'files' to the user's
> wishes.
>
> Some clarification between the roles of
> -- this 'filtering capabilities' applied to tabular material
> discussed by Doug
> -- the query language (more than just a filtering operation)
> -- the TAP itself
> seems to be necessary.
I agree that the VO requirements for query capabilities to discover and select data, and operate upon table data, should be more than what can be provided by a *single* data service (which is what TAP addresses, for table data). An example of this is the distributed query and cross-matching problem, which is a more powerful mechanism built on top of TAP. Another example might be an ADQL interface to the registry, used for the first stage of data/server discovery and selection. Another example could be a "Google-like" global discovery mechanism, based on harvesting of more detailed dataset metadata from services, similar to the way Google harvests information from Web pages (this came up earlier in a different forum - see the message I append below). Probably there are other examples as well. The only part of this which is specific to DAL is TAP.
The data services have always had basic query capabilities of course. These permit queries based on the data model of the data being accessed, and are specific to each type of data. Data service queries are also part of the "virtual data" mechanism, which is fundamental to data access. If all we had was a global data discovery mechanism, e.g., some sort of global data catalog, we would probably be limited to retrieving whole datasets. The query mechanism in a DAL service allows the client to negotiate on the details of virtual data, hence it is fundamental to data access. A cutout on an image is not very different from a SELECT on a single table; both are examples of a subset or filter, as you say.
In the case of TAP, with an ADQL-based query it is also possible to do operations which combine data from multiple tables - so long as they are part of the tableset managed by a single service. As you say, this is more than just the usual subset/filter/transform. It is an actual data combination operation. It is still an operation against a single service instance which returns a table however, so it can be considered another form of virtual data operation.
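To make the distinction concrete, here is a rough sketch of the two kinds of operation against one service. It is only an illustration: an in-memory SQLite database stands in for the tableset of a single TAP service, plain SQL stands in for ADQL, and the table and column names (sources, photometry, ra_deg, and so on) are invented for the example.

```python
import sqlite3


def make_service():
    """Build an in-memory database standing in for one service's tableset.

    The tables and their contents are hypothetical.
    """
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE sources (id INTEGER PRIMARY KEY, ra_deg REAL, dec_deg REAL);
        CREATE TABLE photometry (source_id INTEGER, band TEXT, mag REAL);
        INSERT INTO sources VALUES (1, 10.0, -5.0), (2, 10.5, -5.2), (3, 200.0, 30.0);
        INSERT INTO photometry VALUES (1, 'V', 12.1), (2, 'V', 15.3), (3, 'V', 9.8);
        """
    )
    return db


def filter_query(db):
    """Subset/filter: a SELECT restricted to a single table."""
    sql = "SELECT id FROM sources WHERE ra_deg BETWEEN 9.0 AND 11.0 ORDER BY id"
    return db.execute(sql).fetchall()


def combine_query(db):
    """Data combination: a join across two tables of the same tableset."""
    sql = """
        SELECT s.id, p.mag
        FROM sources AS s
        JOIN photometry AS p ON p.source_id = s.id
        WHERE p.band = 'V' AND p.mag < 13.0
        ORDER BY s.id
    """
    return db.execute(sql).fetchall()


if __name__ == "__main__":
    service = make_service()
    print(filter_query(service))
    print(combine_query(service))
```

Both calls go to the same service instance and both return a table, which is the sense in which the join can still be viewed as a virtual data operation.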
In the longer run, there is also the old question of whether we want to provide ADQL capabilities for data services other than TAP. The thinking has always been yes - it is just an alternative form of query, returning the same query response as a parameter-based query. Such a query would be posed on the data model defined for the data being accessed, e.g., the SIA or SSA query response.
Something like the TAP metadata queries could also potentially be useful, e.g., to determine the output columns which are supported by a queryData operation. This could provide a way, consistent with TAP, to discover any extra non-standard columns which are supported by a given service.
In the case of TAP I do think it is important, in the general case, to support services which can operate upon multiple tables, which leads to the types of multiple table operations outlined in my earlier analysis. There is also, of course, the case of an ADQL expression which operates upon multiple tables (one important case of this is a table previously uploaded by the client).
Getting back to the issue of what information needs to be in the registry, I found Ray's example use case of a registry search on the UCDs of table columns interesting. This is a case where such information may be needed to discover the data services useful for the next stage of analysis, before a data service is accessed, hence it could be a useful thing to have in the registry.
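As a rough sketch of what such a search might look like, suppose the registry cached, per service, the UCDs of each table's columns. The record layout, the ivo:// identifiers, and the function name below are all hypothetical; nothing here reflects an actual registry schema or interface.

```python
# Hypothetical cached registry records: for each service, the UCDs of the
# columns in each of its tables. Identifiers and layout are invented.
REGISTRY = [
    {
        "service": "ivo://example/tap-a",
        "tables": {"stars": ["pos.eq.ra", "pos.eq.dec", "phot.mag"]},
    },
    {
        "service": "ivo://example/tap-b",
        "tables": {
            "galaxies": ["pos.eq.ra", "pos.eq.dec", "src.redshift"],
            "notes": ["meta.note"],
        },
    },
]


def find_tables_with_ucds(registry, required_ucds):
    """Return (service, table) pairs whose columns carry all required UCDs."""
    wanted = set(required_ucds)
    hits = []
    for record in registry:
        for table, ucds in record["tables"].items():
            # A table qualifies only if every requested UCD is present.
            if wanted <= set(ucds):
                hits.append((record["service"], table))
    return hits


if __name__ == "__main__":
    print(find_tables_with_ucds(REGISTRY, ["pos.eq.ra", "src.redshift"]))
```

The point is only that the search runs entirely on metadata held in the registry, before any data service has been contacted.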
Perhaps this is merely another case of successive levels of detail in metadata. In general, resource metadata provides rough information regarding things like coverage, and this is used for the first stage of data and service discovery. One then goes to things like footprint and data services, which provide much richer and more accurate and up to date metadata. This is then used by the client to further narrow down the selection of data to be used for analysis. Finally the actual datasets of interest are accessed, and these come with a third level of detailed metadata sufficient to support actual data analysis.
In the case of catalog metadata (a hot topic for the table access protocol currently under discussion), we may very well want to cache some table metadata, such as the UCDs of table columns, in the registry, to support high level discovery. We should think more carefully about how much such data needs to be in the registry to support resource discovery for table data at this level. However, this is not the same thing as storing detailed dataset metadata of the sort needed to plan precision physical dataset access.
- Doug
Received on 2007-04-26Z23:40:42