Re: [Fwd: Comments on Doug's TAP inputs]

From: Doug Tody <dtody-at-nrao.edu>
Date: Fri, 20 Apr 2007 09:36:15 -0600 (MDT)


Hi All -

I agree with Inaki/Aurielien that concepts and consistent terminology are important; we have been working for a long time to define these carefully within the DAL WG, and to coordinate with Registry in areas where there is overlap. Some elaboration follows.

> - Dataset : a complete entity containing astronomical information of
> some sort. This is an Image (in FITS, JPG, PNG, VOTable, ...) for
> SIAP, a Spectrum (in FITS, CSV, VOTable, ...) for SSAP or even a
> Table (in FITS, CSV, VOTable, ....) for Doug's "Multiple Catalogs"
> part of the use-cases.

A Data Collection is a uniform set of Datasets, generally created as a group (over a period of time), with some unifying purpose, e.g., a survey data release, or data from a specific instrument.

The term "Dataset" is typically overloaded: it can mean a single "primary" Dataset (Image, Spectrum, etc.) or a number of primary datasets. Within DAL we have been distinguishing between the two cases with the terms "primary dataset" (a single Image, Spectrum, etc., as defined by the corresponding DAL interface), and "complex dataset" (a logical association of a number of primary datasets). A Table would appear to be a primary dataset.
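
A minimal sketch of this terminology as Python data classes may make the distinction easier to see. The class names, field names, and identifiers are hypothetical illustrations and are not taken from any DAL specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PrimaryDataset:
    """A single Image, Spectrum, Table, etc."""
    dataset_id: str
    kind: str          # e.g. "Image", "Spectrum", "Table"
    data_format: str   # e.g. "FITS", "VOTable"


@dataclass
class ComplexDataset:
    """A logical association of a number of primary datasets."""
    dataset_id: str
    members: List[PrimaryDataset] = field(default_factory=list)


@dataclass
class DataCollection:
    """A uniform set of datasets created as a group with a unifying purpose."""
    name: str
    datasets: List[PrimaryDataset] = field(default_factory=list)


# Hypothetical example values, for illustration only.
image = PrimaryDataset("img-001", "Image", "FITS")
spectrum = PrimaryDataset("spec-001", "Spectrum", "VOTable")
association = ComplexDataset("obs-001", [image, spectrum])
survey = DataCollection("example-survey-dr1", [image])
```

Here a Table would simply be another PrimaryDataset with kind "Table".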

> - Data : some queriable information that is usually (but not always)
> saved in a RDBMS. The data for SIAP or SSAP is actually Metadata
> about Images or Spectra, that is Dataset Metadata. In the case of
> TAP the data can be absolutely anything, but should be astronomical
> information about sources.
>
> - Dataset Metadata : for SIAP or SSAP, this is the data. See above.

What is data and what is metadata is often relative; it can depend upon what level in a system you are looking at. In DAL terms, "Data" is a dataset, and "dataset metadata" is metadata describing the dataset. The queryData operation returns (uniform) dataset metadata describing the available datasets matching the query. queryData can be used to access dataset metadata without having to retrieve the actual dataset.
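
The separation between querying dataset metadata and retrieving the dataset itself can be sketched with a toy in-memory service. Only the queryData operation name comes from the discussion above; the ToyService class, the get_data method, the keyword match, and the example URLs are assumptions made up for this illustration, not a real DAL interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DatasetMetadata:
    """Uniform metadata describing one dataset (hypothetical fields)."""
    dataset_id: str
    title: str
    access_url: str


class ToyService:
    """In-memory stand-in for a data access service (illustration only)."""

    def __init__(self, holdings: Dict[str, Tuple[DatasetMetadata, bytes]]):
        self._holdings = holdings
        self.retrievals = 0  # counts how often actual data was fetched

    def query_data(self, keyword: str) -> List[DatasetMetadata]:
        # Returns metadata only; the dataset payloads are never touched.
        return [meta for meta, _ in self._holdings.values()
                if keyword.lower() in meta.title.lower()]

    def get_data(self, dataset_id: str) -> bytes:
        # Retrieving the dataset itself is a separate, explicit step.
        self.retrievals += 1
        return self._holdings[dataset_id][1]


service = ToyService({
    "spec-001": (DatasetMetadata("spec-001", "Quasar spectrum",
                                 "http://example.org/spec-001"), b"payload-1"),
    "img-001": (DatasetMetadata("img-001", "Galaxy image",
                                "http://example.org/img-001"), b"payload-2"),
})

matches = service.query_data("spectrum")
```

After the query, matches holds one metadata record and service.retrievals is still zero, since no dataset has been retrieved.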

> - Protocol Metadata : the definition of what this particular service
> can take as input, can return as output in terms of parameter names
> (POS, TIME, BAND, VERB, ....), table/column names (if supporting
> ADQL), uType names (if supporting ADQL with uTypes) and default set
> of fields/columns returned for a DAL-Type simple query. This can be
> given many names as Table Metadata, Protocol Metadata, Input/Output
> Metadata, and this needs to be accessible from the service itself
> (it may be accessible from the Registry also since the XSD permits it).
>
> - Service Metadata : general information and limits of this particular
> service .... description, curator name, version, URL endpoint,
> records limit, optional protocol methods support, contact email,
> .... All this is what the Registry VOResource XSD is here for and it
> needs to be accessible from the Registry (it may be accessible from
> the service also, but this is no concern of the specification).

I don't think there is any such thing as protocol metadata: a protocol is a pre-defined interface for a whole class of services. Service metadata describes a single service instance. Service metadata does *not* describe the individual datasets available via the service; that is dataset metadata, and is created on a per-dataset basis. In general a single service can provide access to any number of datasets.

Regarding service metadata, there have been many discussions between DAL and Registry about this, going back a year or two. We are in the process of defining this explicitly for SSAP, including in broader terms how a DAL service instance and the registry interact (there is a DAL session on this in China where Ray is slated to give a talk).

To summarize our current thinking (which will be cast in stone shortly due to the need to get SSAP out): a service can be used stand-alone, without any connection to the registry. Included in the service profile is a getCapabilities operation which can return service metadata describing the service instance. This will be cached in the registry, but what the registry contains is only a cached copy. Ultimately, we will probably provide a means for the registry to query a service to get its service metadata. If the service metadata changes, the registry may eventually provide a means for the service itself to request that the registry update this information.
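
The relationship between a service and the registry's cached copy of its metadata can be sketched as follows. Only the getCapabilities operation name comes from the text above; the Service and Registry classes, the harvest and lookup methods, and the metadata keys are hypothetical, and the actual interaction was still being defined at the time of writing.

```python
from typing import Dict


class Service:
    """A stand-alone service instance that owns its service metadata."""

    def __init__(self, metadata: Dict[str, str]):
        self._metadata = dict(metadata)

    def get_capabilities(self) -> Dict[str, str]:
        # Return a copy so callers cannot modify the service's own record.
        return dict(self._metadata)

    def update_metadata(self, **changes: str) -> None:
        self._metadata.update(changes)


class Registry:
    """Holds only cached copies of service metadata (illustration only)."""

    def __init__(self) -> None:
        self._cache: Dict[str, Dict[str, str]] = {}

    def harvest(self, name: str, service: Service) -> None:
        # The registry queries the service and stores what it returns.
        self._cache[name] = service.get_capabilities()

    def lookup(self, name: str) -> Dict[str, str]:
        return self._cache[name]


service = Service({"version": "1.0", "contact": "curator@example.org"})
registry = Registry()
registry.harvest("example-ssa", service)

# The service changes; the registry copy is stale until refreshed.
service.update_metadata(version="1.1")
stale_version = registry.lookup("example-ssa")["version"]

registry.harvest("example-ssa", service)
fresh_version = registry.lookup("example-ssa")["version"]
```

The service remains the authoritative source throughout: it works without the registry, and the registry entry is only as current as the last harvest.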

The proposed contents of the service metadata are basically what I described in my description of the standard service profile. They are supposed to correspond to what is in a VOResource, and in fact, we are trying to align the XSD so that it is the same in both the service specification and the registry (in general though they can differ somewhat, e.g., due to version issues). The intention is that the DAL service specification is self-contained, and a service implementor will not need to know anything about the registry to implement a service. Hence all the required service metadata will be defined in the service specification.

Received on 2007-04-20Z17:36:38