Hi All -
After reading Anita's careful review of Spectrum (thanks Anita!) and Jonathan's thoughtful replies, I think the issues below are the most important, so some further elaboration follows.
Required/optional vs must/should/may
The advantage of must/should/may is that it allows us to differentiate
between "minimal compliance" (all the "must"s) and "full compliance"
(all the "should"s). This is useful as we want minimal compliance
to be as low a bar as is reasonable, but we would really prefer that
most services implement at least the "should"s. To reward service
implementors for doing more we would do something like flag fully
compliant services in the registry. Hence I tend to agree that it is
useful to make the must/should/may distinction.
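To make the two compliance levels concrete, here is a minimal Python sketch of how a registry might classify a service. The field names and the level assigned to each are made up for illustration; they are not taken from the Spectrum or SSA drafts.

```python
from enum import Enum


class Level(Enum):
    MUST = "must"
    SHOULD = "should"
    MAY = "may"


# Hypothetical requirement table. The levels here are illustrative only
# and are not what any spec actually says.
REQUIREMENTS = {
    "Coverage.Location": Level.MUST,
    "DataID.Title": Level.MUST,
    "Target.Name": Level.SHOULD,
    "Curation.PublisherDID": Level.SHOULD,
    "DataID.CreatorDID": Level.MAY,
}


def compliance(provided):
    """Classify a service by the set of field names it provides."""
    provided = set(provided)
    musts = {f for f, lvl in REQUIREMENTS.items() if lvl is Level.MUST}
    shoulds = {f for f, lvl in REQUIREMENTS.items() if lvl is Level.SHOULD}
    if not musts <= provided:
        return "non-compliant"   # at least one "must" is missing
    if not shoulds <= provided:
        return "minimal"         # all "must"s, but not all "should"s
    return "full"                # all "must"s and all "should"s
```

A registry could then flag the services for which this returns "full", while still accepting the "minimal" ones.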
In general what is required or optional depends upon how a general data model is used - it might be different in different circumstances. For Spectrum the priorities are probably pretty clear, but for something more general like Char it will really depend upon the application (hence it is not clear how much this should be specified at the level of the Char spec).
Coordinate systems other than just ra/dec
For the 2nd generation DAL interfaces it is probably too restrictive to limit ourselves to ICRS/J2000 only, as SIA does. For example, we already have folks trying to use DAL for solar data. A reasonable compromise is to default coordinates to ICRS as in SIA, but provide a means to optionally specify a different coordinate system; whether or not other coordinate systems are supported would be a service-specific capability.
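As a rough sketch of that compromise in Python: the query defaults to ICRS, and a service only accepts another coordinate system if it advertises support for it. The parameter name FRAME and the frame names other than ICRS are assumptions for illustration, not something the interface defines.

```python
DEFAULT_FRAME = "ICRS"


def resolve_frame(params, supported_frames=(DEFAULT_FRAME,)):
    """Pick the coordinate system for a query.

    `params` is a dict of query parameters; "FRAME" is a hypothetical
    optional parameter. `supported_frames` stands in for whatever the
    service declares as its capability.
    """
    frame = params.get("FRAME", DEFAULT_FRAME)  # default to ICRS, as in SIA
    if frame not in supported_frames:
        raise ValueError(f"coordinate system {frame!r} not supported by this service")
    return frame
```

A client that never sets the parameter gets the SIA behavior unchanged, and a solar data service could list its own frame in `supported_frames`.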
The above refers mainly to the query interface and standard parameters. To describe the actual data we probably want to permit the native coordinate systems of the data to be used. This is already done in SIA 1.0, where the WCS information allows the coordinate system to be specified rather than requiring that a new WCS be computed to publish the data.
Should Coverage.Location (or whatever) be a MUST?
I agree with Jonathan that fundamental metadata such as this is a
"must". Anita is correct that it may not be appropriate for all
data, e.g., theory data, but we should at least require it where it
is appropriate for the data. Rather than define what "appropriate"
means it might be better to define values such as "not applicable"
or "undefined", and still require such a value to be specified even
for data where the value is not applicable. This would allow more
rigorous queries to be performed. The problem is, this may not be
possible for numeric values other than in a text-based serialization.
(I saw something like this elsewhere recently, possibly in VOEvent).
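A small Python sketch of the idea for a text-based serialization follows. The sentinel strings are the two suggested above; the function names are hypothetical.

```python
from enum import Enum


class Missing(Enum):
    NOT_APPLICABLE = "not applicable"
    UNDEFINED = "undefined"


def parse_value(text):
    """Parse a text-serialized numeric field.

    Returns a float, or a Missing member if the text is one of the
    sentinel strings (compared case-insensitively).
    """
    stripped = text.strip()
    for sentinel in Missing:
        if stripped.lower() == sentinel.value:
            return sentinel
    return float(stripped)


def missing_required(record, required):
    """List required fields that are absent from the record entirely.

    A field carrying a sentinel such as "not applicable" counts as
    present, so the field can stay required for all data.
    """
    return [name for name in required if name not in record]
```

With this, a theory dataset can still satisfy the "must" by stating "not applicable", and a query can tell that apart from a service that simply left the field out.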
Mediation to a standard data model vs pass-through of native data
This is an essential feature of SSA. There is no standard astronomical format for spectra, and at the scale of the VO, where a client application may access spectra from dozens of archives, it becomes impractical for each client application to know how to deal with spectral data from dozens of different projects (sure, a few applications do this now for a few archives, but that is not good enough, and such a scheme will break whenever anything changes).
What we want to make possible is for each SSA service to return data conforming to the SSA data models (Spectrum in this case), so that the mediation occurs once in the service rather than hundreds of times in remote applications. A pass-through for "native" format data is also important, in part for on-the-cheap services that can't perform the data conversion, or more importantly, to obtain direct access to the native data so that clients with intimate knowledge of a specific data collection can take advantage of project-specific features of the data. Both approaches are important.
Target.Name vs dataset IDs, collection, etc.
Target.Name is just the name of the observed object (if any), such as one might pass to a name resolver. (Title is the more important version of this, since it always applies and is broader.)
Collection is the data collection (ShortName), e.g., "SDSS-DR4" or whatever. DataID.CreatorDID is the dataset ID (URI) assigned to the dataset (spectrum) by its creator, e.g., the survey project or observatory which created the data collection. The CreatorDID does not change if the data is replicated. Curation.PublisherDID is the dataset ID assigned by the publisher, and will be different for each publisher.
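To illustrate how these identifiers relate when a dataset is replicated, here is a minimal Python sketch. The class, the function, and the example URIs are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetIDs:
    collection: str      # Collection (ShortName)
    creator_did: str     # DataID.CreatorDID, assigned once by the creator
    publisher_did: str   # Curation.PublisherDID, assigned by each publisher


def same_creator_dataset(a, b):
    """True if two published records trace back to the same creator dataset."""
    return a.creator_did == b.creator_did


# The same spectrum published by two archives (hypothetical URIs).
copy_a = DatasetIDs("SDSS-DR4", "ivo://creator.example/spec#1234",
                    "ivo://archive-a.example/sdss#1234")
copy_b = DatasetIDs("SDSS-DR4", "ivo://creator.example/spec#1234",
                    "ivo://archive-b.example/mirror#98765")
```

Here `same_creator_dataset(copy_a, copy_b)` is true even though the two publisher IDs differ, which is how a client could recognize replicas returned by different services.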
It is possible that the published dataset returned by the service may differ significantly from the "parent" (Creator's) dataset, e.g., in the case of virtual or derived data. This can be indicated with the CreationType attribute. For example, if we extract a spectrum from a data cube, CreatorDID identifies the cube, PublisherDID the extracted spectrum, and CreationType is something like "extracted spectrum". This is a primitive form of provenance model. If a completely new collection is formed by analysis then a new Creator resource is required to describe it.
Received on 2006-09-14Z07:42:02