Re: RFC initiated for Simple Spectral Access protocol

From: <jsalgado-at-sciops.esa.int>
Date: Tue, 26 Jun 2007 11:19:27 +0200


Hi Doug,

Nice summary. We know that this mini data model of 4 fields (columns and units; 6 if we include the UCDs) does not cover all possible native/legacy data, but we agree that restoring the column names in the spec is a good compromise. We will prepare a usability summary that you can use for the document.

I think it is not fair to tell data providers "we support native/legacy data, but the VO client applications will not be able to open your data", so minimal support in the document is required.

As for expanding the support to more data formats, we also agree that it is quite a complex issue and probably beyond the scope of the current document. This is why we did not expand the mini data model at the beginning. I think we can give it a try with only these fields. In fact, we have not found major problems in creating the so-called pre-SSA services, and all of them use native/legacy data.
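
To make the four fields concrete, here is a minimal sketch of how a client could use them to pick the two vectors out of a native table. It assumes the native data has already been read into a table keyed by column name. The names AxisHints, spectral_unit, flux_unit and extract_vectors, as well as the sample table and its unit strings, are hypothetical and not taken from the spec.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class AxisHints:
    """Hypothetical container for the four fields discussed in this thread."""
    spectral_axis: str   # name of the column holding the spectral coordinate
    flux_axis: str       # name of the column holding the flux
    spectral_unit: str   # unit string for the spectral column (field name assumed)
    flux_unit: str       # unit string for the flux column (field name assumed)


def extract_vectors(
    table: Dict[str, List[float]], hints: AxisHints
) -> Tuple[List[float], List[float]]:
    """Select the spectral and flux vectors from a native table by column name."""
    missing = [c for c in (hints.spectral_axis, hints.flux_axis) if c not in table]
    if missing:
        raise KeyError(f"columns not found in native table: {missing}")
    return table[hints.spectral_axis], table[hints.flux_axis]


# Made-up native table: column names and values are illustrative only.
native = {
    "WAVE": [4000.0, 4001.0, 4002.0],
    "FLUX": [1.2, 1.3, 1.1],
    "DQ": [0, 0, 1],  # extra native column the four fields say nothing about
}
hints = AxisHints("WAVE", "FLUX", "Angstrom", "erg/cm2/s/Angstrom")
wave, flux = extract_vectors(native, hints)
```

Note that the extra column (DQ here) is simply ignored. The four fields only tell the client where the spectral and flux vectors are and in which units, and the client still has to be able to read the native table format in the first place.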

Best Regards,
Jesus

Quoting Doug Tody <dtody-at-nrao.edu>:

> Hi All -
>
> I am on travel now and for a day or so more, so I will only comment
> briefly, on the issue of active mediation vs native data pass-through.
> Jesus asked a number of questions which I think it is probably simplest
> to address by looking at the general approach. This is repeating a
> fundamental architectural discussion which already took place several
> years ago, but is perhaps worth revisiting again now.
>
> To fully deal with the issue of multiwavelength analysis of data from
> many sources, which in the case of spectra involves data that can be
> represented external to the VO in many different ways (including no
> serialized representation at all, as in the case of an RDBMS or dynamic
> generation of data), SSA provides as its major interface a mechanism to
> actively mediate spectra to a standard data model.
>
> At the simplest level this is really not all that complicated; some
> general metadata plus spectral coordinate, flux, and error vectors.
> Of course the full model is more complex than that, but the essential
> bits are not terribly complicated. Once we have gone to the trouble of
> identifying the essential data elements and their units in a standard
> form, why not go ahead and provide the vector data as well?
>
> This is a general solution which will work for essentially all 1-D
> spectra. On the other hand, if we try to describe how to map these
> standard data model elements onto some arbitrary external data format,
> in the general case the problem is intractable. Sure, one can do it for a
> simple enough model for various common table formats (assuming the client
> supports all of these), but in the general case the external format can
> be anything and the problem, if posed in terms of an arbitrary external
> format, becomes intractable. By having the data provider actively mediate
> the data on the other hand, we have a straightforward 1-1 transformation,
> performed by software which has full knowledge of the native project data.
> The client sees only the standard representation, so from the client
> perspective it is quite simple.
>
> Hence, at least for spectra (and probably also for time series), general
> multi-wavelength analysis requires mediation to a standard data model.
>
> Pass-through of native project data is also important. This is not so
> much to make things easier for the data provider, but because information
> can be lost in the process of mediating data to a standard data model.
> If the client software knows about data from a specific project it may
> be able to do a more sophisticated analysis working directly with the
> native data. In the general case of course, the client may not be able
> to deal directly with native data.
>
> Hence, SSA provides both active mediation (on the server side) to a
> standard data model, as well as pass-through of unmodified native data.
> This provides both support for general multi-wavelength analysis as well
> as direct access to native data.
>
> What was implemented earlier is an intermediate approach, where native
> data is allowed to be in several standard formats (all of which must
> be supported by the client), and a simple data model with four terms
> is used to identify the spectral coordinate and flux vectors and their
> dimensional units.
>
> While this is simple and works to some extent for conformant data,
> the problem is lack of generality and a too-limited model. It only
> works for some data formats, and puts more burden on the client which
> must understand all possible native data formats. The dimensional units
> lack generality and do not address all the cases; the main alternative,
> the FITS OGIP syntax as used in Spectrum and SSA, is more general (if
> more complex) and is also a broader standard.
>
> In any case this 4-parameter model is a very simple model. To fully
> understand native data will require project-specific information on the
> part of the client. This is not unusual for major data collections,
> and VO should support this by native data pass-through, but the only
> way we can hope to provide uniformity and standardization for data from
> many sources is by developing a more complex generic data model - as
> we have already done for SSA. Some loss of information will occur if
> data is mediated, but that is always the case when data is combined in
> a more general common analysis, and the native data is always accessible
> if required.
>
> A possible compromise here might be to restore SpectralAxis and FluxAxis
> as optional attributes, to go along with the dimensional units for these
> two axes. To do this we would have to specify what the values mean
> and what data formats they refer to. These could be useful to improve
> the support for native data pass-through. However it would be good to
> recognize that this is not a general solution, and if we were to expand
> upon this approach we would likely be reinventing the spectral data model.
> To simplify an increasingly complex client-side mapping we might find
> we needed to perform the transformation on the server side and include
> the data vectors in with the metadata, in which case we would be right
> back where we are today with what SSA already provides.
>
> - Doug
>



Received on 2007-06-26Z09:22:52