Hi Doug,
Nice summary. We know this four-field mini data model (column names and units; six fields if we include the UCDs) does not cover all possible native/legacy data, but we agree that restoring the column names in the spec is a good compromise. We will prepare a usability summary that you can use for the document.
I do not think it is fair to tell data providers "we support native/legacy data, but the VO client applications will not be able to open your data", so some minimal support in the document is required.
As for expanding the support to more data formats, we also agree that it is a rather complex issue and probably beyond the scope of the current document. This is why we did not expand the mini data model at the beginning. I think we can try with only these fields. In fact, we have not found major problems in creating the so-called pre-SSA services, and all of them use native/legacy data.
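As a rough sketch of what these fields give a client (illustration only, not text for the spec): assume the native file has already been parsed into named columns, and that the service supplies two column names plus two unit strings. SpectralAxis and FluxAxis are the names used in the discussion; the unit field names, the AxisHints class, and the extract_vectors helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass(frozen=True)
class AxisHints:
    """Four hint fields: two column names and two unit strings (names assumed)."""
    spectral_axis: str   # column holding the spectral coordinate
    flux_axis: str       # column holding the flux
    spectral_unit: str   # unit string for the spectral column
    flux_unit: str       # unit string for the flux column


def extract_vectors(table: Mapping[str, Sequence[float]], hints: AxisHints):
    """Pick the spectral and flux columns out of an already-parsed native table."""
    missing = [c for c in (hints.spectral_axis, hints.flux_axis) if c not in table]
    if missing:
        raise KeyError(f"columns not found in native table: {missing}")
    spectral = list(table[hints.spectral_axis])
    flux = list(table[hints.flux_axis])
    if len(spectral) != len(flux):
        raise ValueError("spectral and flux columns differ in length")
    # Units are passed through untouched; interpreting them is left to the client.
    return (spectral, hints.spectral_unit), (flux, hints.flux_unit)


# Example with made-up column names and values.
native = {"WAVE": [4000.0, 4001.0], "FLUX": [1.2e-15, 1.3e-15], "QUAL": [0, 0]}
hints = AxisHints("WAVE", "FLUX", "Angstrom", "erg/s/cm2/A")
(wave, wave_unit), (flux, flux_unit) = extract_vectors(native, hints)
```

This only works when the client can already read the native container format, which is exactly the limitation discussed in the quoted message below.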
Best Regards,
Jesus
Quoting Doug Tody <dtody-at-nrao.edu>:
> Hi All -
>
> I am on travel now and for a day or so more, so I will only comment
> briefly, on the issue of active mediation vs native data pass-through.
> Jesus asked a number of questions which I think are probably simplest
> to address by looking at the general approach. This is repeating a
> fundamental architectural discussion which already took place several
> years ago, but is perhaps worth revisiting again now.
>
> To fully deal with the issue of multiwavelength analysis of data from
> many sources, which in the case of spectra involves data that can be
> represented external to the VO in many different ways (including no
> serialized representation at all as in the case of an RDBMS or dynamic
> generation of data), SSA provides as its major interface a mechanism to
> actively mediate spectra to a standard data model.
>
> At the simplest level this is really not all that complicated; some
> general metadata plus spectral coordinate, flux, and error vectors.
> Of course the full model is more complex than that, but the essential
> bits are not terribly complicated. Once we have gone to the trouble of
> identifying the essential data elements and their units in a standard
> form, why not go ahead and provide the vector data as well?
>
> This is a general solution which will work for essentially all 1-D
> spectra. On the other hand, if we try to describe how to map these
> standard data model elements onto some arbitrary external data format,
> in the general case the problem is intractable. Sure, one can do it for a
> simple enough model for various common table formats (assuming the client
> supports all of these), but in the general case the external format can
> be anything and the problem, if posed in terms of an arbitrary external
> format, becomes intractable. By having the data provider actively mediate
> the data on the other hand, we have a straightforward 1-1 transformation,
> performed by software which has full knowledge of the native project data.
> The client sees only the standard representation, so from the client
> perspective it is quite simple.
>
> Hence, at least for spectra (and probably also for time series), general
> multi-wavelength analysis requires mediation to a standard data model.
>
> Pass-through of native project data is also important. This is not so
> much to make things easier for the data provider, but because information
> can be lost in the process of mediating data to a standard data model.
> If the client software knows about data from a specific project it may
> be able to do a more sophisticated analysis working directly with the
> native data. In the general case of course, the client may not be able
> to deal directly with native data.
>
> Hence, SSA provides both active mediation (on the server side) to a
> standard data model, as well as pass-through of unmodified native data.
> This provides both support for general multi-wavelength analysis as well
> as direct access to native data.
>
> What was implemented earlier is an intermediate approach, where native
> data is allowed to be in several standard formats (all of which must
> be supported by the client), and a simple data model with four terms
> is used to identify the spectral coordinate and flux vectors and their
> dimensional units.
>
> While this is simple and works to some extent for conformant data,
> the problem is lack of generality and a too-limited model. It only
> works for some data formats, and puts more burden on the client which
> must understand all possible native data formats. The dimensional units
> lack generality and do not address all the cases; the main alternative,
> the FITS OGIP syntax as used in Spectrum and SSA, is more general (if
> more complex) and is also a broader standard.
>
> In any case this four-parameter model is a very simple model. To fully
> understand native data will require project-specific information on the
> part of the client. This is not unusual for major data collections,
> and VO should support this by native data pass-through, but the only
> way we can hope to provide uniformity and standardization for data from
> many sources is by developing a more complex generic data model - as
> we have already done for SSA. Some loss of information will occur if
> data is mediated, but that is always the case when data is combined in
> a more general common analysis, and the native data is always accessible
> if required.
>
> A possible compromise here might be to restore SpectralAxis and FluxAxis
> as optional attributes, to go along with the dimensional units for these
> two axes. To do this we would have to specify what the values mean
> and what data formats they refer to. These could be useful to improve
> the support for native data pass-through. However it would be good to
> recognize that this is not a general solution, and if we were to expand
> upon this approach we would likely be reinventing the spectral data model.
> To simplify an increasingly complex client-side mapping we might find
> we needed to perform the transformation on the server side and include
> the data vectors in with the metadata, in which case we would be right
> back where we are today with what SSA already provides.
>
> - Doug
>