Re: SED FITS Serialisation: multi-extension?

From: Alberto Micol <Alberto.Micol-at-eso.org>
Date: Mon, 13 Jun 2005 16:16:04 +0200

On Jun 13, 2005, at 14:23, Markus Dolensky wrote:

> Hi Alberto,
>
> Would you mind specifying which parts of either the spectral DM doc ...
>
> http://www.ivoa.net/twiki/bin/view/IVOA/IVOADMSpectraWP
>

Yes. Specifically chapter 8 "Serializations". Sorry not to have mentioned that earlier.

> or the SSA interface doc. ...
>
> http://www.ivoa.net/internal/IVOA/InterOpMay2005DAL/ssa-v090.pdf
>
> ... triggered particular thoughts?
>
> For instance, what is meant by a "VOTABLE accompanying the SED"?
> There is going to be a VOTable query response, but the serialization
> can either be an XML or VOTable document or a FITS binary table.

As I see it (tell me if I am wrong), the SSA client receives back a VOTable response,
which might point to a FITS file for an individual segment, or even for a bunch of
segments at once. The VOTable is the "messenger" and I see it as quite volatile; the
associated FITS file instead, containing the actual data, is to be stored by the end
user for subsequent scientific analysis. And I'm afraid that, as soon as the message
is received, the VOTable will be kindly moved to .Trash, hence leaving the end user
with no idea of which segment had which characteristics; even the Provenance (in DM
terminology) of any segment might get lost.

Remembering which reference files were used is particularly important for those
archives that offer on-the-fly calibration, where the SAME dataset will yield
different (better) products as time goes by.
If the user loses that info, s/he will not be able to know whether a given
product is still the best possible (the "current" one) for a given observation.

>
> The scope is 1d spectra and time series. Are you suggesting to expand
> this for V1.0 of the two docs?

No, I'm not looking for an "expansion", I'm just considering a different (let me say "better") serialisation.

>
> Remember, we are trying to serialize a DM. So, are your suggestions
> aiming at expanding the DM or the way its implemented (serialized)?

The second one.

> > Conclusions: I see only advantages in adopting MEF, am I biased?
>
> Does it mean to give up on serializing a particular DM and to use
> existing formats instead?

Not at all. We need to agree on a single particular DM, otherwise it would be
a mess. When I say MEF I don't just mean "any MEF". I'm considering a MEF that
contains what the SED DM imposes, but that also allows data-provider-specific info.
Regarding metadata:
  Even the current DM allows for "more keywords" than just the suggested standard ones.
  My idea is to preserve all the metadata that the DM already promotes,
  and *at the same time* preserve all the metadata that the data provider has
  already published. I can see that happening only with a MEF (one header per
  extension, i.e. per segment).
/* Note: the proposed SED keywords "shall" not clash with the commonly used ones. */
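
To make the metadata point concrete, here is a minimal sketch (plain Python, no FITS I/O) of one header per segment that keeps both keyword sets and refuses silent overwrites. The function name merge_headers and the VO keyword VOCLASS are made up for illustration; they are not taken from the SED draft. TELESCOP and INSTRUME stand in for the provider's original keywords.

```python
def merge_headers(vo_keywords, provider_keywords):
    """Return one header holding both keyword sets.

    Hypothetical helper: a clash between a VO keyword and a provider
    keyword is reported instead of one silently overwriting the other.
    """
    clashes = sorted(set(vo_keywords) & set(provider_keywords))
    if clashes:
        raise ValueError("keyword clash: " + ", ".join(clashes))
    merged = dict(provider_keywords)  # keep everything the provider published
    merged.update(vo_keywords)        # add what the DM asks for
    return merged


# VOCLASS is an invented keyword name, used only for this example.
segment_header = merge_headers(
    {"VOCLASS": "SPECTRUM"},
    {"TELESCOP": "HST", "INSTRUME": "STIS"},
)
```

With one such header per extension, every segment carries its own provenance, independently of the VOTable that delivered it.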

Regarding the data:
  The actual format of the data is NOT to be the original format adopted maybe
20 years ago by a data provider; that of course needs to be standardised,
and the solution currently proposed in SED 0.93 is to use a binary table with
one segment per row.
Instead I'm proposing a MEF, to allow for more metadata than just the VO ones (see above),
and to be able to cope with other kinds of data, like echelle or spectropolarimetry,
which are still to be seen as 1d spectra but need extra "columns", a concept
ruled out by the current SED 0.93. That's why I'm suggesting one binary table
per segment: it allows the data provider to fold into the VO standard all the
information judged to be useful. For example, I can imagine it being useful to
associate with the standard WAVELENGTH, FLUX and ERROR other columns like the
"SUBTRACTED BACKGROUND", etc. Or, as is the case for spectropolarimetry, to add
columns to store the Stokes parameters.
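
As a rough illustration of "one binary table per segment", the sketch below models a segment as a header plus named scalar columns. It is a plain-Python data model, not real FITS I/O. The class name SegmentTable and the extra column names (BACKGROUND, STOKES_Q, STOKES_U, STOKES_V) are assumptions made for the example; only WAVELENGTH, FLUX and ERROR come from the text above, and all numbers are invented.

```python
from dataclasses import dataclass

# The three standard columns named in the discussion.
REQUIRED_COLUMNS = ("WAVELENGTH", "FLUX", "ERROR")


@dataclass
class SegmentTable:
    """One segment = one table extension: a header plus scalar columns."""
    header: dict
    columns: dict  # column name -> list of values, one value per row

    def __post_init__(self):
        missing = [c for c in REQUIRED_COLUMNS if c not in self.columns]
        if missing:
            raise ValueError(f"missing required columns: {missing}")
        # Scalar columns: every column has exactly one value per row.
        if len({len(v) for v in self.columns.values()}) != 1:
            raise ValueError("all columns must have the same number of rows")

    @property
    def extra_columns(self):
        """Provider-specific columns beyond the standard three."""
        return [c for c in self.columns if c not in REQUIRED_COLUMNS]


# A spectropolarimetry segment: standard columns plus Stokes parameters.
polarimetry = SegmentTable(
    header={"EXTNAME": "SEGMENT1"},
    columns={
        "WAVELENGTH": [4000.0, 4001.0],
        "FLUX": [1.2e-15, 1.3e-15],
        "ERROR": [1.0e-17, 1.1e-17],
        "STOKES_Q": [0.01, 0.02],
        "STOKES_U": [0.00, 0.01],
        "STOKES_V": [0.00, 0.00],
    },
)
```

The point of the design is that the standard columns are mandatory while anything else is allowed, so a generic client can read every segment and a specialised one can use the extras.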

And again:
> Does it mean to give up on serializing a particular DM and to use
> existing formats instead?

On the contrary: I am proposing "one format to rule them all". And in fact:
> BTW, the next step on the roadmap is to unify access to images,
> spectra and catalogues by means of ADQL.
also for images we probably need MEF if we want to offer not just the image
but also the accompanying weight maps, data quality, etc. Hence MEF is good for both imaging and spectroscopy.

> This is just to better understand your comments that you thankfully
> took the time to put down.

Thanks for having taken the time to read me! :-)

> Cheers,
> Markus
>

Ciao,
Alberto

>
> Alberto Micol wrote:
>> Dear SSA/SEDers,
>> I'd like to comment on the serialisation aspects of the protocol
>> which now states that each segment is one row in a fits binary table.
>> In such serialisation the characterisation is left completely to the
>> VOTable
>> accompanying the SED, since it becomes impossible to characterise
>> each and
>> every segment with a single header.
>> That is fine IF the user does not care to know the origins of the
>> segments.
>> (And someone might claim that such a user is not too careful, to say
>> the least.)
>> My view is that the VO should simplify life of the users in other ways
>> than just stripping off all the information that the data provider,
>> mostly
>> painfully, put together. :-)
>> My favourite solution would be to adopt a FITS extension for each of
>> the segments,
>> each extension containing:
>> - a header with VO keywords PLUS the original header keywords,
>> - a binary table with scalar columns
>> In that way the work of the data provider would be happily
>> recognised, and
>> the user might be able to find any kind of details regarding any
>> segment,
>> from the calibration reference files used to calibrate a spectrum
>> down to
>> the acknowledgment sentence sometimes buried in some fits COMMENT or
>> HISTORY keyword.
>> The multiple extension FITS format would also allow to cover the
>> spectropolarimetry
>> case (currently not supported at all), where for each wavelength
>> the Stokes parameters will be also stored in separate scalar columns.
>> Also, I think that the echelle spectra are causing some troubles to
>> the current format. Each of the multi order spectra should probably
>> end up
>> into its own extension.
>> Conclusions: I see only advantages in adopting MEF, am I biased?
>> Alberto
>> Aside: With such a format, it would then also be easy to build a SED
>> On The Fly
>> whereby a SED-OTF tool can compose SSAP queries to some selected
>> services and
>> come back with a single multi-extension FITS file: it is just a matter
>> of
>> appending any individually ssap-returned FITS file to the
>> multi-extension file.
>> (Unless I'm wrong, I don't think that the current serialisation allows
>> such a simple
>> assembling of the fits files).
>>
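
The SED-On-The-Fly aside above can be sketched as follows, under the assumption that each file returned by a service is modelled as a list of extensions (here plain dicts). The function name assemble_sed and the EXTNAME values are hypothetical; no real SSAP query or FITS I/O is performed.

```python
def assemble_sed(returned_files):
    """Build one multi-extension SED by appending every returned extension.

    `returned_files` is a list of files, each a list of extensions.
    Order is preserved and no extension is modified, so each segment
    keeps its own header and columns.
    """
    sed = []
    for extensions in returned_files:
        sed.extend(extensions)
    return sed


# Two hypothetical service responses, one and two segments respectively.
file_a = [{"header": {"EXTNAME": "A1"}, "columns": {}}]
file_b = [
    {"header": {"EXTNAME": "B1"}, "columns": {}},
    {"header": {"EXTNAME": "B2"}, "columns": {}},
]
sed = assemble_sed([file_a, file_b])
```

Because every segment is self-describing, assembling needs no merging of tables or headers; it is a pure append.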

Alberto Micol
ST-ECF HST Archive Scientist

Received on 2005-06-13Z14:16:28