RE: Binary data (branch from future of VOTable)

From: Alex Szalay <szalay-at-jhu.edu>
Date: Tue, 20 Apr 2004 22:54:46 -0400


Has anybody measured the performance of BinX? That would be useful information.

Cheers, Alex

-----Original Message-----
From: owner-votable-at-eso.org [mailto:owner-votable-at-eso.org]On Behalf Of Guy Rixon
Sent: Tuesday, April 20, 2004 8:52 AM
To: Tony Linde
Cc: 'VOTable mailing list'
Subject: Re: Binary data (branch from future of VOTable)

Longish description of DFDL, since others have said what can be said simply...

As Bob noted, BinX was produced by edikt and DFDL is the GGF equivalent. DFDL
was originally intended to be a very limited extension of BinX. However, once
it came under discussion at GGF it became clear that something more complex was needed for many cases. DFDL and BinX are now totally different in form.

BinX is an XML vocabulary that defines how binary data are laid out in a stream. You could paraphrase a BinX description as "integer, integer, short integer, float; repeat 20 times, then add two more integers; oh, by the way, the integers are all big-endian and the floats are in IEEE format". It works when the binary stream can be lexically analysed as it stands. BinX just does the lexing; it doesn't support parsing, as it doesn't contain any semantic information.
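
To make the paraphrase above concrete, here is a minimal Python sketch of the same layout using the standard struct module. It is not BinX syntax and says nothing about how a BinX processor works; the names RECORD, TRAILER, lex_stream and make_stream are made up for illustration, and the integer widths (4-byte integers, 2-byte shorts) are assumptions.

```python
import struct

# ">" selects big-endian with no padding; "i" = 4-byte integer,
# "h" = 2-byte short integer, "f" = 4-byte IEEE 754 float.
# The field widths are assumptions made for this illustration.
RECORD = struct.Struct(">iihf")   # integer, integer, short integer, float
TRAILER = struct.Struct(">ii")    # the two extra integers at the end
N_RECORDS = 20                    # "repeat 20 times"


def lex_stream(data: bytes):
    """Split the byte stream into 20 records plus a two-integer trailer."""
    expected = RECORD.size * N_RECORDS + TRAILER.size
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    records = [RECORD.unpack_from(data, i * RECORD.size)
               for i in range(N_RECORDS)]
    trailer = TRAILER.unpack_from(data, N_RECORDS * RECORD.size)
    return records, trailer


def make_stream() -> bytes:
    """Build a sample stream in the same layout, for demonstration only."""
    body = b"".join(RECORD.pack(i, i * 2, i % 7, i * 0.5)
                    for i in range(N_RECORDS))
    return body + TRAILER.pack(-1, 42)


records, trailer = lex_stream(make_stream())
```

The result is a list of plain tuples of numbers: the stream has been lexed, but nothing says what any of the numbers mean.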

DFDL, as it was a week ago (it's been mutating rather fast), is much more subtle because it's trying to solve a harder class of problems: what to do when the binary stream isn't lexable directly. For example, what do you do if the stream is gzip'd?

DFDL works in terms of a series of transformations; these are, mathematically,
transformations between object graphs. The most primitive graph represents the
bit-stream and the most structured represents fragments of an external data model: i.e. something that could be serialized as XML or transcribed as DOM,
etc. A DFDL processor rips the bit-stream into semantically-meaningful things
defined by W3C XML schema, rather than into C-like primitives. In the process, it does as much rearrangement of the bits as is needed.
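
The idea of a chain of transformations can be sketched in a few lines of Python. This is only an illustration of the concept, not DFDL notation or a DFDL processor: the steps (gunzip, to_rows, to_sources), the Source class and the row layout are all hypothetical. The first step handles the gzip case mentioned earlier, the second does the lexing, and the third lifts the primitives into named objects.

```python
import gzip
import struct
from dataclasses import dataclass

# Hypothetical row layout: big-endian 4-byte integer, 4-byte IEEE float.
ROW = struct.Struct(">if")


@dataclass
class Source:
    """A made-up fragment of a data model: one catalogue source."""
    ident: int
    flux: float


def gunzip(raw: bytes) -> bytes:
    """Compressed bytes -> plain bytes (the stream is not lexable before this)."""
    return gzip.decompress(raw)


def to_rows(data: bytes):
    """Plain bytes -> tuples of primitives (the lexing step)."""
    return list(ROW.iter_unpack(data))


def to_sources(rows):
    """Tuples of primitives -> objects with named, meaningful fields."""
    return [Source(ident=i, flux=f) for i, f in rows]


def apply_chain(raw, steps):
    """Apply each transformation in order, feeding each result to the next."""
    value = raw
    for step in steps:
        value = step(value)
    return value


CHAIN = [gunzip, to_rows, to_sources]

raw = gzip.compress(ROW.pack(1, 2.5) + ROW.pack(2, 0.25))
sources = apply_chain(raw, CHAIN)
```

Each step takes one representation and hands a more structured one to the next; swapping the first step would be enough to cope with a different compression scheme.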

Currently, DFDL is proposed to be a set of annotations to a W3C XML schema. The DFDL bits go into an appinfo element for the schema as a whole. Therefore, if one makes a schema for a union of IVOA data-model fragments (as discussed on the VOTable list in the past), then one could decorate that schema with DFDL information that allows it to be applied to a binary stream.
The schema is still a schema; the DFDL part extends it without changing it.
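
As a rough illustration of "the schema is still a schema", the Python sketch below parses a small W3C XML schema that carries layout hints in a schema-level appinfo element. The hint vocabulary (the urn:example:binary-layout namespace, the layout element and its attributes) is invented for this example and is not DFDL syntax; only the xs:annotation/xs:appinfo mechanism comes from XML Schema itself.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
HINT = "urn:example:binary-layout"  # hypothetical namespace, NOT DFDL's

SCHEMA = f"""\
<xs:schema xmlns:xs="{XS}" xmlns:b="{HINT}">
  <xs:annotation>
    <xs:appinfo>
      <b:layout byteOrder="bigEndian" floatFormat="ieee"/>
    </xs:appinfo>
  </xs:annotation>
  <xs:element name="ra" type="xs:double"/>
  <xs:element name="dec" type="xs:double"/>
</xs:schema>
"""


def declared_elements(schema_text):
    """What an ordinary schema tool sees: the top-level element declarations."""
    root = ET.fromstring(schema_text)
    return [(e.get("name"), e.get("type"))
            for e in root.findall(f"{{{XS}}}element")]


def layout_hints(schema_text):
    """What a binary-aware tool would additionally read from appinfo."""
    root = ET.fromstring(schema_text)
    hint = root.find(f"{{{XS}}}annotation/{{{XS}}}appinfo/{{{HINT}}}layout")
    return dict(hint.attrib) if hint is not None else {}


elements = declared_elements(SCHEMA)
hints = layout_hints(SCHEMA)
```

A tool that knows nothing about the hints sees the same element declarations with or without the annotation; only a binary-aware tool looks inside appinfo.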

There's a big, open question as to whether one writes DFDL descriptions for formats or for instances. Clearly, describing formats is more reusable, but it
is much harder and may be infeasible. To illustrate this, consider describing
a FITS table that you have in a file. Do you describe the byte patterns for this particular file - a "bytes to my file" transform - or do you try to describe FITS - a "bytes to FITS binary table" transform, followed by a "FITS table to my table" transform that says how you used the format? Think about doing the latter in Java or C and you'll see that describing concrete instances is a lot easier.

There's also an open question as to how a DFDL schema is used. Serializing all of a big, binary file to XML is silly; if it could be handled as XML, then
it wouldn't be in binary in the first place. I suspect we need an API that supplies the next few major objects from the stream in DOM, with some safety mechanism that limits the amount of memory used in the DOM. The DFDL-WG isn't
giving much thought to the API at present; we in astronomy might be able to help out there.
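
One possible shape for such an API, sketched in Python under stated assumptions: a generator that reads a fixed number of rows at a time and hands each batch back as a small DOM document, so that no more than max_rows rows are held in DOM form at once. The function name dom_batches, the row layout and the element names are hypothetical; nothing here is proposed by the DFDL-WG.

```python
import io
import struct
from xml.dom.minidom import Document

# Hypothetical row layout: big-endian 4-byte integer id, 8-byte IEEE double.
ROW = struct.Struct(">id")


def dom_batches(stream, max_rows=1000):
    """Yield DOM documents holding at most max_rows rows each.

    Assumes a buffered binary stream (a file or BytesIO) whose read(n)
    returns fewer than n bytes only at end of stream.
    """
    while True:
        chunk = stream.read(ROW.size * max_rows)
        if not chunk:
            return
        if len(chunk) % ROW.size:
            raise ValueError("stream ends in the middle of a row")
        doc = Document()
        table = doc.createElement("rows")
        doc.appendChild(table)
        for ident, value in ROW.iter_unpack(chunk):
            row = doc.createElement("row")
            row.setAttribute("id", str(ident))
            row.setAttribute("value", repr(value))
            table.appendChild(row)
        yield doc


data = b"".join(ROW.pack(i, i / 2) for i in range(5))
for doc in dom_batches(io.BytesIO(data), max_rows=2):
    rows = doc.getElementsByTagName("row")
    print(len(rows), "rows in this batch")
```

The cap is expressed in rows rather than bytes purely for simplicity; a real safety mechanism would more likely budget memory directly.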

Finally, we should note that BinX is a usable system with an implementation but DFDL is only a specification as yet.

On Wed, 14 Apr 2004, Tony Linde wrote:

> > I have yet to see a standard for bulk binary data transport
> > that is significantly better than FITS. Hence the wrapping
>
> There's a BinX format floating about from (I think) the grid world - and Guy
> told me about some DFDL (pronounced 'daffodil' apparently) which is a way of
> allowing xml-based tools to read binary format data. I don't know much more
> about it.
>
> Cheers,
> Tony.
>
> > -----Original Message-----
> > From: owner-votable-at-eso.org [mailto:owner-votable-at-eso.org] On
> > Behalf Of Arnold Rots
> > Sent: 14 April 2004 14:34
> > To: VOTable mailing list
> > Subject: Re: Comments on V1.1 - Future of VOTable
> >
> > Now that this side of the Atlantic has woken up and surveyed
> > (bleary-eyed) the wisdom that has been dispensed from the
> > other side while we were blissfully asleep, we might as well
> > join in :-)
> >
> > I'll repeat my old mantra (slightly disagreeing with what Clive said
> > earlier): FITS defined the syntax of the metadata but failed
> > to define the semantics - and that's turned out to have been
> > a considerable problem. HEASARC/OGIP tried to fix that with
> > a set of conventions, but such efforts were very much limited
> > to sub-communities: high energy, optical, and radio each have
> > their own set of idiosyncratic conventions; and that's
> > troublesome for a VO.
> >
> > To expand on the reversibility that Mark quoted:
> > So, defining an XML-based metadata standard that is purposely
> > semantic in nature is a very sensible thing to do and XML is
> > well-suited for that (better than FITS). On the other hand,
> > I have yet to see a standard for bulk binary data transport
> > that is significantly better than FITS. Hence the wrapping
> > concept - which I think is actually a very sensible, but also
> > acceptable, way of transporting data and information.
> > Extract the metadata from the FITS headers (requiring
> > software that understands the peculiar dialect of a
> > particular FITS file), put it in a universally understandable
> > XML document (for the sake of argument, let's say a VOTable),
> > and send it with the FITS file containing the data - that
> > seems like a perfectly acceptable and practical solution (at
> > least to me).
> >
> > And, of course, if you can translate a FITS dialect to a
> > VOTable, you can try to do the reverse and feed the new file
> > into your existing FITS-based analysis package - or you can
> > use something more modern.
> >
> > In summary:
> > - The data transport mechanism of FITS ain't broken -> don't fix it.
> > - The metadata semantics in FITS is a problem -> replace it
> > by something better (VOTable or whatever).
> >
> > Anyway, that's the assumption I have been working from.
> > Cheers,
> >
> > - Arnold
> >
> > Mark Taylor wrote:
> > > On Wed, 14 Apr 2004, martin hill wrote:
> > >
> > > > > While I appreciate that VOTable can be used to wrap FITS, this is (I
> > > > > hope) a temporary measure while we sort out our data models and
> > > > > representations. We should recognise it as such and put future
> > > > > effort into producing suitable solutions, not in overloading VOTable
> > > > > to represent all data formats. FITS files too have limited structures.
> > >
> > > > I don't see it as a temporary measure. VOTable is not supposed to
> > > > represent all data formats, FITS is a special case. From sec 2.3 of
> > > > the VOTable document:
> > > >
> > > > "the transformation of FITS to VOTable is meant to be reversible"
> > > >
> > > > The kind of data (though not metadata) which can be held in a VOTable
> > > > is by design compatible with the FITS binary table format, so that
> > > > requiring the possible storage of VOTable bulk data using a FITS
> > > > binary table does not introduce any additional limitations (put
> > > > another way, VOTable explicitly accepts the limitations of FITS binary
> > > > tables as far as pure data storage streams go).
> > >
> > > Mark
> > >
> > > --
> > > > Mark Taylor Starlink Programmer Physics, Bristol University, UK
> > > m.b.taylor-at-bris.ac.uk +44-117-928-8776
> > > http://www.star.bris.ac.uk/~mbt/
> > >
> > --------------------------------------------------------------------------
> > Arnold H. Rots                          Chandra X-ray Science Center
> > Smithsonian Astrophysical Observatory   tel: +1 617 496 7701
> > 60 Garden Street, MS 67                 fax: +1 617 495 7356
> > Cambridge, MA 02138                     arots-at-head.cfa.harvard.edu
> > USA                                     http://hea-www.harvard.edu/~arots/
> > --------------------------------------------------------------------------
> >
>

Guy Rixon 				        gtr-at-ast.cam.ac.uk
Institute of Astronomy   	                Tel: +44-1223-337542
Madingley Road, Cambridge, UK, CB3 0HA		Fax: +44-1223-337523
Received on 2004-04-21Z04:50:17