On 21.07.2005, at 15:48, Roy Williams wrote:
> The key to efficient transfer of big data is to separate (XML)
> metadata from (binary) data. The metadata can contain pointers to the
> data (http:// or srb:// or gridftp://).
>
> When I buy a single brick at my local hardware store, I take it with
> me to the cashier and deal with metadata (payment) and data (the
> brick) together. But when I buy 1,000 bricks, it is different. I pay
> the cashier and receive a piece of paper (the pointer), then I take
> the paper somewhere else to load the bricks into my truck.
>
> As Andreas points out, VOTable was built in this way, to represent
> table metadata, with pointers to binary or FITS data elsewhere. It is
> not advised to use the TR,TD mechanism of VOTable to represent large
> datasets. In the same way, the VOStore specification is being built
> with the possibility of splitting metadata from data.
>
> Splitting metadata from data is more efficient.
> But it requires more effort to make sure the two are properly
> synchronized.
That's why we chose to keep the two things together in one file (or rather one transfer block) containing a header (the VOTable) and the binary data. The references in the VOTable therefore use what is called a Content-ID (see http://www.ietf.org/rfc/rfc2111.txt), a construction that many MIME handlers understand. This gives us a self-describing file that contains both a valid XML file and plain binary data. It is much like FITS, but uses well-known standards from the e-mail world. In fact, such a file can be opened and interpreted with a standard e-mail client. The size of the actual VOTable is limited to the resource and field descriptions; all the data is in binary attachments. We actually wanted even more flexibility and created VOTables in which single fields refer to multi-dimensional binary arrays while other fields are still given as TD elements. Since this is not covered by the current version of the VOTable standard, we are not using it externally.
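To make the packaging idea concrete, here is a minimal sketch in Python using only the standard email library. It is not our actual implementation: the Content-ID value, the field name and the three sample numbers are made up for illustration, and the library's default base64 transfer encoding is used for the attachment, whereas the files described above carry plain binary data.

```python
import struct
from email import message_from_bytes
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical binary payload: three big-endian doubles.
values = (1.0, 2.5, -3.25)
payload = struct.pack(">3d", *values)

# Hypothetical Content-ID; the VOTable points at it with a cid: URL (RFC 2111).
CID = "data1@example.org"

votable_xml = f"""<?xml version="1.0"?>
<VOTABLE version="1.1">
  <RESOURCE>
    <TABLE>
      <FIELD name="flux" datatype="double"/>
      <DATA>
        <BINARY>
          <STREAM href="cid:{CID}"/>
        </BINARY>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""

# One transfer block: the XML header first, then the binary attachment.
package = MIMEMultipart("related", type="text/xml")
package.attach(MIMEText(votable_xml, "xml"))

data_part = MIMEApplication(payload)  # application/octet-stream, base64 by default
data_part.add_header("Content-ID", f"<{CID}>")  # header form uses angle brackets
package.attach(data_part)

raw = package.as_bytes()

# Reading side: index the parts by Content-ID and resolve the cid: reference.
parsed = message_from_bytes(raw)
by_cid = {p["Content-ID"]: p for p in parsed.walk() if p["Content-ID"]}
recovered = by_cid[f"<{CID}>"].get_payload(decode=True)
```

Unpacking `recovered` with `struct.unpack(">3d", recovered)` gives back the original values, and any MIME-aware tool can split the block into its XML and binary parts without knowing anything about VOTable.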
I would also like to mention that we should strictly separate the issue of transfer from the issue of packaging/formatting. The fact that TCP connections have problems with latency and slow start-up should not be mixed up with what you would like to transfer. If we find a much better protocol for transferring large amounts of data, it should be used, but that should not force us to use a different internal packaging.
Anita: The transfer speed we reached is actually almost 0.8 Gb/s (about 96 MB/s) and is in fact limited by the network cards.
Andreas
Received on 2005-07-21Z15:31:16