Re: String character range

From: Luigi Paioro <luigi-at-lambrate.inaf.it>
Date: Thu, 28 Aug 2008 17:16:02 +0200


Dear Mark, Doug and all,

   in reply to this mail

> This is a coherent suggestion and it could be done. However in my
> opinion it's not the best way to go. While making the protocol as
> general and flexible as possible sounds like a good thing, the price
> that you pay is a reduction in interoperability. If the protocol
> says that SAMP strings can only ever contain characters 0xA, 0xD and
> 0x20-7F (or whatever) then you know that if you can handle those
> characters then you can definitely interoperate with anyone else
> speaking the protocol. If the protocol says that any UTF-8 character
> is permitted then someone trying to write middleware that does
> translation between the far future perverted Ice-based profile and the
> current Standard Profile will have a problem. Is that kind of
> middleware something we're going to need? I don't know. But in
> weighing up how we ought to plan for unknown future evolutions,
> I would rather err on the side of safety than of flexibility.

I must admit that I didn't consider the possibility of a multi-profile hub, and hence the need for translation between profiles. For interoperability reasons, it is probably logical to assume that every SAMP hub implementation MUST support at least the Standard Profile, plus possibly other profiles or extensions. Therefore the XML limits should be taken into account at the abstract API level. Sure.

Anyway, as I said in a previous mail, I don't think that UTF-8 support is really important, and in 99% of cases ASCII with the stated limits is probably adequate, so I won't insist.

However, Doug's problem with that VOTable has suggested to me a possible scenario that could require UTF-8 support (with the XML constraints), maybe by introducing an additional data type. This is the scenario: using a VO-enabled application, I get from an SSA (TAP, SIA, whatever) service a VOTable which contains UTF-8 characters (no matter which); after some processing I broadcast it asynchronously to one or more other applications using SAMP. This simple operation can be done in two ways:

i) by reference: the VOTable is written to a file (local or remote) and the reference to that file is sent with a proper MType as a simple ASCII string (e.g. "file:///tmp/myvotab.vot", "ivo://my.vospace.address/myvotab.vot", etc.)

ii) by value: the content of the VOTable is sent as a byte stream, still using a proper MType. This byte stream can simply be a UTF-8 encoded string.

Case i) requires only ASCII charset support, while case ii) requires either support for UTF-8 (or at least letting it pass through untouched) or an additional general data type for byte streams (which I suspect could be useful for other purposes too).
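
To make the difference between the two cases concrete, here is a minimal Python sketch. It assumes the restricted character set is exactly the one quoted above (0xA, 0xD and 0x20-7F, which was itself given as "or whatever"), and the function name and the sample VOTable fragment are only illustrative, not part of any SAMP API.

```python
# Minimal sketch: check whether a string fits the restricted SAMP
# character set quoted above (0x0A, 0x0D and 0x20-0x7F).
# ASSUMPTION: the range is taken literally from the quoted mail;
# the final specification may define it differently.

def is_restricted_samp_string(text: str) -> bool:
    """Return True if every character is 0x0A, 0x0D or in 0x20-0x7F."""
    return all(
        ord(ch) in (0x0A, 0x0D) or 0x20 <= ord(ch) <= 0x7F
        for ch in text
    )

# Case i) by reference: only the location of the VOTable is sent.
reference = "file:///tmp/myvotab.vot"

# Case ii) by value: the VOTable content itself is sent.
# Hypothetical fragment containing one non-ASCII character.
by_value = '<VOTABLE><INFO name="observer" value="Andr\u00e9"/></VOTABLE>'

reference_ok = is_restricted_samp_string(reference)
by_value_ok = is_restricted_samp_string(by_value)
```

With these inputs the reference string passes the check, while the by-value payload fails on its single accented character, which is the situation described in case ii).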

If only the ASCII charset (with the stated limits) is allowed in a SAMP message, then only case i) is possible. If someone wants to be able to pass data by value (case ii), then I think the discussion will still be a long one...

Luigi

Received on 2008-08-28Z17:17:05