Re: String character range

From: Luigi Paioro <luigi-at-lambrate.inaf.it>
Date: Mon, 04 Aug 2008 16:28:59 +0200


Hi!

I think that Unicode chars would rarely be sent, and control chars never. In probably 99% of cases the ASCII charset with the limitations you indicated is enough, so I don't have a strong position on Unicode support.

Anyway, I've thought about Doug's suggestion regarding UTF-8, and I've looked around to see which string encoding other RPC systems such as ZeroC's Ice and DBus adopt (I also looked for CORBA's encoding, but without success). Well, DBus and Ice both use UTF-8 (with no limitations). I haven't looked at the other RPC systems (there is a plethora of them), but those are my favourites (along with XML-RPC and SOAP, of course), so that is where I looked.

Now suppose that, in the far, far future, some perverse soul decides to implement SAMP with a different profile, for instance using Ice as the wire protocol instead of XML-RPC (in principle this should be possible). It would be a shame if such an implementation inherited limitations that come from XML. In my opinion the limits should be set at the implementation and language level, not at the protocol level... the protocol should be as general (and flexible) as possible.

So why not follow Doug's suggestion: specify at the SAMP protocol definition level that strings are serialized as UTF-8 (in general), and specify at the Standard Profile level that not all UTF-8 chars are allowed, only those supported by XML?
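
To make the two levels concrete, here is a minimal Python sketch, not a proposal for the actual API. The function names (is_xml_char, is_standard_profile_string, encode_samp_string) and the xml_profile flag are hypothetical; the character ranges are those of the "Char" production in XML 1.0. The generic level only encodes to UTF-8, while the XML-based profile first rejects characters that XML cannot carry.

```python
def is_xml_char(ch):
    """True if ch is allowed by the XML 1.0 'Char' production."""
    cp = ord(ch)
    return (
        cp in (0x09, 0x0A, 0x0D)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_standard_profile_string(s):
    """Profile-level check (hypothetical): every char must be XML-safe."""
    return all(is_xml_char(ch) for ch in s)


def encode_samp_string(s, xml_profile=True):
    """Serialize a string as UTF-8.

    The restriction is applied only when the profile is XML-based;
    a profile over another wire protocol would skip it.
    """
    if xml_profile and not is_standard_profile_string(s):
        raise ValueError("string contains characters not allowed in XML 1.0")
    return s.encode("utf-8")
```

With this split, a string such as "a\x01" would be refused by the XML-based profile but could still be sent by a profile whose transport has no such restriction.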

Luigi

>
> My feeling is it would be better to restrict what can be sent in a
> SAMP string to something that is going to be easy to implement in all
> sensible languages/transports (probably 0x09, 0x0a, 0x0d, 0x20-0x7f),
> so that both the standard, and the requirements on clients, stay as
> simple as possible. If specific requirements for sending full Unicode
> strings arise, we could mark these on a per-MType basis
> and come up with a convention along the lines of the SAMP int and
> SAMP float already defined in Section 3.4.
>
> Which of these is best depends on how important the requirement to
> be able to send Unicode and control characters is. My vote is not
> very. Can we have a show of hands?
>
Received on 2008-08-04Z16:29:28