Re: String character range

From: Luigi Paioro <luigi-at-lambrate.inaf.it>
Date: Fri, 01 Aug 2008 18:25:36 +0200


Hi.

I think your suggestion below is a good compromise. I would split it into two points:

  1. At the SAMP protocol definition level, we might define a "string" as any sequence of characters in the range 0x01-0x7f, adding an escape convention for any printable Unicode character outside that range (so that it is general).
  2. At the Standard Profile level, I would add more constraints, limiting the character set to the XML range and using the escape convention for the other, unsupported characters.
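
As a rough illustration of point 2, here is a minimal Python sketch. Nothing in it is agreed: the function names escape_samp and unescape_samp are hypothetical, and the escape syntax (\uXXXX with four hex digits, as in Mark's \u001f example, plus a doubled backslash for a literal backslash) is only an assumption about what the convention might look like. Characters that XML 1.0 allows within the 7-bit range pass through unchanged; everything else is escaped.

```python
import re


def _xml_safe_7bit(cp):
    # Within 0x00-0x7f, XML 1.0 permits only TAB, LF, CR and 0x20-0x7f.
    return cp in (0x09, 0x0A, 0x0D) or 0x20 <= cp <= 0x7F


def escape_samp(text):
    """Escape a string so that only XML-safe 7-bit characters remain.

    Assumed convention (not part of any SAMP document): \\uXXXX for an
    escaped character, and a doubled backslash for a literal backslash.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "\\":
            out.append("\\\\")          # keep the escaping reversible
        elif _xml_safe_7bit(cp):
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)  # e.g. character 31 -> \u001f
        else:
            # Four hex digits cannot express code points above 0xffff.
            raise ValueError("code point out of range: %#x" % cp)
    return "".join(out)


_ESCAPE_RE = re.compile(r"\\(\\|u([0-9a-fA-F]{4}))")


def unescape_samp(text):
    """Reverse escape_samp under the same assumed convention."""
    def _replace(match):
        if match.group(1) == "\\":
            return "\\"
        return chr(int(match.group(2), 16))
    return _ESCAPE_RE.sub(_replace, text)
```

For example, escape_samp("a\x1fb") gives the seven-character string a\u001fb, and unescape_samp turns it back into the original three characters. At the protocol-definition level (point 1) the same functions would apply, only with a looser test than _xml_safe_7bit.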

Does this sound reasonable?

Luigi

> As far as SAMP goes: that character looks to me like code point 0xf1,
> from the Latin-1 Supplement code block. So you could not send it using
> either the existing definition for a SAMP string or the proposal (4)
> that I am suggesting. If we used a variant of my suggestion (3):
>
> 3. Define some escaping convention for un-XML characters, e.g. \u001f
> for character 31.
>
> with the intention that this escaping mechanism could be used for
> any 8-bit character it would be possible to transmit this kind of
> non-7-bit Latin character. However, characters with the 8th bit set
> might cause problems for certain other transports and language
> environments. I must admit that, apart from RFC-822 mail-type contexts, I
> can't think of what these might be, but I'd be inclined to steer clear
> of non-7-bit characters just in case. However, if others (e.g. with
> fewer Anglo-Saxon prejudices) think that it's an important requirement to
> permit transmission of characters like this within
> SAMP we could take that on board. We could even in principle say that
> this escaping mechanism could be used to specify any Unicode character -
> but I think that would definitely be a bad idea as it would effectively
> restrict use of the protocol to languages with Unicode support, which
> excludes quite a lot.
>
> Mark
Received on 2008-08-01Z18:26:06