[seqfan] Re: Windows text editor
Donald Alan Morrison
donmorrison at gmail.com
Thu Jan 20 01:54:11 CET 2011
On 1/19/11 4:10 PM, Russ Cox wrote:
>> kb> Russ Cox told me that the new uploader expects text files to be encoded
>> kb> in UTF-8.
>>
>> I hope that b-files stay in plain ASCII. The occurrence of (non-ASCII)
>> byte-order-marks (BOM's) in UTF files (which are not shown by all editors by
>> default and make parsing more difficult in basically all standard languages)
>> is not a nice feature.
>
> The b files have a strict format; byte order marks are not allowed.
>
> Russ
UTF-8 does not require a BOM anyway.
http://en.wikipedia.org/wiki/UTF-8#Advantages
If the high bit is set in a byte, then that byte (or the sequence it
belongs to) is non-ASCII, and the convention is to assume UTF-8 if no
BOM exists. So either way, your decision flow chart is clear.
Has BOM? Yes -> Reject
Is any high bit set? Yes -> Reject, unless you want to try UTF-8
validation, which would involve a lookup table of which "code points"
should be supported by a "standard" Unicode font such as Linux Libertine,
Charis SIL, or Gentium.
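
The two checks above could be sketched in Python roughly as follows. This is only an illustration, not the uploader's actual code: the function name check_bfile_bytes, the allow_utf8 flag, and the returned strings are all made up here. It also only tests whether the bytes are well-formed UTF-8; it does not attempt the per-font code point lookup mentioned above.

```python
import codecs

# BOMs for the common Unicode encodings, taken from the standard codecs module.
BOMS = (
    codecs.BOM_UTF8,
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF16_BE,
)


def check_bfile_bytes(data: bytes, allow_utf8: bool = False) -> str:
    """Hypothetical helper that applies the two-step flow chart to raw bytes."""
    # Step 1: has BOM? Yes -> reject.
    if data.startswith(BOMS):
        return "reject: BOM"

    # Step 2: is any high bit set? A byte >= 0x80 is never plain ASCII.
    if any(b & 0x80 for b in data):
        if not allow_utf8:
            return "reject: non-ASCII"
        # Optional: check that the bytes are at least well-formed UTF-8.
        # (Assumption: no per-font code point table is consulted here.)
        try:
            data.decode("utf-8")
        except UnicodeDecodeError:
            return "reject: invalid UTF-8"
        return "accept: UTF-8"

    # No BOM and no high bits: plain ASCII.
    return "accept: ASCII"
```

With the default allow_utf8=False, a file is accepted only when it is pure ASCII with no BOM, which matches the strict reading of the flow chart.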
Donald