on the Curl Web Content Markup Language

on the Curl Web Content Markup and Programming Language from www.curl.com and www.curlap.com

Tuesday, April 5, 2011

Curl character encodings

I have added a post above displaying the CharEncoding instances that Curl 7.0 offers on my Windows installation.

The two lists there cover more than 100 character encodings; both come from the procedure {get‑all‑character‑encodings} (note the use of & # 8209 ; to get that procedure name to appear here with non-breaking hyphens).

The reason for the lists is that Curl's CharEncoding class offers a way to obtain an instance by name, but the docs had no complete list of names, no doubt because the HostEncoding instances will vary by platform and current installation.

One dump is from a breakpoint in the debugger and one is console output from a loop over the {Array‑of CharEncoding} which is returned by the proc.

The types in that container include:

NoneCharEncoding
ShiftJISCharEncoding
EUCJPCharEncoding
UTF8CharEncoding
UTF16CharEncoding
UTF16UnknownEndianCharEncoding
SingleByteCharEncoding
MappedSingleByteCharEncoding
HostEncoding

with most being instances of the last type, HostEncoding. The first, NoneCharEncoding, is the system default when no encoding is specified.
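
The console loop mentioned above can be sketched roughly as follows. This is a minimal sketch, not the exact code I ran: it assumes {get-all-character-encodings} is visible in a plain applet without an extra import (if it is not, the package that exports it would need to be imported first), and it prints only the runtime type of each instance rather than relying on any particular accessor of CharEncoding.

```curl
{curl 7.0 applet}

|| Sketch only: assumes get-all-character-encodings needs no extra import.
|| The procedure name is written here with ordinary ASCII hyphens.
{for enc:CharEncoding in {get-all-character-encodings} do
    || Write the runtime type of each encoding instance to the console,
    || e.g. UTF8CharEncoding or HostEncoding.
    {output {type-of enc}}
}
```

Since the HostEncoding instances depend on the platform, the output will differ from one installation to another.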

I happened to note that the Curl mother site www.curlap.com pages are encoded in Shift‑JIS.

A few days ago I noticed that some Classical Greek in Unicode (UTF‑16) displays correctly in the Curl 7.0 IDE but not in any of my Windows web browsers.  I have not yet tested a Curl desktop Dcurl application to see whether the UTF‑16 text displays correctly in an RIA test.

By way of contrast, the current Pharo Smalltalk environment offers 14 encodings for Workspace contents. EncodedCharSet in Pharo currently has subclasses for GB2312, JISX0208, KSX1001, Latin1 and Unicode.
