ECL is fully compliant with ANSI Common Lisp in all aspects of the character data type, with the following peculiarities.
There are two ways of building ECL: with C characters or with Unicode character codes. These build modes are selected with the --enable-unicode configuration option (or its negation), and Unicode support is the default.
When using C characters, ECL relies on the char type of the C language and on C library functions for tasks such as character conversion and comparison. In this case characters are typically 8 bits wide, and character order and collation are determined by the current POSIX or C locale. This approach is not very accurate and leaves out many languages and character encodings, but it is sufficient for small applications that do not need multilingual support.
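The following minimal C sketch makes the locale dependence concrete. It assumes an 8-bit build that simply forwards to <ctype.h> and strcoll(). The helper names c_char_alpha_p, c_char_upcase and c_char_collate are hypothetical and are not part of ECL. They only illustrate that the answers come from the C library and therefore change with the active locale.

```c
#include <ctype.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helpers, not ECL's API: an 8-bit character build that
   delegates to the C library inherits whatever the current locale says. */

/* Is the 8-bit code C alphabetic in the current locale? */
int c_char_alpha_p(unsigned char c) {
    return isalpha(c) != 0;
}

/* Upcase an 8-bit code using the current locale's case mapping. */
unsigned char c_char_upcase(unsigned char c) {
    return (unsigned char)toupper(c);
}

/* Order two characters by the current locale's collation rules:
   negative, zero or positive, as with strcoll(). */
int c_char_collate(unsigned char a, unsigned char b) {
    char sa[2] = { (char)a, '\0' };
    char sb[2] = { (char)b, '\0' };
    return strcoll(sa, sb);
}
```

Under the portable "C" locale (setlocale(LC_ALL, "C")), c_char_alpha_p(0xE9) returns 0 even though 0xE9 is 'é' in ISO-8859-1. Only the 52 ASCII letters count as alphabetic there, which is the kind of gap the Unicode build closes.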
When no option is specified, ECL is built with support for a larger character set, the Unicode 6.0 standard. This uses 24-bit character codes, also known as codepoints, together with a large database of character properties. These properties include each character's nature (alphanumeric, numeric, etc.), its case, its collation properties, and whether it is a standalone or a composing character.
If ECL is compiled without Unicode support, all characters are implemented using 8-bit codes and the type extended-char is empty. If it is compiled with Unicode support, characters are implemented using 24 bits and the extended-char type covers all characters above code 255.
| Type | With Unicode | Without Unicode |
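The code ranges stated above can be summarized in a small C sketch. The names classify_code and enum char_kind are hypothetical. The sketch also assumes that every 24-bit value is a valid code in a Unicode build. In a real image the upper bound is whatever char-code-limit reports.

```c
#include <stdint.h>

/* Hypothetical sketch, not ECL's API: which character type a code falls
   into, following the ranges described above. */
enum char_kind { CHAR_INVALID, CHAR_BASE, CHAR_EXTENDED };

#define CODE_BITS       24
#define CODE_LIMIT      (UINT32_C(1) << CODE_BITS) /* assumed: all 24-bit values valid */
#define BASE_CHAR_LIMIT 256u                       /* codes 0-255 are base characters */

enum char_kind classify_code(uint32_t code, int unicode_build) {
    /* Without Unicode, only 8-bit codes exist at all. */
    uint32_t limit = unicode_build ? CODE_LIMIT : BASE_CHAR_LIMIT;
    if (code >= limit)
        return CHAR_INVALID;          /* no character has this code */
    return code < BASE_CHAR_LIMIT ? CHAR_BASE : CHAR_EXTENDED;
}
```

In the 8-bit build the function never returns CHAR_EXTENDED, which mirrors the statement that extended-char is empty there.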
All characters have a name. For the non-printing characters between 0 and 32, and for 127, ECL uses the ordinary ASCII names. Characters above 127 are printed and read using hexadecimal Unicode notation: a U followed by the character's 24-bit code written as a hexadecimal number, as in
Table 11.1. Examples of character names
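The naming rules can be sketched in C as a pair of hypothetical functions. char_name_of produces a name from a code, and char_code_of_name reads a code back from a name. This is not ECL's implementation. The name table lists only the standard (Newline, Space) and semi-standard (Backspace, Tab, Linefeed, Page, Return, Rubout) names from the ANSI standard, whereas ECL uses an ASCII name for every code from 0 to 32. Printing upper-case hex digits without padding is also an assumption of this sketch.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of the naming rules above; not ECL's implementation.
   Only the ANSI standard and semi-standard names are listed here. */
struct char_name { const char *name; unsigned code; };

static const struct char_name names[] = {
    { "Newline", 10 },   /* first entry for 10, so it is the printed name */
    { "Linefeed", 10 },  /* synonym, only reached when reading */
    { "Backspace", 8 }, { "Tab", 9 },    { "Page", 12 },
    { "Return", 13 },   { "Space", 32 }, { "Rubout", 127 },
};
#define N_NAMES (sizeof names / sizeof names[0])

/* Case-insensitive comparison, since Common Lisp character names ignore case. */
static int name_equal(const char *a, const char *b) {
    for (; *a && *b; a++, b++)
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
    return *a == *b;
}

/* Store the name of CODE in BUF. Returns 0 when this sketch has no name for
   it: printable ASCII, or a control code missing from the table above. */
int char_name_of(unsigned code, char *buf, size_t len) {
    if (code > 127) {                 /* U followed by hex digits */
        snprintf(buf, len, "U%X", code);
        return 1;
    }
    for (size_t i = 0; i < N_NAMES; i++) {
        if (names[i].code == code) {
            snprintf(buf, len, "%s", names[i].name);
            return 1;
        }
    }
    return 0;
}

/* Read a name back into a code; returns -1 for an unknown name. */
long char_code_of_name(const char *name) {
    for (size_t i = 0; i < N_NAMES; i++)
        if (name_equal(name, names[i].name))
            return (long)names[i].code;
    if (name[0] == 'U' || name[0] == 'u') {
        size_t ndigits = strlen(name + 1);
        if (ndigits == 0 || ndigits > 6)  /* 24 bits = at most 6 hex digits */
            return -1;
        long code = 0;
        for (const char *p = name + 1; *p; p++) {
            if (!isxdigit((unsigned char)*p))
                return -1;
            code = code * 16 + (isdigit((unsigned char)*p)
                                    ? *p - '0'
                                    : tolower((unsigned char)*p) - 'a' + 10);
        }
        return code;
    }
    return -1;
}
```

Because Newline is listed before Linefeed, code 10 prints as Newline while both names read back as 10. Name lookup ignores case, as name-char does in Common Lisp.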
#\Linefeed is synonymous with #\Newline and thus is a member of