Chapter 11. Characters

11.1. Unicode vs. POSIX locale
11.1.1. Character types
11.1.2. Character names
11.2. #\Newline characters
11.3. C Reference
C types — C character types
Constructors — Creating and extracting characters from Lisp objects
Predicates — C predicates for Lisp characters
Character case — C functions related to the character case
ANSI Dictionary — Common Lisp and C equivalence

ECL is fully ANSI Common-Lisp compliant in all aspects of the character data type, with the following peculiarities.

11.1. Unicode vs. POSIX locale

There are two ways of building ECL: with C or with Unicode character codes. These build modes are selected with the --disable-unicode and --enable-unicode configuration options respectively, the latter being the default.

When using C characters, ECL relies on the char type of the C language and uses the C library functions for tasks such as character conversion and comparison. In this case characters are typically 8 bits wide, and the character order and collation are determined by the current POSIX or C locale. This is not very accurate and leaves out many languages and character encodings, but it is sufficient for small applications that do not need multilingual support.
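The following is a minimal sketch of what relying on the C library means in practice. It is plain standard C, not code taken from ECL, and the helper names upcase_c_locale and alphabetic_in_c_locale are hypothetical. It selects the "C" locale explicitly and shows that case conversion and classification then only know about the ASCII letters.

```c
#include <ctype.h>
#include <locale.h>

/* Hypothetical helper (not part of ECL): upcase one 8-bit character code
   with the C library, after selecting the "C" locale explicitly.
   Note that setlocale() changes a process-wide setting. */
int upcase_c_locale(int code)
{
    setlocale(LC_CTYPE, "C");
    /* The <ctype.h> functions require a value representable as unsigned char. */
    return toupper((unsigned char)code);
}

/* Hypothetical helper: returns 1 if the code is alphabetic in the "C" locale,
   0 otherwise. */
int alphabetic_in_c_locale(int code)
{
    setlocale(LC_CTYPE, "C");
    return isalpha((unsigned char)code) != 0;
}
```

With these definitions, upcase_c_locale('a') yields 'A', whereas code 0xE9 (an accented letter in Latin-1) is returned unchanged and is not reported as alphabetic. This is the kind of limitation the paragraph above refers to.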

When no option is specified, ECL builds with support for a larger character set, the Unicode 6.0 standard. This uses 24-bit character codes, also known as codepoints, together with a large database of character properties, which include their nature (alphanumeric, numeric, etc.), their case, their collation properties, whether they are standalone or composing characters, etc.

11.1.1. Character types

If compiled without Unicode support, all ECL characters are implemented using 8-bit codes and the type extended-char is empty. If compiled with Unicode support, characters are implemented using 24 bits and the extended-char type covers characters above code 255.

Type           With Unicode        Without Unicode
standard-char  #\Newline, 32-126   #\Newline, 32-126
base-char      0-255               0-255
extended-char  256-16777215        -
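The code ranges above can be sketched as a small classifier. This is an illustration in plain C, not ECL's implementation. The enum and the function classify_code are hypothetical names, and the function reports the most specific type only (every standard-char is also a base-char). It assumes that extended-char starts at code 256, as implied by "above code 255".

```c
#include <stdbool.h>

/* Hypothetical result type for this sketch; not an ECL type. */
enum char_class { NOT_A_CHAR, STANDARD_CHAR, BASE_CHAR, EXTENDED_CHAR };

/* Returns the most specific character type for a code.
   `unicode` says whether the build has Unicode support. */
enum char_class classify_code(unsigned long code, bool unicode)
{
    /* standard-char: #\Newline (code 10) and the printable range 32-126 */
    if (code == 10 || (code >= 32 && code <= 126))
        return STANDARD_CHAR;
    /* base-char: every remaining 8-bit code */
    if (code <= 255)
        return BASE_CHAR;
    /* extended-char: only in Unicode builds, up to the 24-bit limit */
    if (unicode && code <= 16777215UL)
        return EXTENDED_CHAR;
    return NOT_A_CHAR;
}
```

For example, classify_code(256, true) gives EXTENDED_CHAR, while classify_code(256, false) gives NOT_A_CHAR because a build without Unicode has no codes beyond 255.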

11.1.2. Character names

All characters have a name. For non-printing characters between 0 and 32, and for 127, we use the ordinary ASCII names. Characters above 127 are printed and read using hexadecimal Unicode notation: a U followed by a 24-bit hexadecimal number, as in U0126.

Table 11.1. Examples of character names

Character    Code
#\Null       0
#\Ack        1
#\Bell       7
#\Backspace  8
#\Tab        9
#\Newline    10
#\Linefeed   10
#\Page       12
#\Esc        27
#\Escape     27
#\Space      32
#\Rubout     127
#\U0080      128

Note that #\Linefeed is synonymous with #\Newline and thus is a member of standard-char.
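
The naming rule can be sketched in plain C as follows. This is an illustration, not ECL's implementation: the function sketch_char_name and its lookup table are hypothetical, the table holds only a subset of the names from Table 11.1, and padding the hexadecimal code to at least four digits is an assumption inferred from the examples U0126 and U0080.

```c
#include <stdio.h>

/* Subset of the names listed in Table 11.1 (illustrative, not exhaustive). */
static const struct { unsigned long code; const char *name; } named_chars[] = {
    {0, "Null"}, {7, "Bell"}, {8, "Backspace"}, {9, "Tab"}, {10, "Newline"},
    {12, "Page"}, {27, "Escape"}, {32, "Space"}, {127, "Rubout"}
};

/* Hypothetical helper: writes the name for `code` (without the #\ prefix)
   into buf. Returns 1 on success, 0 if the code is not covered by this
   sketch or the buffer is too small. */
int sketch_char_name(unsigned long code, char *buf, size_t size)
{
    size_t i;
    int n;

    /* Named characters: look the code up in the table. */
    for (i = 0; i < sizeof named_chars / sizeof named_chars[0]; i++) {
        if (named_chars[i].code == code) {
            n = snprintf(buf, size, "%s", named_chars[i].name);
            return n >= 0 && (size_t)n < size;
        }
    }
    /* Codes above 127: a U followed by the code in hexadecimal
       (assumption: zero-padded to at least four digits). */
    if (code > 127) {
        n = snprintf(buf, size, "U%04lX", code);
        return n >= 0 && (size_t)n < size;
    }
    return 0; /* printable ASCII and other control codes: not handled here */
}
```

Called with code 128 and a sufficiently large buffer, the function stores U0080, matching the last row of Table 11.1.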