18.3. Stream external formats

An external format is an encoding for characters that maps character codes to a sequence of bytes, in a one-to-one or one-to-many fashion. External formats are also known as "character encodings" in the programming world and are essential for reading and writing text in different languages and alphabets.
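The one-to-many case is easy to observe. The following is a minimal sketch, assuming a Unicode-enabled ECL build and write access to the current directory: it writes a string with the `:utf-8` external format and then reopens the file as raw octets to count them. The helper `encoded-length` and the scratch file name are hypothetical, chosen only for illustration.

```lisp
;; Hypothetical helper: number of octets STRING occupies on disk when it is
;; written with EXTERNAL-FORMAT.  A scratch file is created and then deleted.
(defun encoded-length (string external-format
                       &optional (path "ef-scratch.txt"))
  ;; Encode the characters using the requested external format.
  (with-open-file (out path :direction :output
                            :if-exists :supersede
                            :external-format external-format)
    (write-string string out))
  ;; Reopen the same file as raw octets and return its length in octets.
  (prog1 (with-open-file (in path :element-type '(unsigned-byte 8))
           (file-length in))
    (delete-file path)))

;; ASCII characters map one-to-one in UTF-8 ...
(encoded-length "abc" :utf-8)                                        ; => 3
;; ... while (code-char 233), e with acute accent, needs two octets,
;; so these two characters become three octets.
(encoded-length (coerce (list #\a (code-char 233)) 'string) :utf-8) ; => 3
```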

ECL has one of the most complete implementations of external formats, covering all the usual codepages from the Windows and Unix worlds as well as the more recent UTF-8, UCS-2 and UCS-4 formats, the latter two in both big-endian and little-endian variants, and supporting different encodings for the newline character.
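The effect of the endian variants can be seen by dumping the octets that a single character produces. This is a sketch assuming a Unicode-enabled build; `encoded-octets` and the scratch file name are hypothetical. Only the last two octets are compared, so the check does not depend on whether a byte-order mark is written before them.

```lisp
;; Hypothetical helper: the octets produced when STRING is written with
;; EXTERNAL-FORMAT, returned as a list of integers.
(defun encoded-octets (string external-format
                       &optional (path "ef-octets.bin"))
  (with-open-file (out path :direction :output
                            :if-exists :supersede
                            :external-format external-format)
    (write-string string out))
  ;; Read the file back byte by byte, then remove the scratch file.
  (prog1 (with-open-file (in path :element-type '(unsigned-byte 8))
           (loop for byte = (read-byte in nil nil)
                 while byte
                 collect byte))
    (delete-file path)))

;; #\A has code 65 (#x41).  In UCS-2 it occupies two octets whose order
;; depends on the endianness of the external format.
(last (encoded-octets "A" :ucs-2be) 2) ; => (0 65)
(last (encoded-octets "A" :ucs-2le) 2) ; => (65 0)
```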

However, the set of supported external formats depends on the size of the space of character codes. When ECL is built with Unicode support (the default option), it can represent every character from every known codepage, and thus all external formats are supported. When ECL is instead built with the restricted character set, it can only use one codepage (the one provided by the C library), with a few variants for the representation of end-of-line characters.
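Code that must run on both kinds of build can pick its external format at run time. The sketch below assumes, without confirmation from this section, that a Unicode-enabled ECL reports a `char-code-limit` above 256 while the restricted build does not; the function `portable-text-format` and the file `notes.txt` are hypothetical. In the restricted case it falls back to a newline designator, which the table marks as not requiring Unicode.

```lisp
;; Hypothetical helper: an external format designator valid in either build.
;; Assumption: CHAR-CODE-LIMIT exceeds 256 only in Unicode-enabled builds.
(defun portable-text-format (&key (newline :lf))
  (if (> char-code-limit 256)
      (list :utf-8 newline)   ; Unicode build: UTF-8 plus a newline variant
      newline))               ; restricted build: newline variant only

;; Usage: open a file with whichever designator the build supports.
(with-open-file (out "notes.txt" :direction :output
                                 :if-exists :supersede
                                 :external-format (portable-text-format))
  (write-line "hello" out))
```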

In ECL, an external format designator is defined recursively as either a symbol or a list of external format designators. The grammar is as follows:

external-format-designator :=
   symbol |
   ( {external-format-designator}+ )

The known symbols are listed in the table below.
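The recursive grammar can be mirrored directly by a predicate. This is a minimal sketch; `external-format-designator-p` is a hypothetical name, and it checks only the shape of a designator, not whether each symbol is one ECL actually recognizes.

```lisp
;; Hypothetical predicate mirroring the grammar:
;;   external-format-designator := symbol | ( {external-format-designator}+ )
(defun external-format-designator-p (x)
  (cond ((symbolp x) t)                                   ; a single symbol
        ((consp x)                                        ; a non-empty list
         (and (null (cdr (last x)))                       ; that is proper
              (every #'external-format-designator-p x)))  ; of designators
        (t nil)))

(external-format-designator-p :utf-8)          ; => T
(external-format-designator-p '(:utf-8 :crlf)) ; => T
(external-format-designator-p '(:utf-8 . :lf)) ; => NIL (dotted list)
(external-format-designator-p "utf-8")         ; => NIL (a string)
```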

Table 18.1. Stream external formats

| Symbols | Codepage or encoding | Unicode required |
|---|---|---|
| :cr | #\Newline is Carriage Return | No |
| :crlf | #\Newline is Carriage Return followed by Linefeed | No |
| :lf | #\Newline is Linefeed | No |
| :little-endian | Modify UCS to use little-endian encoding | No |
| :big-endian | Modify UCS to use big-endian encoding | No |
| :utf-8 ext:utf8 | Unicode UTF-8 | Yes |
| :ucs-2 ext:ucs2 ext:utf-16 ext:utf16 ext:unicode | UCS-2 encoding with BOM | Yes |
| :ucs-2le ext:ucs2le ext:utf-16le | UCS-2 with little-endian encoding | Yes |
| :ucs-2be ext:ucs2be ext:utf-16be | UCS-2 with big-endian encoding | Yes |
| :ucs-4 ext:ucs4 ext:utf-32 ext:utf32 | UCS-4 encoding with BOM | Yes |
| :ucs-4le ext:ucs4le ext:utf-32le | UCS-4 with little-endian encoding | Yes |
| :ucs-4be ext:ucs4be ext:utf-32be | UCS-4 with big-endian encoding | Yes |
| ext:iso-8859-1 ext:iso8859-1 ext:latin-1 ext:cp819 ext:ibm819 | Latin-1 encoding | Yes |
| ext:iso-8859-2 ext:iso8859-2 ext:latin-2 ext:latin2 | Latin-2 encoding | Yes |
| ext:iso-8859-3 ext:iso8859-3 ext:latin-3 ext:latin3 | Latin-3 encoding | Yes |
| ext:iso-8859-4 ext:iso8859-4 ext:latin-4 ext:latin4 | Latin-4 encoding | Yes |
| ext:iso-8859-5 ext:cyrillic | Cyrillic encoding | Yes |
| ext:iso-8859-6 ext:arabic ext:asmo-708 ext:ecma-114 | Arabic encoding | Yes |
| ext:iso-8859-7 ext:greek8 ext:greek ext:ecma-118 | Greek encoding | Yes |
| ext:iso-8859-8 ext:hebrew | Hebrew encoding | Yes |
| ext:iso-8859-9 ext:latin-5 ext:latin5 | Latin-5 encoding | Yes |
| ext:iso-8859-10 ext:iso8859-10 ext:latin-6 ext:latin6 | Latin-6 encoding | Yes |
| ext:iso-8859-13 ext:iso8859-13 ext:latin-7 ext:latin7 | Latin-7 encoding | Yes |
| ext:iso-8859-14 ext:iso8859-14 ext:latin-8 ext:latin8 | Latin-8 encoding | Yes |
| ext:iso-8859-15 ext:iso8859-15 ext:latin-9 ext:latin9 | Latin-9 encoding | Yes |
| ext:dos-cp437 ext:ibm-437 | IBM CP 437 | Yes |
| ext:dos-cp850 ext:ibm-850 ext:cp850 | IBM CP 850 | Yes |
| ext:dos-cp852 ext:ibm-852 | IBM CP 852 | Yes |
| ext:dos-cp855 ext:ibm-855 | IBM CP 855 | Yes |
| ext:dos-cp860 ext:ibm-860 | IBM CP 860 | Yes |
| ext:dos-cp861 ext:ibm-861 | IBM CP 861 | Yes |
| ext:dos-cp862 ext:ibm-862 ext:cp862 | IBM CP 862 | Yes |
| ext:dos-cp863 ext:ibm-863 | IBM CP 863 | Yes |
| ext:dos-cp864 ext:ibm-864 | IBM CP 864 | Yes |
| ext:dos-cp865 ext:ibm-865 | IBM CP 865 | Yes |
| ext:dos-cp866 ext:ibm-866 ext:cp866 | IBM CP 866 | Yes |
| ext:dos-cp869 ext:ibm-869 | IBM CP 869 | Yes |
| ext:windows-cp932 ext:windows-932 ext:cp932 | Windows CP 932 | Yes |
| ext:windows-cp936 ext:windows-936 ext:cp936 | Windows CP 936 | Yes |
| ext:windows-cp949 ext:windows-949 ext:cp949 | Windows CP 949 | Yes |
| ext:windows-cp950 ext:windows-950 ext:cp950 | Windows CP 950 | Yes |
| ext:windows-cp1250 ext:windows-1250 ext:ms-ee | Windows CP 1250 | Yes |
| ext:windows-cp1251 ext:windows-1251 ext:ms-cyrl | Windows CP 1251 | Yes |
| ext:windows-cp1252 ext:windows-1252 ext:ms-ansi | Windows CP 1252 | Yes |
| ext:windows-cp1253 ext:windows-1253 ext:ms-greek | Windows CP 1253 | Yes |
| ext:windows-cp1254 ext:windows-1254 ext:ms-turk | Windows CP 1254 | Yes |
| ext:windows-cp1255 ext:windows-1255 ext:ms-hebr | Windows CP 1255 | Yes |
| ext:windows-cp1256 ext:windows-1256 ext:ms-arab | Windows CP 1256 | Yes |
| ext:windows-cp1257 ext:windows-1257 ext:winbaltrim | Windows CP 1257 | Yes |
| ext:windows-cp1258 ext:windows-1258 | Windows CP 1258 | Yes |
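Because a designator may be a list, an encoding from the table can be combined with one of the newline variants. The sketch below, assuming a Unicode-enabled build, writes one character followed by `#\Newline` under two composite designators and counts the resulting octets; `file-octet-count` and the scratch file name are hypothetical.

```lisp
;; Hypothetical helper: octets produced by writing "a" plus #\Newline
;; with EXTERNAL-FORMAT.  A scratch file is created and then deleted.
(defun file-octet-count (external-format &optional (path "ef-newline.txt"))
  (with-open-file (out path :direction :output
                            :if-exists :supersede
                            :external-format external-format)
    (write-char #\a out)
    (terpri out))                       ; TERPRI outputs #\Newline
  (prog1 (with-open-file (in path :element-type '(unsigned-byte 8))
           (file-length in))
    (delete-file path)))

(file-octet-count '(:utf-8 :lf))   ; => 2  (a, LF)
(file-octet-count '(:utf-8 :crlf)) ; => 3  (a, CR, LF)
```

Only the newline encoding differs between the two calls, so the extra octet comes entirely from `:crlf` translating `#\Newline` into two characters on output.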