[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

5. The compiler

5.1 The compiler translates to C  
5.2 The compiler mimics human C programmer  
5.3 Implementation of Compiled Closures  
5.4 Use of Declarations to Improve Efficiency  
5.5 Inspecting generated C code  
5.6 Embedding C code in lisp source  
5.7 The C language interface  
5.8 The old C language interface  


5.1 The compiler translates to C

The ECL compiler is essentially a translator from Common-Lisp to C. Given a Lisp source file, the compiler first generates three intermediate files: a C-file, an H-file, and a Data-file.

The ECL compiler then invokes the C compiler to compile the C-file into an object file. Finally, the contents of the Data-file are appended to the object file to make a Fasl-file. The generated Fasl-file can be loaded into the ECL system by the Common-Lisp function load. By default, the three intermediate files are deleted after the compilation, but, if asked, the compiler leaves them.

The merits of the use of C as the intermediate language are:

The demerits are:


5.2 The compiler mimics human C programmer

The format of the intermediate C code generated by the ECL compiler is the same as the hand-coded C code of the ECL source programs. For example, supposing that the Lisp source file contains the following function definition:

	(defvar *delta* 2)
	(defun add1 (x) (+ *delta* x))

The compiler generates the following intermediate C code.

/*	function definition for ADD1                                  */
static cl_object L1(cl_object V1)
{
	cl_object value0;
	value0=number_plus(symbol_value(VV[0]),V1); NVALUES=1;
	return value0;
}
/*      initialization of this module                                 */
void init_CODE(cl_object flag)
{
	cl_object value0;
	if (!FIXNUMP(flag)){
	flag->cblock.data = VV;
	flag->cblock.data_size = VM;
	flag->cblock.data_text = compiler_data_text;
	flag->cblock.data_text_size = compiler_data_text_size;
	return;}
	VV = Cblock->cblock.data;
        if(SYM_VAL(T0)!=OBJNULL) cl_setq(VV[0],T0);
}

The C function L1 implements the Lisp function add1. This relation is established by cl_def_c_function in the initialization function init_CODE, which is invoked at load time. There, the vector VV consists of Lisp objects; VV[0] and VV[1] in this example hold the Lisp symbols *delta* and add1. VM, which appears in init_CODE, is a C macro declared in the corresponding H-file. The actual value of VM is the number of value stack locations used by this module, i.e., 2 in this example. Thus the following macro definition is found in the H-file:
#define VM 2


5.3 Implementation of Compiled Closures

The ECL compiler takes two passes before it invokes the C compiler. The major role of the first pass is to detect function closures and to determine, for each function closure, those lexical objects (i.e., lexical variables, local function definitions, tags, and block-names) to be enclosed within the closure. This check must be done before the C code generation in the second pass, because lexical objects to be enclosed in function closures are treated differently from those not enclosed.

Ordinarily, lexical variables in a compiled function f are allocated on the C stack. However, if a lexical variable is to be enclosed in function closures, it is allocated on a list, called the "environment list", which is local to f. In addition, a local variable is created which points to the lexical variable's location (within the environment list), so that the variable may be accessed through an indirection rather than by list traversal.

The environment list is a pushdown list: it is empty when f is called. An element is pushed on the environment list when a variable to be enclosed in closures is bound, and is popped when the binding is no longer in effect. That is, at any moment during execution of f, the environment list contains those lexical variables whose binding is still in effect and which should be enclosed in closures. When a compiled closure is created during execution of f, the compiled code for the closure is coupled with the environment list at that moment to form the compiled closure.

Later, when the compiled closure is invoked, a pointer is set up to each lexical variable in the environment list, so that each object may be referenced through a memory indirection.

Let us see an example. Suppose the following function has been compiled.

(defun foo (x)
    (let ((a #'(lambda () (incf x)))
          (y x))
      (values a #'(lambda () (incf x y)))))

foo returns two compiled closures. The first closure increments x by one, whereas the second closure increments x by the initial value of x. Both closures return the incremented value of x.

>(multiple-value-setq (f g) (foo 10))
#<compiled-closure nil>

>(funcall f)
11

>(funcall g)
21

After this, the two compiled closures look like:

second closure       y:                     x:
|-------|------|      |-------|------|       |------|------| 
|  **   |    --|----->|  10   |    --|------>|  21  | nil  |
|-------|------|      |-------|------|       |------|------| 
                                                ^
                      first closure             |
                      |-------|------|          |
                      |   *   |    --|----------| 
                      |-------|------|

 * : address of the compiled code for #'(lambda () (incf x))
** : address of the compiled code for #'(lambda () (incf x y))
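
The sharing shown in the diagram can be modelled in a few lines of plain C. This is only an illustrative sketch, not the code ECL generates: the names cell, closure, push, incf_x, incf_x_y and call are hypothetical, and Lisp values are simplified to long integers. Each enclosed variable lives in one cell of the environment list, and each closure pairs a code address with the list as it stood when the closure was created.

```c
#include <stdlib.h>

/* One cell of the environment list: holds one enclosed variable. */
typedef struct cell {
    long value;          /* current value of the enclosed variable */
    struct cell *next;   /* rest of the environment list */
} cell;

/* A compiled closure: a code address coupled with an environment list. */
typedef struct closure {
    long (*code)(cell *env);
    cell *env;
} closure;

/* Push a newly bound variable onto the environment list. */
cell *push(long value, cell *next) {
    cell *c = malloc(sizeof *c);
    if (c == NULL) abort();
    c->value = value;
    c->next = next;
    return c;
}

/* Models #'(lambda () (incf x)); its environment list is (x). */
long incf_x(cell *env) {
    cell *x = env;               /* pointer set up on entry */
    return ++x->value;
}

/* Models #'(lambda () (incf x y)); its environment list is (y x). */
long incf_x_y(cell *env) {
    cell *y = env;               /* pointers set up on entry */
    cell *x = env->next;
    x->value += y->value;
    return x->value;
}

/* Models foo: both closures end up sharing the single cell for x. */
void foo(long x0, closure *first, closure *second) {
    cell *x = push(x0, NULL);    /* x is bound: list is (x)   */
    first->code = incf_x;
    first->env = x;
    cell *y = push(x->value, x); /* y is bound: list is (y x) */
    second->code = incf_x_y;
    second->env = y;
}

/* Invoke a closure on its captured environment. */
long call(closure *c) {
    return c->code(c->env);
}
```

Calling foo with 10 and then invoking the first and second closure in turn leaves 21 in the shared x cell and 10 in the y cell, matching the picture above.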


5.4 Use of Declarations to Improve Efficiency

Declarations, especially type and function declarations, increase the efficiency of the compiled code. For example, consider the following Lisp source file, to which two Common-Lisp declarations have been added:

(eval-when (compile)
  (proclaim '(function tak (fixnum fixnum fixnum) fixnum)))

(defun tak (x y z)
  (declare (fixnum x y z))
  (if (not (< y x))
      z
      (tak (tak (1- x) y z)
           (tak (1- y) z x)
           (tak (1- z) x y))))

The compiler generates the following C code:

/*      local entry for function TAK                                  */
static int LI1(register int V1,register int V2,register int V3)
{
TTL:
        if (V2 < V1) {
        goto L2;}
        return(V3);
L2:
        { int V5;
          V5 = LI1((V1)-1,V2,V3);
        { int V6;
          V6 = LI1((V2)-1,V3,V1);
          V3 = LI1((V3)-1,V1,V2);
          V2 = V6;
          V1 = V5;}}
        goto TTL;
}
;;; Note: Tail-recursive call of TAK was replaced by iteration.
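
For comparison, here is a hand-written C version of the same idea. It is a sketch, not compiler output, and it assumes the arguments fit in a C int, as the fixnum declarations promise. The three inner calls remain ordinary recursive calls, while the outer tail call becomes a rebinding of the parameters followed by another trip around the loop.

```c
/* Tak on plain C ints, mirroring the fixnum-declared Lisp definition. */
int tak(int x, int y, int z) {
    for (;;) {                       /* the loop replaces the tail call */
        if (!(y < x))
            return z;
        {
            /* Evaluate all three arguments before rebinding anything. */
            int nx = tak(x - 1, y, z);
            int ny = tak(y - 1, z, x);
            int nz = tak(z - 1, x, y);
            x = nx;
            y = ny;
            z = nz;
        }
    }
}
```

The temporaries matter: assigning x before computing the second and third arguments would feed them the wrong values, which is why the generated code also holds the first two results in V5 and V6 before overwriting V1 and V2.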


5.5 Inspecting generated C code

Common-Lisp defines a function disassemble, which is supposed to disassemble a compiled function and to display the assembler code. According to Common-Lisp: The Language,

This is primarily useful for debugging the compiler, ...

This is, however, useless in our case, because we are not concerned with assembly language. Rather, we are interested in the C code generated by the ECL compiler. Thus the disassemble function in ECL accepts not-yet-compiled functions only and displays the translated C code.

> (defun add1 (x) (1+ x))
ADD1
> (disassemble *)
;;; Compiling (DEFUN ADD1 ...).
;;; Emitting code for ADD1.

/*      function definition for ADD1                                  */
static L1(int narg, object V1)
{
        VALUES(0) = one_plus((V1));
}


5.6 Embedding C code in lisp source

There are several mechanisms for integrating C code with ECL, but everything is built around two forms that allow the user to embed arbitrary C/C++ code in Lisp source code.

The two mechanisms are the Clines and c-inline forms. The first inserts code into the intermediate C/C++ file generated by the ECL compiler. Such a form returns no value and takes no arguments other than a series of strings, which are inserted literally: #include or #define directives, function definitions, and so on.

Macro: Clines {{string}*}
When the ECL compiler encounters a macro form (Clines string1 ... stringn), it simply outputs the strings into the c-file. The arguments are not evaluated and each argument must be a string. Each string may consist of any number of lines, and separate lines in the string are placed in separate lines in the c-file. In addition, each string opens a fresh line in the c-file, i.e., the first character in the string is placed at the first column of a line. Therefore, C-language preprocessor commands such as #define and #include will be recognized as such by the C compiler, if the ' # ' sign appears as the first character of the string or as the first character of a line within the string.

When interpreted, a Clines macro form expands to ().

(use-package "FFI")

(Clines
"   int tak(x, y, z)                       "
"   int x, y, z;                           "
"   {   if (y >= x) return(z);             "
"       else return(tak(tak(x-1, y, z),    "
"                       tak(y-1, z, x),    "
"                       tak(z-1, x, y)));  "
"   }                                      "
)

(defun tak (x y z)
  (c-inline (x y z) (:int :int :int) :int
     "tak(#0,#1,#2)" :one-liner t))

The second mechanism, which you have already seen in the example above, is the c-inline special form. This powerful method allows the user to insert C code which is evaluated, and which can accept values from and return values to the Lisp world, with an automatic conversion taking place in both directions.

Macro: c-inline {args-list arg-C-types output-C-type C-expr &key side-effects one-liner}

c-inline is a special form that can only be used in compiled code. For all purposes it behaves as a Lisp form, which takes the arguments given in args-list and produces a single value. Behind the scenes, the arguments of args-list (which can be any valid Lisp forms) are coerced to the C types given in arg-C-types, passed to the C expression C-expr, and coerced back to Lisp using the C type output-C-type as a guide. Multiple values can be returned by setting output-C-type to (values type-1 type-2 ...).

C-expr is a string containing C code and possibly some special escape codes. First, the arguments of the form may be retrieved as #0, #1, etc. Second, if the c-inline form is a one-line C expression (that is, one-liner is true), then the whole expression is interpreted as the output value. If, on the other hand, the code is a multiline expression (one-liner is false), the output value has to be assigned using @(return) =.... Multiple values are returned as @(return 0)=... ; @(return 1)=...;. Finally, Lisp constants may be used in the C code by prefixing them with @.

(use-package "FFI")

(Clines "
#include <math.h>

double foo (double x, double y) {
  return sinh(x) * y;
}")

(defvar *a*
  (c-inline (1.23) (:double) :double
    :side-effects nil
    :one-liner t))

(defvar *b*
  (c-inline (1.23) (:double) :double
       "{cl_object x = symbol_value(@*a*);
	@(return) = foo(#0,object_to_float(x));}"
    :side-effects nil
    :one-liner nil))


5.7 The C language interface

Using the special forms Clines and c-inline, plus the ability to handle pointers to foreign data, we have built a rather complete FFI for interfacing with the C world. This interface is compatible with the UFFI specification, which can be found on the web. We recommend that you obtain the documentation for that package and read it carefully. All examples should run unmodified under ECL. (Of course, you do not need to download UFFI itself, as everything is already implemented in ECL.)

However, because ECL provides some additional functionality that goes beyond UFFI, and also for compatibility with older versions of the ECL environment, we provide additional toplevel forms, which are listed in the next section.


5.8 The old C language interface

In this section we list several macros and toplevel forms which are provided either for convenience or for compatibility with older versions of ECL. You should avoid using them when the UFFI-compatible interface provides similar functionality.

We define some terminology here which is used throughout this chapter. A C-id is either a Lisp string consisting of a valid C-language identifier, or a Lisp symbol whose print-name, with all its alphabetic characters turned into lower case, is a valid C identifier. Thus the symbol foo is equivalent to the string "foo" when used as a C-id. Similarly, a C-expr is a string that may be regarded as a C-language expression. A C-type is one of the Lisp symbols :int, :char, :float, :double,... and :object. Each corresponds to a data type in the C language; :object is the type of Lisp objects and the other C-types are primitive data types in C.

Macro: defentry {function parameter-list C-function}

defentry defines a Lisp function whose body consists of the calling sequence to a C-language function. function is the name of the Lisp function to be defined, and C-function specifies the C function to be invoked. C-function must be either a list (type C-id) or C-id, where type and C-id are the type and the name of the C function. type must be a C-type or the symbol void which means that the C function returns no value. (object C-id) may be abbreviated as C-id. parameter-list is a list of C-types for the parameters of the C function. For example, the following defentry form defines a Lisp function tak from which the C function tak above is called.

(defentry tak (:int :int :int) (:int tak))

The Lisp function tak defined by this defentry form requires three arguments. The arguments are converted to int values before they are passed to the C function. On return from the C function, the returned int value is converted to a Lisp integer (actually a fixnum) and this fixnum will be returned as the value of the Lisp function. See below for type conversion between Lisp and the C language.

A defentry form is treated in the above way only when it appears as a top-level form of a Lisp source file. Otherwise, a defentry form expands to ().

Macro: defla {name lambda-list {declaration | doc-string}*}

When interpreted, defla is exactly the same as defun. That is, (defla name lambda-list . body) expands to (defun name lambda-list . body). However, defla forms are completely ignored by the compiler; no C-language code will be generated for defla forms. The primary use of defla is to define a Lisp function in two ways within a single Lisp source file; one in the C language and the other in Lisp. defla is short for DEFine Lisp Alternative.

Suppose you have a Lisp source file whose contents are:

(use-package "FFI")

;;; C version of TAK.
(Clines "

       int tak(x, y, z)                           
       int x, y, z;
       {      if (y >= x) return(z);
               else return(tak(tak(x-1, y, z),
                               tak(y-1, z, x),
                               tak(z-1, x, y)));
       }
")

;;;  TAK calls the C function tak defined above.
(defentry tak (:int :int :int) (:int tak))
;;;  The alternative Lisp definition of TAK.
(defla tak (x y z)
   (if (>= y x)
       z
       (tak (tak (1- x) y z)
            (tak (1- y) z x)
            (tak (1- z) x y))))

When this file is loaded into ECL, the interpreter uses the Lisp version of the tak definition. Once this file has been compiled, and when the generated fasl file is loaded into ECL, a function call to tak is actually the call to the C version of tak.

Function: defCbody {name args-types result-type C-expr}
The ECL compiler produces a function named name with as many arguments as there are entries in args-types. The C-expr is an arbitrary C expression where the arguments to the function are denoted by #i, where i is the integer corresponding to the argument position. The args-types is the list of Common-Lisp types of the arguments to the function, while result-type is the Common-Lisp type of the result. The actual arguments are coerced to the required types before executing the C-expr and the result is converted into a Lisp object. defCbody is ignored by the interpreter.

For example, the logical AND of two integers could be defined as:
(defCbody logand (fixnum fixnum) fixnum "(#0) & (#1)")

Function: definline {name args-types result-type C-expr}
definline behaves exactly as defCbody. Moreover, after a definline definition has been supplied, the ECL compiler will expand inline any call to function name into code corresponding to the C language expression C-expr, provided that the actual arguments are of the specified type. If the actual arguments cannot be coerced to those types, the inline expansion is not performed. definline is ignored by the interpreter.

For example, a function to access the n-th byte of a string and return it as an integer can be defined as follows:

(definline aref-byte (string fixnum) fixnum

The definitions of the C data structures used to represent Common-Lisp objects can be found in the file ecl.h in the directory "src/h" of the source distribution.

ECL converts a Lisp object into C-language data by using the Common-Lisp function coerce. For the C-type int (or char), the object is first coerced to a Lisp integer and the least significant 32-bit (or 8-bit) field is used as the C int (or char). For the C-type float (or double), the object is coerced to a short-float (or a long-float) and this value is used as the C float (or double). Conversion from C data into a Lisp object is straightforward: C char, int, float, and double become the equivalent Lisp character, fixnum, short-float, and long-float, respectively.
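
What "the least significant 32-bit (or 8-bit) field" means can be shown in plain C. This is a sketch only and not ECL's conversion code: the helper names low32 and low8 are hypothetical, and the Lisp integer is stood in for by a 64-bit C integer. The results are returned as unsigned bit patterns, because converting to an unsigned type is defined in C as reduction modulo 2^N.

```c
#include <stdint.h>

/* Keep only the least significant 32 bits of a wider integer. */
uint32_t low32(int64_t n) {
    return (uint32_t)n;   /* well-defined: value modulo 2^32 */
}

/* Keep only the least significant 8 bits of a wider integer. */
uint8_t low8(int64_t n) {
    return (uint8_t)n;    /* well-defined: value modulo 2^8 */
}
```

One consequence is that an integer too large for the target type is silently truncated rather than rejected, so callers that care about range should check it on the Lisp side before the call.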

Here we list the complete syntax of Clines, defentry, definline and defCbody macro forms.

Clines-form:
        (Clines {string}*)

defentry-form:
        (defentry symbol ({C-type}*)
                  {C-function-name | ({C-type | void} C-function-name)})

defCbody-form:
        (defCbody symbol ({type}*) type C-expr)

definline-form:
        (definline symbol ({type}*) type C-expr)

C-function-name:
        { string | symbol }
C-type:
        { object | int | char | float | double }


This document was generated by Juan Jose Garcia Ripoll on May, 30 2005 using texi2html