Updated arithmetic, part 3

August 2nd, 2010 by tk

Multiplication has been written!

The %digit-multiply-add primitive is provided for radix and internal representation independence. It has four arguments. The first two digits are the multiplier and multiplicand digits. The third digit is used to provide the carry from a previous multiply-and-add, and the fourth digit can be used to provide a digit from the product being accummulated.

Multiplication is simpler than subtraction – with the choice of sign+magnitude representation, there is no need to convert the magnitude to or from complemented form.

Because it was simpler to do, and easier to verify, the current added procedures, listed below, do not use the fourth argument (it’s always 0). A partial product is created by multiplying one operand by a single digit from the other operand. Each partial product is added to an accummulated product, and the result is passed on to the next partial product step (function application). To align the partial products properly, we add a least significant 0 digit to the current multidigit operand and pass it on to the next step as the new multidigit operand.

    (multiply
      (lambda (x y)
        (cond
          ( (number? x)
            (cond
              ( (number? y) (mpy-num x y) )
              ( #t          '***undefined*** )  ))
          ( #t '***undefined*** )  )))

    (mpy-num
      (lambda (x y)
        (%list->number (mpy-num1 (%number->list x) (%number->list y)))))

    (mpy-num1
      (lambda (x y)
        (cond
          ( (eq? (car x) #\0)     (list #\0) )
          ( (eq? (car y) #\0)     (list #\0) )
          ( (eq? (car x) (car y)) (cons #\+ (mpy-magnitude (cdr x) (cdr y))) )
          ( #t                    (cons #\- (mpy-magnitude (cdr x) (cdr y))) ))))

    (mpy-magnitude
      (lambda (x y)
        (mpy-mag1 x y)))

    (mpy-mag1
      (lambda (x y)
        (cond
          ( (null? x) (list digit0) )
          ( #t
            (mpy-mag2
              (cdr x)
              (cons digit0 y)
              (mpy-mag3 (car x) y digit0) )))))

    (mpy-mag2
      (lambda (x y partial)
        (cond
          ( (null? x) partial )
          ( #t
            (mpy-mag2
              (cdr x)
              (cons digit0 y)
              (add-magnitude
                (mpy-mag3 (car x) y digit0)
                partial))  ))))

    (mpy-mag3
      (lambda (d y carry)
        (cond
          ( (null? y) (list carry) )
          ( #t
            (cons
              (car (%digit-multiply-add d (car y) carry digit0))
              (mpy-mag3
                d
                (cdr y)
                (car(cdr
                  (%digit-multiply-add d (car y) carry digit0))))) ))))

Updated arithmetic, part 2

February 10th, 2010 by tk

Adding the %digit0 procedure completes the separation of the digit format from the number’s “public” data structure. I chose the decimal digit for convenience. An emphasis on efficiency would choose another “digit” format. The following code doesn’t even know what the digit format is. It simply passes the digit onward, leaving %digit-sum and %digit-complement with the task of dealing with the actual digit format.

One major constraint in the following code is that equivalent digits must be equal in the eq? sense.

(letrec
    ((digit0 (%digit0))
    (digit1 (car(cdr (%digit-sum
                       (%digit-complement (%digit0))
                       (%digit-complement (%digit0))
                       (%digit0)))))
    (max-digit (%digit-complement (%digit0)))

    (add
      (lambda (x y)
        (cond
          ((number? x)
            (cond
              ((number? y) (add-num x y))
              ( #t '***undefined***)))
          (#t '***undefined***))))

    (subtract
      (lambda (x y)
        (cond
          ((number? x)
            (cond
              ((number? y) (sub-num x y))
              (#t '***undefined***)))
          (#t '***undefined***))))

    (negate
      (lambda (x)
        (cond
           ((number? x)
             (%list->number (negate-list (%number->list x))))
           (#t '***undefined***))))

    (add-num
      (lambda (x y)
        (%list->number (add-num1 (%number->list x) (%number->list y)))))

    (sub-num
      (lambda (x y)
        (%list->number (add-num1 (%number->list x) (negate-list (%number->list y))))))

    (negate-list
      (lambda (x)
        (cond
          ((eq? (car x) #\0) x)
          ((eq? (car x) #\+) (cons #\- (cdr x)))
          ((eq? (car x) #\-) (cons #\+ (cdr x)))
          (#t '***undefined***))))

    (add-num1
      (lambda (x y)
        (cond
          ((eq? (car x) #\0) y)
          ((eq? (car y) #\0) x)
          ((eq? (car x) (car y)) (cons (car x) (add-magnitude (cdr x) (cdr y))))
          ((eq? (car x) #\+) (adjust-mag1 (sub-magnitude (cdr x) (cdr y))))
          ((eq? (car x) #\-) (adjust-mag1 (sub-magnitude (cdr y) (cdr x))))
          (#t '***undefined***))))

    (add-magnitude
      (lambda (x y)
        (add-mag1 x y)))

    (sub-magnitude
      (lambda (x y)
        (sub-mag1 x y)))

    (add-mag1
      (lambda (x y)
        (cond
          ((null? x)
            (cond
              ((null? y) '())
              (#t (add-mag2 (%digit-sum digit0 (car y)  digit0) (cdr x) (cdr y)))))
          ((null? y) (add-mag2 (%digit-sum (car x) digit0  digit0) (cdr x) (cdr y)))
          (#t (add-mag2 (%digit-sum (car x) (car y) digit0) (cdr x) (cdr y))))))

    (add-mag2
      (lambda (d-sum x y)
        (cond
          ((null? x)
            (cond
              ((null? y)
                (cond
                  ((eq? (car(cdr d-sum)) digit0) (list (car d-sum)))
                  (#t d-sum)))
              (#t (add-next d-sum (list digit0) y))))
          ((null? y) (add-next d-sum x (list digit0)))
          (#t (add-next d-sum x y)))))

    (add-next
      (lambda (d-sum x y)
        (cons
          (car d-sum)
          (add-mag2
            (%digit-sum (car x) (car y) (car(cdr d-sum)))
            (cdr x)
            (cdr y)))))

    (adjust-mag1
      (lambda (x)
        (cond
          ((neg-mag? x) (cons #\- (trim-mag1 (recomplement x))))
          (#t (check-zero (cons #\+ (trim-mag1 x)))))))

    (check-zero
      (lambda (x)
        (cond
          ((null? (cdr x)) '(#\0))
          (#t x))))

    (trim-mag1
      (lambda (x)
        (reverse (behead-zero (cdr (reverse x))))))

    (behead-zero
      (lambda (x)
        (cond
          ((eq? (car x) digit0) (behead-zero (cdr x)))
          (#t x))))

    (neg-mag?
      (lambda (x)
        (cond
          ((null? (cdr x)) (eq? (car x) max-digit))
          (#t (neg-mag? (cdr x))))))

    (recomplement
      (lambda (x)
        (sub-mag1 '() x)))

    (sub-mag1
      (lambda (x y)
        (cond
          ((null? x)
            (cond
              ((null? y) (list digit0))
              (#t (sub-mag2
                    (%digit-sum digit0  (%digit-complement (car y)) digit1)
                    (list digit0)
                    (cdr y)))))
          ((null? y) (sub-mag2
                       (%digit-sum (car x) (%digit-complement digit0) digit1)
                       (cdr x)
                       (list digit0)))
          (#t (sub-mag2
                (%digit-sum (car x) (%digit-complement (car y)) digit1)
                (cdr x)
                (cdr y))))))

    (sub-mag2
      (lambda (d-sum x y)
        (cond
          ((null? x)
            (cond
              ((null? y) (list
                           (car d-sum)
                           (car (%digit-sum digit0 max-digit (car(cdr d-sum))))))
              (#t (sub-next d-sum (list digit0) y))))
          ((null? y) (sub-next d-sum x (list digit0)))
          (#t (sub-next d-sum x y)))))

    (sub-next
      (lambda (d-sum x y)
        (cons
          (car d-sum)
          (sub-mag2
            (%digit-sum (car x) (%digit-complement (car y)) (car(cdr d-sum)))
            (cdr x)
            (cdr y))))))
  ; ...
)

Updated arithmetic, part 1

February 9th, 2010 by tk

As a further step in separating storage details from algorithms, I’ve created two procedures, %number->list and %list->number, to convert between the actual “hard” format, and a “soft” format we can use for experimenting.

The “soft” format is a list. The first item is the sign in character form, and the rest of the list are the digits in little-endian order.

With the new procedures, I revisited the add and subtract code, and altered it to put conversion to and from the internal format at the same application (i.e., call or invoke) level.

I also added two procedures, %digit-sum and %digit-complement, to highlight the fact that the radix used for “hard” encoding does not affect the general algorithm at all. Although I am using decimal now, the radix could be (decimal) 4 294 967 296, which is what would be expected from a 32-bit implementation.

The add-digit procedure %digit-sum began as a two argument function. With digits as operands and a list as output, adding three numbers (to include the carry) was a bit ugly. The solution was to redefine %digit-sum as a three argument function.

The %digit-sum procedure replaces a Scheme conditional tree. The internal routine eliminates a lot of argument list cons-ing. The only cons-ing needed is for producing the return value. Arithmetic is used to create the sum, speeding it up even more. As a result, I am expecting the new arithmetic to be competitive with the floating-point arithmetic of some old versions of Basic running on their original (slow) microprocessors. The up side: big-num capability. The down-side: fractions are not yet implemented.

The %digit-complement is the (radix – 1) complement. In my decimal radix system, that would be the nine’s complement. If I had used a binary radix, it would have been the one’s complement.

That leaves three items that were not implemented in assembly language: digit0, digit1, and max-digit (replacement for digit9). The updated algorithm is derived from using digit0 and digit1 as the carry digits, and digit0 and max-digit as the “sign extension” digits. Only digit0 needs to be implemented, with the others derived from digit0.

Predicates and character types in the base interpreter

January 2nd, 2010 by tk

I created a new runtime with different data structures for primitive data types; and the GC, list reader, and list writer to go with the new data types. However, the evaluator is not progressing at all.

As a result, I went back to the working base interpreter, and implemented some type predicates and some data conversion procedures in assembly language. Two tables are used to expand the library of primitive procedures. One is the dispatch table, which is indexed by number, so that there are no external names hard-bound to the primitives. The second table is the initialization table that binds names to symbol locations and dispatch indexes. This makes it easy to add new primitives.

Rather than treating characters and octets (aka bytes) as separate data types, I have gone to Unix mode and equated them. This also means that octet strings and text strings are synonymous.

This is counter to the growing use of Unicode. The problem with Unicode is that a valid Unicode-encoded text is a structured data type, rather than a simple sequence of same-sized data.

The base display procedure was changed to output octets as byte values directly to the console display, rather than a number in text form. The base read procedure was changed to input string and character literals.

The purpose of adding type predicates such as symbol?, number?, and char? is to divorce the high level Scheme code from the underlying data structures that are hard-coded by the assembly language code. This allows one to avoid exposing the built-in data structures, eliminating ugly low level manipulations such as the use of (car ’symbol) to get the anonymous type mark associated with symbols.

The data conversion procedures are added for the same purpose. You can build up the symbols from integer values with these procedures:

  • integer->char
  • list->string
  • string->symbol

and break down symbol names with inverse procedures – all without exposing the ugly details of an inefficient implementation.

The divorce also allows creating code which is easier to port to other Scheme systems. Most Schemes will be far more efficient than the version I have created.

Rethinking the machine code for Scheme

October 17th, 2009 by tk

It has been a while since I’ve worked on my Scheme implementation. I need to revisit my work.

The assembly that was generated by my last compiling attempt was a bit difficult to look at. Constantly reloading registers from HeapBase obscured the code, making it difficult to validate by inspection. There are definitely ways to make it better.

I have now found online copies of the “Lambda” papers by Guy Steele and Gerald Sussman, plus a description and listing of Steele’s Rabbit compiler. At the moment, they can be found at the (read scheme) web site listed in the Lisp/Scheme links.

Meanwhile, I am rethinking my Scheme implementation.

There were a few aspects I was especially unhappy with.

There was, of course, the repetitive reloading of the heap base pointer to get addresses of the special nodes located at the beginning of the active heap space. Every time a cons was performed, these nodes might move to a new heap space. There are two solutions. One is to keep the heap base pointer in a register, for example, the x86 EBX register. This way, whenever the GC is invoked, the new base address would be readily available. The push-eval macro would then be

(push-eval
      '((mov     edx (ebx offsetEVALSTACK offsetCDR))  ; arg2 = stack tail
        (mov     ecx eax)                    ; arg1 = item to push
        (call    m_cons1)
        (mov     (ebx offsetEVALSTACK offsetCDR) eax)  ; update (push) stack
      ))

The other (old) solution is to keep the special nodes in fixed locations, so that their addresses can be obtained without registers. We eliminate the base register and replace offsets with labels of the actual object nodes. This changes the above sequence to

(push-eval
      '((mov     edx (objEVALSTACK offsetCDR))  ; arg2 = stack tail
        (mov     ecx eax)                    ; arg1 = item to push
        (call    m_cons1)
        (mov     (objEVALSTACK offsetCDR) eax)  ; update (push) stack
      ))

Another annoyance was the long-winded comparison for determining if a node was a pair or not. If the CAR field pointed to a type node, the node was not a pair node. Every time I added a new type, the pair? function that was inlined into the evaluator had to be changed – it was a series of tests of type nodes, which kept growing. A new implementation would go back to the old atom mark style, with the type attribute as secondary information. Each node containing this special atom mark would be the start of an object.  The inlined pair? function would not need to change every time a new object data type was created.

Add to this list the lack of a continuation handling strategy for my machine code. One solution is to simply pass a continuation on to the machine code. After performing its task, the machine code would then tail-call the apply function to handle the continuation. This requires that the machine code is called without stacking a return address – the continuation replaces the return address. We can add the continuation as a hidden argument, much like OO languages add their hidden object pointer argument. In this manner, apply does not need to be aware of the presence or absence of the continuation.

A small step for compiling Scheme

March 27th, 2009 by tk

This foray into compiling Scheme uses a stack that is stored in the heap, rather than in a conventional stack array or vector. The compiler will be “naive”, producing inefficient code.

I settled on using register EAX as the top of the evaluation stack, and with the rest of the stack in a linked list. A “fixed” node points to the stack via its CDR field, but does not hold anything in its CAR field. This works well with the implemented convention of calling machine language routines with both the argument list and the result held in EAX.

First, we show a code sequence for setting up code, compiling it, and then displaying the result as a “flattened” list.

  (letrec
    ((displayx
        (lambda (x)
          (cond
            ((symbol? (car x)) (display x) (newline))
            (#t (for-each displayx x)))))
      (scheme-code '()))

    (set! scheme-code '(fn #t #f #f))
    (set! code (compile scheme-code))
    (for-each displayx code))

We eventually want to compile procedures, but for now we will compile simple expressions. Boolean constants are pretty simple – we just combine the unchanging pieces of code and send the combination back as a result.

    (compile
      (lambda (exp)
        (compile-exp exp)))

    (compile-exp
      (lambda (exp)
        (display 'compiling-exp=) (display exp) (newline)
        (cond
          ((constant? exp) (compile-constant exp))
          ((symbol? exp) (compile-symbol exp))
          (#t (compile-proccall exp)))))

    (compile-constant
      (lambda (exp)
        (cond
          ((eq? exp #t) (list push-eval true-eval))
          ((eq? exp #f) (list push-eval false-eval))
          (#t '((error exp =constant-not-implemented=))))))

    (push-eval
      '((mov     edx HeapBase)               ; get heap base
        (mov     edx (edx offsetEVALSTACK offsetCDR))  ; arg2 = stack tail
        (mov     ecx eax)                    ; arg1 = item to push
        (call    m_cons1)
        (mov     edx HeapBase)               ; get heap base
        (mov     (edx offsetEVALSTACK offsetCDR) eax)  ; update (push) stack
      ))

    (true-eval
      '((mov     edx HeapBase)
        (lea     eax (edx offsetTRUE))))

    (false-eval
      '((mov     edx HeapBase)
        (lea     eax (edx offsetFALSE))))

Haven’t yet decided how to handle symbols, but we need to state what kind of code to produce and where it will be.

    (compile-symbol
      (lambda (exp)
        '((loadvar ***unknown***))))

Procedure calls are handled by compiling the subexpressions, followed by a currently undetermined protocol for calling closures.

The subexpression results go on the stack with the leftmost expression being the deepest. We then unstack and CONS the results, effectively reversing the subexpression results and detaching the results from the stack. The EAX register ends up referencing the constructed argument list. The number of arguments that are combined into a single list is controlled by the argument list recursion. The result is compatible with my xeq1 interface protocol.

Since the code we’re constructing is a nested data structure, and we’re parsing a nested data structure (as opposed to sequential text as in the list I/O), argument evaluation order of the compiler procedures is not important.

    (compile-proccall
      (lambda (exp)
        (list
          (compile-arglist exp)
          '(callclosure***))))

    (compile-arglist
      (lambda (exp)
        (cond
          ((null? exp) (list push-eval emptylist-eval))
          (#t (list
              (compile-exp (car exp))
              (compile-arglist (cdr exp))
              cons-eval)))))

A better Scheme reader

March 26th, 2009 by tk

The previous Scheme reader was based on recursive descent. But what I really wanted was something like LR(0) parsing, in order to avoid reading an extra token. In LR(0) parsing, a starting token is put on a stack, and then a state machine decides what to do next. You can read in a token and “shift” it to a stack, “reduce” the end items on the stack to a single item, or “halt” a successful parse. Lisp is simple enough that the one item on the stack top represents a parse state.

The new method is much simpler. It has an initialization procedure that starts the stack with one token, a parsing decision iterator (in tail recursive form), and a reducing procedure. The quote read-macro processing is now in a procedure.

The reduce procedure has a check for the empty list, in case an S-expression with dot notation is malformed.

The data loss of reading an S-expression is now reduced to the one character that follows the S-expression. By requiring whitespace between top level S-expressions, we can exchange data files with other Lisps. We can also start to store S-expression data in separate files to help organize what we have.

The read-token procedure, the lexer, handles the bulk of the parsing. It is fairly big because numbers are handled very inefficiently.

    (read-object
      (lambda ()
        (letrec
          (
            ;;; make unique objects to put in stack
            (tick-token '(tick-token))
            (lpar-token '(lpar-token))
            (rpar-token '(rpar-token))
            (dot-token '(dot-token))

            (read-obj
              (lambda ()
                (read-o (list (read-token)))))

            ;;; We use eq? here to ensure we test
            ;;;   for the unique objects we created.
            (read-o
              (lambda (stack)
                (cond
                  ((eq? (car stack) lpar-token) (read-o (cons (read-token) stack)))
                  ((eq? (car stack) rpar-token) (read-o (reduce (cdr stack) '())))
                  ((eq? (car stack) tick-token) (read-o (cons (read-quote) (cdr stack))))
                  ((null? (cdr stack)) (car stack))	; terminate when only one non-delimiter on stack
                  (#t (read-o (cons (read-token) stack))))))

            ;;; Here we combine several list items
            ;;;   into a single list, and restack.
            ;;; We return a stack because only this
            ;;;   procedure knows how much to discard.
            ;;; We use eq? here to ensure we test
            ;;;   for the unique objects we created.
            (reduce
              (lambda (stack tail)
                (cond
                  ((null? (car stack)) (cons tail (cdr stack)))
                  ((eq? (car stack) lpar-token) (cons tail (cdr stack)))
                  ((eq? (car stack) dot-token) (reduce (cddr stack) (cons (cadr stack) (car tail))))
                  (#t (reduce (cdr stack) (cons (car stack) tail))))))

            ;;; prototype for a read-macro
            (read-quote
              (lambda ()
                (list 'quote (read-obj))))

            ;;;
            ;;; token code omitted
            ;;;

          )

          (read-obj))))

A reader in Scheme

March 25th, 2009 by tk

I’ve created a list reader for my little Scheme. The big flaw in it is that it always performs a one-token lookahead, and always primes the lookahead buffer when it reads. The result is that one token is lost before parsing the remaining text for an S-expression. Also switching from reading S-expressions to character I/O (in the same file) leads to loss of characters. Otherwise, it works correctly, generating the correct list structure.

The list reader in the base interpreter does not have that flaw, which is why it is used in the base interpreter’s REPL. A character lookahead buffer is all that’s needed in the base interpreter.

The flawed code is very imperative, but I wanted to make sure I wasn’t dependent on argument evaluation order.

    (read-object
      (lambda ()
        (letrec
          ((lookahead '())    ; lookahead "character"
            (token '())        ; lookahead token

            (read-obj
              (lambda ()
                (set! lookahead (read-octet))
                (read-token)
                (read-o)))

            ;;; s-expr ->
            ;;;   ' s-expr
            ;;;   ( rest-list
            ;;;   atom
            (read-o
              (lambda ()
                (letrec
                  ((object '()))
                  (cond
                    ((eq? token xtick) (read-token) (list 'quote (read-o)))
                    ((eq? token lpar) (read-token) (read-list))
                    (#t (set! object token) (read-token) object)))))

            ;;; rest-list ->
            ;;;   )
            ;;;   s-expr tail
            (read-list
              (lambda ()
                (letrec
                  ((object '()))
                  (cond
                    ((eq? token rpar) (read-token) '())
                    (#t (set! object (read-o)) (cons object (read-list-tail)))))))

            ;;; tail ->
            ;;;   )
            ;;;   . s-expr )
            ;;;   s-expr tail
            (read-list-tail
              (lambda ()
                (letrec
                  ((object '()))
                  (cond
                    ((eq? token rpar) (read-token) '())
                    ((eq? token xdot) (read-token)
                                      (set! object (read-o))
                                      (cond
                                        ((eq? token rpar) (read-token)))
                                      object)
                    (#t (set! object (read-o)) (cons object (read-list-tail)))))))

            ;;;
            ;;; token code omitted
            ;;;

          )

          (read-obj))))

Adding string and character types

March 19th, 2009 by tk

This turned out to be very easy to do. All I had to do was decide on a data structure, and then prepend it with a “type mark”. I also needed display and write procedures to make the data viewable.

I’ve opted to make the octet the fundamental data type for mass storage. So I built the character type on top of the octet type.

    (integer->char
      (lambda (x)
        (list 'char-mark (integer->octet x))))

    (char->integer
      (lambda (x)
        (octet->integer (cadr x))))

The string type is a simple follow-up. The Scheme string function requires that all its arguments are characters.

    (string
      (lambda x
        (cons 'string-mark x)))

All that’s needed is a write procedure to display it. Note that my Scheme implementation guarantees that octets are unique. Characters are not unique.

    (write-output
      (lambda (x)
        (cond
          ((null? x) (write-octet lpar) (write-octet rpar))
          ((eq? x #t) (write-octet sharp) (write-octet letter-t))
          ((eq? x #f) (write-octet sharp) (write-octet letter-f))
          ((number? x) (write-number x))
          ((char? x) (write-escaped-char-output x))
          ((string? x) (write-escaped-string-output x))
          ((octet? x) (write-escaped-octet x))
          ((octet-string? x) (write-escaped-octet-string x))
          ((procedure? x) (write-output '***unprintable***))
          ((symbol? x) (write-sym x))
          (#t (write-octet lpar)
              (write-output (car x))
              (write-tail (cdr x))
              (write-octet rpar)))))

    (write-tail
      (lambda (x)
        (cond
          ((null? x) '())
          ((eq? x #t) (write-dot) (write-octet sharp) (write-octet letter-t))
          ((eq? x #f) (write-dot) (write-octet sharp) (write-octet letter-f))
          ((number? x) (write-dot) (write-number x))
          ((char? x) (write-dot) (write-escaped-char-output x))
          ((string? x) (write-dot) (write-escaped-string-output x))
          ((octet? x) (write-dot) (write-escaped-octet x))
          ((octet-string? x) (write-dot) (write-escaped-octet-string x))
          ((procedure? x) (write-dot) (write-output '***unprintable***))
          ((symbol? x) (write-dot) (write-sym x))
          (#t (write-octet spc)
              (write-output (car x))
              (write-tail (cdr x))))))

    (write-escaped-string-output
      (lambda (x)
        (write-octet xquote)
        (for-each write-escaped-string-char (cdr x))
        (write-octet xquote)))

    (write-escaped-string-char
      (lambda (x)
        (cond
          ((eq? (cadr x) xbackslash) (write-octet xbackslash) (write-octet xbackslash))
          ((eq? (cadr x) xquote) (write-octet xbackslash) (write-octet xquote))
          (#t (write-char-output x)))))

    (write-escaped-char-output
      (lambda (x)
        (write-octet sharp)
        (write-octet xbackslash)
        (write-octet (cadr x))))

The number of type tests, and its duplication in the above code points out that I would have been better off if I had a special “type” node for “non-pairs”, and made the current type nodes the subtype nodes. Of course, address tagging would eliminate the need for any kind of type node.

I wonder if it’s appreciated that in traditional Lisp, the LAMBDA and FUNARG atoms act as type nodes for lambda expressions and closures, respectively.

Uniformity in Scheme syntax evaluation

March 15th, 2009 by tk

I went ahead and made the continuation style evaluator more uniform in the handling of syntax expressions.

I also moved the recognition test for syntax expressions up into the body of ev-c. This makes evsyntax-c a procedure that handles only syntax expressions. The interpreter is much faster now. I attribute this to the still primitive CONSing of arguments in a procedure call. The vast majority of procedure calls are by name, and the redundant test bypasses the call to evsyntax-c. This removes two levels of internal interpreter calls for each interpreted procedure call. The new code adds a redundant lookup only for syntax expressions. The lookup was already redundant when interpretation of procedure calls involved evsyntax-c.

    (syntax?
      (lambda (exp env)
        (eq? (first (bound-value (lookup (car exp) env))) prim-syntax-mark)))

    (ev-c
      (lambda (exp env c)
        (cond
          ((constant? exp) (evconst-c exp env c))
          ((symbol? exp) (evvar-c exp env c))
          ((syntax? exp env) (evsyntax-c exp env c))
          (#t (evcall-c exp env c)))))

    (evsyntax-c
      (lambda (exp env c)
        (evsyntaxc-c (bound-value (lookup (car exp) env)) exp env c)))

    (evsyntaxc-c
      (lambda (cmddef exp env c)
        (cond
          ((eq? (second cmddef) 'quote) (evquote-c exp env c))
          ((eq? (second cmddef) 'cond) (evcond-c exp env c))
          ((eq? (second cmddef) 'set!) (evset-c exp env c))
          ((eq? (second cmddef) 'letrec) (evletrec-c exp env c))
          ((eq? (second cmddef) 'lambda) (evlambda-c exp env c))
          (#t (undefined-c exp env c)))))