Archive for March, 2009

A small step for compiling Scheme

Friday, March 27th, 2009

This foray into compiling Scheme uses a stack that is stored in the heap, rather than in a conventional stack array or vector. The compiler will be “naive”, producing inefficient code.

I settled on using register EAX as the top of the evaluation stack, and with the rest of the stack in a linked list. A “fixed” node points to the stack via its CDR field, but does not hold anything in its CAR field. This works well with the implemented convention of calling machine language routines with both the argument list and the result held in EAX.

First, we show a code sequence for setting up code, compiling it, and then displaying the result as a “flattened” list.

  (letrec
    ((displayx
        (lambda (x)
          (cond
            ((symbol? (car x)) (display x) (newline))
            (#t (for-each displayx x)))))
      (scheme-code '()))

    (set! scheme-code '(fn #t #f #f))
    (set! code (compile scheme-code))
    (for-each displayx code))

We eventually want to compile procedures, but for now we will compile simple expressions. Boolean constants are pretty simple – we just combine the unchanging pieces of code and send the combination back as a result.

    (compile
      (lambda (exp)
        (compile-exp exp)))

    (compile-exp
      (lambda (exp)
        (display 'compiling-exp=) (display exp) (newline)
        (cond
          ((constant? exp) (compile-constant exp))
          ((symbol? exp) (compile-symbol exp))
          (#t (compile-proccall exp)))))

    (compile-constant
      (lambda (exp)
        (cond
          ((eq? exp #t) (list push-eval true-eval))
          ((eq? exp #f) (list push-eval false-eval))
          (#t '((error exp =constant-not-implemented=))))))

    (push-eval
      '((mov     edx HeapBase)               ; get heap base
        (mov     edx (edx offsetEVALSTACK offsetCDR))  ; arg2 = stack tail
        (mov     ecx eax)                    ; arg1 = item to push
        (call    m_cons1)
        (mov     edx HeapBase)               ; get heap base
        (mov     (edx offsetEVALSTACK offsetCDR) eax)  ; update (push) stack
      ))

    (true-eval
      '((mov     edx HeapBase)
        (lea     eax (edx offsetTRUE))))

    (false-eval
      '((mov     edx HeapBase)
        (lea     eax (edx offsetFALSE))))

Haven’t yet decided how to handle symbols, but we need to state what kind of code to produce and where it will be.

    (compile-symbol
      (lambda (exp)
        '((loadvar ***unknown***))))

Procedure calls are handled by compiling the subexpressions, followed by a currently undetermined protocol for calling closures.

The subexpression results go on the stack with the leftmost expression being the deepest. We then unstack and CONS the results, effectively reversing the subexpression results and detaching the results from the stack. The EAX register ends up referencing the constructed argument list. The number of arguments that are combined into a single list is controlled by the argument list recursion. The result is compatible with my xeq1 interface protocol.

Since the code we’re constructing is a nested data structure, and we’re parsing a nested data structure (as opposed to sequential text as in the list I/O), argument evaluation order of the compiler procedures is not important.

    (compile-proccall
      (lambda (exp)
        (list
          (compile-arglist exp)
          '(callclosure***))))

    (compile-arglist
      (lambda (exp)
        (cond
          ((null? exp) (list push-eval emptylist-eval))
          (#t (list
              (compile-exp (car exp))
              (compile-arglist (cdr exp))
              cons-eval)))))

A better Scheme reader

Thursday, March 26th, 2009

The previous Scheme reader was based on recursive descent. But what I really wanted was something like LR(0) parsing, in order to avoid reading an extra token. In LR(0) parsing, a starting token is put on a stack, and then a state machine decides what to do next. You can read in a token and “shift” it to a stack, “reduce” the end items on the stack to a single item, or “halt” a successful parse. Lisp is simple enough that the one item on the stack top represents a parse state.

The new method is much simpler. It has an initialization procedure that starts the stack with one token, a parsing decision iterator (in tail recursive form), and a reducing procedure. The quote read-macro processing is now in a procedure.

The reduce procedure has a check for the empty list, in case an S-expression with dot notation is malformed.

The data loss of reading an S-expression is now reduced to the one character that follows the S-expression. By requiring whitespace between top level S-expressions, we can exchange data files with other Lisps. We can also start to store S-expression data in separate files to help organize what we have.

The read-token procedure, the lexer, handles the bulk of the parsing. It is fairly big because numbers are handled very inefficiently.

    (read-object
      (lambda ()
        (letrec
          (
            ;;; make unique objects to put in stack
            (tick-token '(tick-token))
            (lpar-token '(lpar-token))
            (rpar-token '(rpar-token))
            (dot-token '(dot-token))

            (read-obj
              (lambda ()
                (read-o (list (read-token)))))

            ;;; We use eq? here to ensure we test
            ;;;   for the unique objects we created.
            (read-o
              (lambda (stack)
                (cond
                  ((eq? (car stack) lpar-token) (read-o (cons (read-token) stack)))
                  ((eq? (car stack) rpar-token) (read-o (reduce (cdr stack) '())))
                  ((eq? (car stack) tick-token) (read-o (cons (read-quote) (cdr stack))))
                  ((null? (cdr stack)) (car stack))	; terminate when only one non-delimiter on stack
                  (#t (read-o (cons (read-token) stack))))))

            ;;; Here we combine several list items
            ;;;   into a single list, and restack.
            ;;; We return a stack because only this
            ;;;   procedure knows how much to discard.
            ;;; We use eq? here to ensure we test
            ;;;   for the unique objects we created.
            (reduce
              (lambda (stack tail)
                (cond
                  ((null? (car stack)) (cons tail (cdr stack)))
                  ((eq? (car stack) lpar-token) (cons tail (cdr stack)))
                  ((eq? (car stack) dot-token) (reduce (cddr stack) (cons (cadr stack) (car tail))))
                  (#t (reduce (cdr stack) (cons (car stack) tail))))))

            ;;; prototype for a read-macro
            (read-quote
              (lambda ()
                (list 'quote (read-obj))))

            ;;;
            ;;; token code omitted
            ;;;

          )

          (read-obj))))

A reader in Scheme

Wednesday, March 25th, 2009

I’ve created a list reader for my little Scheme. The big flaw in it is that it always performs a one-token lookahead, and always primes the lookahead buffer when it reads. The result is that one token is lost before parsing the remaining text for an S-expression. Also switching from reading S-expressions to character I/O (in the same file) leads to loss of characters. Otherwise, it works correctly, generating the correct list structure.

The list reader in the base interpreter does not have that flaw, which is why it is used in the base interpreter’s REPL. A character lookahead buffer is all that’s needed in the base interpreter.

The flawed code is very imperative, but I wanted to make sure I wasn’t dependent on argument evaluation order.

    (read-object
      (lambda ()
        (letrec
          ((lookahead '())    ; lookahead "character"
            (token '())        ; lookahead token

            (read-obj
              (lambda ()
                (set! lookahead (read-octet))
                (read-token)
                (read-o)))

            ;;; s-expr ->
            ;;;   ' s-expr
            ;;;   ( rest-list
            ;;;   atom
            (read-o
              (lambda ()
                (letrec
                  ((object '()))
                  (cond
                    ((eq? token xtick) (read-token) (list 'quote (read-o)))
                    ((eq? token lpar) (read-token) (read-list))
                    (#t (set! object token) (read-token) object)))))

            ;;; rest-list ->
            ;;;   )
            ;;;   s-expr tail
            (read-list
              (lambda ()
                (letrec
                  ((object '()))
                  (cond
                    ((eq? token rpar) (read-token) '())
                    (#t (set! object (read-o)) (cons object (read-list-tail)))))))

            ;;; tail ->
            ;;;   )
            ;;;   . s-expr )
            ;;;   s-expr tail
            (read-list-tail
              (lambda ()
                (letrec
                  ((object '()))
                  (cond
                    ((eq? token rpar) (read-token) '())
                    ((eq? token xdot) (read-token)
                                      (set! object (read-o))
                                      (cond
                                        ((eq? token rpar) (read-token)))
                                      object)
                    (#t (set! object (read-o)) (cons object (read-list-tail)))))))

            ;;;
            ;;; token code omitted
            ;;;

          )

          (read-obj))))

Adding string and character types

Thursday, March 19th, 2009

This turned out to be very easy to do. All I had to do was decide on a data structure, and then prepend it with a “type mark”. I also needed display and write procedures to make the data viewable.

I’ve opted to make the octet the fundamental data type for mass storage. So I built the character type on top of the octet type.

    (integer->char
      (lambda (x)
        (list 'char-mark (integer->octet x))))

    (char->integer
      (lambda (x)
        (octet->integer (cadr x))))

The string type is a simple follow-up. The Scheme string function requires that all its arguments are characters.

    (string
      (lambda x
        (cons 'string-mark x)))

All that’s needed is a write procedure to display it. Note that my Scheme implementation guarantees that octets are unique. Characters are not unique.

    (write-output
      (lambda (x)
        (cond
          ((null? x) (write-octet lpar) (write-octet rpar))
          ((eq? x #t) (write-octet sharp) (write-octet letter-t))
          ((eq? x #f) (write-octet sharp) (write-octet letter-f))
          ((number? x) (write-number x))
          ((char? x) (write-escaped-char-output x))
          ((string? x) (write-escaped-string-output x))
          ((octet? x) (write-escaped-octet x))
          ((octet-string? x) (write-escaped-octet-string x))
          ((procedure? x) (write-output '***unprintable***))
          ((symbol? x) (write-sym x))
          (#t (write-octet lpar)
              (write-output (car x))
              (write-tail (cdr x))
              (write-octet rpar)))))

    (write-tail
      (lambda (x)
        (cond
          ((null? x) '())
          ((eq? x #t) (write-dot) (write-octet sharp) (write-octet letter-t))
          ((eq? x #f) (write-dot) (write-octet sharp) (write-octet letter-f))
          ((number? x) (write-dot) (write-number x))
          ((char? x) (write-dot) (write-escaped-char-output x))
          ((string? x) (write-dot) (write-escaped-string-output x))
          ((octet? x) (write-dot) (write-escaped-octet x))
          ((octet-string? x) (write-dot) (write-escaped-octet-string x))
          ((procedure? x) (write-dot) (write-output '***unprintable***))
          ((symbol? x) (write-dot) (write-sym x))
          (#t (write-octet spc)
              (write-output (car x))
              (write-tail (cdr x))))))

    (write-escaped-string-output
      (lambda (x)
        (write-octet xquote)
        (for-each write-escaped-string-char (cdr x))
        (write-octet xquote)))

    (write-escaped-string-char
      (lambda (x)
        (cond
          ((eq? (cadr x) xbackslash) (write-octet xbackslash) (write-octet xbackslash))
          ((eq? (cadr x) xquote) (write-octet xbackslash) (write-octet xquote))
          (#t (write-char-output x)))))

    (write-escaped-char-output
      (lambda (x)
        (write-octet sharp)
        (write-octet xbackslash)
        (write-octet (cadr x))))

The number of type tests, and its duplication in the above code points out that I would have been better off if I had a special “type” node for “non-pairs”, and made the current type nodes the subtype nodes. Of course, address tagging would eliminate the need for any kind of type node.

I wonder if it’s appreciated that in traditional Lisp, the LAMBDA and FUNARG atoms act as type nodes for lambda expressions and closures, respectively.

Uniformity in Scheme syntax evaluation

Sunday, March 15th, 2009

I went ahead and made the continuation style evaluator more uniform in the handling of syntax expressions.

I also moved the recognition test for syntax expressions up into the body of ev-c. This makes evsyntax-c a procedure that handles only syntax expressions. The interpreter is much faster now. I attribute this to the still primitive CONSing of arguments in a procedure call. The vast majority of procedure calls are by name, and the redundant test bypasses the call to evsyntax-c. This removes two levels of internal interpreter calls for each interpreted procedure call. The new code adds a redundant lookup only for syntax expressions. The lookup was already redundant when interpretation of procedure calls involved evsyntax-c.

    (syntax?
      (lambda (exp env)
        (eq? (first (bound-value (lookup (car exp) env))) prim-syntax-mark)))

    (ev-c
      (lambda (exp env c)
        (cond
          ((constant? exp) (evconst-c exp env c))
          ((symbol? exp) (evvar-c exp env c))
          ((syntax? exp env) (evsyntax-c exp env c))
          (#t (evcall-c exp env c)))))

    (evsyntax-c
      (lambda (exp env c)
        (evsyntaxc-c (bound-value (lookup (car exp) env)) exp env c)))

    (evsyntaxc-c
      (lambda (cmddef exp env c)
        (cond
          ((eq? (second cmddef) 'quote) (evquote-c exp env c))
          ((eq? (second cmddef) 'cond) (evcond-c exp env c))
          ((eq? (second cmddef) 'set!) (evset-c exp env c))
          ((eq? (second cmddef) 'letrec) (evletrec-c exp env c))
          ((eq? (second cmddef) 'lambda) (evlambda-c exp env c))
          (#t (undefined-c exp env c)))))