A better Scheme reader

March 26th, 2009 by tk

The previous Scheme reader was based on recursive descent. But what I really wanted was something like LR(0) parsing, in order to avoid reading an extra token. In LR(0) parsing, a starting token is put on a stack, and then a state machine decides what to do next. You can read in a token and “shift” it to a stack, “reduce” the end items on the stack to a single item, or “halt” a successful parse. Lisp is simple enough that the one item on the stack top represents a parse state.

The new method is much simpler. It has an initialization procedure that starts the stack with one token, a parsing decision iterator (in tail recursive form), and a reducing procedure. The quote read-macro processing is now in a procedure.

The reduce procedure has a check for the empty list, in case an S-expression with dot notation is malformed.

The data loss of reading an S-expression is now reduced to the one character that follows the S-expression. By requiring whitespace between top level S-expressions, we can exchange data files with other Lisps. We can also start to store S-expression data in separate files to help organize what we have.

The read-token procedure, the lexer, handles the bulk of the parsing. It is fairly big because numbers are handled very inefficiently.

    (read-object
      (lambda ()
        (letrec
          (
            ;;; make unique objects to put in stack
            (tick-token '(tick-token))
            (lpar-token '(lpar-token))
            (rpar-token '(rpar-token))
            (dot-token '(dot-token))

            (read-obj
              (lambda ()
                (read-o (list (read-token)))))

            ;;; We use eq? here to ensure we test
            ;;;   for the unique objects we created.
            (read-o
              (lambda (stack)
                (cond
                  ((eq? (car stack) lpar-token) (read-o (cons (read-token) stack)))
                  ((eq? (car stack) rpar-token) (read-o (reduce (cdr stack) '())))
                  ((eq? (car stack) tick-token) (read-o (cons (read-quote) (cdr stack))))
                  ((null? (cdr stack)) (car stack))	; terminate when only one non-delimiter on stack
                  (#t (read-o (cons (read-token) stack))))))

            ;;; Here we combine several list items
            ;;;   into a single list, and restack.
            ;;; We return a stack because only this
            ;;;   procedure knows how much to discard.
            ;;; We use eq? here to ensure we test
            ;;;   for the unique objects we created.
            (reduce
              (lambda (stack tail)
                (cond
                  ((null? (car stack)) (cons tail (cdr stack)))
                  ((eq? (car stack) lpar-token) (cons tail (cdr stack)))
                  ((eq? (car stack) dot-token) (reduce (cddr stack) (cons (cadr stack) (car tail))))
                  (#t (reduce (cdr stack) (cons (car stack) tail))))))

            ;;; prototype for a read-macro
            (read-quote
              (lambda ()
                (list 'quote (read-obj))))

            ;;;
            ;;; token code omitted
            ;;;

          )

          (read-obj))))

Leave a Reply