Converting assembly language to the Scheme/Lisp format

February 28th, 2009 by tk

I have found another code generation bug – it is necessary to add +0 in front of negative numbers. The current emphasis is on “closing the loop” – where a Scheme system base can be built by Scheme itself.

The first step to closing the loop is done. The MASM assembly language line-oriented code has been converted into Scheme forms. As noted in previous posts, I have a translator for converting the Scheme forms to MASM code.

I’ve already used the nested list feature to insert debugging code into the Scheme assembly program. The nested list feature can also be exploited to provide a small amount of code refactoring – the next step.

Starter code for Lisp-style assembly code

February 21st, 2009 by tk

I created the following list to get the assembly code generation started. This means another step toward an all-Lisp development environment.

The dot functions generate a symbol that starts with the “.” character, because the base interpreter does not allow the “.” character to start a symbol. The base interpreter will interpret, in that position, the “.” character as a single-character separator.

The “+” character in front of 0 is due to a deficiency – the code generator functions don’t handle the special numeric 0 encoding. It also shows a way to handle arbitrary expressions that must start with digits.

The symbol textequ must also be associated with the same procedure that equ and label use.

      (set! code
        (list
          (list (dot386))
          (list (dotmodel)  'flat)

          '(textequ ExitProcess <_ExitProcess@4>)
          '(extern ExitProcess:near)

          (list (dotcode))

          '(label _start near)
          '(push +0)
          '(call ExitProcess)

          (list (dotdata))

          '(end _start)))

Repeatedly applying the translate function to the list items yields

.386
.model flat
ExitProcess textequ <_ExitProcess@4>
extern ExitProcess:near
.code
_start label near
push +0
call ExitProcess
.data
end _start

When coupled with Microsoft’s Win32 API library (such as the one provided for VC++), the above code will assemble, link, and run in a 32-bit MASM environment.

Macro assembly, part 2

February 14th, 2009 by tk

If you look carefully, you can see the first attempt to create macros actually treats the “macro” as a command to generate code.

As a second attempt, we can treat a macro as a command to insert instructions. We can envision the result of the insertion as creating a nested list of instructions. Unfortunately, the label syntax means we can’t just nest it verbatim. The nested instruction sequence

((op1 a)
  (label1
    (op2 b)
    (op3 c))
  (op4))

will actually look like

((op1 a)
  (label1 (op2 b) (op3 c))
  (op4))

where label1 is interpreted as a machine instruction called label1.

After letting the issue sit for a while, I have chosen to eliminate the special label notation. We can simply use (equ label1 $) or (label label1 near) to generate an equivalent MASM instruction. And then, instead of using

'((op1 a b)
  (macro mac1)
  (op2 a b))

we can use

(list
  '(op1 a b)
  mac1
  '(op2 a b))

to nest instruction sequences.

Well, now I can see that quasi-quoting (aka backquoting) would be more elegant. However, the quasi-quoting is going to have to wait.

First attempt at macro assembly

January 24th, 2009 by tk

Now that I’ve got an interpreter that I’m satisfied with, I’m going back to the problem of compilation and assembly.

I moved the “assembler” up one level, so that it is interpreted by my high level interpreter, rather than the base interpreter. I had to copy a few of the support routines from the lower level.

Then I made an attempt at a macro facility for the assembler. It didn’t come together quite right. After I stepped back and thought about it, I realized that I had created code generators, rather than macros. That was because the “macros” wrote directly to the output file, rather than sending back data for another translation.

Here are the primary procedures:

  (translate
    (lambda (x)
      (cond
        ((symbol? x)
          (write-sym x)
          (write-octet (integer->octet 58))
          (write-newline))
        ((find-codegen (first x)) ((second (find-codegen (first x))) x))
        (#t
          (write-sym (car x))
          (cond
            ((null? (cdr x)) (quote ()))
            (#t
              (write-octet (integer->octet 32))
              (write-operand (car (cdr x)))
              (cond
                ((null? (cdr (cdr x))) (quote ()))
                (#t
                  (write-octet (integer->octet 44))
                  (write-operand (car (cdr (cdr x))))))))
          (write-newline)))))

  (find-codegen
    (lambda (x)
      (assq x codegenlist)))

  (write-operand-list-tail
    (lambda (x)
      (cond
        ((null? x) (quote ()))
        (#t
          (write-octet (integer->octet 44))
          (write-operand (car x))
          (write-operand-list-tail (cdr x))))))

  (write-newline
    (lambda ()
      (write-octet (integer->octet 13))
      (write-octet (integer->octet 10))))

By giving the code generator the entire form, it is possible to share code generators for several assembly instructions. The following shows how the code generators can be written and shared:

    ;;;
    ;;; Define code generators
    ;;;
    (set! label-op-opd
      (lambda (form)
        (write-sym (second form))
        (write-octet (integer->octet 32))   ; the space character
        (write-sym (first form))
        (write-octet (integer->octet 32))   ; the space character
        (write-operand (third form))
        (write-newline)))

    (set! op-opd*
      (lambda (form)
        (write-sym (car form))
        (write-octet (integer->octet 32))   ; the space character
        (write-operand (car(cdr form)))
        (write-operand-list-tail (cdr(cdr form)))
        (write-newline)))

    ;;;
    ;;; Bind names to code generators
    ;;;
    (set! codegenlist
      (list
        (list 'equ   label-op-opd)
        (list 'label label-op-opd)
        (list 'db    op-opd*)
        (list 'dd    op-opd*)))

    ;;;
    ;;; Run translator
    ;;;
    (open-output (cdr 'SchCode.inc))
    (for-each translate code)
    (close-output)

Pretty printing

November 30th, 2008 by tk

Finally got a modest prettyprinter together. The sprawling style I’ve been using reflects the fact that I’m using a very primitive text editor. Without help from the editor, it’s pretty hard to match parentheses without the sprawling style. The prettyprinter will generate compact notation, so that future Lisp code examples will look like this:

  (pp-linear
    (lambda (x indent)
      (cond
        ((constant? x) (display x))
        ((symbol? x) (display x))
        ((procedure? x) (display '***unprintable***))
        (#t (pp-linear-list x indent)))))

Adding continuations, assignment and letrec

November 7th, 2008 by tk

The assignment expression evaluator is fairly straightforward. The evalter-x procedure does not see any subexpressions, so we leave it alone.

    (evset-c
      (lambda (exp env c)
        (ev-c
          (third exp)
          env
          (lambda (value)
            (c (evalter-x (lookup (second exp) env) value))
          )
        )
      )
    )

Most of the hard work of handling letrec has already been done, so it’s mostly a case of converting the evaluation procedures to CPS.

    (evletrec-c
      (lambda (exp env c)
        (evletrecbody-c
          (evletvar-x (second exp))    ; create binding list for "let" variables
          (second exp)
          (cdr(cdr exp))
          env
          c
        )
      )
    )

    (evletrecbody-c
      (lambda (bindlist initlist seq env c)
        (evletrecbody2-c bindlist initlist seq (append bindlist env) c)
      )
    )

    (evletrecbody2-c
      (lambda (bindlist initlist seq env c)
        (evletbind-x bindlist (evletexp-x initlist env))
        (c (evseq-x seq env))
      )
    )

In an expression sequence, as in evletrecbody2-c, we apply the continuation only to the last expression. Subexpressions are handled by evletexp-x, so that evletbind-x does not handle any subexpressions. We first embed evletbind-x, and separately redo the expression sequence handler.

    (evletrecbody2-c
      (lambda (bindlist initlist seq env c)
        (evletexp-c
          initlist
          env
          (lambda (value-list)
            (evletbind-x bindlist value-list)
          )
        )
        (evseq-c seq env c)
      )
    )

We’ll move the sequence handler into the lambda expression.

    (evletrecbody2-c
      (lambda (bindlist initlist seq env c)
        (evletexp-c
          initlist
          env
          (lambda (value-list)
            (evletbind-x bindlist value-list)
            (evseq-c seq env c)
          )
        )
      )
    )

It becomes clear that in spite of evletbind-x not handling subexpressions, we will need to provide a continuation passing version.

    (evletrecbody2-c
      (lambda (bindlist initlist seq env c)
        (evletexp-c
          initlist
          env
          (lambda (value-list)
            (evletbind-c
              bindlist
              value-list
              (lambda (bitbucket)
                (evseq-c seq env c)
              )
            )
          )
        )
      )
    )

    (evletbind-c
      (lambda (bindlist vallist c)
        (c (evletbind-x bindlist vallist))
      )
    )

And then, finally, the initialization expressions of letrec are handled similarly to procedure arguments.

    (evletexp-c
      (lambda (initlist env c)
        (cond
          ( (null? initlist) (c '()) )
          ( #t
            (evletexp-c
              (cdr initlist)
              env
              (lambda (value-list)
                (ev-c
                  (second (car initlist))
                  env
                  (lambda (value)
                    (c (cons value value-list))
                  )
                )
              )
            )
          )
        )
      )
    )

All that’s left is to change our apply procedure, and we can eliminate much of the old non-continuation code.

    (apply
      (lambda (p args)
        (evapply-c p args yield)
      )
    )

Adding continuations, conditionals

November 5th, 2008 by tk

The first transformation of the conditional evaluator is straightforward.

    (evcond-c
      (lambda (exp env c)
        (cond
          ( (null? (cdr exp))  (c (undefined-x)) )
          ( #t
            (c (evcondlist-x
              (ev-x (car (car (cdr exp))) env)
              (cdr exp)
              env
            ))
          )
        )
      )
    )

We twist the code to bring the subexpression evaluator into tail call position.

    (evcond-c
      (lambda (exp env c)
        (cond
          ( (null? (cdr exp))  (c (undefined-x)) )
          ( #t
            (ev-c
              (car (car (cdr exp)))
              env
              (lambda (test-value)
                (c (evcondlist-x
                  test-value
                  (cdr exp)
                  env
                ))
              )
            )
          )
        )
      )
    )

Then we twist again to bring the list sequencing procedure call into tail call position.

    (evcond-c
      (lambda (exp env c)
        (cond
          ( (null? (cdr exp))  (c (undefined-x)) )
          ( #t
            (ev-c
              (car (car (cdr exp)))
              env
              (lambda (test-value)
                (evcondlist-c test-value (cdr exp) env c)
              )
            )
          )
        )
      )
    )

Since the list sequencer for the conditional expression has nearly identical code, we show only the results:

    (evcondlist-c
      (lambda (val condlist env c)
        (cond
          ( val                     (evseq-c (cdr (car condlist)) env c) )
          ( (null? (cdr condlist))  (c (undefined-x)) )
          ( #t
            (ev-c
              (car (car (cdr condlist)))
              env
              (lambda (test-value)
                (evcondlist-c test-value (cdr condlist) env c)
              )
            )
          )
        )
      )
    )

Adding continuations, expression sequences

November 5th, 2008 by tk

The first transformation of the expression sequence evaluator is the following:

    (evseq-c
      (lambda (seq env c)
        (cond
          ( (null? seq)   (c (undefined-x)) )
          ( #t            (evseqlist-c seq env c) )
        )
      )
    )

    (evseqlist-c
      (lambda (seq env c)
        (cond
          ( (null? (cdr seq))  (ev-c (car seq) env c) )
          ( #t
            (ev-x
              (car seq)
              env
            )
            (evseqlist-c (cdr seq) env c)
          )
        )
      )
    )

As can be seen, the evaluator was not converted back to a pure applicative form. The continuation for ev-x is not c; it’s the whole expression (evseqlist-c (cdr seq) env c). So the next transformation requires a lambda expression.

    (evseqlist-c
      (lambda (seq env c)
        (cond
          ( (null? (cdr seq))  (ev-c (car seq) env c) )
          ( #t
            (ev-c
              (car seq)
              env
              (lambda (bitbucket)
                (evseqlist-c (cdr seq) env c)
              )
            )
          )
        )
      )
    )

We’ve created a procedure that ignores its argument. For uniformity, we created the continuation with a single formal argument. It also prevents complaints from a system which performs arity checks. The last expression in a sequence receives the original continuation, which was easier to see before we wrapped it in a new continuation.

Adding continuations, procedure calls

November 3rd, 2008 by tk

At this point, the evaluator for the procedure call expression is (reformatted)

    (evcall-c
      (lambda (exp env c)
        (c (evcallv-x
             (evlist-x
               exp
               env
             )
        ))
      )
    )

The evlist-x procedure recursively invokes the expression evaluator. So we would like to have its call in a tail call position. As with the continuation c alone, we can push the entire “prefix” into the evlist-x procedure to put the evlist-x call in the tail call position. We use the Scheme lambda and its lexical scoping to ensure that the continuation used on the result of evcallv-x is still the same continuation given to evcall-c. And, of course, we create evlist-c when we add the continuation argument.

    (evcall-c
      (lambda (exp env c)
        (evlist-c
          exp
          env
          (lambda (explist)
            (c (evcallv-x explist))
          )
        )
      )
    )

    (evlist-c
      (lambda (explist env c)
        (c (evlist-x explist env))
      )
    )

This is pretty much the essence of the continuation passing style (CPS). We’re basically creating a control stack by putting the current continuation inside a new continuation. Each continuation is a closure, and each continuation closure acts as a stack frame. We use tail call along the recursion path. And we pass the continuation down to some end procedure that will execute it in tail call position.

Next, we inline the evlist-x code into evlist-c and distribute the continuation call to the branches of the cond expression. We can do this if the cond tests are trivial. As can be seen below, we need to figure out how to handle the two recursive calls that are in an argument list.

    (evlist-c
      (lambda (explist env c)
        (cond
          ( (null? explist) (c '()) )
          ( #t
            (c (cons
              (ev-x (car explist) env)
              (evlist-x
                (cdr explist)
                env
              )
            ))
          )
        )
      )
    )

The innermost nonprimitive calls are ev-x and evlist-x. We build the next continuation based on which of these two we choose as the next tail call.

In terms that other writers use, the continuation which is the “rest of the computation” is the lambda expression, and the “hole” in the continuation is the single argument for this lambda expression.

If we choose to call evlist-c as the next function, then our first “twist” of the code yields the following:

    (evlist-c
      (lambda (explist env c)
        (cond
          ( (null? explist) (c '()) )
          ( #t
            (evlist-c
              (cdr explist)
              env
              ;; continuation 1
              (lambda (value-list)
                (c (cons
                    (ev-x (car explist) env)
                    value-list
                ))
              )
            )
          )
        )
      )
    )

Putting the call to ev-x in the continuation means we recurse through evlist-c before evaluating the “first” argument. This effectively causes right-to-left evaluation of the function arguments being interpreted.

The second twist replaces ev-x in argument position with ev-c in tail call position.

    (evlist-c
      (lambda (explist env c)
        (cond
          ( (null? explist) (c '()) )
          ( #t
            (evlist-c
              (cdr explist)
              env
              ;; continuation 1
              (lambda (value-list)
                (ev-c
                  (car explist)
                  env
                  ;; continuation 2
                  (lambda (value)
                    (c (cons value value-list))
                  )
                )
              )
            )
          )
        )
      )
    )

The argument list evaluation is completely transformed, and now we go back to the procedure call evaluator:

    (evcall-c
      (lambda (exp env c)
        (evlist-c
          exp
          env
          (lambda (explist)
            (c (evcallv-x explist))
          )
        )
      )
    )

As long as the “prefix” is only the continuation argument c, we don’t need to create a lambda expression.

    (evcall-c
      (lambda (exp env c)
        (evlist-c
          exp
          env
          (lambda (explist)
            (evcallv-c explist c)
          )
        )
      )
    )

    (evcallv-c
      (lambda (valuelist c)
        (evapply-c (car valuelist) (cdr valuelist) c)
      )
    )

    (evapply-c
      (lambda (cmddef arglist c)
        (cond
          ( (eq? (first cmddef) mach-mark)   (c (mach-call-x cmddef arglist)) )
          ( (eq? (first cmddef) prim-mark)   (c (prim-call-x cmddef arglist)) )
          ( (eq? (first cmddef) proc-mark)   (proc-call-c cmddef arglist c) )
          ( #t                               (c (undefined-x)) )
        )
      )
    )

    (proc-call-c
      (lambda (cmddef arglist c)
        (evseq-c
          (procbody cmddef)
          (add-args (procformals cmddef) arglist (procenv cmddef))
          c
        )
      )
    )

We were able to skip transforming most of proc-call-c because the procedure calls do not evaluate subexpressions. We are just binding values that have already been generated, and extracting information out of a closure (procedure) structure.

Adding continuations

November 2nd, 2008 by tk

With tail call optimization implemented, I converted the set!-rich version of the evaluator back to a more function-style version. I decided not to give the details for that, even though it produced a version that was not quite the same as where I started from. With tail call optimization implemented, we know that we can limit stack growth by shoving as much computation as possible into tail calls.

Continuations are a bit difficult to understand at first. The notion of “the rest of the computation” is easy enough. The difficulty is in seeing how the continuation “fits” into the evaluation process.

I have looked at Reynolds’s tutorial paper, Definitional Interpreters for Higher-Order Programming Languages (Proceedings of 25th ACM National Conference, 1972), that shows one evolution of Scheme-like continuations. There are actually two versions of the continuation system presented there. There’s a continuation, which is the directly executable “rest of the computation” procedure, and a cont function which handles continuations in the form of “micro-ops”.

With the help of the Reynolds paper, I have adapted the evaluator into a set of procedures using continuation passing style. It will take a few days to describe the changes.

As in the Reynolds paper, we start with the directly executable version. In order to handle environments, I started with the apply procedure. But to handle continuations, it seemed more natural to start with the eval procedure.

We create a new ev-c procedure that evaluates an expression and then uses thse result as an argument to a continuation. The continuation is explicitly exposed as an argument to ev-c. A continuation that returns its argument value is supplied. The purpose of this continuation is to stop the chain of continuations that we will create, and in the process, yield a value to the caller of eval. I named this continuation yield.

    (yield (lambda (x) x) )

    (eval
      (lambda (x env)
        (ev-c x env yield)
      )
    )

    (ev-c
      (lambda (exp env c)
        (c (ev-x exp env))
      )
    )

As may be guessed, ev-x is the reconstituted function-style evaluator. We can inline the ev-x procedure, and then instead of keeping the continuation as an outer level call, we can distribute the continuation call, as inner level calls.

    (ev-c
      (lambda (exp env c)
        (cond
          ( (constant? exp)      (c (evconst-x exp)) )
          ( (symbol? exp)        (c (evvar-x exp env)) )
          ( (symbol? (car exp))  (c (evsyntax-x exp env)) )
          ( #t                   (c (evcall-x exp env)) )
        )
      )
    )

At this point, we want to make a distinction between two kinds of procedures. The Reynolds paper calls them trivial and serious. Trivial procedures are ones that are known to always terminate and yield a result. The serious procedures are all the others.

We know that evconst-x is a trivial call. It always returns the exp argument. The evvar-x is also a trivial call. It always returns with a result (as long as we don’t create or supply an invalid environment). Basically, neither of these procedures will call an evaluator procedure to (recursively) handle subexpressions.

On the other hand, the expressions that are supplied to our evaluator may be (recursively) nonterminating by design. To handle these expressions, we want our evaluator to recurse via tail calls, to take advantage of tail call optimization. To do this, we pass along the continuation, rather than waiting for the evaluator procedure to return.

    (ev-c
      (lambda (exp env c)
        (cond
          ( (constant? exp)      (c (evconst-x exp)) )
          ( (symbol? exp)        (c (evvar-x exp env)) )
          ( (symbol? (car exp))  (evsyntax-c exp env c) )
          ( #t                   (evcall-c exp env c) )
        )
      )
    )

    (evsyntax-c
      (lambda (exp env c)
        (evsyntaxc-c (bound-value (lookup (car exp) env)) exp env c)
      )
    )

    (evsyntaxc-c
      (lambda (cmddef exp env c)
        (cond
          (
            (eq? (first cmddef) prim-syntax-mark)
            (cond
              ( (eq? (second cmddef) 'quote)     (c (evquote-x exp)) )
              ( (eq? (second cmddef) 'cond)      (evcond-c exp env c) )
              ( (eq? (second cmddef) 'set!)      (evset-c exp env c) )
              ( (eq? (second cmddef) 'letrec)    (evletrec-c exp env c) )
              ( (eq? (second cmddef) 'lambda)    (c (evlambda-x exp env)) )
              ( #t                               (c (undefined-x)) )
            )
          )
          ( #t                                   (evcall-c exp env c) )
        )
      )
    )

    (evcall-c
      (lambda (exp env c)
        (c (evcallv-x (evlist-x exp env)) )
      )
    )

Three of the syntax expressions (quote, lambda, undefined) are recognized as trivial. It might also be noted that we could distribute the continuation to the cond branches because all of the cond tests had trivial procedure calls only.

Next, we will look at how the procedure call might be handled.