Archive for October, 2009

Rethinking the machine code for Scheme

Saturday, October 17th, 2009

It has been a while since I’ve worked on my Scheme implementation. I need to revisit my work.

The assembly that was generated by my last compiling attempt was a bit difficult to look at. Constantly reloading registers from HeapBase obscured the code, making it difficult to validate by inspection. There are definitely ways to make it better.

I have now found online copies of the “Lambda” papers by Guy Steele and Gerald Sussman, plus a description and listing of Steele’s Rabbit compiler. At the moment, they can be found at the (read scheme) web site listed in the Lisp/Scheme links.

Meanwhile, I am rethinking my Scheme implementation.

There were a few aspects I was especially unhappy with.

There was, of course, the repetitive reloading of the heap base pointer to get addresses of the special nodes located at the beginning of the active heap space. Every time a cons was performed, these nodes might move to a new heap space. There are two solutions. One is to keep the heap base pointer in a register, for example, the x86 EBX register. This way, whenever the GC is invoked, the new base address would be readily available. The push-eval macro would then be

(push-eval
      '((mov     edx (ebx offsetEVALSTACK offsetCDR))  ; arg2 = stack tail
        (mov     ecx eax)                    ; arg1 = item to push
        (call    m_cons1)
        (mov     (ebx offsetEVALSTACK offsetCDR) eax)  ; update (push) stack
      ))

The other (old) solution is to keep the special nodes in fixed locations, so that their addresses can be obtained without registers. We eliminate the base register and replace offsets with labels of the actual object nodes. This changes the above sequence to

(push-eval
      '((mov     edx (objEVALSTACK offsetCDR))  ; arg2 = stack tail
        (mov     ecx eax)                    ; arg1 = item to push
        (call    m_cons1)
        (mov     (objEVALSTACK offsetCDR) eax)  ; update (push) stack
      ))

Another annoyance was the long-winded comparison for determining if a node was a pair or not. If the CAR field pointed to a type node, the node was not a pair node. Every time I added a new type, the pair? function that was inlined into the evaluator had to be changed – it was a series of tests of type nodes, which kept growing. A new implementation would go back to the old atom mark style, with the type attribute as secondary information. Each node containing this special atom mark would be the start of an object.  The inlined pair? function would not need to change every time a new object data type was created.

Add to this list the lack of a continuation handling strategy for my machine code. One solution is to simply pass a continuation on to the machine code. After performing its task, the machine code would then tail-call the apply function to handle the continuation. This requires that the machine code is called without stacking a return address – the continuation replaces the return address. We can add the continuation as a hidden argument, much like OO languages add their hidden object pointer argument. In this manner, apply does not need to be aware of the presence or absence of the continuation.