Archive for March, 2007

Bootstrapping Lisp – loading programs

Wednesday, March 28th, 2007

So far we’ve only dealt with the internal format. Every aspect, including test programs, has been handled in assembly language code.

With conventional machine code, we would use the numbers known as addresses to reference data and instructions; and we would use numbers to represent each instruction atom. However, there is no absolute numeric definition of the instructions, either in absolute numbers or in absolute addresses or offsets. So it seems more convenient to use some kind of human readable format for loading programs.

For now, I’ve settled on a fixed-size instruction format, quadruples. Extra operands are added to pad instructions to four atoms. The extra operands are ignored by the interpreter. One “instruction” known only to the “boot loader” is the label instruction. Even with this simple format, it takes four passes to construct the full instruction list and label bindings (in the form of Lisp lists) without subroutines. Getting a development system up and running currently had higher priority than creating coding idioms.

The label bindings are kept in a table that is private to the boot loader. Before the instruction list can be executed via IFGO, the label bindings must be passed on to the interpreter. One solution is to extend the IFGO instruction to allow the replacement of the current association list. Another solution is to simply write another interpreting layer on top of the base interpreter, which bypasses the problem. A third solution is to hack the bindings using dynamic code.

Dynamic code is the set of instructions created by the base code and executed in the same process as the base code. The reason for using dynamic code is the lack of indirection when binding. When we have a label binding binding = (lbl . ilist) in the label table, we want to introduce the binding to the interpreter as if we were using the Lisp SET function: set[quote[y]; car[binding]]; set[y; cdr[binding]]. Our instruction set only allows the SETQ binding set[quote[y]; cdr[binding]], so the instruction for set[quote[lbl]; cdr[binding]] will need to be constructed, and then executed.

After solving the label binding problem, further development of the programming system does not require direct use of assembly language. We can generate assembly code if we need to create a new base system.

Bootstrapping Lisp – altered byte data and branching, byte generation

Wednesday, March 14th, 2007

Byte data was originally implemented as nonatomic nodes. They are now byte atoms.

I changed the IFGO label atom to an instruction list atom. The IFGO branches by replacing the current instruction list (in the machine state), using the instruction list atom’s value. This removes the need to skip labels and the need for label processing in the interpreter, and moves the notion of embedded labels to an, as yet unwritten, assembler for the instruction set. It also removes the need for keeping a full instruction list in the top level of the machine state, moving the full instruction list (or fragments of it) into the environment. The primary reason for this change is to provide a means for creating on-the-fly and self-modifying code.

I also created a BYTE function as a way to generate arbitrary byte atoms. For parsing, it allows creating byte values that can be used for comparisons. Obviously, it is useful for generating arbitrary files. There is currently no instruction that allows a “quoted” or “immediate” form of data. Strings must be bound to atoms before they can be used. So the solution used was to encode numeric values in a “bit list”, using true and false atoms to represent 1 and 0, respectively. (Actually, similarly to IFGO, only the true atom will be interpreted as a 1 bit. All others are interpreted as a 0 bit.)

Bootstrapping Lisp – correcting to prefix notation

Saturday, March 10th, 2007

The instruction format I chose, (result-atom . s-expr), didn’t follow the Lisp convention of prefix format. So I engaged in a correction phase that put the result atom as the last atom of the instruction, instead of the first. This positions the instruction atom as the first item in the list representing an instruction, as per Lisp convention.

This destroys an equivalency with the conventional Lisp functions. With the old format, we could define the tail end of the instruction as a conventional Lisp function call. The new instructions have names that indicate their new status as “binding” versions of the corresponding Lisp functions.

With this change, in addition to getting a conventional instruction format, we get an instruction set that is very much like a conventional “hardwired” instruction set – with list processing instructions instead of arithmetic and logic instructions.

It’s not clear whether this change was a good idea or not. There are some Lisp systems that generate VM code in the style of Pascal P-code and Java bytecode. So one can argue that it isn’t necessary to follow convention at the lowest level. Nevertheless, the change is a done deal.

Bootstrapping Lisp – predicates and branches

Tuesday, March 6th, 2007

At this point I modified the sequential Lisp to provide conditional jumps with an IFGO instruction. The result atom of the IFGO instruction is bound to NIL. The IFGO instruction has a “test” atom that is supposed to have either the value true or false. We introduce atoms for these Boolean values. And, for now, false is not represented by NIL.

We create a destination label by inserting an atom between instructions in the instruction list. The label itself is treated as a no-op instruction – an instruction that does nothing but take up space (and time). In order to find the location of a label, we need to have access to the entire instruction list – not just the “tail” of the instruction list. The machine state was modified to include the instruction list that would be searched for labels.

A predicate is a function that returns true or false.

It might be noted that the predicate EQ, introduced in this build, does not return NIL for false. EQ is probably best described as an “identity” function because it returns a value of true when the addresses of the two values are equal.

Whereas most Lisps test for false and its aliases (such as NIL and 0), the IFGO instruction in this bootstrap Lisp tests only for true. This automatically treats NIL as a false value, but has the unexpected result that many other values are treated as false. One of the IFGO operands is a label atom. If the test atom is bound to the true value, interpretation continues after the designated label.