Predicates and character types in the base interpreter
Saturday, January 2nd, 2010
I created a new runtime with different data structures for the primitive data types, along with a GC, list reader, and list writer to match the new data types. However, the evaluator is not progressing at all.
As a result, I went back to the working base interpreter, and implemented some type predicates and some data conversion procedures in assembly language. Two tables are used to expand the library of primitive procedures. One is the dispatch table, which is indexed by number, so that there are no external names hard-bound to the primitives. The second table is the initialization table that binds names to symbol locations and dispatch indexes. This makes it easy to add new primitives.
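The real tables live in assembly language, but the relationship between them can be modelled in a few lines of Scheme. This is only a rough sketch under assumptions: the names dispatch-table, init-table, and lookup-primitive are hypothetical, the three primitives are arbitrary picks, and the symbol locations held in the real initialization table are left out.

```scheme
;; Hypothetical model of the two tables; the real ones are in assembly.

;; Dispatch table: primitives are reached by index only, with no names attached.
(define dispatch-table
  (vector car      ; index 0
          cdr      ; index 1
          cons))   ; index 2

;; Initialization table: each entry binds a name to a dispatch index.
;; (The real table also records a symbol location, omitted here.)
(define init-table
  '((car . 0)
    (cdr . 1)
    (cons . 2)))

;; Resolve a primitive: name -> dispatch index -> procedure.
;; Returns #f when the name is not in the initialization table.
(define (lookup-primitive name)
  (let ((entry (assq name init-table)))
    (if entry
        (vector-ref dispatch-table (cdr entry))
        #f)))

((lookup-primitive 'cons) 1 2)   ; => (1 . 2)
```

In this model, adding a primitive means adding one slot to the dispatch table and one entry to the initialization table.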
Rather than treating characters and octets (aka bytes) as separate data types, I have gone to Unix mode and equated them. This also means that octet strings and text strings are synonymous.
This runs counter to the growing use of Unicode. The problem with Unicode is that valid Unicode-encoded text is a structured data type (in UTF-8, for example, a single character occupies one to four octets), rather than a simple sequence of same-sized elements.
The base display procedure was changed to output octets as byte values directly to the console display, rather than as numbers in text form. The base read procedure was changed to accept string and character literals.
The purpose of adding type predicates such as symbol?, number?, and char? is to divorce the high-level Scheme code from the underlying data structures that are hard-coded by the assembly language code. This allows one to avoid exposing the built-in data structures, eliminating ugly low-level manipulations such as the use of (car 'symbol) to get the anonymous type mark associated with symbols.
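As a small illustration, high-level code can classify a datum using nothing but the three predicates named above. The procedure name type-name is hypothetical and exists only for this sketch.

```scheme
;; Classify a datum through type predicates only,
;; never by inspecting its underlying representation.
(define (type-name x)
  (cond ((symbol? x) 'symbol)
        ((number? x) 'number)
        ((char? x)   'char)
        (else        'other)))

(type-name 'foo)   ; => symbol
(type-name 42)     ; => number
(type-name #\a)    ; => char
```

Code written this way keeps working if the type marks or the memory layout change underneath it.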
The data conversion procedures were added for the same purpose. You can build up symbols from integer values with these procedures:
- integer->char
- list->string
- string->symbol
and break down symbol names with inverse procedures – all without exposing the ugly details of an inefficient implementation.
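A round trip might look like the sketch below. It assumes the inverse procedures carry the standard Scheme names char->integer, string->list, and symbol->string, that map is available, and that character codes are ASCII. The helper names codes->symbol and symbol->codes are hypothetical.

```scheme
;; Build a symbol from a list of integer character codes.
(define (codes->symbol codes)
  (string->symbol (list->string (map integer->char codes))))

;; Break a symbol name back down into integer character codes.
;; Assumes the inverse procedures use their standard Scheme names.
(define (symbol->codes sym)
  (map char->integer (string->list (symbol->string sym))))

(codes->symbol '(97 98 99))   ; => abc
(symbol->codes 'abc)          ; => (97 98 99)
```

Neither helper touches the internal layout of symbols, strings, or characters.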
The divorce also makes it possible to write code that is easier to port to other Scheme systems. Most Schemes will be far more efficient than the version I have created.