Adding string and character types

March 19th, 2009 by tk

This turned out to be very easy. All I had to do was decide on a data structure and prefix it with a “type mark”. I also needed display and write procedures to make the data viewable.

I’ve opted to make the octet the fundamental data type for mass storage, so I built the character type on top of the octet type: a character is a two-element list holding the type mark and an octet.

    (integer->char
      (lambda (x)
        (list 'char-mark (integer->octet x))))

    (char->integer
      (lambda (x)
        (octet->integer (cadr x))))
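As a rough sketch of how this representation behaves, the block below models the octet layer with a preallocated table and adds a predicate for the character type. The table, the `octet-mark` symbol, and the starred names (chosen so the host Scheme's own `integer->char` is left alone) are assumptions for illustration, not the implementation's actual definitions.

```scheme
;; Hypothetical stand-in for the octet layer: one preallocated object per
;; value, so two octets with the same value are always the same object.
(define octet-table
  (let ((v (make-vector 256 #f)))
    (do ((i 0 (+ i 1)))
        ((= i 256) v)
      (vector-set! v i (list 'octet-mark i)))))

(define (integer->octet n) (vector-ref octet-table n))
(define (octet->integer o) (cadr o))

;; The constructors from the post, as top-level definitions.
(define (integer->char* n) (list 'char-mark (integer->octet n)))
(define (char*->integer c) (octet->integer (cadr c)))

;; Assumed predicate: a character is a list headed by the type mark.
(define (char*? x)
  (and (pair? x) (eq? (car x) 'char-mark)))

(char*->integer (integer->char* 65))   ; => 65
(char*? (integer->char* 65))           ; => #t
(char*? 65)                            ; => #f
```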

The string type is a simple follow-up. The Scheme string function requires that all its arguments be characters; the version here does not check this, it simply prefixes the argument list with a type mark.

    (string
      (lambda x
        (cons 'string-mark x)))
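If the character requirement were to be enforced, a checked constructor might look like the following sketch. The character layer is simplified here (the octet is a plain integer), and `make-char*`, `all-chars?`, and the other starred names are hypothetical helpers, not part of the implementation.

```scheme
;; Simplified stand-in for the character type.
(define (make-char* n) (list 'char-mark n))
(define (char*? x) (and (pair? x) (eq? (car x) 'char-mark)))

;; True when every element of xs is a tagged character.
(define (all-chars? xs)
  (or (null? xs)
      (and (char*? (car xs)) (all-chars? (cdr xs)))))

;; Same shape as the post's constructor, plus the argument check.
(define (string* . chars)
  (if (all-chars? chars)
      (cons 'string-mark chars)
      (error "string*: non-character argument" chars)))

(define (string*? x) (and (pair? x) (eq? (car x) 'string-mark)))
(define (string*-length s) (length (cdr s)))
(define (string*-ref s k) (list-ref (cdr s) k))

(define hi (string* (make-char* 104) (make-char* 105)))
(string*-length hi)   ; => 2
```

Because the characters are just the tail of the tagged list, length and indexing reduce to ordinary list operations.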

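All that’s needed now is a write procedure. Note that my Scheme implementation guarantees that octets are unique: two octets with the same value are the same object, so they can be compared with eq?. Characters are not unique, since integer->char builds a fresh list on every call. That is why the escaping code below compares the octet inside the character, (cadr x), rather than the character itself.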

    (write-output
      (lambda (x)
        (cond
          ((null? x) (write-octet lpar) (write-octet rpar))
          ((eq? x #t) (write-octet sharp) (write-octet letter-t))
          ((eq? x #f) (write-octet sharp) (write-octet letter-f))
          ((number? x) (write-number x))
          ((char? x) (write-escaped-char-output x))
          ((string? x) (write-escaped-string-output x))
          ((octet? x) (write-escaped-octet x))
          ((octet-string? x) (write-escaped-octet-string x))
          ((procedure? x) (write-output '***unprintable***))
          ((symbol? x) (write-sym x))
          (#t (write-octet lpar)
              (write-output (car x))
              (write-tail (cdr x))
              (write-octet rpar)))))

    (write-tail
      (lambda (x)
        (cond
          ((null? x) '())
          ((eq? x #t) (write-dot) (write-octet sharp) (write-octet letter-t))
          ((eq? x #f) (write-dot) (write-octet sharp) (write-octet letter-f))
          ((number? x) (write-dot) (write-number x))
          ((char? x) (write-dot) (write-escaped-char-output x))
          ((string? x) (write-dot) (write-escaped-string-output x))
          ((octet? x) (write-dot) (write-escaped-octet x))
          ((octet-string? x) (write-dot) (write-escaped-octet-string x))
          ((procedure? x) (write-dot) (write-output '***unprintable***))
          ((symbol? x) (write-dot) (write-sym x))
          (#t (write-octet spc)
              (write-output (car x))
              (write-tail (cdr x))))))

    (write-escaped-string-output
      (lambda (x)
        (write-octet xquote)
        (for-each write-escaped-string-char (cdr x))
        (write-octet xquote)))

    (write-escaped-string-char
      (lambda (x)
        (cond
          ((eq? (cadr x) xbackslash) (write-octet xbackslash) (write-octet xbackslash))
          ((eq? (cadr x) xquote) (write-octet xbackslash) (write-octet xquote))
          (#t (write-char-output x)))))

    (write-escaped-char-output
      (lambda (x)
        (write-octet sharp)
        (write-octet xbackslash)
        (write-octet (cadr x))))
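To see the string escaping in isolation, here is a sketch that returns the octets the writer would emit instead of writing them. The octet table and the name `escaped-string-octets` are assumptions made so the example can run on its own; `xquote` and `xbackslash` are taken to be the octets for `"` (34) and `\` (92).

```scheme
;; Hypothetical octet layer: preallocated, so eq? identifies equal octets.
(define octet-table
  (let ((v (make-vector 256 #f)))
    (do ((i 0 (+ i 1)))
        ((= i 256) v)
      (vector-set! v i (list 'octet-mark i)))))

(define (integer->octet n) (vector-ref octet-table n))
(define (octet->integer o) (cadr o))

(define xquote (integer->octet 34))
(define xbackslash (integer->octet 92))

(define (make-char* n) (list 'char-mark (integer->octet n)))
(define (string* . chars) (cons 'string-mark chars))

;; Returns the list of octets that writing the string s would produce.
(define (escaped-string-octets s)
  ;; Octets for one character: quote and backslash get a leading backslash.
  (define (char-octets c)
    (let ((o (cadr c)))
      (cond ((eq? o xbackslash) (list xbackslash xbackslash))
            ((eq? o xquote) (list xbackslash xquote))
            (else (list o)))))
  (append (list xquote)
          (apply append (map char-octets (cdr s)))
          (list xquote)))

;; The three-character string a"b is written as "a\"b".
(map octet->integer
     (escaped-string-octets
      (string* (make-char* 97) (make-char* 34) (make-char* 98))))
;; => (34 97 92 34 98 34)
```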

The number of type tests in the code above, and the way they are duplicated between write-output and write-tail, shows that I would have been better off with a single “type” node for “non-pairs”, with the current type nodes demoted to subtype nodes. Of course, address tagging would eliminate the need for any kind of type node.
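One way that idea could look, purely as a sketch: every non-pair datum becomes `(atom-mark subtype payload)`, so both writer procedures need only a single `atom?` test and the per-type work moves into a lookup table. All names here (`atom-mark`, `make-atom`, `atom-writers`, `datum->text`, `tail->text`) are hypothetical, and the output is built as a host string rather than through write-octet so the result is easy to inspect.

```scheme
;; Assumed layout: every non-pair datum is (atom-mark <subtype> <payload>).
(define (make-atom subtype payload) (list 'atom-mark subtype payload))
(define (atom? x) (and (pair? x) (eq? (car x) 'atom-mark)))
(define (atom-subtype x) (cadr x))
(define (atom-payload x) (caddr x))

;; One printer per subtype, looked up by the subtype symbol.
(define atom-writers
  (list (cons 'char   (lambda (p) (string-append "#\\" (string (integer->char p)))))
        (cons 'number (lambda (p) (number->string p)))
        (cons 'symbol (lambda (p) (symbol->string p)))))

(define (atom->text x)
  ((cdr (assq (atom-subtype x) atom-writers)) (atom-payload x)))

;; Counterpart of write-output: one type test instead of nine.
(define (datum->text x)
  (cond ((null? x) "()")
        ((atom? x) (atom->text x))
        (else (string-append "(" (datum->text (car x))
                             (tail->text (cdr x)) ")"))))

;; Counterpart of write-tail: the dotted case is handled once.
(define (tail->text x)
  (cond ((null? x) "")
        ((atom? x) (string-append " . " (atom->text x)))
        (else (string-append " " (datum->text (car x))
                             (tail->text (cdr x))))))

(datum->text (list (make-atom 'number 1) (make-atom 'symbol 'a)))  ; => "(1 a)"
(datum->text (cons (make-atom 'number 1) (make-atom 'number 2)))   ; => "(1 . 2)"
```

Adding a new non-pair type then means adding one entry to the table, rather than one clause to each of the two procedures.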

I wonder if it’s appreciated that in traditional Lisp, the LAMBDA and FUNARG atoms act as type nodes for lambda expressions and closures, respectively.
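For comparison, a small sketch of that traditional arrangement: a function value is an ordinary list, and the symbol at its head tells the evaluator which kind it is. The predicate and helper names are made up for illustration, and the environment in the closure is a placeholder association list.

```scheme
;; The head symbol of the list plays the role of the type node.
(define (lambda-expression? f) (and (pair? f) (eq? (car f) 'lambda)))
(define (funarg? f) (and (pair? f) (eq? (car f) 'funarg)))

(define (describe-function f)
  (cond ((lambda-expression? f) 'lambda-expression)
        ((funarg? f) 'closure)
        (else 'unknown)))

;; A bare lambda expression, and the same expression closed over an
;; environment as (funarg <function> <environment>).
(define square '(lambda (x) (* x x)))
(define closed (list 'funarg square '((y . 1))))

(describe-function square)   ; => lambda-expression
(describe-function closed)   ; => closure
```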
