I wrote a Scheme-like interpreter called jellyfish. It’s available here. However, it’s my own implementation, so it doesn’t comply with RnRS. Scheme seems to be a standard for self-made interpreters, but I feel there are surprisingly few articles explaining it. Here, I’d like to briefly describe its implementation method.
I’m assuming readers who have written a brainfuck interpreter and want to write a slightly more advanced custom interpreter (not limited to Scheme). I’d be happy if you think “Oh, it’s easy!”
Jellyfish can run programs like this. I’ve implemented simple built-in functions (including eval) and higher-order functions.
;
; quick sort
;
(define (part c y x)
(if (null? y)
'()
(if (c (car y) x)
(cons (car y) (part c (cdr y) x))
(part c (cdr y) x))))
(define (quicksort x)
(if (null? x)
'()
((lambda (pivot)
(append
(quicksort (part < x pivot))
(cons
(car x)
(quicksort (part > x pivot)))))
(car x))))
;
; fizz buzz
;
(define check_fizzbuzz
(lambda (x)
(if (= (modulo x 15) 0)
"FizzBuzz"
(if (= (modulo x 5) 0)
"Buzz"
(if (= (modulo x 3) 0)
"Fizz"
x)))))
(define fizzbuzz
(lambda (a b)
(if (> a b)
nil
(begin
(display (check_fizzbuzz a) " ")
(fizzbuzz (+ a 1) b)))))
(define x '(fizzbuzz 1 50))
(display x " => ") (eval x) (newline)
(define x '(quicksort '(32 77 70 21 3 2 73)))
(display x " => " (eval x)) (newline)
First, let’s consider the following:
- Interpreter or compiler?
- What language to implement it in?
- What about syntax (written in EBNF, etc.)?
- What about the parser?
Jellyfish is a pure interpreter (doesn’t use intermediate code, what’s it called?) written in C. While script languages like Ruby or Python make it even easier to write interpreters, I didn’t want to run an interpreter on top of a script language, so I wrote it in C. Well, any language is fine. The syntax is as follows. I use bison for the parser and flex for the lexer. Bison and flex automatically generate parsing programs (C programs) if you write the syntax rules.
You can implement the lexer yourself, but for parsers, unless it’s a simple syntax like LL grammar, it’s better to rely on existing parsers.
S-expression = INTEGER | CHARACTER | SYMBOL | STRING | NIL | TRUE | FALSE | ( {S-expression} )
In jellyfish, Scheme programs run in the following steps:
Lexical analysis (extract parentheses, integers, symbols, etc. as tokens from input)
↓
Syntax analysis (check if tokens satisfy the syntax)
↓
Evaluation (implement eval() function and throw everything there)
Let’s look at each one.
Lexical Analysis
This is part of the flex program (lexer.l). In lexical analysis, write flex syntax rules like this:
[-]?[0-9]+ { sscanf(yytext, "%d", (int*)&yylval); return INTEGER; }
[(] { paren_num ++; return LEFT_PAREN; }
[)] { paren_num --; return RIGHT_PAREN; }
"\n" { if(paren_num != 0) prompt_multi();}
"nil" { return NIL; }
"#t" { return TRUE; }
"#f" { return FALSE; }
Syntax Analysis
Following EBNF, write the bison program (parser.y). Like this. $$ becomes the return value of that syntax. For example, for nil or boolean values, return them as is, for INTEGER, call integer2sexp() and return the result. When enclosed in parentheses, connect with cons() to create a list structure with S-expressions. The _noeval suffix is a remnant from when I originally called eval() immediately after parsing. In jellyfish (and Scheme), there are not just functions but also special forms, so that doesn’t work.
input :
| input exp_noeval
{
struct s_exp *e = eval($2);
if(interactive){
write_sexp(e);
putc('\n',stdout);
prompt();
}
sexp_free(e,1);
}
exp_noeval : INTEGER { $$ = integer2sexp($1);}
| CHARACTER { $$ = character2sexp($1);}
| SYMBOL { $$ = symbol2sexp($1);}
| STRING { $$ = string2sexp($1);}
| NIL { $$ = nil;}
| TRUE { $$ = sexp_t;}
| FALSE { $$ = sexp_f;}
| QUOTE exp_noeval
{
struct s_exp *car = symbol2sexp($1);
struct s_exp *cdr = cons($2, nil);
$$ = cons(car,cdr);
}
| LEFT_PAREN members_noeval
{
$$ = $2;
}
members_noeval : RIGHT_PAREN { $$ = nil; }
| exp_noeval members_noeval
{
$$ = cons($1,$2);
}
eval()
For a custom interpreter, all that’s left is to create eval(). Like this, for integers etc., the evaluation value stays as is, for symbols, pull from the symbol table. Otherwise, it’s either a special form, built-in function, or closure call, so throw to each respective process. The contents of jf_apply_special() and jf_apply_builtin() for calling built-in functions and special forms are as you’d imagine, simply evaluating and returning a new S-expression. You might be curious about jf_apply_clojure() which executes user-defined functions, but this adds arguments to the symbol table and evaluates the closure body.
struct s_exp *
eval(struct s_exp *e)
{
struct s_exp *car, *cdr, *q, *p;
if(is_singleton(e))
return e;
switch(e->type){
case S_EXP_INTEGER:
case S_EXP_CHARACTER:
case S_EXP_STRING:
case S_EXP_BUILTIN:
case S_EXP_SPECIAL:
case S_EXP_CLOJURE:
return e;
case S_EXP_SYMBOL:
q = st_find(global_table, e->u.obj);
p = sexp_copy(q);
sexp_free(e,1);
return p;
}
car = e->u.pair.car;
cdr = e->u.pair.cdr;
sexp_free(e,0);
q = eval(car);
car = q;
switch(car->type){
case S_EXP_SPECIAL: return jf_apply_special(car,cdr);
case S_EXP_CLOJURE: return jf_apply_clojure(car,cdr);
case S_EXP_BUILTIN: return jf_apply_builtin(car,cdr);
}
return nil;
}
It started from seeing a bison sample program (calculator?) and thinking I could make a Scheme interpreter, with the silly motivation that it would be nice to run university algorithm assignments on my own interpreter, so it was mostly successful, but there are some regrets:
- I wrote it without reading the specification, so it’s not compatible => Oh well
- I implemented it without thinking about memory management => Later I passed valgrind checks
- I spent more time writing the interpreter than on Lisp itself, so I don’t understand Lisp => Let’s buy Land of Lisp
In the future, I’m planning to implement macros or switch to VM-based.
I’ve rambled on, but if you want to make your own Scheme interpreter, I think you can make something that looks like it by making a prefix notation calculator, making an if statement… This might be the shortest route.