REPL

An ordinary REPL. Still connecting via pipes as before. It worked when I fed in each S-expression typed at the top level.

Disassemble

I added a command called (disasm ..). You can check the disassembled results like this.

>>(disasm (+ 1 2 3))
=== code ===
 0 0x12000002 ;FRAME
 1 0x0000003c ;15
 2 0x25000002 ;CONSTNUM
 3 0x0000000c ;3
 4 0x13000002 ;ARGUMEMT
 5 0x25000002 ;CONSTNUM
 6 0x00000008 ;2
 7 0x13000002 ;ARGUMEMT
 8 0x25000002 ;CONSTNUM
 9 0x00000004 ;1
10 0x13000002 ;ARGUMEMT
11 0x03000002 ;REFER_GLOBAL
12 0x094bf2f3 ;+
13 0x15000002 ;APPLY
14 0x0000000c ;3
15 0x27000002 ;DISASM
16 0x00000002 ;HALT

Garbage Collection

I implemented and embedded Cheney’s copy GC. It reserves two regions of the same size called From-space and To-space, and allocates in From-space. When From-space fills up, all objects reachable from root objects (registers, symbol table, etc.) are copied to To-space. When all copying is finished, From-space and To-space are swapped. It’s a simple algorithm, but you need to be careful not to forget roots. There are disadvantages like poor heap efficiency and not being a conservative GC. Well, the heap shouldn’t be a problem currently, and since we can determine whether stack or register elements are objects or primitives, an exact GC is fine. The advantages are excellent - no fragmentation, good cache locality, etc. So far, this is about copy GC.

What Cheney announced was to traverse objects in breadth-first order. (probably just this??) It consumes less machine stack and is more efficient. However, because it’s breadth-first, nearby objects become separated and cache effectiveness decreases. (meaning pair objects and their car/cdr objects become separated) As an improvement, there’s also an idea to approximate depth-first traversal.

In my own scheme, the following become roots:

  • accumulator
  • closure (register that holds the current closure object)
  • stack elements below the stack pointer
  • symbol table
  • boxes
  • stack elements when continuation objects are saved

Boxes were originally managed by a different method, but I decided to handle them with the vm_obj structure. All non-primitive types use this structure. (the forwarding member is no longer needed since it’s only used for already-copied objects…)

struct vm_obj
{
	unsigned char tag;

	/* 
	 * memory allocated size contain string, closure etc.
	 * set by myalloc() and must not change the value.
	 */
	int size;

	/* 
	 * point new memory address in to-space, used by GC
	 */
	struct vm_obj *forwarding;

	union{
		vm_data box;
		struct {
			int size;
		} stack;
		struct {	
			int size;
		} closure;
		struct {
			vm_data car;
			vm_data cdr;
		} pair;
	} u;
};

References to free variables in strings and closures are placed immediately after this structure. It might be clearer to write the code here. I wonder what the usual practice is. You could manage only this structure with GC and rely on malloc() for strings, etc.

size = sizeof(struct vm_obj) + strlen(str) +1;
obj = myalloc(size, reg_a, reg_c, reg_s);
obj->tag = VM_OBJ_STRING;
string = (char*)obj + sizeof(struct vm_obj);
strcpy(string,str);

Also, GC runs automatically when From-space fills up, but I added an instruction to trigger GC at any timing. The +640 and +880 indicate the currently allocated memory size (in bytes).

$ make run
compiler/main.scm | vm/myscmvm
=== INFO ===
HOST_BIT:            32
CODE_MAX:          2048
HEAP_CODE_BASE:    1365
STACK_MAX:         1024
=== hash table ===
y        = 0x00000400 ;256
/        = 0x08bd523b ;closure<1395,1397>
=        = 0x08bd50ab ;closure<1365,1367>
hoge     = 0x08bd5083 ;"hogehogemanju"
<        = 0x08bd514b ;closure<1377,1379>
*        = 0x08bd5213 ;closure<1392,1394>
boss     = 0x08bd506f ;("orenosubete" "damedamehime")
cons     = 0x08bd5173 ;closure<1380,1382>
x        = 0x000001ec ;123
>        = 0x08bd5123 ;closure<1374,1376>
cdr      = 0x08bd51c3 ;closure<1386,1388>
null?    = 0x08bd51eb ;closure<1389,1391>
+        = 0x08bd50fb ;closure<1371,1373>
modulo   = 0x08bd5263 ;closure<1398,1400>
-        = 0x08bd50d3 ;closure<1368,1370>
car      = 0x08bd519b ;closure<1383,1385>
>>(gcdump)
=== gc ===
from_start: 0x0000000008bd5008
from_free:  0x0000000008bd5288[+640]
to_start:   0x0000000008bd9e30
to_free:    0x0000000008bd9e30
undef
>>(list "hoge" "foo" "a-chan" "nya-")
("hoge" "foo" "a-chan" "nya-")
>>(gcdump)
=== gc ===
from_start: 0x0000000008bd5008
from_free:  0x0000000008bd5378[+880]
to_start:   0x0000000008bd9e30
to_free:    0x0000000008bd9e30
undef
>>(gcrun)
Info: collecting garbage...
undef
>>(gcdump)
=== gc ===
from_start: 0x0000000008bd9e30
from_free:  0x0000000008bda0b0[+640]
to_start:   0x0000000008bd5008
to_free:    0x0000000008bd5008
undef
>>