Introduction
I created a naive and experimental VMM using KVM. It creates virtual machines by issuing ioctl calls against /dev/kvm, and can boot the Linux Kernel and user processes on them. I also implemented a very simple serial console emulation that is recognized by the kernel’s device driver, which allows operation from the login shell. Networking and disks, on the other hand, are not supported yet.
Recently, KVM has been used not only as a traditional virtual machine, but also to strengthen isolation levels in multi-tenant cloud environments, such as Google gVisor 1, Kata Containers 2, and Amazon Firecracker 3, for use in containers and micro VMs. The gokvm I created this time is implemented in Go using only the standard library, with about 1,500 lines total (at the time of writing this blog post), so I think it can be useful as a starting point for those interested in KVM and the Linux boot process like myself.
Let me reflect on what and how I implemented it while looking at the commit log.
2021/1/30 Project Launch 632c6e0
The first commit. I just placed README.md, .gitignore, and LICENSE files, nothing particularly noteworthy. I was researching similar projects 4 5 and LWN.net articles 6. I couldn’t find any minimal implementation that boots Linux to userland. I only did a quick search, so I may have missed something. Perhaps kvmtool 4 was originally in that position, but the code felt a bit too large. kvm-host.c 5 is about 250 lines of C code and can boot the kernel, but it doesn’t seem to reach userland.
2021/2/4 Build bzImage and initrd, Implement KVM Wrapper 69e3ebb
I made it possible to generate bzImage and initrd for testing with the make command. bzImage is the Linux Kernel itself, and initrd is a temporary in-memory filesystem. I used Linux Kernel version 5.10, the latest at the time the project started. After running make tinyconfig, I used make menuconfig to enable the additionally required configs. initrd was based on Busybox. The .config files for Linux Kernel and Busybox are managed in the repository, so please refer to them for details. For CI, since GitHub Actions cannot use /dev/kvm, I chose Travis CI.
The KVM API is controlled via ioctl. Therefore, I implemented a simple ioctl wrapper function that receives a file descriptor and through which the various KVM APIs can be invoked. For structures, I ported the necessary ones from the Linux header files as Go structs.
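To make the ioctl wrapper idea concrete, here is a minimal sketch, not the actual gokvm code. The helper names (ioctlNumber, kvmIOReq, ioctl) are hypothetical; the request numbers follow the _IO(KVMIO, nr) encoding from the Linux headers, and the program simply reports an error if /dev/kvm is not available.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// Encoding of Linux ioctl request numbers on x86 (asm-generic/ioctl.h).
const (
	iocNone  = 0
	iocWrite = 1
	iocRead  = 2

	nrShift   = 0
	typeShift = 8
	sizeShift = 16
	dirShift  = 30

	kvmIO = 0xAE // KVMIO magic from linux/kvm.h
)

// ioctlNumber builds a request number from direction, type, number and argument size.
func ioctlNumber(dir, typ, nr, size uintptr) uintptr {
	return dir<<dirShift | size<<sizeShift | typ<<typeShift | nr<<nrShift
}

// kvmIOReq is the equivalent of the C macro _IO(KVMIO, nr).
func kvmIOReq(nr uintptr) uintptr { return ioctlNumber(iocNone, kvmIO, nr, 0) }

var (
	kvmGetAPIVersion = kvmIOReq(0x00)
	kvmCreateVM      = kvmIOReq(0x01)
	kvmCreateVCPU    = kvmIOReq(0x41)
	kvmRun           = kvmIOReq(0x80)
)

// ioctl issues a raw ioctl(2) and converts a non-zero errno into a Go error.
func ioctl(fd, request, arg uintptr) (uintptr, error) {
	ret, _, errno := syscall.Syscall(syscall.SYS_IOCTL, fd, request, arg)
	if errno != 0 {
		return ret, errno
	}
	return ret, nil
}

func main() {
	f, err := os.OpenFile("/dev/kvm", os.O_RDWR, 0)
	if err != nil {
		fmt.Println("cannot open /dev/kvm:", err)
		return
	}
	defer f.Close()

	version, err := ioctl(f.Fd(), kvmGetAPIVersion, 0)
	if err != nil {
		fmt.Println("KVM_GET_API_VERSION failed:", err)
		return
	}
	fmt.Println("KVM API version:", version)
}
```

Every other KVM call (creating a VM, creating a vCPU, running it) can go through the same ioctl function with a different request number and an argument pointer.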
I also initialized various registers. I configured all segment descriptors with Base=0 and Limit=0xFFFFFFFF for flat segmentation. The G (Granularity) flag was set to 1, so the limit is counted in units of 4 KiB. For simplicity, I set CR0’s PE (Protected Mode Enable) to 1 so that the Kernel runs in Protected Mode from the very first instruction. In other words, no bootstrap mechanism like a bootloader is needed. At this point, the guest physical memory size is fixed at 1GB. RIP was changed to point to the kernel’s start address.
Source: Intel 64 and IA-32 Architectures Software Developer's Manual
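The flat segmentation setup described above can be sketched as follows. This is an illustration under assumptions, not the gokvm source: the struct mirrors struct kvm_segment from the Linux UAPI headers, the helper flatSegment is hypothetical, and the selector values 0x10 and 0x18 are assumed from the boot protocol's __BOOT_CS and __BOOT_DS.

```go
package main

import (
	"fmt"
	"unsafe"
)

// segment mirrors struct kvm_segment from the Linux UAPI header (asm/kvm.h).
type segment struct {
	Base     uint64
	Limit    uint32
	Selector uint16
	Typ      uint8
	Present  uint8
	DPL      uint8
	DB       uint8
	S        uint8
	L        uint8
	G        uint8
	AVL      uint8
	Unusable uint8
	_        uint8 // padding, as in the C struct
}

// segmentSize should match sizeof(struct kvm_segment), which is 24 bytes.
const segmentSize = unsafe.Sizeof(segment{})

// cr0PE is bit 0 of CR0; setting it puts the CPU in Protected Mode.
const cr0PE = 1 << 0

// flatSegment returns a descriptor with Base=0, Limit=0xFFFFFFFF and G=1.
func flatSegment(selector uint16, typ uint8) segment {
	return segment{
		Base: 0, Limit: 0xFFFFFFFF, Selector: selector, Typ: typ,
		Present: 1, DB: 1, S: 1, G: 1,
	}
}

func main() {
	code := flatSegment(0x10, 0x0b) // assumed selector; type 0xb = execute/read, accessed
	data := flatSegment(0x18, 0x03) // assumed selector; type 0x3 = read/write, accessed

	var cr0 uint64
	cr0 |= cr0PE

	fmt.Printf("size=%d code=%+v\n", segmentSize, code)
	fmt.Printf("data=%+v cr0=%#x\n", data, cr0)
}
```

In a real VMM these values would be written into the special registers fetched with KVM_GET_SREGS and stored back with KVM_SET_SREGS.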
I added code that calculates 2+2=4 6 as a test. Furthermore, I introduced the Go linter golangci-lint to set up a test-driven development environment.
2021/2/5 Introduce code climate cb71b9d
I became concerned about CI. I introduced Code Climate, a service that automatically measures coverage and code quality.
2021/2/6 Add Missing KVM APIs 458753d
KVM_SET_TSS_ADDR, KVM_SET_IDENTITY_MAP_ADDR, KVM_CREATE_IRQCHIP, KVM_CREATE_PIT2, KVM_GET_SUPPORTED_HV_CPUID, and KVM_SET_CPUID2 were missing, so I added them.
I configured CPUID to report the signature “KVMKVMKVM” 7. The string is packed as little-endian ASCII into ebx, ecx, and edx, as follows:
eax = 0x40000001
ebx = 0x4b4d564b
ecx = 0x564b4d56
edx = 0x4d
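The register values above can be reproduced by packing the ASCII string into three little-endian 32-bit words. The following is a small sketch; the function name signatureRegs is made up for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// signatureRegs packs a hypervisor signature of up to 12 bytes into
// EBX, ECX and EDX. Unused trailing bytes are left as zero.
func signatureRegs(sig string) (ebx, ecx, edx uint32) {
	var buf [12]byte
	copy(buf[:], sig)
	return binary.LittleEndian.Uint32(buf[0:4]),
		binary.LittleEndian.Uint32(buf[4:8]),
		binary.LittleEndian.Uint32(buf[8:12])
}

func main() {
	ebx, ecx, edx := signatureRegs("KVMKVMKVM")
	// prints: ebx=0x4b4d564b ecx=0x564b4d56 edx=0x4d
	fmt.Printf("ebx=%#x ecx=%#x edx=%#x\n", ebx, ecx, edx)
}
```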
2021/2/7 Boot Kernel 099cc55
I researched how to boot bzImage by referring to The Linux/x86 Boot Protocol 8. I quote the relevant parts in the table below.
Offset/Size  Proto   Name           Meaning
01F1/1       ALL(1)  setup_sects    The size of the setup in sectors
0218/4       2.00+   ramdisk_image  initrd load address (set by boot loader)
021C/4       2.00+   ramdisk_size   initrd size (set by boot loader)
0228/4       2.02+   cmd_line_ptr   32-bit pointer to the kernel command line
Source: The Linux/x86 Boot Protocol
bzImage has the magic number HdrS at offset 0x202 as shown below, so it is worth checking for it first before reading the other header fields.
$ hexdump -C bzImage -s 0x202 -n 4
00000202 48 64 72 53 |HdrS|
00000206
This time I placed the kernel at 0x100000 and set the initial value of RIP to the same address. The Real Mode code and a bootloader are out of scope this time. The Protected Mode code starts (setup_sects + 1) * 512 bytes from the beginning of bzImage, where setup_sects is the field from the boot protocol.
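As a rough sketch of the header handling described above (not the gokvm code itself), the function below checks the HdrS magic and computes where the Protected Mode code starts. The function name and the synthetic test image are made up; the rule that a setup_sects of 0 means 4 comes from the boot protocol.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	offSetupSects = 0x1f1 // setup_sects, 1 byte
	offMagic      = 0x202 // "HdrS", 4 bytes
)

// protectedModeOffset returns the file offset of the Protected Mode kernel
// inside a bzImage, i.e. (setup_sects + 1) * 512.
func protectedModeOffset(bzImage []byte) (int, error) {
	if len(bzImage) < offMagic+4 {
		return 0, errors.New("image too short to contain a boot header")
	}
	if string(bzImage[offMagic:offMagic+4]) != "HdrS" {
		return 0, errors.New("HdrS magic not found at offset 0x202")
	}
	setupSects := int(bzImage[offSetupSects])
	if setupSects == 0 {
		setupSects = 4 // per the boot protocol, 0 means 4 for backwards compatibility
	}
	return (setupSects + 1) * 512, nil
}

func main() {
	// A synthetic header for demonstration; a real caller would read bzImage from disk.
	img := make([]byte, 0x400)
	img[offSetupSects] = 27 // made-up value
	copy(img[offMagic:], "HdrS")

	off, err := protectedModeOffset(img)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("protected mode code starts at file offset %#x\n", off)
}
```

The bytes from that offset to the end of the file are what gets copied to guest physical address 0x100000.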
I placed initrd at 0x0f000000. I’m cutting corners on this initrd address implementation, so I want to fix it later. By the way, kvmtool dynamically searches for free memory regions 9. In summary, the overall memory map is as follows:
InitialRegState GuestPhysAddr Binary files [+ offsets in the file]
0x00000000 +------------------+
| |
RSI --> 0x00010000 +------------------+ bzImage [+ 0]
| |
| boot protocol |
| |
+------------------+
| |
0x00020000 +------------------+
| |
| cmdline |
| |
+------------------+
| |
RIP --> 0x00100000 +------------------+ bzImage [+ 512 * (setup_sects + 1)]
| |
| Protected Mode |
| Kernel |
| |
+------------------+
| |
0x0f000000 +------------------+ initrd [+ 0]
| |
| initrd |
| |
+------------------+
| |
0x40000000 +------------------+
Kernel command line parameters should be written as a null-terminated string at the address given by cmd_line_ptr. At this point, when I boot it, I can get the kernel boot messages via IO port 0x3f8. The init process startup is still to come.
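The following sketch shows one way to fill in the boot parameters according to the memory map above. The addresses come from the diagram and the field offsets from the boot protocol table; the function name setupBootParams and the small demo buffer are assumptions for illustration, and a real VMM would operate on the full guest memory.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	bootParamAddr = 0x10000    // where the boot protocol header is placed
	cmdlineAddr   = 0x20000    // where the command line string is placed
	initrdAddr    = 0x0f000000 // where initrd is placed

	offRamdiskImage = 0x218
	offRamdiskSize  = 0x21c
	offCmdLinePtr   = 0x228
)

// setupBootParams writes the null-terminated command line into guest memory
// and points the boot header fields at the command line and initrd.
func setupBootParams(mem []byte, cmdline string, initrdSize uint32) error {
	if cmdlineAddr+len(cmdline)+1 > len(mem) {
		return fmt.Errorf("guest memory too small: %d bytes", len(mem))
	}
	n := copy(mem[cmdlineAddr:], cmdline)
	mem[cmdlineAddr+n] = 0 // null terminator

	hdr := mem[bootParamAddr:]
	binary.LittleEndian.PutUint32(hdr[offRamdiskImage:], initrdAddr)
	binary.LittleEndian.PutUint32(hdr[offRamdiskSize:], initrdSize)
	binary.LittleEndian.PutUint32(hdr[offCmdLinePtr:], cmdlineAddr)
	return nil
}

func main() {
	// Only the low part of guest memory is allocated for this demo.
	mem := make([]byte, 0x30000)
	if err := setupBootParams(mem, "console=ttyS0", 4096); err != nil {
		fmt.Println("error:", err)
		return
	}
	ptr := binary.LittleEndian.Uint32(mem[bootParamAddr+offCmdLinePtr:])
	fmt.Printf("cmd_line_ptr=%#x cmdline=%q\n", ptr, mem[ptr:ptr+13])
}
```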
2021/2/11 Add 8250 UART Serial Emulation 95a61ba
The kernel boot messages were output, but for some reason there was no output from user processes. I patched the kernel at the serial initialization point for debugging and modified Busybox’s inittab, but I couldn’t figure out the cause. I then ran strace on QEMU and kvmtool to inspect their ioctl calls, and realized that I needed to emulate the serial port properly.
The kernel initializes the serial port during boot. Simply passing IO port reads and writes through is not enough for this initialization; the device has to mimic the 8250 family, including things like interrupts and queue mechanisms.
I mimicked the behavior of the necessary registers (RBR, THR, IER, LCR, etc.) while looking at the 8250 family specifications. I ignored registers that aren’t very relevant and cut corners on the implementation. For interrupts, I’m raising them with KVM_IRQ_LINE by changing Level from 0 to 1, but I am not sure whether this is correct.
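To give a feel for what such register emulation looks like, here is a heavily simplified sketch. It is not the gokvm implementation: the type and method names are hypothetical, only a handful of registers are handled, and interrupt delivery via KVM_IRQ_LINE is left out entirely.

```go
package main

import "fmt"

const (
	com1Base = 0x3f8

	regData = 0 // RBR on read, THR on write (divisor latch low when DLAB=1)
	regIER  = 1 // interrupt enable (divisor latch high when DLAB=1)
	regLCR  = 3 // line control
	regLSR  = 5 // line status

	lcrDLAB      = 0x80
	lsrDataReady = 0x01
	lsrTHREmpty  = 0x20
	lsrTxEmpty   = 0x40
)

// uart holds the minimal state of an 8250-like device.
type uart struct {
	out      func(b byte) // called for every byte the guest transmits
	rx       []byte       // bytes waiting to be read by the guest
	ier, lcr byte
	divisor  [2]byte
}

// write handles a one-byte OUT instruction from the guest.
func (u *uart) write(port uint16, v byte) {
	reg := port - com1Base
	dlab := u.lcr&lcrDLAB != 0
	switch {
	case reg == regData && dlab:
		u.divisor[0] = v
	case reg == regIER && dlab:
		u.divisor[1] = v
	case reg == regData:
		u.out(v)
	case reg == regIER:
		u.ier = v
	case reg == regLCR:
		u.lcr = v
	}
}

// read handles a one-byte IN instruction from the guest.
func (u *uart) read(port uint16) byte {
	reg := port - com1Base
	dlab := u.lcr&lcrDLAB != 0
	switch {
	case reg == regData && dlab:
		return u.divisor[0]
	case reg == regIER && dlab:
		return u.divisor[1]
	case reg == regData:
		if len(u.rx) == 0 {
			return 0
		}
		b := u.rx[0]
		u.rx = u.rx[1:]
		return b
	case reg == regIER:
		return u.ier
	case reg == regLCR:
		return u.lcr
	case reg == regLSR:
		s := byte(lsrTHREmpty | lsrTxEmpty) // the transmitter is always ready
		if len(u.rx) > 0 {
			s |= lsrDataReady
		}
		return s
	}
	return 0
}

func main() {
	u := &uart{out: func(b byte) { fmt.Printf("%c", b) }}
	for _, c := range []byte("hello\n") {
		u.write(com1Base+regData, c)
	}
	u.rx = append(u.rx, 'x') // pretend the user pressed a key
	fmt.Printf("LSR=%#x\n", u.read(com1Base+regLSR))
}
```

A VMM would call write and read from its KVM_EXIT_IO handler whenever the exit refers to ports 0x3f8 through 0x3ff.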
With this implementation, the login prompt is displayed and key input also works. I placed /etc/passwd and /etc/init.d/rcS appropriately so that I can log in as the root user.
2021/2/12 Custom Implementation of termios 08037be
The part that receives raw input depended on golang.org/x/term. I re-implemented only the necessary parts myself. All it takes is manipulating termios through ioctl.
2021/2/12 Introduce goreleaser 20b2850
A convenient tool that automatically releases based on git tags. I was able to release it as v0.0.1, albeit clumsily.
Future Work
I want to expand device emulation such as disks and networking. I also want to work on OSes other than Linux and architectures other than x86-64.


