Introduction
A progress report on gokvm development 1 2.
Until now, gokvm supported only a single virtual CPU.
I wanted to support SMP (Symmetric Multiprocessing) so that guests can use multiple CPUs, and after two to three weeks of trial and error I got it working.
When I searched for other efforts to build a VMM on top of KVM,
I found few resources that explain what SMP support actually looks like at the implementation level.
This article is rough, but I hope it helps anyone who takes on building their own VMM in the future.
As usual, I'll walk through the progress commit by commit. In reality I started from a much messier implementation
and ran git rebase many times until the commits had a granularity that can be explained, so the commit timestamps are not reliable.
#34 Generate Multiple vCPU Threads
I received a pull request. First, I changed the code so that multiple vCPUs can be created with ioctl(fd,KVM_CREATE_VCPU,...).
Then I create a separate thread for each vCPU and issue ioctl(fd,KVM_RUN,...) independently on each one.
Each vCPU must issue its ioctls from the same thread throughout its lifetime.
Go programs normally use goroutines rather than threads,
so I called runtime.LockOSThread() to pin each goroutine to its own thread.
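The thread pinning described above can be sketched as follows. This is a minimal illustration rather than gokvm's actual code: runVCPUs and its run callback are hypothetical names, and the callback stands in for the loop that would issue ioctl(fd,KVM_RUN,...) for one vCPU.

```go
package main

import (
	"runtime"
	"sync"
)

// runVCPUs starts one goroutine per vCPU and pins each goroutine to its own
// OS thread. run is a hypothetical callback standing in for the per-vCPU
// KVM_RUN loop; its result is stored at the index of the vCPU.
func runVCPUs(n int, run func(cpu int) error) []error {
	errs := make([]error, n)
	var wg sync.WaitGroup
	for cpu := 0; cpu < n; cpu++ {
		wg.Add(1)
		go func(cpu int) {
			defer wg.Done()
			// Pin this goroutine to the current OS thread so that every
			// ioctl for this vCPU is issued from the same thread.
			runtime.LockOSThread()
			defer runtime.UnlockOSThread()
			errs[cpu] = run(cpu)
		}(cpu)
	}
	wg.Wait()
	return errs
}
```

Each goroutine writes only to its own slot of the result slice, so no extra locking is needed to collect the errors.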
ce22a91 Implementation of struct mpf_intel
For the guest kernel to recognize the vCPUs, it has to be given the data structures defined by the Intel MultiProcessor Specification 3.
This data structure corresponds to struct mpc_table pointed to by PhysPtr of struct mpf_intel in the Linux kernel.
Reading the code 4, I could see how the checksum, version, and magic number are checked,
so I only skimmed through the specification.
Where should this data structure be placed? The specification lists three possible locations. I chose the first one, the first 1 KB of the Extended BIOS Data Area (EBDA). The EBDA is typically located at 0x0009FC00 5, so I followed that.
- a. In the first kilobyte of Extended BIOS Data Area (EBDA), or
- b. Within the last kilobyte of system base memory (e.g., 639K-640K for systems with 640 KB of base memory or 511K-512K for systems with 512 KB of base memory) if the EBDA segment is undefined, or
- c. In the BIOS ROM address space between 0F0000h and 0FFFFFh.
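To make the placement rules concrete, here is a rough sketch of how a guest could locate the structure inside one of these regions. It is loosely modeled on the checks mentioned above (magic number, length, checksum) and is not taken from gokvm or the Linux kernel. findMPF is a hypothetical helper, and it assumes the structure is 16 bytes long and 16-byte aligned.

```go
package main

// findMPF scans mem in 16-byte steps for an MP floating pointer structure
// and returns its offset, or -1 if none is found. A candidate must start
// with the "_MP_" signature, have a length field of 1 (counted in 16-byte
// units), and its 16 bytes must sum to zero modulo 256.
func findMPF(mem []byte) int {
	for off := 0; off+16 <= len(mem); off += 16 {
		p := mem[off : off+16]
		if string(p[:4]) != "_MP_" {
			continue
		}
		var sum uint8
		for _, v := range p {
			sum += v // uint8 arithmetic wraps, giving the sum modulo 256
		}
		if p[8] == 1 && sum == 0 {
			return off
		}
	}
	return -1
}
```

With the EBDA at 0x0009FC00 (exactly 639 KiB), scanning the first 1 KB of that region covers option a.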
From this specification I also learned that the CPU responsible for booting is called the Bootstrap Processor (BSP) and the other CPUs are called Application Processors (AP).
In this commit, I only prepared struct mpf_intel.
Since several data structures appear, the diagram below summarizes them as a memory map.
0x0009fc00 (EBDA) ---> +--------------------------------------------------------+
                       |                   16 Bytes Alignment                   |
                       |                                                        |
                       |                                                        |
                       |                                                        |
0x0009fc00 + 0x30 ---> +--------------------------------------------------------+
                       |  struct mpf_intel (16 Bytes)                           |
                       |                                                        |
                       |  // physical address of struct mpc_table               |
                       |  .phys_ptr = EBDA start + 0x40                         |
                       |  .signature = "_MP_"                                   |
                       |  .length = 1                                           |
                       |  .specification = 4                                    |
                       |                                                        |
0x0009fc00 + 0x40 ---> +--------------------------------------------------------+
                       |  struct mpc_table                                      |
                       |                                                        |
                       |  .lapic = 0xFEE00000                                   |
                       |  .signature = "PCMP"                                   |
                       |  .length = 1                                           |
                       |  .specification = 4                                    |
                       |  // number of entries (mpc_cpu, mpc_bus, etc.)         |
                       |  .oem_count = 2                                        |
                       |                                                        |
                       |  +-------------------------------------------------+   |
                       |  | struct mpc_cpu (for processor #0)               |   |
                       |  |   .type = 0                                     |   |
                       |  |   .apic_id = 0                                  |   |
                       |  |   .cpu_flag = ENABLE_PROCESSOR | BOOT_PROCESSOR |   |
                       |  +-------------------------------------------------+   |
                       |                                                        |
                       |  +-------------------------------------------------+   |
                       |  | struct mpc_cpu (for processor #1)               |   |
                       |  |   .type = 0                                     |   |
                       |  |   .apic_id = 1                                  |   |
                       |  |   .cpu_flag = ENABLE_PROCESSOR                  |   |
                       |  +-------------------------------------------------+   |
                       |                                                        |
                       +--------------------------------------------------------+
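As an illustration of the memory map, struct mpf_intel could be serialized like this. It is a sketch under assumptions, not gokvm's code: the field layout follows struct mpf_intel in the Linux kernel, the addresses come from the map above, and buildMPFIntel is a hypothetical name.

```go
package main

import (
	"bytes"
	"encoding/binary"
)

const (
	ebdaStart = 0x0009fc00 // typical EBDA address used in this article
	mpcOffset = 0x40       // struct mpc_table follows at EBDA start + 0x40
)

// mpfIntel mirrors the 16-byte MP floating pointer structure.
type mpfIntel struct {
	Signature     [4]byte
	PhysPtr       uint32 // physical address of struct mpc_table
	Length        uint8  // length in 16-byte units
	Specification uint8  // 4 means specification version 1.4
	Checksum      uint8
	Feature       [5]uint8
}

// buildMPFIntel returns the little-endian bytes of the structure with the
// checksum chosen so that all 16 bytes sum to zero modulo 256.
func buildMPFIntel() ([]byte, error) {
	m := mpfIntel{
		Signature:     [4]byte{'_', 'M', 'P', '_'},
		PhysPtr:       ebdaStart + mpcOffset,
		Length:        1,
		Specification: 4,
	}
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, &m); err != nil {
		return nil, err
	}
	b := buf.Bytes()
	var sum uint8
	for _, v := range b {
		sum += v
	}
	b[10] = 0 - sum // the checksum field sits at byte offset 10
	return b, nil
}
```

The resulting 16 bytes would then be copied into guest memory at EBDA start + 0x30.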
3478e15 Implementation of struct mpc_table
I placed struct mpc_table immediately after struct mpf_intel. I'm not sure whether this location is correct.
Here I set the address of the Local APIC (LAPIC). The LAPIC is typically mapped at 0xFEE00000, so I followed that.
When a vCPU accesses this memory, KVM appears to redirect the access to the LAPIC emulated by the host kernel,
so I did not have to trap the memory access and emulate it in userland.
Strictly speaking, the struct mpc_cpu entries should be laid out contiguously in memory after struct mpc_table,
but for simplicity I embedded struct mpc_cpu as a member of struct mpc_table this time.
The table was successfully recognized at boot time.
$ dmesg
[ 0.031995][ T0] Intel MultiProcessor Specification v1.4
[ 0.033955][ T0] mpc: 9fc40-9fc6a
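The table header with the LAPIC address can be sketched as follows. This is an assumption-laden example, not gokvm's code: the field layout follows struct mpc_table in the Linux kernel (a 44-byte header), buildMPCTable is a hypothetical name, and the entries are passed in as already serialized bytes instead of being embedded as a struct member.

```go
package main

import (
	"bytes"
	"encoding/binary"
)

const lapicAddr = 0xFEE00000 // typical Local APIC base address

// mpcTable mirrors the header of the MP configuration table.
type mpcTable struct {
	Signature [4]byte
	Length    uint16 // header plus all entries, in bytes
	Spec      uint8
	Checksum  uint8
	OEM       [8]byte
	ProductID [12]byte
	OEMPtr    uint32
	OEMSize   uint16
	OEMCount  uint16 // used as the number of entries
	LAPIC     uint32
	Reserved  uint32
}

// buildMPCTable serializes the header followed by entries and sets the
// checksum so that the whole table sums to zero modulo 256.
func buildMPCTable(entries []byte, count uint16) ([]byte, error) {
	hdr := mpcTable{
		Signature: [4]byte{'P', 'C', 'M', 'P'},
		Spec:      4,
		OEMCount:  count,
		LAPIC:     lapicAddr,
	}
	hdr.Length = uint16(binary.Size(hdr) + len(entries))
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, &hdr); err != nil {
		return nil, err
	}
	buf.Write(entries)
	b := buf.Bytes()
	var sum uint8
	for _, v := range b {
		sum += v
	}
	b[7] = 0 - sum // the checksum field sits at byte offset 7
	return b, nil
}
```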
4ac446f Implementation of struct mpc_cpu
struct mpc_table is the header of the table, and it is followed by any number of table entries such as struct mpc_cpu and struct mpc_bus.
For now I set the number of vCPUs to 2.
Entries are distinguished by their Type field.
Source: Intel MultiProcessor Specification 3
Since each vCPU has its own Local APIC, I reused the vCPU's serial number as the APIC ID. Note that each vCPU has flags for enabled/disabled and for BSP/AP. I made only the first vCPU the BSP and all the others APs. The following diagram shows the relationship between the Local APICs and the CPUs, and the distinction between BSP and AP.
Source: Intel MultiProcessor Specification 3
With everything configured correctly up to this point, the vCPUs are detected at boot and the following messages appear.
$ dmesg
[ 0.038781][ T0] MPTABLE: processor found.
[ 0.039497][ T0] Processor #0 (Bootup-CPU)
[ 0.040918][ T0] MPTABLE: processor found.
[ 0.041626][ T0] Processor #1
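The processor entries can be generated with a short loop. The sketch below assumes the 20-byte layout of struct mpc_cpu in the Linux kernel (type, APIC ID, APIC version, CPU flags, then feature fields). buildCPUEntries and the constant names are hypothetical, and every field that is not set explicitly is left zero.

```go
package main

const (
	mpProcessor      = 0      // entry type of a processor entry
	cpuEnabled       = 1 << 0 // the processor is usable
	cpuBootProcessor = 1 << 1 // the processor is the BSP
	mpcCPUSize       = 20     // size of one processor entry in bytes
)

// buildCPUEntries returns n processor entries laid out back to back.
// The vCPU index is reused as the APIC ID, and only vCPU 0 is the BSP.
func buildCPUEntries(n int) []byte {
	out := make([]byte, 0, n*mpcCPUSize)
	for id := 0; id < n; id++ {
		e := make([]byte, mpcCPUSize)
		e[0] = mpProcessor
		e[1] = byte(id) // APIC ID
		flags := byte(cpuEnabled)
		if id == 0 {
			flags |= cpuBootProcessor
		}
		e[3] = flags
		out = append(out, e...)
	}
	return out
}
```

Calling buildCPUEntries(2) gives the two entries shown in the boot log: processor #0 as the boot CPU and processor #1 as an AP.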
e02bd77 Enable CONFIG_SMP in Guest Kernel
For some reason, it was only at this point that I noticed the CONFIG_SMP option.
Enabling it is of course a prerequisite, so I enabled it.
9a53099 EAGAIN Handling for ioctl KVM_RUN
At this stage, when I increased the CPU count to 2, gokvm started panicking.
Investigating, I found that the ioctl KVM_RUN for the AP was returning EAGAIN.
Why EAGAIN? It had to be coming from KVM, so I traced it.
I found that KVM keeps internal state for each vCPU 6.
According to the documentation, an AP is initialized by an INIT signal, and until then
it stays in the KVM_MP_STATE_UNINITIALIZED state.
I confirmed in KVM's code that calling ioctl KVM_RUN in the KVM_MP_STATE_UNINITIALIZED state
returns EAGAIN 7.
KVM_MP_STATE_UNINITIALIZED:
the vcpu is an application processor (AP) which has not yet received an INIT signal [x86]
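For reference, the per-vCPU states can be modeled in Go as below. The numeric values follow the KVM_MP_STATE_* definitions in the Linux UAPI header. The type and the helper runReturnsEAGAIN are hypothetical and only encode the behavior described above, not a real KVM API.

```go
package main

// mpState mirrors the KVM_MP_STATE_* values used on x86.
type mpState uint32

const (
	mpStateRunnable      mpState = 0
	mpStateUninitialized mpState = 1
	mpStateInitReceived  mpState = 2
	mpStateHalted        mpState = 3
	mpStateSIPIReceived  mpState = 4
)

// String returns the name of the state as it appears in the KVM headers.
func (s mpState) String() string {
	switch s {
	case mpStateRunnable:
		return "KVM_MP_STATE_RUNNABLE"
	case mpStateUninitialized:
		return "KVM_MP_STATE_UNINITIALIZED"
	case mpStateInitReceived:
		return "KVM_MP_STATE_INIT_RECEIVED"
	case mpStateHalted:
		return "KVM_MP_STATE_HALTED"
	case mpStateSIPIReceived:
		return "KVM_MP_STATE_SIPI_RECEIVED"
	}
	return "unknown"
}

// runReturnsEAGAIN reports whether KVM_RUN is expected to fail with EAGAIN
// for a vCPU in state s: an AP that has not yet received INIT.
func runReturnsEAGAIN(s mpState) bool {
	return s == mpStateUninitialized
}
```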
I also learned here that only the BSP is running when the guest kernel starts,
and that the APs are started through the INIT-SIPI-SIPI sequence sent via the LAPIC (I still need to investigate the details).
[ 0.713385][ T1] do_boot_cpu:1057: smpboot: Setting warm reset code and vector.
[ 0.715064][ T1] wakeup_secondary_cpu_via_init:805: smpboot: Asserting INIT
[ 0.716266][ T1] wakeup_secondary_cpu_via_init:816: smpboot: Waiting for send to finish...
[ 0.717332][ T1] wakeup_secondary_cpu_via_init:821: smpboot: Deasserting INIT
[ 0.718299][ T1] wakeup_secondary_cpu_via_init:827: smpboot: Waiting for send to finish...
[ 0.719375][ T1] wakeup_secondary_cpu_via_init:846: smpboot: #startup loops: 2
[ 0.720569][ T1] wakeup_secondary_cpu_via_init:849: smpboot: Sending STARTUP #1
[ 0.721341][ T1] wakeup_secondary_cpu_via_init:853: smpboot: After apic_write
[ 0.722505][ T1] wakeup_secondary_cpu_via_init:873: smpboot: Startup point 1
[ 0.723653][ T1] wakeup_secondary_cpu_via_init:875: smpboot: Waiting for send to finish...
[ 0.725354][ T1] wakeup_secondary_cpu_via_init:849: smpboot: Sending STARTUP #2
[ 0.726552][ T1] wakeup_secondary_cpu_via_init:853: smpboot: After apic_write
[ 0.727728][ T1] wakeup_secondary_cpu_via_init:873: smpboot: Startup point 1
[ 0.729328][ T1] wakeup_secondary_cpu_via_init:875: smpboot: Waiting for send to finish...
[ 0.730929][ T1] wakeup_secondary_cpu_via_init:892: smpboot: After Startup
Looking at kvmtool's implementation for reference, it does indeed ignore EAGAIN 8.
c8d5459 Support for Arbitrary Number of vCPUs
Until now the number of vCPUs was fixed at 2. With this commit, an arbitrary number of vCPUs is supported. There is nothing particularly noteworthy here.
Conclusion
I'm glad I got the implementation to the point where it works. Looking back, KVM takes care of most of the work. I learned about the data structures and the processing flow involved in CPU initialization in an SMP environment. There is still a lot I want to do, and I'll keep working on it bit by bit.
During this work I also received some pull requests on github.com. It's a niche area, but I'm happy that people are interested in it.