bobuhiro11/gokvm - GitHub

Introduction

This continues my earlier gokvm development posts [1][2][3].

I’ve been working on letting VMs on gokvm handle PCI devices. The road is long, but my ultimate goal is IP connectivity between the VM and the outside world via virtio-net. So far I’ve gotten the guest kernel to recognize the virtio-net device as a network interface, so I’m writing up my progress to that point. The work divides into two parts: (1) making the guest Linux kernel recognize the virtio-net device as a PCI device, and (2) registering it as a network interface by completing virtio-net device initialization. Virtqueue operations and packet exchange are not covered in this article. As usual, I’ll walk through the progress commit by commit.

fc02176d Add lspci Command

busybox includes the lspci command, but the pci.ids file [4] did not exist in the guest. pci.ids maps numeric IDs, such as vendor IDs and device IDs, to their names; with it, lspci can print its output in a human-readable form. I addressed this so that later debugging would go more smoothly.
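To illustrate what lspci gets out of pci.ids, here is a minimal sketch of a lookup over the file's vendor and device lines. The parser, its names (parseIDs, describe), and the tiny excerpt are hypothetical and are not busybox's implementation; the excerpt only contains the two vendor/device names that appear in this article's lspci output. Vendor lines start with a 4-digit hex ID, device lines are indented by one tab, and subsystem lines (two tabs) and class sections ("C xx") are skipped.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// Illustrative excerpt in pci.ids format (not the full file).
const sample = "# excerpt in pci.ids format\n" +
	"8086  Intel Corporation\n" +
	"1af4  Red Hat, Inc.\n" +
	"\t1000  Virtio network device\n" +
	"C 00  Unclassified device\n" +
	"\t00  Non-VGA unclassified device\n"

// pciIDs maps vendor ID -> name and (vendor, device) -> name.
type pciIDs struct {
	vendors map[uint16]string
	devices map[[2]uint16]string
}

func parseIDs(text string) pciIDs {
	ids := pciIDs{vendors: map[uint16]string{}, devices: map[[2]uint16]string{}}
	var vendor uint16
	inVendor := false
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := sc.Text()
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		body := strings.TrimLeft(line, "\t")
		tabs := len(line) - len(body)
		idStr, name, ok := strings.Cut(body, "  ")
		id, err := strconv.ParseUint(idStr, 16, 16)
		if !ok || len(idStr) != 4 || err != nil {
			if tabs == 0 {
				inVendor = false // e.g. a "C xx" class section: stop attaching devices
			}
			continue
		}
		switch {
		case tabs == 0:
			vendor, inVendor = uint16(id), true
			ids.vendors[vendor] = name
		case tabs == 1 && inVendor:
			ids.devices[[2]uint16{vendor, uint16(id)}] = name
		}
		// tabs >= 2: subsystem entries, ignored in this sketch
	}
	return ids
}

// describe builds "Vendor Device", falling back to "Device xxxx" for unknown devices.
func (ids pciIDs) describe(vendor, device uint16) string {
	v, ok := ids.vendors[vendor]
	if !ok {
		v = fmt.Sprintf("Vendor %04x", vendor) // hypothetical fallback format
	}
	d, ok := ids.devices[[2]uint16{vendor, device}]
	if !ok {
		d = fmt.Sprintf("Device %04x", device)
	}
	return v + " " + d
}

func main() {
	ids := parseIDs(sample)
	fmt.Println(ids.describe(0x1af4, 0x1000))
	fmt.Println(ids.describe(0x8086, 0x0000))
}
```

Without the file, lspci can only print the raw numbers, which is why it was worth adding before debugging PCI enumeration.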

e126392e IO Emulation for PCI Config Space

This is an important step toward having the kernel recognize PCI devices. There seem to be several ways to access PCI Config space; here I used the method called type 1 [5]. It uses the following I/O ports:

  • 0xcf8: the address register, which selects the bus number, device number, function number, and offset within PCI Config space.
  • 0xcfc ~ 0xcff: the data ports.

The address register is 32 bits wide and is laid out as follows:

Position    Content
Bit 31      Enable Bit
Bit 30-24   Reserved
Bit 23-16   Bus Number
Bit 15-11   Device Number
Bit 10-8    Function Number
Bit 7-0     Register Offset

Roughly, the procedure for reading data at a certain offset in PCI Config space is as follows:

  1. Write the bus number, device number, function number, and offset of the target register, encoded as in the table above, to the address register at port 0xcf8.
  2. Perform an I/O read or write on 0xcfc ~ 0xcff to access the data selected by the address register.
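From the guest's side, the two steps amount to composing one 32-bit value and choosing a data port. This is a minimal sketch; encodeAccess is a hypothetical name. As far as I know, the Linux kernel composes the address in a similar way (PCI_CONF1_ADDRESS in arch/x86/pci/direct.c) and picks the data port from the low bits of the register offset.

```go
package main

import "fmt"

// encodeAccess returns the value to write to 0xcf8 and the data port (0xcfc-0xcff)
// to use for a config access at byte `offset` of bus/dev/fn.
func encodeAccess(bus, dev, fn, offset uint8) (addr uint32, port uint16) {
	addr = 1<<31 | // enable bit
		uint32(bus)<<16 |
		uint32(dev&0x1f)<<11 |
		uint32(fn&0x7)<<8 |
		uint32(offset&0xfc) // register offset, 4-byte aligned
	port = 0xcfc + uint16(offset&0x3) // byte position inside the 4-byte word
	return addr, port
}

func main() {
	// Step 1: outl addr -> 0xcf8. Step 2: in/out on the returned data port.
	addr, port := encodeAccess(0, 1, 0, 0x00) // vendor ID of device 1 on bus 0
	fmt.Printf("outl %#x -> 0xcf8, then inw from %#x\n", addr, port)
}
```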

The guest kernel verifies that type 1 access works in the pci_check_type1 function [5], but for some reason this check failed. I didn’t understand the cause at this point, so I decided to deal with it in the next commit.

5952d2dc Pass pci_sanity_check

As I dug into it, I realized I was handling the offset within PCI Config space incorrectly. The Register Offset in bits 7-0 of the address register can only specify 4-byte-aligned offsets; the byte position within that 4-byte word is given by which data port (0xcfc ~ 0xcff) is accessed, relative to 0xcfc. In other words, the correct offset is:

offset = ((address register) & 0xfc) + ((I/O port used for the data access) - 0xcfc)
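On the emulator side the formula becomes a one-liner. The sketch below (configOffset and readConfig are hypothetical names, and the config space contents are example values) also hints at why the bug surfaced: as far as I can tell from arch/x86/pci/direct.c, pci_sanity_check reads the 2-byte class code at offset 0x0a (PCI_CLASS_DEVICE), which arrives as a word access on port 0xcfe while the address register points at offset 0x08.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// configOffset combines the 4-byte-aligned offset from the address register
// with the byte position implied by the data port (0xcfc-0xcff).
func configOffset(addr uint32, port uint16) int {
	return int(addr&0xfc) + int(port-0xcfc)
}

// readConfig returns `size` bytes (1, 2 or 4) of a 256-byte config space as a
// little-endian value, the way a guest "in" on `port` would see them.
func readConfig(space *[256]byte, addr uint32, port uint16, size int) uint32 {
	if port < 0xcfc || port > 0xcff {
		return 0xffffffff // not a data port
	}
	off := configOffset(addr, port)
	end := off + size
	if end > len(space) {
		end = len(space)
	}
	var buf [4]byte
	copy(buf[:], space[off:end])
	return binary.LittleEndian.Uint32(buf[:])
}

func main() {
	var space [256]byte
	binary.LittleEndian.PutUint16(space[0x00:], 0x8086) // vendor ID (example)
	space[0x0a], space[0x0b] = 0x00, 0x06                // subclass, class: host bridge
	fmt.Printf("vendor ID: %#x\n", readConfig(&space, 0x80000800, 0xcfc, 2))
	fmt.Printf("class:     %#x\n", readConfig(&space, 0x80000808, 0xcfe, 2))
}
```

Ignoring the port (always using 0xcfc) would return the revision ID and prog IF at offsets 0x08-0x09 instead of the class code.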

After fixing this, the kernel proceeded to probe for PCI devices.

[    0.673777][    T1] PCI: Probing PCI hardware
[    0.674359][    T1] PCI: root bus 00: using default resources
[    0.675125][    T1] PCI: Probing PCI hardware (bus 00)
[    0.675895][    T1] PCI host bridge to bus 0000:00
[    0.676143][    T1] pci_bus 0000:00: root bus resource [io  0x0000-0xffff]
[    0.677058][    T1] pci_bus 0000:00: root bus resource [mem 0x00000000-0x7fffffffff]
[    0.678079][    T1] pci_bus 0000:00: No busn resource found for root bus, will use [bus 00-ff]
[    0.679222][    T1] pci_bus 0000:00: scanning bus

6a63e515 Register PCI Device

Since there can be multiple PCI devices, I made PCI Config space representable as a structure. I then registered two devices, the virtio-net device and a host bridge, and both were detected correctly. Feels good.

/ # lspci
00:00.0 Non-VGA unclassified device: Intel Corporation Device 0000
00:01.0 Non-VGA unclassified device: Red Hat, Inc. Virtio network device
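One way to represent a Config space header as a Go structure is sketched below. The field names are my own, the layout follows the standard 64-byte header type 0x0, and serializing it with encoding/binary is just one possible choice, not necessarily what gokvm does. The slice index stands in for the device number, matching 00:00.0 and 00:01.0 in the lspci output above.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// typeZeroHeader is the 64-byte PCI Config space header (header type 0x0).
type typeZeroHeader struct {
	VendorID, DeviceID             uint16    // 0x00, 0x02
	Command, Status                uint16    // 0x04, 0x06
	RevisionID, ProgIF             uint8     // 0x08, 0x09
	Subclass, ClassCode            uint8     // 0x0a, 0x0b
	CacheLineSize, LatencyTimer    uint8     // 0x0c, 0x0d
	HeaderType, BIST               uint8     // 0x0e, 0x0f
	BAR                            [6]uint32 // 0x10-0x27
	CardbusCISPointer              uint32    // 0x28
	SubsystemVendorID, SubsystemID uint16    // 0x2c, 0x2e
	ExpansionROMBase               uint32    // 0x30
	CapabilitiesPointer            uint8     // 0x34
	Reserved                       [7]uint8  // 0x35-0x3b
	InterruptLine, InterruptPin    uint8     // 0x3c, 0x3d
	MinGrant, MaxLatency           uint8     // 0x3e, 0x3f
}

// Bytes serializes the header in little-endian order, as the guest reads it.
func (h *typeZeroHeader) Bytes() []byte {
	var buf bytes.Buffer
	// Writing a fixed-size struct into a bytes.Buffer does not fail.
	_ = binary.Write(&buf, binary.LittleEndian, h)
	return buf.Bytes()
}

func main() {
	bus0 := []typeZeroHeader{
		{VendorID: 0x8086, DeviceID: 0x0000}, // device 0: host bridge
		{VendorID: 0x1af4, DeviceID: 0x1000}, // device 1: virtio-net
	}
	for dev, h := range bus0 {
		b := h.Bytes()
		fmt.Printf("00:%02x.0 %04x:%04x (%d bytes)\n", dev,
			binary.LittleEndian.Uint16(b[0:]), binary.LittleEndian.Uint16(b[2:]), len(b))
	}
}
```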

e4c89328 Add ethtool Command

From here on, the goal is to register the virtio-net device as a network interface, which is tedious to debug without the ethtool command, so I added it. As usual, I built it statically and included it in the initrd.

b66267c1 Move virtio-net into a Separate Package

Refactoring. I split the code into two packages: the pci package, which handles PCI devices in general, and the virtio package, which handles virtio-net. PCI devices are now abstracted behind a Go interface.
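A minimal sketch of what such an interface could look like, assuming the pci package only needs device IDs, an I/O window, and in/out callbacks. The interface and all method and type names here are hypothetical, not gokvm's actual API; the stub device merely echoes the last byte written.

```go
package main

import "fmt"

// Device is a hypothetical interface that lets the pci package treat every
// emulated PCI device uniformly.
type Device interface {
	IDs() (vendor, device uint16)          // identification for Config space
	IORange() (start, end uint16, ok bool) // I/O window behind BAR0, end exclusive
	IOIn(port uint16, data []byte) error   // guest executed "in" inside the window
	IOOut(port uint16, data []byte) error  // guest executed "out" inside the window
}

// bridge models the host bridge: Config space only, no I/O window.
type bridge struct{}

func (bridge) IDs() (uint16, uint16)           { return 0x8086, 0x0000 }
func (bridge) IORange() (uint16, uint16, bool) { return 0, 0, false }
func (bridge) IOIn(uint16, []byte) error       { return nil }
func (bridge) IOOut(uint16, []byte) error      { return nil }

// virtioNet is a stub standing in for a device from the virtio package.
type virtioNet struct{ last byte }

func (*virtioNet) IDs() (uint16, uint16)           { return 0x1af4, 0x1000 }
func (*virtioNet) IORange() (uint16, uint16, bool) { return 0x6200, 0x6300, true }

func (v *virtioNet) IOIn(_ uint16, data []byte) error {
	for i := range data {
		data[i] = v.last
	}
	return nil
}

func (v *virtioNet) IOOut(_ uint16, data []byte) error {
	if len(data) > 0 {
		v.last = data[len(data)-1]
	}
	return nil
}

// Compile-time checks that both types satisfy Device.
var (
	_ Device = bridge{}
	_ Device = (*virtioNet)(nil)
)

func main() {
	devices := []Device{bridge{}, &virtioNet{}} // index = device number on bus 0
	for i, d := range devices {
		vendor, device := d.IDs()
		fmt.Printf("00:%02x.0 %04x:%04x\n", i, vendor, device)
	}
}
```

The pci package then only has to iterate over a slice of Device values, and adding another device type does not touch its code.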

a039f674 Add virtio_net.c to dyndbg

I enabled dynamic debug output for drivers/net/virtio_net.c [6] through the guest kernel’s dyndbg kernel parameter. This makes the debug logs in that file visible, which makes implementation much easier.
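A sketch of how such a kernel command line could be assembled on the host side. The console parameter and the exact +pfl flags are assumptions: p enables the pr_debug() call sites matched by the file query, and f and l prefix each message with the function name and line number, which is what the xmit_skb:1648: prefix in the log at the end of this article looks like. guestCmdline is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// guestCmdline joins kernel parameters into a single command line.
func guestCmdline(params ...string) string {
	return strings.Join(params, " ")
}

func main() {
	cmdline := guestCmdline(
		"console=ttyS0", // hypothetical: other parameters depend on the setup
		// Enable pr_debug() in virtio_net.c, prefixed with function name and line.
		`dyndbg="file drivers/net/virtio_net.c +pfl"`,
	)
	fmt.Println(cmdline)
}
```

The query value is quoted because it contains spaces.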

779cbcd8 Set Interrupt and BAR in PCI Config Space

PCI Config space has an Interrupt Line field, and I assigned IRQ 9 to the virtio-net device (I used it because it was free, but I’m not sure that was okay). A BAR (Base Address Register) holds the base address of a device-specific region in either I/O space or physical memory space: if bit 0 is 1, it is an I/O space address; if it is 0, a physical memory space address. For a virtio-net device (at least in legacy mode, I think), BAR0 points to the virtio-specific header. So I assigned the unused I/O address range 0x6200 ~ 0x6300 to BAR0.

Source: Header Type 0x0 - wiki.osdev.org
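A minimal sketch of these two settings, assuming a raw 64-byte type 0 header (BAR0 at offset 0x10, Interrupt Line at offset 0x3c). ioBARValue and decodeBAR are hypothetical helper names; they show how bit 0 separates I/O BARs from memory BARs and which low bits must be stripped to get the base address.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	offBAR0          = 0x10 // BAR0 in a type 0 header
	offInterruptLine = 0x3c
)

// ioBARValue marks base as an I/O space address by setting bit 0.
func ioBARValue(base uint32) uint32 { return base&^0x3 | 0x1 }

// decodeBAR tells I/O BARs from memory BARs and strips the flag bits:
// I/O BARs use the low 2 bits, 32-bit memory BARs the low 4 bits.
func decodeBAR(v uint32) (isIO bool, base uint32) {
	if v&0x1 == 1 {
		return true, v &^ 0x3
	}
	return false, v &^ 0xf
}

func main() {
	var header [64]byte
	header[offInterruptLine] = 9
	binary.LittleEndian.PutUint32(header[offBAR0:], ioBARValue(0x6200))

	isIO, base := decodeBAR(binary.LittleEndian.Uint32(header[offBAR0:]))
	fmt.Printf("irq=%d io=%v base=%#x\n", header[offInterruptLine], isIO, base)
}
```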

77713d06 BAR0 Size Determination

PCI Config space holds BAR0’s base address, but its size isn’t recorded anywhere. So how does the guest determine the size of a device’s BAR0? When the guest writes the special value 0xffffffff to BAR0, the device responds with a value from which the size can be derived [7]:

  1. First write 0xffffffff to BAR0 (offset 0x10)
  2. Then read BAR0 (offset 0x10) back. For example, suppose 0xffffff00 is returned.
  3. Mask off the type bits (the low 2 bits for an I/O BAR), invert every bit, and add 1. For 0xffffff00, inverting gives 0xff, so the size of BAR0 is 0xff + 1 = 0x100 bytes.

7b4d9f10 Add IO Handler for PCI Devices

BAR0 points into I/O space (as the least significant bit of BAR0 indicates), so I registered handlers such that when control returns from KVM to the host process with EXITIOIN or EXITIOOUT on a port in BAR0’s range, the corresponding device’s I/O handler is executed.
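A minimal sketch of such a registration and dispatch, assuming the vCPU loop already knows the port, the direction (EXITIOIN vs EXITIOOUT), and the data buffer of the exit. The ioHandler and ioBus types and the dispatch method are hypothetical; the port range is the BAR0 window assigned earlier.

```go
package main

import "fmt"

// ioHandler is a hypothetical registration entry: a port range plus callbacks.
type ioHandler struct {
	start, end uint16 // [start, end)
	in         func(port uint16, data []byte) error
	out        func(port uint16, data []byte) error
}

type ioBus struct{ handlers []ioHandler }

func (b *ioBus) register(h ioHandler) { b.handlers = append(b.handlers, h) }

// dispatch is what the vCPU loop would call on an I/O exit;
// isIn selects the EXITIOIN path, otherwise the EXITIOOUT path.
func (b *ioBus) dispatch(port uint16, isIn bool, data []byte) error {
	for _, h := range b.handlers {
		if port >= h.start && port < h.end {
			if isIn {
				return h.in(port, data)
			}
			return h.out(port, data)
		}
	}
	return fmt.Errorf("unhandled I/O port %#x", port)
}

func main() {
	var bus ioBus
	var last string
	bus.register(ioHandler{
		start: 0x6200, end: 0x6300, // BAR0 window of the virtio-net device
		in:    func(p uint16, d []byte) error { last = fmt.Sprintf("in %#x", p); return nil },
		out:   func(p uint16, d []byte) error { last = fmt.Sprintf("out %#x", p); return nil },
	})
	_ = bus.dispatch(0x6210, false, []byte{0})
	fmt.Println(last)
	fmt.Println(bus.dispatch(0x70, true, make([]byte, 1)))
}
```

A linear scan is enough for a handful of devices; a sorted slice or map would only matter with many ranges.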

8cb14d5a Register virtio-net Device as Network Interface

For virtio devices, the region BAR0 points to holds the virtio header [8] (citation needed), so I implemented it as a structure. For the virtio-net device to be recognized as a network interface, QUEUE_SEL and QUEUE_PFN in the virtio header must be handled properly. Roughly speaking, during initialization the guest kernel selects a queue number via QUEUE_SEL and writes the physical page number (PFN) of that queue to QUEUE_PFN. The guest physical address of the queue is then QUEUE_PFN * page size (4096 bytes). On the gokvm side, I just need to emulate this behavior. FEATURES and STATUS should also be handled properly, but that wasn’t needed at this stage, so I’m skipping them for now.

Source: Explanation of BitVisor’s virtio-net driver

Conclusion

I was finally able to get the guest to recognize the virtio-net device. Now that I know how the guest and host notify each other and where the virtqueues are located, I want to keep making steady progress on the implementation.

/ # ethtool -i eth0
driver: virtio_net
version: 1.0.0
firmware-version:
expansion-rom-version:
bus-info: 0000:00:01.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

/ # ip link set eth0 up
[   77.178801][    C1] xmit_skb:1648: eth0: xmit 00000000bddcbde5 33:33:00:00:00:02
Queue Notify was written!