bobuhiro11/gokvm - GitHub

Introduction

While adapting gokvm to support vhost-user, I looked into the initialization sequence, and I’m leaving my findings here as a memo. The QEMU documentation, Vhost-user Protocol [1], describes it in detail, but some aspects (exception handling, request order, log output, etc.) are hard to understand without actually running it, so I tried it out.

After some trial and error, I found that it can be tested easily with just QEMU and DPDK. Here, I ran QEMU in server mode and DPDK in client mode. The server is responsible for creating the Unix domain socket for vhost-user, and the client is responsible for connecting to that socket.
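The division of roles between server mode and client mode can be illustrated without QEMU or DPDK at all. The following is only a minimal sketch using Python's standard socket module: the socket path is a temporary, hypothetical one (the article itself uses $HOME/vhost-net0), and the bytes sent are a placeholder, not a real vhost-user message.

```python
import os
import socket
import tempfile

# Hypothetical socket path for this sketch only.
sock_dir = tempfile.mkdtemp()
sock_path = os.path.join(sock_dir, "vhost-net0")

# Server mode (QEMU's role in this article): create the socket and listen.
server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(sock_path)
server.listen(1)

# Client mode (DPDK's role in this article): connect to the existing socket.
client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect(sock_path)
conn, _ = server.accept()

# After the connection is established, the server side sends the first bytes.
conn.sendall(b"request")  # placeholder payload, not a vhost-user message
received = client.recv(7)
print(received)

# Clean up the sockets and the temporary path.
for s in (conn, client, server):
    s.close()
os.unlink(sock_path)
os.rmdir(sock_dir)
```

Either side can take either role. What matters is that the server must already be listening when the client tries to connect.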

Building dpdk-skeleton

DPDK ships with several sample programs, and here I decided to use dpdk-skeleton. It wasn’t included in my distribution’s DPDK package, so I built it from source as follows.

$ git clone git@github.com:DPDK/dpdk.git
$ cd dpdk
$ meson setup -Dexamples=skeleton build
$ cd build
$ ninja
$ file ./examples/dpdk-skeleton
./examples/dpdk-skeleton: ELF 64-bit LSB shared object, x86-64, ...

Starting the Virtual Machine

I booted Cirros, a lightweight VM image, with QEMU. Here, path=$HOME/vhost-net0 is the path of the Unix domain socket, and logfile=$HOME/vhost-net0.log writes the communication to a log file (this is needed for the investigation later). Since QEMU runs in server mode this time, I set server=on.

$ wget https://github.com/eprasad/virt-cirros/raw/master/virt-cirros-0.3.4-x86_64-disk.img
$ sudo qemu-system-x86_64 -enable-kvm \
  -hda ./virt-cirros-0.3.4-x86_64-disk.img \
  -nographic -serial mon:stdio \
  -chardev socket,id=char0,path=$HOME/vhost-net0,server=on,logfile=$HOME/vhost-net0.log \
  -netdev type=vhost-user,id=mynet0,chardev=char0,vhostforce=on \
  -device virtio-net-pci,mac=aa:aa:aa:aa:aa:aa,netdev=mynet0

Starting dpdk-skeleton

When dpdk-skeleton starts, it immediately tries to connect to the socket prepared by QEMU. After connection, negotiation runs according to the Vhost-user Protocol. You can check what messages are exchanged by looking at standard output.

$ sudo bash -c "echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages"
$ sudo ./examples/dpdk-skeleton -l 1 --no-pci \
  --vdev=net_tap0,iface=tap0 \
  --vdev=net_vhost0,iface=$HOME/vhost-net0,client=1,queues=1
...
VHOST_CONFIG: vhost-user client: socket created, fd: 43
VHOST_CONFIG: new device, handle is 0, path is /home/bobuhiro11/vhost-net0

Core 1 forwarding packets. [Ctrl+C to quit]
VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
VHOST_CONFIG: read message VHOST_USER_GET_PROTOCOL_FEATURES
VHOST_CONFIG: read message VHOST_USER_SET_PROTOCOL_FEATURES
VHOST_CONFIG: negotiated Vhost-user protocol features: 0xcbf
VHOST_CONFIG: read message VHOST_USER_GET_QUEUE_NUM
VHOST_CONFIG: read message VHOST_USER_SET_SLAVE_REQ_FD
VHOST_CONFIG: read message VHOST_USER_SET_OWNER
VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:0 file:48
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:1 file:49
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: set queue enable: 1 to qp idx: 0
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: set queue enable: 1 to qp idx: 1
VHOST_CONFIG: read message VHOST_USER_SET_FEATURES
VHOST_CONFIG: negotiated Virtio features: 0x7000ffc3

Checking Negotiation Contents

Let’s look at the log file that QEMU wrote through the logfile option.

$ hexdump $HOME/vhost-net0.log
0000000 0001 0000 0001 0000 0000 0000 000f 0000
0000010 0001 0000 0000 0000 0010 0000 0001 0000
0000020 0008 0000 0cbf 0000 0000 0000 0011 0000
0000030 0001 0000 0000 0000 0015 0000 0009 0000
0000040 0000 0000 0003 0000 0001 0000 0000 0000
0000050 0001 0000 0001 0000 0000 0000 000d 0000
0000060 0001 0000 0008 0000 0000 0000 0000 0000
0000070 000d 0000 0001 0000 0008 0000 0001 0000
0000080 0000 0000 0012 0000 0001 0000 0008 0000
0000090 0000 0000 0001 0000 0012 0000 0001 0000
00000a0 0008 0000 0001 0000 0001 0000 0002 0000
00000b0 0001 0000 0008 0000 ffc3 7000 0000 0000
00000c0
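When reading this dump, keep in mind that hexdump without options prints 16-bit words in host byte order, which is little-endian on the x86-64 machine used here. So 0cbf 0000 0000 0000 corresponds to the bytes bf 0c 00 00 00 00 00 00. As a rough sketch, the raw bytes can be rebuilt from the text like this (the function name is made up for illustration, and "*" lines that hexdump emits for repeated data are not handled):

```python
def hexdump_to_bytes(text: str) -> bytes:
    """Rebuild raw bytes from default `hexdump` output (16-bit words)."""
    out = bytearray()
    for line in text.splitlines():
        fields = line.split()
        # The first field is the offset; the remaining fields are 16-bit words.
        for word in fields[1:]:
            # Each word was read as little-endian, so the low byte comes first.
            out += int(word, 16).to_bytes(2, "little")
    return bytes(out)


first_line = "0000000 0001 0000 0001 0000 0000 0000 000f 0000"
raw = hexdump_to_bytes(first_line)
print(raw.hex(" "))
```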

The Vhost-user protocol is a binary format, and it can be decoded by checking it against the specification [2]. The stream is a sequence of variable-length messages, and each message consists of a header and a payload as follows:

  • Header (32-bit request type, 32-bit flags, 32-bit size)
  • Payload (size matches the header’s size field)

With this in mind, the log can be read from the beginning as follows. Note that the log contains only the messages sent by QEMU; the replies from DPDK do not appear in it.

  1. VHOST_USER_GET_FEATURES (1) is sent from QEMU to DPDK. The flags field is 1 (protocol version 1) and the size is 0, so there is no payload. DPDK replies with a 64-bit integer in which each bit indicates whether a feature is supported. One such feature is VHOST_USER_F_PROTOCOL_FEATURES.
0001 0000 0001 0000 0000 0000
  2. VHOST_USER_GET_PROTOCOL_FEATURES (15) is sent. DPDK replies with a 64-bit integer.
000f 0000 0001 0000 0000 0000
  3. VHOST_USER_SET_PROTOCOL_FEATURES (16) is sent. The flags field is 1 and the size is 8. In other words, QEMU sends the payload 0xcbf to DPDK as the result of the negotiation.
0010 0000 0001 0000 0008 0000 0cbf 0000 0000 0000

Each bit of 0xcbf can be interpreted as follows:

#define VHOST_USER_PROTOCOL_F_MQ                    0
#define VHOST_USER_PROTOCOL_F_LOG_SHMFD             1
#define VHOST_USER_PROTOCOL_F_RARP                  2
#define VHOST_USER_PROTOCOL_F_REPLY_ACK             3
#define VHOST_USER_PROTOCOL_F_MTU                   4
#define VHOST_USER_PROTOCOL_F_BACKEND_REQ           5
#define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN          6
#define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION        7
#define VHOST_USER_PROTOCOL_F_PAGEFAULT             8
#define VHOST_USER_PROTOCOL_F_CONFIG                9
#define VHOST_USER_PROTOCOL_F_BACKEND_SEND_FD      10
#define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER        11
#define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD       12
#define VHOST_USER_PROTOCOL_F_RESET_DEVICE         13
#define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS 14
#define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS  15
#define VHOST_USER_PROTOCOL_F_STATUS               16
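To make the bit positions concrete, the definitions above can be applied to 0xcbf mechanically. This is just a small sketch: the list of names mirrors the #define lines with the common prefix dropped, and the helper function is hypothetical.

```python
# Names in bit order (bit 0 first), mirroring the definitions above
# without the VHOST_USER_PROTOCOL_F_ prefix.
PROTOCOL_FEATURES = [
    "MQ", "LOG_SHMFD", "RARP", "REPLY_ACK", "MTU", "BACKEND_REQ",
    "CROSS_ENDIAN", "CRYPTO_SESSION", "PAGEFAULT", "CONFIG",
    "BACKEND_SEND_FD", "HOST_NOTIFIER", "INFLIGHT_SHMFD", "RESET_DEVICE",
    "INBAND_NOTIFICATIONS", "CONFIGURE_MEM_SLOTS", "STATUS",
]


def decode_bits(value: int, names: list) -> list:
    """Return the names whose bit is set in value."""
    return [name for bit, name in enumerate(names) if (value >> bit) & 1]


negotiated = decode_bits(0xCBF, PROTOCOL_FEATURES)
print(negotiated)
```

For 0xcbf this yields bits 0 to 5, 7, 10, and 11: MQ, LOG_SHMFD, RARP, REPLY_ACK, MTU, BACKEND_REQ, CRYPTO_SESSION, BACKEND_SEND_FD, and HOST_NOTIFIER.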
  4. VHOST_USER_GET_QUEUE_NUM (17) is sent to query how many queues the backend (DPDK) supports.
0011 0000 0001 0000 0000 0000
  5. VHOST_USER_SET_BACKEND_REQ_FD (21) is sent. This request used to be called VHOST_USER_SET_SLAVE_REQ_FD, which is the name that appears in the DPDK log. The flags field is 9: in addition to version 1, bit 3 (need_reply) is set.
0015 0000 0009 0000 0000 0000
  6. VHOST_USER_SET_OWNER (3) is sent. This marks QEMU as the front-end that owns the session.
0003 0000 0001 0000 0000 0000
  7. VHOST_USER_GET_FEATURES (1) is sent again.
0001 0000 0001 0000 0000 0000
  8. VHOST_USER_SET_VRING_CALL (13) is sent for vring 0. The payload is a 64-bit integer whose low 8 bits hold the vring index. The file descriptor itself is passed as ancillary data, so it does not appear in the log.
000d 0000 0001 0000 0008 0000 0000 0000 0000 0000
  9. VHOST_USER_SET_VRING_CALL (13) is sent again, this time for vring 1.
000d 0000 0001 0000 0008 0000 0001 0000 0000 0000
  10. VHOST_USER_SET_VRING_ENABLE (18) is sent. It tells DPDK, the backend, to enable vring 0. The payload is a 32-bit vring index followed by a 32-bit value, where 1 means enable.
0012 0000 0001 0000 0008 0000 0000 0000 0001 0000
  11. VHOST_USER_SET_VRING_ENABLE (18) is sent again, this time for vring 1.
0012 0000 0001 0000 0008 0000 0001 0000 0001 0000
  12. VHOST_USER_SET_FEATURES (2) is sent. The payload is 0x7000ffc3, which includes VHOST_USER_F_PROTOCOL_FEATURES (bit 30) among others.
0002 0000 0001 0000 0008 0000 ffc3 7000 0000 0000
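A few fields in the walkthrough above deserve a closer look. According to the specification, the lower 2 bits of the flags field are the protocol version, bit 2 marks a reply, and bit 3 (need_reply) asks the peer to acknowledge the request. The payload of VHOST_USER_SET_VRING_ENABLE is a 32-bit vring index followed by a 32-bit value. The sketch below decodes these; the helper names are hypothetical, and little-endian byte order is assumed as on the x86-64 host used here.

```python
import struct


def decode_flags(flags: int):
    """Return (version, is_reply, need_reply) from a header flags field."""
    version = flags & 0x3            # lower 2 bits: protocol version
    is_reply = bool(flags & 0x4)     # bit 2: this message is a reply
    need_reply = bool(flags & 0x8)   # bit 3: sender wants an acknowledgement
    return version, is_reply, need_reply


def decode_vring_state(payload: bytes):
    """Return (index, num) from an 8-byte vring state payload."""
    return struct.unpack("<II", payload)


print(decode_flags(0x1))   # flags of most requests in the log
print(decode_flags(0x9))   # flags of VHOST_USER_SET_BACKEND_REQ_FD
# Payload of the second VHOST_USER_SET_VRING_ENABLE: index 1, num 1.
print(decode_vring_state(bytes.fromhex("01000000 01000000")))
# Bit 30 of the negotiated Virtio features (VHOST_USER_F_PROTOCOL_FEATURES).
print(bool((0x7000FFC3 >> 30) & 1))
```

The flags value 9 on VHOST_USER_SET_BACKEND_REQ_FD is consistent with REPLY_ACK (bit 3 of 0xcbf) having been negotiated just before it.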

Conclusion

By combining QEMU with a sample application that ships with DPDK, I was able to trace the negotiation process. With this understanding, I should be able to start implementing vhost-user in gokvm.