It is mature enough that it rarely causes trouble, but since it is a technology that will keep being used, understanding its behavior is worthwhile.

Full Virtualization and Paravirtualization of NICs

A quick review. Full virtualization is a form of virtualization in which the guest cannot tell that it is running in a virtual environment. The host has to completely emulate the behavior of the devices the guest uses. This emulation generates a large number of traps, and host-side processing consumes CPU time. On top of that, when device emulation is done by a host-side user process, scheduling waits add further delay.

With paravirtualization, on the other hand, the guest knows that it is running in a virtual environment and cooperates with the host to improve performance. virtio drives virtual devices through device drivers written specifically for paravirtualization.

Architecture

The configuration consists of a front-end driver and a back-end driver connected by vring.

Front-end driver
Runs on the guest side. Sends I/O issued by the guest OS to the back-end driver via vring.
Back-end driver
Runs on the host side. Sends I/O received via vring to physical devices.

virtio.png

https://www.cs.cmu.edu/~412/lectures/Virtio_2015-10-14.pdf

Virtio Specification

OASIS, the standards body that also maintains SAML and ebXML, publishes the specification as Virtual I/O Device (VIRTIO) Version 1.1. It defines the callbacks and the various data structures.

Virtqueue

The part responsible for the actual I/O. By sharing a region of guest physical memory with the host, reads and writes can be performed in both directions. There is also a notification mechanism in both directions: the guest notifies the host via MMIO, and the host notifies the guest via interrupts. Callbacks such as disable_cb and enable_cb let one side tell the other to suppress interrupts; for example, when the guest-side driver does not need interrupts for a while, it tells the host with disable_cb. To send a notification, call kick.

struct virtqueue_ops {
  // Queue a buffer described by a scatterlist; out_num entries are
  // readable by the host, in_num entries are writable by the host
  int (*add_buf)(struct virtqueue *vq,
         struct scatterlist sg[],
         unsigned int out_num,
         unsigned int in_num,
         void *data);
  // Notify the host that new buffers have been added
  void (*kick)(struct virtqueue *vq);
  // Get a buffer the host has finished with (len = bytes written)
  void *(*get_buf)(struct virtqueue *vq, unsigned int *len);
  // Ask the other side to stop sending interrupts for a while
  void (*disable_cb)(struct virtqueue *vq);
  // Re-enable interrupts; returns false if buffers are still pending
  bool (*enable_cb)(struct virtqueue *vq);
};

https://elixir.bootlin.com/linux/v2.6.31/source/include/linux/virtio.h#L61
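
To get a feel for how these callbacks are used, here is a minimal sketch of the usual flow on this v2.6.31-era API: queue a buffer with add_buf, kick the host, and later reclaim completed buffers with get_buf. The names send_one_buffer and reclaim_used are made up for illustration and do not come from the kernel source, and error handling is simplified.

#include <linux/virtio.h>
#include <linux/scatterlist.h>
#include <linux/errno.h>

// Hypothetical transmit path of a front-end driver
static int send_one_buffer(struct virtqueue *vq, void *buf, unsigned int len)
{
  struct scatterlist sg[1];

  sg_init_one(sg, buf, len);

  // One "out" entry (readable by the host), no "in" entries
  if (vq->vq_ops->add_buf(vq, sg, 1, 0, buf) < 0)
    return -ENOSPC;

  // Tell the host that the vring has new work
  vq->vq_ops->kick(vq);
  return 0;
}

// Called from the virtqueue callback (on interrupt): collect the
// buffers the host has finished with
static void reclaim_used(struct virtqueue *vq)
{
  unsigned int len;
  void *buf;

  while ((buf = vq->vq_ops->get_buf(vq, &len)) != NULL)
    ;  // free or reuse buf here
}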

Vring

Vring is an implementation of the Virtqueue specification using a ring queue structure. It consists of three data regions; their definitions are sketched in the code after the list below. I'll cover the detailed behavior of each region another time.

vring_desc
Array of descriptors, each holding a guest physical address and length
vring_avail
Ring in which the guest publishes which descriptors are available to the host
vring_used
Ring in which the host reports which descriptors it has consumed
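
For reference, the corresponding definitions in include/linux/virtio_ring.h look roughly like this (slightly abridged and annotated; flag constants omitted):

struct vring_desc {
  __u64 addr;   // guest physical address of the buffer
  __u32 len;    // length of the buffer
  __u16 flags;  // NEXT / WRITE / INDIRECT
  __u16 next;   // index of the next descriptor in a chain
};

struct vring_avail {
  __u16 flags;
  __u16 idx;    // where the guest will write the next available entry
  __u16 ring[]; // descriptor indices offered to the host
};

struct vring_used_elem {
  __u32 id;     // head index of the completed descriptor chain
  __u32 len;    // number of bytes written by the host
};

struct vring_used {
  __u16 flags;
  __u16 idx;    // where the host will write the next used entry
  struct vring_used_elem ring[];
};

struct vring {
  unsigned int num;          // number of descriptors
  struct vring_desc *desc;
  struct vring_avail *avail;
  struct vring_used *used;
};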

vring.png

https://www.cs.cmu.edu/~412/lectures/Virtio_2015-10-14.pdf

Code Reading

struct virtio_driver corresponds to the front-end driver. A virtio device corresponds to struct virtio_device, through which struct virtio_config_ops and struct virtqueue can be reached.

// Corresponds to the front-end driver
struct virtio_driver {
  struct device_driver driver;
  // Devices this driver handles (type and vendor)
  const struct virtio_device_id *id_table;
  // Feature bits the driver understands
  const unsigned int *feature_table;
  unsigned int feature_table_size;
  // Called when a matching virtio device is found
  int (*probe)(struct virtio_device *dev);
  void (*remove)(struct virtio_device *dev);
  // Called when the device configuration space changes
  void (*config_changed)(struct virtio_device *dev);
};

// Corresponds to a virtio device
struct virtio_device {
  int index;
  struct device dev;
  struct virtio_device_id id;
  // Transport-specific configuration operations (e.g. virtio-pci)
  struct virtio_config_ops *config;
  // List of virtqueues belonging to this device
  struct list_head vqs;
  // Feature bits negotiated with the host
  unsigned long features[1];
  void *priv;
};

https://elixir.bootlin.com/linux/v2.6.31/source/include/linux/virtio.h#L111
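
As a rough illustration of how these structures fit together, a front-end driver built on this v2.6.31-era API registers itself roughly as below. The example_* names are invented; this is not an excerpt from an actual driver.

#include <linux/module.h>
#include <linux/virtio.h>

static struct virtio_device_id example_id_table[] = {
  { 1, VIRTIO_DEV_ANY_ID },  // device type 1 = virtio-net
  { 0 },
};

// Called when the transport (e.g. virtio-pci) finds a matching device
static int example_probe(struct virtio_device *vdev)
{
  // negotiate features, then create virtqueues through vdev->config
  return 0;
}

static void example_remove(struct virtio_device *vdev)
{
}

static struct virtio_driver example_driver = {
  .driver.name  = "example-virtio",
  .driver.owner = THIS_MODULE,
  .id_table     = example_id_table,
  .probe        = example_probe,
  .remove       = example_remove,
};

static int __init example_init(void)
{
  return register_virtio_driver(&example_driver);
}
module_init(example_init);

static void __exit example_exit(void)
{
  unregister_virtio_driver(&example_driver);
}
module_exit(example_exit);

MODULE_LICENSE("GPL");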

PCI Configuration Space

Virtio devices are attached over PCI. The Vendor ID is 0x1AF4, and the Device ID falls in the range 0x1000 ~ 0x103F (modern devices use 0x1040 and above). Looking at the Subsystem ID, you can determine the virtio device type, as sketched in the code after the list. Examples of types include:

  1. virtio-net
  2. virtio-blk
  3. virtio-console
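
The host-side virtio-pci transport matches on exactly these IDs. The sketch below is modeled on the legacy virtio-pci driver but simplified, and virtio_pci_probe_sketch is not the real function name.

#include <linux/pci.h>

static const struct pci_device_id virtio_pci_ids[] = {
  // Vendor 0x1af4, any Device ID; the range is checked in probe
  { PCI_DEVICE(0x1af4, PCI_ANY_ID) },
  { 0 },
};

static int virtio_pci_probe_sketch(struct pci_dev *pdev,
                                   const struct pci_device_id *id)
{
  // Legacy (transitional) virtio devices use Device IDs 0x1000..0x103f
  if (pdev->device < 0x1000 || pdev->device > 0x103f)
    return -ENODEV;

  // The Subsystem Device ID gives the virtio type:
  // 1 = virtio-net, 2 = virtio-blk, 3 = virtio-console, ...
  pr_info("virtio device type %u\n", pdev->subsystem_device);
  return 0;
}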

PCI I/O Space

The first 24 bytes correspond to VirtioHeader, immediately followed by type-specific configuration (for virtio-net, virtio_net_config). VirtioHeader includes host and guest feature bits, Virtqueue size, and device status. virtio_net_config stores the maximum number of NIC queues, MTU, MAC address, etc.
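
The kernel describes this header with register-offset macros (VIRTIO_PCI_HOST_FEATURES and friends) rather than a struct; the structs below are only a sketch to make the layout visible. The 24-byte figure assumes MSI-X is enabled, and the virtio_net_config fields shown are only the first few (the exact set depends on the spec and kernel version).

#include <linux/types.h>

// Legacy virtio PCI I/O-space header (illustration only)
struct virtio_pci_legacy_header_sketch {
  __le32 host_features;       // 0x00: feature bits offered by the host
  __le32 guest_features;      // 0x04: feature bits accepted by the guest
  __le32 queue_pfn;           // 0x08: page frame number of the selected vring
  __le16 queue_size;          // 0x0c: number of entries in the selected queue
  __le16 queue_select;        // 0x0e: which virtqueue the fields above refer to
  __le16 queue_notify;        // 0x10: written by the guest to "kick" the host
  __u8   device_status;       // 0x12: ACKNOWLEDGE / DRIVER / DRIVER_OK / ...
  __u8   isr_status;          // 0x13: read by the guest to acknowledge interrupts
  __le16 config_msix_vector;  // 0x14: MSI-X vector for config changes
  __le16 queue_msix_vector;   // 0x16: MSI-X vector for the selected queue
  // 0x18: device-specific config starts here, e.g. virtio_net_config
};

// First fields of virtio_net_config (abridged)
struct virtio_net_config_sketch {
  __u8   mac[6];              // MAC address of the virtual NIC
  __le16 status;              // link status bits
  __le16 max_virtqueue_pairs; // maximum number of NIC queue pairs
  __le16 mtu;                 // MTU advertised by the host
};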

lspci -s xx:yy.z -vvv | grep "I/O port"
cat /proc/ioports

# In the output of this command, the first 24 bytes are the VirtioHeader
# The bytes after that correspond to virtio_net_config
hexdump -s $((16#XXXX)) -n 64 /dev/port

vhost-net

A problem with the host side of virtio-net was that whenever the guest issues I/O, vCPU processing stops and control transfers to the host with a vmexit; this happens, for example, every time the guest sends a packet out to the network. The vhost-net mechanism was therefore created, in which a vhost-$pid kernel thread takes over that processing. One such kernel thread is created per NIC queue.

vhost-net.png

https://www.redhat.com/ja/blog/deep-dive-virtio-networking-and-vhost-net
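
As a very rough sketch of how a VMM hands the data path over to that kernel thread: the vhost-net device is driven through ioctls defined in <linux/vhost.h>. The sequence below is heavily simplified (memory-table and vring setup via VHOST_SET_MEM_TABLE / VHOST_SET_VRING_* is omitted), so treat it as illustrative only.

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

// Userspace sketch: associate a tap device with vhost-net
int attach_vhost_net(int tap_fd)
{
  int vhost_fd = open("/dev/vhost-net", O_RDWR);
  if (vhost_fd < 0)
    return -1;

  // Ties the device to this process; the kernel creates the
  // vhost-$pid worker thread at this point
  if (ioctl(vhost_fd, VHOST_SET_OWNER) < 0)
    return -1;

  // Point virtqueue 0 (rx) at the tap fd; queue 1 (tx) is set up the
  // same way
  struct vhost_vring_file backend = { .index = 0, .fd = tap_fd };
  if (ioctl(vhost_fd, VHOST_NET_SET_BACKEND, &backend) < 0)
    return -1;

  return vhost_fd;
}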

Terminology

Let’s organize this since it’s confusing.

virtio
virtio API specification
vqueue (virt_queue, virtqueue)
Transport (queue where data actually flows) API specification
vring (virtio_ring)
Implementation of vqueue using a ring queue
VirtioHeader
Field at the beginning of PCI I/O space for configuration
virtio-net
Mechanism that provides a virtual NIC using vring. Can also refer to guest-side or host-side driver implementation
vhost (vhost-net)
Implementation in which the host-side processing of virtio-net is moved out of QEMU into the kernel
vhost-user
Variant in which the vhost back end is implemented by a host-side user process instead of the kernel

Author

Created by Rusty Russell, the developer of Linux ipchains and its successor netfilter/iptables. He is also the developer of the x86 hypervisor lguest.

References