Virtio is sufficiently mature and rarely causes trouble, but since it is a technology that will continue to be used, it is worth understanding how it behaves.
Full Virtualization and Paravirtualization of NICs
A quick review: full virtualization is a style of virtualization in which the guest cannot tell that it is running in a virtual environment, so the host must completely emulate the behavior of every device the guest uses. Device emulation triggers a large number of traps, each consuming CPU time for host-side processing, and when the emulation runs in a host-side user process, scheduling waits add further delay.
With paravirtualization, on the other hand, the guest knows it is running in a virtual environment and cooperates with the host to improve performance. virtio drives virtual devices through device drivers written specifically for paravirtualization.
Architecture
The configuration consists of a front-end driver and a back-end driver connected by a vring.
- Front-end driver
- Runs on the guest side. Sends I/O issued by the guest OS to the back-end driver via vring.
- Back-end driver
- Runs on the host side. Sends I/O received via vring to physical devices.
Virtio Specification
OASIS, the organization that also standardizes SAML and ebXML, publishes the specification as Virtual I/O Device (VIRTIO) Version 1.1. It defines the callbacks and the various data structures.
Virtqueue
Virtqueues are the part responsible for actual I/O. By sharing a portion of guest physical memory with the host, they enable bidirectional reads and writes.
They also have a notification mechanism in both directions: to the host via MMIO writes, and to the guest via interrupts.
Functions like disable_cb and enable_cb communicate interrupt suppression to the other side.
For example, when the guest-side driver does not need interrupts for a while, it tells the host side with disable_cb.
To send a notification, call kick.
```c
struct virtqueue_ops {
	int (*add_buf)(struct virtqueue *vq,
		       struct scatterlist sg[],
		       unsigned int out_num,
		       unsigned int in_num,
		       void *data);
	void (*kick)(struct virtqueue *vq);
	void *(*get_buf)(struct virtqueue *vq, unsigned int *len);
	void (*disable_cb)(struct virtqueue *vq);
	bool (*enable_cb)(struct virtqueue *vq);
};
```
https://elixir.bootlin.com/linux/v2.6.31/source/include/linux/virtio.h#L61
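As a rough sketch of how a front-end driver drives these ops (modeled loosely on drivers/net/virtio_net.c in v2.6.31; the function name and error handling here are illustrative, not kernel code):

```c
#include <linux/errno.h>
#include <linux/scatterlist.h>
#include <linux/virtio.h>

/* Illustrative only: expose one guest buffer to the host, then kick. */
static int example_send(struct virtqueue *vq, void *buf, unsigned int len)
{
	struct scatterlist sg;

	sg_init_one(&sg, buf, len);

	/* One out-entry (host reads it), zero in-entries (host writes none). */
	if (vq->vq_ops->add_buf(vq, &sg, 1, 0, buf) < 0)
		return -ENOSPC;

	/* Notify the host that the avail ring has new work. */
	vq->vq_ops->kick(vq);
	return 0;
}
```

Once the host has consumed the buffer, the guest collects it with get_buf; a driver that wants to batch completions brackets its polling loop with disable_cb and enable_cb, keeping interrupts off while it still has work queued.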
Vring
Vring is an implementation of the Virtqueue specification using a ring queue structure. It consists of three data regions (their structures are sketched after the list below). I'll summarize the detailed behavior of each region another time.
- vring_desc: array of “guest physical address and length” descriptors
- vring_avail: which descriptors are available
- vring_used: which descriptors have been used
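For reference, the three regions look like this in include/linux/virtio_ring.h (v2.6.31, abridged; the comments are mine):

```c
struct vring_desc {
	__u64 addr;	/* guest-physical address of the buffer */
	__u32 len;	/* buffer length in bytes */
	__u16 flags;	/* e.g. VRING_DESC_F_NEXT, VRING_DESC_F_WRITE */
	__u16 next;	/* index of the next descriptor in a chain */
};

struct vring_avail {
	__u16 flags;
	__u16 idx;	/* next slot the guest will fill */
	__u16 ring[];	/* descriptor indices offered to the host */
};

struct vring_used_elem {
	__u32 id;	/* head of the descriptor chain that was consumed */
	__u32 len;	/* total bytes the host wrote into it */
};

struct vring_used {
	__u16 flags;
	__u16 idx;
	struct vring_used_elem ring[];
};
```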
Code Reading
struct virtio_driver corresponds to the front-end driver.
A virtio device corresponds to struct virtio_device, which provides access to struct virtio_config_ops and struct virtqueue.
```c
// Corresponds to the front-end driver
struct virtio_driver {
	struct device_driver driver;
	const struct virtio_device_id *id_table;
	const unsigned int *feature_table;
	unsigned int feature_table_size;
	int (*probe)(struct virtio_device *dev);
	void (*remove)(struct virtio_device *dev);
	void (*config_changed)(struct virtio_device *dev);
};

// Corresponds to a virtio device
struct virtio_device {
	int index;
	struct device dev;
	struct virtio_device_id id;
	struct virtio_config_ops *config;
	struct list_head vqs;
	unsigned long features[1];
	void *priv;
};
```
https://elixir.bootlin.com/linux/v2.6.31/source/include/linux/virtio.h#L111
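Tying the two structs together: a front-end driver fills in struct virtio_driver and hands it to register_virtio_driver(); the core then invokes probe() with the matching struct virtio_device. A minimal sketch, with an illustrative driver name and a hard-coded device ID:

```c
#include <linux/module.h>
#include <linux/virtio.h>

static struct virtio_device_id example_id_table[] = {
	{ 1 /* VIRTIO_ID_NET */, VIRTIO_DEV_ANY_ID },
	{ 0 },
};

static int example_probe(struct virtio_device *vdev)
{
	/* Negotiate features via vdev->config and set up virtqueues here. */
	return 0;
}

static void example_remove(struct virtio_device *vdev)
{
	/* Reset the device and tear down virtqueues here. */
}

static struct virtio_driver example_driver = {
	.driver.name	= "example-virtio",
	.driver.owner	= THIS_MODULE,
	.id_table	= example_id_table,
	.probe		= example_probe,
	.remove		= example_remove,
};

static int __init example_init(void)
{
	return register_virtio_driver(&example_driver);
}

static void __exit example_exit(void)
{
	unregister_virtio_driver(&example_driver);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");
```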
PCI Configuration Space
Virtio devices are connected via PCI. The Vendor ID is 0x1AF4, and the Device ID corresponds to 0x1000 ~ 0x1040.
Looking at the Subsystem ID, you can determine the virtio device type (a small config-space reader sketch follows this list). Examples of types include:
- virtio-net
- virtio-blk
- virtio-console
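As a quick check from userspace, the IDs can be read straight out of the device's config file in sysfs. A small sketch; the PCI address 0000:00:03.0 is an assumption, so substitute the one lspci shows on your system:

```c
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* Adjust the PCI address to your system (see lspci). */
	FILE *f = fopen("/sys/bus/pci/devices/0000:00:03.0/config", "rb");
	uint8_t cfg[64];

	if (!f || fread(cfg, 1, sizeof(cfg), f) != sizeof(cfg)) {
		perror("read config");
		return 1;
	}
	fclose(f);

	/* PCI config-space fields are little-endian. */
	printf("vendor    0x%04x\n", cfg[0] | cfg[1] << 8);       /* 0x1af4 */
	printf("device    0x%04x\n", cfg[2] | cfg[3] << 8);       /* 0x1000 ~ 0x1040 */
	printf("subsystem 0x%04x\n", cfg[0x2e] | cfg[0x2f] << 8); /* virtio type, e.g. 1 = net */
	return 0;
}
```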
PCI I/O Space
The first 24 bytes correspond to VirtioHeader, immediately followed by type-specific configuration (for virtio-net, virtio_net_config).
VirtioHeader includes host and guest feature bits, Virtqueue size, and device status.
virtio_net_config stores the maximum number of NIC queues, the MTU, the MAC address, and so on (both layouts are sketched below).
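As a sketch of the layout (offsets follow the legacy virtio-pci interface; the struct and field names here are illustrative, and the two MSI-X vectors at the end are what bring the header to 24 bytes):

```c
#include <stdint.h>

/* Illustrative layout of the VirtioHeader in legacy virtio-pci I/O space. */
struct virtio_pci_header {
	uint32_t host_features;   /* 0x00: feature bits offered by the host */
	uint32_t guest_features;  /* 0x04: feature bits accepted by the guest */
	uint32_t queue_pfn;       /* 0x08: page frame number of the vring */
	uint16_t queue_size;      /* 0x0c: virtqueue size */
	uint16_t queue_select;    /* 0x0e: selects the virtqueue to access */
	uint16_t queue_notify;    /* 0x10: written to kick the host */
	uint8_t  device_status;   /* 0x12: feature-negotiation / driver state */
	uint8_t  isr_status;      /* 0x13: read to acknowledge interrupts */
	uint16_t msix_config;     /* 0x14: MSI-X vector for config changes */
	uint16_t msix_queue;      /* 0x16: MSI-X vector for the selected queue */
};

/* virtio-net's type-specific configuration follows the header
 * (abridged; fields per the virtio spec). */
struct virtio_net_config {
	uint8_t  mac[6];               /* MAC address */
	uint16_t status;               /* e.g. link up/down */
	uint16_t max_virtqueue_pairs;  /* maximum number of NIC queue pairs */
	uint16_t mtu;                  /* advised MTU */
};
```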
```sh
# Find the I/O port range assigned to the virtio device
# (replace xx:yy.z with the device's PCI address)
lspci -s xx:yy.z -vvv | grep "I/O port"
cat /proc/ioports

# Dump the I/O space: the first 24 bytes are the VirtioHeader,
# and what follows corresponds to virtio_net_config
hexdump -s $((16#XXXX)) -n 64 /dev/port
```
vhost-net
A problem with the virtio-net host-side driver was that whenever the guest issues I/O, vCPU processing stops and control transfers to the host via vmexit; this happens, for example, every time the guest sends packets out.
The vhost-net mechanism was therefore created, in which a vhost-$pid kernel thread takes over that processing.
One such kernel thread is created per NIC queue.
https://www.redhat.com/ja/blog/deep-dive-virtio-networking-and-vhost-net
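A rough userspace sketch of that handover, using the real /dev/vhost-net device node and VHOST_SET_OWNER ioctl (everything else abridged):

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vhost.h>

/* Open the vhost-net device and bind it to this process. */
int vhost_open(void)
{
	int fd = open("/dev/vhost-net", O_RDWR);

	if (fd < 0)
		return -1;

	/* This is the call that creates the vhost-$pid kernel
	 * worker thread on the caller's behalf. */
	if (ioctl(fd, VHOST_SET_OWNER) < 0) {
		close(fd);
		return -1;
	}

	/* A real VMM would follow with VHOST_SET_MEM_TABLE and
	 * VHOST_SET_VRING_* to point the worker at the guest vrings. */
	return fd;
}
```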
Terminology
The terms are easy to confuse, so let's organize them.
- virtio
- virtio API specification
- vqueue (virt_queue, virtqueue)
- API specification for the transport (the queue through which data actually flows)
- vring (virtio_ring)
- Implementation of vqueue using a ring queue
- VirtioHeader
- Field at the beginning of PCI I/O space for configuration
- virtio-net
- Mechanism that provides a virtual NIC using vring. Can also refer to guest-side or host-side driver implementation
- vhost (vhost-net)
- Implementation that splits the host-side part of virtio-net out of QEMU and into the kernel
- vhost-user
- vhost with the back-end implemented in a host-side user process instead of the kernel
Author
virtio was created by Rusty Russell, the developer of Linux ipchains and its successor netfilter/iptables. He is also the author of the x86 hypervisor lguest.
References
- virtio: Towards a De-Facto Standard For Virtual I/O Devices
- 仮想化環境におけるパケットフォワーディング
- Virtio-networking series
- ハイパーバイザの作り方
- Virtual I/O Device (VIRTIO) Version 1.1
- OSC2011 Tokyo/Fall 濃いバナ(virtio)
- Virtio: An I/O virtualization framework for Linux
- virtio guest side implementation: PCI, virtio device, virtio net and virtqueue
- The evolution of IO Virtualization and DPDK-OVS implementation in Linux


