Brenden Blanco is one of the most prolific developers working on the IO Visor Project, a Linux Foundation
Collaborative Project. Brenden was an employee at PLUMgrid, the startup behind IO
Visor, until it was acquired by VMware.
The interview begins with a history of the layers that stacked up to form
IO Visor, starting with the publication of The
BSD Packet Filter: A New Architecture for User-Level Packet Capture
at the 1993 Winter USENIX Conference, in January 1993. This paper
introduced BPF, short for Berkeley Packet Filter, for selecting packets
to be copied to userspace for analysis. BPF was a register-based,
RISC-like virtual machine (analogous to the Java virtual machine). In
Brenden's words:
“Two of the major points really seemed very wise to me. It must be
protocol-independent: the kernel should not have to be modified to add
more protocol support... It must be general: the instruction set should
be rich enough to handle unforeseen uses. They have a few more points
about efficiency and generality, but those two first ones have really
ended up creating kind of a platform that has stood the test of time.”
Safety was also critical:
“Any operating system developer has an inherent distrust of the userspace,
and the system calls that they're defining need to be secure against
attack, as well as ignorance or bad programming. The API they expose
lets a code be uploaded but the kernel has to protect itself against bad
code. It doesn't allow loops, so you can't create an infinite loop in
this virtual machine, and it doesn't allow access to memory that's out of
bounds... That produces something that's limited, but safe. You can't do
everything that a programmer would want, but you can do so efficiently.”
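The two safety properties Brenden describes, no loops and no out-of-bounds
memory access, can be illustrated with a toy filter machine. The following is
a minimal sketch, not the real BPF instruction set: the three opcodes and the
names Insn, verify, run, and FILTER are hypothetical and exist only for this
illustration. It assumes that jumps are relative to the next instruction and
that an out-of-bounds load simply rejects the packet.

```python
from typing import List, NamedTuple


class Insn(NamedTuple):
    """One instruction of a toy, BPF-like filter machine (hypothetical)."""
    op: str      # "ld": A = packet[k]; "jeq": compare A with k; "ret": return k
    k: int = 0
    jt: int = 0  # offset added to the program counter when A == k
    jf: int = 0  # offset added to the program counter when A != k


def verify(prog: List[Insn]) -> None:
    """Reject programs that could loop or run off the end of the program."""
    if not prog:
        raise ValueError("empty program")
    for pc, insn in enumerate(prog):
        if insn.op not in ("ld", "jeq", "ret"):
            raise ValueError(f"unknown opcode at instruction {pc}")
        if insn.op == "jeq":
            for off in (insn.jt, insn.jf):
                if off < 0:
                    # Forward-only jumps guarantee termination.
                    raise ValueError(f"backward jump at instruction {pc}")
                if pc + 1 + off >= len(prog):
                    raise ValueError(f"jump past end at instruction {pc}")
    if prog[-1].op != "ret":
        raise ValueError("program must end with ret")


def run(prog: List[Insn], packet: bytes) -> int:
    """Run a verified program; a nonzero result means 'accept the packet'."""
    verify(prog)
    a = 0   # accumulator register
    pc = 0  # program counter
    while True:
        insn = prog[pc]
        pc += 1
        if insn.op == "ld":
            if insn.k < 0 or insn.k >= len(packet):
                return 0  # out-of-bounds load: reject instead of reading
            a = packet[insn.k]
        elif insn.op == "jeq":
            pc += insn.jt if a == insn.k else insn.jf
        else:  # "ret"
            return insn.k


# Accept packets whose first byte is 0x45, reject everything else.
FILTER = [
    Insn("ld", k=0),
    Insn("jeq", k=0x45, jt=0, jf=1),
    Insn("ret", k=0xFFFF),
    Insn("ret", k=0),
]
```

Because every jump moves forward and the last instruction must be a return,
the check in verify is a single pass over the program, which is the kind of
"limited, but safe" trade-off the quote describes.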
BPF came to Linux during the development of the 2.5 release series, according to LWN.
A second branch of the history comes from academic research on operating
system extensibility, a theme that grew to prominence during
the 1990s with extensible research operating systems such as SPIN and
the Exokernel.
Many different approaches were proposed, including those based on safe
languages like Modula-3 and Java. In addition, Brenden draws an analogy
between BPF and the development of instruction set-based GPUs.
Brenden says that the movement to software-defined networking has not
been able to speed up operating system evolution. Part of the goal of IO
Visor is to help speed up this evolution, by allowing the networking
subsystem to evolve independently of the rest of the operating system. In
Brenden's words:
“Five years ago or so, when we started seeing a move toward
software-defined networking, the hope was that some of this paradigm
would change a little bit. But what I've seen in some of the solutions
is that some of this actually isn't the case. You have a movement of
network functionality from hardware to software but it's still locked
into kind of the operating system life cycle, which is maybe faster than
hardware but isn't in data centers where we've been trying to address
this, it's still slower than the applications change, so the use cases
still change faster than the infrastructure.”
Around 2013, Alexei Starovoitov started bringing together these two
branches by extending BPF to form eBPF, which added numerous features:
widening registers from 32 to 64 bits, increasing the number of
registers from 2 to 10, additional instructions, the ability to jump
forward and backward (with some restrictions), and, most importantly, the
ability to call a restricted set of kernel helper functions and the
ability to work with data structures (“maps”). These changes made the
platform a better target for compiling high-level language code and
better able to interact with its environment. In addition, just-in-time
(JIT) compilers for eBPF started being integrated into the kernel, to make
eBPF execution faster on key architectures.
Brenden introduces BCC, the BPF Compiler
Collection, one of the primary sub-projects within IO Visor. BCC
provides a set of tools for building software in high-level languages,
such as C and Python, into BPF program objects, loading those programs
into running kernels, and interacting with them once they are loaded.
BCC also includes a large suite of example programs.
BPF programs are loaded into a kernel by attaching them to “hook
points.” Typically, the programs attached to a hook point are invoked
when some particular event occurs. For example, in networking, a BPF
program might be invoked whenever a packet is received on a particular
device, or for performance monitoring a BPF program might be attached to
a “kprobe” point.
Brenden explains how to use BCC to load a BPF program into the kernel
using a one-line Python program. A more sophisticated program can retain
handles and use them to interact with the program at runtime. Brenden
describes how restrictions on BPF are reflected in what a C programmer
can include in a program. Keeping the in-kernel safety verifier simple
and correct is paramount, which tends to reduce the maximum complexity of
programs that can be loaded. The verifier is a limiting factor that
continues to evolve to make BPF more useful over time.
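As a rough illustration of retaining a handle and interacting with a loaded
program, here is a minimal sketch using BCC's Python bindings. It is not taken
from the interview: the names PROGRAM, count_clone, counts, and
load_and_report are made up for this example, and the choice of the clone
system call as the kprobe hook point is an assumption. Running it requires BCC
to be installed and root privileges.

```python
import time

# BPF program written in restricted C; BCC compiles it when it is loaded.
# The verifier constraints apply here: no unbounded loops, and kernel state
# is reached only through helper functions and maps.
PROGRAM = r"""
BPF_HASH(counts, u32, u64);   // map shared between the kernel and userspace

int count_clone(void *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;   // process (thread group) ID
    counts.increment(pid);
    return 0;
}
"""


def load_and_report(seconds: float = 5.0) -> dict:
    """Load PROGRAM, attach it to a kprobe, and return per-process counts."""
    # Imported lazily so this module can be read and imported without BCC.
    from bcc import BPF

    b = BPF(text=PROGRAM)  # compile and load; keep the handle for later use
    # Hook point: a kprobe on the kernel's clone system call entry.
    b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="count_clone")
    time.sleep(seconds)    # let events accumulate in the map
    # Read the map back through the retained handle.
    return {key.value: value.value for key, value in b["counts"].items()}
```

Calling load_and_report() as root should return a dictionary mapping process
IDs to the number of clone calls observed during the sampling window. The
handle b is what lets the Python side keep interacting with the in-kernel
program after it has been loaded.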
Currently the Open vSwitch community is considering whether to replace
the Open vSwitch kernel module with BPF. Brenden is in favor of
the idea and offers some of his thoughts. First, his experience at
PLUMgrid shows that BPF is flexible enough to support a wide variety of
network applications, including those that Open vSwitch implements.
Second, it's an enjoyable experience for a single developer to be able to
cover the entire infrastructure for an application.
In the future, Brenden is looking forward to BPF usage becoming
ubiquitous for Linux performance monitoring and other non-networking use
cases, such as storage acceleration and security. For example, BPF is
already used for sandboxing in Chrome. He's also looking forward to BPF for
hardware offload; it can already be used for hardware offload on
Netronome NICs.
For more information on IO Visor, please visit iovisor.org. To talk to Brenden and
other IO Visor developers, visit the #iovisor channel on the oftc.net IRC
network. Brenden's nick is bblanco. You can also tweet to Brenden at @BrendenBlanco.
More about BPF:
In episode 11, John Fastabend from Intel talks about
BPF on network edge nodes.
In episode 4, Thomas Graf from Cisco talks about
Cilium, which uses BPF to tackle the question of how to handle policy
in a legacy-free container environment that scales to millions of
endpoints.
Packet Pushers PQ Show 60, from 2015, interviewed Pere Monclus from
PLUMgrid about the IO Visor Project and Linux networking.
OVS Orbit is produced by Ben Pfaff. The intro music in this episode is
Drive, featuring cdk and DarrylJ, copyright 2013, 2016 by Alex. The bumper
music is Yeah Ant featuring Wired Ant and Javolenus, copyright 2013 by
Speck. The outro music is Space Bazooka featuring Doxen Zsigmond,
copyright 2013 by Kirkoid. All content is licensed under a Creative
Commons Attribution 3.0 Unported (CC BY 3.0) license.