BPF and formal verification

I spent the spring of 2015 researching the Berkeley packet filter (BPF) and its formal verification with my programming languages professor, Joe Gibbs Politz. The project took some unexpected turns and we learned a lot about Coq and applied formal verification in the process.[1]

Other than a little experience with the proof assistant Isabelle, this was my first foray into formal verification. If you consider yourself a Coq guru, this reflection might seem facile. If you’re interested in formal verification or are starting your first real-world Coq project, though, I think you’ll find it worthwhile.

The main goal of our work was to create a translation validator for BPF programs. This idea was based on the Jitk project from MIT.

A translation validator proves some definition of correctness for an individual BPF program. This distinction is critical to understand when validating programs or compilers. Formally verifying a compiler proves that all outputs of that compiler are correct.[2] Translation validation means that you have something to check the correctness of that compiler’s output. Using a translation validator, we can write a BPF metalanguage, like pcap syntax, that compiles to BPF, and be confident in the correctness of each output by using the translation validator to check it.

A quick BPF intro

BPF allows a userland program to specify which data link layer packets (e.g. Ethernet or 802.11 wireless packets) it wishes to receive from the kernel. For example, a program may want all UDP packets from a certain source IP address to a certain subnet of destination IP addresses. In most cases, the user wants to see transport layer packets like TCP and UDP rather than link layer packets. pcaplib is typically used to compile user-supplied transport layer rules into a BPF program that filters link layer packets, hiding the link layer details from the user.

These rules are submitted in the form of a BPF program to be executed by a simple virtual machine in the kernel. If the program returns zero, the packet is ignored. If the return value is non-zero, it indicates the number of bytes of the packet to return to the userland program.

The BPF language is essentially a 32-bit assembly language, consisting of 64-bit instructions that strongly resemble a RISC ISA like MIPS. This means that:

OpenBSD, which uses classic BPF, defines an instruction like this:

 struct bpf_insn {
         u_int16_t       code;
         u_char          jt;
         u_char          jf;
         u_int32_t       k;

The struct’s fields are, in order:

Of course, not all fields are relevant to all instructions. For example, arithmetic instructions cannot change control flow, so the jump offsets will be unused.

The BPF language is simple enough to be executed by a single C function, bpf_filter(), which resides in sys/net/bpf_filter.c. Its body is just a while (1) that contains a case statement on the current instruction.

The following state is available to the program:

In both the OpenBSD and the Linux kernels, 16 32-bit memory words are available (defined as BPF_MEMWORDS in sys/net/bpf.h).

For a more detailed understanding of BPF, the OpenBSD man page and the corresponding source code are great resources. The code is particularly clean and well-commented and lacks the complicating JIT compilers that most other Unix-like OSs include. Looking at the pcap implementation built atop it will give you a better idea of how BPF is used in action. There are also a few real-world BPF program examples at the bottom of the man page, one of which we’re about to explore.

BPF’s halting problem

There’s a pretty apparent problem: program duration. Can a user submit an infinite loop, either by mistake or in an attempt to gum up the kernel’s networking? We have two reasonable options to prevent this: a timeout with a default return value, or a restriction that only allows programs to step or jump forward, never revisiting future instructions. BPF uses the latter solution. This allows for more reliable results and demands simpler BPF program logic.

The below BPF program, taken from the OpenBSD man page and translated into a more readable representation, returns all Ethernet packets between IP addresses and

ld_half P[12]
jeq_imm 2048 0 8        # not an Ethernet packet? jump to return 0
ld_word P[26]
jeq_imm 0x8003700f 0 2      # is the source
ld_word P[30]
jeq_imm 0x80037023 3 4      # is the dest
jeq_imm 0x80037023 0 3      # is the source
ld_word P[30]
jeq_imm 0x8003700f 0 1      # is the dest
ret_k   MAX_INT-1
ret_k   0

The second to last instruction uses MAX_INT-1 as the return value, which is an idiom used to return the entire packet.

The purpose of BPF

BPF is meant to make packet filtering far more efficient, preventing unnecessary copying of packet data by hooking filtering directly into the kernel’s network stack. However, it also creates a very dangerous situation: an ad hoc virtual machine. While BPF seems relatively safe and pedestrian, we can’t forget that it’s effectively running user-submitted assembly code with kernel privileges.

While it seems that the program is restricted to a tiny memory window and a few instructions, implementation bugs can and have allowed attackers to write to other memory or jump to an arbitrary instruction. This situation has been exploited many times, as outlined by the Jitk paper. Because of this, the BPF virtual machine is a prime candidate for careful auditing.

BPF and state

BPF is also a fantastic target for formal verification because it can only jump forward. This avoids the halting problem – BPF programs have a finite set of possible states and must terminate. We can therefore model BPF and analyze its programs completely rather than resorting to heuristics that give false positives.

How should we represent that state of the BPF VM? The OpenBSD man page conveniently tells us:

A filter program is an array of instructions with all branches forwardly directed, terminated by a “return” instruction. Each instruction performs some action on the pseudo-machine state, which consists of an accumulator, index register, scratch memory store, and implicit program counter.

In Coq, we can therefore use a record of the below form to represent the VM’s state:

 Record vm_state : Type := make_state {
     acc : option W;
     x_reg : option W;
     ins : list instr;
     pkt : list B;
     smem : scratch_mem

Types W and B represent a word and a byte respectively, which we’ll soon discuss in greater detail. scratch_mem is a synonym for Vector.t (option W) 16. Here, we see our first safety feature - option W is used so that uninitialized memory can be represented with None, allowing us to recognize accesses to uninitialized memory.

ins begins with the instruction being executed. Because we always step forward, past instructions don’t exist in our state.


As you’re likely aware, Coq only allows well-founded recursion - recursion that will provably always terminate. Allowing infinite recursion would undermine the foundation of Coq’s proof system.

This restriction is an inherent challenge when modeling most execution environments. If the program being executed is infinite, infinite recursion is exactly what we want to model![4] When seeking an easy, concise solution, we can either choose a more abstract model of execution, or we can use a realistic hack we call fuel. Fuel is simply an additional argument representing the number of iterations we allow before exiting with an out-of-fuel error.

Fuel, while a bit restrictive in some cases, works perfectly for BPF because it only allows stepping forward. For an n-instruction BPF program, we can use n for our fuel and be sure that we’ll never run out.

Our evaluator

The easiest way to begin a terminating evaluator in Coq is to write an evaluator for a single istrution. The core of our BPF evaluator is the function step of type:

Fixpoint step (s:vm_state) : state

step is analogous to the body of the while (1) loop in OpenBSD’s BPF evaluator.

As shown above, vm_state is simply a record containing the current state of the virtual machine. We introduce a new type end_state representing the two possible final states of a BPF program: a return value or an error.

Inductive end_state : Type :=
  | Ret : W -> end_state
  | Error : string -> end_state.

Our evaluation function step executes the next instruction, returning either the vm_state resulting from that execution or, if it reached the terminating step, an end_state.

Inductive state : Type :=
  | Cont : vm_state -> state
  | End : end_state -> state.

Because the head of the vm_state’s instruction list is the next instruction, step can pattern match on this list, constructing a simple Fixpoint.

step only takes us through one instruction. We can easily use it to create a full evaluation engine, though:

Fixpoint prog_eval (fuel:nat) (s:vm_state) : end_state :=
  match fuel with
    | O => Error "ran out of fuel"
    | S n =>
        match step s with
          | End e_s => e_s
          | ContState cs => prog_eval n cs

Coq doesn’t recognize the transitivity of the fixpoint. That is, even though the instruction list in prog_eval’s argument s decreases by the definition of step, Coq isn’t aware of that. This is where the fuel helps us. We simply decrease on the fuel rather than (ins s), returning an error if we run out.

run_filter, shown below, is then added to make the evaluator easier to use. It computes the necessary fuel, creating an initial VM state with uninitialized memory and registers.

Definition run_filter (l:list instr) : end_state :=
  prog_eval (length l) (init_state l).

Data representation and atomicity

Clearly, Coq is a very high-level language, insofar as it’s a language at all. We therefore bridge a chasm by using it to represent hardware-level data, but doing so often surprisingly elegant.

Numbers are probably the simplest and most pervasive concept that we must translate between these two levels of abstraction. Coq programs generally use numbers of type nat, an inductive type:

Inductive nat : Set :=
  | O : nat
  | S : nat -> nat.

Fixed sized binary data - mostly 32-bit ints in our case - are not well represented with nat. Clearly, they have a limited number of possible values, and their physical representation (an ordered vector of bits) is relevant for hardware logic like arithmetic. Furthermore, nat is by default extracted as an inductive type of that form, making for incredibly inefficient OCaml code. Using directly extracted nats, you can actually get a stack overflow when attempting operations as simple as comparison![5]

We could better represent n-bit ints with ordered set of bits. Unlike nat, each given int type has a constant, finite number of values and no intrinsic concept of predecessor or successor.

For this, we chose the Bedrock library’s word type:

Inductive word : nat -> Set :=
| WO : word O
| WS : bool -> forall n, word n -> word (S n).

It takes a single parameter: the size of the word in bits. Our BPF VM therefore operates mostly on type word 32, for which we create the synonym W.

The bedrock library also provides many sane, low-level arithmetic operations on words. One notable exception is division, a difficult operation to implement with combinatorial logic. It also raises an issue which we’ll later address: what happens in exception-raising or undefined cases like a division-by-zero?

A first look at extraction

Generally speaking, Coq is slow and hard to run. Extraction allows us to translate our Coq program into OCaml,[6] which is faster and more portable. Extraction also allows us to import verified Coq programs into OCaml code. The translation is relatively direct - many Coq types and functions can be converted to OCaml with only minor syntax changes.

The process of extraction is extremely elegant and simple. Here’s a minimal example, extracting nat into an OCaml file nat_def.ml:

Extraction Language Ocaml.
Extraction "nat_def.ml" nat.

Of course, the OCaml code is somewhat less safe than its corresponding Coq program. We must trust the extraction logic, which is not fully verified, and the OCaml compiler, which isn’t verified at all. We must also sometimes use the extracted OCaml in precarious or downright dangerous ways.

The simplest example of this is unsafe type-casting using Obj.magic. CompCert uses Obj.magic in two places. This is necessary when extracting, for example, subset types, and not writing and extracting conversion functions in Coq. The resulting code can be extremely hard to comprehend, especially if you aren’t the one writing it. If you think you’ve mastered OCaml extraction, try understanding CompCert’s Obj.magic usage in their lexer’s token stream. It took me more time than I’m willing to admit.

Extraction and real-world programs

There is a truth that has been generally accepted by the CompCert community: verifying a lexer is hard, a verified lexer would be very slow, and it’s therefore not a priority. This means that Coq programs generally use lexers written in OCaml, usually one generated by ocamllex.

This demands of the author additional extraction acrobatics. If we could write everything from the lexer to the interpreter or code generator in Coq, the OCaml code interacting with the extracted program would only have to feed it filenames.

Thankfully, we do now have a verified parser written in Coq. Jacques-Henri Jourdan wrote a Menhir backend for Coq using the --coq flag and files with the .vy extension. As of May 2015, it isn’t yet described in the Menhir documentation, but the section has been drafted and should be included in the next release.

The only example I could find of a Menhir parser written in Coq was CompCert’s. Jacques-Henri does a great job of describing it in his blog post. One topics that is only passingly mentioned is the need for a queue of lexed tokens. because of a few subtle flaws in the C standard, a C parser must keep some state beyond a simple LR(1) parser, making CompCert’s implementation complex and verbose. To this end, I wrote a minimal example parser for Menhir’s Coq backend. Because CompCert’s parser uses a much more complicated interface with the lexer and is integrated with the rest of its sizeable codebase, I’d suggest starting with the (forthcoming) documentation and my minimal implemenation if you’re starting out with Coq in Menhir.

Anecdotally, Jacques-Henri used the above-discussed concept of fuel to deal with the possibility of nonterminating parsers.

As you’ve likely recognized, implementation of extraction tools and peripheral utilities like parsers is a very nascent field. Most of what CompCert uses is in-house and undocumented.

Keeping extraction safe and simple (or, keeping Obj.magic far away)

One of the most common and fundamental problems with extraction is that OCaml cannot understand the construction of subset types extracted from Coq.[7] Even sum types can be difficult or impossible to construct in certain contexts.

The simplest solution to this is using an additional layer of abstraction written in Coq that converts our Coq types to simple algebraic types when possible. My minimal example defines its types and conversion functions in Datatypes.v

Data representation and abstaction

The ultimate goal is to fully represent the values as they exist when extracted. When this is the case, you can use the Extract Constant feature to directly translate your Coq type to the equivalent OCaml type. CompCert’s extraction logic makes heavy use of this. For example:

Extract Constant Cabs.char_code => "int64".

Otherwise, the Coq extractor will do its best in creating an equivalent type, which is often complex, unreadable, or very slow.

Choosing a type for your numeric values may seem unimportant. However, the type that you use can greatly affect proofs involving those types.


CompCert, a nearly complete implementation of the ISO C90 and ANSI C standards, dominates the world of applied and systems-level Coq. It is funded and hosted by INRIA, which is also home to Coq itself. The project is impressively advanced and successful, and has hosted many skilled developers since it began in 2005. If you want to do something systems-related in Coq, studying the relevant CompCert source code and journal articles is a good first step. They generally only document proof- and module-level design, so one often has to reverse-engineer their Coq style and conventions.

Far more needs to be written about Coq coding style. CompCert, being probably the most complex and successful Coq application, is a prime candidate for inspection. Our fuel approach and many of our extraction techniques were adapted from methods used by CompCert. We also used a verified parser implemented in Coq which was written to be used in CompCert.

There is, however, one big obstacle: CompCert’s license. Much of its code is bound to a non-commercial agreement written by INRIA. The rest is dual-licensed, allowing the user to choose between INRIA’s non-commercial license and the GPLv2.[8] This seems to be standard for software developed at INRIA. As long as non-free is included, CompCert and adaptations of its code can’t be used in core open source contexts - for example, inclusion in an operating system’s base system or use as the required compiler for major open source projects.

I’m very appreciative of everything CompCert has contributed. The world of verified systems tools could use more diversity, though, and CompCert’s license makes it clear that if a verified language tool is integrated into a major open source toolchain, it won’t be CompCert.

Evaluation vs. translation validation

To be continued…




  1. We learned a good deal about packet filtering and kernel networking too, but that’s an article for another day.  ↩

  2. As I’ll discuss later, the definition of “correct” is non-obvious and is unique to each application.  ↩

  3. An immediate is just a constant value hard-coded into an assembly instruction. I’ll occasionally a little bit of necessary and innocuous computer architecture jargon, but it should be manageable.  ↩

  4. Additionally, even functions that do clearly terminate can be infuriatingly hard to prove. The Coq logic behind Fixpoint, which is used to write provably terminating functions, only recognizes this property when the function pattern matches one of its arguments and makes a recursive call with a value bound in the match expression. Program Fixpoint allows you to manually prove the function’s termination as an “obligation”. In my experience, though, this is impressively difficult for apparently trivial proofs. I’ve only had success using simple Fixpoints or, when that isn’t possible, fuel.  ↩

  5. This was a source of confusion for us when running tests of real BPF programs in our extracted evalator. At first, we thought our or OCaml code was interfacing with the extracted Coq improperly. However, we soon realized that Ocaml has a stack limit of about 262,067 frames, and that nat-to-nat comparisons of IP addresses would overflow the stack.  ↩

  6. Extraction also exists for Haskell and Scheme, and there is a plugin available to extract to Python. However, the OCaml extractor is the most mature and is essentially the only one used in production.  ↩

  7. A subset type is a type that is restricted to values for which a Prop holds true. For example, {n : nat | n > 0} represents non-zero nats, and {l : list A | l <> [] } represents non-empty lists.  ↩

  8. See the LICENSE document in the CompCert codebase for more information.  ↩