Education
[Honors]
Publications
Abstract
This report presents our methodology for and experience with formalizing a specification of LLVM IR in the Verified LLVM Project (Vellvm). Vellvm provides a specification for a large, practical subset of LLVM IR in the Rocq Proof Assistant in the support of verified compilers and program transformations. Program transformations often rely on the subtle details, and as a result Vellvm’s semantics are quite comprehensive: for instance we provide a sophisticated low-level memory model to support low-level operations, such as casts between pointers and integers, and justify optimizations in their presence. Our approach implements the semantics via monadic interpreters, which rely on a coinductively-defined data structure called ITrees. Crucially, this methodology supports the extraction of an executable interpreter, proved to refine the specification. We use the reference interpreter to validate the accuracy of the formalization, employing random differential testing between Clang and Vellvm implemented in our own LLVM IR program generator (GenLLVM), as well as via state-of-the-art C compiler testing frameworks (CSmith and YARPGen). Such testing has found bugs in both the Vellvm semantics and Clang. We believe that tools from the Vellvm project can be useful for other LLVM IR-related projects, and that the overall methodology applies to other formal verification efforts.
Abstract
The Vellvm project is an extensive undertaking to formalize a large subset of LLVM IR in the Coq proof assistant. LLVM IR is an intermediate language that is widely used as a common target for front-end compilers. Compilers for different languages can share the LLVM infrastructure for performing optimizations and generating target code for various instruction set architectures. The bedrock of many languages is LLVM IR, and formalizing it gives us an important tool for compiler verification efforts. In this talk we will explore the difficulties that come up when attempting to formalize a large real world language. We’ll touch upon how proof assistants like Coq can reveal issues in informal specifications, how testing tools like QuickChick can help bridge the gap between informal real world implementations and our formal Coq semantics, and we will also touch upon aspects of the LLVM IR could be updated to better suit verification efforts. Furthermore, we’ll discuss some of the challenges we have encountered along the way, including limitations with the Coq type system, extraction bugs, and performance issues for proofs and extracted code.
Abstract
This paper provides a novel approach to reconciling complex low-level memory model features, such as pointer--integer casts, with desired refinements that are needed to justify the correctness of program transformations. The idea is to use a "two-phase" memory model, one with an unbounded memory and corresponding unbounded integer type, and one with a finite memory; the connection between the two levels is made explicit by a notion of refinement that handles out-of-memory behaviors. This approach allows for more optimizations to be performed and establishes a clear boundary between the idealized semantics of a program and the implementation of that program on finite hardware. The two-phase memory model has been incorporated into an LLVM IR semantics, demonstrating its utility in practice in the context of a low-level language with features like undef and bitcast. This yields infinite and finite memory versions of the language semantics that are proven to be in refinement with respect to out-of-memory behaviors. Each semantics is accompanied by a verified executable reference interpreter. The semantics justify optimizations, such as dead-alloca-elimination, that were previously impossible or difficult to prove correct.
Teaching
- [Teaching Assistant] 25Fa
- [Teaching Assistant] 25Sp
- [Teaching Assistant] 24Fa
- [Head Teaching Assistant] 23Sp,23Su,23Fa,24Su
- [Teaching Assistant] 22Su,22Fa
- [Head Teaching Assistant] 24Sp
- [Teaching Assistant] 23Fa