Information | Hanxi (Gary) Chen

Education

Cornell University - Ithaca, NY, USA 2025-

Ann S. Bowers College of Computing and Information Science

Ph.D. in Computer Science

University of Pennsylvania - Philadelphia, PA, USA 2020-2025

Uncoordinated Dual Bachelor's Degree & Master's Degree between Penn Engineering & The Wharton School

M.S.E. in Computer and Information Science

B.S.E. in Computer Science Mathematics Minor, Summa Cum Laude

B.S. in Economics Business Analytics Concentration, Summa Cum Laude

[GPA]: 3.97/4.00

[Honors]

Wharton Research Scholars, MCIT TA Award, CIS Faculty Appreciation Award, Eta Kappa Nu Honors Society, Tau Beta Pi Honors Society, Beta Gamma Sigma Honors Society

Publications

[NFM'25] Vellvm: Formalizing the Informal LLVM: (Experience Report)

Calvin Beck, Hanxi Chen, Steve Zdancewic

Abstract

This report presents our methodology for and experience with formalizing a specification of LLVM IR in the Verified LLVM Project (Vellvm). Vellvm provides a specification for a large, practical subset of LLVM IR in the Rocq Proof Assistant in the support of verified compilers and program transformations. Program transformations often rely on the subtle details, and as a result Vellvm’s semantics are quite comprehensive: for instance we provide a sophisticated low-level memory model to support low-level operations, such as casts between pointers and integers, and justify optimizations in their presence. Our approach implements the semantics via monadic interpreters, which rely on a coinductively-defined data structure called ITrees. Crucially, this methodology supports the extraction of an executable interpreter, proved to refine the specification. We use the reference interpreter to validate the accuracy of the formalization, employing random differential testing between Clang and Vellvm implemented in our own LLVM IR program generator (GenLLVM), as well as via state-of-the-art C compiler testing frameworks (CSmith and YARPGen). Such testing has found bugs in both the Vellvm semantics and Clang. We believe that tools from the Vellvm project can be useful for other LLVM IR-related projects, and that the overall methodology applies to other formal verification efforts.

[CoqPL'25] Vellvm: Formalizing the Informal

Calvin Beck, Hanxi Chen, Steve Zdancewic

Abstract

The Vellvm project is an extensive undertaking to formalize a large subset of LLVM IR in the Coq proof assistant. LLVM IR is an intermediate language that is widely used as a common target for front-end compilers. Compilers for different languages can share the LLVM infrastructure for performing optimizations and generating target code for various instruction set architectures. The bedrock of many languages is LLVM IR, and formalizing it gives us an important tool for compiler verification efforts. In this talk we will explore the difficulties that come up when attempting to formalize a large real world language. We’ll touch upon how proof assistants like Coq can reveal issues in informal specifications, how testing tools like QuickChick can help bridge the gap between informal real world implementations and our formal Coq semantics, and we will also touch upon aspects of the LLVM IR could be updated to better suit verification efforts. Furthermore, we’ll discuss some of the challenges we have encountered along the way, including limitations with the Coq type system, extraction bugs, and performance issues for proofs and extracted code.

[ICFP'24] A Two-Phase Infinite/Finite Low-Level Memory Model: Reconciling Integer–Pointer Casts, Finite Space, and undef at the LLVM IR Level of Abstraction

Calvin Beck, Irene Yoon, Hanxi Chen, Yannick Zakowski, Steve Zdancewic

Abstract

This paper provides a novel approach to reconciling complex low-level memory model features, such as pointer--integer casts, with desired refinements that are needed to justify the correctness of program transformations. The idea is to use a "two-phase" memory model, one with an unbounded memory and corresponding unbounded integer type, and one with a finite memory; the connection between the two levels is made explicit by a notion of refinement that handles out-of-memory behaviors. This approach allows for more optimizations to be performed and establishes a clear boundary between the idealized semantics of a program and the implementation of that program on finite hardware. The two-phase memory model has been incorporated into an LLVM IR semantics, demonstrating its utility in practice in the context of a low-level language with features like undef and bitcast. This yields infinite and finite memory versions of the language semantics that are proven to be in refinement with respect to out-of-memory behaviors. Each semantics is accompanied by a verified executable reference interpreter. The semantics justify optimizations, such as dead-alloca-elimination, that were previously impossible or difficult to prove correct.

Teaching

Cornell University

[CS 4110] Programming Languages and Logics

[Teaching Assistant] 25Fa

University of Pennsylvania

[CIS 5521] Compilers and Interpreters

[Teaching Assistant] 25Sp

[CIS 5520] Advanced Programming

[Teaching Assistant] 24Fa

[CIS 5020/CIT 5960/CIS 3200] Analysis of Algorithms

[Head Teaching Assistant] 23Sp, 23Su, 23Fa, 24Su
[Teaching Assistant] 22Su, 22Fa

[NETS 3120] Theory of Networks

[Head Teaching Assistant] 24Sp

[CIS 5000] Software Foundations

[Teaching Assistant] 23Fa