The LEO Compiler: A Verifying Approach for Correctness
In this blog post, we explore the design of the LEO compiler, a unique and powerful tool that goes beyond traditional compilers. Unlike ordinary compilers that produce machine code, the LEO compiler generates machine-checked formal proofs of its correctness during its operation. These proofs are then verified by the ACL2 theorem prover, based on an ACL2 formalization of LEO, the language it compiles. This revolutionary approach ensures that the LEO compiler produces correct outputs for any input, guaranteeing the integrity of the compiled programs.
Architecture: LEO Compiler Phases
The LEO compiler comprises several phases, each responsible for specific transformations and checks. These phases collaborate to translate LEO code text into an R1CS circuit, a key component used in cryptographic protocols. The sequence of phases involves native Abstract Syntax Trees (ASTs) and the conversion of ASTs to ACL2 ASTs through JSON representations.
Proof Generation: Formal Verification at Every Step
An essential feature of the LEO compiler is that it generates machine-checkable proofs at each phase. For instance, when a phase transforms an AST into another AST, it generates a proof showing the correctness of the transformation. These proofs are based on the ACL2 formalization of LEO, which defines correctness relations between various artifacts. The generated proofs are then verified by the ACL2 theorem prover, ensuring the correctness of each compiler phase.
Compiler Phases: Parsing, AST Conversion, ASG Conversion, Canonicalization, Type Inference, Optimizations, Circuit Synthesis
- Parsing: The LEO parser follows the ABNF grammar of LEO, which is part of the ACL2 formalization of LEO. It converts LEO code text into tokens and then into ASTs, catching lexical and syntactical errors early in the process.
- AST Conversion: The parsing phase produces an abstract syntax tree (AST). The AST conversion phase simplifies this tree by removing unnecessary nodes and handling errors effectively.
- ASG Conversion: The compiler uses an Abstract Semantic Graph (ASG) for efficient optimization passes. ASGs contain more context than ASTs, making them suitable for optimizations, but they can be easily converted back to ASTs for proof generation.
- Canonicalization: This phase performs canonicalization transformations, simplifying subsequent phases and facilitating syntactic equality checks.
- Type Inference: This phase checks and infers correct types for variables and literals in a program. It recommends explicit type definitions where necessary.
- Optimizations: The compiler performs constant folding, propagation, loop unrolling, and function inlining optimizations, enhancing program efficiency.
- Circuit Synthesis: The final phase translates the ASG into an R1CS relation. The compiler leverages handcrafted circuits and a library of R1CS gadgets to compose efficient R1CS relations.
Verifying R1CS Gadgets in LEO CORE
The LEO compiler includes native implementations of cryptographic hash functions like Blake2s. To maintain the zero-knowledge feature, formal methods are used to verify R1CS relations, which are not generated from LEO programs. This verification process relies on the ACL2 theorem prover and Kestrel Institute’s Axe toolkit. The Prover checks the relationship between inputs and outputs specified in the R1CS against the given specification. The Axe Prover employs rewriting and substitution tactics to solve constraints and verify the correctness of the R1CS.
Conclusion: A Milestone in Compiler Technology
The LEO compiler represents a remarkable advancement in compiler technology, as it goes beyond traditional compilation processes. By generating machine-checkable proofs at every phase and using formal methods to verify R1CS relations, the LEO compiler ensures the correctness and integrity of its operation. This approach opens up new possibilities in building secure and verifiable cryptographic applications while instilling confidence in developers and users alike.