X86 instruction semantics and basic block symbolic execution

Freek Verbeek 📧, Abhijith Bharadwaj, Joshua Bockenek, Ian Roessle, Timmy Weerwag and Binoy Ravindran

October 13, 2021

This is a development version of this entry. It might change over time and is not stable. Please refer to release versions for citations.

Abstract

This AFP entry provides semantics for roughly 120 different X86-64 assembly instructions. These instructions include various moves, arithmetic/logical operations, jumps, call/return, SIMD extensions and others. External functions are supported by allowing a user to provide custom semantics for these calls. Floating-point operations are mapped to uninterpreted functions. The model provides semantics for register aliasing and a byte-level little-endian memory model. The semantics are purposefully incomplete, but overapproximative. For example, the precise effect of flags may be undefined for certain instructions, or instructions may simply have no semantics at all. In those cases, the semantics are mapped to universally quantified uninterpreted terms from a locale. Second, this entry provides a method to symbolic execution of basic blocks. The method, called “se_step” (for: symbolic execution step) fetches an instruction and updates the current symbolic state while keeping track of assumptions made over the memory model. A key component is a set of theorems that prove how reads from memory resolve after writes have occurred. Thirdly, this entry provides a parser that allows the user to copy-paste the output of the standard disassembly tool objdump into Isabelle/HOL. A couple small and explanatory examples are included, including functions from the word count program. Several examples can be supplied upon request (they are not included due to the running time of verification): functions from the floating-point modulo function from FDLIBM, the GLIBC strlen function and the CoreUtils SHA256 implementation.

License

BSD License

Topics

Session X86_Semantics