POSIX Lexing with Derivatives of Regular Expressions

Fahad Ausaf 🌐, Roy Dyckhoff 🌐 and Christian Urban 🌐

May 24, 2016

This is a development version of this entry. It might change over time and is not stable. Please refer to release versions for citations.

Abstract

Brzozowski introduced the notion of derivatives for regular expressions. They can be used for a very simple regular expression matching algorithm. Sulzmann and Lu cleverly extended this algorithm in order to deal with POSIX matching, which is the underlying disambiguation strategy for regular expressions needed in lexers. Their algorithm generates POSIX values which encode the information of how a regular expression matches a string--β€”that is, which part of the string is matched by which part of the regular expression. In this paper we give our inductive definition of what a POSIX value is and show (i) that such a value is unique (for given regular expression and string being matched) and (ii) that Sulzmann and Lu’s algorithm always generates such a value (provided that the regular expression matches the string). This holds also when optimisations are included. Finally we show that (iii) our inductive definition of a POSIX value is equivalent to an alternative definition by Okui and Suzuki which identifies POSIX values as least elements according to an ordering of values.

License

BSD License

History

June 17, 2022
Christian Urban added a theory about the Posix definition by Okui and Suzuki.

Topics

Session Posix-Lexing