Theory Definitions
section "Underlying concepts and formal definitions"
theory Definitions
imports "HOL-IMP.Small_Step"
begin
setup ‹
Document_Output.antiquotation_raw_embedded \<^binding>‹lstlisting›
(Scan.lift Parse.embedded)
(fn _ => fn s =>
Latex.string "\\begin{lstlisting}\n" @
Latex.string (cat_lines (trim (forall_string Symbol.is_space) (trim_split_lines s))) @
Latex.string "\n\\end{lstlisting}")
›
text ‹
\null
In a passage of his book \emph{Clean Architecture: A Craftsman's Guide to Software Structure and
Design} (Prentice Hall, 2017), Robert C. Martin defines a computer program as ``a detailed
description of the policy by which inputs are transformed into outputs'', remarking that ``indeed,
at its core, that's all a computer program actually is''. Accordingly, the scope of information flow
control via static type systems is in principle much broader than language-based information flow
security, since this concept promises to cope with information flow correctness in full generality.
This is already shown by a basic program implementing the Euclidean algorithm, in Donald Knuth's
words ``the granddaddy of all algorithms, because it is the oldest nontrivial algorithm that has
survived to the present day'' (from \emph{The Art of Computer Programming, Volume 2: Seminumerical
Algorithms}, third edition, Addison-Wesley, 1997). Here below is a sample such C program, where
variables \lstinline{a} and \lstinline{b} initially contain two positive integers and \lstinline{a}
will finally contain the output, namely the greatest common divisor of those integers.
\null
\<^lstlisting>‹
do
{
r = a % b;
a = b;
b = r;
} while (b);
›
\null
Even in a so basic program, information is not allowed to indistinctly flow from any variable to any
other one, on pain of the program being incorrect. If an incautious programmer swapped \lstinline{a}
for \lstinline{b} in the assignment at line 4, the greatest common divisor output for any two inputs
\lstinline{a} and \lstinline{b} would invariably match \lstinline{a}, whereas swapping the sides of
the assignment at line 5 would give rise to an endless loop. Indeed, despite the marked differences
in the resulting program behavior, both of these potential errors originate in information flowing
between variables along paths other than the demanded ones. A sound implementation of the Euclidean
algorithm does not provide for any information flow from \lstinline{a} to \lstinline{b}, or from
\lstinline{b} to \lstinline{r}.
The static security type systems addressed in \<^cite>‹"Volpano96"›, \<^cite>‹"Volpano97"›, and
\<^cite>‹"Nipkow23-1"› restrict the information flows occurring in a program based on a mapping of each
of its variables to a \emph{domain} along with an \emph{interference relation} between such domains,
including any pair of domains such that the former may interfere with the latter. Accordingly, if
function @{text dom} stands for such a mapping, and infix notation @{text "u ↝ v"} denotes the
inclusion of any pair of domains @{text "(u, v)"} in such a relation (both notations are borrowed
from \<^cite>‹"Rushby92"›), the above errors would be detected at compile time by a static type system
enforcing an interference relation such that:
▪ @{text "dom a ↝ dom r"}, @{text "dom b ↝ dom r"} (line 3),
▪ @{text "dom b ↝ dom a"} (line 4),
▪ @{text "dom r ↝ dom b"} (line 5),
and ruling out any other pair of distinct domains. Such an interference relation would also embrace
the implicit information flow from \lstinline{b} to the other two variables arising from the loop's
termination condition (line 6).
Remarkably, as @{text "dom a ↝ dom r"} and @{text "dom r ↝ dom b"} but @{text "¬ dom a ↝ dom b"},
this interference relation turns out to be intransitive. Therefore, unlike the security static type
systems studied in \<^cite>‹"Volpano96"› and \<^cite>‹"Volpano97"›, which deal with \emph{level-based}, and
then \emph{transitive}, interference relations, a static type system aimed at enforcing information
flow correctness in full generality must be capable of dealing with \emph{intransitive} interference
relations as well. This should come as no surprise, since \<^cite>‹"Rushby92"› shows that this is the
general case already for interference relations expressing information flow security policies.
But the bar can be raised further. Considering the above program again, the information flows needed
for its operation, as listed above, need not be allowed throughout the program. Indeed, information
needs to flow from \lstinline{a} and \lstinline{b} to \lstinline{r} at line 3, from \lstinline{b} to
\lstinline{a} at line 4, from \lstinline{r} to \lstinline{b} at line 5, and then (implicitly) from
\lstinline{b} to the other two variables at line 6. Based on this observation, error detection at
compile time can be made finer-grained by rewriting the program as follows, where \lstinline{i} is a
further integer variable introduced for this purpose.
\null
\<^lstlisting>‹
do
{
i = 0;
r = a % b;
i = 1;
a = b;
i = 2;
b = r;
i = 3;
} while (b);
›
\null
In this program, \lstinline{i} serves as a state variable whose value in every execution step can be
determined already at compile time. Since a distinct set of information flows is allowed for each of
its values, a finer-grained information flow correctness policy for this program can be expressed by
extending the concept of a single, \emph{stateless} interference relation applying throughout the
program to that of a \emph{stateful interference function} mapping program states to interference
relations (in this case, according to the value of \lstinline{i}). As a result of this extension,
for each program state, a distinct interference relation -- that is, the one to which the applied
interference function maps that state -- can be enforced at compile time by a suitable static type
system.
If mixfix notation @{text "s: u ↝ v"} denotes the inclusion of any pair of domains @{text "(u, v)"}
in the interference relation associated with any state @{text s}, a finer-grained information flow
correctness policy for this program can then be expressed as an interference function such that:
▪ @{text "s: dom a ↝ dom r"}, @{text "s: dom b ↝ dom r"} for any @{text s} where \lstinline{i}
$=0$ (line 4),
▪ @{text "s: dom b ↝ dom a"} for any @{text s} where \lstinline{i} $=1$ (line 6),
▪ @{text "s: dom r ↝ dom b"} for any @{text s} where \lstinline{i} $=2$ (line 8),
▪ @{text "s: dom b ↝ dom a"}, @{text "s: dom b ↝ dom r"}, @{text "s: dom b ↝ dom i"} for any
@{text s} where \lstinline{i} $=3$ (line 10),
and ruling out any other pair of distinct domains in any state.
Notably, to enforce such an interference function, a static type system would not need to keep track
of the full program state in every program execution step (which would be unfeasible, as the values
of \lstinline{a}, \lstinline{b}, and \lstinline{r} cannot be determined at compile time), but only
of the values of some specified state variables (in this case, of \lstinline{i} alone). Accordingly,
term \emph{state variable} will henceforth refer to any program variable whose value may affect that
of the interference function expressing the information flow correctness policy in force, namely the
interference relation to be applied.
Needless to say, there would be something artificial about the introduction of such a state variable
into the above sample program, since it is indeed so basic as not to provide for a state machine on
its own, so that \lstinline{i} would be aimed exclusively at enabling the enforcement of such an
information flow correctness policy. Yet, real-world imperative programs, for which error detection
at compile time is truly meaningful, \emph{do} typically provide for state machines such that only a
subset of all the potential information flows is allowed in each state; and even for those which do
not, the addition of some \emph{ad hoc} state variable to enforce such a policy could likely be an
acceptable trade-off.
Accordingly, the goal of this paper is to study information flow control via stateful intransitive
noninterference. First, the notion of termination-sensitive information flow security with respect
to a level-based interference relation, as defined in \<^cite>‹"Nipkow23-1"›, section 9.2.6, is
generalized to that of termination-sensitive information flow correctness with respect to a stateful
interference function having (generally) intransitive interference relations as values. Then, a
static type system is specified and is proven to be capable of enforcing such information flow
correctness policies. Finally, the information flow correctness notion and the static type system
introduced here are proven to degenerate to the counterparts addressed in \<^cite>‹"Nipkow23-1"›,
section 9.2.6, in case of a stateless level-based information flow correctness policy.
Although the operational semantics of the didactic imperative programming language IMP employed in
\<^cite>‹"Nipkow23-1"› is used for this purpose, the introduced concepts are applicable to larger,
real-world imperative programming languages as well, by just affording the additional type system
complexity arising from richer language constructs. Accordingly, the informal explanations
accompanying formal content in what follows will keep making use of sample C code snippets.
For further information about the formal definitions and proofs contained in this paper, see
Isabelle documentation, particularly \<^cite>‹"Paulson23"›, \<^cite>‹"Nipkow23-4"›, \<^cite>‹"Krauss23"›,
\<^cite>‹"Nipkow11"›, and \<^cite>‹"Ballarin23"›.
›
subsection "Global context definitions"
declare [[syntax_ambiguity_warning = false]]