Theory Definitions

(*  Title:       Information Flow Control via Stateful Intransitive Noninterference in Language IMP
    Author:      Pasquale Noce
                 Senior Staff Firmware Engineer at HID Global, Italy
                 pasquale dot noce dot lavoro at gmail dot com
                 pasquale dot noce at hidglobal dot com
*)

section "Underlying concepts and formal definitions"

theory Definitions
  imports "HOL-IMP.Small_Step"
begin

(*<*)
setup ‹
  Document_Output.antiquotation_raw_embedded \<^binding>‹lstlisting›
    (Scan.lift Parse.embedded)
    (fn _ => fn s =>
      Latex.string "\\begin{lstlisting}\n" @
      Latex.string (cat_lines (trim (forall_string Symbol.is_space) (trim_split_lines s))) @
      Latex.string "\n\\end{lstlisting}")
›
(*>*)

text ‹
\null

In a passage of his book \emph{Clean Architecture: A Craftsman's Guide to Software Structure and
Design} (Prentice Hall, 2017), Robert C. Martin defines a computer program as ``a detailed
description of the policy by which inputs are transformed into outputs'', remarking that ``indeed,
at its core, that's all a computer program actually is''. Accordingly, the scope of information flow
control via static type systems is in principle much broader than language-based information flow
security, since this concept promises to cope with information flow correctness in full generality.

This is already shown by a basic program implementing the Euclidean algorithm, in Donald Knuth's
words ``the granddaddy of all algorithms, because it is the oldest nontrivial algorithm that has
survived to the present day'' (from \emph{The Art of Computer Programming, Volume 2: Seminumerical
Algorithms}, third edition, Addison-Wesley, 1997). Here below is a sample such C program, where
variables \lstinline{a} and \lstinline{b} initially contain two positive integers and \lstinline{a}
will finally contain the output, namely the greatest common divisor of those integers.

\null

\<^lstlisting>‹
do
{
    r = a % b;
    a = b;
    b = r;
} while (b);
›

\null

Even in a so basic program, information is not allowed to indistinctly flow from any variable to any
other one, on pain of the program being incorrect. If an incautious programmer swapped \lstinline{a}
for \lstinline{b} in the assignment at line 4, the greatest common divisor output for any two inputs
\lstinline{a} and \lstinline{b} would invariably match \lstinline{a}, whereas swapping the sides of
the assignment at line 5 would give rise to an endless loop. Indeed, despite the marked differences
in the resulting program behavior, both of these potential errors originate in information flowing
between variables along paths other than the demanded ones. A sound implementation of the Euclidean
algorithm does not provide for any information flow from \lstinline{a} to \lstinline{b}, or from
\lstinline{b} to \lstinline{r}.

The static security type systems addressed in \<^cite>‹"Volpano96"›, \<^cite>‹"Volpano97"›, and
\<^cite>‹"Nipkow23-1"› restrict the information flows occurring in a program based on a mapping of each
of its variables to a \emph{domain} along with an \emph{interference relation} between such domains,
including any pair of domains such that the former may interfere with the latter. Accordingly, if
function @{text dom} stands for such a mapping, and infix notation @{text "u ↝ v"} denotes the
inclusion of any pair of domains @{text "(u, v)"} in such a relation (both notations are borrowed
from \<^cite>‹"Rushby92"›), the above errors would be detected at compile time by a static type system
enforcing an interference relation such that:

  ▪ @{text "dom a ↝ dom r"}, @{text "dom b ↝ dom r"} (line 3),

  ▪ @{text "dom b ↝ dom a"} (line 4),

  ▪ @{text "dom r ↝ dom b"} (line 5),

and ruling out any other pair of distinct domains. Such an interference relation would also embrace
the implicit information flow from \lstinline{b} to the other two variables arising from the loop's
termination condition (line 6).

Remarkably, as @{text "dom a ↝ dom r"} and @{text "dom r ↝ dom b"} but @{text "¬ dom a ↝ dom b"},
this interference relation turns out to be intransitive. Therefore, unlike the security static type
systems studied in \<^cite>‹"Volpano96"› and \<^cite>‹"Volpano97"›, which deal with \emph{level-based}, and
then \emph{transitive}, interference relations, a static type system aimed at enforcing information
flow correctness in full generality must be capable of dealing with \emph{intransitive} interference
relations as well. This should come as no surprise, since \<^cite>‹"Rushby92"› shows that this is the
general case already for interference relations expressing information flow security policies.

But the bar can be raised further. Considering the above program again, the information flows needed
for its operation, as listed above, need not be allowed throughout the program. Indeed, information
needs to flow from \lstinline{a} and \lstinline{b} to \lstinline{r} at line 3, from \lstinline{b} to
\lstinline{a} at line 4, from \lstinline{r} to \lstinline{b} at line 5, and then (implicitly) from
\lstinline{b} to the other two variables at line 6. Based on this observation, error detection at
compile time can be made finer-grained by rewriting the program as follows, where \lstinline{i} is a
further integer variable introduced for this purpose.

\null

\<^lstlisting>‹
do
{
    i = 0;
    r = a % b;
    i = 1;
    a = b;
    i = 2;
    b = r;
    i = 3;
} while (b);
›

\null

In this program, \lstinline{i} serves as a state variable whose value in every execution step can be
determined already at compile time. Since a distinct set of information flows is allowed for each of
its values, a finer-grained information flow correctness policy for this program can be expressed by
extending the concept of a single, \emph{stateless} interference relation applying throughout the
program to that of a \emph{stateful interference function} mapping program states to interference
relations (in this case, according to the value of \lstinline{i}). As a result of this extension,
for each program state, a distinct interference relation -- that is, the one to which the applied
interference function maps that state -- can be enforced at compile time by a suitable static type
system.

If mixfix notation @{text "s: u ↝ v"} denotes the inclusion of any pair of domains @{text "(u, v)"}
in the interference relation associated with any state @{text s}, a finer-grained information flow
correctness policy for this program can then be expressed as an interference function such that:

  ▪ @{text "s: dom a ↝ dom r"}, @{text "s: dom b ↝ dom r"} for any @{text s} where \lstinline{i}
$=0$ (line 4),

  ▪ @{text "s: dom b ↝ dom a"} for any @{text s} where \lstinline{i} $=1$ (line 6),

  ▪ @{text "s: dom r ↝ dom b"} for any @{text s} where \lstinline{i} $=2$ (line 8),

  ▪ @{text "s: dom b ↝ dom a"}, @{text "s: dom b ↝ dom r"}, @{text "s: dom b ↝ dom i"} for any
@{text s} where \lstinline{i} $=3$ (line 10),

and ruling out any other pair of distinct domains in any state.

Notably, to enforce such an interference function, a static type system would not need to keep track
of the full program state in every program execution step (which would be unfeasible, as the values
of \lstinline{a}, \lstinline{b}, and \lstinline{r} cannot be determined at compile time), but only
of the values of some specified state variables (in this case, of \lstinline{i} alone). Accordingly,
term \emph{state variable} will henceforth refer to any program variable whose value may affect that
of the interference function expressing the information flow correctness policy in force, namely the
interference relation to be applied.

Needless to say, there would be something artificial about the introduction of such a state variable
into the above sample program, since it is indeed so basic as not to provide for a state machine on
its own, so that \lstinline{i} would be aimed exclusively at enabling the enforcement of such an
information flow correctness policy. Yet, real-world imperative programs, for which error detection
at compile time is truly meaningful, \emph{do} typically provide for state machines such that only a
subset of all the potential information flows is allowed in each state; and even for those which do
not, the addition of some \emph{ad hoc} state variable to enforce such a policy could likely be an
acceptable trade-off.

Accordingly, the goal of this paper is to study information flow control via stateful intransitive
noninterference. First, the notion of termination-sensitive information flow security with respect
to a level-based interference relation, as defined in \<^cite>‹"Nipkow23-1"›, section 9.2.6, is
generalized to that of termination-sensitive information flow correctness with respect to a stateful
interference function having (generally) intransitive interference relations as values. Then, a
static type system is specified and is proven to be capable of enforcing such information flow
correctness policies. Finally, the information flow correctness notion and the static type system
introduced here are proven to degenerate to the counterparts addressed in \<^cite>‹"Nipkow23-1"›,
section 9.2.6, in case of a stateless level-based information flow correctness policy.

Although the operational semantics of the didactic imperative programming language IMP employed in
\<^cite>‹"Nipkow23-1"› is used for this purpose, the introduced concepts are applicable to larger,
real-world imperative programming languages as well, by just affording the additional type system
complexity arising from richer language constructs. Accordingly, the informal explanations
accompanying formal content in what follows will keep making use of sample C code snippets.

For further information about the formal definitions and proofs contained in this paper, see
Isabelle documentation, particularly \<^cite>‹"Paulson23"›, \<^cite>‹"Nipkow23-4"›, \<^cite>‹"Krauss23"›,
\<^cite>‹"Nipkow11"›, and \<^cite>‹"Ballarin23"›.
›


subsection "Global context definitions"

declare [[syntax_ambiguity_warning = false]]


datatype com_flow =
  Assign vname aexp  (‹_ ::= _› [1000, 61] 70) |
  Observe "vname set"  (‹⟨_⟩› [61] 70)

type_synonym flow = "com_flow list"
type_synonym config = "state set × vname set"
type_synonym scope = "config set × bool"


abbreviation eq_states :: "state ⇒ state ⇒ vname set ⇒ bool"
  (‹(_ = _ '(⊆ _'))› [51, 51] 50) where
"s = t (⊆ X) ≡ ∀x ∈ X. s x = t x"

abbreviation univ_states :: "state set ⇒ vname set ⇒ state set"
  (‹(Univ _ '(⊆ _'))› [51] 75) where
"Univ A (⊆ X) ≡ {s. ∃t ∈ A. s = t (⊆ X)}"

abbreviation univ_vars_if :: "state set ⇒ vname set ⇒ vname set"
  (‹(Univ?? _ _)› [51, 75] 75) where
"Univ?? A X ≡ if A = {} then UNIV else X"

abbreviation "tl2 xs ≡ tl (tl xs)"


fun run_flow :: "flow ⇒ state ⇒ state" where
"run_flow (x ::= a # cs) s = run_flow cs (s(x := aval a s))" |
"run_flow (_ # cs) s = run_flow cs s" |
"run_flow _ s = s"

primrec no_upd :: "flow ⇒ vname ⇒ bool" where
"no_upd (c # cs) x =
  ((case c of y ::= _ ⇒ y ≠ x | _ ⇒ True) ∧ no_upd cs x)" |
"no_upd [] _ = True"

primrec avars :: "aexp ⇒ vname set" where
"avars (N i) = {}" |
"avars (V x) = {x}" |
"avars (Plus a⇩₁ a⇩₂) = avars a⇩₁ ∪ avars a⇩₂"

primrec bvars :: "bexp ⇒ vname set" where
"bvars (Bc v) = {}" |
"bvars (Not b) = bvars b" |
"bvars (And b⇩₁ b⇩₂) = bvars b⇩₁ ∪ bvars b⇩₂" |
"bvars (Less a⇩₁ a⇩₂) = avars a⇩₁ ∪ avars a⇩₂"


fun flow_aux :: "com list ⇒ flow" where
"flow_aux ((x ::= a) # cs) = (x ::= a) # flow_aux cs" |
"flow_aux ((IF b THEN _ ELSE _) # cs) = ⟨bvars b⟩ # flow_aux cs" |
"flow_aux ((c;; _) # cs) = flow_aux (c # cs)" |
"flow_aux (_ # cs) = flow_aux cs" |
"flow_aux [] = []"

definition flow :: "(com × state) list ⇒ flow" where
"flow cfs = flow_aux (map fst cfs)"


function small_stepsl ::
 "com × state ⇒ (com × state) list ⇒ com × state ⇒ bool"
  (‹(_ →*'{_'} _)› [51, 51] 55)
where
"cf →*{[]} cf' = (cf = cf')" |
"cf →*{cfs @ [cf']} cf'' = (cf →*{cfs} cf' ∧ cf' → cf'')"

by (atomize_elim, auto intro: rev_cases)
termination by lexicographic_order

lemmas small_stepsl_induct = small_stepsl.induct [split_format(complete)]


subsection "Local context definitions"

text ‹
In what follows, stateful intransitive noninterference will be formalized within the local context
defined by means of a \emph{locale} \<^cite>‹"Ballarin23"›, named @{text noninterf}. Later on, this
will enable to prove the degeneracy of the following definitions to the stateless level-based
counterparts addressed in \<^cite>‹"Volpano96"›, \<^cite>‹"Volpano97"›, and \<^cite>‹"Nipkow23-1"›, and
formalized in \<^cite>‹"Nipkow23-2"› and \<^cite>‹"Nipkow23-3"›, via a suitable locale interpretation.

Locale @{text noninterf} contains three parameters, as follows.

  ▪ A stateful interference function @{text interf} mapping program states to \emph{interference
    predicates} of two domains, intended to be true just in case the former domain is allowed to
    interfere with the latter.

  ▪ A function @{text dom} mapping program variables to their respective domains.

  ▪ A set @{text state} collecting all state variables.

As the type of the domains is modeled using a type variable, it may be assigned arbitrarily by any
locale interpretation, which will enable to set it to @{typ nat} upon proving degeneracy. Moreover,
the above mixfix notation @{text "s: u ↝ v"} is adopted to express the fact that any two domains
@{text u}, @{text v} satisfy the interference predicate @{text "interf s"} associated with any state
@{text s}, namely the fact that @{text u} is allowed to interfere with @{text v} in state @{text s}.

Locale @{text noninterf} also contains an assumption, named @{text interf_state}, which serves the
purpose of supplying parameter @{text state} with its intended semantics, namely standing for the
set of all state variables. The assumption is that function @{text interf} maps any two program
states agreeing on the values of all the variables in set @{text state} to the same interference
predicate. Correspondingly, any locale interpretation instantiating parameter @{text state} as the
empty set must instantiate parameter @{text interf} as a function mapping any two program states,
even if differing in the values of all variables, to the same interference predicate -- namely, as a
constant function. Hence, any such locale interpretation refers to a single, stateless interference
predicate applying throughout the program. Unsurprisingly, this is the way how those parameters will
be instantiated upon proving degeneracy.

The one just mentioned is the only locale assumption. Particularly, the following formalization does
not rely upon the assumption that the interference predicates returned by function @{text interf} be
\emph{reflexive}, although this will be the case for any meaningful real-world information flow
correctness policy.

\null
›

locale noninterf =
  fixes
    interf :: "state ⇒ 'd ⇒ 'd ⇒ bool"
      (‹(_: _ ↝ _)› [51, 51, 51] 50) and
    dom :: "vname ⇒ 'd" and
    state :: "vname set"
  assumes
    interf_state: "s = t (⊆ state) ⟹ interf s = interf t"


context noninterf
begin

text ‹
\null

Locale parameters @{term interf} and @{term dom} are provided with their intended semantics by the
definitions of functions @{text sources} and @{text correct}, which are formalized here below based
on the following underlying ideas.

As long as a stateless transitive interference relation between domains is considered, the condition
for the correctness of the value of a variable resulting from a full or partial program execution
need not take into account the execution flow producing it, but rather the initial program state
only. In fact, this is what happens with the stateless level-based correctness condition addressed
in \<^cite>‹"Volpano96"›, \<^cite>‹"Volpano97"›, and \<^cite>‹"Nipkow23-1"›: the resulting value of a variable
of level \emph{l} is correct if the same value is produced for any initial state agreeing with the
given one on the value of every variable of level not higher than \emph{l}.

Things are so simple because, for any variables \lstinline{x}, \lstinline{y}, and \lstinline{z}, if
@{text "dom z ↝ dom y"} and @{text "dom y ↝ dom x"}, transitivity entails @{text "dom z ↝ dom x"},
and these interference relationships hold statelessly. Therefore, \lstinline{z} may be counted among
the variables whose initial values are allowed to affect \lstinline{x} independently of whether some
intermediate value of \lstinline{y} may affect \lstinline{x} within the actual execution flow.

Unfortunately, switching to stateful intransitive interference relations puts an end to that happy
circumstance -- indeed, even statefulness or intransitivity alone would suffice for this sad ending.
In this context, deciding about the correctness of the resulting value of a variable \lstinline{x}
still demands the detection of the variables whose initial values are allowed to interfere with
\lstinline{x}, but the execution flow leading from the initial program state to the resulting one
needs to be considered to perform such detection.

This is precisely the task of function @{text sources}, so named after its finite state machine
counterpart defined in \<^cite>‹Rushby92›. It takes as inputs an execution flow @{term cs}, an initial
program state @{term s}, and a variable \lstinline{x}, and outputs the set of the variables whose
values in @{term s} are allowed to affect the value of \lstinline{x} in the state @{term s'} into
which @{term cs} turns @{term s}, according to @{term cs} as well as to the information flow
correctness policy expressed by parameters @{term interf} and @{term dom}.

In more detail, execution flows are modeled as lists comprising items of two possible kinds, namely
an assignment of the value of an arithmetic expression @{term a} to a variable \lstinline{z} or else
an \emph{observation} of the values of the variables in a set @{term X}, denoted through notations
@{term "z ::= a :: com_flow"} (same as with assignment commands) and @{term "⟨X⟩"} and keeping track
of explicit and implicit information flows, respectively. Particularly, item @{term "⟨X⟩"} refers to
the act of observing the values of the variables in @{term X} leaving the program state unaltered.
During the execution of an IMP program, this happens upon any evaluation of a boolean expression
containing all and only the variables in @{term X}.

Function @{text sources} is defined along with an auxiliary function @{text sources_aux} by means of
mutual recursion. Based on this definition, @{term "sources cs s x"} contains a variable @{term y}
if there exist a descending sequence of left sublists @{text "cs⇩_n⇩₊⇩₁"}, @{term "cs⇩_n @ [c⇩_n]"}, ...,
@{term "cs⇩₁ @ [c⇩₁]"} of @{term cs} and a sequence of variables @{text "y⇩_n⇩₊⇩₁"}, ..., @{term y⇩₁}, where
$n \ge 1$, @{text "cs⇩_n⇩₊⇩₁ = cs"}, @{text "y⇩_n⇩₊⇩₁ = x"}, and @{prop "y⇩₁ = y"}, satisfying the following
conditions.

  ▪ For each positive integer $i \le n$, @{term c⇩_i} is an assignment @{text "y⇩_i⇩₊⇩₁ ::= a⇩_i"} where:

    ▪ @{prop "y⇩_i ∈ avars a⇩_i"},

    ▪ @{text "run_flow cs⇩_i s: dom y⇩_i ↝ dom y⇩_i⇩₊⇩₁"}, and

    ▪ the right sublist of @{text "cs⇩_i⇩₊⇩₁"} complementary to @{term "cs⇩_i @ [c⇩_i]"} does not comprise
any assignment to variable @{text "y⇩_i⇩₊⇩₁"} (as assignment @{term c⇩_i} would otherwise be irrelevant),

  or else an observation @{term "⟨X⇩_i⟩"} where:

    ▪ @{prop "y⇩_i ∈ X⇩_i"} and

    ▪ @{text "run_flow cs⇩_i s: dom y⇩_i ↝ dom y⇩_i⇩₊⇩₁"}.

  ▪ @{term cs⇩₁} does not comprise any assignment to variable @{term y}.

In addition, @{term "sources cs s x"} contains variable @{term x} also if @{term cs} does not
comprise any assignment to variable @{term x}.

\null
›

function
  sources :: "flow ⇒ state ⇒ vname ⇒ vname set" and
  sources_aux :: "flow ⇒ state ⇒ vname ⇒ vname set" where

"sources (cs @ [c]) s x = (case c of
  z ::= a ⇒ if z = x
    then sources_aux cs s x ∪ ⋃ {sources cs s y | y.
      run_flow cs s: dom y ↝ dom x ∧ y ∈ avars a}
    else sources cs s x |
  ⟨X⟩ ⇒
    sources cs s x ∪ ⋃ {sources cs s y | y.
      run_flow cs s: dom y ↝ dom x ∧ y ∈ X})" |

"sources [] _ x = {x}" |

"sources_aux (cs @ [c]) s x = (case c of
  _ ::= _ ⇒
    sources_aux cs s x |
  ⟨X⟩ ⇒
    sources_aux cs s x ∪ ⋃ {sources cs s y | y.
      run_flow cs s: dom y ↝ dom x ∧ y ∈ X})" |

"sources_aux [] _ _ = {}"

proof (atomize_elim)
  fix a :: "flow × state × vname + flow × state × vname"
  {
    assume
     "∀cs c s x. a ≠ Inl (cs @ [c], s, x)" and
     "∀s x. a ≠ Inl ([], s, x)" and
     "∀s x. a ≠ Inr ([], s, x)"
    hence "∃cs c s x. a = Inr (cs @ [c], s, x)"
      by (metis obj_sumE prod_cases3 rev_exhaust)
  }
  thus
   "(∃cs c s x. a = Inl (cs @ [c], s, x)) ∨
    (∃s x. a = Inl ([], s, x)) ∨
    (∃cs c s x. a = Inr (cs @ [c], s, x)) ∨
    (∃s x. a = Inr ([], s, x))"
    by blast
qed auto

termination by lexicographic_order

lemmas sources_induct = sources_sources_aux.induct

text ‹
\null

Predicate @{text correct} takes as inputs a program @{term c}, a set of program states @{term A},
and a set of variables @{term X}. Its truth value equals that of the following termination-sensitive
information flow correctness condition: for any state @{term s} agreeing with a state in @{term A}
on the values of the state variables in @{term X}, if the \emph{small-step} program semantics turns
configuration @{term "(c, s)"} into configuration @{term "(c⇩₁, s⇩₁)"}, and @{term "(c⇩₁, s⇩₁)"} into
configuration @{term "(c⇩₂, s⇩₂)"}, then for any state @{term t⇩₁} agreeing with @{term s⇩₁} on the
values of the variables in @{term "sources cs s⇩₁ x"}, where @{term cs} is the execution flow leading
from @{term "(c⇩₁, s⇩₁)"} to @{term "(c⇩₂, s⇩₂)"}, the small-step semantics turns @{term "(c⇩₁, t⇩₁)"} into
some configuration @{term "(c⇩₂', t⇩₂)"} such that:

  ▪ @{prop "c⇩₂' = SKIP"} (namely, @{term "(c⇩₂', t⇩₂)"} is a \emph{final} configuration) just in case
@{prop "c⇩₂ = SKIP"}, and

  ▪ the value of variable \lstinline{x} in state @{term t⇩₂} is the same as in state @{term s⇩₂}.

Here below are some comments about this definition.

  ▪ As @{term "sources cs s⇩₁ x"} is the set of the variables whose values in @{term s⇩₁} are allowed
to affect the value of \lstinline{x} in @{term s⇩₂}, this definition requires any state @{term t⇩₁}
indistinguishable from @{term s⇩₁} in the values of those variables to produce a state where variable
\lstinline{x} has the same value as in @{term s⇩₂} in the continuation of program execution.

  ▪ Configuration @{term "(c⇩₂', t⇩₂)"} must be the same one for \emph{any} variable \lstinline{x}
such that @{term s⇩₁} and @{term t⇩₁} agree on the values of any variable in @{term "sources cs s⇩₁ x"}.
Otherwise, even if states @{term s⇩₂} and @{term t⇩₂} agreed on the value of \lstinline{x}, they could
be distinguished all the same based on a discrepancy between the respective values of some other
variable. Likewise, if state @{term t⇩₂} alone had to be the same for any such \lstinline{x}, while
command @{term c⇩₂'} were allowed to vary, state @{term t⇩₁} could be distinguished from @{term s⇩₁}
based on the continuation of program execution. This is the reason why the universal quantification
over @{term x} is nested within the existential quantification over both @{term c⇩₂'} and @{term t⇩₂}.

  ▪ The state machine for a program typically provides for a set of initial states from which its
execution is intended to start. In any such case, information flow correctness need not be assessed
for arbitrary initial states, but just for those complying with the settled tuples of initial values
for state variables. The values of any other variables do not matter, as they do not affect function
@{term interf}'s ones. This is the motivation for parameter @{term A}, which then needs to contain
just one state for each of such tuples, while parameter @{term X} enables to exclude the state
variables, if any, whose initial values are not settled.

  ▪ If locale parameter @{term state} matches the empty set, @{term s} will be any state agreeing
with some state in @{term A} on the value of possibly even no variable at all, that is, a fully
arbitrary state provided that @{term A} is nonempty. This makes @{term s} range over all possible
states, as required for establishing the degeneracy of the present definition to the stateless
level-based counterpart addressed in \<^cite>‹"Nipkow23-1"›, section 9.2.6.

Why express information flow correctness in terms of the small-step program semantics, instead of
resorting to the big-step one as happens with the stateless level-based correctness condition in
\<^cite>‹"Nipkow23-1"›, section 9.2.6? The answer is provided by the following sample C programs, where
\lstinline{i} is a state variable.

\null

\<^lstlisting>‹
y = i;
i = (i) ? 1 : 0;
x = i + y;
›

\null

\<^lstlisting>‹
x = 0;
if (i == 10)
{
  x = 10;
}
i = (i) ? 1 : 0;
x += i;
›

\null

Let \lstinline{i} be allowed to interfere with \lstinline{x} just in case \lstinline{i} matches 0 or
1, and \lstinline{y} be never allowed to do so. If @{term s⇩₁} were constrained to be the initial
state, for both programs \lstinline{i} would be included among the variables on which @{term t⇩₁}
needs to agree with @{term s⇩₁} in order to be indistinguishable from @{term s⇩₁} in the value of
\lstinline{x} resulting from the final assignment. Thus, both programs would fail to be labeled as
wrong ones, although in both of them the information flow blatantly bypasses the sanitization of the
initial value of \lstinline{i}, respectively due to an illegal explicit flow and an illegal implicit
flow. On the contrary, the present information flow correctness definition detects any such illegal
information flow by checking every partial program execution on its own.

\null
›

abbreviation ok_flow :: "com ⇒ com ⇒ state ⇒ state ⇒ flow ⇒ bool" where
"ok_flow c⇩₁ c⇩₂ s⇩₁ s⇩₂ cs ≡
  ∀t⇩₁. ∃c⇩₂' t⇩₂. ∀x.
    s⇩₁ = t⇩₁ (⊆ sources cs s⇩₁ x) ⟶
      (c⇩₁, t⇩₁) →* (c⇩₂', t⇩₂) ∧ (c⇩₂ = SKIP) = (c⇩₂' = SKIP) ∧ s⇩₂ x = t⇩₂ x"

definition correct :: "com ⇒ state set ⇒ vname set ⇒ bool" where
"correct c A X ≡
  ∀s ∈ Univ A (⊆ state ∩ X). ∀c⇩₁ c⇩₂ s⇩₁ s⇩₂ cfs.
    (c, s) →* (c⇩₁, s⇩₁) ∧ (c⇩₁, s⇩₁) →*{cfs} (c⇩₂, s⇩₂) ⟶
      ok_flow c⇩₁ c⇩₂ s⇩₁ s⇩₂ (flow cfs)"


abbreviation interf_set :: "state set ⇒ 'd set ⇒ 'd set ⇒ bool"
  (‹(_: _ ↝ _)› [51, 51, 51] 50) where
"A: U ↝ W ≡ ∀s ∈ A. ∀u ∈ U. ∀w ∈ W. s: u ↝ w"

abbreviation ok_flow_aux ::
 "config set ⇒ com ⇒ com ⇒ state ⇒ state ⇒ flow ⇒ bool" where
"ok_flow_aux U c⇩₁ c⇩₂ s⇩₁ s⇩₂ cs ≡
  (∀t⇩₁. ∃c⇩₂' t⇩₂. ∀x.
    (s⇩₁ = t⇩₁ (⊆ sources_aux cs s⇩₁ x) ⟶
      (c⇩₁, t⇩₁) →* (c⇩₂', t⇩₂) ∧ (c⇩₂ = SKIP) = (c⇩₂' = SKIP)) ∧
    (s⇩₁ = t⇩₁ (⊆ sources cs s⇩₁ x) ⟶ s⇩₂ x = t⇩₂ x)) ∧
  (∀x. (∃p ∈ U. case p of (B, Y) ⇒
    ∃s ∈ B. ∃y ∈ Y. ¬ s: dom y ↝ dom x) ⟶ no_upd cs x)"

text ‹
\null

The next step is defining a static type system guaranteeing that well-typed programs satisfy this
information flow correctness criterion. Whenever defining a function, and the pursued type system is
obviously no exception, the primary question that one has to answer is: which inputs and outputs
should it provide for? The type system formalized in \<^cite>‹"Nipkow23-3"› simply makes a pass/fail
decision on an input program, based on an input security level, and outputs the verdict as a boolean
value. Is this still enough in the present case? The answer can be found by considering again the
above C program that computes the greatest common divisor of two positive integers \lstinline{a},
\lstinline{b} using a state variable \lstinline{i}, along with its associated stateful interference
function. For the reader's convenience, the program is reported here below.

\null

\<^lstlisting>‹
do
{
    i = 0;
    r = a % b;
    i = 1;
    a = b;
    i = 2;
    b = r;
    i = 3;
} while (b);
›

\null

As @{prop "s: dom a ↝ dom r"} only for a state @{term s} where \lstinline{i} $=0$, the type system
cannot determine that the assignment \lstinline{r = a % b} at line 4 is well-typed without knowing
that \lstinline{i} $=0$ whenever that step is executed. Consequently, upon checking the assignment
\lstinline{i = 0} at line 3, the type system must output information indicating that \lstinline{i}
$=0$ as a result of its execution. This information will then be input to the type system when it is
recursively invoked to check line 4, so as to enable the well-typedness of the next assignment to be
ascertained.

Therefore, in addition to the program under scrutiny, the type system needs to take a set of program
states as input, and as long as the program is well-typed, the output must include a set of states
covering any change to the values of the state variables possibly triggered by the input program. In
other words, the type system has to \emph{simulate} the execution of the input program at compile
time as regards the values of its state variables. In the following formalization, this results in
making the type system take an input of type @{typ "state set"} and output a value of the same type.
Yet, since state variables alone are relevant, a real-world implementation of the type system would
not need to work with full @{typ state} values, but just with tuples of state variables' values.

Is the input/output of a set of program states sufficient to keep track of the possible values of
the state variables at each execution step? Here below is a sample C program helping find an answer,
which determines the minimum of two integers \lstinline{a}, \lstinline{b} and assigns it to variable
\lstinline{a} using a state variable \lstinline{i}.

\null

\<^lstlisting>‹
i = (a > b) ? 1 : 0;
if (i > 0)
{
  a = b;
}
›

\null

Assuming that the initial value of \lstinline{i} is 0, the information flow correctness policy for
this program will be such that:

  ▪ @{prop "s: dom a ↝ dom i"}, @{prop "s: dom b ↝ dom i"} for any program state @{text s} where
\lstinline{i} $=0$ (line 1),

  ▪ @{prop "s: dom i ↝ dom a"} for any @{text s} where \lstinline{i} $=0$ or \lstinline{i} $=1$
(line 2, more on this later),

  ▪ @{prop "s: dom b ↝ dom a"} for any @{text s} where \lstinline{i} $=1$ (line 4),

ruling out any other pair of distinct domains in any state.

So far, everything has gone smoothly. However, what happens if the program is changed as follows?

\null

\<^lstlisting>‹
i = a - b;
if (i > 0)
{
  a = b;
}
›

\null

Upon simulating the execution of the former program, the type system can determine the set
$\{0, 1\}$ of the possible values of variable \lstinline{i} arising from the conditional assignment
\lstinline{i = (a > b) ? 1 : 0} at line 1. On the contrary, in the case of the latter program, the
possible values of \lstinline{i} after the assignment \lstinline{i = a - b} at line 1 must be marked
as being \emph{indeterminate}, since they depend on the initial values of variables \lstinline{a}
and \lstinline{b}, which are unknown at compile time. Hence, the type system needs to provide for an
additional input/output parameter of type @{typ "vname set"}, whose input and output values shall
collect the variables whose possible values before and after the execution of the input program are
\emph{determinate}.

The correctness of the simulation of program execution by the type system can be expressed as the
following condition. Suppose that the type system outputs a @{typ "state set"} @{term A'} and a
@{typ "vname set"} @{term X'} when it is input a program @{term c}, a @{typ "state set"} @{term A},
and a @{typ "vname set"} @{term X}. Then, for any state @{term s} agreeing with some state in
@{term A} on the value of every state variable in @{term X}, if @{prop "(c, s) ⇒ s'"}, @{term s'}
must agree with some state in @{term A'} on the value of every state variable in @{term X'}. This
can be summarized by saying that the type system must \emph{overapproximate} program semantics,
since any algorithm simulating program execution cannot but be imprecise (see \<^cite>‹"Nipkow23-1"›,
\emph{incipit} of chapter 13).

In turn, if the outputs for @{term c}, @{term A'}, @{term X'} are @{term A''}, @{term X''} and
@{prop "(c, s') ⇒ s''"}, @{term s''} must agree with some state in @{term A''} on the value of
every state variable in @{term X''}. But if @{term c} is a loop and @{prop "(c, s) ⇒ s'"}, then
@{prop "(c, s') ⇒ s''"} just in case @{prop "s' = s''"}, so that the type system is guaranteed to
overapproximate the semantics of @{term c} only if states consistent with @{term A'}, @{term X'} are
also consistent with @{term "A''"}, @{term X''} and vice versa. Thus, the type system needs to be
\emph{idempotent} if @{term c} is a loop, that is, it must be such that @{prop "A' = A''"} and
@{prop "X' = X''"} in this case. Since idempotence is not required for control structures other than
loops, the main type system @{text ctyping2} formalized in what follows will delegate the simulation
of the execution of loop bodies to an auxiliary, idempotent type system @{text ctyping1}.

This type system keeps track of the program state updates possibly occurring in its input program
using sets of lists of functions of type @{typ "vname ⇒ val option option"}. Command @{const SKIP}
is mapped to a singleton made of the empty list, as no state update takes place. An assignment to a
variable \lstinline{x} is mapped to a singleton made of a list comprising a single function, whose
value is @{term "Some (Some i)"} or @{term "Some None"} for \lstinline{x} if it is a state variable
and the right-hand side is a constant @{term "N i"} or a non-constant expression, respectively, and
@{const None} otherwise. That is, @{const None} stands for \emph{unchanged/non-state variable}
(remember, only state variable updates need to be tracked), whereas @{term "Some None"} stands for
\emph{indeterminate variable}, since the value of a non-constant expression in a loop iteration
(remember, @{text ctyping1} is meant for simulating the execution of loop bodies) is in general
unknown at compile time.

At first glance, a conditional statement could simply be mapped to the union of the sets tracking
the program state updates possibly occurring in its branches. However, things are not so simple, as
shown by the sample C loop here below, which has a conditional statement as its body.

\null

\<^lstlisting>‹
for (i = 0; i < 2; i++)
{
  if (n % 2)
  {
    a = 1;
    b = 1;
    n++;
  }
  else
  {
    a = 2;
    c = 2;
    n++;
  }
}
›

\null

If the initial value of the integer variable \lstinline{n} is even, the final values of variables
\lstinline{a}, \lstinline{b}, and \lstinline{c} will be 1, 1, 2, whereas if the initial value of
\lstinline{n} is odd, the final values of the aforesaid variables will be 2, 1, 2. Assuming that
their initial value is 0, the potential final values tracked by considering each branch individually
are 1, 1, 0 and 2, 0, 2 instead. These are exactly the values generated by a single loop iteration;
if they are fed back into the loop body along with the increased value of \lstinline{n}, the actual
final values listed above are produced.

As a result, a mere union of the sets tracking the program state updates possibly occurring in each
branch would not be enough for the type system to be idempotent. The solution is to rather construct
every possible alternate concatenation without repetitions of the lists contained in each set, which
is referred to as \emph{merging} those sets in the following formalization. In fact, alternating the
state updates performed by each branch in the previous example produces the actual final values
listed above. Since the latest occurrence of a state update makes any previous occurrence irrelevant
for the final state, repetitions need not be taken into account, which ensures the finiteness of the
construction provided that the sets being merged are finite. In the special case where the boolean
condition can be evaluated at compile time, considering the picked branch alone is of course enough.

Another case trickier than what one could expect at first glance is that of sequential composition.
This is shown by the sample C loop here below, whose body consists of the sequential composition of
some assignments with a conditional statement.

\null

\<^lstlisting>‹
for (i = 0; i < 2; i++)
{
  a = 1;
  b = 1;
  if (n % 2)
  {
    a = 2;
    c = 2;
    n++;
  }
  else
  {
    b = 3;
    d = 3;
    n++;
  }
}
›

\null

If the initial value of the integer variable \lstinline{n} is even, the final values of variables
\lstinline{a}, \lstinline{b}, \lstinline{c}, and \lstinline{d} will be 2, 1, 2, 3, whereas if the
initial value of \lstinline{n} is odd, the final values of the aforesaid variables will be 1, 3, 2,
3. Assuming that their initial value is 0, the potential final values tracked by considering the
sequences of the state updates triggered by the starting assignments with the updates, simulated as
described above, possibly triggered by the conditional statement rather are:

  ▪ 2, 1, 2, 0,

  ▪ 1, 3, 0, 3,

  ▪ 2, 3, 2, 3.

The first two tuples of values match the ones generated by a single loop iteration, and produce the
actual final values listed above if they are fed back into the loop body along with the increased
value of \lstinline{n}.

Hence, concatenating the lists tracking the state updates possibly triggered by the first command in
the sequence (the starting assignment sequence in the previous example) with the lists tracking the
updates possibly triggered by the second command in the sequence (the conditional statement in the
previous example) would not suffice for the type system to be idempotent. The solution is to rather
append the latter lists to those constructed by \emph{merging} the sets tracking the state updates
possibly performed by each command in the sequence. Again, provided that such sets are finite, this
construction is finite, too. In the special case where the latter set is a singleton, the aforesaid
merging is unnecessary, as it would merely insert a preceding occurrence of the single appended list
into the resulting concatenated lists, and such repetitions are irrelevant as observed above.

Surprisingly enough, the case of loops is actually simpler than possible first-glance expectations.
A loop defines two branches, namely its body and an implicit alternative branch doing nothing. Thus,
it can simply be mapped to the union of the set tracking the state updates possibly occurring in its
body with a singleton made of the empty list. As happens with conditional statements, in the special
case where the boolean condition can be evaluated at compile time, considering the selected branch
alone is obviously enough.

Type system @{text ctyping1} uses the set of lists resulting from this recursion over the input
command to construct a set @{text F} of functions of type @{typ "vname ⇒ val option option"}, as
follows: for each list @{text ys} in the former set, @{text F} contains the function mapping any
variable \lstinline{x} to the rightmost occurrence, if any, of pattern @{term "Some v"} to which
\lstinline{x} is mapped by any function in @{text ys} (that is, to the latest update, if any, of
\lstinline{x} tracked in @{text ys}), or else to @{const None}. Then, if @{text A}, @{text X} are
the input @{typ "state set"} and @{typ "vname set"}, and @{text B}, @{text Y} the output ones:

  ▪ @{text B} is the set of the program states constructed by picking a function @{text f} and a
state @{text s} from @{text F} and @{text A}, respectively, and mapping any variable \lstinline{x}
to @{text i} if @{prop "f x = Some (Some i)"}, or else to @{term "s x"} if @{prop "f x = None"}
(namely, to its value in the initial state @{text s} if @{text f} marks it as being unchanged).

  ▪ @{text Y} is @{const UNIV} if @{prop "A = {}"} (more on this later), or else the set of the
variables not mapped to @{term "Some None"} (that is, not marked as being indeterminate) by any
function in @{text F}, and contained in @{text X} (namely, being initially determinate) if mapped to
@{const None} (that is, if marked as being unchanged) by some function in @{text F}.

When can @{text ctyping1} evaluate the boolean condition of a conditional statement or a loop, so as
to possibly detect and discard some ``dead'' branch? This question can be answered by examining the
following sample C loop, where \lstinline{n} is a state variable, while integer \lstinline{j} is
unknown at compile time.

\null

\<^lstlisting>‹
for (i = 0; i != j; i++)
{
  if (n == 1)
  {
    n = 2;
  }
  else if (n == 0)
  {
    n = 1;
  }
}
›

\null

Assuming that the initial value of \lstinline{n} is 0, its final value will be 0, 1, or 2 according
to whether \lstinline{j} matches 0, 1, or any other positive integer, respectively, whereas the loop
will not even terminate if \lstinline{j} is negative. Consequently, the type system cannot avoid
tracking the state updates possibly triggered in every branch, on pain of failing to be idempotent.
As a result, evaluating the boolean conditions in the conditional statement at compile time so as to
discard some branch is not possible, even though they only depend on an initially determinate state
variable. The conclusion is that @{text ctyping1} may generally evaluate boolean conditions just in
case they contain constants alone, namely only if they are trivial enough to be possibly eliminated
by program optimization. This is exactly what @{text ctyping1} does by passing any boolean condition
found in the input program to the type system @{text btyping1} for boolean expressions, defined here
below as well.

\null
›

primrec btyping1 :: "bexp ⇒ bool option" (‹(⊢ _)› [51] 55) where

"⊢ Bc v = Some v" |

"⊢ Not b = (case ⊢ b of
  Some v ⇒ Some (¬ v) | _ ⇒ None)" |

"⊢ And b⇩₁ b⇩₂ = (case (⊢ b⇩₁, ⊢ b⇩₂) of
  (Some v⇩₁, Some v⇩₂) ⇒ Some (v⇩₁ ∧ v⇩₂) | _ ⇒ None)" |

"⊢ Less a⇩₁ a⇩₂ = (if avars a⇩₁ ∪ avars a⇩₂ = {}
  then Some (aval a⇩₁ (λx. 0) < aval a⇩₂ (λx. 0)) else None)"


type_synonym state_upd = "vname ⇒ val option option"

inductive_set ctyping1_merge_aux :: "state_upd list set ⇒
  state_upd list set ⇒ (state_upd list × bool) list set"
  (infix ‹⨆› 55) for A and B where

"xs ∈ A ⟹ [(xs, True)] ∈ A ⨆ B" |

"ys ∈ B ⟹ [(ys, False)] ∈ A ⨆ B" |

"⟦ws ∈ A ⨆ B; ¬ snd (last ws); xs ∈ A; (xs, True) ∉ set ws⟧ ⟹
   ws @ [(xs, True)] ∈ A ⨆ B" |

"⟦ws ∈ A ⨆ B; snd (last ws); ys ∈ B; (ys, False) ∉ set ws⟧ ⟹
   ws @ [(ys, False)] ∈ A ⨆ B"

declare ctyping1_merge_aux.intros [intro]

definition ctyping1_append ::
 "state_upd list set ⇒ state_upd list set ⇒ state_upd list set"
  (infixl ‹@› 55) where
"A @ B ≡ {xs @ ys | xs ys. xs ∈ A ∧ ys ∈ B}"

definition ctyping1_merge ::
 "state_upd list set ⇒ state_upd list set ⇒ state_upd list set"
  (infixl ‹⊔› 55) where
"A ⊔ B ≡ {concat (map fst ws) | ws. ws ∈ A ⨆ B}"

definition ctyping1_merge_append ::
 "state_upd list set ⇒ state_upd list set ⇒ state_upd list set"
  (infixl ‹⊔⇩_@› 55) where
"A ⊔⇩_@ B ≡ (if card B = Suc 0 then A else A ⊔ B) @ B"


primrec ctyping1_aux :: "com ⇒ state_upd list set"
  (‹(⊢ _)› [51] 60) where

"⊢ SKIP = {[]}" |

"⊢ y ::= a = {[λx. if x = y ∧ y ∈ state
  then if avars a = {} then Some (Some (aval a (λx. 0))) else Some None
  else None]}" |

"⊢ c⇩₁;; c⇩₂ = ⊢ c⇩₁ ⊔⇩_@ ⊢ c⇩₂" |

"⊢ IF b THEN c⇩₁ ELSE c⇩₂ = (let f = ⊢ b in
  (if f ∈ {Some True, None} then ⊢ c⇩₁ else {}) ⊔
  (if f ∈ {Some False, None} then ⊢ c⇩₂ else {}))" |

"⊢ WHILE b DO c = (let f = ⊢ b in
  (if f ∈ {Some False, None} then {[]} else {}) ∪
  (if f ∈ {Some True, None} then ⊢ c else {}))"

definition ctyping1_seq :: "state_upd ⇒ state_upd ⇒ state_upd"
  (infixl ‹;;› 55) where
"S;; T ≡ λx. case T x of None ⇒ S x | Some v ⇒ Some v"

definition ctyping1 :: "com ⇒ state set ⇒ vname set ⇒ config"
  (‹(⊢ _ '(⊆ _, _'))› [51] 55) where
"⊢ c (⊆ A, X) ≡ let F = {λx. foldl (;;) (λx. None) ys x | ys. ys ∈ ⊢ c} in
  ({λx. case f x of None ⇒ s x | Some None ⇒ t x | Some (Some i) ⇒ i |
     f s t. f ∈ F ∧ s ∈ A},
   Univ?? A {x. ∀f ∈ F. f x ≠ Some None ∧ (f x = None ⟶ x ∈ X)})"

text ‹
\null

A further building block propaedeutic to the definition of the main type system @{text ctyping2} is
the definition of its own companion type system @{text btyping2} for boolean expressions. The goal
of @{text btyping2} is splitting, whenever feasible at compile time, an input @{typ "state set"}
into two complementary subsets, respectively comprising the program states making the input boolean
expression true or false. This enables @{text ctyping2} to apply its information flow correctness
checks to conditional branches by considering only the program states in which those branches are
executed.

As opposed to @{const btyping1}, @{text btyping2} may evaluate its input boolean expression even if
it contains variables, provided that all of their values are known at compile time, namely that all
of them are determinate state variables -- hence @{text btyping2}, like @{text ctyping2}, needs to
take a @{typ "vname set"} collecting determinate variables as an additional input. In fact, in the
case of a loop body, the dirty work of covering any nested branch by skipping the evaluation of
non-constant boolean conditions is already done by @{const ctyping1}, so that any @{typ "state set"}
and @{typ "vname set"} input to @{text btyping2} already encompass every possible execution flow.

\null
›

primrec btyping2_aux :: "bexp ⇒ state set ⇒ vname set ⇒ state set option"
  (‹(⊫ _ '(⊆ _, _'))› [51] 55) where

"⊫ Bc v (⊆ A, _) = Some (if v then A else {})" |

"⊫ Not b (⊆ A, X) = (case ⊫ b (⊆ A, X) of
  Some B ⇒ Some (A - B) | _ ⇒ None)" |

"⊫ And b⇩₁ b⇩₂ (⊆ A, X) = (case (⊫ b⇩₁ (⊆ A, X), ⊫ b⇩₂ (⊆ A, X)) of
  (Some B⇩₁, Some B⇩₂) ⇒ Some (B⇩₁ ∩ B⇩₂) | _ ⇒ None)" |

"⊫ Less a⇩₁ a⇩₂ (⊆ A, X) = (if avars a⇩₁ ∪ avars a⇩₂ ⊆ state ∩ X
  then Some {s. s ∈ A ∧ aval a⇩₁ s < aval a⇩₂ s} else None)"

definition btyping2 :: "bexp ⇒ state set ⇒ vname set ⇒
  state set × state set"
  (‹(⊨ _ '(⊆ _, _'))› [51] 55) where
"⊨ b (⊆ A, X) ≡ case ⊫ b (⊆ A, X) of
  Some A' ⇒ (A', A - A') | _ ⇒ (A, A)"

text ‹
\null

It is eventually time to define the main type system @{text ctyping2}. Its output consists of the
@{typ "state set"} of the final program states and the @{typ "vname set"} of the finally determinate
variables produced by simulating the execution of the input program, based on the @{typ "state set"}
of initial program states and the @{typ "vname set"} of initially determinate variables taken as
inputs, if information flow correctness checks are passed; otherwise, the output is @{const None}.

An additional input is the counterpart of the level input to the security type systems formalized in
\<^cite>‹"Nipkow23-3"›, in that it specifies the \emph{scope} in which information flow correctness is
validated. It consists of a set of @{typ "state set × vname set"} pairs and a boolean flag. The set
keeps track of the variables contained in the boolean conditions, if any, nesting the input program,
in association with the program states in which they are evaluated. The flag is @{const False} if
the input program is nested in a loop, in which case state variables set to non-constant expressions
are marked as being indeterminate (as observed previously, the value of a non-constant expression in
a loop iteration is in general unknown at compile time).

In the recursive definition of @{text ctyping2}, the equations dealing with conditional branches,
namely those applying to conditional statements and loops, construct the output @{typ "state set"}
and @{typ "vname set"} respectively as the \emph{union} and the \emph{intersection} of the sets
computed for each branch. In fact, a possible final state is any one resulting from either branch,
and a variable is finally determinate just in case it is such regardless of the branch being picked.
Yet, a ``dead'' branch should have no impact on the determinateness of variables, as it only depends
on the other branch. Accordingly, provided that information flow correctness checks are passed, the
cases where the output is constructed non-recursively, namely those of @{const SKIP}, assignments,
and loops, return @{const UNIV} as @{typ "vname set"} if the input @{typ "state set"} is empty. In
the case of a loop, the @{typ "state set"} and the @{typ "vname set"} resulting from one or more
iterations of its body are computed using the auxiliary type system @{const ctyping1}. This explains
why @{const ctyping1} returns @{const UNIV} as @{typ "vname set"} if the input @{typ "state set"} is
empty, as mentioned previously.

As happens with the syntax-directed security type system formalized in \<^cite>‹"Nipkow23-3"›, the
cases performing non-recursive information flow correctness checks are those of assignments and
loops. In the former case, @{text ctyping2} verifies that the sets of variables contained in the
scope, as well as any variable occurring in the expression on the right-hand side of the assignment,
are allowed to interfere with the variable on the left-hand side, respectively in their associated
sets of states and in the input @{typ "state set"}. In the latter case, @{text ctyping2} verifies
that the sets of variables contained in the scope, as well as any variable occurring in the boolean
condition of the loop, are allowed to interfere with \emph{every} variable, respectively in their
associated sets of states and in the states in which the boolean condition is evaluated. In both
cases, if the applying interference relation is unknown as some state variable is indeterminate,
each of those checks must be passed for \emph{any} possible state (unless the respective set of
states is empty).

Why do the checks performed for loops test interference with \emph{every} variable? The answer is
provided by the following sample C program, which sets variables \lstinline{a} and \lstinline{b} to
the terms in the zero-based positions \lstinline{j} and \lstinline{j} + 1 of the Fibonacci sequence.

\null

\<^lstlisting>‹
a = 0;
b = 1;
for (i = 0; i != j; i++)
{
  c = b;
  b += a;
  a = c;
}
›

\null

The loop in this program terminates for any nonnegative value of \lstinline{j}. For any variable
\lstinline{x}, suppose that \lstinline{j} is not allowed to interfere with \lstinline{x} in such an
initial state, say @{text s}. According to the above information flow correctness definition, any
initial state @{text t} differing from @{text s} in the value of \lstinline{j} must make execution
terminate all the same in order for the program to be correct. However, this is not the case, since
execution does not terminate for any negative value of \lstinline{j}. Thus, the type system needs to
verify that \lstinline{j} may interfere with \lstinline{x}, on pain of returning a wrong \emph{pass}
verdict.

The cases that change the scope upon recursively calling the type system are those of conditional
statements and loops. In the latter case, the boolean flag is set to @{const False}, and the set of
@{typ "state set × vname set"} pairs is empty as the whole scope nesting the loop body, including
any variable occurring in the boolean condition of the loop, must be allowed to interfere with every
variable. In the former case, for both branches, the boolean flag is left unchanged, whereas the set
of pairs is extended with the pair composed of the input @{typ "state set"} (or of @{const UNIV} if
some state variable is indeterminate, unless the input @{typ "state set"} is empty) and of the set
of the variables, if any, occurring in the boolean condition of the statement.

Why is the scope extended with the whole input @{typ "state set"} for both branches, rather than
just with the set of states in which each single branch is selected? Once more, the question can be
answered by considering a sample C program, namely a previous one determining the minimum of two
integers \lstinline{a} and \lstinline{b} using a state variable \lstinline{i}. For the reader's
convenience, the program is reported here below.

\null

\<^lstlisting>‹
i = (a > b) ? 1 : 0;
if (i > 0)
{
  a = b;
}
›

\null

Since the branch changing the value of variable \lstinline{a} is executed just in case \lstinline{i}
$=1$, suppose that in addition to \lstinline{b}, \lstinline{i} also is not allowed to interfere with
\lstinline{a} for \lstinline{i} $=0$, and let @{text s} be any initial state where \lstinline{a}
$\le$ \lstinline{b}. Based on the above information flow correctness definition, any initial state
@{text t} differing from @{text s} in the value of \lstinline{b} (not bound by the interference of
\lstinline{i} with \lstinline{a}) must produce the same final value of \lstinline{a} in order for
the program to be correct. However, this is not the case, as the final value of \lstinline{a} will
change for any state @{text t} where \lstinline{a} $>$ \lstinline{b}. Therefore, the type system
needs to verify that \lstinline{i} may interfere with \lstinline{a} for \lstinline{i} $=0$, too, on
pain of returning a wrong \emph{pass} verdict. This is the reason why, as mentioned previously, an
information flow correctness policy for this program should be such that @{prop "s: dom i ↝ dom a"}
even for any state @{text s} where \lstinline{i} $=0$.

An even simpler example explains why, in the case of an assignment or a loop, the information flow
correctness checks described above need to be applied to the set of @{typ "state set × vname set"}
pairs in the scope even if the input @{typ "state set"} is empty, namely even if the assignment or
the loop are nested in a ``dead'' branch. Here below is a sample C program showing this.

\null

\<^lstlisting>‹
if (i)
{
  a = 1;
}
›

\null

Assuming that the initial value of \lstinline{i} is 0, the assignment nested within the conditional
statement is not executed, so that the final value of \lstinline{a} matches the initial one, say 0.
Suppose that \lstinline{i} is not allowed to interfere with \lstinline{a} in such an initial state,
say @{text s}. According to the above information flow correctness definition, any initial state
@{text t} differing from @{text s} in the value of \lstinline{i} must produce the same final value
of \lstinline{a} in order for the program to be correct. However, this is not the case, as the final
value of \lstinline{a} is 1 for any nonzero value of \lstinline{i}. Therefore, the type system needs
to verify that \lstinline{i} may interfere with \lstinline{a} in state @{text s} even though the
conditional branch is not executed in that state, on pain of returning a wrong \emph{pass} verdict.

\null
›

abbreviation atyping :: "bool ⇒ aexp ⇒ vname set ⇒ bool"
  (‹(_ ⊨ _ '(⊆ _'))› [51, 51] 50) where
"v ⊨ a (⊆ X) ≡ avars a = {} ∨ avars a ⊆ state ∩ X ∧ v"

definition univ_states_if :: "state set ⇒ vname set ⇒ state set"
  (‹(Univ? _ _)› [51, 75] 75) where
"Univ? A X ≡ if state ⊆ X then A else Univ A (⊆ {})"


fun ctyping2 :: "scope ⇒ com ⇒ state set ⇒ vname set ⇒ config option"
  (‹(_ ⊨ _ '(⊆ _, _'))› [51, 51] 55) where

"_ ⊨ SKIP (⊆ A, X) = Some (A, Univ?? A X)" |

"(U, v) ⊨ x ::= a (⊆ A, X) =
 (if (∀(B, Y) ∈ insert (Univ? A X, avars a) U. B: dom ` Y ↝ {dom x})
  then Some (if x ∈ state ∧ A ≠ {}
    then if v ⊨ a (⊆ X)
      then ({s(x := aval a s) | s. s ∈ A}, insert x X) else (A, X - {x})
    else (A, Univ?? A X))
  else None)" |

"(U, v) ⊨ c⇩₁;; c⇩₂ (⊆ A, X) =
 (case (U, v) ⊨ c⇩₁ (⊆ A, X) of
    Some (B, Y) ⇒ (U, v) ⊨ c⇩₂ (⊆ B, Y) | _ ⇒ None)" |

"(U, v) ⊨ IF b THEN c⇩₁ ELSE c⇩₂ (⊆ A, X) =
 (case (insert (Univ? A X, bvars b) U, ⊨ b (⊆ A, X)) of (U', B⇩₁, B⇩₂) ⇒
    case ((U', v) ⊨ c⇩₁ (⊆ B⇩₁, X), (U', v) ⊨ c⇩₂ (⊆ B⇩₂, X)) of
      (Some (C⇩₁, Y⇩₁), Some (C⇩₂, Y⇩₂)) ⇒ Some (C⇩₁ ∪ C⇩₂, Y⇩₁ ∩ Y⇩₂) |
      _ ⇒ None)" |

"(U, v) ⊨ WHILE b DO c (⊆ A, X) = (case ⊨ b (⊆ A, X) of (B⇩₁, B⇩₂) ⇒
  case ⊢ c (⊆ B⇩₁, X) of (C, Y) ⇒ case ⊨ b (⊆ C, Y) of (B⇩₁', B⇩₂') ⇒
    if ∀(B, W) ∈ insert (Univ? A X ∪ Univ? C Y, bvars b) U.
      B: dom ` W ↝ UNIV
    then case (({}, False) ⊨ c (⊆ B⇩₁, X), ({}, False) ⊨ c (⊆ B⇩₁', Y)) of
      (Some _, Some _) ⇒ Some (B⇩₂ ∪ B⇩₂', Univ?? B⇩₂ X ∩ Y) |
      _ ⇒ None
    else None)"

end

end