# Theory OpenFlow_Documentation

text_raw‹
\twocolumn
\columnsep 2pc          %    Space between columns
\textwidth 42pc         % Width of text line.
\part{Documentation}
\label{part2}
›
section‹Configuration Translation›
text_raw‹\label{sec:conv}›
text‹
All the results we present in this section are formalized and verified in Isabelle/HOL~\cite{nipkow2002isabelle}.
This means that their formal correctness can be trusted a level close to absolute certainty.
The definitions and lemmas stated here are merely a repetition of lemmas stated in other theory files.
This means that they have been directly set to this document from Isabelle and no typos or hidden assumptions are possible.
Additionally, it allows us to omit various helper lemmas that do not help the understanding.
However, it causes some notation inaccuracy, as type and function definitions are stated as lemmas or schematic goals.
›
theory OpenFlow_Documentation
(*<*)
imports
LinuxRouter_OpenFlow_Translation
Featherweight_OpenFlow_Comparison
"HOL-Library.LaTeXsugar"
begin
(*>*)

subsection‹Linux Firewall Model›
text_raw‹\label{sec:lfw}›
text‹We want to write a program that translates the configuration of a linux firewall to that of an OpenFlow switch.
We furthermore want to verify that translation.
For this purpose, we need a clear definition of the behavior of the two device types -- we need their models and semantics.
In case of a linux firewall, this is problematic because a linux firewall is a highly complex device that is ultimately capable of general purpose computation.
Creating a comprehensive semantics that encompasses all possible configuration types of a linux firewall is thus
highly non-trivial and not useful for the purpose of analysis.
We decided to approach the problem from the other side: we created a model that includes only the most basic features. (This implies neglecting IPv6.)
Fortunately, many of the highly complex features are rarely essential and even our basic model is still of some use.
›

text‹We first divided the firewall into subsystems.
Given a routing table @{term rt}, the firewall rules @{term fw},
the routing decision for a packet @{term p} can be obtained by @{term "routing_table_semantics rt (p_dst p)"},
the firewall decision by @{term "simple_fw fw p"}.
We draft the first description of our linux router model:
\begin{enumerate}
\item The destination MAC address of an arriving packet is checked: Does it match the MAC address of the ingress port?
If it does, we continue, otherwise, the packet is discarded.
\item The routing decision @{term "rd ≡ routing_table_semantics rt p"} is obtained.
\item The packet's output interface is updated based on @{term rd}\footnote{Note that we assume a packet model with input and output interfaces. The origin of this is explained in Section~\ref{sec:lfwfw}}.
\item The firewall is queried for a decision: @{term "simple_fw fw p"}. If the decision is to @{const[names_short] simple_action.Drop}, the packet is discarded.
\item The next hop is computed: If @{term rd} provides a next hop, that is used.
Otherwise, the destination address of the packet is used.
\item The MAC address of the next hop is looked up; the packet is updated with it and sent.
\end{enumerate}
We decided that this description is best formalized as an abortable program in the option monad:›
lemma "simple_linux_router rt fw mlf ifl p ≡ do {
_ ← iface_packet_check ifl p;
let rd ― ‹(routing decision)› = routing_table_semantics rt (p_dst p);
let p = p⦇p_oiface := output_iface rd⦈;
let fd ― ‹(firewall decision)› = simple_fw fw p;
_ ← (case fd of Decision FinalAllow ⇒ Some () | Decision FinalDeny ⇒ None);
let nh = (case next_hop rd of None ⇒ p_dst p | Some a ⇒ a);
ma ← mlf nh;
Some (p⦇p_l2dst := ma⦈)
}"
unfolding fromMaybe_def[symmetric] by(fact simple_linux_router_def)
text‹where @{term "mlf :: ipv4addr ⇒ 48 word"} is a function that looks up the MAC address for an IP address.›

text‹There are already a few important aspects that have not been modelled, but they are not core essential for the functionality of a firewall.
Namely, there is no local traffic from/to the firewall.
This is problematic since this model can not generate ARP replies --- thus, an equivalent OpenFlow device will not do so, either.
Furthermore, this model is problematic because it requires access to a function that looks up a MAC address,
something that may not be known at the time of time running a translation to an OpenFlow configuration.
›
text‹It is possible to circumvent these problems by inserting static ARP table entries in the directly connected devices
and looking up their MAC addresses \emph{a priori}.
A test-wise implementation of the translation based on this model showed acceptable results.
However, we deemed the \emph{a priori} lookup of the MAC addresses to be rather inelegant and built a second model.›
definition "simple_linux_router_altered rt fw ifl p ≡ do {
let rd = routing_table_semantics rt (p_dst p);
let p = p⦇p_oiface := output_iface rd⦈;
_ ← if p_oiface p = p_iiface p then None else Some ();
let fd = simple_fw fw p;
_ ← (case fd of Decision FinalAllow ⇒ Some () | Decision FinalDeny ⇒ None);
Some p
}"
(* TODO: Would a router actually forward a packet on the same interface? *)
text‹In this model, all access to the MAC layer has been eliminated.
This is done by the approximation that the firewall will be asked to route a packet
(i.e. be addressed on the MAC layer) iff the destination IP address of the packet causes it to be routed out on a different interface.
Because this model does not insert destination MAC addresses, the destination MAC address has to be already correct when the packet is sent.
This can only be achieved by changing the subnet of all connected device, moving them into one common subnet\footnote{There are cases where this is not possible --- A limitation of our system.}.
›
text‹
While a test-wise implementation based on this model also showed acceptable results, the model is still problematic.
The check @{term "p_oiface p = p_iiface p"} and the firewall require access to the output interface.
The details of why this cannot be provided are be elaborated in Section~\ref{sec:convi}.
The intuitive explanation is that an OpenFlow match can not have a field for the output interface.
We thus simplified the model even further:
›
lemma "simple_linux_router_nol12 rt fw p ≡ do {
let rd = routing_table_semantics rt (p_dst p);
let p = p⦇p_oiface := output_iface rd⦈;
let fd = simple_fw fw p;
_ ← (case fd of Decision FinalAllow ⇒ Some () | Decision FinalDeny ⇒ None);
Some p
}" by(fact simple_linux_router_nol12_def)
text‹We continue with this definition as a basis for our translation.
Even this strongly altered version and the original linux firewall still behave the same in a substantial amount of cases:›
theorem
"⟦iface_packet_check ifl pii ≠ None;
mlf (case next_hop (routing_table_semantics rt (p_dst pii)) of None ⇒ p_dst pii | Some a ⇒ a) ≠ None⟧ ⟹
∃x. map_option (λp. p⦇p_l2dst := x⦈) (simple_linux_router_nol12 rt fw pii) = simple_linux_router rt fw mlf ifl pii"
by(fact rtr_nomac_eq[unfolded fromMaybe_def])
text‹The conditions are to be read as The check whether a received packet has the correct destination MAC never returns @{const False}'' and
The next hop MAC address for all packets can be looked up''.
Obviously, these conditions do not hold for all packets.
We will show an example where this makes a difference in Section~\ref{sec:mnex}.›

subsubsection‹Routing Table›
text_raw‹\label{sec:lfwr}›
text‹
The routing system in linux features multiple tables and a system that can use the iptables firewall and an additional match language to select a routing table.
Based on our directive, we only focused on the single most used \texttt{main} routing table.›
text‹
We define a routing table entry to be a record (named tuple) of a prefix match, a metric and the routing action, which in turn is a record of an output interface and an optional next-hop address.›
schematic_goal "(?rtbl_entry :: ('a::len) routing_rule) = ⦇ routing_match = PrefixMatch pfx len, metric = met, routing_action = ⦇ output_iface = oif_string, next_hop = (h :: 'a word option) ⦈ ⦈" ..
text‹A routing table is then a list of these entries:›
lemma "(rtbl :: ('a :: len) prefix_routing) = (rtbl :: 'a routing_rule list)" by rule
text‹Not all members of the type @{type prefix_routing} are sane routing tables. There are three different validity criteria that we require so that our definitions are adequate.
\begin{itemize}
\item The prefixes have to be 0 in bits exceeding their length.
\item There has to be a default rule, i.e. one with prefix length 0. With the condition above, that implies that all its prefix bits are zero and it thus matches any address.
\item The entries have to be sorted by prefix length and metric.
\end{itemize}
The first two are set into code in the following way:
›
lemma "valid_prefix (PrefixMatch pfx len) ≡ pfx && (2 ^ (32 - len) - 1) = (0 :: 32 word)"
lemma "has_default_route rt ⟷ (∃r ∈ set rt. pfxm_length (routing_match r) = 0)"
by(fact has_default_route_alt)
text‹The third is not needed in any of the further proofs, so we omit it.›

text‹The semantics of a routing table is to simply traverse the list until a matching entry is found.›
schematic_goal "routing_table_semantics (rt_entry # rt) dst_addr = (if prefix_match_semantics (routing_match rt_entry) dst_addr then routing_action rt_entry else routing_table_semantics rt dst_addr)" by(fact routing_table_semantics.simps)
text‹If no matching entry is found, the behavior is undefined.›

subsubsection‹iptables Firewall›
text_raw‹\label{sec:lfwfw}›
text‹The firewall subsystem in a linux router is not any less complex than any of the of the other systems.
Fortunately, this complexity has been dealt with in~\cite{diekmann2016verified,Iptables_Semantics-AFP} already and we can directly use the result.›
text‹In short, one of the results is that a complex \emph{iptables} configuration can be simplified to be represented by a single list of matches that only support the following match conditions:
\begin{itemize}
\item (String) prefix matches on the input and output interfaces.
\item A @{type prefix_match} on the source and destination IP address.
\item An exact match on the layer 4 protocol.
\item Interval matches on the source or destination port, e.g. @{term "p⇩d ∈ {(1::16 word)..1023}"}
\end{itemize}
The model/type of the packet is adjusted to fit that: it is a record of the fields matched on.
This also means that input and output interface are coded to the packet.
Given that this information is usually stored alongside the packet content, this can be deemed a reasonable model.
In case the output interface is not needed (e.g., when evaluating an OpenFlow table), it can simply be left blank.

Obviously, a simplification into the above match type cannot always produce an equivalent firewall, and the set of accepted packets has to be over- or underapproximated.
The reader interested in the details of this is strongly referred to~\cite{diekmann2016verified}; we are simply going to continue with the result: @{const simple_fw}.
›
text‹One property of the simplification is worth noting here: The simplified firewall does not know state and the simplification approximates stateful matches by stateless ones.
Thus, the overapproximation of a stateful firewall ruleset that begins with accepting packets of established connections usually begins with a rule that accepts all packets.
Dealing with this by writing a meaningful simplification of stateful firewalls is future work.
›

subsection‹OpenFlow Switch Model›
text‹In this section, we present our model of an OpenFlow switch.
The requirements for this model are derived from the fact that it models devices that are the target of a configuration translation.
This has two implications:
\begin{itemize}
\item All configurations that are representable in our model should produce the correct behavior wrt. their semantics.
The problem is that correct here means that the behavior is the same that any real device would produce.
Since we cannot possibly account for all device types, we instead focus on those that conform to the OpenFlow specifications.
To account for the multiple different versions of the specification (e.g.~\cite{specification10,specification15}), we tried making our model a subset of
both the oldest stable version 1.0~\cite{specification10} and the newest available specification version 1.5.1~\cite{specification15}.
\item Conversely, our model does not need to represent all possible behavior of an OpenFlow switch, just the behavior that can be invoked by the result of our translation.
This is especially useful regarding for controller interaction, but also for MPLS or VLANs, which we did not model in Section \ref{sec:lfw}.
\end{itemize}›

text‹More concretely, we set the following rough outline for our model.
\begin{itemize}
\item A switch consists of a single flow table.
\item A flow table entry consists of a priority, a match condition and an action list.
\item The only possible action (we require) is to forward the packet on a port.
\item We do not model controller interaction.
\end{itemize}
Additionally, we decided that we wanted to be able to ensure the validity of the flow table in all qualities,
i.e. we want to model the conditions no overlapping flow entries appear', all match conditions have their necessary preconditions'.
The details of this are explained in the following sections.
›

subsubsection‹Matching Flow Table entries›
text_raw‹\label{sec:of_match}›
text‹Table 3 of Section 3.1 of \cite{specification10} gives a list of required packet fields that can be used to match packets.
This directly translates into the type for a match expression on a single field:›

schematic_goal "(field_match :: of_match_field) ∈ {
IngressPort (?s::string),
EtherSrc (?as::48 word), EtherDst (?ad::48 word),
EtherType (?t::16 word),
VlanId (?i::16 word), VlanPriority (?p::16 word),
IPv4Src (?pms::32 prefix_match),
IPv4Dst (?pmd::32 prefix_match),
IPv4Proto (?ipp :: 8 word),
L4Src (?ps :: 16 word) (?ms :: 16 word),
L4Dst (?pd :: 16 word) (?md :: 16 word)
}" by(fact of_match_field_typeset)
text‹
Two things are worth additional mention: L3 and L4 addressess''.
The @{term IPv4Src} and @{term IPv4Dst} matches are specified as can be subnet masked'' in~\cite{specification10},
whereras~\cite{specification15} states clearly that arbitrary bitmasks can be used. We took the conservative approach here.
Our alteration of @{term L4Src} and @{term L4Dst} is more grave. While~\cite{specification10} does not state anything about layer 4 ports and masks,
\cite{specification15} specifically forbids using masks on them.
Nevertheless, OpenVSwitch \cite{openvswitch} and some other implementations support them.
We will explain in detail why we must include bitmasks on layer 4 ports to obtain a meaningful translation in Section~\ref{sec:convi}.›

text‹One @{type of_match_field} is not enough to classify a packet.
To match packets, we thus use entire sets of match fields.
As Guha \emph{et al.}~\cite{guha2013machine} noted\footnote{See also: \cite[§2.3]{michaelis2016middlebox}}, executing a set of given @{type of_match_field}s on a packet requires careful consideration.
For example, it is not meaningful to use @{term IPv4Dst} if the given packet is not actually an IP packet, i.e.
@{term IPv4Dst} has the prerequisite of @{term "EtherType 0x0800"} being among the match fields.
Guha \emph{et al.} decided to use the fact that the preconditions can be arranged on a directed acyclic graph (or rather: an acyclic forest).
They evaluated match conditions in a manner following that graph:
first, all field matches without preconditions are evaluated.
Upon evaluating a field match (e.g., @{term "EtherType 0x0800"}), the matches that had their precondition fulfilled by it
(e.g., @{term IPv4Src} and @{term IPv4Src} in this example) are evalutated.
This mirrors the faulty behavior of some implementations (see \cite{guha2013machine}).
Adopting that behavior into our model would mean that any packet matches against the field match set @{term "{IPv4Dst (PrefixMatch 134744072 32)}"}
instead of just those destined for 8.8.8.8 or causing an error. We found this to be unsatisfactory.›

text‹To solve this problem, we made three definitions.
The first, @{term match_no_prereq} matches an @{type of_match_field} against a packet without considering prerequisites.
The second, @{term prerequisites}, checks for a given @{type of_match_field} whether its prerequisites are in a set of given match fields.
Especially:
›
lemma
"prerequisites (VlanPriority pri) m = (∃id. let v = VlanId id in v ∈ m ∧ prerequisites v m)"
"prerequisites (IPv4Proto pr) m = (let v = EtherType 0x0800 in v ∈ m ∧ prerequisites v m)"
"prerequisites (IPv4Src a) m = (let v = EtherType 0x0800 in v ∈ m ∧ prerequisites v m)"
"prerequisites (IPv4Dst a) m = (let v = EtherType 0x0800 in v ∈ m ∧ prerequisites v m)"
"prerequisites (L4Src p msk) m = (∃proto ∈ {TCP,UDP,L4_Protocol.SCTP}. let v = IPv4Proto proto in v ∈ m ∧ prerequisites v m)"
"prerequisites (L4Dst p msk) m = prerequisites (L4Src undefined undefined) m"
by(fact prerequisites.simps)+
text‹Then, to actually match a set of @{type of_match_field} against a packet, we use the option type:›
lemma "OF_match_fields m p =
(if ∃f ∈ m. ¬prerequisites f m then None else
if ∀f ∈ m. match_no_prereq f p then Some True else Some False)"
by(fact OF_match_fields_alt)

subsubsection‹Evaluating a Flow Table›
text‹In the previous section, we explained how we match the set of match fields belonging to a single flow entry against a packet.
This section explains how the correct flow entry from a table can be selected.
To prevent to much entanglement with the previous section, we assume an arbitrary match function @{term "γ :: 'match_field set ⇒ 'packet ⇒ bool"}.
This function @{term "γ"} takes the match condition @{term m} from a flow entry @{term "OFEntry (priority::16 word) (m::'match_field set) action"}
and decides whether a packet matches those.›

text‹The flow table is simply a list of flow table entries @{type flow_entry_match}.
Deciding the right flow entry to use for a given packet is explained in the OpenFlow specification \cite{specification10}, Section 3.4:
\begin{quote}
Packets are matched against flow entries based on prioritization.
An entry that specifies an exact match (i.e., has no wildcards) is always the highest priority\footnote{This behavior has been deprecated.}.
All wildcard entries have a priority associated with them.
Higher priority entries must match before lower priority ones.
If multiple entries have the same priority, the switch is free to choose any ordering.
\end{quote}
We use the term overlapping'' for  the flow entries that can cause a packet to match multiple flow entries with the same priority.
Guha \emph{et al.}~\cite{guha2013machine} have dealt with overlapping.
However, the semantics for a flow table they presented \cite[Figure 5]{guha2013machine}
is slightly different from what they actually used in their theory files.
We have tried to reproduce the original inductive definition (while keeping our abstraction @{term γ}),
in Isabelle/HOL\footnote{The original is written in Coq~\cite{barras1997coq} and we can not use it directly.}:›
lemma "γ (ofe_fields fe) p = True ⟹
∀fe' ∈ set (ft1 @ ft2). ofe_prio fe' > ofe_prio fe ⟶ γ (ofe_fields fe') p = False ⟹
guha_table_semantics γ (ft1 @ fe # ft2) p (Some (ofe_action fe))"
"∀fe ∈ set ft. γ (ofe_fields fe) p = False ⟹
guha_table_semantics γ ft p None" by(fact guha_matched guha_unmatched)+
text‹Guha \emph{et al.} have deliberately made their semantics non-deterministic, to match the fact that the switch may choose any ordering''.
This can lead to undesired results:›
lemma "CARD('action) ≥ 2 ⟹ ∃ff. γ ff p ⟹ ∃ft (a1 :: 'action) (a2 :: 'action). a1 ≠ a2 ∧ guha_table_semantics γ ft p (Some a1) ∧ guha_table_semantics γ ft p (Some a2)"
by(fact guha_table_semantics_ex2res)
text‹This means that, given at least two distinct actions exist and our matcher @{term γ} is not false for all possible match conditions,
we can say that a flow table and two actions exist such that both actions are executed. This can be misleading, as the switch might choose an
ordering on some flow table and never execute some of the (overlapped) actions.›

text‹Instead, we decided to follow Section 5.3 of the specification \cite{specification15}, which states:
\begin{quote}
If there are multiple matching flow entries, the selected flow entry is explicitly undefined.
\end{quote}
This still leaves some room for interpretation, but it clearly states that overlapping flow entries are undefined behavior,
and undefined behavior should not be invoked.
Thus, we came up with a semantics that clearly indicates when undefined behavior has been invoked:›
lemma
"OF_priority_match γ flow_entries packet = (
let m  = filter (λf. γ (ofe_fields f) packet) flow_entries;
m' = filter (λf. ∀fo ∈ set m. ofe_prio fo ≤ ofe_prio f) m in
case m' of []  ⇒ NoAction
| [s] ⇒ Action (ofe_action s)
|  _  ⇒ Undefined)"
unfolding OF_priority_match_def ..
text‹The definition works the following way\footnote{Note that the order of the flow table entries is irrelevant.
We could have made this definition on sets but chose not to for consistency.}:
\begin{enumerate}
\item The flow table is filtered for those entries that match, the result is called $m$.
\item $m$ is filtered again, leaving only those entries for which no entries with lower priority could be found, i.e. the matching flow table entries with minimal priority. The result is called $m'$.
\item A case distinction on $m'$ is made. If only one matching entry was found, its action is returned for execution.
If $m$ is empty, the flow table semantics returns @{term NoAction} to indicate that the flow table does not decide an action for the packet.
If,  not zero or one entry is found, but more, the special value @{term Undefined} for indicating undefined behavior is returned.
\end{enumerate}
The use of @{term Undefined} immediately raises the question in which condition it cannot occur.
We give the following definition:›
lemma "check_no_overlap γ ft = (∀a ∈ set ft. ∀b ∈ set ft. (a ≠ b ∧ ofe_prio a = ofe_prio b) ⟶ ¬(∃p. γ (ofe_fields a) p ∧ γ (ofe_fields b) p))" unfolding check_no_overlap_alt check_no_overlap2_def by force
text‹Together with distinctness of the flow table, this provides the abscence of @{term Undefined}\footnote{It is slightly stronger than necessary, overlapping rules might be shadowed and thus never influence the behavior.}:›
lemma "⟦check_no_overlap γ ft; distinct ft⟧ ⟹
OF_priority_match γ ft p ≠ Undefined" by (simp add: no_overlapsI no_overlaps_not_unefined)

text‹Given the absence of overlapping or duplicate flow entries, we can show two interesting equivalences.
the first is the equality to the semantics defined by Guha \emph{et al.}:›
lemma "⟦check_no_overlap γ ft; distinct ft⟧ ⟹
OF_priority_match γ ft p = option_to_ftb d ⟷ guha_table_semantics γ ft p d"
text‹where @{term option_to_ftb} maps between the return type of @{term OF_priority_match} and an option type as one would expect.›

text‹The second equality for @{term OF_priority_match} is one that helps reasoning about flow tables.
We define a simple recursive traversal for flow tables:›
lemma
"OF_match_linear γ [] p = NoAction"
"OF_match_linear γ (a#as) p = (if γ (ofe_fields a) p then Action (ofe_action a) else OF_match_linear γ as p)"
by(fact OF_match_linear.simps)+
text‹For this definition to be equivalent, we need the flow table to be sorted:›
lemma"
⟦no_overlaps γ f ;sorted_descending (map ofe_prio f)⟧ ⟹
OF_match_linear γ f p = OF_priority_match γ f p"
by(fact  OF_eq)

text‹As the last step, we implemented a serialization function for flow entries; it has to remain unverified.
The serialization function deals with one little inaccuracy: We have modelled the @{term IngressPort} match to use the interface name, but OpenFlow requires numerical interface IDs instead.
We deemed that pulling this translation step into the main translation would only make the correctness lemma of the translation more complicated while not increasing the confidence in the correctness significantly.
We thus made replacing interface names by their ID part of the serialization.
›

text‹Having collected all important definitions and models, we can move on to the conversion.›
(*text‹\todo{Maybe I should make a sweet little subsection that merges this all into a single model definition.}›*)

subsection‹Translation Implementation›
text_raw‹\label{sec:convi}›
text‹This section explains how the functions that are executed sequentially in a linux firewall can be compressed into a single OpenFlow table.
Creating this flow table in a single step would be immensely complicated.
We thus divided the task into several steps using the following key insights:
\begin{itemize}
\item All steps that are executed in the linux router can be formulated as a firewall, more specifically, a generalization of @{term simple_fw} that allows arbitrary actions instead of just accept and drop.
\item A function that computes the conjunction of two @{term simple_fw} matches is already present.
Extending this to a function that computes the join of two firewalls is relatively simple. This is explained in Section \ref{sec:fwconj}
\end{itemize}
›
subsubsection‹Chaining Firewalls›
text_raw‹\label{sec:fwconj}›
text‹This section explains how to compute the join of two firewalls.›
text‹The basis of this is a generalization of @{const simple_fw}.
Instead of only allowing @{const simple_action.Accept} or @{const simple_action.Drop} as actions, it allows arbitrary actions. The type of the function that evaluates this generalized simple firewall is
@{term "generalized_sfw :: ('i::len simple_match × 'a) list ⇒ ('i, 'b) simple_packet_scheme ⇒ ('i simple_match × 'a) option"}.
The definition is straightforward:›
lemma
"generalized_sfw [] p = None"
"generalized_sfw (a # as) p = (if (case a of (m,_) ⇒ simple_matches m p) then Some a else generalized_sfw as p)"
by(fact generalized_sfw_simps)+
text‹Based on that, we asked: if @{term fw⇩1} makes the decision @{term a} (where @{term a} is the second element of the result tuple from @{const generalized_sfw}) and @{term fw⇩2} makes the decision @{term b}, how can we compute the firewall that
makes the decision @{term "(a,b)"}\footnote{Note that tuples are right-associative in Isabelle/HOL, i.e., @{term "(a::'a,(b,c)::('b×'c))"} is a pair of @{term a} and the pair @{term "(b,c)"}}.
One possible answer is given by the following definition:
›
lemma "generalized_fw_join l1 l2 ≡ [(u,a,b). (m1,a) ← l1, (m2,b) ← l2, u ← (case simple_match_and m1 m2 of None ⇒ [] | Some s ⇒ [s])]"
by(fact generalized_fw_join_def[unfolded option2list_def])+
text‹This definition validates the following lemma:›
lemma "generalized_sfw (generalized_fw_join fw⇩1 fw⇩2) p = Some (u, d⇩1,d⇩2) ⟷ (∃r⇩1 r⇩2. generalized_sfw fw⇩1 p = Some (r⇩1,d⇩1) ∧ generalized_sfw fw⇩2 p = Some (r⇩2,d⇩2) ∧ Some u = simple_match_and r⇩1 r⇩2)"
by (auto dest: generalized_fw_joinD sym simp add: generalized_fw_joinI)
text‹Thus, @{const generalized_fw_join} has a number of applications.
For example, it could be used to compute a firewall ruleset that represents two firewalls that are executed in sequence.
›
definition "simple_action_conj a b ≡ (if a = simple_action.Accept ∧ b = simple_action.Accept then simple_action.Accept else simple_action.Drop)"
definition "simple_rule_conj ≡ (uncurry SimpleRule ∘ apsnd (uncurry simple_action_conj))"
theorem "simple_fw rs⇩1 p = Decision FinalAllow ∧ simple_fw rs⇩2 p = Decision FinalAllow ⟷
simple_fw (map simple_rule_conj (generalized_fw_join (map simple_rule_dtor rs⇩1) (map simple_rule_dtor rs⇩2))) p = Decision FinalAllow"
unfolding simple_rule_conj_def simple_action_conj_def[abs_def] using simple_fw_join by(force simp add: comp_def apsnd_def map_prod_def case_prod_unfold uncurry_def[abs_def])
text‹Using the join, it should be possible to compute any $n$-ary logical operation on firewalls.
We will use it for something somewhat different in the next section.›

subsubsection‹Translation Implementation›

text_raw‹
\begin{figure*}
\begin{framed}
›
lemma "lr_of_tran rt fw ifs ≡
if ¬ (no_oif_match fw ∧ has_default_policy fw ∧ simple_fw_valid fw	∧ valid_prefixes rt ∧ has_default_route rt ∧ distinct ifs)
then Inl ''Error in creating OpenFlow table: prerequisites not satisifed''
else (
let
nfw = map simple_rule_dtor fw;
frt = map (λr. (route2match r, output_iface (routing_action r))) rt;
nrd = generalized_fw_join frt nfw;
ard = (map (apfst of_nat) ∘ annotate_rlen) nrd
in
if length nrd < unat (- 1 :: 16 word)
then Inr (pack_OF_entries ifs ard)
else Inl ''Error in creating OpenFlow table: priority number space exhausted''
)"
unfolding Let_def lr_of_tran_def lr_of_tran_fbs_def lr_of_tran_s1_def comp_def route2match_def by force

text_raw‹
\end{framed}
\caption{Function for translating a @{typ "'i::len simple_rule list"}, a @{typ "'i routing_rule list"}, and a list of interfaces to a flow table.}
\label{fig:convi}
\end{figure*}
›

text‹
This section shows the actual definition of the translation function, in Figure~\ref{fig:convi}.
Before beginning the translation, the definition checks whether the necessary preconditions are valid.
This first two steps are to convert @{term fw} and @{term rt} to lists that can be evaluated by @{const generalized_sfw}.
For @{term fw}, this is done by @{term "map simple_rule_dtor"}, which just deconstructs @{type simple_rule}s into tuples of match and action.
For @{term rt}, we made a firewall ruleset with rules that use prefix matches on the destination IP address.
The next step is to join the two rulesets.
The result of the join is a ruleset with rules @{term r} that only match if both, the corresponding firewall rule @{term fwr} and the corresponding routing rule @{term rr} matches.
The data accompanying @{term r} is the port from @{term rr} and the firewall decision from @{term fwr}.
Next, descending priorities are added to the rules using @{term "map (apfst word_of_nat) ∘ annotate_rlen"}.
If the number of rules is too large to fit into the $2^{16}$ priority classes, an error is returned.
Otherwise, the function @{const pack_OF_entries} is used to convert the @{typ "(16 word × 32 simple_match × char list × simple_action) list"} to an OpenFlow table.
While converting the @{typ "char list × simple_action"} tuple is straightforward, converting the @{type simple_match} to an equivalent list of @{typ "of_match_field set"} is non-trivial.
This is done by the function @{const simple_match_to_of_match}.
›
text‹The main difficulties for @{const simple_match_to_of_match} lie in making sure that the prerequisites are satisfied
and in the fact that a @{type simple_match} operates on slightly stronger match expressions.
\begin{itemize}
\item A @{type simple_match} allows a (string) prefix match on the input and output interfaces.
Given a list of existing interfaces on the router @{term ifs}, the function has to insert flow entries for each interface matching the prefix.
\item A @{type simple_match} can match ports by an interval. Now it becomes obvious why Section~\ref{sec:of_match} added bitmasks to @{const L4Src} and @{const L4Dst}.
Using the algorithm to split word intervals into intervals that can be represented by prefix matches from~\cite{diekmann2016verified},
we can efficiently represent the original interval by a few (32 in the worst case) prefix matches and insert flow entries for each of them.%
\footnote{It might be possible to represent the interval match more efficiently than a split into prefixes. However, that would produce overlapping matches (which is not a problem if we assing separate priorities)
and we did not have a verified implementation of an algorithm that does so.}
\end{itemize}
The following lemma characterizes @{const simple_match_to_of_match}:
›
lemma simple_match_to_of_match:
assumes
"simple_match_valid r"
"p_iiface p ∈ set ifs"
"match_iface (oiface r) (p_oiface p)"
"p_l2type p = 0x800"
shows
"simple_matches r p ⟷ (∃gr ∈ set (simple_match_to_of_match r ifs). OF_match_fields gr p = Some True)"
using assms simple_match_to_of_matchD simple_match_to_of_matchI by blast

text‹The assumptions are to be read as follows:
\begin{itemize}
\item The match @{term r} has to be valid, i.e. it has to use @{const valid_prefix} matches, and it cannot use anything other than $0$-$65535$ for the port matches unless its protocol match ensures @{const TCP}, @{const UDP} or @{const L4_Protocol.SCTP}.
\item @{const simple_match_to_of_match} cannot produce rules for packets that have input interfaces that are not named in the interface list.
\item The output interface of @{term p} has to match the output interface match of @{term r}. This is a weakened formulation of @{term "oiface r = ifaceAny"}, since @{thm[display] match_ifaceAny[no_vars]}.
We require this because OpenFlow field matches cannot be used to match on the output port --- they are supposed to match a packet and decide an output port.
\item The @{type simple_match} type was designed for IP(v4) packets, we limit ourselves to them.
\end{itemize}
The conclusion then states that the @{type simple_match} @{term r} matches iff an element of the result of @{const simple_match_to_of_match} matches.
The third assumption is part of the explanation why we did not use @{const simple_linux_router_altered}:
@{const simple_match_to_of_match} cannot deal with output interface matches.
Thus, before passing a generalized simple firewall to @{const pack_OF_entries}, we would have to set the output ports to @{const ifaceAny}.
A system replace output interface matches with destination IP addresses has already been formalized and will be published in a future version of \cite{Iptables_Semantics-AFP}.
For now, we limit ourselves to firewalls that do not do output port matching, i.e., we require @{term "no_oif_match fw"}.
›

text_raw‹\begin{figure*}
\begin{framed}
›
theorem
fixes
p :: "(32, 'a) simple_packet_ext_scheme"
assumes
"p_iiface p ∈ set ifs" and "p_l2type p = 0x800"
"lr_of_tran rt fw ifs = Inr oft"
shows
"OF_priority_match OF_match_fields_safe oft p = Action [Forward oif] ⟷ simple_linux_router_nol12 rt fw p = (Some (p⦇p_oiface := oif⦈))"
"OF_priority_match OF_match_fields_safe oft p = Action [] ⟷ simple_linux_router_nol12 rt fw p = None"
"OF_priority_match OF_match_fields_safe oft p ≠ NoAction" "OF_priority_match OF_match_fields_safe oft p ≠ Undefined"
"OF_priority_match OF_match_fields_safe oft p = Action ls ⟶ length ls ≤ 1"
"∃ls. length ls ≤ 1 ∧ OF_priority_match OF_match_fields_safe oft p = Action ls"
using assms lr_of_tran_correct by simp_all
text_raw‹
\end{framed}
\caption{Central theorem on @{const lr_of_tran}}
\label{fig:central}
\end{figure*}
›

text‹
Given discussed properties, we present the central theorem for our translation in Figure~\ref{fig:central}.
The first two assumptions are limitations on the traffic we make a statement about.
Obviously, we will never see any packets with an input interface that is not in the interface list.
Furthermore, we do not state anything about non-IPv4 traffic. (The traffic will remain unmatched in by the flow table, but we have not verified that.)
The last assumption is that the translation does not return a run-time error.
The translation will return a run-time error if the rules can not be assigned priorities from a 16 bit integer,
or when one of the following conditions on the input data is not satisifed:›
lemma "
¬ no_oif_match fw ∨
¬ has_default_policy fw ∨
¬ simple_fw_valid fw	∨
¬ valid_prefixes rt ∨
¬ has_default_route rt ∨
¬ distinct ifs ⟹
∃err. lr_of_tran rt fw ifs = Inl err" unfolding lr_of_tran_def by(simp split: if_splits)

subsubsection‹Comparison to Exodus›
text‹
We are not the first researchers to attempt automated static migration to SDN.
The (only) other attempt we are aware of is \emph{Exodus} by Nelson \emph{et al.}~\cite{nelson2015exodus}.
›
text‹
There are some fundamental differences between Exodus and our work:
\begin{itemize}
\item Exodus focuses on Cisco IOS instead of linux.
\item Exodus does not produce OpenFlow rulesets, but FlowLog~\cite{nelson2014tierless} controller programs.
\item Exodus is not limited to using a single flow table.
\item Exodus requires continuous controller interaction for some of its functions.
\item Exodus attempts to support as much functionality as possible and has implemented support for dynamic routing, VLANs and NAT.
\item Nelson \emph{et al.} reject the idea that the translation could or should be proven correct.
\end{itemize}
›

(*<*)
end
(*>*)