# Theory Satisfiability

chapter ‹Satisfiability\label{s:Sat}›

theory Satisfiability
imports Wellformed NP
begin

text ‹
This chapter introduces the language \SAT{} and shows that it is in $\NP$, which
constitutes the easier part of the Cook-Levin theorem. The other part, the
$\NP$-hardness of \SAT{}, is what all the following chapters are concerned with.

We first introduce Boolean formulas in conjunctive normal form and the concept
of satisfiability. Then we define a way to represent such formulas as bit
strings, leading to the definition of the language \SAT{} as a set of strings
(Section~\ref{s:sat-sat}).

For the proof that \SAT{} is in $\NP$, we construct a Turing machine that, given
a CNF formula and a string representing a variable assignment, outputs
\textbf{1} iff. the assignment satisfies the formula. The TM will run in
polynomial time, and there are always assignments polynomial (in fact, linear)
in the length of the formula (Section~\ref{s:Sat-np}).
›

section ‹The language \SAT{}\label{s:sat-sat}›

text ‹
\SAT{} is the language of all strings representing satisfiable Boolean
formulas in conjunctive normal form (CNF).  This section introduces a minimal
version of Boolean formulas in conjunctive normal form, including the
concepts of assignments and satisfiability.
›

subsection ‹CNF formulas and satisfiability\label{s:CNF}›

text ‹
Arora and Barak~\cite[p.~44]{ccama} define Boolean formulas in general as
expressions over $\land$, $\lor$, $\lnot$, parentheses, and variables $v_1, v_2, \dots$ in the usual way. Boolean formulas in conjunctive normal form are defined as
$\bigwedge_i\left(\bigvee_j v_{i_j}\right)$, where the $v_{i_j}$ are literals.
This definition does not seem to allow for empty clauses. Also whether the
empty CNF formula'' exists is somewhat doubtful.  Nevertheless, our
formalization allows for both empty clauses and the empty CNF formula, because
this enables us to represent CNF formulas as lists of clauses and clauses as
lists of literals without having to somehow forbid the empty list. This seems to
be a popular approach for formalizing CNF formulas in the context of \SAT{} and
$\NP$~\cite{Gher2021MechanisingCT,Multiset_Ordering_NPC-AFP}.

We identify a variable $v_i$ with its index $i$, which can be any natural
number. A \emph{literal} can either be positive or negative, representing a
variable or negated variable, respectively.

\null
›

datatype literal = Neg nat | Pos nat

type_synonym clause = "literal list"

type_synonym formula = "clause list"

text ‹
An \emph{assignment} maps all variables, given by their index, to a Boolean:
›

type_synonym assignment = "nat ⇒ bool"

abbreviation satisfies_literal :: "assignment ⇒ literal ⇒ bool" where
"satisfies_literal α x ≡ case x of Neg n ⇒ ¬ α n | Pos n ⇒ α n"

definition satisfies_clause :: "assignment ⇒ clause ⇒ bool" where
"satisfies_clause α c ≡ ∃x∈set c. satisfies_literal α x"

definition satisfies :: "assignment ⇒ formula ⇒ bool" (infix "⊨" 60) where
"α ⊨ φ ≡ ∀c∈set φ. satisfies_clause α c"

text ‹
As is customary, the empty clause is satisfied by no assignment, and the empty
CNF formula is satisfied by every assignment.
›

proposition "¬ satisfies_clause α []"

proposition "α ⊨ []"

lemma satisfies_clause_take:
assumes "i < length clause"
shows "satisfies_clause α (take (Suc i) clause) ⟷
satisfies_clause α (take i clause) ∨ satisfies_literal α (clause ! i)"
using assms satisfies_clause_def by (auto simp add: take_Suc_conv_app_nth)

lemma satisfies_take:
assumes "i < length φ"
shows "α ⊨ take (Suc i) φ ⟷ α ⊨ take i φ ∧ satisfies_clause α (φ ! i)"
using assms satisfies_def by (auto simp add: take_Suc_conv_app_nth)

lemma satisfies_append:
assumes "α ⊨ φ⇩1 @ φ⇩2"
shows "α ⊨ φ⇩1" and "α ⊨ φ⇩2"
using assms satisfies_def by simp_all

lemma satisfies_append':
assumes "α ⊨ φ⇩1" and "α ⊨ φ⇩2"
shows "α ⊨ φ⇩1 @ φ⇩2"
using assms satisfies_def by auto

lemma satisfies_concat_map:
assumes "α ⊨ concat (map f [0..<k])" and "i < k"
shows "α ⊨ f i"
using assms satisfies_def by simp

lemma satisfies_concat_map':
assumes "⋀i. i < k ⟹ α ⊨ f i"
shows "α ⊨ concat (map f [0..<k])"
using assms satisfies_def by simp

text ‹
The main ingredient for defining \SAT{} is the concept of \emph{satisfiable} CNF
formula:
›

definition satisfiable :: "formula ⇒ bool" where
"satisfiable φ ≡ ∃α. α ⊨ φ"

text ‹
The set of all variables used in a CNF formula is finite.
›

definition variables :: "formula ⇒ nat set" where
"variables φ ≡ {n. ∃c∈set φ. Neg n ∈ set c ∨ Pos n ∈ set c}"

lemma finite_variables: "finite (variables φ)"
proof -
define voc :: "clause ⇒ nat set" where
"voc c = {n. Neg n ∈ set c ∨ Pos n ∈ set c}" for c
let ?vocs = "set (map voc φ)"

have "finite (voc c)" for c
proof (induction c)
case Nil
then show ?case
using voc_def by simp
next
case (Cons a c)
have "voc (a # c) = {n. Neg n ∈ set (a # c) ∨ Pos n ∈ set (a # c)}"
using voc_def by simp
also have "... = {n. Neg n ∈ set c ∨ Neg n = a ∨ Pos n ∈ set c ∨ Pos n = a}"
by auto
also have "... = {n. (Neg n ∈ set c ∨ Pos n ∈ set c) ∨ (Pos n = a ∨ Neg n = a)}"
by auto
also have "... = {n. Neg n ∈ set c ∨ Pos n ∈ set c} ∪ {n. Pos n = a ∨ Neg n = a}"
by auto
also have "... = voc c ∪ {n. Pos n = a ∨ Neg n = a}"
using voc_def by simp
finally have "voc (a # c) = voc c ∪ {n. Pos n = a ∨ Neg n = a}" .
moreover have "finite {n. Pos n = a ∨ Neg n = a}"
using finite_nat_set_iff_bounded by auto
ultimately show ?case
using Cons by simp
qed
moreover have "variables φ = ⋃?vocs"
using variables_def voc_def by auto
moreover have "finite ?vocs"
by simp
ultimately show ?thesis
by simp
qed

lemma variables_append: "variables (φ⇩1 @ φ⇩2) = variables φ⇩1 ∪ variables φ⇩2"
using variables_def by auto

text ‹
Arora and Barak~\cite[Claim~2.13]{ccama} define the \emph{size} of a CNF formula
as the numbr of $\wedge / \vee$ symbols. We use a slightly different definition,
namely the number of literals:
›

definition fsize :: "formula ⇒ nat" where
"fsize φ ≡ sum_list (map length φ)"

subsection ‹Predicates on assignments›

text ‹
Every CNF formula is satisfied by a set of assignments. Conversely, for certain
sets of assignments we can construct CNF formulas satisfied by exactly these
assignments.  This will be helpful later when we construct formulas for reducing
arbitrary languages to \SAT{} (see Section~\ref{s:Reducing}).
›

subsubsection ‹Universality of CNF formulas›

text ‹
A set (represented by a predicate) $F$ of assignments depends on the first
$\ell$ variables iff.\ any two assignments that agree on the first $\ell$
variables are either both in the set or both outside of the set.
›

definition depon :: "nat ⇒ (assignment ⇒ bool) ⇒ bool" where
"depon l F ≡ ∀α⇩1 α⇩2. (∀i<l. α⇩1 i = α⇩2 i) ⟶ F α⇩1 = F α⇩2"

text ‹
Lists of all strings of the same length:
›

fun str_of_len :: "nat ⇒ string list" where
"str_of_len 0 = [[]]" |
"str_of_len (Suc l) = map ((#) 𝕆) (str_of_len l) @ map ((#) 𝕀) (str_of_len l)"

lemma length_str_of_len: "length (str_of_len l) = 2 ^ l"
by (induction l) simp_all

lemma in_str_of_len_length: "xs ∈ set (str_of_len l) ⟹ length xs = l"
by (induction l arbitrary: xs) auto

lemma length_in_str_of_len: "length xs = l ⟹ xs ∈ set (str_of_len l)"
proof (induction l arbitrary: xs)
case 0
then show ?case
by simp
next
case (Suc l)
then obtain y ys where "xs = y # ys"
by (meson length_Suc_conv)
then have "length ys = l"
using Suc by simp
show ?case
proof (cases y)
case True
then have "xs ∈ set (map ((#) 𝕀) (str_of_len l))"
using length ys = l Suc xs = y # ys by simp
then show ?thesis
by simp
next
case False
then have "xs ∈ set (map ((#) 𝕆) (str_of_len l))"
using length ys = l Suc xs = y # ys by simp
then show ?thesis
by simp
qed
qed

text ‹
A predicate $F$ depending on the first $\ell$ variables $v_0, \dots, v_{\ell-1}$
can be regarded as a truth table over $\ell$ variables. The next lemma shows
that for every such truth table there exists a CNF formula with at most $2^\ell$
clauses and $\ell\cdot2^\ell$ literals over the first $\ell$ variables. This is
the well-known fact that every Boolean function (over $\ell$ variables) can be
represented by a CNF formula~\cite[Claim~2.13]{ccama}.
›

lemma depon_ex_formula:
assumes "depon l F"
shows "∃φ.
fsize φ ≤ l * 2 ^ l ∧
length φ ≤ 2 ^ l ∧
variables φ ⊆ {..<l} ∧
(∀α. F α = α ⊨ φ)"
proof -
define cl where "cl = (λv. map (λi. if v ! i then Neg i else Pos i) [0..<l])"
have cl1: "satisfies_clause a (cl v)" if "length v = l" and "v ≠ map a [0..<l]" for v a
proof -
obtain i where i: "i < l" "a i ≠ v ! i"
using ‹length v = l› ‹v ≠ map a [0..<l]›
by (smt (verit, best) atLeastLessThan_iff map_eq_conv map_nth set_upt)
then have *: "cl v ! i = (if v ! i then Neg i else Pos i)"
using cl_def by simp
then have "case (cl v ! i) of Neg n ⇒ ¬ a n | Pos n ⇒ a n"
using i(2) by simp
then show ?thesis
using cl_def * that(1) satisfies_clause_def i(1) by fastforce
qed
have cl2: "v ≠ map a [0..<l]" if "length v = l" and "satisfies_clause a (cl v)" for v a
proof
assume assm: "v = map a [0..<l]"
from that(2) have "∃x∈set (cl v). case x of Neg n ⇒ ¬ a n | Pos n ⇒ a n"
using satisfies_clause_def by simp
then obtain i where i: "i < l" and "case (cl v ! i) of Neg n ⇒ ¬ a n | Pos n ⇒ a n"
using cl_def by auto
then have "v ! i ≠ a i"
using cl_def by fastforce
then show False
using i assm by simp
qed

have filter_length_nth: "f (vs ! j)" if "vs = filter f sol" and "j < length vs"
for vs sol :: "'a list" and f j
using that nth_mem by (metis length_removeAll_less less_irrefl removeAll_filter_not)

have sum_list_map: "sum_list (map g xs) ≤ k * length xs" if "⋀x. x ∈ set xs ⟹ g x = k"
for xs :: "'a list" and g k
using that
proof (induction "length xs" arbitrary: xs)
case 0
then show ?case
by simp
next
case (Suc x)
then obtain y ys where "xs = y # ys"
by (metis length_Suc_conv)
then have "length ys = x"
using Suc by simp
have "y ∈ set xs"
using xs = y # ys by simp
have "sum_list (map g xs) = sum_list (map g (y # ys))"
using xs = y # ys by simp
also have "... = g y + sum_list (map g ys)"
by simp
also have "... = k + sum_list (map g ys)"
using Suc y ∈ set xs by simp
also have "... ≤ k + k * length ys"
using Suc length ys = x ‹xs = y # ys› by auto
also have "... = k * length xs"
by (metis Suc.hyps(2) ‹length ys = x› mult_Suc_right)
finally show ?case
by simp
qed

define vs where
"vs = filter (λv. F (λi. if i < l then v ! i else False) = False) (str_of_len l)"
define φ where "φ = map cl vs"

have "a ⊨ φ" if "F a" for a
proof -
define v where "v = map a [0..<l]"
then have "(λi. if i < l then v ! i else False) j = a j" if "j < l" for j
then have *: "F (λi. if i < l then v ! i else False)"
using assms(1) depon_def that by (smt (verit, ccfv_SIG))
have "satisfies_clause a c" if "c ∈ set φ" for c
proof -
obtain j where j: "c = φ ! j" "j < length φ"
using φ_def c ∈ set φ by (metis in_set_conv_nth)
then have "c = cl (vs ! j)"
using φ_def by simp
have "j < length vs"
using φ_def j by simp
then have "F (λi. if i < l then (vs ! j) ! i else False) = False"
using vs_def filter_length_nth by blast
then have "vs ! j ≠ v"
using * by auto
moreover have "length (vs ! j) = l"
using vs_def length_str_of_len ‹j < length vs›
by (smt (verit) filter_eq_nths in_str_of_len_length notin_set_nthsI nth_mem)
ultimately have "satisfies_clause a (cl (vs ! j))"
using v_def cl1 by simp
then show ?thesis
using c = cl (vs ! j) by simp
qed
then show ?thesis
using satisfies_def by simp
qed
moreover have "F α" if "α ⊨ φ" for α
proof (rule ccontr)
assume assm: "¬ F α"
define v where "v = map α [0..<l]"
have *: "F (λi. if i < l then v ! i else False) = False"
proof -
have "(λi. if i < l then v ! i else False) j = α j" if "j < l" for j
using v_def by (simp add: that)
then show ?thesis
using assm assms(1) depon_def by (smt (verit, best))
qed
have "length v = l"
using v_def by simp
then obtain j where "j < length (str_of_len l)" and "v = str_of_len l ! j"
by (metis in_set_conv_nth length_in_str_of_len)
then have "v ∈ set vs"
using vs_def * by fastforce
then have "cl v ∈ set φ"
using φ_def by simp
then have "satisfies_clause α (cl v)"
using that satisfies_def by simp
then have "v ≠ map α [0..<l]"
using length v = l cl2 by simp
then show False
using v_def by simp
qed
ultimately have "∀α. F α = α ⊨ φ"
by auto
moreover have "fsize φ ≤ l * 2 ^ l"
proof -
have "length c = l" if "c ∈ set φ" for c
using that cl_def φ_def by auto
then have "fsize φ ≤ l * length φ"
unfolding fsize_def using sum_list_map by auto
also have "... ≤ l * length (str_of_len l)"
using φ_def vs_def by simp
also have "... = l * 2 ^ l"
using length_str_of_len by simp
finally show ?thesis .
qed
moreover have "length φ ≤ 2 ^ l"
proof -
have "length φ ≤ length (str_of_len l)"
using φ_def vs_def by simp
also have "... = 2 ^ l"
using length_str_of_len by simp
finally show ?thesis .
qed
moreover have "variables φ ⊆ {..<l}"
proof
fix x assume "x ∈ variables φ"
then obtain c where c: "c ∈ set φ" "Neg x ∈ set c ∨ Pos x ∈ set c"
using variables_def by auto
then obtain v where v: "v ∈ set (str_of_len l)" "c = cl v"
using φ_def vs_def by auto
then show "x ∈ {..<l}"
using cl_def c by auto
qed
ultimately show ?thesis
by auto
qed

subsubsection ‹Substitutions of variables›

text ‹
We will sometimes consider CNF formulas over the first $\ell$ variables and
derive other CNF formulas from them by substituting these variables.  Such a
substitution will be represented by a list $\sigma$ of length at least $\ell$,
meaning that the variable $v_i$ is replaced by $v_{\sigma(i)}$. We will call
this operation on formulas \emph{relabel}, and the corresponding one on literals
\emph{rename}:
›

fun rename :: "nat list ⇒ literal ⇒ literal" where
"rename σ (Neg i) = Neg (σ ! i)" |
"rename σ (Pos i) = Pos (σ ! i)"

definition relabel :: "nat list ⇒ formula ⇒ formula" where
"relabel σ φ ≡ map (map (rename σ)) φ"

lemma fsize_relabel: "fsize (relabel σ φ) = fsize φ"
using relabel_def fsize_def by (metis length_concat length_map map_concat)

text ‹
A substitution $\sigma$ can also be applied to an assignment and to a list of
variable indices:
›

definition remap :: "nat list ⇒ assignment ⇒ assignment" where
"remap σ α i ≡ if i < length σ then α (σ ! i) else α i"

definition reseq :: "nat list ⇒ nat list ⇒ nat list" where
"reseq σ vs ≡ map ((!) σ) vs"

lemma length_reseq [simp]: "length (reseq σ vs) = length vs"
using reseq_def by simp

text ‹
Relabeling a formula and remapping an assignment are equivalent in a sense.
›

lemma satisfies_sigma:
assumes "variables φ ⊆ {..<length σ}"
shows "α ⊨ relabel σ φ ⟷ remap σ α ⊨ φ"
proof
assume sat: "α ⊨ relabel σ φ"
have "satisfies_clause (remap σ α) c" if "c ∈ set φ" for c
proof -
obtain i where "i < length φ" "φ ! i = c"
by (meson ‹c ∈ set φ› in_set_conv_nth)
then have "satisfies_clause α (map (rename σ) c)"
(is "satisfies_clause α ?c")
using sat satisfies_def relabel_def by auto
then obtain x where "x∈set ?c" "case x of Neg n ⇒ ¬ α n | Pos n ⇒ α n"
using satisfies_clause_def by auto
then obtain j where j: "j < length ?c" "case (?c ! j) of Neg n ⇒ ¬ α n | Pos n ⇒ α n"
by (metis in_set_conv_nth)
have "case c ! j of Neg n ⇒ ¬ (remap σ α) n | Pos n ⇒ (remap σ α) n"
proof (cases "c ! j")
case (Neg n)
then have 1: "?c ! j = Neg (σ ! n)"
using j(1) by simp
have "n ∈ variables φ"
using Neg j(1) nth_mem that variables_def by force
then have "n < length σ"
using assms by auto
then show ?thesis
using Neg 1 j(2) remap_def by auto
next
case (Pos n)
then have 1: "?c ! j = Pos (σ ! n)"
using j(1) by simp
have "n ∈ variables φ"
using Pos j(1) nth_mem that variables_def by force
then have "n < length σ"
using assms by auto
then show ?thesis
using Pos 1 j(2) remap_def by auto
qed
then show ?thesis
using satisfies_clause_def j by auto
qed
then show "remap σ α ⊨ φ"
using satisfies_def by simp
next
assume sat: "remap σ α ⊨ φ"
have "satisfies_clause α c" if "c ∈ set (relabel σ φ)" for c
proof -
let ?phi = "relabel σ φ"
let ?beta = "remap σ α"
obtain i where i: "i < length ?phi" "?phi ! i = c"
by (meson ‹c ∈ set ?phi› in_set_conv_nth)
then have "satisfies_clause ?beta (φ ! i)"
(is "satisfies_clause _ ?c")
using sat satisfies_def relabel_def by simp
then obtain x where "x∈set ?c" "case x of Neg n ⇒ ¬ ?beta n | Pos n ⇒ ?beta n"
using satisfies_clause_def by auto
then obtain j where j: "j < length ?c" "case (?c ! j) of Neg n ⇒ ¬ ?beta n | Pos n ⇒ ?beta n"
by (metis in_set_conv_nth)
then have ren: "c ! j = rename σ (?c ! j)"
using i relabel_def by auto
have "case c ! j of Neg n ⇒ ¬ α n | Pos n ⇒ α n"
proof (cases "?c ! j")
case (Neg n)
then have *: "c ! j = Neg (σ ! n)"
have "n ∈ variables φ"
using Neg i j variables_def that length_map mem_Collect_eq nth_mem relabel_def by force
then have "n < length σ"
using assms by auto
moreover have "¬ (remap σ α) n"
using j(2) Neg by simp
ultimately have "¬ α (σ ! n)"
using remap_def by simp
then show ?thesis
next
case (Pos n)
then have *: "c ! j = Pos (σ ! n)"
have "n ∈ variables φ"
using Pos i j variables_def that length_map mem_Collect_eq nth_mem relabel_def by force
then have "n < length σ"
using assms by auto
moreover have "(remap σ α) n"
using j(2) Pos by simp
ultimately have "α (σ ! n)"
using remap_def by simp
then show ?thesis
qed
moreover have "length c = length (φ ! i)"
using relabel_def i by auto
ultimately show ?thesis
using satisfies_clause_def j by auto
qed
then show "α ⊨ relabel σ φ"
qed

subsection ‹Representing CNF formulas as strings\label{s:sat-sat-repr}›

text ‹
By identifying negated literals with even numbers and positive literals with odd
numbers, we can identify literals with natural numbers. This yields a
straightforward representation of a clause as a list of numbers and of a CNF
formula as a list of lists of numbers. Such a list can, in turn, be represented
as a symbol sequence over a quaternary alphabet as described in
Section~\ref{s:tm-numlistlist}, which ultimately can be encoded over a binary
alphabet (see Section~\ref{s:tm-quaternary}). This is essentially how we
represent CNF formulas as strings.

We have to introduce a bunch of functions for mapping between these
representations.

\null
›

fun literal_n :: "literal ⇒ nat" where
"literal_n (Neg i) = 2 * i" |
"literal_n (Pos i) = Suc (2 * i)"

definition n_literal :: "nat ⇒ literal" where
"n_literal n ≡ if even n then Neg (n div 2) else Pos (n div 2)"

lemma n_literal_n: "n_literal (literal_n x) = x"
using n_literal_def by (cases x) simp_all

lemma literal_n_literal: "literal_n (n_literal n) = n"
using n_literal_def by simp

definition clause_n :: "clause ⇒ nat list" where
"clause_n cl ≡ map literal_n cl"

definition n_clause :: "nat list ⇒ clause" where
"n_clause ns ≡ map n_literal ns"

lemma n_clause_n: "n_clause (clause_n cl) = cl"
using n_clause_def clause_n_def n_literal_n by (simp add: map_idI)

lemma clause_n_clause: "clause_n (n_clause n) = n"
using n_clause_def clause_n_def literal_n_literal by (simp add: map_idI)

definition formula_n :: "formula ⇒ nat list list" where
"formula_n φ ≡ map clause_n φ"

definition n_formula :: "nat list list ⇒ formula" where
"n_formula nss ≡ map n_clause nss"

lemma n_formula_n: "n_formula (formula_n φ) = φ"
using n_formula_def formula_n_def n_clause_n by (simp add: map_idI)

lemma formula_n_formula: "formula_n (n_formula nss) = nss"
using n_formula_def formula_n_def clause_n_clause by (simp add: map_idI)

definition formula_zs :: "formula ⇒ symbol list" where
"formula_zs φ ≡ numlistlist (formula_n φ)"

text ‹
The mapping between formulas and well-formed symbol sequences for
lists of lists of numbers is bijective.
›

lemma formula_n_inj: "formula_n φ⇩1 = formula_n φ⇩2 ⟹ φ⇩1 = φ⇩2"
using n_formula_n by metis

definition zs_formula :: "symbol list ⇒ formula" where
"zs_formula zs ≡ THE φ. formula_zs φ = zs"

lemma zs_formula:
assumes "numlistlist_wf zs"
shows "∃!φ. formula_zs φ = zs"
proof -
obtain nss where nss: "numlistlist nss = zs"
using assms numlistlist_wf_def by auto
let ?phi = "n_formula nss"
have *: "formula_n ?phi = nss"
using nss formula_n_formula by simp
then have "formula_zs ?phi = zs"
using nss formula_zs_def by simp
then have "∃φ. formula_zs φ = zs"
by auto
moreover have "φ = ?phi" if "formula_zs φ = zs" for φ
proof -
have "numlistlist (formula_n φ) = zs"
using that formula_zs_def by simp
then have "nss = formula_n φ"
using nss numlistlist_inj by simp
then show ?thesis
using formula_n_inj * by simp
qed
ultimately show ?thesis
by auto
qed

lemma zs_formula_zs: "zs_formula (formula_zs φ) = φ"
by (simp add: formula_n_inj formula_zs_def numlistlist_inj the_equality zs_formula_def)

lemma formula_zs_formula:
assumes "numlistlist_wf zs"
shows "formula_zs (zs_formula zs) = zs"
using assms zs_formula zs_formula_zs by metis

text ‹
There will of course be Turing machines that perform computations on formulas.
In order to bound their running time, we need bounds for the length of the
symbol representation of formulas.
›

lemma nlength_literal_n_Pos: "nlength (literal_n (Pos n)) ≤ Suc (nlength n)"
using nlength_times2plus1 by simp

lemma nlength_literal_n_Neg: "nlength (literal_n (Neg n)) ≤ Suc (nlength n)"
using nlength_times2 by simp

lemma nlllength_formula_n:
fixes V :: nat and φ :: formula
assumes "⋀v. v ∈ variables φ ⟹ v ≤ V"
shows "nlllength (formula_n φ) ≤ fsize φ * Suc (Suc (nlength V)) + length φ"
using assms
proof (induction φ)
case Nil
then show ?case
using formula_n_def by simp
next
case (Cons cl φ)
then have 0: "⋀v. v ∈ variables φ ⟹ v ≤ V"
using variables_def by simp
have 1: "n ≤ V" if "Pos n ∈ set cl" for n
using that variables_def by (simp add: Cons.prems)
have 2: "n ≤ V" if "Neg n ∈ set cl" for n
using that variables_def by (simp add: Cons.prems)
have 3: "nlength (literal_n v) ≤ Suc (nlength V)" if "v ∈ set cl" for v
proof (cases v)
case (Neg n)
then have "nlength (literal_n v) ≤ Suc (nlength n)"
using nlength_literal_n_Neg by blast
moreover have "n ≤ V"
using Neg that 2 by simp
ultimately show ?thesis
using nlength_mono by fastforce
next
case (Pos n)
then have "nlength (literal_n v) ≤ Suc (nlength n)"
using nlength_literal_n_Pos by blast
moreover have "n ≤ V"
using Pos that 1 by simp
ultimately show ?thesis
using nlength_mono by fastforce
qed

have "nllength (clause_n cl) = length (numlist (map literal_n cl))"
using clause_n_def nllength_def by simp
have "nllength (clause_n cl) = (∑n←(map literal_n cl). Suc (nlength n))"
using clause_n_def nllength by simp
also have "... = (∑v←cl. Suc (nlength (literal_n v)))"
proof -
have "map (λn. Suc (nlength n)) (map literal_n cl) = map (λv. Suc (nlength (literal_n v))) cl"
by simp
then show ?thesis
by metis
qed
also have "... ≤ (∑v←cl. Suc (Suc (nlength V)))"
using sum_list_mono[of cl "λv. Suc (nlength (literal_n v))" "λv. Suc (Suc (nlength V))"] 3
by simp
also have "... = Suc (Suc (nlength V)) * length cl"
using sum_list_const by blast
finally have 4: "nllength (clause_n cl) ≤ Suc (Suc (nlength V)) * length cl" .

have "concat (map (λns. numlist ns @ [5]) (map clause_n (cl # φ))) =
(numlist (clause_n cl) @ [5]) @ concat (map (λns. numlist ns @ [5]) (map clause_n φ))"
by simp
then have "length (concat (map (λns. numlist ns @ [5]) (map clause_n (cl # φ)))) =
length ((numlist (clause_n cl) @ [5]) @ concat (map (λns. numlist ns @ [5]) (map clause_n φ)))"
by simp
then have "nlllength (formula_n (cl # φ)) =
length ((numlist (clause_n cl) @ [5]) @ concat (map (λns. numlist ns @ [5]) (map clause_n φ)))"
using formula_n_def numlistlist_def nlllength_def by simp
also have "... = length (numlist (clause_n cl) @ [5]) + length (concat (map (λns. numlist ns @ [5]) (map clause_n φ)))"
by simp
also have "... = length (numlist (clause_n cl) @ [5]) + nlllength (formula_n φ)"
using formula_n_def numlistlist_def nlllength_def by simp
also have "... = Suc (nllength (clause_n cl)) + nlllength (formula_n φ)"
using nllength_def by simp
also have "... ≤ Suc (Suc (Suc (nlength V)) * length cl) + nlllength (formula_n φ)"
using 4 by simp
also have "... ≤ Suc (Suc (Suc (nlength V)) * length cl) + fsize φ * Suc (Suc (nlength V)) + length φ"
using Cons 0 by simp
also have "... = fsize (cl # φ) * Suc (Suc (nlength V)) + length (cl # φ)"
finally show ?case
by simp
qed

text ‹
Since \SAT{} is supposed to be a set of strings rather than symbol
sequences, we need to map symbol sequences representing formulas to strings:
›

abbreviation formula_to_string :: "formula ⇒ string" where
"formula_to_string φ ≡ symbols_to_string (binencode (numlistlist (formula_n φ)))"

lemma formula_to_string_inj:
assumes "formula_to_string φ⇩1 = formula_to_string φ⇩2"
shows "φ⇩1 = φ⇩2"
proof -
let ?xs1 = "binencode (numlistlist (formula_n φ⇩1))"
let ?xs2 = "binencode (numlistlist (formula_n φ⇩2))"
have bin1: "binencodable (numlistlist (formula_n φ⇩1))"
by (simp add: Suc_le_eq numeral_2_eq_2 proper_symbols_numlistlist symbols_lt_numlistlist)
then have "bit_symbols ?xs1"
using bit_symbols_binencode by simp
then have 1: "string_to_symbols (symbols_to_string ?xs1) = ?xs1"
using bit_symbols_to_symbols by simp

have bin2: "binencodable (numlistlist (formula_n φ⇩2))"
by (simp add: Suc_le_eq numeral_2_eq_2 proper_symbols_numlistlist symbols_lt_numlistlist)
then have "bit_symbols ?xs2"
using bit_symbols_binencode by simp
then have "string_to_symbols (symbols_to_string ?xs2) = ?xs2"
using bit_symbols_to_symbols by simp
then have "?xs1 = ?xs2"
using 1 assms by simp
then have "numlistlist (formula_n φ⇩1) = numlistlist (formula_n φ⇩2)"
using binencode_inj bin1 bin2 by simp
then have "formula_n φ⇩1 = formula_n φ⇩2"
using numlistlist_inj by simp
then show ?thesis
using formula_n_inj by simp
qed

text ‹
While @{const formula_to_string} maps every CNF formula to a string, not every
string represents a CNF formula. We could just ignore such invalid strings and
define \SAT{} to only contain well-formed strings.  But this would implicitly
place these invalid strings in the complement of \SAT{}. While this does not
cause us any issues here, it would if we were to introduce co-$\NP$ and wanted
to show that $\overline{\mathtt{SAT}}$ is in co-$\NP$, as we would then have to
deal with the invalid strings. So it feels a little like cheating to ignore the
invalid strings like this.

Arora and Barak~\cite[p.~45 footnote~3]{ccama} recommend mapping invalid strings
to some fixed formula''.  A natural candidate for this fixed formula is the
empty CNF, since an invalid string in a sense represents nothing, and the empty
CNF formula is represented by the empty string. Since the empty CNF formula is
satisfiable this implies that all invalid strings become elements of \SAT{}.

We end up with the following definition of the protagonist of this article:
›

definition SAT :: language where
"SAT ≡ {formula_to_string φ | φ. satisfiable φ} ∪ {x. ¬ (∃φ. x = formula_to_string φ)}"

section ‹\SAT{} is in $\NP$\label{s:Sat-np}›

text ‹
In order to show that \SAT{} is in $\NP$, we will construct a polynomial-time
Turing machine $M$ and specify a polynomial function $p$ such that for every
$x$, $x\in \SAT$ iff. there is a $u\in\bbOI^{p(|x|)}$ such that $M$ outputs
\textbf{1} on $\langle x; u\rangle$.

The idea is straightforward: Let $\phi$ be the formula represented by the
string $x$.  Interpret the string $u$ as a list of variables and interpret this
as the assignment that assigns True to only the variables in the list. Then
check if the assignment satisfies the formula. This check is obviously''
possible in polynomial time because $M$ simply has to iterate over all clauses
and check if at least one literal per clause is true under the assignment.
Checking if a literal is true is simply a matter of checking whether the
literal's variable is in the list $u$. If the assignment satisfies $\phi$,
output \textbf{1}, otherwise the empty symbol sequence.

If $\phi$ is unsatisfiable then no assignment, hence no $u$ no matter the length
will make $M$ output \textbf{1}. On the other hand, if $\phi$ is satisfiable
then there will be a satisfying assignment where a subset of the variables in
$\phi$ are assigned True. Hence there will be a list of variables of at most
roughly the length of $\phi$. So setting the polynomial $p$ to something like
$p(n) = n$ should suffice.

In fact, as we shall see, $p(n) = n$ suffices. This is so because in our
representation, the string $x$, being a list of lists, has slightly more
overhead per number than the plain list $u$ has. Therefore listing all variables
in $\phi$ is guaranteed to need fewer symbols than $x$ has.

There are several technical details to work out. First of all, the input to $M$
need not be a well-formed pair. And if it is, the pair $\langle x, u\rangle$ has
to be decoded into separate components $x$ and $u$. These have to be decoded
again from the binary to the quaternary alphabet, which is only possible if both
$x$ and $u$ comprise only bit symbols (\textbf{01}). Then $M$ needs to check if
the decoded $x$ and $u$ are valid symbol sequences for lists (of lists) of
numbers. In the case of $u$ this is particularly finicky because the definition
of $\NP$ requires us to provide a string $u$ of exactly the length $p(|x|)$ and
so we have to pad $u$ with extra symbols, which have to be stripped again before
the validation can take place.

In the first subsection we describe what the verifier TM has to do in terms of
symbol sequences. In the subsections after that we devise a Turing machine that
implements this behavior.
›

subsection ‹Verifying \SAT{} instances›

text ‹
Our verifier Turing machine for \SAT{} will implement the following function;
that is, on input @{term zs} it will output @{term "verify_sat zs"}.  It
performs a number of decodings and well-formedness checks and outputs either
\textbf{1} or the empty symbol sequence.
›

definition verify_sat :: "symbol list ⇒ symbol list" where
"verify_sat zs ≡
let
ys = bindecode zs;
xs = bindecode (first ys);
vs = rstrip ♯ (bindecode (second ys))
in
if even (length (first ys)) ∧ bit_symbols (first ys) ∧ numlistlist_wf xs
then if bit_symbols (second ys) ∧ numlist_wf vs
then if (λv. v ∈ set (zs_numlist vs)) ⊨ zs_formula xs then [𝟭] else []
else []
else [𝟭]"

text ‹
Next we show that @{const verify_sat} behaves as is required of a verifier TM
for \SAT. Its polynomial running time is the subject of the next subsection.
›

text ‹
First we consider the case that @{term zs} encodes a pair $\langle x, u\rangle$
of strings where $x$ does not represent a CNF formula. Such an $x$ is in \SAT{},
hence the verifier TM is supposed to output \textbf{1}.
›

lemma ex_phi_x:
assumes "xs = string_to_symbols x"
assumes "even (length xs)" and "numlistlist_wf (bindecode xs)"
shows "∃φ. x = formula_to_string φ"
proof -
obtain nss where "numlistlist nss = bindecode xs"
using assms(3) numlistlist_wf_def by auto
moreover have "binencode (bindecode xs) = xs"
using assms(1,2) binencode_decode by simp
ultimately have "binencode (numlistlist nss) = xs"
by simp
then have "binencode (numlistlist (formula_n (n_formula nss))) = xs"
using formula_n_formula by simp
then have "formula_to_string (n_formula nss) = symbols_to_string xs"
by simp
then show ?thesis
using assms(1) symbols_to_string_to_symbols by auto
qed

lemma verify_sat_not_wf_phi:
assumes "zs = ⟨x; u⟩" and "¬ (∃φ. x = formula_to_string φ)"
shows "verify_sat zs = [𝟭]"
proof -
define ys where "ys = bindecode zs"
then have first_ys: "first ys = string_to_symbols x"
using first_pair assms(1) by simp
then have 2: "bit_symbols (first ys)"
using assms(1) bit_symbols_first ys_def by presburger

define xs where "xs = bindecode (first ys)"
then have "¬ (even (length (first ys)) ∧ bit_symbols (first ys) ∧ numlistlist_wf xs)"
using first_ys ex_phi_x assms(2) by auto
then show "verify_sat zs = [𝟭]"
unfolding verify_sat_def Let_def using ys_def xs_def by simp
qed

text ‹
The next case is that @{term zs} represents a pair $\langle x, u\rangle$ where
$x$ represents an unsatisfiable CNF formula. This $x$ is thus not in \SAT{} and
the verifier TM must output something different from \textbf{1}, such as the
empty string, regardless of $u$.
›

lemma verify_sat_not_sat:
fixes φ :: formula
assumes "zs = ⟨formula_to_string φ; u⟩" and "¬ satisfiable φ"
shows "verify_sat zs = []"
proof -
have bs_phi: "bit_symbols (binencode (formula_zs φ))"
using bit_symbols_binencode formula_zs_def proper_symbols_numlistlist symbols_lt_numlistlist
by (metis Suc_le_eq dual_order.refl numeral_2_eq_2)

define ys where "ys = bindecode zs"
then have "first ys = string_to_symbols (formula_to_string φ)"
using first_pair assms(1) by simp
then have first_ys: "first ys = binencode (formula_zs φ)"
using bit_symbols_to_symbols bs_phi formula_zs_def by simp
then have 2: "bit_symbols (first ys)"
using assms(1) bit_symbols_first ys_def by presburger
have 22: "even (length (first ys))"
using first_ys by simp

define xs where "xs = bindecode (first ys)"
define vs where "vs = rstrip 5 (bindecode (second ys))"

have wf_xs: "numlistlist_wf xs"
using xs_def first_ys bindecode_encode formula_zs_def numlistlist_wf_def numlistlist_wf_has2'
by (metis le_simps(3) numerals(2))

have φ: "zs_formula xs = φ"
using xs_def first_ys "2" binencode_decode formula_to_string_inj formula_zs_def formula_zs_formula wf_xs
by auto

have "verify_sat zs =
(if bit_symbols (second ys) ∧ numlist_wf vs
then if (λv. v ∈ set (zs_numlist vs)) ⊨ φ then [3] else []
else [])"
unfolding verify_sat_def Let_def using ys_def xs_def vs_def 2 22 wf_xs φ by simp
then show "verify_sat zs = []"
using assms(2) satisfiable_def by simp
qed

text ‹
Next we consider the case in which @{term zs} represents a pair $\langle x, u\rangle$ where $x$ represents a satisfiable CNF formula and $u$ a list of
numbers padded at the right with @{text ♯} symbols. This $u$ thus represents a
variable assignment, namely the one assigning True to exactly the variables in
the list. The verifier TM is to output \textbf{1} iff.\ this assignment
satisfies the CNF formula represented by $x$.

First we show that stripping away @{text ♯} symbols does not damage a symbol
sequence representing a list of numbers.
›

lemma rstrip_numlist_append: "rstrip ♯ (numlist vars @ replicate pad ♯) = numlist vars"
(is "rstrip ♯ ?zs = ?ys")
proof -
have "(LEAST i. i ≤ length ?zs ∧ set (drop i ?zs) ⊆ {♯}) = length ?ys"
proof (rule Least_equality)
show "length ?ys ≤ length ?zs ∧ set (drop (length ?ys) ?zs) ⊆ {♯}"
by auto
show "length ?ys ≤ m" if "m ≤ length ?zs ∧ set (drop m ?zs) ⊆ {♯}" for m
proof (rule ccontr)
assume "¬ length ?ys ≤ m"
then have "m < length ?ys"
by simp
then have "?ys ! m ∈ set (drop m ?ys)"
by (metis Cons_nth_drop_Suc list.set_intros(1))
moreover have "set (drop m ?ys) ⊆ {♯}"
using that by simp
ultimately have "?ys ! m = ♯"
by auto
moreover have "?ys ! m < ♯"
using symbols_lt_numlist m < length ?ys by simp
ultimately show False
by simp
qed
qed
then show ?thesis
using rstrip_def by simp
qed

lemma verify_sat_wf:
fixes φ :: formula and pad :: nat
assumes "zs = ⟨formula_to_string φ; symbols_to_string (binencode (numlist vars @ replicate pad ♯))⟩"
shows "verify_sat zs = (if (λv. v ∈ set vars) ⊨ φ then [𝟭] else [])"
proof -
have bs_phi: "bit_symbols (binencode (formula_zs φ))"
using bit_symbols_binencode formula_zs_def proper_symbols_numlistlist symbols_lt_numlistlist
by (metis Suc_le_eq dual_order.refl numeral_2_eq_2)

have "binencodable (numlist vars @ replicate pad ♯)"
using proper_symbols_numlist symbols_lt_numlist binencodable_append[of "numlist vars" "replicate pad ♯"]
by fastforce
then have bs_vars: "bit_symbols (binencode (numlist vars @ replicate pad ♯))"
using bit_symbols_binencode by simp

define ys where "ys = bindecode zs"
then have "first ys = string_to_symbols (formula_to_string φ)"
using first_pair assms(1) by simp
then have first_ys: "first ys = binencode (formula_zs φ)"
using bit_symbols_to_symbols bs_phi formula_zs_def by simp
then have bs_first: "bit_symbols (first ys)"
using assms(1) bit_symbols_first ys_def by presburger

have even: "even (length (first ys))"
using first_ys by simp

have second_ys: "second ys = binencode (numlist vars @ replicate pad ♯)"
using bs_vars ys_def assms(1) bit_symbols_to_symbols second_pair by simp
then have bs_second: "bit_symbols (second ys)"
using bs_vars by simp

define xs where "xs = bindecode (first ys)"
define vs where "vs = rstrip ♯ (bindecode (second ys))"

then have "vs = rstrip ♯ (numlist vars @ replicate pad ♯)"
using second_ys ‹binencodable (numlist vars @ replicate pad ♯)› bindecode_encode by simp
then have vs: "vs = numlist vars"
using rstrip_numlist_append by simp

have wf_xs: "numlistlist_wf xs"
using xs_def first_ys bindecode_encode formula_zs_def numlistlist_wf_def numlistlist_wf_has2'
by (metis le_simps(3) numerals(2))

have "verify_sat zs =
(if even (length (first ys)) ∧ bit_symbols (first ys) ∧ numlistlist_wf xs
then if bit_symbols (second ys) ∧ numlist_wf vs
then if (λv. v ∈ set (zs_numlist vs)) ⊨ zs_formula xs then [𝟭] else []
else []
else [3])"
unfolding verify_sat_def Let_def using bs_second ys_def xs_def vs_def by simp
then have *: "verify_sat zs =
(if bit_symbols (second ys) ∧ numlist_wf vs
then if (λv. v ∈ set (zs_numlist vs)) ⊨ zs_formula xs then [𝟭] else []
else [])"
unfolding verify_sat_def Let_def using even bs_first wf_xs by simp

have "xs = formula_zs φ"
using xs_def bindecode_encode formula_zs_def first_ys proper_symbols_numlistlist symbols_lt_numlistlist
then have φ: "φ = zs_formula xs"

have vars: "vars = zs_numlist vs"
using vs numlist_wf_def numlist_zs_numlist zs_numlist_ex1 by blast
then have wf_vs: "numlist_wf vs"
using numlist_wf_def vs by auto

have "verify_sat zs = (if (λv. v ∈ set (zs_numlist vs)) ⊨ zs_formula xs then [𝟭] else [])"
using * bs_second wf_xs wf_vs by simp
then show ?thesis
using φ vars by simp
qed

text ‹
Finally we show that for every string $x$ representing a satisfiable CNF formula
there is a list of numbers representing a satisfying assignment and represented
by a string of length at most $|x|$. That means there is always a string of
exactly the length of $x$ consisting of a satisfying assignment plus some
›

lemma nllength_remove1:
assumes "n ∈ set ns"
shows "nllength (n # remove1 n ns) = nllength ns"
using assms nllength sum_list_map_remove1[of n ns "λn. Suc (nlength n)"] by simp

lemma nllength_distinct_le:
assumes "distinct ns"
and "set ns ⊆ set ms"
shows "nllength ns ≤ nllength ms"
using assms
proof (induction ms arbitrary: ns)
case Nil
then show ?case
by simp
next
case (Cons m ms)
show ?case
proof (cases "m ∈ set ns")
case True
let ?ns = "remove1 m ns"
have "set ?ns ⊆ set ms"
using Cons by auto
moreover have "distinct ?ns"
using Cons by simp
ultimately have *: "nllength ?ns ≤ nllength ms"
using Cons by simp

have "nllength ns = nllength (m # ?ns)"
using True nllength_remove1 by simp
also have "... = Suc (nlength m) + nllength ?ns"
using nllength by simp
also have "... ≤ Suc (nlength m) + nllength ms"
using * by simp
also have "... = nllength (m # ms)"
using nllength by simp
finally show ?thesis .
next
case False
then have "set ns ⊆ set ms"
using Cons by auto
moreover have "distinct ns"
using Cons by simp
ultimately have "nllength ns ≤ nllength ms"
using Cons by simp
then show ?thesis
using nllength by simp
qed
qed

lemma nlllength_nllength_concat: "nlllength nss = nllength (concat nss) + length nss"
using nlllength nllength by (induction nss) simp_all

fun variable :: "literal ⇒ nat" where
"variable (Neg i) = i" |
"variable (Pos i) = i"

lemma sum_list_eq: "ns = ms ⟹ sum_list ns = sum_list ms"
by simp

lemma nllength_clause_le: "nllength (clause_n cl) ≥ nllength (map variable cl)"
proof -
have "variable x ≤ literal_n x" for x
by (cases x) simp_all
then have *: "Suc (nlength (variable x)) ≤ Suc (nlength (literal_n x))" for x
using nlength_mono by simp

let ?ns = "map literal_n cl"
have "nllength (clause_n cl) = nllength ?ns"
using clause_n_def by presburger
also have "... = (∑n←?ns. Suc (nlength n))"
using nllength by simp
also have "... = (∑x←cl. Suc (nlength (literal_n x)))"
by (smt (verit, del_insts) length_map nth_equalityI nth_map)
also have "... ≥ (∑x←cl. Suc (nlength (variable x)))"
using * by (simp add: sum_list_mono)
finally have "(∑x←cl. Suc (nlength (variable x))) ≤ nllength (clause_n cl)"
by simp
moreover have "(∑x←cl. Suc (nlength (variable x))) = nllength (map variable cl)"
proof -
have 1: "map (λx. Suc (nlength (variable x))) cl = map (λn. Suc (nlength n)) (map variable cl)"
by simp
then have "(∑x←cl. Suc (nlength (variable x))) = (∑n←(map variable cl). Suc (nlength n))"
using sum_list_eq[OF 1] by simp
then show ?thesis
using nllength by simp
qed
ultimately show ?thesis
by simp
qed

lemma nlllength_formula_ge: "nlllength (formula_n φ) ≥ nlllength (map (map variable) φ)"
proof (induction φ)
case Nil
then show ?case
by simp
next
case (Cons cl φ)
have "nlllength (map (map variable) (cl # φ)) =
nlllength (map (map variable) [cl]) + nlllength (map (map variable) φ)"
using nlllength by simp
also have "... = Suc (nllength (map variable cl)) + nlllength (map (map variable) φ)"
using nlllength by simp
also have "... ≤ Suc (nllength (map variable cl)) + nlllength (formula_n φ)"
using Cons by simp
also have "... ≤ Suc (nllength (clause_n cl)) + nlllength (formula_n φ)"
using nllength_clause_le by simp
also have "... = nlllength (formula_n (cl # φ))"
using nlllength by (simp add: formula_n_def)
finally show ?case .
qed

lemma variables_shorter_formula:
fixes φ :: formula and vars :: "nat list"
assumes "set vars ⊆ variables φ" and "distinct vars"
shows "nllength vars ≤ nlllength (formula_n φ)"
proof -
let ?nss = "map (map variable) φ"
have "nllength (concat ?nss) ≤ nlllength ?nss"
using nlllength_nllength_concat by simp
then have *: "nllength (concat ?nss) ≤ nlllength (formula_n φ)"
using nlllength_formula_ge by (meson le_trans)

have "set vars ⊆ set (concat ?nss)"
proof
fix n :: nat
assume "n ∈ set vars"
then have "n ∈ variables φ"
using assms(1) by auto
then obtain c where c: "c ∈ set φ" "Neg n ∈ set c ∨ Pos n ∈ set c"
using variables_def by auto
then obtain x where x: "x ∈ set c" "variable x = n"
by auto
then show "n ∈ set (concat (map (map variable) φ))"
using c by auto
qed
then have "nllength vars ≤ nllength (concat ?nss)"
using nllength_distinct_le assms(2) by simp
then show ?thesis
using * by simp
qed

lemma ex_assignment_linear_length:
assumes "satisfiable φ"
shows "∃vars. (λv. v ∈ set vars) ⊨ φ ∧ nllength vars ≤ nlllength (formula_n φ)"
proof -
obtain α where α: "α ⊨ φ"
using assms satisfiable_def by auto
define poss where "poss = {v. v ∈ variables φ ∧ α v}"
then have "finite poss"
using finite_variables by simp

let ?beta = "λv. v ∈ poss"
have sat: "?beta ⊨ φ"
unfolding satisfies_def
proof
fix c :: clause
assume "c ∈ set φ"
then have "satisfies_clause α c"
using satisfies_def α by simp
then obtain x where x: "x ∈ set c" "satisfies_literal α x"
using satisfies_clause_def by auto
show "satisfies_clause ?beta c"
proof (cases x)
case (Neg n)
then have "¬ α n"
using x(2) by simp
then have "n ∉ poss"
using poss_def by simp
then have "¬ ?beta n"
by simp
then have "satisfies_literal ?beta x"
using Neg by simp
then show ?thesis
using satisfies_clause_def x(1) by auto
next
case (Pos n)
then have "α n"
using x(2) by simp
then have "n ∈ poss"
using poss_def Pos ‹c ∈ set φ› variables_def x(1) by auto
then have "?beta n"
by simp
then have "satisfies_literal ?beta x"
using Pos by simp
then show ?thesis
using satisfies_clause_def x(1) by auto
qed
qed

obtain vars where vars: "set vars = poss" "distinct vars"
using finite poss by (meson finite_distinct_list)
moreover from this have "set vars ⊆ variables φ"
using poss_def by simp
ultimately have "nllength vars ≤ nlllength (formula_n φ)"
using variables_shorter_formula by simp
moreover have "(λv. v ∈ set vars) ⊨ φ"
using vars(1) sat by simp
ultimately show ?thesis
by auto
qed

lemma ex_witness_linear_length:
fixes φ :: formula
assumes "satisfiable φ"
shows "∃us.
bit_symbols us ∧
length us = length (formula_to_string φ) ∧
verify_sat ⟨formula_to_string φ; symbols_to_string us⟩ = [𝟭]"
proof -
obtain vars where vars:
"(λv. v ∈ set vars) ⊨ φ"
"nllength vars ≤ nlllength (formula_n φ)"
using assms ex_assignment_linear_length by auto

then have "nllength vars + pad = nlllength (formula_n φ)"
using vars(2) by simp
moreover define us where "us = numlist vars @ replicate pad ♯"
ultimately have "length us = nlllength (formula_n φ)"
then have "length (binencode us) = length (formula_to_string φ)" (is "length ?us = _")
moreover have "verify_sat ⟨formula_to_string φ; symbols_to_string ?us⟩ = [𝟭]"
using us_def vars(1) assms verify_sat_wf by simp
moreover have "bit_symbols ?us"
proof -
have "binencodable (numlist vars)"
using proper_symbols_numlist symbols_lt_numlist
by (metis Suc_leI lessI less_Suc_numeral numeral_2_eq_2 numeral_le_iff numeral_less_iff
order_less_le_trans pred_numeral_simps(3) semiring_norm(74))
moreover have "binencodable (replicate pad ♯)"
by simp
ultimately have "binencodable us"
using us_def binencodable_append by simp
then show ?thesis
using bit_symbols_binencode by simp
qed
ultimately show ?thesis
by blast
qed

lemma bit_symbols_verify_sat: "bit_symbols (verify_sat zs)"
unfolding verify_sat_def Let_def by simp

subsection ‹A Turing machine for verifying formulas›

text ‹
The core of the function @{const verify_sat} is the expression @{term " (λv. v ∈
set (zs_numlist vs)) ⊨ zs_formula xs"}, which checks if an assignment
represented by a list of variable indices satisfies a CNF formula represented by
a list of lists of literals. In this section we devise a Turing machine
performing this check.

Recall that the numbers 0 and 1 are represented by the empty symbol sequence
and the symbol sequence \textbf{1}, respectively. The Turing machines in this
section are described in terms of numbers.

We start with a Turing machine that checks a clause. The TM accepts on tape
$j_1$ a list of numbers representing an assignment $\alpha$ and on tape $j_2$ a
list of numbers representing a clause. It outputs on tape $j_3$ the number 1 if
$\alpha$ satisfies the clause, and otherwise 0. To do this the TM iterates over
all literals in the clause and determines the underlying variable and the sign
of the literal. If the literal is positive and the variable is in the list
representing $\alpha$ or if the literal is negative and the variable is not in
the list, the number 1 is written to the tape $j_3$. Otherwise the tape remains
unchanged. We assume $j_3$ is initialized with 0, and so it will be 1 if and
only if at least one literal is satisfied by $\alpha$.

The TM requires five auxiliary tapes $j_3 + 1, \dots, j_3 + 5$. Tape $j_3 + 1$
stores the literals one at a time, and later the variable; tape $j_3 + 2$ stores
the sign of the literal; tape $j_3 + 3$ stores whether the variable is contained
in $\alpha$; tapes $j_3 + 4$ and $j_3 + 5$ are the auxiliary tapes of @{const
tm_contains}.
›

definition tm_sat_clause :: "tapeidx ⇒ tapeidx ⇒ tapeidx ⇒ machine" where
"tm_sat_clause j1 j2 j3 ≡
WHILE [] ; λrs. rs ! j2 ≠ □ DO
tm_nextract 4 j2 (j3 + 1) ;;
tm_mod2 (j3 + 1) (j3 + 2) ;;
tm_div2 (j3 + 1) ;;
tm_contains j1 (j3 + 1) (j3 + 3) ;;
IF λrs. rs ! (j3 + 3) = □ ∧ rs ! (j3 + 2) = □ ∨ rs ! (j3 + 3) ≠ □ ∧ rs ! (j3 + 2) ≠ □ THEN
tm_setn j3 1
ELSE
[]
ENDIF ;;
tm_setn (j3 + 1) 0 ;;
tm_setn (j3 + 2) 0 ;;
tm_setn (j3 + 3) 0
DONE ;;
tm_cr j2"

lemma tm_sat_clause_tm:
assumes "k ≥ 2" and "G ≥ 5" and "j3 + 5 < k" "0 < j1" "j1 < k" "j2 < k" "j1 < j3"
shows "turing_machine k G (tm_sat_clause j1 j2 j3)"
using tm_sat_clause_def tm_mod2_tm tm_div2_tm tm_nextract_tm tm_setn_tm tm_contains_tm Nil_tm tm_cr_tm
assms turing_machine_loop_turing_machine turing_machine_branch_turing_machine
by simp

locale turing_machine_sat_clause =
fixes j1 j2 j3 :: tapeidx
begin

definition "tmL1 ≡ tm_nextract 4 j2 (j3 + 1)"
definition "tmL2 ≡ tmL1 ;; tm_mod2 (j3 + 1) (j3 + 2)"
definition "tmL3 ≡ tmL2 ;; tm_div2 (j3 + 1)"
definition "tmL4 ≡ tmL3 ;; tm_contains j1 (j3 + 1) (j3 + 3)"
definition "tmI ≡ IF λrs. rs ! (j3 + 3) = □ ∧ rs ! (j3 + 2) = □ ∨ rs ! (j3 + 3) ≠ □ ∧ rs ! (j3 + 2) ≠ □ THEN tm_setn j3 1 ELSE [] ENDIF"
definition "tmL5 ≡ tmL4 ;; tmI"
definition "tmL6 ≡ tmL5 ;; tm_setn (j3 + 1) 0"
definition "tmL7 ≡ tmL6 ;; tm_setn (j3 + 2) 0"
definition "tmL8 ≡ tmL7 ;; tm_setn (j3 + 3) 0"
definition "tmL ≡ WHILE [] ; λrs. rs ! j2 ≠ □ DO tmL8 DONE"
definition "tm2 ≡ tmL ;; tm_cr j2"

lemma tm2_eq_tm_sat_clause: "tm2 = tm_sat_clause j1 j2 j3"
unfolding tm2_def tmL_def tmL8_def tmL7_def tmL6_def tmL5_def tmL4_def tmL3_def tmI_def
tmL2_def tmL1_def tm_sat_clause_def
by simp

context
fixes tps0 :: "tape list" and k :: nat and vars :: "nat list" and clause :: clause
assumes jk: "0 < j1" "j1 ≠ j2" "j3 + 5 < k" "j1 < j3" "j2 < j3" "0 < j2" "length tps0 = k"
assumes tps0:
"tps0 ! j1 = nltape' vars 0"
"tps0 ! j2 = nltape' (clause_n clause) 0"
"tps0 ! j3 = (⌊0⌋⇩N, 1)"
"tps0 ! (j3 + 1) = (⌊0⌋⇩N, 1)"
"tps0 ! (j3 + 2) = (⌊0⌋⇩N, 1)"
"tps0 ! (j3 + 3) = (⌊0⌋⇩N, 1)"
"tps0 ! (j3 + 4) = (⌊0⌋⇩N, 1)"
"tps0 ! (j3 + 5) = (⌊0⌋⇩N, 1)"
begin

abbreviation "sat_take t ≡ satisfies_clause (λv. v ∈ set vars) (take t clause)"

definition tpsL :: "nat ⇒ tape list" where
"tpsL t ≡ tps0
[j2 := nltape' (clause_n clause) t,
j3 := (⌊sat_take t⌋⇩B, 1)]"

lemma tpsL0: "tpsL 0 = tps0"
proof -
have "nltape' (clause_n clause) 0 = tps0 ! j2"
using tps0(2) by presburger
moreover have "⌊sat_take 0⌋⇩B = ⌊0⌋⇩N"
using satisfies_clause_def by simp
ultimately show ?thesis
using tpsL_def tps0 jk by (metis list_update_id)
qed

definition tpsL1 :: "nat ⇒ tape list" where
"tpsL1 t ≡ tps0
[j2 := nltape' (clause_n clause) (Suc t),
j3 := (⌊sat_take t⌋⇩B, 1),
j3 + 1 := (⌊literal_n (clause ! t)⌋⇩N, 1)]"

lemma tmL1 [transforms_intros]:
assumes "ttt = 12 + 2 * nlength (clause_n clause ! t)" and "t < length (clause_n clause)"
shows "transforms tmL1 (tpsL t) ttt (tpsL1 t)"
unfolding tmL1_def
proof (tform tps: assms tps0 tpsL_def tpsL1_def jk)
have len: "t < length clause"
using assms(2) clause_n_def by simp
show "ttt = 12 + 2 * nlength 0 + 2 * nlength (clause_n clause ! t)"
using assms(1) by simp
have *: "j2 ≠ j3"
using jk by simp
have **: "clause_n clause ! t = literal_n (clause ! t)"
using len by (simp add: clause_n_def)
show "tpsL1 t = (tpsL t)
[j2 := nltape' (clause_n clause) (Suc t),
j3 + 1 := (⌊clause_n clause ! t⌋⇩N, 1)]"
unfolding tpsL_def tpsL1_def using list_update_swap[OF *, of tps0] by (simp add: **)
qed

definition tpsL2 :: "nat ⇒ tape list" where
"tpsL2 t ≡ tps0
[j2 := nltape' (clause_n clause) (Suc t),
j3 := (⌊sat_take t⌋⇩B, 1),
j3 + 1 := (⌊literal_n (clause ! t)⌋⇩N, 1),
j3 + 2 := (⌊literal_n (clause ! t) mod 2⌋⇩N, 1)]"

lemma tmL2 [transforms_intros]:
assumes "ttt = 12 + 2 * nlength (clause_n clause ! t) + 1"
and "t < length (clause_n clause)"
shows "transforms tmL2 (tpsL t) ttt (tpsL2 t)"
unfolding tmL2_def by (tform tps: assms tps0 tpsL2_def tpsL1_def jk)

definition tpsL3 :: "nat ⇒ tape list" where
"tpsL3 t ≡ tps0
[j2 := nltape' (clause_n clause) (Suc t),
j3 := (⌊sat_take t⌋⇩B, 1),
j3 + 1 := (⌊literal_n (clause ! t) div 2⌋⇩N, 1),
j3 + 2 := (⌊literal_n (clause ! t) mod 2⌋⇩N, 1)]"

lemma tmL3 [transforms_intros]:
assumes "ttt = 16 + 4 * nlength (clause_n clause ! t)"
and "t < length (clause_n clause)"
shows "transforms tmL3 (tpsL t) ttt (tpsL3 t)"
unfolding tmL3_def
proof (tform tps: assms(2) tps0 tpsL3_def tpsL2_def jk)
have len: "t < length clause"
using assms(2) clause_n_def by simp
have **: "clause_n clause ! t = literal_n (clause ! t)"
using len by (simp add: clause_n_def)
show "ttt = 12 + 2 * nlength (clause_n clause ! t) + 1 + (2 * nlength (literal_n (clause ! t)) + 3)"
using assms(1) ** by simp
qed

definition tpsL4 :: "nat ⇒ tape list" where
"tpsL4 t ≡ tps0
[j2 := nltape' (clause_n clause) (Suc t),
j3 := (⌊sat_take t⌋⇩B, 1),
j3 + 1 := (⌊literal_n (clause ! t) div 2⌋⇩N, 1),
j3 + 2 := (⌊literal_n (clause ! t) mod 2⌋⇩N, 1),
j3 + 3 := (⌊literal_n (clause ! t) div 2 ∈ set vars⌋⇩B, 1)]"

lemma tmL4 [transforms_intros]:
assumes "ttt = 20 + 4 * nlength (clause_n clause ! t) + 67 * (nllength vars)⇧2"
and "t < length (clause_n clause)"
shows "transforms tmL4 (tpsL t) ttt (tpsL4 t)"
unfolding tmL4_def
proof (tform tps: assms(2) tps0 tpsL4_def tpsL3_def jk time: assms(1))
have "tpsL3 t ! (j3 + 4) = (⌊0⌋⇩N, 1)"
using tpsL3_def tps0 jk by simp
then show "tpsL3 t ! (j3 + 3 + 1) = (⌊0⌋⇩N, 1)"
have "tpsL3 t ! (j3 + 5) = (⌊0⌋⇩N, 1)"
using tpsL3_def tps0 jk by simp
then show "tpsL3 t ! (j3 + 3 + 2) = (⌊0⌋⇩N, 1)"
qed

definition tpsL5 :: "nat ⇒ tape list" where
"tpsL5 t ≡ tps0
[j2 := nltape' (clause_n clause) (Suc t),
j3 := (⌊sat_take (Suc t)⌋⇩B, 1),
j3 + 1 := (⌊literal_n (clause ! t) div 2⌋⇩N, 1),
j3 + 2 := (⌊literal_n (clause ! t) mod 2⌋⇩N, 1),
j3 + 3 := (⌊literal_n (clause ! t) div 2 ∈ set vars⌋⇩B, 1)]"

lemma tmI [transforms_intros]:
assumes "ttt = 16" and "t < length (clause_n clause)"
shows "transforms tmI (tpsL4 t) ttt (tpsL5 t)"
unfolding tmI_def
proof (tform tps: jk tpsL4_def time: assms(1))
show "10 + 2 * nlength (if sat_take t then 1 else 0) + 2 * nlength 1 + 2 ≤ ttt"
using assms(1) nlength_0 nlength_1_simp by simp

have len: "t < length clause"
using assms(2) by (simp add: clause_n_def)

let ?l = "clause ! t"
have 1: "read (tpsL4 t) ! (j3 + 3) = □ ⟷ literal_n ?l div 2 ∉ set vars"
using tpsL4_def jk read_ncontents_eq_0[of "tpsL4 t" "j3 + 3"] by simp
have 2: "read (tpsL4 t) ! (j3 + 2) = □ ⟷ literal_n ?l mod 2 = 0"
using tpsL4_def jk read_ncontents_eq_0[of "tpsL4 t" "j3 + 2"] by simp

let ?a = "λv. v ∈ set vars"
let ?cond = "read (tpsL4 t) ! (j3 + 3) = □ ∧ read (tpsL4 t) ! (j3 + 2) = □ ∨
read (tpsL4 t) ! (j3 + 3) ≠ □ ∧ read (tpsL4 t) ! (j3 + 2) ≠ □"
have *: "?cond ⟷ satisfies_literal ?a ?l"
proof (cases ?l)
case (Neg v)
then have "literal_n ?l div 2 = v" "literal_n ?l mod 2 = 0"
by simp_all
moreover from this have "satisfies_literal ?a ?l ⟷ v ∉ set vars"
using Neg by simp
ultimately show ?thesis
using 1 2 by simp
next
case (Pos v)
then have "literal_n ?l div 2 = v" "literal_n ?l mod 2 = 1"
by simp_all
moreover from this have "satisfies_literal ?a ?l ⟷ v ∈ set vars"
using Pos by simp
ultimately show ?thesis
using 1 2 by simp
qed

have **: "sat_take (Suc t) ⟷ sat_take t ∨ satisfies_literal ?a ?l"
using satisfies_clause_take[OF len] by simp

show "tpsL5 t = (tpsL4 t)[j3 := (⌊1⌋⇩N, 1)]" if ?cond
proof -
have "(if sat_take (Suc t) then 1::nat else 0) = 1"
using that * ** by simp
then show ?thesis
unfolding tpsL5_def tpsL4_def using that by (simp add: list_update_swap)
qed
show "tpsL5 t = (tpsL4 t)" if "¬ ?cond"
proof -
have "sat_take t = sat_take (Suc t)"
using * ** that by simp
then show ?thesis
unfolding tpsL5_def tpsL4_def using that by (simp add: list_update_swap)
qed
qed

lemma tmL5 [transforms_intros]:
assumes "ttt = 36 + 4 * nlength (clause_n clause ! t) + 67 * (nllength vars)⇧2"
and "t < length (clause_n clause)"
shows "transforms tmL5 (tpsL t) ttt (tpsL5 t)"
unfolding tmL5_def by (tform tps: assms)

definition tpsL6 :: "nat ⇒ tape list" where
"tpsL6 t ≡ tps0
[j2 := nltape' (clause_n clause) (Suc t),
j3 := (⌊sat_take (Suc t)⌋⇩B, 1),
j3 + 1 := (⌊0⌋⇩N, 1),
j3 + 2 := (⌊literal_n (clause ! t) mod 2⌋⇩N, 1),
j3 + 3 := (⌊literal_n (clause ! t) div 2 ∈ set vars⌋⇩B, 1)]"

lemma tmL6 [transforms_intros]:
assumes "ttt = 46 + 4 * nlength (clause_n clause ! t) + 67 * (nllength vars)⇧2 + 2 * nlength (literal_n (clause ! t) div 2)"
and "t < length (clause_n clause)"
shows "transforms tmL6 (tpsL t) ttt (tpsL6 t)"
unfolding tmL6_def by (tform tps: assms tps0 tpsL6_def tpsL5_def jk)

definition tpsL7 :: "nat ⇒ tape list" where
"tpsL7 t ≡ tps0
[j2 := nltape' (clause_n clause) (Suc t),
j3 := (⌊sat_take (Suc t)⌋⇩B, 1),
j3 + 1 := (⌊0⌋⇩N, 1),
j3 + 2 := (⌊0⌋⇩N, 1),
j3 + 3 := (⌊literal_n (clause ! t) div 2 ∈ set vars⌋⇩B, 1)]"

lemma tmL7 [transforms_intros]:
assumes "ttt = 56 + 4 * nlength (clause_n clause ! t) + 67 * (nllength vars)⇧2 + 2 * nlength (literal_n (clause ! t) div 2) +
2 * nlength (literal_n (clause ! t) mod 2)"
and "t < length (clause_n clause)"
shows "transforms tmL7 (tpsL t) ttt (tpsL7 t)"
unfolding tmL7_def by (tform tps: assms tps0 tpsL7_def tpsL6_def jk)

definition tpsL8 :: "nat ⇒ tape list" where
"tpsL8 t ≡ tps0
[j2 := nltape' (clause_n clause) (Suc t),
j3 := (⌊sat_take (Suc t)⌋⇩B, 1),
j3 + 1 := (⌊0⌋⇩N, 1),
j3 + 2 := (⌊0⌋⇩N, 1),
j3 + 3 := (⌊0⌋⇩N, 1)]"

lemma tmL8:
assumes "ttt = 66 + 4 * nlength (clause_n clause ! t) + 67 * (nllength vars)⇧2 +
2 * nlength (literal_n (clause ! t) div 2) +
2 * nlength (literal_n (clause ! t) mod 2) +
2 * nlength (if literal_n (clause ! t) div 2 ∈ set vars then 1 else 0)"
and "t < length (clause_n clause)"
shows "transforms tmL8 (tpsL t) ttt (tpsL8 t)"
unfolding tmL8_def by (tform tps: assms tps0 tpsL8_def tpsL7_def jk)

lemma tmL8':
assumes "ttt = 70 + 6 * nllength (clause_n clause) + 67 * (nllength vars)⇧2"
and "t < length (clause_n clause)"
shows "transforms tmL8 (tpsL t) ttt (tpsL8 t)"
proof -
let ?l = "literal_n (clause ! t)"
let ?ll = "clause_n clause ! t"
let ?t = "66 + 4 * nlength ?ll + 67 * (nllength vars)⇧2 +
2 * nlength (?l div 2) + 2 * nlength (?l mod 2) + 2 * nlength (if ?l div 2 ∈ set vars then 1 else 0)"
have "?t = 66 + 4 * nlength ?ll + 67 * (nllength vars)⇧2 +
2 * nlength (?ll div 2) + 2 * nlength (?ll mod 2) + 2 * nlength (if ?ll div 2 ∈ set vars then 1 else 0)"
using assms(2) clause_n_def by simp
also have "... ≤ 66 + 4 * nlength ?ll + 67 * (nllength vars)⇧2 +
2 * nlength ?ll + 2 * nlength (?ll mod 2) + 2 * nlength (if ?ll div 2 ∈ set vars then 1 else 0)"
using nlength_mono[of "?ll div 2" "?ll"] by simp
also have "... = 66 + 6 * nlength ?ll + 67 * (nllength vars)⇧2 +
2 * nlength (?ll mod 2) + 2 * nlength (if ?ll div 2 ∈ set vars then 1 else 0)"
by simp
also have "... ≤ 66 + 6 * nlength ?ll + 67 * (nllength vars)⇧2 +
2 * nlength 1 + 2 * nlength (if ?ll div 2 ∈ set vars then 1 else 0)"
using nlength_mono by simp
also have "... ≤ 66 + 6 * nlength ?ll + 67 * (nllength vars)⇧2 + 2 * nlength 1 + 2 * nlength 1"
using nlength_mono by simp
also have "... = 70 + 6 * nlength ?ll + 67 * (nllength vars)⇧2"
using nlength_1_simp by simp
also have "... ≤ 70 + 6 * nllength (clause_n clause) + 67 * (nllength vars)⇧2"
using assms(2) member_le_nllength by simp
finally have "?t ≤ ttt"
using assms(1) by simp
then show ?thesis
using assms tmL8 transforms_monotone by blast
qed

definition tpsL8' :: "nat ⇒ tape list" where
"tpsL8' t ≡ tps0
[j2 := nltape' (clause_n clause) (Suc t),
j3 := (⌊sat_take