Theory Satisfiability

chapter ‹Satisfiability\label{s:Sat}›

theory Satisfiability
  imports Wellformed NP
begin

text ‹
This chapter introduces the language \SAT{} and shows that it is in $\NP$, which
constitutes the easier part of the Cook-Levin theorem. The other part, the
$\NP$-hardness of \SAT{}, is what all the following chapters are concerned with.

We first introduce Boolean formulas in conjunctive normal form and the concept
of satisfiability. Then we define a way to represent such formulas as bit
strings, leading to the definition of the language \SAT{} as a set of strings
(Section~\ref{s:sat-sat}).

For the proof that \SAT{} is in $\NP$, we construct a Turing machine that, given
a CNF formula and a string representing a variable assignment, outputs
\textbf{1} iff. the assignment satisfies the formula. The TM will run in
polynomial time, and there are always assignments polynomial (in fact, linear)
in the length of the formula (Section~\ref{s:Sat-np}).
›


section ‹The language \SAT{}\label{s:sat-sat}›

text ‹
\SAT{} is the language of all strings representing satisfiable Boolean
formulas in conjunctive normal form (CNF).  This section introduces a minimal
version of Boolean formulas in conjunctive normal form, including the
concepts of assignments and satisfiability.
›


subsection ‹CNF formulas and satisfiability\label{s:CNF}›

text ‹
Arora and Barak~\cite[p.~44]{ccama} define Boolean formulas in general as
expressions over $\land$, $\lor$, $\lnot$, parentheses, and variables $v_1, v_2,
\dots$ in the usual way. Boolean formulas in conjunctive normal form are defined as
$\bigwedge_i\left(\bigvee_j v_{i_j}\right)$, where the $v_{i_j}$ are literals.
This definition does not seem to allow for empty clauses. Also whether the
``empty CNF formula'' exists is somewhat doubtful.  Nevertheless, our
formalization allows for both empty clauses and the empty CNF formula, because
this enables us to represent CNF formulas as lists of clauses and clauses as
lists of literals without having to somehow forbid the empty list. This seems to
be a popular approach for formalizing CNF formulas in the context of \SAT{} and
$\NP$~\cite{Gher2021MechanisingCT,Multiset_Ordering_NPC-AFP}.

We identify a variable $v_i$ with its index $i$, which can be any natural
number. A \emph{literal} can either be positive or negative, representing a
variable or negated variable, respectively.

\null
›

datatype literal = Neg nat | Pos nat

type_synonym clause = "literal list"

type_synonym formula = "clause list"

text ‹
An \emph{assignment} maps all variables, given by their index, to a Boolean:
›

type_synonym assignment = "nat  bool"

abbreviation satisfies_literal :: "assignment  literal  bool" where
  "satisfies_literal α x  case x of Neg n  ¬ α n | Pos n  α n"

definition satisfies_clause :: "assignment  clause  bool" where
  "satisfies_clause α c  xset c. satisfies_literal α x"

definition satisfies :: "assignment  formula  bool" (infix  60) where
  "α  φ  cset φ. satisfies_clause α c"

text ‹
As is customary, the empty clause is satisfied by no assignment, and the empty
CNF formula is satisfied by every assignment.
›

proposition "¬ satisfies_clause α []"
  by (simp add: satisfies_clause_def)

proposition "α  []"
  by (simp add: satisfies_def)

lemma satisfies_clause_take:
  assumes "i < length clause"
  shows "satisfies_clause α (take (Suc i) clause) 
    satisfies_clause α (take i clause)  satisfies_literal α (clause ! i)"
  using assms satisfies_clause_def by (auto simp add: take_Suc_conv_app_nth)

lemma satisfies_take:
  assumes "i < length φ"
  shows "α  take (Suc i) φ  α  take i φ  satisfies_clause α (φ ! i)"
  using assms satisfies_def by (auto simp add: take_Suc_conv_app_nth)

lemma satisfies_append:
  assumes "α  φ1 @ φ2"
  shows "α  φ1" and "α  φ2"
  using assms satisfies_def by simp_all

lemma satisfies_append':
  assumes "α  φ1" and "α  φ2"
  shows "α  φ1 @ φ2"
  using assms satisfies_def by auto

lemma satisfies_concat_map:
  assumes "α  concat (map f [0..<k])" and "i < k"
  shows "α  f i"
  using assms satisfies_def by simp

lemma satisfies_concat_map':
  assumes "i. i < k  α  f i"
  shows "α  concat (map f [0..<k])"
  using assms satisfies_def by simp

text ‹
The main ingredient for defining \SAT{} is the concept of \emph{satisfiable} CNF
formula:
›

definition satisfiable :: "formula  bool" where
  "satisfiable φ  α. α  φ"

text ‹
The set of all variables used in a CNF formula is finite.
›

definition variables :: "formula  nat set" where
  "variables φ  {n. cset φ. Neg n  set c  Pos n  set c}"

lemma finite_variables: "finite (variables φ)"
proof -
  define voc :: "clause  nat set" where
    "voc c = {n. Neg n  set c  Pos n  set c}" for c
  let ?vocs = "set (map voc φ)"

  have "finite (voc c)" for c
  proof (induction c)
    case Nil
    then show ?case
      using voc_def by simp
  next
    case (Cons a c)
    have "voc (a # c) = {n. Neg n  set (a # c)  Pos n  set (a # c)}"
      using voc_def by simp
    also have "... = {n. Neg n  set c  Neg n = a  Pos n  set c  Pos n = a}"
      by auto
    also have "... = {n. (Neg n  set c  Pos n  set c)  (Pos n = a  Neg n = a)}"
      by auto
    also have "... = {n. Neg n  set c  Pos n  set c}  {n. Pos n = a  Neg n = a}"
      by auto
    also have "... = voc c  {n. Pos n = a  Neg n = a}"
      using voc_def by simp
    finally have "voc (a # c) = voc c  {n. Pos n = a  Neg n = a}" .
    moreover have "finite {n. Pos n = a  Neg n = a}"
      using finite_nat_set_iff_bounded by auto
    ultimately show ?case
      using Cons by simp
  qed
  moreover have "variables φ = ?vocs"
    using variables_def voc_def by auto
  moreover have "finite ?vocs"
    by simp
  ultimately show ?thesis
    by simp
qed

lemma variables_append: "variables (φ1 @ φ2) = variables φ1  variables φ2"
  using variables_def by auto

text ‹
Arora and Barak~\cite[Claim~2.13]{ccama} define the \emph{size} of a CNF formula
as the numbr of $\wedge / \vee$ symbols. We use a slightly different definition,
namely the number of literals:
›

definition fsize :: "formula  nat" where
  "fsize φ  sum_list (map length φ)"


subsection ‹Predicates on assignments›

text ‹
Every CNF formula is satisfied by a set of assignments. Conversely, for certain
sets of assignments we can construct CNF formulas satisfied by exactly these
assignments.  This will be helpful later when we construct formulas for reducing
arbitrary languages to \SAT{} (see Section~\ref{s:Reducing}).
›


subsubsection ‹Universality of CNF formulas›

text ‹
A set (represented by a predicate) $F$ of assignments depends on the first
$\ell$ variables iff.\ any two assignments that agree on the first $\ell$
variables are either both in the set or both outside of the set.
›

definition depon :: "nat  (assignment  bool)  bool" where
  "depon l F  α1 α2. (i<l. α1 i = α2 i)  F α1 = F α2"

text ‹
Lists of all strings of the same length:
›

fun str_of_len :: "nat  string list" where
  "str_of_len 0 = [[]]" |
  "str_of_len (Suc l) = map ((#) 𝕆) (str_of_len l) @ map ((#) 𝕀) (str_of_len l)"

lemma length_str_of_len: "length (str_of_len l) = 2 ^ l"
  by (induction l) simp_all

lemma in_str_of_len_length: "xs  set (str_of_len l)  length xs = l"
  by (induction l arbitrary: xs) auto

lemma length_in_str_of_len: "length xs = l  xs  set (str_of_len l)"
proof (induction l arbitrary: xs)
  case 0
  then show ?case
    by simp
next
  case (Suc l)
  then obtain y ys where "xs = y # ys"
    by (meson length_Suc_conv)
  then have "length ys = l"
    using Suc by simp
  show ?case
  proof (cases y)
    case True
    then have "xs  set (map ((#) 𝕀) (str_of_len l))"
      using `length ys = l` Suc `xs = y # ys` by simp
    then show ?thesis
      by simp
  next
    case False
    then have "xs  set (map ((#) 𝕆) (str_of_len l))"
      using `length ys = l` Suc `xs = y # ys` by simp
    then show ?thesis
      by simp
  qed
qed

text ‹
A predicate $F$ depending on the first $\ell$ variables $v_0, \dots, v_{\ell-1}$
can be regarded as a truth table over $\ell$ variables. The next lemma shows
that for every such truth table there exists a CNF formula with at most $2^\ell$
clauses and $\ell\cdot2^\ell$ literals over the first $\ell$ variables. This is
the well-known fact that every Boolean function (over $\ell$ variables) can be
represented by a CNF formula~\cite[Claim~2.13]{ccama}.
›

lemma depon_ex_formula:
  assumes "depon l F"
  shows "φ.
    fsize φ  l * 2 ^ l 
    length φ  2 ^ l 
    variables φ  {..<l} 
    (α. F α = α  φ)"
proof -
  define cl where "cl = (λv. map (λi. if v ! i then Neg i else Pos i) [0..<l])"
  have cl1: "satisfies_clause a (cl v)" if "length v = l" and "v  map a [0..<l]" for v a
  proof -
    obtain i where i: "i < l" "a i  v ! i"
      using length v = l v  map a [0..<l]
      by (smt (verit, best) atLeastLessThan_iff map_eq_conv map_nth set_upt)
    then have *: "cl v ! i = (if v ! i then Neg i else Pos i)"
      using cl_def by simp
    then have "case (cl v ! i) of Neg n  ¬ a n | Pos n  a n"
      using i(2) by simp
    then show ?thesis
      using cl_def * that(1) satisfies_clause_def i(1) by fastforce
  qed
  have cl2: "v  map a [0..<l]" if "length v = l" and "satisfies_clause a (cl v)" for v a
  proof
    assume assm: "v = map a [0..<l]"
    from that(2) have "xset (cl v). case x of Neg n  ¬ a n | Pos n  a n"
      using satisfies_clause_def by simp
    then obtain i where i: "i < l" and "case (cl v ! i) of Neg n  ¬ a n | Pos n  a n"
      using cl_def by auto
    then have "v ! i  a i"
      using cl_def by fastforce
    then show False
      using i assm by simp
  qed

  have filter_length_nth: "f (vs ! j)" if "vs = filter f sol" and "j < length vs"
    for vs sol :: "'a list" and f j
    using that nth_mem by (metis length_removeAll_less less_irrefl removeAll_filter_not)

  have sum_list_map: "sum_list (map g xs)  k * length xs" if "x. x  set xs  g x = k"
    for xs :: "'a list" and g k
    using that
  proof (induction "length xs" arbitrary: xs)
    case 0
    then show ?case
      by simp
  next
    case (Suc x)
    then obtain y ys where "xs = y # ys"
      by (metis length_Suc_conv)
    then have "length ys = x"
      using Suc by simp
    have "y  set xs"
      using `xs = y # ys` by simp
    have "sum_list (map g xs) = sum_list (map g (y # ys))"
      using `xs = y # ys` by simp
    also have "... = g y + sum_list (map g ys)"
      by simp
    also have "... = k + sum_list (map g ys)"
      using Suc `y  set xs` by simp
    also have "...  k + k * length ys"
      using Suc `length ys = x` xs = y # ys by auto
    also have "... = k * length xs"
      by (metis Suc.hyps(2) length ys = x mult_Suc_right)
    finally show ?case
      by simp
  qed

  define vs where
    "vs = filter (λv. F (λi. if i < l then v ! i else False) = False) (str_of_len l)"
  define φ where "φ = map cl vs"

  have "a  φ" if "F a" for a
  proof -
    define v where "v = map a [0..<l]"
    then have "(λi. if i < l then v ! i else False) j = a j" if "j < l" for j
      by (simp add: that)
    then have *: "F (λi. if i < l then v ! i else False)"
      using assms(1) depon_def that by (smt (verit, ccfv_SIG))
    have "satisfies_clause a c" if "c  set φ" for c
    proof -
      obtain j where j: "c = φ ! j" "j < length φ"
        using φ_def `c  set φ` by (metis in_set_conv_nth)
      then have "c = cl (vs ! j)"
        using φ_def by simp
      have "j < length vs"
        using φ_def j by simp
      then have "F (λi. if i < l then (vs ! j) ! i else False) = False"
        using vs_def filter_length_nth by blast
      then have "vs ! j  v"
        using * by auto
      moreover have "length (vs ! j) = l"
        using vs_def length_str_of_len j < length vs
        by (smt (verit) filter_eq_nths in_str_of_len_length notin_set_nthsI nth_mem)
      ultimately have "satisfies_clause a (cl (vs ! j))"
        using v_def cl1 by simp
      then show ?thesis
        using `c = cl (vs ! j)` by simp
    qed
    then show ?thesis
      using satisfies_def by simp
  qed
  moreover have "F α" if "α  φ" for α
  proof (rule ccontr)
    assume assm: "¬ F α"
    define v where "v = map α [0..<l]"
    have *: "F (λi. if i < l then v ! i else False) = False"
    proof -
      have "(λi. if i < l then v ! i else False) j = α j" if "j < l" for j
        using v_def by (simp add: that)
      then show ?thesis
        using assm assms(1) depon_def by (smt (verit, best))
    qed
    have "length v = l"
      using v_def by simp
    then obtain j where "j < length (str_of_len l)" and "v = str_of_len l ! j"
      by (metis in_set_conv_nth length_in_str_of_len)
    then have "v  set vs"
      using vs_def * by fastforce
    then have "cl v  set φ"
      using φ_def by simp
    then have "satisfies_clause α (cl v)"
      using that satisfies_def by simp
    then have "v  map α [0..<l]"
      using `length v = l` cl2 by simp
    then show False
      using v_def by simp
  qed
  ultimately have "α. F α = α  φ"
    by auto
  moreover have "fsize φ  l * 2 ^ l"
  proof -
    have "length c = l" if "c  set φ" for c
      using that cl_def φ_def by auto
    then have "fsize φ  l * length φ"
      unfolding fsize_def using sum_list_map by auto
    also have "...  l * length (str_of_len l)"
      using φ_def vs_def by simp
    also have "... = l * 2 ^ l"
      using length_str_of_len by simp
    finally show ?thesis .
  qed
  moreover have "length φ  2 ^ l"
  proof -
    have "length φ  length (str_of_len l)"
      using φ_def vs_def by simp
    also have "... = 2 ^ l"
      using length_str_of_len by simp
    finally show ?thesis .
  qed
  moreover have "variables φ  {..<l}"
  proof
    fix x assume "x  variables φ"
    then obtain c where c: "c  set φ" "Neg x  set c  Pos x  set c"
      using variables_def by auto
    then obtain v where v: "v  set (str_of_len l)" "c = cl v"
      using φ_def vs_def by auto
    then show "x  {..<l}"
      using cl_def c by auto
  qed
  ultimately show ?thesis
    by auto
qed


subsubsection ‹Substitutions of variables›

text ‹
We will sometimes consider CNF formulas over the first $\ell$ variables and
derive other CNF formulas from them by substituting these variables.  Such a
substitution will be represented by a list $\sigma$ of length at least $\ell$,
meaning that the variable $v_i$ is replaced by $v_{\sigma(i)}$. We will call
this operation on formulas \emph{relabel}, and the corresponding one on literals
\emph{rename}:
›

fun rename :: "nat list  literal  literal" where
  "rename σ (Neg i) = Neg (σ ! i)" |
  "rename σ (Pos i) = Pos (σ ! i)"

definition relabel :: "nat list  formula  formula" where
  "relabel σ φ  map (map (rename σ)) φ"

lemma fsize_relabel: "fsize (relabel σ φ) = fsize φ"
  using relabel_def fsize_def by (metis length_concat length_map map_concat)

text ‹
A substitution $\sigma$ can also be applied to an assignment and to a list of
variable indices:
›

definition remap :: "nat list  assignment  assignment" where
  "remap σ α i  if i < length σ then α (σ ! i) else α i"

definition reseq :: "nat list  nat list  nat list" where
  "reseq σ vs  map ((!) σ) vs"

lemma length_reseq [simp]: "length (reseq σ vs) = length vs"
  using reseq_def by simp

text ‹
Relabeling a formula and remapping an assignment are equivalent in a sense.
›

lemma satisfies_sigma:
  assumes "variables φ  {..<length σ}"
  shows "α  relabel σ φ  remap σ α  φ"
proof
  assume sat: "α  relabel σ φ"
  have "satisfies_clause (remap σ α) c" if "c  set φ" for c
  proof -
    obtain i where "i < length φ" "φ ! i = c"
      by (meson c  set φ in_set_conv_nth)
    then have "satisfies_clause α (map (rename σ) c)"
        (is "satisfies_clause α ?c")
      using sat satisfies_def relabel_def by auto
    then obtain x where "xset ?c" "case x of Neg n  ¬ α n | Pos n  α n"
      using satisfies_clause_def by auto
    then obtain j where j: "j < length ?c" "case (?c ! j) of Neg n  ¬ α n | Pos n  α n"
      by (metis in_set_conv_nth)
    have "case c ! j of Neg n  ¬ (remap σ α) n | Pos n  (remap σ α) n"
    proof (cases "c ! j")
      case (Neg n)
      then have 1: "?c ! j = Neg (σ ! n)"
        using j(1) by simp
      have "n  variables φ"
        using Neg j(1) nth_mem that variables_def by force
      then have "n < length σ"
        using assms by auto
      then show ?thesis
        using Neg 1 j(2) remap_def by auto
    next
      case (Pos n)
      then have 1: "?c ! j = Pos (σ ! n)"
        using j(1) by simp
      have "n  variables φ"
        using Pos j(1) nth_mem that variables_def by force
      then have "n < length σ"
        using assms by auto
      then show ?thesis
        using Pos 1 j(2) remap_def by auto
    qed
    then show ?thesis
      using satisfies_clause_def j by auto
  qed
  then show "remap σ α  φ"
    using satisfies_def by simp
next
  assume sat: "remap σ α  φ"
  have "satisfies_clause α c" if "c  set (relabel σ φ)" for c
  proof -
    let ?phi = "relabel σ φ"
    let ?beta = "remap σ α"
    obtain i where i: "i < length ?phi" "?phi ! i = c"
      by (meson c  set ?phi in_set_conv_nth)
    then have "satisfies_clause ?beta (φ ! i)"
        (is "satisfies_clause _ ?c")
      using sat satisfies_def relabel_def by simp
    then obtain x where "xset ?c" "case x of Neg n  ¬ ?beta n | Pos n  ?beta n"
      using satisfies_clause_def by auto
    then obtain j where j: "j < length ?c" "case (?c ! j) of Neg n  ¬ ?beta n | Pos n  ?beta n"
      by (metis in_set_conv_nth)
    then have ren: "c ! j = rename σ (?c ! j)"
      using i relabel_def by auto
    have "case c ! j of Neg n  ¬ α n | Pos n  α n"
    proof (cases "?c ! j")
      case (Neg n)
      then have *: "c ! j = Neg (σ ! n)"
        by (simp add: ren)
      have "n  variables φ"
        using Neg i j variables_def that length_map mem_Collect_eq nth_mem relabel_def by force
      then have "n < length σ"
        using assms by auto
      moreover have "¬ (remap σ α) n"
        using j(2) Neg by simp
      ultimately have "¬ α (σ ! n)"
        using remap_def by simp
      then show ?thesis
        by (simp add: *)
    next
      case (Pos n)
      then have *: "c ! j = Pos (σ ! n)"
        by (simp add: ren)
      have "n  variables φ"
        using Pos i j variables_def that length_map mem_Collect_eq nth_mem relabel_def by force
      then have "n < length σ"
        using assms by auto
      moreover have "(remap σ α) n"
        using j(2) Pos by simp
      ultimately have "α (σ ! n)"
        using remap_def by simp
      then show ?thesis
        by (simp add: *)
    qed
    moreover have "length c = length (φ ! i)"
      using relabel_def i by auto
    ultimately show ?thesis
      using satisfies_clause_def j by auto
  qed
  then show "α  relabel σ φ"
    by (simp add: satisfies_def)
qed


subsection ‹Representing CNF formulas as strings\label{s:sat-sat-repr}›

text ‹
By identifying negated literals with even numbers and positive literals with odd
numbers, we can identify literals with natural numbers. This yields a
straightforward representation of a clause as a list of numbers and of a CNF
formula as a list of lists of numbers. Such a list can, in turn, be represented
as a symbol sequence over a quaternary alphabet as described in
Section~\ref{s:tm-numlistlist}, which ultimately can be encoded over a binary
alphabet (see Section~\ref{s:tm-quaternary}). This is essentially how we
represent CNF formulas as strings.

We have to introduce a bunch of functions for mapping between these
representations.

\null
›

fun literal_n :: "literal  nat" where
  "literal_n (Neg i) = 2 * i" |
  "literal_n (Pos i) = Suc (2 * i)"

definition n_literal :: "nat  literal" where
  "n_literal n  if even n then Neg (n div 2) else Pos (n div 2)"

lemma n_literal_n: "n_literal (literal_n x) = x"
  using n_literal_def by (cases x) simp_all

lemma literal_n_literal: "literal_n (n_literal n) = n"
  using n_literal_def by simp

definition clause_n :: "clause  nat list" where
  "clause_n cl  map literal_n cl"

definition n_clause :: "nat list  clause" where
  "n_clause ns  map n_literal ns"

lemma n_clause_n: "n_clause (clause_n cl) = cl"
  using n_clause_def clause_n_def n_literal_n by (simp add: map_idI)

lemma clause_n_clause: "clause_n (n_clause n) = n"
  using n_clause_def clause_n_def literal_n_literal by (simp add: map_idI)

definition formula_n :: "formula  nat list list" where
  "formula_n φ  map clause_n φ"

definition n_formula :: "nat list list  formula" where
  "n_formula nss  map n_clause nss"

lemma n_formula_n: "n_formula (formula_n φ) = φ"
  using n_formula_def formula_n_def n_clause_n by (simp add: map_idI)

lemma formula_n_formula: "formula_n (n_formula nss) = nss"
  using n_formula_def formula_n_def clause_n_clause by (simp add: map_idI)

definition formula_zs :: "formula  symbol list" where
  "formula_zs φ  numlistlist (formula_n φ)"

text ‹
The mapping between formulas and well-formed symbol sequences for
lists of lists of numbers is bijective.
›

lemma formula_n_inj: "formula_n φ1 = formula_n φ2  φ1 = φ2"
  using n_formula_n by metis

definition zs_formula :: "symbol list  formula" where
  "zs_formula zs  THE φ. formula_zs φ = zs"

lemma zs_formula:
  assumes "numlistlist_wf zs"
  shows "∃!φ. formula_zs φ = zs"
proof -
  obtain nss where nss: "numlistlist nss = zs"
    using assms numlistlist_wf_def by auto
  let ?phi = "n_formula nss"
  have *: "formula_n ?phi = nss"
    using nss formula_n_formula by simp
  then have "formula_zs ?phi = zs"
    using nss formula_zs_def by simp
  then have "φ. formula_zs φ = zs"
    by auto
  moreover have "φ = ?phi" if "formula_zs φ = zs" for φ
  proof -
    have "numlistlist (formula_n φ) = zs"
      using that formula_zs_def by simp
    then have "nss = formula_n φ"
      using nss numlistlist_inj by simp
    then show ?thesis
      using formula_n_inj * by simp
  qed
  ultimately show ?thesis
    by auto
qed

lemma zs_formula_zs: "zs_formula (formula_zs φ) = φ"
  by (simp add: formula_n_inj formula_zs_def numlistlist_inj the_equality zs_formula_def)

lemma formula_zs_formula:
  assumes "numlistlist_wf zs"
  shows "formula_zs (zs_formula zs) = zs"
  using assms zs_formula zs_formula_zs by metis

text ‹
There will of course be Turing machines that perform computations on formulas.
In order to bound their running time, we need bounds for the length of the
symbol representation of formulas.
›

lemma nlength_literal_n_Pos: "nlength (literal_n (Pos n))  Suc (nlength n)"
  using nlength_times2plus1 by simp

lemma nlength_literal_n_Neg: "nlength (literal_n (Neg n))  Suc (nlength n)"
  using nlength_times2 by simp

lemma nlllength_formula_n:
  fixes V :: nat and φ :: formula
  assumes "v. v  variables φ  v  V"
  shows "nlllength (formula_n φ)  fsize φ * Suc (Suc (nlength V)) + length φ"
  using assms
proof (induction φ)
  case Nil
  then show ?case
    using formula_n_def by simp
next
  case (Cons cl φ)
  then have 0: "v. v  variables φ  v  V"
    using variables_def by simp
  have 1: "n  V" if "Pos n  set cl" for n
    using that variables_def by (simp add: Cons.prems)
  have 2: "n  V" if "Neg n  set cl" for n
    using that variables_def by (simp add: Cons.prems)
  have 3: "nlength (literal_n v)  Suc (nlength V)" if "v  set cl" for v
  proof (cases v)
    case (Neg n)
    then have "nlength (literal_n v)  Suc (nlength n)"
      using nlength_literal_n_Neg by blast
    moreover have "n  V"
      using Neg that 2 by simp
    ultimately show ?thesis
      using nlength_mono by fastforce
  next
    case (Pos n)
    then have "nlength (literal_n v)  Suc (nlength n)"
      using nlength_literal_n_Pos by blast
    moreover have "n  V"
      using Pos that 1 by simp
    ultimately show ?thesis
      using nlength_mono by fastforce
  qed

  have "nllength (clause_n cl) = length (numlist (map literal_n cl))"
    using clause_n_def nllength_def by simp
  have "nllength (clause_n cl) = (n(map literal_n cl). Suc (nlength n))"
    using clause_n_def nllength by simp
  also have "... = (vcl. Suc (nlength (literal_n v)))"
  proof -
    have "map (λn. Suc (nlength n)) (map literal_n cl) = map (λv. Suc (nlength (literal_n v))) cl"
      by simp
    then show ?thesis
      by metis
  qed
  also have "...  (vcl. Suc (Suc (nlength V)))"
    using sum_list_mono[of cl "λv. Suc (nlength (literal_n v))" "λv. Suc (Suc (nlength V))"] 3
    by simp
  also have "... = Suc (Suc (nlength V)) * length cl"
    using sum_list_const by blast
  finally have 4: "nllength (clause_n cl)  Suc (Suc (nlength V)) * length cl" .

  have "concat (map (λns. numlist ns @ [5]) (map clause_n (cl # φ))) =
      (numlist (clause_n cl) @ [5]) @ concat (map (λns. numlist ns @ [5]) (map clause_n φ))"
    by simp
  then have "length (concat (map (λns. numlist ns @ [5]) (map clause_n (cl # φ)))) =
      length ((numlist (clause_n cl) @ [5]) @ concat (map (λns. numlist ns @ [5]) (map clause_n φ)))"
    by simp
  then have "nlllength (formula_n (cl # φ)) =
      length ((numlist (clause_n cl) @ [5]) @ concat (map (λns. numlist ns @ [5]) (map clause_n φ)))"
    using formula_n_def numlistlist_def nlllength_def by simp
  also have "... = length (numlist (clause_n cl) @ [5]) + length (concat (map (λns. numlist ns @ [5]) (map clause_n φ)))"
      by simp
  also have "... = length (numlist (clause_n cl) @ [5]) + nlllength (formula_n φ)"
    using formula_n_def numlistlist_def nlllength_def by simp
  also have "... = Suc (nllength (clause_n cl)) + nlllength (formula_n φ)"
    using nllength_def by simp
  also have "...  Suc (Suc (Suc (nlength V)) * length cl) + nlllength (formula_n φ)"
    using 4 by simp
  also have "...  Suc (Suc (Suc (nlength V)) * length cl) + fsize φ * Suc (Suc (nlength V)) + length φ"
    using Cons 0 by simp
  also have "... = fsize (cl # φ) * Suc (Suc (nlength V)) + length (cl # φ)"
    by (simp add: add_mult_distrib2 mult.commute fsize_def)
  finally show ?case
    by simp
qed

text ‹
Since \SAT{} is supposed to be a set of strings rather than symbol
sequences, we need to map symbol sequences representing formulas to strings:
›

abbreviation formula_to_string :: "formula  string" where
  "formula_to_string φ  symbols_to_string (binencode (numlistlist (formula_n φ)))"

lemma formula_to_string_inj:
  assumes "formula_to_string φ1 = formula_to_string φ2"
  shows "φ1 = φ2"
proof -
  let ?xs1 = "binencode (numlistlist (formula_n φ1))"
  let ?xs2 = "binencode (numlistlist (formula_n φ2))"
  have bin1: "binencodable (numlistlist (formula_n φ1))"
    by (simp add: Suc_le_eq numeral_2_eq_2 proper_symbols_numlistlist symbols_lt_numlistlist)
  then have "bit_symbols ?xs1"
    using bit_symbols_binencode by simp
  then have 1: "string_to_symbols (symbols_to_string ?xs1) = ?xs1"
    using bit_symbols_to_symbols by simp

  have bin2: "binencodable (numlistlist (formula_n φ2))"
    by (simp add: Suc_le_eq numeral_2_eq_2 proper_symbols_numlistlist symbols_lt_numlistlist)
  then have "bit_symbols ?xs2"
    using bit_symbols_binencode by simp
  then have "string_to_symbols (symbols_to_string ?xs2) = ?xs2"
    using bit_symbols_to_symbols by simp
  then have "?xs1 = ?xs2"
    using 1 assms by simp
  then have "numlistlist (formula_n φ1) = numlistlist (formula_n φ2)"
    using binencode_inj bin1 bin2 by simp
  then have "formula_n φ1 = formula_n φ2"
    using numlistlist_inj by simp
  then show ?thesis
    using formula_n_inj by simp
qed

text ‹
While @{const formula_to_string} maps every CNF formula to a string, not every
string represents a CNF formula. We could just ignore such invalid strings and
define \SAT{} to only contain well-formed strings.  But this would implicitly
place these invalid strings in the complement of \SAT{}. While this does not
cause us any issues here, it would if we were to introduce co-$\NP$ and wanted
to show that $\overline{\mathtt{SAT}}$ is in co-$\NP$, as we would then have to
deal with the invalid strings. So it feels a little like cheating to ignore the
invalid strings like this.

Arora and Barak~\cite[p.~45 footnote~3]{ccama} recommend mapping invalid strings
to ``some fixed formula''.  A natural candidate for this fixed formula is the
empty CNF, since an invalid string in a sense represents nothing, and the empty
CNF formula is represented by the empty string. Since the empty CNF formula is
satisfiable this implies that all invalid strings become elements of \SAT{}.

We end up with the following definition of the protagonist of this article:
›

definition SAT :: language where
  "SAT  {formula_to_string φ | φ. satisfiable φ}  {x. ¬ (φ. x = formula_to_string φ)}"


section ‹\SAT{} is in $\NP$\label{s:Sat-np}›

text ‹
In order to show that \SAT{} is in $\NP$, we will construct a polynomial-time
Turing machine $M$ and specify a polynomial function $p$ such that for every
$x$, $x\in \SAT$ iff. there is a $u\in\bbOI^{p(|x|)}$ such that $M$ outputs
\textbf{1} on $\langle x; u\rangle$.

The idea is straightforward: Let $\phi$ be the formula represented by the
string $x$.  Interpret the string $u$ as a list of variables and interpret this
as the assignment that assigns True to only the variables in the list. Then
check if the assignment satisfies the formula. This check is ``obviously''
possible in polynomial time because $M$ simply has to iterate over all clauses
and check if at least one literal per clause is true under the assignment.
Checking if a literal is true is simply a matter of checking whether the
literal's variable is in the list $u$. If the assignment satisfies $\phi$,
output \textbf{1}, otherwise the empty symbol sequence.

If $\phi$ is unsatisfiable then no assignment, hence no $u$ no matter the length
will make $M$ output \textbf{1}. On the other hand, if $\phi$ is satisfiable
then there will be a satisfying assignment where a subset of the variables in
$\phi$ are assigned True. Hence there will be a list of variables of at most
roughly the length of $\phi$. So setting the polynomial $p$ to something like
$p(n) = n$ should suffice.

In fact, as we shall see, $p(n) = n$ suffices. This is so because in our
representation, the string $x$, being a list of lists, has slightly more
overhead per number than the plain list $u$ has. Therefore listing all variables
in $\phi$ is guaranteed to need fewer symbols than $x$ has.

There are several technical details to work out. First of all, the input to $M$
need not be a well-formed pair. And if it is, the pair $\langle x, u\rangle$ has
to be decoded into separate components $x$ and $u$. These have to be decoded
again from the binary to the quaternary alphabet, which is only possible if both
$x$ and $u$ comprise only bit symbols (\textbf{01}). Then $M$ needs to check if
the decoded $x$ and $u$ are valid symbol sequences for lists (of lists) of
numbers. In the case of $u$ this is particularly finicky because the definition
of $\NP$ requires us to provide a string $u$ of exactly the length $p(|x|)$ and
so we have to pad $u$ with extra symbols, which have to be stripped again before
the validation can take place.

In the first subsection we describe what the verifier TM has to do in terms of
symbol sequences. In the subsections after that we devise a Turing machine that
implements this behavior.
›


subsection ‹Verifying \SAT{} instances›

text ‹
Our verifier Turing machine for \SAT{} will implement the following function;
that is, on input @{term zs} it will output @{term "verify_sat zs"}.  It
performs a number of decodings and well-formedness checks and outputs either
\textbf{1} or the empty symbol sequence.
›

definition verify_sat :: "symbol list  symbol list" where
  "verify_sat zs 
    let
      ys = bindecode zs;
      xs = bindecode (first ys);
      vs = rstrip  (bindecode (second ys))
    in
      if even (length (first ys))  bit_symbols (first ys)  numlistlist_wf xs
      then if bit_symbols (second ys)  numlist_wf vs
           then if (λv. v  set (zs_numlist vs))  zs_formula xs then [𝟭] else []
           else []
      else [𝟭]"

text ‹
Next we show that @{const verify_sat} behaves as is required of a verifier TM
for \SAT. Its polynomial running time is the subject of the next subsection.
›

text ‹
First we consider the case that @{term zs} encodes a pair $\langle x, u\rangle$
of strings where $x$ does not represent a CNF formula. Such an $x$ is in \SAT{},
hence the verifier TM is supposed to output \textbf{1}.
›

lemma ex_phi_x:
  assumes "xs = string_to_symbols x"
  assumes "even (length xs)" and "numlistlist_wf (bindecode xs)"
  shows "φ. x = formula_to_string φ"
proof -
  obtain nss where "numlistlist nss = bindecode xs"
    using assms(3) numlistlist_wf_def by auto
  moreover have "binencode (bindecode xs) = xs"
    using assms(1,2) binencode_decode by simp
  ultimately have "binencode (numlistlist nss) = xs"
    by simp
  then have "binencode (numlistlist (formula_n (n_formula nss))) = xs"
    using formula_n_formula by simp
  then have "formula_to_string (n_formula nss) = symbols_to_string xs"
    by simp
  then show ?thesis
    using assms(1) symbols_to_string_to_symbols by auto
qed

lemma verify_sat_not_wf_phi:
  assumes "zs = x; u" and "¬ (φ. x = formula_to_string φ)"
  shows "verify_sat zs = [𝟭]"
proof -
  define ys where "ys = bindecode zs"
  then have first_ys: "first ys = string_to_symbols x"
    using first_pair assms(1) by simp
  then have 2: "bit_symbols (first ys)"
    using assms(1) bit_symbols_first ys_def by presburger

  define xs where "xs = bindecode (first ys)"
  then have "¬ (even (length (first ys))  bit_symbols (first ys)  numlistlist_wf xs)"
    using first_ys ex_phi_x assms(2) by auto
  then show "verify_sat zs = [𝟭]"
    unfolding verify_sat_def Let_def using ys_def xs_def by simp
qed

text ‹
The next case is that @{term zs} represents a pair $\langle x, u\rangle$ where
$x$ represents an unsatisfiable CNF formula. This $x$ is thus not in \SAT{} and
the verifier TM must output something different from \textbf{1}, such as the
empty string, regardless of $u$.
›

lemma verify_sat_not_sat:
  fixes φ :: formula
  assumes "zs = formula_to_string φ; u" and "¬ satisfiable φ"
  shows "verify_sat zs = []"
proof -
  have bs_phi: "bit_symbols (binencode (formula_zs φ))"
    using bit_symbols_binencode formula_zs_def proper_symbols_numlistlist symbols_lt_numlistlist
    by (metis Suc_le_eq dual_order.refl numeral_2_eq_2)

  define ys where "ys = bindecode zs"
  then have "first ys = string_to_symbols (formula_to_string φ)"
    using first_pair assms(1) by simp
  then have first_ys: "first ys = binencode (formula_zs φ)"
    using bit_symbols_to_symbols bs_phi formula_zs_def by simp
  then have 2: "bit_symbols (first ys)"
    using assms(1) bit_symbols_first ys_def by presburger
  have 22: "even (length (first ys))"
    using first_ys by simp

  define xs where "xs = bindecode (first ys)"
  define vs where "vs = rstrip 5 (bindecode (second ys))"

  have wf_xs: "numlistlist_wf xs"
    using xs_def first_ys bindecode_encode formula_zs_def numlistlist_wf_def numlistlist_wf_has2'
    by (metis le_simps(3) numerals(2))

  have φ: "zs_formula xs = φ"
    using xs_def first_ys "2" binencode_decode formula_to_string_inj formula_zs_def formula_zs_formula wf_xs
    by auto

  have "verify_sat zs =
     (if bit_symbols (second ys)  numlist_wf vs
      then if (λv. v  set (zs_numlist vs))  φ then [3] else []
      else [])"
    unfolding verify_sat_def Let_def using ys_def xs_def vs_def 2 22 wf_xs φ by simp
  then show "verify_sat zs = []"
    using assms(2) satisfiable_def by simp
qed

text ‹
Next we consider the case in which @{term zs} represents a pair $\langle x,
u\rangle$ where $x$ represents a satisfiable CNF formula and $u$ a list of
numbers padded at the right with @{text } symbols. This $u$ thus represents a
variable assignment, namely the one assigning True to exactly the variables in
the list. The verifier TM is to output \textbf{1} iff.\ this assignment
satisfies the CNF formula represented by $x$.

First we show that stripping away @{text } symbols does not damage a symbol
sequence representing a list of numbers.
›

lemma rstrip_numlist_append: "rstrip  (numlist vars @ replicate pad ) = numlist vars"
  (is "rstrip  ?zs = ?ys")
proof -
  have "(LEAST i. i  length ?zs  set (drop i ?zs)  {}) = length ?ys"
  proof (rule Least_equality)
    show "length ?ys  length ?zs  set (drop (length ?ys) ?zs)  {}"
      by auto
    show "length ?ys  m" if "m  length ?zs  set (drop m ?zs)  {}" for m
    proof (rule ccontr)
      assume "¬ length ?ys  m"
      then have "m < length ?ys"
        by simp
      then have "?ys ! m  set (drop m ?ys)"
        by (metis Cons_nth_drop_Suc list.set_intros(1))
      moreover have "set (drop m ?ys)  {}"
        using that by simp
      ultimately have "?ys ! m = "
        by auto
      moreover have "?ys ! m < "
        using symbols_lt_numlist `m < length ?ys` by simp
      ultimately show False
        by simp
    qed
  qed
  then show ?thesis
    using rstrip_def by simp
qed

lemma verify_sat_wf:
  fixes φ :: formula and pad :: nat
  assumes "zs = formula_to_string φ; symbols_to_string (binencode (numlist vars @ replicate pad ))"
  shows "verify_sat zs = (if (λv. v  set vars)  φ then [𝟭] else [])"
proof -
  have bs_phi: "bit_symbols (binencode (formula_zs φ))"
    using bit_symbols_binencode formula_zs_def proper_symbols_numlistlist symbols_lt_numlistlist
    by (metis Suc_le_eq dual_order.refl numeral_2_eq_2)

  have "binencodable (numlist vars @ replicate pad )"
    using proper_symbols_numlist symbols_lt_numlist binencodable_append[of "numlist vars" "replicate pad "]
    by fastforce
  then have bs_vars: "bit_symbols (binencode (numlist vars @ replicate pad ))"
    using bit_symbols_binencode by simp

  define ys where "ys = bindecode zs"
  then have "first ys = string_to_symbols (formula_to_string φ)"
    using first_pair assms(1) by simp
  then have first_ys: "first ys = binencode (formula_zs φ)"
    using bit_symbols_to_symbols bs_phi formula_zs_def by simp
  then have bs_first: "bit_symbols (first ys)"
    using assms(1) bit_symbols_first ys_def by presburger

  have even: "even (length (first ys))"
    using first_ys by simp

  have second_ys: "second ys = binencode (numlist vars @ replicate pad )"
    using bs_vars ys_def assms(1) bit_symbols_to_symbols second_pair by simp
  then have bs_second: "bit_symbols (second ys)"
    using bs_vars by simp

  define xs where "xs = bindecode (first ys)"
  define vs where "vs = rstrip  (bindecode (second ys))"

  then have "vs = rstrip  (numlist vars @ replicate pad )"
    using second_ys binencodable (numlist vars @ replicate pad ) bindecode_encode by simp
  then have vs: "vs = numlist vars"
    using rstrip_numlist_append by simp

  have wf_xs: "numlistlist_wf xs"
    using xs_def first_ys bindecode_encode formula_zs_def numlistlist_wf_def numlistlist_wf_has2'
    by (metis le_simps(3) numerals(2))

  have "verify_sat zs =
     (if even (length (first ys))  bit_symbols (first ys)  numlistlist_wf xs
      then if bit_symbols (second ys)  numlist_wf vs
           then if (λv. v  set (zs_numlist vs))  zs_formula xs then [𝟭] else []
           else []
      else [3])"
    unfolding verify_sat_def Let_def using bs_second ys_def xs_def vs_def by simp
  then have *: "verify_sat zs =
        (if bit_symbols (second ys)  numlist_wf vs
         then if (λv. v  set (zs_numlist vs))  zs_formula xs then [𝟭] else []
         else [])"
    unfolding verify_sat_def Let_def using even bs_first wf_xs by simp

  have "xs = formula_zs φ"
    using xs_def bindecode_encode formula_zs_def first_ys proper_symbols_numlistlist symbols_lt_numlistlist
    by (simp add: Suc_leI numerals(2))
  then have φ: "φ = zs_formula xs"
    by (simp add: zs_formula_zs)

  have vars: "vars = zs_numlist vs"
    using vs numlist_wf_def numlist_zs_numlist zs_numlist_ex1 by blast
  then have wf_vs: "numlist_wf vs"
    using numlist_wf_def vs by auto

  have "verify_sat zs = (if (λv. v  set (zs_numlist vs))  zs_formula xs then [𝟭] else [])"
    using * bs_second wf_xs wf_vs by simp
  then show ?thesis
    using φ vars by simp
qed

text ‹
Finally we show that for every string $x$ representing a satisfiable CNF formula
there is a list of numbers representing a satisfying assignment and represented
by a string of length at most $|x|$. That means there is always a string of
exactly the length of $x$ consisting of a satisfying assignment plus some
padding symbols.
›

lemma nllength_remove1:
  assumes "n  set ns"
  shows "nllength (n # remove1 n ns) = nllength ns"
  using assms nllength sum_list_map_remove1[of n ns "λn. Suc (nlength n)"] by simp

lemma nllength_distinct_le:
  assumes "distinct ns"
    and "set ns  set ms"
  shows "nllength ns  nllength ms"
  using assms
proof (induction ms arbitrary: ns)
  case Nil
  then show ?case
    by simp
next
  case (Cons m ms)
  show ?case
  proof (cases "m  set ns")
    case True
    let ?ns = "remove1 m ns"
    have "set ?ns  set ms"
      using Cons by auto
    moreover have "distinct ?ns"
      using Cons by simp
    ultimately have *: "nllength ?ns  nllength ms"
      using Cons by simp

    have "nllength ns = nllength (m # ?ns)"
      using True nllength_remove1 by simp
    also have "... = Suc (nlength m) + nllength ?ns"
      using nllength by simp
    also have "...  Suc (nlength m) + nllength ms"
      using * by simp
    also have "... = nllength (m # ms)"
      using nllength by simp
    finally show ?thesis .
  next
    case False
    then have "set ns  set ms"
      using Cons by auto
    moreover have "distinct ns"
      using Cons by simp
    ultimately have "nllength ns  nllength ms"
      using Cons by simp
    then show ?thesis
      using nllength by simp
  qed
qed

lemma nlllength_nllength_concat: "nlllength nss = nllength (concat nss) + length nss"
  using nlllength nllength by (induction nss) simp_all

fun variable :: "literal  nat" where
  "variable (Neg i) = i" |
  "variable (Pos i) = i"

lemma sum_list_eq: "ns = ms  sum_list ns = sum_list ms"
  by simp

lemma nllength_clause_le: "nllength (clause_n cl)  nllength (map variable cl)"
proof -
  have "variable x  literal_n x" for x
    by (cases x) simp_all
  then have *: "Suc (nlength (variable x))  Suc (nlength (literal_n x))" for x
    using nlength_mono by simp

  let ?ns = "map literal_n cl"
  have "nllength (clause_n cl) = nllength ?ns"
    using clause_n_def by presburger
  also have "... = (n?ns. Suc (nlength n))"
    using nllength by simp
  also have "... = (xcl. Suc (nlength (literal_n x)))"
    by (smt (verit, del_insts) length_map nth_equalityI nth_map)
  also have "...  (xcl. Suc (nlength (variable x)))"
    using * by (simp add: sum_list_mono)
  finally have "(xcl. Suc (nlength (variable x)))  nllength (clause_n cl)"
    by simp
  moreover have "(xcl. Suc (nlength (variable x))) = nllength (map variable cl)"
  proof -
    have 1: "map (λx. Suc (nlength (variable x))) cl = map (λn. Suc (nlength n)) (map variable cl)"
      by simp
    then have "(xcl. Suc (nlength (variable x))) = (n(map variable cl). Suc (nlength n))"
      using sum_list_eq[OF 1] by simp
    then show ?thesis
      using nllength by simp
  qed
  ultimately show ?thesis
    by simp
qed

lemma nlllength_formula_ge: "nlllength (formula_n φ)  nlllength (map (map variable) φ)"
proof (induction φ)
  case Nil
  then show ?case
    by simp
next
  case (Cons cl φ)
  have "nlllength (map (map variable) (cl # φ)) =
      nlllength (map (map variable) [cl]) + nlllength (map (map variable) φ)"
    using nlllength by simp
  also have "... = Suc (nllength (map variable cl)) + nlllength (map (map variable) φ)"
    using nlllength by simp
  also have "...  Suc (nllength (map variable cl)) + nlllength (formula_n φ)"
    using Cons by simp
  also have "...  Suc (nllength (clause_n cl)) + nlllength (formula_n φ)"
    using nllength_clause_le by simp
  also have "... = nlllength (formula_n (cl # φ))"
    using nlllength by (simp add: formula_n_def)
  finally show ?case .
qed

lemma variables_shorter_formula:
  fixes φ :: formula and vars :: "nat list"
  assumes "set vars  variables φ" and "distinct vars"
  shows "nllength vars  nlllength (formula_n φ)"
proof -
  let ?nss = "map (map variable) φ"
  have "nllength (concat ?nss)  nlllength ?nss"
    using nlllength_nllength_concat by simp
  then have *: "nllength (concat ?nss)  nlllength (formula_n φ)"
    using nlllength_formula_ge by (meson le_trans)

  have "set vars  set (concat ?nss)"
  proof
    fix n :: nat
    assume "n  set vars"
    then have "n  variables φ"
      using assms(1) by auto
    then obtain c where c: "c  set φ" "Neg n  set c  Pos n  set c"
      using variables_def by auto
    then obtain x where x: "x  set c" "variable x = n"
      by auto
    then show "n  set (concat (map (map variable) φ))"
      using c by auto
  qed
  then have "nllength vars  nllength (concat ?nss)"
    using nllength_distinct_le assms(2) by simp
  then show ?thesis
    using * by simp
qed

lemma ex_assignment_linear_length:
  assumes "satisfiable φ"
  shows "vars. (λv. v  set vars)  φ  nllength vars  nlllength (formula_n φ)"
proof -
  obtain α where α: "α  φ"
    using assms satisfiable_def by auto
  define poss where "poss = {v. v  variables φ  α v}"
  then have "finite poss"
    using finite_variables by simp

  let ?beta = "λv. v  poss"
  have sat: "?beta  φ"
    unfolding satisfies_def
  proof
    fix c :: clause
    assume "c  set φ"
    then have "satisfies_clause α c"
      using satisfies_def α by simp
    then obtain x where x: "x  set c" "satisfies_literal α x"
      using satisfies_clause_def by auto
    show "satisfies_clause ?beta c"
    proof (cases x)
      case (Neg n)
      then have "¬ α n"
        using x(2) by simp
      then have "n  poss"
        using poss_def by simp
      then have "¬ ?beta n"
        by simp
      then have "satisfies_literal ?beta x"
        using Neg by simp
      then show ?thesis
        using satisfies_clause_def x(1) by auto
    next
      case (Pos n)
      then have "α n"
        using x(2) by simp
      then have "n  poss"
        using poss_def Pos c  set φ variables_def x(1) by auto
      then have "?beta n"
        by simp
      then have "satisfies_literal ?beta x"
        using Pos by simp
      then show ?thesis
        using satisfies_clause_def x(1) by auto
    qed
  qed

  obtain vars where vars: "set vars = poss" "distinct vars"
    using `finite poss` by (meson finite_distinct_list)
  moreover from this have "set vars  variables φ"
    using poss_def by simp
  ultimately have "nllength vars  nlllength (formula_n φ)"
    using variables_shorter_formula by simp
  moreover have "(λv. v  set vars)  φ"
    using vars(1) sat by simp
  ultimately show ?thesis
    by auto
qed

lemma ex_witness_linear_length:
  fixes φ :: formula
  assumes "satisfiable φ"
  shows "us.
    bit_symbols us 
    length us = length (formula_to_string φ) 
    verify_sat formula_to_string φ; symbols_to_string us = [𝟭]"
proof -
  obtain vars where vars:
    "(λv. v  set vars)  φ"
    "nllength vars  nlllength (formula_n φ)"
    using assms ex_assignment_linear_length by auto

  define pad where "pad = nlllength (formula_n φ) - nllength vars"
  then have "nllength vars + pad = nlllength (formula_n φ)"
    using vars(2) by simp
  moreover define us where "us = numlist vars @ replicate pad "
  ultimately have "length us = nlllength (formula_n φ)"
    by (simp add: nllength_def)
  then have "length (binencode us) = length (formula_to_string φ)" (is "length ?us = _")
    by (simp add: nlllength_def)
  moreover have "verify_sat formula_to_string φ; symbols_to_string ?us = [𝟭]"
    using us_def vars(1) assms verify_sat_wf by simp
  moreover have "bit_symbols ?us"
  proof -
    have "binencodable (numlist vars)"
      using proper_symbols_numlist symbols_lt_numlist
      by (metis Suc_leI lessI less_Suc_numeral numeral_2_eq_2 numeral_le_iff numeral_less_iff
        order_less_le_trans pred_numeral_simps(3) semiring_norm(74))
    moreover have "binencodable (replicate pad )"
      by simp
    ultimately have "binencodable us"
      using us_def binencodable_append by simp
    then show ?thesis
      using bit_symbols_binencode by simp
  qed
  ultimately show ?thesis
    by blast
qed

lemma bit_symbols_verify_sat: "bit_symbols (verify_sat zs)"
  unfolding verify_sat_def Let_def by simp


subsection ‹A Turing machine for verifying formulas›

text ‹
The core of the function @{const verify_sat} is the expression @{term " (λv. v 
set (zs_numlist vs))  zs_formula xs"}, which checks if an assignment
represented by a list of variable indices satisfies a CNF formula represented by
a list of lists of literals. In this section we devise a Turing machine
performing this check.

Recall that the numbers 0 and 1 are represented by the empty symbol sequence
and the symbol sequence \textbf{1}, respectively. The Turing machines in this
section are described in terms of numbers.

We start with a Turing machine that checks a clause. The TM accepts on tape
$j_1$ a list of numbers representing an assignment $\alpha$ and on tape $j_2$ a
list of numbers representing a clause. It outputs on tape $j_3$ the number 1 if
$\alpha$ satisfies the clause, and otherwise 0. To do this the TM iterates over
all literals in the clause and determines the underlying variable and the sign
of the literal. If the literal is positive and the variable is in the list
representing $\alpha$ or if the literal is negative and the variable is not in
the list, the number 1 is written to the tape $j_3$. Otherwise the tape remains
unchanged. We assume $j_3$ is initialized with 0, and so it will be 1 if and
only if at least one literal is satisfied by $\alpha$.

The TM requires five auxiliary tapes $j_3 + 1, \dots, j_3 + 5$. Tape $j_3 + 1$
stores the literals one at a time, and later the variable; tape $j_3 + 2$ stores
the sign of the literal; tape $j_3 + 3$ stores whether the variable is contained
in $\alpha$; tapes $j_3 + 4$ and $j_3 + 5$ are the auxiliary tapes of @{const
tm_contains}.
›

definition tm_sat_clause :: "tapeidx  tapeidx  tapeidx  machine" where
  "tm_sat_clause j1 j2 j3 
    WHILE [] ; λrs. rs ! j2   DO
      tm_nextract 4 j2 (j3 + 1) ;;
      tm_mod2 (j3 + 1) (j3 + 2) ;;
      tm_div2 (j3 + 1) ;;
      tm_contains j1 (j3 + 1) (j3 + 3) ;;
      IF λrs. rs ! (j3 + 3) =   rs ! (j3 + 2) =   rs ! (j3 + 3)    rs ! (j3 + 2)   THEN
        tm_setn j3 1
      ELSE
        []
      ENDIF ;;
      tm_setn (j3 + 1) 0 ;;
      tm_setn (j3 + 2) 0 ;;
      tm_setn (j3 + 3) 0
    DONE ;;
    tm_cr j2"

lemma tm_sat_clause_tm:
  assumes "k  2" and "G  5" and "j3 + 5 < k" "0 < j1" "j1 < k" "j2 < k" "j1 < j3"
  shows "turing_machine k G (tm_sat_clause j1 j2 j3)"
  using tm_sat_clause_def tm_mod2_tm tm_div2_tm tm_nextract_tm tm_setn_tm tm_contains_tm Nil_tm tm_cr_tm
    assms turing_machine_loop_turing_machine turing_machine_branch_turing_machine
  by simp

locale turing_machine_sat_clause =
  fixes j1 j2 j3 :: tapeidx
begin

definition "tmL1  tm_nextract 4 j2 (j3 + 1)"
definition "tmL2  tmL1 ;; tm_mod2 (j3 + 1) (j3 + 2)"
definition "tmL3  tmL2 ;; tm_div2 (j3 + 1)"
definition "tmL4  tmL3 ;; tm_contains j1 (j3 + 1) (j3 + 3)"
definition "tmI  IF λrs. rs ! (j3 + 3) =   rs ! (j3 + 2) =   rs ! (j3 + 3)    rs ! (j3 + 2)   THEN tm_setn j3 1 ELSE [] ENDIF"
definition "tmL5  tmL4 ;; tmI"
definition "tmL6  tmL5 ;; tm_setn (j3 + 1) 0"
definition "tmL7  tmL6 ;; tm_setn (j3 + 2) 0"
definition "tmL8  tmL7 ;; tm_setn (j3 + 3) 0"
definition "tmL  WHILE [] ; λrs. rs ! j2   DO tmL8 DONE"
definition "tm2  tmL ;; tm_cr j2"

lemma tm2_eq_tm_sat_clause: "tm2 = tm_sat_clause j1 j2 j3"
  unfolding tm2_def tmL_def tmL8_def tmL7_def tmL6_def tmL5_def tmL4_def tmL3_def tmI_def
    tmL2_def tmL1_def tm_sat_clause_def
  by simp

context
  fixes tps0 :: "tape list" and k :: nat and vars :: "nat list" and clause :: clause
  assumes jk: "0 < j1" "j1  j2" "j3 + 5 < k" "j1 < j3" "j2 < j3" "0 < j2" "length tps0 = k"
  assumes tps0:
    "tps0 ! j1 = nltape' vars 0"
    "tps0 ! j2 = nltape' (clause_n clause) 0"
    "tps0 ! j3 = (0N, 1)"
    "tps0 ! (j3 + 1) = (0N, 1)"
    "tps0 ! (j3 + 2) = (0N, 1)"
    "tps0 ! (j3 + 3) = (0N, 1)"
    "tps0 ! (j3 + 4) = (0N, 1)"
    "tps0 ! (j3 + 5) = (0N, 1)"
begin

abbreviation "sat_take t  satisfies_clause (λv. v  set vars) (take t clause)"

definition tpsL :: "nat  tape list" where
  "tpsL t  tps0
    [j2 := nltape' (clause_n clause) t,
     j3 := (sat_take tB, 1)]"

lemma tpsL0: "tpsL 0 = tps0"
proof -
  have "nltape' (clause_n clause) 0 = tps0 ! j2"
    using tps0(2) by presburger
  moreover have "sat_take 0B = 0N"
    using satisfies_clause_def by simp
  ultimately show ?thesis
    using tpsL_def tps0 jk by (metis list_update_id)
qed

definition tpsL1 :: "nat  tape list" where
  "tpsL1 t  tps0
    [j2 := nltape' (clause_n clause) (Suc t),
     j3 := (sat_take tB, 1),
     j3 + 1 := (literal_n (clause ! t)N, 1)]"

lemma tmL1 [transforms_intros]:
  assumes "ttt = 12 + 2 * nlength (clause_n clause ! t)" and "t < length (clause_n clause)"
  shows "transforms tmL1 (tpsL t) ttt (tpsL1 t)"
  unfolding tmL1_def
proof (tform tps: assms tps0 tpsL_def tpsL1_def jk)
  have len: "t < length clause"
    using assms(2) clause_n_def by simp
  show "ttt = 12 + 2 * nlength 0 + 2 * nlength (clause_n clause ! t)"
    using assms(1) by simp
  have *: "j2  j3"
    using jk by simp
  have **: "clause_n clause ! t = literal_n (clause ! t)"
    using len by (simp add: clause_n_def)
  show "tpsL1 t = (tpsL t)
    [j2 := nltape' (clause_n clause) (Suc t),
     j3 + 1 := (clause_n clause ! tN, 1)]"
    unfolding tpsL_def tpsL1_def using list_update_swap[OF *, of tps0] by (simp add: **)
qed

definition tpsL2 :: "nat  tape list" where
  "tpsL2 t  tps0
    [j2 := nltape' (clause_n clause) (Suc t),
     j3 := (sat_take tB, 1),
     j3 + 1 := (literal_n (clause ! t)N, 1),
     j3 + 2 := (literal_n (clause ! t) mod 2N, 1)]"

lemma tmL2 [transforms_intros]:
  assumes "ttt = 12 + 2 * nlength (clause_n clause ! t) + 1"
    and "t < length (clause_n clause)"
  shows "transforms tmL2 (tpsL t) ttt (tpsL2 t)"
  unfolding tmL2_def by (tform tps: assms tps0 tpsL2_def tpsL1_def jk)

definition tpsL3 :: "nat  tape list" where
  "tpsL3 t  tps0
    [j2 := nltape' (clause_n clause) (Suc t),
     j3 := (sat_take tB, 1),
     j3 + 1 := (literal_n (clause ! t) div 2N, 1),
     j3 + 2 := (literal_n (clause ! t) mod 2N, 1)]"

lemma tmL3 [transforms_intros]:
  assumes "ttt = 16 + 4 * nlength (clause_n clause ! t)"
    and "t < length (clause_n clause)"
  shows "transforms tmL3 (tpsL t) ttt (tpsL3 t)"
  unfolding tmL3_def
proof (tform tps: assms(2) tps0 tpsL3_def tpsL2_def jk)
  have len: "t < length clause"
    using assms(2) clause_n_def by simp
  have **: "clause_n clause ! t = literal_n (clause ! t)"
    using len by (simp add: clause_n_def)
  show "ttt = 12 + 2 * nlength (clause_n clause ! t) + 1 + (2 * nlength (literal_n (clause ! t)) + 3)"
    using assms(1) ** by simp
qed

definition tpsL4 :: "nat  tape list" where
  "tpsL4 t  tps0
    [j2 := nltape' (clause_n clause) (Suc t),
     j3 := (sat_take tB, 1),
     j3 + 1 := (literal_n (clause ! t) div 2N, 1),
     j3 + 2 := (literal_n (clause ! t) mod 2N, 1),
     j3 + 3 := (literal_n (clause ! t) div 2  set varsB, 1)]"

lemma tmL4 [transforms_intros]:
  assumes "ttt = 20 + 4 * nlength (clause_n clause ! t) + 67 * (nllength vars)2"
    and "t < length (clause_n clause)"
  shows "transforms tmL4 (tpsL t) ttt (tpsL4 t)"
  unfolding tmL4_def
proof (tform tps: assms(2) tps0 tpsL4_def tpsL3_def jk time: assms(1))
  have "tpsL3 t ! (j3 + 4) = (0N, 1)"
    using tpsL3_def tps0 jk by simp
  then show "tpsL3 t ! (j3 + 3 + 1) = (0N, 1)"
    by (metis ab_semigroup_add_class.add_ac(1) numeral_plus_one semiring_norm(2) semiring_norm(8))
  have "tpsL3 t ! (j3 + 5) = (0N, 1)"
    using tpsL3_def tps0 jk by simp
  then show "tpsL3 t ! (j3 + 3 + 2) = (0N, 1)"
    by (simp add: numeral_Bit1)
qed

definition tpsL5 :: "nat  tape list" where
  "tpsL5 t  tps0
    [j2 := nltape' (clause_n clause) (Suc t),
     j3 := (sat_take (Suc t)B, 1),
     j3 + 1 := (literal_n (clause ! t) div 2N, 1),
     j3 + 2 := (literal_n (clause ! t) mod 2N, 1),
     j3 + 3 := (literal_n (clause ! t) div 2  set varsB, 1)]"

lemma tmI [transforms_intros]:
  assumes "ttt = 16" and "t < length (clause_n clause)"
  shows "transforms tmI (tpsL4 t) ttt (tpsL5 t)"
  unfolding tmI_def
proof (tform tps: jk tpsL4_def time: assms(1))
  show "10 + 2 * nlength (if sat_take t then 1 else 0) + 2 * nlength 1 + 2  ttt"
    using assms(1) nlength_0 nlength_1_simp by simp

  have len: "t < length clause"
    using assms(2) by (simp add: clause_n_def)

  let ?l = "clause ! t"
  have 1: "read (tpsL4 t) ! (j3 + 3) =   literal_n ?l div 2  set vars"
    using tpsL4_def jk read_ncontents_eq_0[of "tpsL4 t" "j3 + 3"] by simp
  have 2: "read (tpsL4 t) ! (j3 + 2) =   literal_n ?l mod 2 = 0"
    using tpsL4_def jk read_ncontents_eq_0[of "tpsL4 t" "j3 + 2"] by simp

  let ?a = "λv. v  set vars"
  let ?cond = "read (tpsL4 t) ! (j3 + 3) =   read (tpsL4 t) ! (j3 + 2) =  
    read (tpsL4 t) ! (j3 + 3)    read (tpsL4 t) ! (j3 + 2)  "
  have *: "?cond  satisfies_literal ?a ?l"
  proof (cases ?l)
    case (Neg v)
    then have "literal_n ?l div 2 = v" "literal_n ?l mod 2 = 0"
      by simp_all
    moreover from this have "satisfies_literal ?a ?l  v  set vars"
      using Neg by simp
    ultimately show ?thesis
      using 1 2 by simp
  next
    case (Pos v)
    then have "literal_n ?l div 2 = v" "literal_n ?l mod 2 = 1"
      by simp_all
    moreover from this have "satisfies_literal ?a ?l  v  set vars"
      using Pos by simp
    ultimately show ?thesis
      using 1 2 by simp
  qed

  have **: "sat_take (Suc t)  sat_take t  satisfies_literal ?a ?l"
    using satisfies_clause_take[OF len] by simp

  show "tpsL5 t = (tpsL4 t)[j3 := (1N, 1)]" if ?cond
  proof -
    have "(if sat_take (Suc t) then 1::nat else 0) = 1"
      using that * ** by simp
    then show ?thesis
      unfolding tpsL5_def tpsL4_def using that by (simp add: list_update_swap)
  qed
  show "tpsL5 t = (tpsL4 t)" if "¬ ?cond"
  proof -
    have "sat_take t = sat_take (Suc t)"
      using * ** that by simp
    then show ?thesis
      unfolding tpsL5_def tpsL4_def using that by (simp add: list_update_swap)
  qed
qed

lemma tmL5 [transforms_intros]:
  assumes "ttt = 36 + 4 * nlength (clause_n clause ! t) + 67 * (nllength vars)2"
    and "t < length (clause_n clause)"
  shows "transforms tmL5 (tpsL t) ttt (tpsL5 t)"
  unfolding tmL5_def by (tform tps: assms)

definition tpsL6 :: "nat  tape list" where
  "tpsL6 t  tps0
    [j2 := nltape' (clause_n clause) (Suc t),
     j3 := (sat_take (Suc t)B, 1),
     j3 + 1 := (0N, 1),
     j3 + 2 := (literal_n (clause ! t) mod 2N, 1),
     j3 + 3 := (literal_n (clause ! t) div 2  set varsB, 1)]"

lemma tmL6 [transforms_intros]:
  assumes "ttt = 46 + 4 * nlength (clause_n clause ! t) + 67 * (nllength vars)2 + 2 * nlength (literal_n (clause ! t) div 2)"
    and "t < length (clause_n clause)"
  shows "transforms tmL6 (tpsL t) ttt (tpsL6 t)"
  unfolding tmL6_def by (tform tps: assms tps0 tpsL6_def tpsL5_def jk)

definition tpsL7 :: "nat  tape list" where
  "tpsL7 t  tps0
    [j2 := nltape' (clause_n clause) (Suc t),
     j3 := (sat_take (Suc t)B, 1),
     j3 + 1 := (0N, 1),
     j3 + 2 := (0N, 1),
     j3 + 3 := (literal_n (clause ! t) div 2  set varsB, 1)]"

lemma tmL7 [transforms_intros]:
  assumes "ttt = 56 + 4 * nlength (clause_n clause ! t) + 67 * (nllength vars)2 + 2 * nlength (literal_n (clause ! t) div 2) +
      2 * nlength (literal_n (clause ! t) mod 2)"
    and "t < length (clause_n clause)"
  shows "transforms tmL7 (tpsL t) ttt (tpsL7 t)"
  unfolding tmL7_def by (tform tps: assms tps0 tpsL7_def tpsL6_def jk)

definition tpsL8 :: "nat  tape list" where
  "tpsL8 t  tps0
    [j2 := nltape' (clause_n clause) (Suc t),
     j3 := (sat_take (Suc t)B, 1),
     j3 + 1 := (0N, 1),
     j3 + 2 := (0N, 1),
     j3 + 3 := (0N, 1)]"

lemma tmL8:
  assumes "ttt = 66 + 4 * nlength (clause_n clause ! t) + 67 * (nllength vars)2 +
      2 * nlength (literal_n (clause ! t) div 2) +
      2 * nlength (literal_n (clause ! t) mod 2) +
      2 * nlength (if literal_n (clause ! t) div 2  set vars then 1 else 0)"
    and "t < length (clause_n clause)"
  shows "transforms tmL8 (tpsL t) ttt (tpsL8 t)"
  unfolding tmL8_def by (tform tps: assms tps0 tpsL8_def tpsL7_def jk)

lemma tmL8':
  assumes "ttt = 70 + 6 * nllength (clause_n clause) + 67 * (nllength vars)2"
    and "t < length (clause_n clause)"
  shows "transforms tmL8 (tpsL t) ttt (tpsL8 t)"
proof -
  let ?l = "literal_n (clause ! t)"
  let ?ll = "clause_n clause ! t"
  let ?t = "66 + 4 * nlength ?ll + 67 * (nllength vars)2 +
      2 * nlength (?l div 2) + 2 * nlength (?l mod 2) + 2 * nlength (if ?l div 2  set vars then 1 else 0)"
  have "?t = 66 + 4 * nlength ?ll + 67 * (nllength vars)2 +
      2 * nlength (?ll div 2) + 2 * nlength (?ll mod 2) + 2 * nlength (if ?ll div 2  set vars then 1 else 0)"
    using assms(2) clause_n_def by simp
  also have "...  66 + 4 * nlength ?ll + 67 * (nllength vars)2 +
      2 * nlength ?ll + 2 * nlength (?ll mod 2) + 2 * nlength (if ?ll div 2  set vars then 1 else 0)"
    using nlength_mono[of "?ll div 2" "?ll"] by simp
  also have "... = 66 + 6 * nlength ?ll + 67 * (nllength vars)2 +
      2 * nlength (?ll mod 2) + 2 * nlength (if ?ll div 2  set vars then 1 else 0)"
    by simp
  also have "...  66 + 6 * nlength ?ll + 67 * (nllength vars)2 +
      2 * nlength 1 +