Theory Inductive_Inference_Basics

chapter ‹Inductive inference of recursive functions\label{c:iirf}›

theory Inductive_Inference_Basics
  imports Standard_Results
begin

text ‹Inductive inference originates from work by
Solomonoff~cite"s-ftiip1-64" and "s-ftiip2-64" and Gold~cite"g-lil-67" and "g-lr-65"
and comes in many variations. The common theme is to infer additional
information about objects, such as formal languages or functions, from incomplete
data, such as finitely many words contained in the language or argument-value
pairs of the function. Oftentimes ``additional information'' means complete
information, such that the task becomes identification of the object.

The basic setting in inductive inference of recursive functions is as follows.
Let us denote, for a total function $f$, by $f^n$ the code of the list
$[f(0), ..., f(n)]$. Let $U$ be a set (called \emph{class}) of total
recursive functions, and $\psi$ a binary partial recursive function
(called \emph{hypothesis space}).
A partial recursive function $S$ (called \emph{strategy})
is said to \emph{learn $U$ in the limit with respect to $\psi$} if
for all $f \in U$,
\begin{itemize}
  \item the value $S(f^n)$ is defined for all $n\in\mathbb{N}$,
  \item the sequence $S(f^0), S(f^1), \ldots$ converges to an
    $i\in\mathbb{N}$ with $\psi_i = f$.
\end{itemize}

Both the output $S(f^n)$ of the strategy and its interpretation
as a function $\psi_{S(f^n)}$ are called \emph{hypothesis}. The set
of all classes learnable in the limit by $S$ with respect to $\psi$ is
denoted by $\mathrm{LIM}_\psi(S)$. Moreover we set $\mathrm{LIM}_\psi =
\bigcup_{S\in\mathcal{P}} \mathrm{LIM}_\psi(S)$ and $\mathrm{LIM} =
\bigcup_{\psi\in\mathcal{P}^2} \mathrm{LIM}_\psi$. We call the latter set the
\emph{inference type} $\mathrm{LIM}$.

Many aspects of this setting can be varied. We shall consider:
\begin{itemize}
  \item Intermediate hypotheses: $\psi_{S(f^n)}$ can be required to be total or
    to be in the class $U$, or to coincide with $f$ on arguments up to $n$, or
    a myriad of other conditions or combinations thereof.
  \item Convergence of hypotheses:
  \begin{itemize}
    \item The strategy can be required to output not a sequence but a single
      hypothesis, which must be correct.
    \item The strategy can be required to converge to a \emph{function} rather
      than an index.
  \end{itemize}
\end{itemize}

We formalize five kinds of results (ℐ› and ℐ'› stand for
inference types):
\begin{itemize}
  \item Comparison of learning power: results of the form @{prop "
     ℐ'"}, in particular showing that the inclusion is proper
    (Sections~\ref{s:fin_cp}, \ref{s:num_fin}, \ref{s:num_cp},
    \ref{s:num_total}, \ref{s:cons_lim}, \ref{s:lim_bc}, \ref{s:total_cons},
    \ref{s:r1_bc}).
  \item Whether ℐ› is closed under the subset relation: @{prop "U
       V  U  V  "}.
  \item Whether ℐ› is closed under union: @{prop "U   
    V    U  V  "} (Section~\ref{s:union}).
  \item Whether every class in ℐ› can be learned with respect to a
    Gödel numbering as hypothesis space (Section~\ref{s:inference_types}).
  \item Whether every class in ℐ› can be learned by a \emph{total}
    recursive strategy (Section~\ref{s:lemma_r}).
\end{itemize}

The bulk of this chapter is devoted to the first category of results. Most
results that we are going to formalize have been called ``classical'' by
Jantke and Beick~cite"jb-cpnii-81", who compare a large number of inference
types. Another comparison is by Case and Smith~cite"cs-cicmii-83". Angluin
and Smith~cite"as-ii-87" give an overview of various forms of inductive
inference.

All (interesting) proofs herein are based on my lecture notes of the
\emph{Induktive Inferenz} lectures by Rolf Wiehagen from 1999/2000 and
2000/2001 at the University of Kaiserslautern. I have given references to the
original proofs whenever I was able to find them. For the other proofs, as
well as for those that I had to contort beyond recognition, I provide proof
sketches.›


section ‹Preliminaries›

text ‹Throughout the chapter, in particular in proof sketches, we use
the following notation.

Let $b\in\mathbb{N}^*$ be a list of numbers. We write $|b|$ for its length
and $b_i$ for the $i$-th element ($i=0,\dots, |b| - 1$). Concatenation of
numbers and lists works in the obvious way; for instance, $jbk$ with
$j,k\in\mathbb{N}$, $b\in\mathbb{N}^*$ refers to the list $jb_0\dots
b_{|b|-1}k$. For $0 \leq i < |b|$, the term $b_{i:=v}$ denotes the list
$b_0\dots b_{i-1}vb_{i+1}\dots b_{|b|-1}$. The notation $b_{<i}$ refers to
$b_0\dots b_{i-1}$ for $0 < i \leq |b|$. Moreover, $v^n$ is short for the
list consisting of $n$ times the value $v \in \mathbb{N}$.

Unary partial functions can be regarded as infinite sequences consisting of
numbers and the symbol~$\uparrow$ denoting undefinedness. We abbreviate the
empty function by $\uparrow^\infty$ and the constant zero function by
$0^\infty$. A function can be written as a list concatenated with a partial
function. For example, $jb\uparrow^\infty$ is the function
\[
x \mapsto \left\{\begin{array}{ll}
  j        & \mbox{if } x = 0,\\
  b_{x-1}  & \mbox{if } 0 < x \leq |b|,\\
  \uparrow & \mbox{otherwise,}
\end{array}\right.
\]
and $jp$, where $p$ is a function, means
\[
x \mapsto \left\{\begin{array}{ll}
  j      & \mbox{if } x = 0,\\
  p(x-1) & \mbox{otherwise.}
\end{array}\right.
\]

A \emph{numbering} is a function $\psi \in \mathcal{P}^2$.›


subsection ‹The prefixes of a function›

text ‹A \emph{prefix}, also called \emph{initial segment}, is a list of
initial values of a function.›

definition prefix :: "partial1  nat  nat list" where
  "prefix f n  map (λx. the (f x)) [0..<Suc n]"

lemma length_prefix [simp]: "length (prefix f n) = Suc n"
  unfolding prefix_def by simp

lemma prefix_nth [simp]:
  assumes "k < Suc n"
  shows "prefix f n ! k = the (f k)"
  unfolding prefix_def using assms nth_map_upt[of k "Suc n" 0 "λx. the (f x)"] by simp

lemma prefixI:
  assumes "length vs > 0" and "x. x < length vs  f x ↓= vs ! x"
  shows "prefix f (length vs - 1) = vs"
  using assms nth_equalityI[of "prefix f (length vs - 1)" vs] by simp

lemma prefixI':
  assumes "length vs = Suc n" and "x. x < Suc n  f x ↓= vs ! x"
  shows "prefix f n = vs"
  using assms nth_equalityI[of "prefix f (length vs - 1)" vs] by simp

lemma prefixE:
  assumes "prefix f (length vs - 1) = vs"
    and "f  "
    and "length vs > 0"
    and "x < length vs"
  shows "f x ↓= vs ! x"
  using assms length_prefix prefix_nth[of x "length vs - 1" f] by simp

lemma prefix_eqI:
  assumes "x. x  n  f x = g x"
  shows "prefix f n = prefix g n"
  using assms prefix_def by simp

lemma prefix_0: "prefix f 0 = [the (f 0)]"
  using prefix_def by simp

lemma prefix_Suc: "prefix f (Suc n) = prefix f n @ [the (f (Suc n))]"
  unfolding prefix_def by simp

lemma take_prefix:
  assumes "f  " and "k  n"
  shows "prefix f k = take (Suc k) (prefix f n)"
proof -
  let ?vs = "take (Suc k) (prefix f n)"
  have "length ?vs = Suc k"
    using assms(2) by simp
  then have "x. x < length ?vs  f x ↓= ?vs ! x"
    using assms by auto
  then show ?thesis
    using prefixI[where ?vs="?vs"] length ?vs = Suc k by simp
qed

text ‹Strategies receive prefixes in the form of encoded lists. The
term ``prefix'' refers to both encoded and unencoded lists. We use the
notation @{text "f ▹ n"} for the prefix $f^n$.›

definition init :: "partial1  nat  nat" (infix "" 110) where
  "f  n  list_encode (prefix f n)"

lemma init_neq_zero: "f  n  0"
  unfolding init_def prefix_def using list_encode_0 by fastforce

lemma init_prefixE [elim]: "prefix f n = prefix g n  f  n = g  n"
  unfolding init_def by simp

lemma init_eqI:
  assumes "x. x  n  f x = g x"
  shows "f  n = g  n"
  unfolding init_def using prefix_eqI[OF assms] by simp

lemma initI:
  assumes "e_length e > 0" and "x. x < e_length e  f x ↓= e_nth e x"
  shows "f  (e_length e - 1) = e"
  unfolding init_def using assms prefixI by simp

lemma initI':
  assumes "e_length e = Suc n" and "x. x <  Suc n  f x ↓= e_nth e x"
  shows "f  n = e"
  unfolding init_def using assms prefixI' by simp

lemma init_iff_list_eq_upto:
  assumes "f  " and "e_length vs > 0"
  shows "(x<e_length vs. f x ↓= e_nth vs x)  prefix f (e_length vs - 1) = list_decode vs"
  using prefixI[OF assms(2)] prefixE[OF _ assms] by auto

lemma length_init [simp]: "e_length (f  n) = Suc n"
  unfolding init_def by simp

lemma init_Suc_snoc: "f  (Suc n) = e_snoc (f  n) (the (f (Suc n)))"
  unfolding init_def by (simp add: prefix_Suc)

lemma nth_init: "i < Suc n  e_nth (f  n) i = the (f i)"
  unfolding init_def using prefix_nth by auto

lemma hd_init [simp]: "e_hd (f  n) = the (f 0)"
  unfolding init_def using init_neq_zero by (simp add: e_hd_nth0)

lemma list_decode_init [simp]: "list_decode (f  n) = prefix f n"
  unfolding init_def by simp

lemma init_eq_iff_eq_upto:
  assumes "g  " and "f  "
  shows "(j<Suc n. g j = f j)  g  n = f  n"
  using assms initI' init_iff_list_eq_upto length_init list_decode_init
  by (metis diff_Suc_1 zero_less_Suc)

definition is_init_of :: "nat  partial1  bool" where
  "is_init_of t f  i<e_length t. f i ↓= e_nth t i"

lemma not_initial_imp_not_eq:
  assumes "x. x < Suc n  f x " and "¬ (is_init_of (f  n) g)"
  shows "f  g"
  using is_init_of_def assms by auto

lemma all_init_eq_imp_fun_eq:
  assumes "f  " and "g  " and "n. f  n = g  n"
  shows "f = g"
proof
  fix n
  from assms have "prefix f n = prefix g n"
    by (metis init_def list_decode_encode)
  then have "the (f n) = the (g n)"
    unfolding init_def prefix_def by simp
  then show "f n = g n"
    using assms(1,2) by (meson R1_imp_total1 option.expand total1E)
qed

corollary neq_fun_neq_init:
  assumes "f  " and "g  " and "f  g"
  shows "n. f  n  g  n"
  using assms all_init_eq_imp_fun_eq by auto

lemma eq_init_forall_le:
  assumes "f  n = g  n" and "m  n"
  shows "f  m = g  m"
proof -
  from assms(1) have "prefix f n = prefix g n"
    by (metis init_def list_decode_encode)
  then have "the (f k) = the (g k)" if "k  n" for k
    using prefix_def that by auto
  then have "the (f k) = the (g k)" if "k  m" for k
    using assms(2) that by simp
  then have "prefix f m = prefix g m"
    using prefix_def by simp
  then show ?thesis by (simp add: init_def)
qed

corollary neq_init_forall_ge:
  assumes "f  n  g  n" and "m  n"
  shows "f  m  g  m"
  using eq_init_forall_le assms by blast

lemma e_take_init:
  assumes "f  " and "k < Suc n"
  shows "e_take (Suc k) (f  n) = f  k"
  using assms take_prefix by (simp add: init_def less_Suc_eq_le)

lemma init_butlast_init:
  assumes "total1 f" and "f  n = e" and "n > 0"
  shows "f  (n - 1) = e_butlast e"
proof -
  let ?e = "e_butlast e"
  have "e_length e = Suc n"
    using assms(2) by auto
  then have len: "e_length ?e = n"
    by simp
  have "f  (e_length ?e - 1) = ?e"
  proof (rule initI)
    show "0 < e_length ?e"
      using assms(3) len by simp
    have "x. x < e_length e  f x ↓= e_nth e x"
      using assms(1,2) total1_def e_length e = Suc n by auto
    then show "x. x < e_length ?e  f x ↓= e_nth ?e x"
      by (simp add: butlast_conv_take)
  qed
  with len show ?thesis by simp
qed

text ‹Some definitions make use of recursive predicates, that is,
$01$-valued functions.›

definition RPred1 :: "partial1 set" ("01") where
  "01  {f. f    (x. f x ↓= 0  f x ↓= 1)}"

lemma RPred1_subseteq_R1: "01  "
  unfolding RPred1_def by auto

lemma const0_in_RPred1: "(λ_. Some 0)  01"
  using RPred1_def const_in_Prim1 by fast

lemma RPred1_altdef: "01 = {f. f    (x. the (f x)  1)}"
  (is "01 = ?S")
proof
  show "01  ?S"
  proof
    fix f
    assume f: "f  01"
    with RPred1_def have "f  " by auto
    from f have "x. f x ↓= 0  f x ↓= 1"
      by (simp add: RPred1_def)
    then have "x. the (f x)  1"
      by (metis eq_refl less_Suc_eq_le zero_less_Suc option.sel)
    with f   show "f  ?S" by simp
  qed
  show "?S  01"
  proof
    fix f
    assume f: "f  ?S"
    then have "f  " by simp
    then have total: "x. f x " by auto
    from f have "x. the (f x) = 0  the (f x) = 1"
      by (simp add: le_eq_less_or_eq)
    with total have "x. f x ↓= 0  f x ↓= 1"
      by (metis option.collapse)
    then show "f  01"
      using f   RPred1_def by auto
  qed
qed

subsection ‹NUM›

text ‹A class of recursive functions is in NUM if it can be
embedded in a total numbering. Thus, for learning such classes there is
always a total hypothesis space available.›

definition NUM :: "partial1 set set" where
  "NUM  {U. ψ2. fU. i. ψ i = f}"

definition NUM_wrt :: "partial2  partial1 set set" where
  "ψ  2  NUM_wrt ψ  {U. fU. i. ψ i = f}"

lemma NUM_I [intro]:
  assumes "ψ  2" and "f. f  U  i. ψ i = f"
  shows "U  NUM"
  using assms NUM_def by blast

lemma NUM_E [dest]:
  assumes "U  NUM"
  shows "U  "
    and "ψ2. fU. i. ψ i = f"
  using NUM_def assms by (force, auto)

lemma NUM_closed_subseteq:
  assumes "U  NUM" and "V  U"
  shows "V  NUM"
  using assms subset_eq[of V U] NUM_I by auto

text ‹This is the classical diagonalization proof showing that there is
no total numbering containing all total recursive functions.›

lemma R1_not_in_NUM: "  NUM"
proof
  assume "  NUM"
  then obtain ψ where num: "ψ  2" "f. i. ψ i = f"
    by auto
  then obtain psi where psi: "recfn 2 psi" "total psi" "eval psi [i, x] = ψ i x" for i x
    by auto
  define d where "d = Cn 1 S [Cn 1 psi [Id 1 0, Id 1 0]]"
  then have "recfn 1 d"
    using psi(1) by simp
  moreover have d: "eval d [x] ↓= Suc (the (ψ x x))" for x
    unfolding d_def using num psi by simp
  ultimately have "(λx. eval d [x])  "
    using R1I by blast
  then obtain i where "ψ i = (λx. eval d [x])"
    using num(2) by auto
  then have "ψ i i = eval d [i]" by simp
  with d have "ψ i i ↓= Suc (the (ψ i i))" by simp
  then show False
    using option.sel[of "Suc (the (ψ i i))"] by simp
qed

text ‹A hypothesis space that contains a function for every prefix will
come in handy. The following is a total numbering with this property.›

definition "r_prenum 
  Cn 2 r_ifless [Id 2 1, Cn 2 r_length [Id 2 0], Cn 2 r_nth [Id 2 0, Id 2 1], r_constn 1 0]"

lemma r_prenum_prim [simp]: "prim_recfn 2 r_prenum"
  unfolding r_prenum_def by simp_all

lemma r_prenum [simp]:
  "eval r_prenum [e, x] ↓= (if x < e_length e then e_nth e x else 0)"
  by (simp add: r_prenum_def)

definition prenum :: partial2 where
  "prenum e x  Some (if x < e_length e then e_nth e x else 0)"

lemma prenum_in_R2: "prenum  2"
  using prenum_def Prim2I[OF r_prenum_prim, of prenum] by simp

lemma prenum [simp]: "prenum e x ↓= (if x < e_length e then e_nth e x else 0)"
  unfolding prenum_def ..

lemma prenum_encode:
  "prenum (list_encode vs) x ↓= (if x < length vs then vs ! x else 0)"
  using prenum_def by (cases "x < length vs") simp_all

text ‹Prepending a list of numbers to a function:›

definition prepend :: "nat list  partial1  partial1" (infixr "" 64) where
  "vs  f  λx. if x < length vs then Some (vs ! x) else f (x - length vs)"

lemma prepend [simp]:
  "(vs  f) x = (if x < length vs then Some (vs ! x) else f (x - length vs))"
  unfolding prepend_def ..

lemma prepend_total: "total1 f  total1 (vs  f)"
  unfolding total1_def by simp

lemma prepend_at_less:
  assumes "n < length vs"
  shows "(vs  f) n ↓= vs ! n"
  using assms by simp

lemma prepend_at_ge:
  assumes "n  length vs"
  shows "(vs  f) n = f (n - length vs)"
  using assms by simp

lemma prefix_prepend_less:
  assumes "n < length vs"
  shows "prefix (vs  f) n = take (Suc n) vs"
  using assms length_prefix by (intro nth_equalityI) simp_all

lemma prepend_eqI:
  assumes "x. x < length vs  g x ↓= vs ! x"
    and "x. g (length vs + x) = f x"
  shows "g = vs  f"
proof
  fix x
  show "g x = (vs  f) x"
  proof (cases "x < length vs")
    case True
    then show ?thesis using assms by simp
  next
    case False
    then show ?thesis
      using assms prepend by (metis add_diff_inverse_nat)
  qed
qed

fun r_prepend :: "nat list  recf  recf" where
  "r_prepend [] r = r"
| "r_prepend (v # vs) r =
     Cn 1 (r_lifz (r_const v) (Cn 1 (r_prepend vs r) [r_dec])) [Id 1 0, Id 1 0]"

lemma r_prepend_recfn:
  assumes "recfn 1 r"
  shows "recfn 1 (r_prepend vs r)"
  using assms by (induction vs) simp_all

lemma r_prepend:
  assumes "recfn 1 r"
  shows "eval (r_prepend vs r) [x] =
    (if x < length vs then Some (vs ! x) else eval r [x - length vs])"
proof (induction vs arbitrary: x)
  case Nil
  then show ?case using assms by simp
next
  case (Cons v vs)
  show ?case
    using assms Cons by (cases "x = 0") (auto simp add: r_prepend_recfn)
qed

lemma r_prepend_total:
  assumes "recfn 1 r" and "total r"
  shows "eval (r_prepend vs r) [x] ↓=
    (if x < length vs then vs ! x else the (eval r [x - length vs]))"
proof (induction vs arbitrary: x)
  case Nil
  then show ?case using assms by simp
next
  case (Cons v vs)
  show ?case
    using assms Cons by (cases "x = 0") (auto simp add: r_prepend_recfn)
qed

lemma prepend_in_P1:
  assumes "f  𝒫"
  shows "vs  f  𝒫"
proof -
  obtain r where r: "recfn 1 r"  "x. eval r [x] = f x"
    using assms by auto
  moreover have "recfn 1 (r_prepend vs r)"
    using r r_prepend_recfn by simp
  moreover have "eval (r_prepend vs r) [x] = (vs  f) x" for x
    using r r_prepend by simp
  ultimately show ?thesis by blast
qed

lemma prepend_in_R1:
  assumes "f  "
  shows "vs  f  "
proof -
  obtain r where r: "recfn 1 r" "total r" "x. eval r [x] = f x"
    using assms by auto
  then have "total1 f"
    using R1_imp_total1[OF assms] by simp
  have "total (r_prepend vs r)"
    using r r_prepend_total r_prepend_recfn totalI1[of "r_prepend vs r"] by simp
  with r have "total (r_prepend vs r)" by simp
  moreover have "recfn 1 (r_prepend vs r)"
    using r r_prepend_recfn by simp
  moreover have "eval (r_prepend vs r) [x] = (vs  f) x" for x
    using r r_prepend total1 f total1E by simp
  ultimately show ?thesis by auto
qed

lemma prepend_associative: "(us @ vs)  f = us  vs  f" (is "?lhs = ?rhs")
proof
  fix x
  consider
      "x < length us"
    | "x  length us  x < length (us @ vs)"
    | "x  length (us @ vs)"
    by linarith
  then show "?lhs x = ?rhs x"
  proof (cases)
    case 1
    then show ?thesis
      by (metis le_add1 length_append less_le_trans nth_append prepend_at_less)
  next
    case 2
    then show ?thesis
      by (smt add_diff_inverse_nat add_less_cancel_left length_append nth_append prepend)
  next
    case 3
    then show ?thesis
      using prepend_at_ge by auto
  qed
qed

abbreviation constant_divergent :: partial1 ("") where
  "  λ_. None"

abbreviation constant_zero :: partial1 ("0") where
  "0  λ_. Some 0"

lemma almost0_in_R1: "vs  0  "
  using RPred1_subseteq_R1 const0_in_RPred1 prepend_in_R1 by auto

text ‹The class $U_0$ of all total recursive functions that are almost
everywhere zero will be used several times to construct
(counter-)examples.›

definition U0 :: "partial1 set" ("U0") where
  "U0  {vs  0 |vs. vs  UNIV}"

text ‹The class @{term U0} contains exactly the functions in the
numbering @{term prenum}.›

lemma U0_altdef: "U0 = {prenum e| e. e  UNIV}" (is "U0 = ?W")
proof
  show "U0  ?W"
  proof
    fix f
    assume "f  U0"
    with U0_def obtain vs where "f = vs  0"
      by auto
    then have "f = prenum (list_encode vs)"
      using prenum_encode by auto
    then show "f  ?W" by auto
  qed
  show "?W  U0"
    unfolding U0_def by fastforce
qed

lemma U0_in_NUM: "U0  NUM"
  using prenum_in_R2 U0_altdef by (intro NUM_I[of prenum]; force)

text ‹Every almost-zero function can be represented by $v0^\infty$ for
a list $v$ not ending in zero.›

lemma almost0_canonical:
  assumes "f = vs  0" and "f  0"
  obtains ws where "length ws > 0" and "last ws  0" and "f = ws  0"
proof -
  let ?P = "λk. k < length vs  vs ! k  0"
  from assms have "vs  []"
    by auto
  then have ex: "k<length vs. vs ! k  0"
    using assms by auto
  define m where "m = Greatest ?P"
  moreover have le: "y. ?P y  y  length vs"
    by simp
  ultimately have "?P m"
    using ex GreatestI_ex_nat[of ?P "length vs"] by simp
  have not_gr: "¬ ?P k" if "k > m" for k
    using Greatest_le_nat[of ?P _ "length vs"] m_def ex le not_less that by blast
  let ?ws = "take (Suc m) vs"
  have "vs  0 = ?ws  0"
  proof
    fix x
    show "(vs  0) x = (?ws  0) x"
    proof (cases "x < Suc m")
      case True
      then show ?thesis using ?P m by simp
    next
      case False
      moreover from this have "(?ws  0) x ↓= 0"
        by simp
      ultimately show ?thesis
        using not_gr by (cases "x < length vs") simp_all
    qed
  qed
  then have "f = ?ws  0"
    using assms(1) by simp
  moreover have "length ?ws > 0"
    by (simp add: vs  [])
  moreover have "last ?ws  0"
    by (simp add: ?P m take_Suc_conv_app_nth)
  ultimately show ?thesis using that by blast
qed


section ‹Types of inference\label{s:inference_types}›

text ‹This section introduces all inference types that we are going to
consider together with some of their simple properties. All these inference
types share the following condition, which essentially says that everything
must be computable:›

abbreviation environment :: "partial2  (partial1 set)  partial1  bool" where
  "environment ψ U s  ψ  𝒫2  U    s  𝒫  (fU. n. s (f  n) )"


subsection ‹LIM: Learning in the limit›

text ‹A strategy $S$ learns a class $U$ in the limit with respect to a
hypothesis space @{term "ψ  𝒫2"} if for all $f\in U$, the
sequence $(S(f^n))_{n\in\mathbb{N}}$ converges to an $i$ with $\psi_i = f$.
Convergence for a sequence of natural numbers means that almost all elements
are the same. We express this with the following notation.›

abbreviation Almost_All :: "(nat  bool)  bool" (binder "" 10) where
  "n. P n  n0. nn0. P n"

definition learn_lim :: "partial2  (partial1 set)  partial1  bool" where
  "learn_lim ψ U s 
     environment ψ U s 
     (fU. i. ψ i = f  (n. s (f  n) ↓= i))"

lemma learn_limE:
  assumes "learn_lim ψ U s"
  shows "environment ψ U s"
    and "f. f  U  i. ψ i = f  (n. s (f  n) ↓= i)"
  using assms learn_lim_def by auto

lemma learn_limI:
  assumes "environment ψ U s"
    and "f. f  U  i. ψ i = f  (n. s (f  n) ↓= i)"
  shows "learn_lim ψ U s"
  using assms learn_lim_def by auto

definition LIM_wrt :: "partial2  partial1 set set" where
  "LIM_wrt ψ  {U. s. learn_lim ψ U s}"

definition Lim :: "partial1 set set" ("LIM") where
  "LIM  {U. ψ s. learn_lim ψ U s}"

text ‹LIM is closed under the the subset relation.›

lemma learn_lim_closed_subseteq:
  assumes "learn_lim ψ U s" and "V  U"
  shows "learn_lim ψ V s"
  using assms learn_lim_def by auto

corollary LIM_closed_subseteq:
  assumes "U  LIM" and "V  U"
  shows "V  LIM"
  using assms learn_lim_closed_subseteq by (smt Lim_def mem_Collect_eq)

text ‹Changing the hypothesis infinitely often precludes learning in
the limit.›

lemma infinite_hyp_changes_not_Lim:
  assumes "f  U" and "n. m1>n. m2>n. s (f  m1)  s (f  m2)"
  shows "¬ learn_lim ψ U s"
  using assms learn_lim_def by (metis less_imp_le)

lemma always_hyp_change_not_Lim:
  assumes "x. s (f  (Suc x))  s (f  x)"
  shows "¬ learn_lim ψ {f} s"
  using assms learn_limE by (metis le_SucI order_refl singletonI)

text ‹Guessing a wrong hypothesis infinitely often precludes learning
in the limit.›

lemma infinite_hyp_wrong_not_Lim:
  assumes "f  U" and "n. m>n. ψ (the (s (f  m)))  f"
  shows "¬ learn_lim ψ U s"
  using assms learn_limE by (metis less_imp_le option.sel)

text ‹Converging to the same hypothesis on two functions precludes
learning in the limit.›

lemma same_hyp_for_two_not_Lim:
  assumes "f1  U"
    and "f2  U"
    and "f1  f2"
    and "nn1. s (f1  n) = h"
    and "nn2. s (f2  n) = h"
  shows "¬ learn_lim ψ U s"
  using assms learn_limE by (metis le_cases option.sel)

text ‹Every class that can be learned in the limit can be learned in
the limit with respect to any Gödel numbering. We prove a generalization in
which hypotheses may have to satisfy an extra condition, so we can re-use it
for other inference types later.›

lemma learn_lim_extra_wrt_goedel:
  fixes extra :: "(partial1 set)  partial1  nat  partial1  bool"
  assumes "goedel_numbering χ"
    and "learn_lim ψ U s"
    and "f n. f  U  extra U f n (ψ (the (s (f  n))))"
  shows "t. learn_lim χ U t  (fU. n. extra U f n (χ (the (t (f  n)))))"
proof -
  have env: "environment ψ U s"
    and lim: "learn_lim ψ U s"
    and extra: "fU. n. extra U f n (ψ (the (s (f  n))))"
    using assms learn_limE by auto
  obtain c where c: "c  " "i. ψ i = χ (the (c i))"
    using env goedel_numberingE[OF assms(1), of ψ] by auto
  define t where "t 
    (λx. if s x   c (the (s x))  then Some (the (c (the (s x)))) else None)"
  have "t  𝒫"
    unfolding t_def using env c concat_P1_P1[of c s] by auto
  have "t x = (if s x  then Some (the (c (the (s x)))) else None)" for x
    using t_def c(1) R1_imp_total1 by auto
  then have t: "t (f  n) ↓= the (c (the (s (f  n))))" if "f  U" for f n
    using lim learn_limE that by simp
  have "learn_lim χ U t"
  proof (rule learn_limI)
    show "environment χ U t"
      using t by (simp add: t  𝒫 env goedel_numbering_P2[OF assms(1)])
    show "i. χ i = f  (n. t (f  n) ↓= i)" if "f  U" for f
    proof -
      from lim learn_limE(2) obtain i n0 where
        i: "ψ i = f  (nn0. s (f  n) ↓= i)"
        using f  U by blast
      let ?j = "the (c i)"
      have "χ ?j = f"
        using c(2) i by simp
      moreover have "t (f  n) ↓= ?j" if "n  n0" for n
        by (simp add: f  U i t that)
      ultimately show ?thesis by auto
    qed
  qed
  moreover have "extra U f n (χ (the (t (f  n))))" if "f  U" for f n
  proof -
    from t have "the (t (f  n)) = the (c (the (s (f  n))))"
      by (simp add: that)
    then have "χ (the (t (f  n))) = ψ (the (s (f  n)))"
      using c(2) by simp
    with extra show ?thesis using that by simp
  qed
  ultimately show ?thesis by auto
qed

lemma learn_lim_wrt_goedel:
  assumes "goedel_numbering χ" and "learn_lim ψ U s"
  shows "t. learn_lim χ U t"
  using assms learn_lim_extra_wrt_goedel[where ?extra="λU f n h. True"]
  by simp

lemma LIM_wrt_phi_eq_Lim: "LIM_wrt φ = LIM"
  using LIM_wrt_def Lim_def learn_lim_wrt_goedel[OF goedel_numbering_phi]
  by blast


subsection ‹BC: Behaviorally correct learning in the limit›

text ‹Behaviorally correct learning in the limit relaxes LIM by
requiring that the strategy almost always output an index for the target
function, but not necessarily the same index. In other words convergence of
$(S(f^n))_{n\in\mathbb{N}}$ is replaced by convergence of
$(\psi_{S(f^n)})_{n\in\mathbb{N}}$.›

definition learn_bc :: "partial2  (partial1 set)  partial1  bool" where
  "learn_bc ψ U s  
     environment ψ U s 
     (fU. n. ψ (the (s (f  n))) = f)"

lemma learn_bcE:
  assumes "learn_bc ψ U s"
  shows "environment ψ U s"
    and "f. f  U  n. ψ (the (s (f  n))) = f"
  using assms learn_bc_def by auto

lemma learn_bcI:
  assumes "environment ψ U s"
    and "f. f  U  n. ψ (the (s (f  n))) = f"
  shows "learn_bc ψ U s"
  using assms learn_bc_def by auto

definition BC_wrt :: "partial2  partial1 set set" where
  "BC_wrt ψ  {U. s. learn_bc ψ U s}"

definition BC :: "partial1 set set" where
  "BC  {U. ψ s. learn_bc ψ U s}"

text ‹BC is a superset of LIM and closed under the subset relation.›

lemma learn_lim_imp_BC: "learn_lim ψ U s  learn_bc ψ U s"
  using learn_limE learn_bcI[of ψ U s] by fastforce

lemma Lim_subseteq_BC: "LIM  BC"
  using learn_lim_imp_BC Lim_def BC_def by blast

lemma learn_bc_closed_subseteq:
  assumes "learn_bc ψ U s" and "V  U"
  shows "learn_bc ψ V s"
  using assms learn_bc_def by auto

corollary BC_closed_subseteq:
  assumes "U  BC" and "V  U"
  shows "V  BC"
  using assms by (smt BC_def learn_bc_closed_subseteq mem_Collect_eq)

text ‹Just like with LIM, guessing a wrong hypothesis infinitely often
precludes BC-style learning.›

lemma infinite_hyp_wrong_not_BC:
  assumes "f  U" and "n. m>n. ψ (the (s (f  m)))  f"
  shows "¬ learn_bc ψ U s"
proof
  assume "learn_bc ψ U s"
  then obtain n0 where "nn0. ψ (the (s (f  n))) = f"
    using learn_bcE assms(1) by metis
  with assms(2) show False using less_imp_le by blast
qed

text ‹The proof that Gödel numberings suffice as hypothesis spaces for
BC is similar to the one for @{thm[source] learn_lim_extra_wrt_goedel}. We do
not need the @{term extra} part for BC, but we get it for free.›

lemma learn_bc_extra_wrt_goedel:
  fixes extra :: "(partial1 set)  partial1  nat  partial1  bool"
  assumes "goedel_numbering χ"
    and "learn_bc ψ U s"
    and "f n. f  U  extra U f n (ψ (the (s (f  n))))"
  shows "t. learn_bc χ U t  (fU. n. extra U f n (χ (the (t (f  n)))))"
proof -
  have env: "environment ψ U s"
    and lim: "learn_bc ψ U s"
    and extra: "fU. n. extra U f n (ψ (the (s (f  n))))"
    using assms learn_bc_def by auto
  obtain c where c: "c  " "i. ψ i = χ (the (c i))"
    using env goedel_numberingE[OF assms(1), of ψ] by auto
  define t where
    "t = (λx. if s x   c (the (s x))  then Some (the (c (the (s x)))) else None)"
  have "t  𝒫"
    unfolding t_def using env c concat_P1_P1[of c s] by auto
  have "t x = (if s x  then Some (the (c (the (s x)))) else None)" for x
    using t_def c(1) R1_imp_total1 by auto
  then have t: "t (f  n) ↓= the (c (the (s (f  n))))" if "f  U" for f n
    using lim learn_bcE(1) that by simp
  have "learn_bc χ U t"
  proof (rule learn_bcI)
    show "environment χ U t"
      using t by (simp add: t  𝒫 env goedel_numbering_P2[OF assms(1)])
    show "n. χ (the (t (f  n))) = f" if "f  U" for f
    proof -
      obtain n0 where "nn0. ψ (the (s (f  n))) = f"
        using lim learn_bcE(2) f  U by blast
      then show ?thesis using that t c(2) by auto
    qed
  qed
  moreover have "extra U f n (χ (the (t (f  n))))" if "f  U" for f n
  proof -
    from t have "the (t (f  n)) = the (c (the (s (f  n))))"
      by (simp add: that)
    then have "χ (the (t (f  n))) = ψ (the (s (f  n)))"
      using c(2) by simp
    with extra show ?thesis using that by simp
  qed
  ultimately show ?thesis by auto
qed

corollary learn_bc_wrt_goedel:
  assumes "goedel_numbering χ" and "learn_bc ψ U s"
  shows "t. learn_bc χ U t"
  using assms learn_bc_extra_wrt_goedel[where ?extra="λ_ _ _ _. True"] by simp

corollary BC_wrt_phi_eq_BC: "BC_wrt φ = BC"
  using learn_bc_wrt_goedel goedel_numbering_phi BC_def BC_wrt_def by blast


subsection ‹CONS: Learning in the limit with consistent hypotheses›

text ‹A hypothesis is \emph{consistent} if it matches all values in the
prefix given to the strategy. Consistent learning in the limit requires the
strategy to output only consistent hypotheses for prefixes from the class.›

definition learn_cons :: "partial2  (partial1 set)  partial1  bool" where
  "learn_cons ψ U s 
     learn_lim ψ U s 
     (fU. n. kn. ψ (the (s (f  n))) k = f k)"

definition CONS_wrt :: "partial2  partial1 set set" where
  "CONS_wrt ψ  {U. s. learn_cons ψ U s}"

definition CONS :: "partial1 set set" where
  "CONS  {U. ψ s. learn_cons ψ U s}"

lemma CONS_subseteq_Lim: "CONS  LIM"
  using CONS_def Lim_def learn_cons_def by blast

lemma learn_consI:
  assumes "environment ψ U s"
    and "f. f  U  i. ψ i = f  (n. s (f  n) ↓= i)"
    and "f n. f  U  kn. ψ (the (s (f  n))) k = f k"
  shows "learn_cons ψ U s"
  using assms learn_lim_def learn_cons_def by simp

text ‹If a consistent strategy converges, it automatically converges to
a correct hypothesis. Thus we can remove @{term "ψ i = f"} from the second
assumption in the previous lemma.›

lemma learn_consI2:
  assumes "environment ψ U s"
    and "f. f  U  i. n. s (f  n) ↓= i"
    and "f n. f  U  kn. ψ (the (s (f  n))) k = f k"
  shows "learn_cons ψ U s"
proof (rule learn_consI)
  show "environment ψ U s"
    and cons: "f n. f  U  kn. ψ (the (s (f  n))) k = f k"
    using assms by simp_all
  show "i. ψ i = f  (n. s (f  n) ↓= i)" if "f  U" for f
  proof -
    from that assms(2) obtain i n0 where i_n0: "nn0. s (f  n) ↓= i"
      by blast
    have "ψ i x = f x" for x
    proof (cases "x  n0")
      case True
      then show ?thesis
        using i_n0 cons that by fastforce
    next
      case False
      moreover have "kx. ψ (the (s (f  x))) k = f k"
        using cons that by simp
      ultimately show ?thesis using i_n0 by simp
    qed
    with i_n0 show ?thesis by auto
  qed
qed

lemma learn_consE:
  assumes "learn_cons ψ U s"
  shows "environment ψ U s"
    and "f. f  U  i n0. ψ i = f  (nn0. s (f  n) ↓= i)"
    and "f n. f  U  kn. ψ (the (s (f  n))) k = f k"
  using assms learn_cons_def learn_lim_def by auto

lemma learn_cons_wrt_goedel:
  assumes "goedel_numbering χ" and "learn_cons ψ U s"
  shows "t. learn_cons χ U t"
  using learn_cons_def assms
    learn_lim_extra_wrt_goedel[where ?extra="λU f n h. kn. h k = f k"]
  by auto

lemma CONS_wrt_phi_eq_CONS: "CONS_wrt φ = CONS"
  using CONS_wrt_def CONS_def learn_cons_wrt_goedel goedel_numbering_phi
  by blast

lemma learn_cons_closed_subseteq:
  assumes "learn_cons ψ U s" and "V  U"
  shows "learn_cons ψ V s"
  using assms learn_cons_def learn_lim_closed_subseteq by auto

lemma CONS_closed_subseteq:
  assumes "U  CONS" and "V  U"
  shows "V  CONS"
  using assms learn_cons_closed_subseteq by (smt CONS_def mem_Collect_eq)

text ‹A consistent strategy cannot output the same hypothesis for two
different prefixes from the class to be learned.›

lemma same_hyp_different_init_not_cons:
  assumes "f  U"
    and "g  U"
    and "f  n  g  n"
    and "s (f  n) = s (g  n)"
  shows "¬ learn_cons φ U s"
  unfolding learn_cons_def by (auto, metis assms init_eqI)


subsection ‹TOTAL: Learning in the limit with total hypotheses›

text ‹Total learning in the limit requires the strategy to hypothesize
only total functions for prefixes from the class.›

definition learn_total :: "partial2  (partial1 set)  partial1  bool" where
  "learn_total ψ U s 
     learn_lim ψ U s 
     (fU. n. ψ (the (s (f  n)))  )"

definition TOTAL_wrt :: "partial2  partial1 set set" where
  "TOTAL_wrt ψ  {U. s. learn_total ψ U s}"

definition TOTAL :: "partial1 set set" where
  "TOTAL  {U. ψ s. learn_total ψ U s}"

lemma TOTAL_subseteq_LIM: "TOTAL  LIM"
  unfolding TOTAL_def Lim_def using learn_total_def by auto

lemma learn_totalI:
  assumes "environment ψ U s"
    and "f. f  U  i. ψ i = f  (n. s (f  n) ↓= i)"
    and "f n. f  U  ψ (the (s (f  n)))  "
  shows "learn_total ψ U s"
  using assms learn_lim_def learn_total_def by auto

lemma learn_totalE:
  assumes "learn_total ψ U s"
  shows "environment ψ U s"
    and "f. f  U  i n0. ψ i = f  (nn0. s (f  n) ↓= i)"
    and "f n. f  U  ψ (the (s (f  n)))  "
  using assms learn_lim_def learn_total_def by auto

lemma learn_total_wrt_goedel:
  assumes "goedel_numbering χ" and "learn_total ψ U s"
  shows "t. learn_total χ U t"
  using learn_total_def assms learn_lim_extra_wrt_goedel[where ?extra="λU f n h. h  "]
  by auto

lemma TOTAL_wrt_phi_eq_TOTAL: "TOTAL_wrt φ = TOTAL"
  using TOTAL_wrt_def TOTAL_def learn_total_wrt_goedel goedel_numbering_phi
  by blast

lemma learn_total_closed_subseteq:
  assumes "learn_total ψ U s" and "V  U"
  shows "learn_total ψ V s"
  using assms learn_total_def learn_lim_closed_subseteq by auto

lemma TOTAL_closed_subseteq:
  assumes "U  TOTAL" and "V  U"
  shows "V  TOTAL"
  using assms learn_total_closed_subseteq by (smt TOTAL_def mem_Collect_eq)


subsection ‹CP: Learning in the limit with class-preserving hypotheses›

text ‹Class-preserving learning in the limit requires all hypotheses
for prefixes from the class to be functions from the class.›

definition learn_cp :: "partial2  (partial1 set)  partial1  bool" where
  "learn_cp ψ U s 
     learn_lim ψ U s 
     (fU. n. ψ (the (s (f  n)))  U)"

definition CP_wrt :: "partial2  partial1 set set" where
  "CP_wrt ψ  {U. s. learn_cp ψ U s}"

definition CP :: "partial1 set set" where
  "CP  {U. ψ s. learn_cp ψ U s}"

lemma learn_cp_wrt_goedel:
  assumes "goedel_numbering χ" and "learn_cp ψ U s"
  shows "t. learn_cp χ U t"
  using learn_cp_def assms learn_lim_extra_wrt_goedel[where ?extra="λU f n h. h  U"]
  by auto

corollary CP_wrt_phi: "CP = CP_wrt φ"
  using learn_cp_wrt_goedel[OF goedel_numbering_phi]
  by (smt CP_def CP_wrt_def Collect_cong)

lemma learn_cpI:
  assumes "environment ψ U s"
    and "f. f  U  i. ψ i = f  (n. s (f  n) ↓= i)"
    and "f n. f  U  ψ (the (s (f  n)))  U"
  shows "learn_cp ψ U s"
  using assms learn_cp_def learn_lim_def by auto

lemma learn_cpE:
  assumes "learn_cp ψ U s"
  shows "environment ψ U s"
    and "f. f   U  i n0. ψ i = f  (nn0. s (f  n) ↓= i)"
    and "f n. f  U  ψ (the (s (f  n)))  U"
  using assms learn_lim_def learn_cp_def by auto

text ‹Since classes contain only total functions, a CP strategy is also
a TOTAL strategy.›

lemma learn_cp_imp_total: "learn_cp ψ U s  learn_total ψ U s"
  using learn_cp_def learn_total_def learn_lim_def by auto

lemma CP_subseteq_TOTAL: "CP  TOTAL"
  using learn_cp_imp_total CP_def TOTAL_def by blast


subsection ‹FIN: Finite learning›

text ‹In general it is undecidable whether a LIM strategy has reached
its final hypothesis. By contrast, in finite learning (also called ``one-shot
learning'') the strategy signals when it is ready to output a hypothesis. Up
until then it outputs a ``don't know yet'' value. This value is represented
by zero and the actual hypothesis $i$ by $i + 1$.›

definition learn_fin :: "partial2  partial1 set  partial1  bool" where
  "learn_fin ψ U s 
     environment ψ U s 
     (f  U. i n0. ψ i = f  (n<n0. s (f  n) ↓= 0)  (nn0. s (f  n) ↓= Suc i))"

definition FIN_wrt :: "partial2  partial1 set set" where
  "FIN_wrt ψ  {U. s. learn_fin ψ U s}"

definition FIN :: "partial1 set set" where
  "FIN  {U. ψ s. learn_fin ψ U s}"

lemma learn_finI:
  assumes "environment ψ U s"
    and "f. f  U 
      i n0. ψ i = f  (n<n0. s (f  n) ↓= 0)  (nn0. s (f  n) ↓= Suc i)"
  shows "learn_fin ψ U s"
  using assms learn_fin_def by auto

lemma learn_finE:
  assumes "learn_fin ψ U s"
  shows "environment ψ U s"
    and "f. f  U 
      i n0. ψ i = f  (n<n0. s (f  n) ↓= 0)  (nn0. s (f  n) ↓= Suc i)"
  using assms learn_fin_def by auto

lemma learn_fin_closed_subseteq:
  assumes "learn_fin ψ U s" and "V  U"
  shows "learn_fin ψ V s"
  using assms learn_fin_def by auto

lemma learn_fin_wrt_goedel:
  assumes "goedel_numbering χ" and "learn_fin ψ U s"
  shows "t. learn_fin χ U t"
proof -
  have env: "environment ψ U s"
    and fin: "f. f  U 
      i n0. ψ i = f  (n<n0. s (f  n) ↓= 0)  (nn0. s (f  n) ↓= Suc i)"
    using assms(2) learn_finE by auto
  obtain c where c: "c  " "i. ψ i = χ (the (c i))"
    using env goedel_numberingE[OF assms(1), of ψ] by auto
  define t where "t 
    λx. if s x  then None
         else if s x = Some 0 then Some 0
              else Some (Suc (the (c (the (s x) - 1))))"
  have "t  𝒫"
  proof -
    from c obtain rc where rc:
      "recfn 1 rc"
      "total rc"
      "x. c x = eval rc [x]"
      by auto
    from env obtain rs where rs: "recfn 1 rs" "x. s x = eval rs [x]"
      by auto
    then have "eval rs [f  n] " if "f  U" for f n
      using env that by simp
    define rt where "rt = Cn 1 r_ifz [rs, Z, Cn 1 S [Cn 1 rc [Cn 1 r_dec [rs]]]]"
    then have "recfn 1 rt"
      using rc(1) rs(1) by simp
    have "eval rt [x] " if "eval rs [x] " for x
      using rc(1) rs(1) rt_def that by auto
    moreover have "eval rt [x] ↓= 0" if "eval rs [x] ↓= 0" for x
      using rt_def that rc(1,2) rs(1) by simp
    moreover have "eval rt [x] ↓= Suc (the (c (the (s x) - 1)))" if "eval rs [x] ↓≠ 0" for x
      using rt_def that rc rs by auto
    ultimately have "eval rt [x] = t x" for x
      by (simp add: rs(2) t_def)
    with recfn 1 rt show ?thesis by auto
  qed
  have t: "t (f  n) ↓=
      (if s (f  n) = Some 0 then 0 else Suc (the (c (the (s (f  n)) - 1))))"
    if "f  U" for f n
    using that env by (simp add: t_def)
  have "learn_fin χ U t"
  proof (rule learn_finI)
    show "environment χ U t"
      using t by (simp add: t  𝒫 env goedel_numbering_P2[OF assms(1)])
    show "i n0. χ i = f  (n<n0. t (f  n) ↓= 0)  (nn0. t (f  n) ↓= Suc i)"
      if "f  U" for f
    proof -
      from fin obtain i n0 where
        i: "ψ i = f  (n<n0. s (f  n) ↓= 0)  (nn0. s (f  n) ↓= Suc i)"
        using f  U by blast
      let ?j = "the (c i)"
      have "χ ?j = f"
        using c(2) i by simp
      moreover have "n<n0. t (f  n) ↓= 0"
        using t[OF that] i by simp
      moreover have "t (f  n) ↓= Suc ?j" if "n  n0" for n
        using that i t[OF f  U] by simp
      ultimately show ?thesis by auto
    qed
  qed
  then show ?thesis by auto
qed

end