Theory Alpha_Beta_Overview

chapter ‹Overview›

(*<*)
theory Alpha_Beta_Overview
imports
  Alpha_Beta_Lattice
  LaTeXsugar2
begin
(*>*)

text‹
\section{Introduction}

Alpha-beta pruning is an efficient search strategy for two-player game trees.
It was invented in the late 1950s and is at the heart of most implementations
of combinatorial game playing programs. Most publications assume that the game values
are numbers augmented with $\pm\infty$. This generalizes easily to an arbitrary linear order
with $\bot$ and $\top$ values. We consider this standard case first. Later it was realized
that alpha-beta pruning can be generalized to distributive lattices.
We consider this case separately. In both cases we analyze two variants:
\emph{fail-hard} (analyzed by Knuth and Moore \cite{KnuthM75})
and \emph{fail-soft} (introduced by Fishburn~\cite{Fishburn80}).
Our analysis focusses on functional correctness, not quantitative results.
All algorithms operate on game trees of this type:
\begin{quote}
@{datatype tree}
\end{quote}

%During our investigations we also discovered that the implementation of alpha-beta pruning
%that Hughes \cite{Hughes89} derives is (probably) correct in that it computes the correct values,
%but it visits more nodes than alpha-beta pruning would, i.e.\ it is not alpha-beta pruning.
%Hughes was unaware of this when we contacted him but had Quickcheck confirm it.
%Verification is not a luxury.


\section{Linear Orders}

We assume that the type of values is a bounded linear order with $\bot$ and $\top$.
Game trees are evaluated in the standard manner via functions @{const maxmin} (the maximizer)
and the dual @{const minmax} (the minimizer).
\begin{quote}
@{fun maxmin}\medskip\\
@{fun minmax}\bigskip\\
@{fun maxs}\medskip\\
@{fun mins}
\end{quote}
The maximizer and minimizer functions are dual to each other. In this overview we will only show
the maximizer side from now on.


\subsection{Fail-Hard}

The fail-hard variant of alpha-beta pruning is defined like this:
\begin{quote}
@{fun ab_max}\medskip\\
@{fun [margin=70] ab_maxs}
\end{quote}
The intuitive meaning of @{term "ab_max a b t"} roughly is this:
search t›, focussing on values in the interval from a› to b›.
That is, a› is the maximum value that the maximizer is already assured of
and b› is the minimum value that the minimizer is already assured of (by the search so far).
During the search, the maximizer will increase a›, the minimizer will decrease b›.

The desired correctness property is that alpha-beta pruning with the full interval yields
the value of the game tree:
\begin{quote}
@{thm ab_max_bot_top} \hfill \eqnum{ab_max_bot_top}
\end{quote}
Knuth and Moore generalize this property for arbitrary a› and b›,
using the following predicate:
\begin{quote}
@{prop "knuth2 x y a b"} ≡›\\
@{prop [source] "((y  a  x  a) latex‹\\› (a < y  y < b  y = x) latex‹\\› (b  y  b  x))"}
\end{quote}
It follows easily that @{prop "knuth   x y"} implies x = y›.
(Also interesting to note is commutativity: @{thm knuth_comm}.)
We have verified Knuth and Moore's correctness theorem
\begin{quote}
@{thm knuth_val_ab(1)}
\end{quote}
which immediately implies (\ref{ab_max_bot_top}).


\subsection{Fail-Soft}

Fishburn introduced the fail-soft variant that agrees with fail-hard if the value is in between a› and b› but is more precise otherwise,
where fail-hard just returns a› or b›:
\begin{quote}
@{fun ab_max'}\medskip\\
@{fun [margin=70] ab_maxs'}
\end{quote}

Fishburn claims that fail-soft searches the same part of the tree as fail-hard
but that sometimes fail-soft bounds the real value more tightly than fail-hard
because fail-soft satisfies
\begin{quote}
@{thm fishburn_val_ab'(1)} \hfill \eqnum{fishburn_val_ab'}
\end{quote}
where ≤› is a strengthened version of ≅›:
\begin{quote}
@{prop "fishburn2 ab v a b"} ≡›\\
@{prop [source] "((ab  a  v  ab) latex‹\\› (a < ab  ab < b  ab = v) latex‹\\› (b  ab  ab  v))"}
%{abbrev [break,margin=40] "fishburn a b v ab"}
\end{quote}
We can confirm both claims. However, what Fishburn misses is that fail-hard already satisfies
@{const fishburn}:
\begin{quote}
@{thm fishburn_val_ab(1)}
\end{quote}
Thus (\ref{fishburn_val_ab'}) does not imply that fail-soft is better.
However, we have proved
\begin{quote}
@{thm fishburn_ab'_ab(1)}
\end{quote}
which does indeed show that fail-soft approximates the real value at least as well as fail-hard.

Fail-soft does not improve matters if one only looks at the top-level call with ⊥› and ⊤›.
It comes into its own when the tree is actually a graph and nodes are visited repeatedly from
different directions with different a› and b› which are typically not ⊥› and ⊤›.
Then it becomes crucial to store the results of previous alpha-beta calls in a cache and
use it to (possibly) narrow the search window on successive searches of the same subgraph.
The justification: Let ab› be some search function that @{const fishburn} the real value.
If on a previous call @{prop "b  ab a b t"}, then in a subsequent search of the same t›
with possibly different a'› and b'›, we can replace a'› by @{term "(max a' (ab a b t))"}:
\begin{quote}
@{thm [break,margin=70] ab_twice_lb}
\end{quote}
There is a dual lemma for replacing b'› by @{term "(min b' (ab a b t))"}.

We have a verified version of alpha-beta pruning with a cache, but it is not yet
part of this development.


\subsection{Negation}

Traditionally the definition of both the evaluation and of alpha-beta pruning
does not define a minimizer and maximizer separately. Instead it defines only one function
and uses negation (on numbers!) to flip between the two players. This is evaluation and the
fail-hard and fail-soft variants of alpha-beta pruning:
\begin{quote}
@{fun negmax}
\end{quote}
\begin{quote}
@{fun ab_negmax}\medskip\\
@{fun ab_negmaxs}
\end{quote}
\begin{quote}
@{fun ab_negmax'}\medskip\\
@{fun ab_negmaxs'}
\end{quote}
However, what does ``-›'' on a linear order mean? It turns out that the following two
properties of ``-›'' are required to make things work:
\begin{quote}
@{thm de_morgan_min} \qquad @{thm neg_neg}
\end{quote}
We call such linear orders \emph{de Morgan orders}. We have proved correctness of alpha-beta pruning
on de Morgan orders:
\begin{quote}
@{thm fishburn_negmax_ab_negmax(1)}\\
@{thm fishburn_val_ab_neg'(1)}\\
@{thm fishburn_ab'_ab_neg(1)}
\end{quote}


\section{Lattices}

Bird and Hughes \cite{BirdH87} were the first to generalize alpha-beta pruning from linear orders
to lattices. The generalization of the code is easy: simply replace @{const min} and @{const max}
by @{const inf} and @{const sup}. Thus, the value of a game tree is now defined like this:
\begin{quote}
@{fun supinf}\medskip\\
@{fun sups}
\end{quote}
And similarly fail-hard and fail-soft alpha-beta pruning:
\begin{quote}
@{fun ab_sup}\medskip\\
@{fun [margin=70] ab_sups}
\end{quote}
\begin{quote}
@{fun ab_sup'}\medskip\\
@{fun [margin=70] ab_sups'}
\end{quote}
It turns out that this version of alpha-beta pruning works for bounded distributive lattices.
To prove this we can generalize both ≅› and ≤› by first rephrasing them (for linear orders)
\begin{quote}
@{thm knuth_iff_max_min}

@{thm fishburn_iff_min_max}
\end{quote}
and then deriving from the right-hand sides new versions ≃› and ⊑› appropriate for lattices:
\begin{quote}
@{prop "eq_mod x y a b"} ≡› @{prop [source] "a  x  b = a  y  b"}

@{prop "bounded2 ab v a b"} ≡› @{prop [source] "b  v  ab  ab  a  v"}
\end{quote}
%Trivially {lemma "⊥ ⊔ x ⊓ ⊤ = ⊥ ⊔ y ⊓ ⊤ ⟹ x = (y::_::bounded_lattice)" by (simp)}.
%Bird and Hughes do not cite Knuth and Moore
As for linear orders, ⊑› implies ≃›, but not the other way around:
\begin{quote}
@{thm eq_mod_if_bounded}
\end{quote}
Both fail-hard and fail-soft are correct w.r.t.\ ⊑›:
\begin{quote}
@{thm bounded_val_ab(1)}\\
@{thm bounded_val_ab'(1)}
\end{quote}


\subsection{Negation}

This time we extend bounded distributive lattices to \emph{de Morgan algebras} by adding ``-›''
and assuming
\begin{quote}
@{thm de_Morgan_inf} \qquad @{thm neg_neg}
\end{quote}
Game tree evaluation:
\begin{quote}
@{fun negsup}
\end{quote}
Fail-hard alpha-beta pruning:
\begin{quote}
@{fun ab_negsup}\medskip\\
@{fun ab_negsups}
\end{quote}
Fail-soft alpha-beta pruning:
\begin{quote}
@{fun ab_negsup'}\medskip\\
@{fun ab_negsups'}
\end{quote}
Correctness:
\begin{quote}
@{thm bounded_negsup_ab_negsup}\\
@{thm bounded_val_ab'_neg(1)}
\end{quote}

›
(*<*)
end
(*>*)