section ‹Introduction and Definition› theory Universal_Hash_Families imports "HOL-Probability.Independent_Family" begin text ‹Universal hash families are commonly used in randomized algorithms and data structures to randomize the input of algorithms, such that probabilistic methods can be employed without requiring any assumptions about the input distribution. If we regard a family of hash functions from a domain $D$ to a finite range $R$ as a uniform probability space, then the family is $k$-universal if: \begin{itemize} \item For each $x \in D$ the evaluation of the functions at $x$ forms a uniformly distributed random variable on $R$. \item The evaluation random variables for $k$ or fewer distinct domain elements form an independent family of random variables. \end{itemize} This definition closely follows the definition from Vadhan~\<^cite>‹‹\textsection 3.5.5› in "vadhan2012"›, with the minor modification that independence is required not only for exactly $k$, but also for \emph{fewer} than $k$ distinct domain elements. The correction is due to the fact that in the corner case where $D$ has fewer than $k$ elements, the second part of their definition becomes void. In the formalization this helps avoid an unnecessary assumption in the theorems. The following definition introduces the notion of $k$-wise independent random variables:› definition (in prob_space) k_wise_indep_vars where "k_wise_indep_vars k M' X I = (∀J ⊆ I. card J ≤ k ⟶ finite J ⟶ indep_vars M' X J)" lemma (in prob_space) k_wise_indep_vars_subset: assumes "k_wise_indep_vars k M' X I" assumes "J ⊆ I" assumes "finite J" assumes "card J ≤ k" shows "indep_vars M' X J" using assms by (simp add:k_wise_indep_vars_def) lemma (in prob_space) k_wise_indep_subset: assumes "J ⊆ I" assumes "k_wise_indep_vars k M' X' I" shows "k_wise_indep_vars k M' X' J" using assms unfolding k_wise_indep_vars_def by simp text ‹Similarly for a finite non-empty set $A$ the predicate @{term "uniform_on X A"} indicates that the random variable is uniformly distributed on $A$:› definition (in prob_space) "uniform_on X A = ( distr M (count_space UNIV) X = uniform_measure (count_space UNIV) A ∧ A ≠ {} ∧ finite A ∧ random_variable (count_space UNIV) X)" lemma (in prob_space) uniform_onD: assumes "uniform_on X A" shows "prob {ω ∈ space M. X ω ∈ B} = card (A ∩ B) / card A" proof - have "prob {ω ∈ space M. X ω ∈ B} = prob (X -` B ∩ space M)" by (subst Int_commute, simp add:vimage_def Int_def) also have "... = measure (distr M (count_space UNIV) X) B" using assms by (subst measure_distr, auto simp:uniform_on_def) also have "... = measure (uniform_measure (count_space UNIV) A) B" using assms by (simp add:uniform_on_def) also have "... = card (A ∩ B) / card A" using assms by (subst measure_uniform_measure, auto simp:uniform_on_def)+ finally show ?thesis by simp qed text ‹With the two previous definitions it is possible to define the $k$-universality condition for a family of hash functions from $D$ to $R$:› definition (in prob_space) "k_universal k X D R = ( k_wise_indep_vars k (λ_. count_space UNIV) X D ∧ (∀i ∈ D. uniform_on (X i) R))" text ‹Note: The definition is slightly more generic then the informal specification from above. This is because usually a family is formed by a single function with a variable seed parameter. Instead of choosing a random function from a probability space, a random seed is chosen from the probability space which parameterizes the hash function. The following section contains some preliminary results about independent families of random variables. Section~\ref{sec:carter_wegman} introduces the Carter-Wegman hash family, which is an explicit construction of $k$-universal families for arbitrary $k$ using polynomials over finite fields. The last section contains a proof that the factor ring of the integers modulo a prime ideal is a finite field, followed by an isomorphic construction of prime fields over an initial segment of the natural numbers.› end