Theory In_Out_Parameters_Ex

(*
 * Copyright (c) 2023 Apple Inc. All rights reserved.
 *
 * SPDX-License-Identifier: BSD-2-Clause
 *)

section "In-Out Parameters, Abstracting Pointers to Values"

theory In_Out_Parameters_Ex imports "AutoCorres" 
begin

consts abs_step:: "word32  word32  bool"

subsection ‹Overview›

text ‹We introduce a new phase in AutoCorres to replace pointer parameters by ‹in/out parameters›:
Instead of passing a pointer into a function we pass the initial value of the pointer 
(by dereferencing the pointer) into the function and return the value at the end of the function
as additional output. E.g. a function ‹void inc(unsigned *n)› becomes ‹unsigned inc(unsigned n)›.
The initial motivation for this phase was to get rid of explicit pointers to local variables 
and replace them by ordinary values instead.
This conversion has two main aspects:
 Function signatures may change: pointer parameters become value parameters and
  the function may return the value of the pointer at the end as an additonal
  return value (tupled with the ordinary return value). Functions with side effects on the heap
  may become pure functions on values by this transformation.
 As a result of the first transformation, pointers to local variables may be eliminated 
  and be replaced by bound variables. This means that conststack_heap_state.with_fresh_stack_ptr 
  disappears.

The new phase is called ‹IO› and is placed between L2 and HL. This means local variables are
already represented as bound variables in L2 and we still operate on a monolithic heap.

›

subsection ‹Building Blocks›

text ‹
What ingredients do we need for the abstraction relation in the refinement proof?
 Heaps: As we eliminate pointers, the original function manipulates the heap more 
  often then the resulting function. So we need a notion to relate the heap states and keep track
  of the relevant pointers. As the stack of local variable pointers is also modelled in the heap
  the stack free locations are also of interest. The heaps of the original and the resulting
  program may differ in these locations. The heap value of a stack free location should be 
  irrelevant for the program.

 Function signatures: The function signature changes, pointer parameters become ordinary values,
  the return value is tupled with output parameters.

In the following we refer to the original state (containing the heap) as terms and the
resulting / abstract state as termt.
As I first idea we want to relate states terms and termt such that the heaps are the
same except for some pointers we want to eliminate and the stack free locations. It would be natural
to describe this as a relation. However, it turns out that dealing with relations within the
refinement proof are hard to make admissible in order to support recursive functions. Having a
abstraction / lifting function (instead of a relation) from terms to termt makes admissibility
straightforward.

As a prerequisite we encourage the reader to consider the documentation about
pointers to local variables: 🗏‹pointers_to_locals.thy›

context heap_state
begin
subsubsection constframe

text ‹We want to express that certain portions of the heap (typing and values) are 'irrelevant',
in particular regarding (free) stack space. The notion of 'irrelevant' is a bit vague, it means
that the behaviour of the resulting program does not depend on those locations and also that it does not 
modify those locations. Moreover, we prefer an abstraction function rather than an 
relation between states to avoid admissibility issues for refinement.
The central property is termt = frame A t0 s, for termA being a set of addresses and
termt, termt0 and terms being states. Think of termA as the set of ‹allocated› addresses
containing those pointers we want to ‹abstract› aka. eliminate.›

thm frame_def

text ‹
The standard use case we have in mind is @{term "A  stack_free (htd s) = {}"}, hence
@{prop "A - stack_free (htd s) = A"}, but nevertheless the intuition is:
 stack free typing from terms is preserved, the framed sate has at least as many stack free
  addresses as the original one. So we can simulate any stack allocation.
 heap values for stack free and termA are taken from reference state termt0, this
  captures that we do not depend on the original values in terms for those addresses.
 typing for allocations in termA is taken from reference state termt0, this captures
  that we do not depend on the original typing in terms for those addresses.

By taking the same reference state termt0 to frame two states terms and terms',
we can express that the 'irrelevant' parts of the heap did not change in the respective
framed states termframe A t0 s and termframe A t0 s'.
›

subsubsection constrel_alloc, constrel_stack, constrel_sum_stack

text ‹The core invariant between the states that is maintained by the refinement is
termrel_alloc 𝒮 M A t0 s t:
 term𝒮 is a static set of addresses containing the stack. Stack allocation and stack release
  are contained within term𝒮.
 termM is the set of addresses that might be modified by the original program.
 termA is the set of addresses that are eliminated (abstracted) in the resulting program. Another 
  good intuition about the set termA is that the original function might read from and depend on 
  those addresses but the resulting function does not. Moreover, the resulting function does not
  change any of heap locations in termA. The resulting function is agnostic to those locations. 
  It neither reads nor modifies them.
  For any heap valuation of termA the abstract function simulates the original one.
 termt0 is the reference state explained above.
 terms is the state of the original program
 termt is the state of the resulting program.
›

thm rel_alloc_def [of 𝒮 M A t0]

text ‹
The properties of constrel_alloc are:
 termt = frame A t0 s, the resulting heap in termt is the same as the original in terms, 
  except for the addresses in termA and the stack free space.
 termstack_free (htd s)  𝒮: The stack free space is contained in term𝒮.
 termstack_free (htd s)  A = {}: We only want to eliminate properly allocated pointers, which
  are not contained in the stack free space.  
  termstack_free (htd s)  M = {}: We only mention properly allocated pointers to be modified. 
  Modifications within the stack free space are irrelevant, as they are abstracted by the framing
  anyways.
›
text ‹
For an original function termf and the abstract aka. lifted function termg the refinement
property has the following shape:
proprel_alloc 𝒮 M A t0 s t  refines f g s t (rel_stack 𝒮 M1 A s t0 (rel_xval_stack L R))

Note that the output relation of constrefines relates both the resulting states and return
values to each other. The return values of the error monad are related by constrel_sum_stack, taking a
relation termL for error values (termInl) and the termR for normal termination (termInr).

›
thm rel_sum_stack_def
thm rel_stack_def [of 𝒮 M1 A s t0 "rel_sum_stack L R"]

text ‹Consider the output states terms' and termt' and the output values termv and termw 
for the original and the abstracted function. Then we have:
 termrel_xval_stack L R (hmem s') v w: The result values are related, (see below).
 termrel_alloc 𝒮 M1 A t0 s' t': In particular termt' = frame A t0 s'. Note that we use
   the same termA and the same reference state termt0 as for the initial states. So this means
   that the abstract program termg does not change the heap values and typing in termA and does
   not depend on the values. 
 termequal_upto (M1  stack_free (htd s')) (hmem s') (hmem s): The original program only modifies
   pointers contained in termM1 or in the stack free portions of the heap 
    (e.g. by nested function calls).
 termequal_upto M1 (htd s') (htd s): Changes to the heap typing of the original program 
   are confined within set termM1.
 termequal_on 𝒮 (htd s') (htd s): The stack typing of the original program remains unchanged.
  Note that it might change temporarily by nested function calls but the stacking discipline
  restores the state.
By construction we have termM1  M.
›

paragraph ‹Stacking pointers: constrel_singleton_stack, constrel_push

text ‹
The relation on the output value captures the idea that portions of the heap are stacked / tupled 
into result values. On the boundaries of a function the relation on error values termL is always the
trivial termλ_. (=). Errors are terminal as there is no exception handling in C. Within
the boundaries of a function there can be more complex relations reflecting the nesting of
break / continue / return and goto. 
For ordinary values the values of pointers are tupled by constrel_singleton_stack, in case 
the original function returned no result (void), or nested by  constrel_push in case the
original function returned some result.
›
thm rel_singleton_stack_def
thm rel_push_def
text ‹
 A typical relation is termR = rel_singleton_stack p for some pointer termp. This means
that the abstract value can be obtained by looking into the heap at pointer termp:
termrel_singleton_stack p h () v means that termv = h_val h p. The heap of the original program
 is propagated down the into the relations.
Similar termrel_push p (λ_. (=)) h x (v, x) relates the result value termx to the tuple
term(v, x) where termv = h_val h p. Note that termrel_push can be nested represent multiple
output parameters.
Depending on the role of the pointers there are additional assumptions about the pointers, 
in particular things like termptr_span p  A or  termptr_span p  M and disjointness properties 
like termdistinct_sets [ptr_span p, ptr_span q].
›

subsubsection constIOcorres

text ‹
There is an additional predicate constIOcorres which represents the result of the refinement
proof in the canonical autocorres ideom that is used within the other phases. This form
is what is expected when constructing the final theorem combining all autocorres layers
@{thm ac_corres_chain_sims}. The termrefines version and the termIOcorres version can
be converted into each other by @{thm ac_corres_chain_sims}.
›
thm IOcorres_def
thm IOcorres_refines_conv
thm ac_corres_chain_sims
end

install_C_file "in_out_parameters.c"


subsection ‹Options›

text ‹
To control the in/out- parameter abstraction there are two options of autocorres:
 skip_io_abs› a boolean which is true› by default. This skips the complete phase for all 
  functions.
 in_out_parameters› which takes a list of in-out specifications for functions. As soon
  as there is at least one specification present the phase is enabled (taking
  precedence over skip_io_abs›. 

So to properly enable the IO phase you have two main choices to specify what should be done
either in @{command "init-autocorres"} or subsequent calls to @{command autocorres}. You can
only disable option skip_io_abs› in @{command "init-autocorres"} and then provide the
individual options for in_out_parameters› in subsequent @{command autocorres} invocations, or
you can already define all in_out_parameters› already in @{command "init-autocorres"}.

More details on the options are provided in the following examples.
›

subsection ‹Examples›