(* Copyright (C) 2007--2010 Norbert Schirmer * All rights reserved, DFKI GmbH *) theory ReduceStoreBuffer imports Main begin (* Basic access policy: shared + owned by some thread (only the owner is allowed to write): - owner: LMS, volatile writes and arbitrary reads; - others: volatile reads shared + not owned by any thread: - read/write: LMS, volatile writes and volatile reads - read-only: arbitrary reads The core argument in the simulation proof is to show that all reads can be executed in the virtual machine, preserving the values they have seen in the store-buffer machine. * Non volatile reads * non-volatile means that the tread knows everything about the evolution of the value. It is thread local or read only. Writes in other store buffers may not interfere with non-volatile reads. Complicated case: i: [Load (read-only) a (m a)] j: [Acq {a}, Write a v] This may only be sceduled such that i comes first and then j. If Acq {a} already happend, safety of the Load would be violated. So when i is poised to Load, and j has already executed on the virtual machine we have a safety violation. On the other hand, if i has already executed on the virtual machine ("m a" has ben seen) and the Acq of j is fine. Not that read-only loads (before the first volatile store) in the store buffer may become stale before they hit memory. * volatile reads * The value seen can depend on writes of other threads. That is why the flushing policy enforces the store-buffer to be empty, if there is an outstanding volatile write in the store-buffer. The volatile reads must be executed in the virtual machine at the time they are issued to the store-buffer, to ensure that both machines see the same value. It is perfectly valid that volatile reads in the store buffer become stale, e.g. i: [VLoad a (m a)] j: [VWrite a v] As j's write is committed to memory the (history-)entry in i's store-buffer becomes stale. But as the virtual machine has also already executed the load this is fine. Complicated case: i: [VLoad a (m a)] j: [Write a v, Release {a}] As i sees (m a) and not v it has to come first. But if j has already executed and released a, there is no safety-violation when i is poised to execute the VLoad afterwards. This would lead to the wrong value (v instead of "m a", but we cannot rule out the scheduling by a safety argument. Currently we avoid this situation, by coupling Releases to volatile Writes. Hence the Release is not yet executed in the virtual machine and thus VLoad of i is unsafe. Another solution would be to go back in time, to the state before Release {a} is executed, and argue that this scheduling is also valid and would lead to a safeness violation and thus must be disregarded. Some thoughts.: I think some key to free-flowing release story is the following: flushed suspended / \ / \ i: […, … , Write True,…,…] j: […, … , Write True,…,…] \ / \ / races races ruled ok out by safety Races in the flushed part are ok, since they are already simulated in the virtual machine. The problem is how to formulate the invariants and all the stuff that this works out in the proof. E.g. i: [Write a v, Release {a}] j: poised: Rread a So the idea doesn't seem to work. It may only work as long as no write is involved. (i.e. Release / Acq) FIXME: shared: does not seem to buy anything right now, as owner has to do volatile reads and writes. (It should be possible that the owner does non-volatile reads, but as he is the only writer reading makes no sense) LMS: maybe store restriction could be relayed to owned + shared, if there is no sharing anymore there are no restrictions. What could be better is read-only memory which can be read accessed non-volatily. a: either owned by some thread; either unowned by any thread and read/write; either unowned by any thread and read-only:a but how does memory become read-only? Free-Flowing releases: Take proof as is, and augment machnes with set of released since last flush state. xsafety: nobody is allowed to acquire / write to released things: Second: proof that safety ==> xsafety for virtual machine (by induction on ->*, and an initial configuration). *) (* FIXME, think about this: Introduce additional control state for the direct-machine to keep track of whether there is an outstanding volatile store, and the set of acquired variables since the last flush. Add this protocol to the 'good'-stuff. This should make the enough-flushs invariant unnecessary, and it would make it more straightforward to introduce state-dependencies to A,L,R (maybe from the temporaries and ghost state?). If state dependencies make sense depends on the integration into PIMP. A stricter separation of ghost and ordinary state would be nice. shared: global ghost state owned: local ghost state. SB-Machine independent of ghost state. D-Machine uses all of it. SBH-Machine uses it only as history information ## notes: ernie 3. July 2009 ################################################################# final theorem: safe-free-flowing ⟹ sb-machine is sequential consistent we basically have (modulo dummy delay) : safe-delayed ⟹ sb-machine is sequential consistent need theorem safe-free-flowing ⟹ safe-dalayed prf by contraposition and induction on execution: S unsafe-dalayed ⟹ S unsafe free-flowing induction on S: let t be last step of T (owning the disuputed address) by Lemma: ∃S'. |S'| < |S| and S' ∘ t ∼ S and S' reachable case: t = release (L) then S' is unsafe free flowing ⟹ S unsafe free flowing otherwise: S' unsafe delayed by induction S' unsafe free flowing ⟹ S unsafe free flowing Lemma: S is reachable and last step of thread T is t, which is not a volatile write then ∃S' reachable. |S'| < |S| ∧ S' ∘ t ∼ S (∼: states are equal modulo stuff owned by T) proof * S = <>, tivial. * S = S⇩1 ∘ t, trivial. * S = S⇩1 ∘ u, (u ≠ t) (induction): ∃S⇩1' ∼ S⇩1 ∘ t moreover: t ∘ u ∼ u ∘ t ultimately: S = S⇩1 ∘ u ∼ S⇩1' ∘ t ∘ u ∼ S⇩1' ∘ u ∘ t ∼ S⇩1 ∘ t 6.10.2009: I don't think this proof works. The problem are volatile reads. We cannot just reschedule them to the end, since the memory content may well be changed by intermediate volatile writes of other threads. It might be possible to come up with another lemma that says that we can delete the last action thread T (if its not a volatile write), since this operation should not have an effect on the following ops of the other threads. *) (* ****************** 2.10.2009 *************************** *) (* Some new thoughts to introduce free-flowing releases in current proof scheme: * introduce safety notion on store-buffer machine as well: safe_sb * safe_sb takes also store-buffer content into accout, e.g. I'm not allowed to acquire what is acquired in someone elses sb. * make safe_sb a part of invariant on sb machine. * to show that safe_sb is preserved builds on safe_reach of virtual machine * safe_sb makes it possible to deal with releases in prefixes: aforementioned complicated case is not safe_sb, as a is still owned (and not shared) by j i: [VLoad a (m a)] j: [Write a v, Release {a}] what about: i: [VLoad a (m a)] j: [VWrite a v, Release {a}] (a is owned by j but is also shared) j's sb is suspended. Hence v has not yet made it to memory. Hence i sees (m a) as expected. Examples: preservation of safe_sb config 1 i: Acq a j: Acq a is safe_sb (if nowbody owns a) config 2: i: Acq a j: [Acq a] should not be safe_sb. rule out by safe_reach on config 1. fast-forward either sb of i or j and execute the Acq a. Then the remaining Acq a is unsafe. Important for safe_sb is, that for the current thread, the ghost-state (on ownership) is after executing its sb! Otherwise one has problems with quite simple things, e.g. a step (1) i: Acq a; Write a; owns = {} (2) i: Write a; [Acq a] owns = {} assume (1) is safe_sb; Why should (2) be safe_sb? It is if we consider acquire [Acq a] owns = {a} instead. And we know that it has to be acquired somewhere in the store-buffer by exploiting safe_reach. Maybe its better not to use safe_sb on the sb-state but on the state of the virtual machine. Better in the sense that it is easier to connect it with safe_reach. ### 5.10 ### Can't get safe_sb inductive. consider: i: [Rel a] j: someting; Acq a; [] this is still safe_sb; but after one step where something is executed I have i: [Rel a] j: Acq a; [] this should not be safe_sb; but I have no arguments at hand. ### next idea ### The control flow (and the reads) of a thread do not depend on the 'flush' parts of *other* threads, but can depend on their own flush parts. Moreover the information of the store buffer machine is not enough to construct a virtual execution that will violate safety. We have to go back in time even further. An virtual machine that just suspends everything is not inductive according to the rules of virtual machine steps: e.g. State 1) m i: [VRead a (m a)] j: [VWrite a 5] State 2) m(a:=5) i: [VRead a (m a)] j: [] State 1 could still be executed on a virtual machine that has i and j suspended, but State 2 not (since a is already visible). What could help is we just take an extra starting config for the virtual machine for which we have: c⇩s⇩b ∼ c⇩v c⇩s⇩b ∼i c⇩i; (we get virtual config c⇩i by flushing (only) store buffer i until first volatile write, all other buffers are suspended) c →* c⇩v ∀i. c →* c⇩i safe_reach c; we show: c⇩s⇩b → c⇩s⇩b' ⟹ c⇩v →* c⇩v' (hence we immediately have inductivity of c with respect to steps from c⇩v : c →* c⇩v →* c⇩v') What about c⇩i → c⇩i' * for thread c⇩i it should work similar as for c⇩v' * if thread i takes a step then we also have to justify c⇩j → c⇩j'. when something enters the storebuffer i it is just stuttering (as i is just suspended) when something of thread i leaves the store-buffer it is also in the front of the instructions in i (since suspended) and may only depend on volatile writes in thread j, but those are also suspended. The next question is can we instantiate the theorem with some kind of initial state? For a state where the store-buffers are all empty we should have: c⇩s⇩b == c⇩v == c == c⇩i Hence we trivially have all reachability constraints, and safe_reach c == safe_reach c⇩v Problem with this approach: just suspending all other threads does not work. consider justification of thread i for step c⇩i → c⇩i', where thread j takes a step, namely a volatile read. volatile read can become stale in the store buffer (e.g. by a volatile write in thread i). Hence we cannot simulate this read. (Thats why we flush all store-buffers until first volatile write in c⇩v). i: [VWrite a 5] j: [VRead a 0] i's virtual view: i: VWrite a 5 j: VRead a 0 if i takes a step with Write a 5, we have a problem simulating j's stale read. j's virtual view: i: VWrite a 5 j: (Read has already happened) Other problem: i: [VWrite b 5] Acq a j: [Rel a; VRead b 0] Thread j has to execute before i to justify the read. But intuitively in c⇩i, we wanted to delay the 'Rel a' such that we can say 'Acq a' in thread i is unsafe. #### 08.10.2009 ################################## ### Refined Approach for Free Flowing Releases ### ################################################## General setup ------------- Two safety predicates: * safe_free_flowing: free flowing ghost releases in instruction stream * safe_delayed: release is delayed until next volatile write safe_delayed is a variant of our current model, where 'Ghost' can only acquire, and releases are coupled with volatile (or interlocked) writes. However we don't want to introduce 2 different instruction streams (one with releases and one without but with a proper annotation at the next volatile write). Instead we attempt to model safe-delayed with additional ghost state. Mainly a thread local set 'rel' of releases addresses (since the last volatile write). A volatile write resets this set, and a ghost release adds to this set, and safe_delayed may check these sets, whereas safe_free_flowing ignores these sets. Example: thread i poised for "Acquire A" safe_free_flowing: ∀j ≠ i. A ∩ owns⇩j = {} ("A not owned by others") safe_delayed: ∀j ≠ i. A ∩ (owns⇩j ∪ rel⇩j)= {} ("A not owned by others, or release not yet committed") (sanity check: safe_delayed is more restrictive than safe_free_flowing, i.e. safe_delayed c ⟹ safe_free_flowing c) safe_reach safe c ≡ ∀c'. c ⇒⇧* c' ⟹ safe c' safe_reach⇧n safe c ≡ ∀c'. c ⇒⇧n c' ⟹ safe c' Then we show two main theorems: Theorem 1 (Simulation): [|c⇩s⇩b ⇒⇩s⇩b c⇩s⇩b'; c⇩s⇩b ∼ c; safe_reach safe_delayed c|] ⟹ ∃c'. c⇩s⇩b' ∼ c' ∧ c⇩s⇩b ⇒⇧* c' Proof. Hopefully straightforward adoption of current proof. Qed. Theorem 1 can easily be extended to many steps c⇩s⇩b ⇒⇩s⇩b⇧* c⇩s⇩b' Definition: init c ("initial state") Intuition: state where safe_delayed and safe_free_flowing are in-sync That is: all rel⇩i = {}; Fact: init c ⟹ safe_free_flowing c ⟶ safe_delayed c; Theorem 2 (Safety): [|safe_reach safe_free_flowing c; init c|] ⟹ safe_reach safe_delayed c Proof. One basic ingredient is contraposition: from an unsafe_delayed computation we attempt to construct an unsafe_free_flowing computation. Scenario for intuition: thread i: … Rel {a} … | … VWrite thread j: … | … Acq {a} thread i is somewhere in the computation between its release and the next volatile write. thread j tries to Acq {a} inbetween. This is safe_free_flowing (as the release has already happened), but not safe_delayed. We want to argue that there is another scheduling of the global computation such that we also get a violation of safe_free_flowing. For thread i we know that there are no volatile writes after the release until we hit the violation of safe_delayed. Intuitively this means that the other threads 'do not depend' on what is computed inside thread i after the release. Note that the opposite is not the case. There can be volatile reads in thread i, which depend on (volatile) writes of other threads. e.g. thread i: … Rel {a} … … VRead b 5 … | … VWrite thread j: … | … VWrite b 6 … … Acq {a} This means that Read b 5 must come before VWrite b 6. In general this implies that we cannot just reorder all steps of thread i (beginning from Rel {a}) after the Acq {a} in thread j to construct an unsafe_free_flowing state. It suggests that we instead try to *remove* all steps from thread i (beginning from and including Rel {a}) from the global computation, and argue that the other threads (especially j) can still do their computations until we reach a violation of safe_free_flowing (at latest at the Acq {a}). There can be other violations before (but that is also fine) e.g. thread i: … Rel {a} … Acq {b}; Write b 10; Rel {b} … | … VWrite thread j: … | … VRead b 20 … … Acq {a} We attempt to construct an unsafe_free_flowing state for the conflict with respect to a. But while we remove instructions from thread i, we encounter the violation with respect to b. As any violation of safe_free_flowing is fine, this still fits into the proof. The proof in more detail. We do induction on the length of the computation. [|safe_reach⇧n safe_free_flowing c; init c|] ⟹ safe_reach⇧n safe_delayed c Case n=0 -------- From fact on "init c" we know that safe_free_flowing c ⟹ safe_delayed c; safe_reach⇧0 safe_free_flowing c <=> safe_free_flowing c ⟹ safe_delayed c <=> safe_reach⇧0 safe_delayed c Case n → n+1 ------------- Consider a trace: c(i) where c(0) = c; c(i) ⇒ c(i+1) for i <= n; if there would be an k ≤ n for which ¬ safe_delayed c(k) we have ¬ safe_reach⇧n safe_delayed c and by induction hypothesis also ¬ safe_reach⇧n safe_free_flowing c. So we have: ∀k ≤ n. safe_delayed c(k) ¬ safe_delayed c(n+1) Moreover we assume ∀k ≤ n + 1. safe_free_flowing c(k) (if we have ¬ safe_free_flowing c(k) we have ¬ safe_reach⇧n safe_free_flowing c and are finished) We do case distinction on '¬ safe_delayed c(n+1)'. Some cases are trivially ruled out because of 'safe_free_flowing c(n+1)'. Let us consider the case of 'safety violation due to an Acq A' We get two racing threads i,j. Let j be the one issuing the 'Acq A'. From ¬ safe_delayed c(n+1): A ∩ (owns⇩i ∪ rel⇩i) ≠ {} From safe_free_flowing c(n+1): A ∩ (owns⇩i) = {} hence there is an a ∈ A and a ∈ rel⇩i. Let k < n+1 be the index where thread i did its last step in the transition from : c(k) ⇒ c(k+1). So for c(k+1) … c(n+1) the thread configuration of i does not change. last step of thread i | c(0) ⇒ … ⇒ c(k-1) ⇒ c(k) ⇒ c(k+1) ⇒ c(k+2) ⇒ … ⇒ c(n+1) c(k) ⇒ c(k+1) can't be a volatile or interlocked write of thread i. Otherwise rel⇩i would be {} beginning at c(k+1) and thus there would be no a ∈ rel⇩i. (in general only ops, where we cannot assert rel={} in the post state,i.e. reads, non-volatile writes and ghost ops) We want to remove the step c(k) ⇒ c(k+1) from the computation (by undoing tread i's last step) and argue on replaying the rest of the computation: c(0) ⇒ … ⇒ c(k-1) ⇒ c'(k+1) ⇒ … ⇒ c'(n+1) At latest when we reach c'(n+1) we encounter a violation of safe_delayed c'(k) (from our initial race) This requires a LEMMA! As the length of the trace is now ≤ n (since step k is removed) we can use the induction hypothesis to obtain ¬ safe_reach⇧n safe_free_flowing c and hence ¬ safe_reach⇧n⇧+⇧1 safe_free_flowing c. On the LEMMA: We want to argue on the step's like c(k-1) ⇒ c'(k+1) and then continuing c'(k+1) ⇒ c'(k+2). We can always do case distinction on 'safe_delayed'. If 'not safe_delayed' of any config we are fine with the main proof. So for the lemma we can assume "safe_delayed" of the initial state. Consider: * (ts,S,m) ⇒ (ts',S',m') * ts⇩i = ts'⇩i (thread i does not change) * safe_delayed (ts,S,m) * (uts,uS,um) is obtained from ts by "undoing tread i's last step" - uts⇩j = ts⇩j (for j≠i) (- (uts,uS,um) ⇒ (ts,S,m) (by a step of thread i)) - safe_delayed (uts,uS,um) - ∀a ∉ uowns⇩i. um(a) = m(a) show: (uts,uS,um) ⇒ (uts',uS',um') where * uts⇩i = uts'⇩i * uts'⇩j = ts'⇩j (for j≠i) * ∀a ∈ uowns⇩i. um'(a) = um(a) ( this may not be necessary to know (since i only care about steps of other threads ) * ∀a ∉ uowns⇩i. um'(a) = m'(a) We want to extend this simulation to a trace c(0)… c(k) We either can simulate with a trace c'(0) … c'(k) and have safe_delayed c'(k) or we encounter an intermediate config ¬ safe_delayed c'(i) for (i ≤ k) (from which simulation may no longer be possible). LEMMA: Assume: * trace c(0) ⇒ … ⇒ c(k) * ∀l,m ≤ k. ts(l)⇩i = ts(m)⇩i (tread i does not change) * undo config for thread i: ut⇩i, uS, um (initial new configuration) Show: ∃trace c', x ≤ k. (c' simulate c up to step x, and all reached states are safe_delayed) * ∀i ≤ x. safe_delayed c'(i) * x < k ⟶ ¬ safe_delayed c'(x+1) * ∀l ≤ x. ts'(l)⇩i = ut⇩i (tread i does not change) * S'(0) = uS * m'(0) = um * ∀l ≤ x. ∀j≠i. ts'(l)⇩j = ts(l)⇩j (tread j is simulated) * ∀l ≤ x. ∀a ∉ uowns⇩i. m'(l)(a) = m(l)(a) * ∀l ≤ x. ∀a ∉ uowns⇩i. S'(l)(a) = S(l)(a) * ∀l,o ≤ x. ∀a ∈ uowns⇩i. S'(l)(a) = S'(o)(a) (sharing of thread i stays constant) (In case we reach the final state k, we know enough about preservation of the ownership and sharing ghost state to (re)construct the initial race in our main proof). ###################################################################### ### Thoughts on extensions ########################################### ###################################################################### There is a common ideom for concurrency control that we currently cannot deal with properly within our programming discipline, typically for acquiring like ops (like acquiring a lock) and releases: Acquire Barrier Barrier Release The Acquire is an interlocked write (or a volatile write followed by a barrier) but the release is an ordinary volatile write, where maybe there is a barrier in front of it (I think the reason for this is to prohibit certain compiler optimizations) but there is no barrier after it. That means in our model, that the release leaves the store-buffer-state dirty. And we would have to insert a flush before the next volatile read. This extra flush is currently not done. The intuitive reason behind this is, that it is totally ok if other threads can only observe the release with a delay, since they don't do any harm by waiting for the release (typically they just wait in an polling loop). Ideally what we would like in our model is: 1. We want to allow releasing writes (like lock release or leave-turnstile) to leave the store-buffer-state clean, such that subsequent volatile reads do not require a flush. 2. The idea why this works is that other threads that may depend on the release but have not yet seen the releasing write do nothing bad, but are basically in a polling - loop (like lock acquire or wait-for-turnstile). So when the releasing write is not yet out of the store-buffer the store-buffer machine my take an extra polling loop (compared to an sequential consistent execution, where the write is considered immediately flushed to memory). But any final state (where all store-buffers are empty) should still be reachable by an sequential consistent execution. For simulation 2. suggests that the waiting thread of the virtual machine should just stay in the same state as long as the sb-machine is captured in the polling loop. So the sb-machine takes extra turns, whereas the virtual machine does nothing. Formally, this is an issue for our model, since the sb-machine may read a lot of temporaries during the idle loop, all of which are not at all read in the virtual machine. So expressing the simulation relation seems quite odd. Another viewpoint is to introduce an intermediate (nondeterministic) program in the virtual machine, that always can descide and take an extra polling loop (without reading anything). This simulates the sb-program (modulo the extra reads to temporaries in the sb-program). In a second step we can argue that we can refine the intermediate program to the real program (all in the virtual machine) and still calculate the same *result*. For wait-for-turnstile the pooling loop is even harder to justify being irrelevant compared to a spin-lock-acquire. Wait-for-turnstile takes 2 samples and compares if they have changed. If not it waits and takes another sample, otherwise it continues. So we really have to be able to compare the 2 values and deduce something from it. So they can't be just regarded as arbitrary values. Spinlock-example: Thread i: (release l in sb) [l := 0 (release l)] Thread j: (trying to acquire) while (test&set l)... If we maintain our flushing policy, and treat the l:=0 as nonvolatile (to keep the status clean), the release is already visible in the virtual machine and thus the test&set succeeds emmediately, whereas in the sb-machine thraed j has to spin. Idea: 1. Releasing writes are treated akin to non-volatile writes with respect to flushing policy (maybe we give them an extra name at some point) 2. To justify the non-volatile status when we want to write to them they have to be owned and *unshared* => This currently prohibits other threads to even read from them, which is bad for a memory cell that is used for synchronisation. 3. To compensate for that we relax the safety restriction for volatile reads. => Besides the 'shared' info we introduce a 'last' info (or maybe it is somehow merged with the shared info) last:: address => value option With 'last' we store the last value written at the point we acquired the address in there (e.g. the 1 when acquiring a lock) Note that our ownership model hopefully guarantees that this is unique. There may be only one thread waking on the lock as long as the release is in its store-buffer. acquiring-write: l := 1; shared = old(shared)(l := None); owns = old(owns) + {l}; last = old(last)(l:=Some 1) releasing-write: l := 0; shared = old(shared)(l := Some True); owns = old(owns) - {l}; last=old(last) Volatile read's are now always allowed according to safety (maybe we introduce a flag to distinguish those liberated volatile reads from the things we allowed before (owned or shared)). The semantics of a volatile read is also changed to nondeterministically choose between either reading from memory or the last-info. Note that the semantics is *only* changed in the virtual machine. The sb machine stays the same. ------------------------- read a t → tmps(t := m a) last a = Some v ------------------------- read a t → tmps(t := v) (nondeterminism ensures that a program that always reads from memory is a refinement of our program) By reading from last we attempt to be able to simulate the sb machine taking extra loops at program points where the virtual machine could already proceed. Spin-Lock example: invariant i. m l = 0 ==> lock is free invariant ii. last l = Some v ==> v = 1 The virtual machine may always decide to take an extra loop (in case the lock was acquired at least once), alltough m l = 0; Note: * Values stored in last are always values that actually where (or will be) in memory at some point. Turnstiles: enter-turnstile x := x + 1; shared = old(shared)(x:= None); owns = old(owns) + {x}; last = old(last)(x := Some (m x + 1)); leave-turnstile x := x + 1; shared = old(shared)(x:= Some True); owns = old(owns) - {x}; last = old(last); wait-for turnstile takes a sample of x and when it is odd waits until x changes (by taking at least one other sample of x) invariant i. even (m x) ==> not in turnstile invariant ii. last x = Some v ==> odd(v) Note that for turnstiles we know that only one thread will increment the counter; (What do we need to know as invariants about the relation of last x and m x?) To see if a turnstile is cautious enough we have various cases: 1. both samples are from memory: should be safe (if turnstiles make sense at all) 2. first sample from last, second sample from memory: * first sample from last: we think thread is in turnstile as it is always odd * second sample from memory: if its the same as the first sample we continue waiting (safe) if not last was either outdated and we werent in the turnstile at all, or we have really left ts at least once (safe) 3. first sample from memory, second sample from last: * as there is only one writer to the ts, the sample from last is always at least the counter we have seen by the first sample (safe) 4. both samples from last * both sample are the same, we continue waiting (safe) * samples are different: As only one thread writes the ts, the 'last' samples also follow the program order, hence the thread has left the ts at least once (safe). Think-tank One might try to update 'last' not only on acquire, but simply every time a write actually hits memory. Thinking of this 'last' actually just is the memory of the store-buffer machine. To correctly update 'last' in the virtual machine is then difficult. Currently we maintain two values: the last one (being made interlocked or with a barrier) and the (most actual) one in the virtual machine. What about making last a set? Storing everything what happens between the acquire and the release (or flush)? It seems to be a history. So the general thing seems to be: history: address ⇒ value list; keeping track of all values (and their order) between two flushes (or interlocked things). A volatile read has to be able to deal with any value in the histroy, starting from the one the thread has read last. If we would introduce a history for each thread, we would kind of explicitely model store-buffers. This would be the 'ultimate solution' for TSO.: A histroy that should work: history:: address ⇒ (lower-bound,upper-bound,(cnt,value) set) history a = (lb,ub,V); (n,v) ∈ V ⟹ lb ≤ ub ∧ lb ≤ n ∧ n ≤ ub; * every write increases the upper-bound and adds the value; * every read increases the lower-bound (and removes all lower values) * every flush empties the set; write a v: h a = (lb,ub,V) ⟹ h'=h(a:=(lb,ub+1,V ∪ {(ub+1,v)})) read a v: h a = (lb,ub,V); (n,v) ∈ V; lb ≤ n ∧ n ≤ ub ⟹ h'=h(a:=(lb+1,ub,V )) flush a h'=h(a:=(0,0,{})) read a v (syntax: h ----------> h') The single global histroy should be fine for our case, since there is only one writer, and the reads are ordered (as they are volatile) and are in sync between the sb and the virtual machine. 26.11.2009 More thoughts on history: Currently we see 4 operations on the history: Read (for stale reads), Write (non-volatile) , Commit (fence), WriteCommit (volatile write and LMS) 1. A read of a thread that has an outstanding write to the same address in the store buffer, always gets this value and should not have influence on the lb of the history for other threads. 2. In the concurrency control we thought of up to now (spinlocks release, turnstiles, versioning writer) the only use of the lb for reads is to guarantee an order in which a thread sees updates when it really issues multiple reads. We currently do not need arguments across threads like (if thread i has already seen a certain value than thread j has to see a newer one). 1 and 2 suggest that it might be a good idea to make the lb's thread local. This gives us some nice properties: * a read only effects the thread local lb (and neiter the lb of another thread nor the history) * a write, commit, writecommit only effects the history and no lb. * write, commite,writecommit are completely deterministic * To easily model fences we take the thread id into the history. history:: address => (hb,i,V :: timestamp ⇒ value option) lb (per thread): address ⇒ lb selectors: * top (hb,i,V) = hb * thread (hb,i,V) = i * vals (hb,i,V) = V invariants: * ∀a hb x V. h a = (hb,x,V) ⟶ ∀t ∈ dom V. t ≤ hb * ∀a hb x V. h a = (hb,x,V) ⟶ hb ∈ dom V * for all threads: ∀a. lb a ≤ top (h a) * the memory value of the sb-machine and all the values in takeWhile (Not ∘ is_volatile_Write) are in the history of the virtual machine. * m⇩s⇩b a ∈ range (vals (h a)) * Write False a v ∈ set (takeWhile (Not ∘ is_volatile_Write)) sb ⟹ v ∈ range (vals (h a)) * The top-value in the history of the virtual machine is the memory value in the virtual machine: vals (h a) (top (h a)) = Some (m a) * Commit preserves top value h'=λa. let (hb,j,V) = h a in if i=j then (hb,i,V|`{hb}) else h a --------------------------------------- Commit i h h' h a = (hb,j,V) h'=h(a:= (hb+1,i,V(hb+1 ↦ v))) ---------------------------------------------------------- Write i a v h h' WriteCommit is simply first Write then Commit (in this order, to make sure that there is only one value in V) After a WriteCommit the value read is determined. (* not a transition just a read *) h a = (hb,j,V) V t = Some v lb ≤ t ---------------------------------------------- Read a v t lb h h The thread updates his lb: lb(a:=t); *) (* (\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*?\)) (\([^()]*\),\([^()]*\),\([^()]*\),\([^()]*\),\([^()]*\),\([^()]*\),\([^()]*\),\([^()]*\)) (\1,\2,\3,\4,\5,\6,\8) *) subsection ‹Memory Instructions› type_synonym addr = nat type_synonym val = nat type_synonym tmp = nat type_synonym tmps = "tmp ⇒ val option" type_synonym sop = "tmp set × (tmps ⇒ val)" ― ‹domain and function› locale valid_sop = fixes sop :: "sop" assumes valid_sop: "⋀D f θ. ⟦sop=(D,f); D ⊆ dom θ⟧ ⟹ f θ = f (θ|`D)" type_synonym memory = "addr ⇒ val" type_synonym owns = "addr set" type_synonym rels = "addr ⇒ bool option" type_synonym = "addr ⇒ bool option" type_synonym acq = "addr set" type_synonym rel = "addr set" type_synonym lcl = "addr set" type_synonym wrt = "addr set" type_synonym cond = "tmps ⇒ bool" type_synonym ret = "val ⇒ val ⇒ val"