Shared Memory – Consistency of Shared Variables

The ideal picture of shared memory:

The actual architecture of shared memory systems:

**Symmetric Multi-Processor (SMP):**

**Distributed Shared Memory (DSM):**
**The Million $s Question:**
How/When Does One Process Read Other Process’s Writes?

Assumption: Initial value of shared variables is always 0.

Why is this a question? Because temporal order relations like “before/after” do not necessarily hold in a distributed system.
Why Memory Model?

Answers the question: “Which writes by a process are seen by which reads of the other processes?”
Memory Consistency Models

Example program:  

**Pi:** R V; W V,7; R V; R V  
**Pj:** R V; W V,13; R V; R V

A consistency/memory model is an “agreement” between the execution environment (H/W, OS, middleware) and the processes. Runtime guarantees to the application certain properties on the way values written to shared variables become visible to reads. This determines the memory model, what’s valid, what’s not.

Example execution:  

**Pi:** R V,0; W V,7; R V,7; R V,13  
**Pj:** R V,0; W V,13; R V,13; R V,7

Order of writes to V as seen to Pi: (1) W V,7; (2) W V,13  
Order of writes to V as seen to Pj: (1) W V,13; (2) W V,7
Memory Model: Coherence

Coherence is the memory model in which (the runtime guarantees to the program that) writes performed by the processes for every specific variable are viewed by all processes in the same full order.

Example program: All valid executions under Coherence:

<table>
<thead>
<tr>
<th>Pi:</th>
<th>Pj:</th>
</tr>
</thead>
<tbody>
<tr>
<td>W V,7</td>
<td>W V,13</td>
</tr>
<tr>
<td>R V</td>
<td>R V</td>
</tr>
<tr>
<td>R V</td>
<td>R V</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Pi:</th>
<th>Pj:</th>
</tr>
</thead>
<tbody>
<tr>
<td>W V,7</td>
<td>W V,13</td>
</tr>
<tr>
<td>R V,7</td>
<td>R V,13</td>
</tr>
<tr>
<td>R V,7</td>
<td>R V,7</td>
</tr>
</tbody>
</table>

The Register Property: the view of a process consists of the values it “sees” in its reads, and the writes it performs. If a R V in P which is later than W V,x in P sees value different than x, then a later R V cannot see x.
Formal definition of Coherence

Program Order: The order in which instructions appear in each process. This is a *partial order* on all the instructions in the program.

A serialization: A *full order* on all the instructions (reads/writes) of all the processes, which is consistent with the program order.

A legal serialization: A serialization in which each *read X* returns the value written by the latest *write X* in the full order.

Let $P$ be a program; let $P_X$ be the “sub-program” of $P$ which contains all the *read X/write X* operations on $X$ only.

Coherence: $P$ is said to be *coherent* if for every variable $X$ there exists a legal serialization of $P_X$. (Note: a process cannot distinguish one such serialization from another for a given execution)
## Examples

### Coherent. Serializations:

- **x**: write x,1, read x,1
- **y**: write y,1, read y,1

### Not Coherent.

- Cycle of dependencies.
- Cannot be serialized.

---

**Process 1**

- read x,1
- write y,1

**Process 2**

- read y,1
- write x,1

---

**Process 1**

- write x,1
- write x,2

**Process 2**

- read x,2
- write x,1

---

**Process 1**

- write x,1
- write x,2

**Process 2**

- read x,2
- read x,1
Sequential Consistency [Lamport 1979]

Sequential Consistency is the memory model in which all reads/writes performed by the processes are viewed by all processes in the same full order.

Process 1
- read x,1
- write y,1

Process 2
- read y,1
- write x,1

Coherent.
Not Sequentially consistent.

Process 1
- write x,1
- write y,1

Process 2
- read y,1
- read x,0

Coherent.
Not Sequentially consistent.
Strict (Strong) Memory Models

**Sequential Consistency:**
Given an execution, there exists an order of reads/writes which is consistent with all program orders.

**Coherence:**
For any variable \( x \), there exists an order of read \( x \)/write \( x \) consistent with all p.o.s.
Formal definition of Sequential Consistency

Let $\mathcal{P}$ be a program.

**Sequential Consistency:** $\mathcal{P}$ is said to be *sequentially consistent* if there exists a legal serialization of all reads/writes in $\mathcal{P}$.

**Observation:** Every program which is sequentially consistent is also coherent.

**Conclusion:** Sequential Consistency has *stronger requirements* and we thus say that it is *stronger* than Coherence.

**In general:** A consistency model $\mathcal{A}$ is said to be (strictly) stronger than $\mathcal{B}$ if all executions which are valid under $\mathcal{A}$ are also valid under $\mathcal{B}$. 
The problem of strong consistency models

The runtime system should ensure the existence of legal serialization, and the same consistent view for all processes.

This requires lots of expensive coordination $\rightarrow$ degrades performance!

\[
\begin{array}{ll}
\text{P1:} & \text{P2:} \\
\text{Print(U)} & \text{Print(V)} \\
\text{Write V,1} & \text{Write U,1}
\end{array}
\]

SC: Hardware cannot reorder locally in each thread for this will result in a possible printing 1,1. HW may reorder anyway and postpone writes, but then why reorder in the first place?
Coherence Forbids Reordering

Once thread sees an update – cannot “forget” it has seen it.

→ Cannot reorder two reads of the same memory location.
Coherence makes reads prevent common compiler optimizations.

p and q might point to the same object
p.x = 0

p.x = 1
a = p.x
b = q.x
c = p.x

assert(p == q  \(\Rightarrow\)  a \leq b \leq c)

reads can make a process see writes by another process. The read “kills” later reuse of local values.
Release Consistency
[Gharachorloo et al. 1990, DASH]

- Introduces a special type of variables, called *synchronization variables* or *locks*.
- Locks cannot be read or written. They can be *acquired* and *released*, denoted `acquire(L)` and `release(L)` for a lock $L$.
- A process that acquired a lock $L$ but has not released it, *holds* it.
- No more than one process can hold a lock $L$, while others wait.

(Almost Formal) Definition of Release Consistency

Assuming atomic read/write/acquire/release (never the case. For simplicity only)

(A) Before a read or write access is allowed to perform, all preceding (program order) acquire accesses must be performed, and

(B) Before a release access is allowed to perform, all preceding (program order) read or write accesses must be performed, and

(C) acquire and release accesses are sequentially consistent.
Understanding RC

From this point in this execution all processes must see the value 1 in X.

According to rule (B): 1 is read in the current execution. However, the programmer cannot be sure 1 will be read in all executions.

According to rules (A) and (C), the programmer knows that in all executions this read returns 1.

It is undefined what value is read here. It can be any value written by some process. Here it can be 0 or 1.

•A
  w(x)1
  rel(L₁)

•B
  r(x)0
  r(x)?
  r(x)1
  acq(L₁)
  r(x)1
Acquire and Release

- **release** serves as a *memory-synch* operation, or a *flush* of the local modifications to all other processes.
- **acquire** and **release** are not only used for *synchronization of execution*, but also for *synchronization of memory*, i.e. for propagation of writes from/to other processes.
  - This allows to overlap the two expensive types of synchronization.
  - This turns out also simpler on the programmer from semantic point of view.
Acquire and Release (cont.)

• A release followed by an acquire of the same lock guarantees (the programmer) that all writes previous to the release will be seen by all reads following the acquire.
• The idea is to let the programmer decide which blocks of operations need be synchronized, and put them between matching pair of acquire-release operations.
• In the absence of release/acquire pairs, there is no assurance that modifications will ever propagate between processes.
Happened-Before relation induced by acquire/release

A

w(x) rel(L1) w(x) acq(L2) r(y)

B

w(y) rel(L2) r(x) r(x) rel(L1)

C

acq(L1) r(y) w(y) acq(L2) rel(L2)
RC Memory Model: Transitivity

The process C sees the modification of $x$ by A.
Competing Accesses

• Two memory accesses are *not synchronized* if they are independent events according to the previously defined happened-before relationship.

• Two memory accesses are *conflicting* if they are accesses to the same memory location, and at least one of them is a *write*.

• Conflicting accesses are said to be *competing* if there exists an execution in which they are not synchronized.

• Competing accesses form a *race condition* as they may be executed concurrently.
Data Races in RC

Release Consistency does not guarantee anything about ordered propagation of updates

Initially: grades = oldDatabase; updated = false;

• Thread T.A.
  grades = newDatabase;
  updated = true;

• Thread Lecturer
  while (updated == false);
  X:=grades.gradeOf(lecturersSon);

• If the modification of variable updated is passed to Lecturer while the modification of grades is not, then Lecturer looks at the old database!

• This is possible in Release Consistency (and Coherence), but not in Sequential Consistency.
Expressiveness of Release Consistency
[Gharachorloo et.al 1990]

**Theorem:** RC = SC for programs having no data-races.

Given a data-race-free program P, the set of valid executions for P is the same on systems providing RC and those providing SC.

Conclusion (OpenMP, Java, C++, etc):
- System provide RC (performance)
- Programmer avoid data-races (program verification)
- ➔ Best of both worlds!
Implementation of RC

• Satisfying the happened-before relation between all operations is enough to satisfy RC.
  – Maintenance and usage of such a detailed ordering would be expensive.

• Instead, the ordering is applied to process intervals.
  – Intervals are segments of time in the execution of a single process.
  – New interval begins each time a process executes a synchronization operation.
Happened-before of Intervals

A happened before partial order is defined between intervals.

An interval $i_1$ precedes an interval $i_2$ according to *happened-before of intervals*, if all accesses in $i_1$ precede accesses in $i_2$ according to the *happened-before of accesses*. 
Vector Timestamps

• An interval is said to be performed at a process if all interval’s accesses have been performed at that process.

• Each process $p$ has vector timestamp $V_p$ that tracks which intervals have been performed at that process.
  – A vector timestamp consists of a set of interval indices, one per process in the system.
Lazy Release Consistency
[Keleher et al., Treadmarks 1992]*

- Postpone modifications until remote process “really” needs them
- More relaxed than RC

Formal Definition of Lazy Release Consistency

(A) Before a read or write access is allowed to perform with respect to any other process, all previous acquire accesses must be performed with respect to that other process, and

(B) Before a release access is allowed to perform with respect to any other process, all previous read or write accesses must be performed with respect to that other process, and

(C) acquire and release accesses are sequentially consistent.
Understanding the LRC Memory Model

- It is guaranteed that the acquirer of the same lock sees the modification that precede the release in program order.