## **CSE113: Parallel Programming** May 13, 2021

- **Topic**: Finish DOALL & Memory Consistency
  - DOALL schedules in OpenMP
  - Sequential Consistency
  - Total Store Order
  - Relaxed memory models

#### Announcements

- HW 3 is out:
  - ask questions on Piazza!
  - Thanks to those who are having good discussions!
  - Due date Friday May 21
- Midterm grades will be released today by midnight
  - Please ask questions within two weeks
- Guest lecture in 1 week!
  - Message passing concurrency and testing GPU compilers

#### Announcements

• Thanks to those who find typos; it helps improve the slides!

## Quiz

• Discuss Answers

## Schedule

- Parallel schedules in OpenMP
- Memory consistency models:
  - Total store order
  - Relaxed memory consistency
  - Examples

- We studied DOALL loops last week:
  - What is a DOALL loop?

```
for (int i = 0; i < SIZE; i++) {
    a[i] = b[i] + c[i];
}
```

```
for (int i = 0; i < SIZE; i++) {
    a[i] = b[i] + c[i+1];
}
```

```
for (int i = 0; i < SIZE; i++) {
    a[i] = b[i] + a[i+1];
}
```

- We talked about fairly complicated ways to implement parallelism over these loops
- But what if I told you there was an easier way?



## OpenMP

- Built on top of C, C++, and Fortran
- First released in 1997 (way before C++11 threads!)
  - Still widely used today, especially in HPC and ML
- Consists of:
  - pragma-based compiler directives
  - a runtime



- Many features
  - atomic RMWs
  - thread spawn and join
  - shared memory
- Perhaps best known for supporting parallel DOALL loops

## Why is it so popular?

```
for (int i = 0; i < SIZE; i++) {
    c[i] = a[i] + b[i];
}
```

parallelize a loop with one line!

code works with or without compiler support!

```
#pragma omp parallel for
for (int i = 0; i < SIZE; i++) {
    c[i] = a[i] + b[i];
}
```

You also have to add a flag to the compile line: `-fopenmp` (GCC and Clang)

# Let's try it out

### Customization in OpenMP pragmas

```
#pragma omp parallel for num_threads(N)
for (int i = 0; i < SIZE; i++) {
   c[i] = a[i] + b[i];
}
```

Setting the number of threads is useful for running scaling experiments or for reducing the load on the machine.

By default, OpenMP will try to saturate your machine.

### Customization in OpenMP pragmas

```
#pragma omp parallel for schedule(S,C)
for (int i = 0; i < SIZE; i++) {
   c[i] = a[i] + b[i];
}
```

Specify the parallel schedule with S. There are several options:

- static: evenly chunks iterations across cores
- dynamic: workstealing
- others: we won't get into them in this class

You can specify the chunk size with C.

If you leave out C, OpenMP picks a default chunk size: roughly SIZE divided by the number of threads for static, and 1 for dynamic.

```
#pragma omp parallel for num_threads(N) schedule(S,C)
for (int i = 0; i < SIZE; i++) {
   c[i] = a[i] + b[i];
}
```



```
#pragma omp parallel for num_threads(4) schedule(S,C)
for (int i = 0; i < SIZE; i++) {
   c[i] = a[i] + b[i];
}
```



```
#pragma omp parallel for num_threads(4) schedule(static,1)
for (int i = 0; i < SIZE; i++) {
   c[i] = a[i] + b[i];
}
```



```
#pragma omp parallel for num_threads(4) schedule(static,2)
for (int i = 0; i < SIZE; i++) {
   c[i] = a[i] + b[i];
}
```





# What about workstealing?

```
#pragma omp parallel for num_threads(4) schedule(dynamic)
for (int i = 0; i < SIZE; i++) {
   c[i] = a[i] + b[i];
}
```

What happens when we run this?

# What about workstealing?

What about a loop that has load imbalance? Recall this loop from the previous lecture:

```
#pragma omp parallel for num_threads(2) schedule(dynamic)
for (int x = 0; x < SIZE; x++) {
    // y is declared inside the loop so that each thread has its own copy
    for (int y = x; y < SIZE; y++) {
        a[x][y] = b[x][y] + c[x][y];
    }
}
```

Inner loop does a variable amount of work depending on the outer loop iteration

## OpenMP takeaways

- Great for DOALL loops!
  - Rapid experimentation for different schedules and parameters
- Dynamic schedules are expensive: use with caution
- Specification includes:
  - RMWs
  - Mutexes
- Widely used in HPC community

## Memory Consistency

- We have been very strict about using atomic types in this class
  - and their methods (`.load` and `.store`)
  - Why?
- Architectures do very strange things with memory loads and stores
- Compilers do too (but we won't talk too much about them today)
- C++ gives us sequential consistency if we use atomic types and operations
- Where do we remember sequential consistency from?

## Sequential consistency for atomic memory

- Let's play our favorite game:

Global variables: `atomic_int x(0); atomic_int y(0);`

| Thread 0      | Thread 1             |
|---------------|----------------------|
| `x.store(1);` | `int t0 = y.load();` |
| `y.store(1);` | `int t1 = x.load();` |

Is it possible for t0 == 0 and t1 == 1?

Yes! One sequentially consistent interleaving is: `int t0 = y.load();` (reads 0), `x.store(1);`, `y.store(1);`, `int t1 = x.load();` (reads 1).

Is it possible for t0 == 1 and t1 == 0?

No. If t0 == 1, the load of y ran after `y.store(1)`, which itself ran after `x.store(1)`, so the later load of x must return 1.

Another test:

Global variables: `atomic_int x(0); atomic_int y(0);`

| Thread 0             | Thread 1             |
|----------------------|----------------------|
| `x.store(1);`        | `y.store(1);`        |
| `int t0 = y.load();` | `int t1 = x.load();` |

Can t0 == t1 == 0?

No. If t0 == 0, the load of y ran before `y.store(1)`, so Thread 1's load of x runs after `x.store(1)` and must return 1.

- Plain atomic accesses are documented to be sequentially consistent (SC)
- Why wasn't SC very good for concurrent data structures?
  - Composability: two objects that are each SC might not be SC when used together
  - Programs contain only one shared memory, though, so there is no reason to compose different main memories

# What about ISAs?

- Remember, it is important for us to understand how our code executes on the architecture to write high-performing programs
- Let's think about x86
  - Instructions:
  - `mov %t0, [x]` loads the value at x into register t0
  - `mov [y], 1` stores the value 1 to memory location y

Global variables: `int x[1] = {0}; int y[1] = {0};`

| Thread 0       | Thread 1       |
|----------------|----------------|
| `mov [x], 1`   | `mov [y], 1`   |
| `mov %t0, [y]` | `mov %t1, [x]` |

Another test: can t0 == t1 == 0?

This is great for C++! What about this test in x86?

If we run this program on hardware, we would see the condition satisfied! What is going on?!

Each core sits behind its own store buffer, and both cores share one main memory (initially x:0 and y:0):

- Each thread executes its first instruction; the stored value goes into that core's store buffer, so main memory still holds x:0 and y:0
- Each thread executes its next instruction
- Values get loaded from main memory
- We see t0 == t1 == 0!

# Our first relaxed memory execution!

- Also known as weak memory behaviors
- An execution that is NOT allowed by sequential consistency
- A memory model that allows relaxed memory executions is known as a relaxed memory model
  - x86 has a relaxed memory model due to store buffering
  - If you restrict yourself to only the default atomic operations, C++ does NOT have a weak memory model
# Litmus tests

- Small concurrent programs that check for relaxed memory behaviors
- Vendors have a long history of under-documented memory consistency models
- Academics have empirically explored the memory models
  - Many vendors have unofficially endorsed academic models
  - x86 behaviors were documented by researchers before Intel!

## Litmus tests

This test is called "store buffering":

| Thread 0       | Thread 1       |
|----------------|----------------|
| `mov [x], 1`   | `mov [y], 1`   |
| `mov %t0, [y]` | `mov %t1, [x]` |

Can t0 == t1 == 0?

# Restoring sequential consistency

- Relaxed memory models typically provide special instructions that can be used to disallow weak behaviors.
- These instructions are called fences.
- The x86 fence is called `mfence`. It flushes the store buffer.





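Walking through the store buffering test again, this time with an `mfence` after the store in each thread:

- The `mfence` flushes the store buffer to main memory before the thread moves on
- Execute the next instruction
- Values are loaded from memory
- We don't get the problematic behavior: t0 == 0 and t1 == 0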


# Next example

Single thread, same address:

| Thread 0       |
|----------------|
| `mov [x], 1`   |
| `mov %t0, [x]` |

Possible outcomes: t0 == 1 or t0 == 0. Which one do you expect?

How does this execute? Core 0 has a store buffer, and main memory starts with x:0 and y:0.

- Execute the first instruction
- Store the value in the store buffer (x:1); main memory still holds x:0
- Next instruction: where to load from? The store buffer? Main memory?
- Threads check the store buffer before going to main memory. It is close and cheap to check.
- So the load returns the buffered value: t0 == 1

# Memory Consistency

- How to specify a relaxed memory model?
- Good time for a 5 minute break!
- We can do it operationally
  - by constructing a high-level machine and reasoning about operations through the machine
  - or we can talk about instructions that are allowed to "break" program order

Global variables: `int x[1] = {0}; int y[1] = {0};`

We will annotate instructions with S for stores and L for loads:

| Thread 0         | Thread 1         |
|------------------|------------------|
| `S:mov [x], 1`   | `S:mov [y], 1`   |
| `L:mov %t0, [y]` | `L:mov %t1, [x]` |

Another test: can t0 == t1 == 0?

Rule: S(tores) followed by a L(oad) do not have to follow program order.

- In each thread the S is followed by an L, so we can ignore this program order condition!!
- Now we can satisfy the condition!

Let's peek under the hood here:

- The global timeline is when the store operation becomes visible to other threads

# Questions

- Can stores be reordered with stores?

| Thread 0     |
|--------------|
| `mov [x], 1` |
| `mov [y], 1` |

Main memory starts with x:0 and y:0.

- Execute the first instruction
- The value goes into the store buffer
- Execute the next instruction
- The value goes into the store buffer

On x86, the store buffer drains in a FIFO way: thus stores cannot be reordered.

# Questions

- Can stores be reordered with stores?
- How do we make rules about mfence?

Global variables: `int x[1] = {0}; int y[1] = {0};`

| Thread 0         | Thread 1         |
|------------------|------------------|
| `S:mov [x], 1`   | `S:mov [y], 1`   |
| `mfence`         | `mfence`         |
| `L:mov %t0, [y]` | `L:mov %t1, [x]` |

Another test: can t0 == t1 == 0?

Rules: S(tores) followed by a L(oad) do not have to follow program order.

The mfence sits between the S and the L, so we can't reorder this instruction at all!

# Rules

- Are we done?

Rules: S(tores) followed by a L(oad) do not have to follow program order.

Global variables: `int x[1] = {0}; int y[1] = {0};`

| Thread 0         |
|------------------|
| `S:mov [x], 1`   |
| `L:mov %t0, [x]` |

Another test: can t0 == 0?


### TSO - Total Store Order

### **Rules:**

S(tores) followed by a L(oad) do not have to follow program order.

S(tores) cannot be reordered past a fence in program order

S(tores) cannot be reordered past L(oads) from the same address

- We can specify memory models in terms of what reorderings are allowed
- For each model, ask: if memory access 0 appears before memory access 1 in program order, can it bypass program order?

### **Sequential Consistency**

No reorderings are allowed.

### **TSO - total store order**

Only a S(tore) followed by a L(oad) can bypass program order.

### Weaker models?

### **PSO - partial store order**

Allows stores to drain from the store buffer in any order

### **RMO - Relaxed Memory Order**

Very relaxed model!

### **Any Memory Model**

- FENCE: can always restore order using fences. Accesses cannot be reordered past fences!
- If memory access 0 appears before memory access 1 in program order, and there is a FENCE between the two accesses, can it bypass program order?

Global variables: `int x[1] = {0}; int y[1] = {0};`

| Thread 0         | Thread 1         |
|------------------|------------------|
| `L:mov %t0, [y]` | `L:mov %t1, [x]` |
| `S:mov [x], 1`   | `S:mov [y], 1`   |

First thing: change our syntax to pseudocode. You should be able to find natural mappings to any ISA.

| Thread 0          | Thread 1          |
|-------------------|-------------------|
| `L:%t0 = load(y)` | `L:%t1 = load(x)` |
| `S:store(x,1)`    | `S:store(y,1)`    |

Question: can t0 == t1 == 1?

Get out our lego bricks and try for sequential consistency.

Not allowed under sequential consistency!



(Figures: step-by-step lego brick walkthrough of the question "can t0 == t1 == 1?")

Are we done? The behavior is no longer allowed.

## One more example

Global variables: `int x[1] = {0}; int y[1] = {0};`

| Thread 0       | Thread 1          |
|----------------|-------------------|
| `S:store(x,1)` | `L:%t0 = load(y)` |
| `S:store(y,1)` | `L:%t1 = load(x)` |

Question: can t0 == 1 and t1 == 0?

- Start off thinking about sequential consistency
- What about TSO? NO
- What about PSO? YES
  - With a fence between the two stores in Thread 0: now it is disallowed in PSO
- What about RMO? The loads can be reordered also!
  - Add a fence between the two loads in Thread 1
  - Now the relaxed behavior is disallowed

- Historic Chips:
  - x86: TSO
    - Surprisingly robust
    - Mutexes and concurrent data structures generally seem to work
    - Watch out for store buffering
  - IBM Power and ARM
    - Very relaxed. Similar to RMO with even more rules
    - Mutexes and data structures must be written with care
    - ARM recently strengthened theirs
    - Very difficult to write correct code under! PPoPP example

Companies have a history of providing insufficient documentation about their rules: academics have then gone and figured it out!

Getting better these days

- Modern Chips:
  - RISC-V : two specs: one similar to TSO, one similar to RMO
  - Apple M1: toggles between TSO and weaker
  - Vulkan does not provide any fences that enforce S-to-L ordering

- PSO and RMO were never implemented widely
  - I have not met anyone who knows of any taped-out RMO chip
  - They are part of the SPARC ISAs (i.e. RISC-V before it was cool)
  - These memory models might have been part of specialized chips
- Interestingly:
  - Early Nvidia GPUs appeared to informally implement RMO
- Other chips have very strange memory models:
  - DEC Alpha: basically no rules
#### Next week

- Finish up memory models:
  - Compilers
- Execution barriers
- Watch for midterm grades sometime today