Brendan Ang


Last updated Nov 8, 2022

# Cache

# Memory Organisation

Memory in a cache is stored in units of cache lines. To access the cache, the CPU decomposes a memory address into a tag, an index and a block offset: the index selects a set, the tag identifies which block occupies it, and the offset selects a byte within the line.
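As a minimal sketch of this decomposition (the helper name and field widths are illustrative, not from any particular ISA), the fields can be extracted with shifts and masks:

```python
def split_address(addr, offset_bits, index_bits):
    """Split a memory address into (tag, index, offset) fields."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

# An 8-bit address with a 2-bit offset and a 1-bit index:
tag, index, offset = split_address(0b10001101, offset_bits=2, index_bits=1)
print(bin(tag), index, offset)  # 0b10001 1 1
```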

# Instruction cache and Data cache

Storing these separately allows for better parallelism: the CPU can fetch instructions from the instruction cache while writing to the data cache for STUR instructions.

- Cache Placement Policies
- Cache Replacement Policies
- Cache Write Policies

# Performance

The key factor affecting cache performance is the cost of cache misses. When a cache miss occurs, extra compute cycles are spent finding a victim line, requesting the appropriate block from memory, filling the cache line with the new block and resuming execution.

# Types of Misses

# Compulsory miss

The first reference to a given block of memory. This miss is inevitable, since the data has never been accessed before.

# Capacity miss

This occurs when the current working set exceeds the cache capacity, so currently useful blocks have to be replaced.

# Conflict miss

When useful blocks are displaced because of a restrictive placement policy, e.g. direct-mapped caching, where two addresses with the same index evict each other even when the rest of the cache is empty. A fully associative cache has no conflict misses, since any block can go in any line.
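A toy direct-mapped simulation (a sketch, with hypothetical 5-bit addresses matching the practice problem format later in these notes) shows the pathology: two addresses that share an index but differ in tag keep evicting each other.

```python
def direct_mapped_sim(accesses, index_bits=3, offset_bits=0):
    """Simulate a direct-mapped cache; return 'H'/'M' per access."""
    cache = {}  # index -> tag currently stored in that line
    results = []
    for addr in accesses:
        index = (addr >> offset_bits) & ((1 << index_bits) - 1)
        tag = addr >> (offset_bits + index_bits)
        if cache.get(index) == tag:
            results.append('H')
        else:
            results.append('M')
            cache[index] = tag  # evict whatever was there
    return results

# 0b00011 and 0b10011 share index 011 but have different tags,
# so alternating between them misses every single time:
print(direct_mapped_sim([0b00011, 0b10011, 0b00011, 0b10011]))
# ['M', 'M', 'M', 'M']
```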

# Design considerations

# False sharing

An issue arises when different cores have cached values which share the same cache line. Consider 2 cores accessing different, independent values (X and Y) that reside on the same cache line in the L3 cache. If X and Y are heavily used variables, every write by one core invalidates the line in the other core's private cache, so the line ping-pongs between cores and performance degrades as if the variables were truly shared.
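Whether two variables can falsely share is purely a question of their addresses relative to the line size. A small sketch (assuming a typical 64-byte line; the addresses are made up for illustration):

```python
LINE_SIZE = 64  # bytes; a common cache line size on modern x86/ARM cores

def same_cache_line(addr_x, addr_y, line_size=LINE_SIZE):
    """Two addresses fall in the same cache line iff they share a line number."""
    return addr_x // line_size == addr_y // line_size

# X and Y packed 8 bytes apart land on one line -> false sharing is possible:
print(same_cache_line(0x1000, 0x1008))  # True
# Padding Y out to the next line boundary avoids it:
print(same_cache_line(0x1000, 0x1040))  # False
```

This is why concurrency libraries pad hot per-core counters out to a full cache line each.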

# Measuring Impact with CPI

$$
\begin{align}
CPU_{time}&=(CPU_{\text{execution cycles}}+\text{Memory stall cycles})\times\text{Cycle Time}\\
CPU_{time}&=((IC\times CPI)+(IC\times \%\text{Memory Access}\times\text{Miss Rate}\times\text{Miss Penalty}))\times \text{Cycle Time}
\end{align}
$$

An L1 cache hit can be considered part of the ideal CPI, as it is often possible to complete the data access within the ideal clock cycles.
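As a quick sketch of the stall term (the numbers are illustrative: ideal CPI 1.25, 20% of instructions accessing memory, 20% miss rate, 30-cycle miss penalty):

```python
def cpi_with_stalls(cpi_ideal, mem_access_frac, miss_rate, miss_penalty):
    """Effective CPI = ideal CPI + memory stall cycles per instruction."""
    return cpi_ideal + mem_access_frac * miss_rate * miss_penalty

# 1.25 + 0.20 * 0.20 * 30 = 1.25 + 1.2
print(cpi_with_stalls(1.25, 0.20, 0.20, 30))  # 2.45
```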

# Example

# Multi-level cache example

# Measuring Impact with Average Memory Access Time (AMAT)

We need a way to measure the performance of a cache standalone from the performance of the CPU. AMAT is the average time to access memory, considering both hits and misses:

$$
\begin{aligned}
AMAT&=\text{Hit Time}\times(1-\text{Miss Rate})+\text{Miss Rate}\times(\text{Hit Time}+\text{Miss Penalty})\\
&=\text{Hit Time}+\text{Miss Rate}\times\text{Miss Penalty}
\end{aligned}
$$

Note here that the Miss Penalty is the loss in cycles for a miss, not the full cost of a main memory access: e.g. if the time to hit the cache is 1 cycle and the time to hit main memory is 100 cycles, the miss penalty is $100-1=99$ cycles.

One example showing why AMAT is superior: consider two caches with similar miss rates but drastically different hit times. Using the miss rate metric alone, we would rate both caches the same. Using AMAT, the cache with the lower hit time or lower miss penalty correctly outperforms the other, assuming all other variables are the same.
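The formula is easy to check numerically. A sketch using the 1-cycle hit / 99-cycle penalty example above and an illustrative 5% miss rate:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Hit = 1 cycle, main memory = 100 cycles -> penalty = 99 cycles.
print(amat(hit_time=1, miss_rate=0.05, miss_penalty=99))  # 5.95
```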

# Practice Problems

3-bit index, 2-bit tag, 0-bit offset.

| Addr  | Index | Tag | H/M | State  |
| ----- | ----- | --- | --- | ------ |
| 10011 | 011   | 10  | M   | 011,10 |
| 00001 | 001   | 00  | H   |        |
| 00110 | 110   | 00  | H   |        |
| 01010 | 010   | 01  | M   | 010,01 |
| 01110 | 110   | 01  | M   | 110,01 |
| 11001 | 001   | 11  | M   | 001,11 |
| 00001 | 001   | 00  | M   | 001,00 |
| 11100 | 100   | 11  | M   | 100,11 |
| 10100 | 100   | 10  | M   | 100,10 |
Bits for offset $=\log_2 4=2$

2-way set associative cache: each set contains 2 cache lines -> 2 blocks per set -> 8 bytes per set -> 2 sets

Bits for index $=\log_2 2=1$
| Access | 10001101 | 10110010 | 10111111 | 10001100 |
| ------ | -------- | -------- | -------- | -------- |
| Tag    | 10001    | 10110    | 10111    | 10001    |
| Offset | 01       | 10       | 11       | 00       |
| Index  | 1        | 0        | 1        | 1        |
| H/M    | M        | M        | M        | H        |
| LRU1   | 10001    | 10110    | 10111    | 10001    |
| LRU2   | NA       | NA       | 10001    | 10111    |
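The hit/miss column can be checked mechanically. A sketch of a 2-way LRU simulator, fed the access sequence reconstructed from the tag/index/offset rows above (the last access is assumed to be 10001100, i.e. tag 10001, index 1, offset 00):

```python
def two_way_lru_sim(accesses, index_bits=1, offset_bits=2):
    """Simulate a 2-way set-associative cache with LRU replacement."""
    sets = {}  # index -> list of tags, most recently used last
    results = []
    for addr in accesses:
        index = (addr >> offset_bits) & ((1 << index_bits) - 1)
        tag = addr >> (offset_bits + index_bits)
        ways = sets.setdefault(index, [])
        if tag in ways:
            ways.remove(tag)
            results.append('H')
        else:
            results.append('M')
            if len(ways) == 2:
                ways.pop(0)  # evict the least recently used way
        ways.append(tag)  # mark as most recently used
    return results

print(two_way_lru_sim([0b10001101, 0b10110010, 0b10111111, 0b10001100]))
# ['M', 'M', 'M', 'H']
```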

Hit rate: 2/8 = 25%

a.
$$
\begin{align}
&T=IC\times CPI\times \text{Period}\\
&CPI_{ideal}=1.25\\
&\text{L1 stall cycles}=0.2\times0.2\times8=0.32\\
&\text{L2 stall cycles}=0.2\times0.2\times0.1\times30=0.12\\
&\text{CPI stall}=1.25+0.32+0.12=1.69
\end{align}
$$

b.
$$
\begin{align}
&\text{L1 stall cycles}=0.2\times0.2\times30=1.2\\
&\text{CPI stall}=1.25+1.2=2.45
\end{align}
$$