Cache Yaddaş

Cache Nədir?

Cache - CPU və main memory arasında yerləşən kiçik, sürətli yaddaş. Tez-tez istifadə olunan məlumatları saxlayır və memory access latency-ni azaldır.

Cache Əsasları

Cache Line (Block)

Cache-də məlumat cache line və ya block şəklində saxlanılır.

graph LR
    A[Cache Line] --> B[Tag]
    A --> C[Data Block]
    A --> D[Valid Bit]
    A --> E[Dirty Bit]
    
    C --> C1[Byte 0]
    C --> C2[Byte 1]
    C --> C3[...]
    C --> C4[Byte 63]

Tipik cache line size: 64 bytes

Address Breakdown

graph LR
    Addr[Memory Address] --> Tag[Tag Bits]
    Addr --> Index[Index Bits]
    Addr --> Offset[Offset Bits]
    
    Tag --> T[Hansı block]
    Index --> I[Hansı set]
    Offset --> O[Block daxilində byte]

Cache Structure Növləri

1. Direct-Mapped Cache

Hər memory block yalnız bir cache line-a map olunur.

graph TB
    subgraph "Memory Addresses"
        M0[Block 0]
        M1[Block 1]
        M2[Block 2]
        M3[Block 3]
        M4[Block 4]
        M5[Block 5]
    end
    
    subgraph "Cache Lines"
        C0[Line 0]
        C1[Line 1]
        C2[Line 2]
        C3[Line 3]
    end
    
    M0 --> C0
    M1 --> C1
    M2 --> C2
    M3 --> C3
    M4 --> C0
    M5 --> C1

Mapping: Cache Line = (Memory Block) % (Number of Cache Lines)

Üstünlükləri:

Sadə implementasiya
Sürətli lookup (bir müqayisə)
Az hardware

Çatışmazlıqları:

Conflict miss çoxdur
Poor cache utilization

2. Fully-Associative Cache

Hər memory block istənilən cache line-a map oluna bilər.

graph TB
    subgraph "Memory Blocks"
        M0[Block 0]
        M1[Block 1]
        M2[Block 2]
    end
    
    subgraph "Cache Lines - Any Position"
        C0[Line 0]
        C1[Line 1]
        C2[Line 2]
        C3[Line 3]
    end
    
    M0 -.-> C0
    M0 -.-> C1
    M0 -.-> C2
    M0 -.-> C3

Üstünlükləri:

Minimum conflict miss
Optimal cache utilization
Flexibility

Çatışmazlıqları:

Çox bahalı (paralel comparators)
Yavaş lookup (bütün line-ları yoxlamaq)
Kompleks hardware

3. Set-Associative Cache

Hybrid approach - cache set-lərə bölünür, hər set-də N line var.

N-way set-associative: Hər set-də N line

graph TB
    subgraph "4-Way Set-Associative Cache"
        subgraph "Set 0"
            S0L0[Way 0]
            S0L1[Way 1]
            S0L2[Way 2]
            S0L3[Way 3]
        end
        
        subgraph "Set 1"
            S1L0[Way 0]
            S1L1[Way 1]
            S1L2[Way 2]
            S1L3[Way 3]
        end
        
        subgraph "Set 2"
            S2L0[Way 0]
            S2L1[Way 1]
            S2L2[Way 2]
            S2L3[Way 3]
        end
    end

Mapping:

Set Number = (Memory Block) % (Number of Sets)
Block istənilən way-ə yerləşə bilər

Üstünlükləri:

Balance between direct-mapped və fully-associative
Reasonable performance və cost
Ən çox istifadə olunan (L1: 8-way, L2/L3: 16-way)

Müqayisə

Növ	Hit Time	Miss Rate	Hardware Cost	Flexibility
Direct-Mapped	Ən sürətli	Yüksək	Aşağı	Aşağı
Set-Associative	Orta	Orta	Orta	Orta
Fully-Associative	Yavaş	Ən aşağı	Yüksək	Maksimum

Replacement Policies

Cache dolu olanda, hansı line-ı replace etmək?

1. LRU (Least Recently Used)

Ən az istifadə olunan line-ı çıxarır
Yaxşı performans
Kompleks hardware (timestamp və ya counter)

2. Random

Random line seçir
Sadə implementasiya
LRU-ya yaxın performans

3. FIFO (First-In-First-Out)

Ən köhnə line-ı çıxarır
Sadə
LRU-dan pis

4. LFU (Least Frequently Used)

Ən az frequency-li line
Counter lazımdır
Praktikada az istifadə olunur

Write Policies

Write-Hit Policies

1. Write-Through

sequenceDiagram
    participant CPU
    participant Cache
    participant Memory
    
    CPU->>Cache: Write data
    Cache->>Cache: Update cache
    Cache->>Memory: Write to memory
    Note over Memory: Always synchronized

Xüsusiyyətlər:

Cache və memory həmişə sync
Hər yazma memory-ə gedír
Yavaş (memory latency)
Write buffer istifadə olunur

2. Write-Back

sequenceDiagram
    participant CPU
    participant Cache
    participant Memory
    
    CPU->>Cache: Write data
    Cache->>Cache: Update cache + set dirty bit
    Note over Cache: Memory NOT updated yet
    
    Note over Cache: Eviction zamanı
    Cache->>Memory: Write dirty line

Xüsusiyyətlər:

Yalnız cache-ə yazır
Memory eviction zamanı update olunur
Sürətli
Dirty bit lazımdır
Cache coherence çətinliyi

Write-Miss Policies

1. Write-Allocate

Line-ı cache-ə yüklə
Sonra write et
Write-back ilə istifadə olunur

2. No-Write-Allocate

Birbaşa memory-ə yaz
Cache-ə yüklə-mə
Write-through ilə istifadə olunur

Cache Miss Növləri

1. Compulsory (Cold) Miss

İlk dəfə access
Qarşısı alına bilməz
Cache warming ilə minimize olunur

2. Capacity Miss

Cache kiçik olduğu üçün
Working set cache-ə sığmır
Həll: Daha böyük cache

3. Conflict Miss

Direct-mapped və set-associative-də
Eyni set-ə map olan block-lar
Həll: Daha yüksək associativity

pie title Cache Miss Paylaşması
    "Compulsory Miss" : 5
    "Capacity Miss" : 30
    "Conflict Miss" : 15
    "Cache Hit" : 50

Cache Coherence

Multi-core sistemlərdə hər core-un öz cache-i var. Cache coherence - bütün cache-lərdə eyni məlumatın consistency-ni təmin edir.

Problem Nümunəsi

sequenceDiagram
    participant Core1
    participant Cache1
    participant Memory
    participant Cache2
    participant Core2
    
    Core1->>Cache1: Read X (X=5)
    Cache1->>Memory: Fetch X
    Memory-->>Cache1: X=5
    
    Core2->>Cache2: Read X (X=5)
    Cache2->>Memory: Fetch X
    Memory-->>Cache2: X=5
    
    Core1->>Cache1: Write X=10
    Note over Cache1: Cache1: X=10
    Note over Cache2: Cache2: X=5 (STALE!)
    Note over Memory: Memory: X=5 (write-back)

MESI Protocol

MESI - ən populyar cache coherence protokolu.

States:

M (Modified): Yalnız bu cache-də var, dəyişdirilib, dirty
E (Exclusive): Yalnız bu cache-də var, təmiz, memory ilə sync
S (Shared): Bir neçə cache-də var, read-only
I (Invalid): Keçərsiz, istifadə olunmur

stateDiagram-v2
    [*] --> I
    I --> E: Read (no other cache)
    I --> S: Read (other caches have)
    
    E --> M: Write
    E --> S: Other cache reads
    E --> I: Other cache writes
    
    S --> M: Write
    S --> I: Other cache writes
    
    M --> S: Other cache reads
    M --> I: Other cache writes
    
    S --> I: Invalidation
    M --> I: Invalidation
    E --> I: Invalidation

MESI State Transitions

Olay	I	E	S	M
Local Read	E/S	-	-	-
Local Write	M	M	M	-
Remote Read	-	S	-	S
Remote Write	-	I	I	I

MOESI Protocol

MOESI = MESI + O (Owned)

O (Owned):

Cache dirty data var
Başqa cache-lər də oxuya bilər
Bu cache memory-ə write etmək məsuliyyətini daşıyır

Üstünlük: Cache-to-cache transfer, memory-ə write gecikdirilir

stateDiagram-v2
    [*] --> I
    I --> E: Exclusive read
    I --> S: Shared read
    
    E --> M: Write
    E --> S: Remote read
    
    S --> M: Write
    S --> O: Remote read from M
    
    M --> O: Remote read
    M --> I: Remote write
    
    O --> M: Write
    O --> S: Remote write to memory
    O --> I: Invalidate

False Sharing - müxtəlif thread-lər eyni cache line-dakı müxtəlif variable-lara yazır.

Problem

struct Counter {
    int count1;  // Thread 1 istifadə edir
    int count2;  // Thread 2 istifadə edir
} counter;  // Eyni cache line-da!

sequenceDiagram
    participant T1
    participant Cache1
    participant Cache2
    participant T2
    
    T1->>Cache1: Write count1
    Note over Cache1,Cache2: Cache line invalidated
    Cache1->>Cache2: Invalidate
    
    T2->>Cache2: Write count2
    Note over Cache1,Cache2: Cache line invalidated
    Cache2->>Cache1: Invalidate
    
    Note over T1,T2: Ping-pong effect!<br/>Performance degradation

Həll: Cache Line Padding

#define CACHE_LINE_SIZE 64

struct Counter {
    int count1;
    char padding1[CACHE_LINE_SIZE - sizeof(int)];
    int count2;
    char padding2[CACHE_LINE_SIZE - sizeof(int)];
} counter;

// və ya C++17
struct alignas(CACHE_LINE_SIZE) AlignedCounter {
    int count;
};

Modern Dillər

Java:

// @Contended annotation (JDK 8+)
@sun.misc.Contended
public class PaddedCounter {
    private volatile long count;
}

C++17:

struct alignas(std::hardware_destructive_interference_size) Counter {
    std::atomic<int> count;
};

Cache Optimizasiya Texnikaları

1. Blocking (Tiling)

// BAD: Poor cache locality
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        for (int k = 0; k < N; k++)
            C[i][j] += A[i][k] * B[k][j];

// GOOD: Cache blocking
#define BLOCK_SIZE 64
for (int i0 = 0; i0 < N; i0 += BLOCK_SIZE)
    for (int j0 = 0; j0 < N; j0 += BLOCK_SIZE)
        for (int k0 = 0; k0 < N; k0 += BLOCK_SIZE)
            for (int i = i0; i < min(i0+BLOCK_SIZE, N); i++)
                for (int j = j0; j < min(j0+BLOCK_SIZE, N); j++)
                    for (int k = k0; k < min(k0+BLOCK_SIZE, N); k++)
                        C[i][j] += A[i][k] * B[k][j];

2. Loop Fusion

// BAD: İki separate loop
for (int i = 0; i < N; i++)
    a[i] = b[i] + c[i];

for (int i = 0; i < N; i++)
    d[i] = a[i] * 2;

// GOOD: Fused loop - better cache reuse
for (int i = 0; i < N; i++) {
    a[i] = b[i] + c[i];
    d[i] = a[i] * 2;
}

3. Prefetching

// Software prefetching
for (int i = 0; i < N; i++) {
    __builtin_prefetch(&array[i + 8]);  // GCC
    process(array[i]);
}

Cache Performance Metrics

Average Memory Access Time (AMAT)

AMAT = Hit Time + (Miss Rate × Miss Penalty)

Nümunə:

Hit Time = 1 cycle
Miss Rate = 5%
Miss Penalty = 100 cycles

AMAT = 1 + (0.05 × 100) = 6 cycles

Cache Hit Rate

Hit Rate = (Cache Hits) / (Total Accesses)
Miss Rate = 1 - Hit Rate

Hardware Prefetching

Modern CPU-lar avtomatik prefetch edirlər:

1. Sequential Prefetcher

Ardıcıl access pattern aşkarlayır
Növbəti cache line-ları fetch edir

2. Stride Prefetcher

Fixed stride pattern (hər N-ci element)
Array access-də effektiv

3. Stream Prefetcher

Multiple parallel stream-lər
Complex pattern-lər

Praktiki Məsləhətlər

Data structure layout-a fikir ver
- Hot data birlikdə saxla
- Cache line-a align et
Sequential access prefer et
- Array traverse: row-major order
- Linked list-dən array üstünlük ver
False sharing-dən qaç
- Thread-specific data ayrı cache line-da
- Padding istifadə et
Working set kiçik saxla
- Cache-ə sığan qədər data
- Blocking/tiling texnikaları
Profiling et
- perf stat - cache miss statistics
- valgrind --tool=cachegrind

Əlaqəli Mövzular

Memory Hierarchy: Cache-in yaddaş sistemindəki yeri
Multiprocessor Systems: Cache coherence multi-core-da
Performance Optimization: Cache-friendly code
False Sharing: Multi-threading cache problemi

Cache Nədir?​

Cache Əsasları​

Cache Line (Block)​

Address Breakdown​

Cache Structure Növləri​

1. Direct-Mapped Cache​

2. Fully-Associative Cache​

3. Set-Associative Cache​

Müqayisə​

Replacement Policies​

1. LRU (Least Recently Used)​

2. Random​

3. FIFO (First-In-First-Out)​

4. LFU (Least Frequently Used)​

Write Policies​

Write-Hit Policies​

Write-Miss Policies​

Cache Miss Növləri​

1. Compulsory (Cold) Miss​

2. Capacity Miss​

3. Conflict Miss​

Cache Coherence​

Problem Nümunəsi​

MESI Protocol​

MESI State Transitions​

MOESI Protocol​

False Sharing​

Problem​

Həll: Cache Line Padding​

Modern Dillər​

Cache Optimizasiya Texnikaları​

1. Blocking (Tiling)​

2. Loop Fusion​

3. Prefetching​

Cache Performance Metrics​

Average Memory Access Time (AMAT)​

Cache Hit Rate​

Hardware Prefetching​

Praktiki Məsləhətlər​

Əlaqəli Mövzular​

Cache Nədir?

Cache Əsasları

Cache Line (Block)

Address Breakdown

Cache Structure Növləri

1. Direct-Mapped Cache

2. Fully-Associative Cache

3. Set-Associative Cache

Müqayisə

Replacement Policies

1. LRU (Least Recently Used)

2. Random

3. FIFO (First-In-First-Out)

4. LFU (Least Frequently Used)

Write Policies

Write-Hit Policies

Write-Miss Policies

Cache Miss Növləri

1. Compulsory (Cold) Miss

2. Capacity Miss

3. Conflict Miss

Cache Coherence

Problem Nümunəsi

MESI Protocol

MESI State Transitions

MOESI Protocol

False Sharing

Problem

Həll: Cache Line Padding

Modern Dillər

Cache Optimizasiya Texnikaları

1. Blocking (Tiling)

2. Loop Fusion

3. Prefetching

Cache Performance Metrics

Average Memory Access Time (AMAT)

Cache Hit Rate

Hardware Prefetching

Praktiki Məsləhətlər

Əlaqəli Mövzular