Əsas məzmuna keçin

Branch Prediction

Branch Prediction Nədir?

Branch Prediction - CPU-nun branch təlimatının hansı yolla gedəcəyini əvvəlcədən proqnozlaşdırmasıdır. Pipeline-da control hazard-ları minimizə etmək üçün istifadə olunur.

Problem: Control Hazards

if (x > 0) {
// Path A
y = x + 5;
} else {
// Path B
y = x - 5;
}

CPU-ya branch qərarı verilənədək gözləmək lazımdır, bu da pipeline-ı dayanmağa məcbur edir.

gantt
title Pipeline Without Branch Prediction
dateFormat X
axisFormat %s

section Branch Instruction
Fetch :0, 1
Decode :1, 2
Execute :2, 3

section Next (if taken)
Stall :3, 4
Stall :4, 5
Fetch :5, 6

Branch Misprediction Penalty

Modern CPU-da 15-20 cycle itki!

Cost = Pipeline Depth × Misprediction Rate

Nümunə:

  • Pipeline depth: 20 stages
  • Misprediction rate: 5%
  • Branch hər 5 təlimatdan bir
Penalty = 20 × 0.05 × (1/5) = 0.2 cycles per instruction
IPC reduction: ~20%

Branch Növləri

1. Conditional Branches

if (condition) { ... }
while (condition) { ... }
for (...) { ... }
cmp rax, 0
je label ; Jump if equal
jne label ; Jump if not equal
jg label ; Jump if greater

2. Unconditional Branches

goto label;
break;
continue;
jmp label         ; Always jump

3. Function Calls/Returns

function();
return value;
call function     ; Call function
ret ; Return

Static Branch Prediction

Compiler və ya hardware tərəfindən sabit qərar.

1. Predict Not Taken

Hər zaman branch alınmaz fərz edilir.

graph LR
B[Branch] --> NT[Predict: Not Taken]
NT --> Continue[Continue Sequential]

Üstünlük: Sadə implementasiya Çatışmazlıq: Loop-lar üçün pis (çox taken)

2. Predict Taken

Hər zaman branch alınır fərz edilir.

Üstünlük: Loop-lar üçün yaxşı Çatışmazlıq: Sequential kod üçün pis

3. Backward Taken, Forward Not Taken (BTFNT)

// Backward branch (loop) - predict taken
for (int i = 0; i < 100; i++) {
// Loop body
} // Branch back - usually taken

// Forward branch - predict not taken
if (unlikely_condition) {
// Rare case
}
// Continue - usually not taken

Accuracy: ~60-70%

Dynamic Branch Prediction

Runtime-da branch behavior-a əsasən proqnoz.

1-Bit Predictor

Ən sadə dynamic predictor.

stateDiagram-v2
[*] --> NotTaken
NotTaken --> Taken: Branch Taken
NotTaken --> NotTaken: Branch Not Taken

Taken --> NotTaken: Branch Not Taken
Taken --> Taken: Branch Taken

Problem: Loop-da 2 dəfə səhv edir!

// Loop iterations: TTTTTTTTN (7 taken, 1 not taken)
for (int i = 0; i < 8; i++) {
// Body
}

Mispredictions:

  • Son iterasiyada: T → N (səhv)
  • Növbəti loop-da ilk dəfə: N → T (səhv)

2-Bit Saturating Counter

Daha stabil - iki dəfə səhv olmalıdır ki, dəyişsin.

stateDiagram-v2
[*] --> SNT

SNT: Strongly Not Taken (00)
WNT: Weakly Not Taken (01)
WT: Weakly Taken (10)
ST: Strongly Taken (11)

SNT --> WNT: Taken
SNT --> SNT: Not Taken

WNT --> WT: Taken
WNT --> SNT: Not Taken

WT --> ST: Taken
WT --> WNT: Not Taken

ST --> WT: Not Taken
ST --> ST: Taken

States:

  • 00: Strongly Not Taken - predict not taken
  • 01: Weakly Not Taken - predict not taken
  • 10: Weakly Taken - predict taken
  • 11: Strongly Taken - predict taken

Loop nümunəsi:

Iterations: TTTTTTTTN
Predictor: 11→11→11→11→11→11→11→10→01 (yalnız 1 səhv!)

Pattern History Table (PHT)

Branch address-ə görə predictor.

graph LR
PC[Branch PC] --> Hash[Hash Function]
Hash --> Index[PHT Index]
Index --> Entry[2-bit Counter]
Entry --> Pred[Prediction]

Ölçü: 4K-64K entries (tipik)

Əks tərəfi: Aliasing - müxtəlif branch-lər eyni entry-ə map ola bilər

Branch Target Buffer (BTB)

Branch-in target address-ini cache edir.

graph TB
PC[Branch PC] --> BTB[Branch Target Buffer]
BTB --> Hit{Hit?}

Hit -->|Yes| Target[Get Target Address]
Hit -->|No| Calc[Calculate Target]

Target --> Fetch[Fetch from Target]
Calc --> Update[Update BTB]
Update --> Fetch

BTB Entry:

  • Branch PC
  • Target Address
  • Branch Type
  • Prediction bits

BTB Structure

Branch PCTarget AddressPredictionValid
0x10000x2000Taken1
0x11000x1200Not Taken1
0x15000x3000Taken1

Two-Level Adaptive Predictor

Branch history pattern istifadə edir.

graph TB
PC[Branch PC] --> BHR[Branch History Register]
BHR --> Pattern[Pattern: 101...]

PC --> XOR[XOR]
Pattern --> XOR

XOR --> PHT[Pattern History Table]
PHT --> Counter[2-bit Counter]
Counter --> Pred[Prediction]

Global History

Bütün branch-lərin ümumi tarixi.

Branch History Register (BHR):

Last 8 branches: 10110101

Correlation nümunəsi:

if (a > 0) {      // Branch 1
x = 1;
}
if (a > 0) { // Branch 2 - same as Branch 1!
y = 1;
}

Branch 2-nin davranışı Branch 1-dən asılıdır!

Local History

Hər branch-in öz tarixi.

graph TB
PC[Branch PC] --> LHT[Local History Table]
LHT --> History[Branch History]
History --> PHT[Pattern History Table]
PHT --> Pred[Prediction]

Üstünlük: Müxtəlif branch-lərin tarixi qarışmır

Tournament Predictor

Bir neçə predictor-dan ən yaxşısını seçir.

graph TB
PC[Branch PC] --> G[Global Predictor]
PC --> L[Local Predictor]
PC --> S[Selector/Meta Predictor]

G --> Pred1[Prediction 1]
L --> Pred2[Prediction 2]

S --> Choose{Choose}
Pred1 --> Choose
Pred2 --> Choose

Choose --> Final[Final Prediction]

Selector: Hansı predictor daha yaxşı işləyir?

Intel Core və AMD Ryzen istifadə edir.

Speculative Execution

Branch proqnozuna əsasən əvvəlcədən təlimatları icra edir.

sequenceDiagram
participant Fetch
participant Predict
participant Execute
participant Commit

Fetch->>Predict: Branch instruction
Predict->>Fetch: Predicted path
Fetch->>Execute: Speculative instructions

alt Prediction Correct
Execute->>Commit: Commit results
else Prediction Wrong
Execute->>Execute: Flush pipeline
Execute->>Fetch: Fetch correct path
end

Speculative Execution States

1. Fetch: Predicted path-dən təlimatlar gətir 2. Execute: Speculatively icra et 3. Verify: Həqiqi branch qərarını yoxla 4. Commit/Flush: Düzgündürsə commit, yoxsa flush

Misprediction Recovery

graph LR
M[Misprediction Detected] --> F[Flush Pipeline]
F --> R[Restore State]
R --> FT[Fetch Correct Path]
FT --> C[Continue Execution]

Cost: 15-20 cycles modern CPU-da

Return Address Stack (RAS)

Function return address-lərini predict edir.

graph TB
Call[CALL instruction] --> Push[Push return address to RAS]
Ret[RET instruction] --> Pop[Pop from RAS]
Pop --> Predict[Predict return target]

Stack structure:

Top  → 0x2000  (most recent call)
0x1500
0x1000
Bottom → 0x500 (oldest call)

Accuracy: ~95-99% (çox yüksək)

Branch Prediction Accuracy

Modern CPU Performance

PredictorAccuracyHardware Cost
Static (BTFNT)60-70%Çox aşağı
1-bit70-80%Aşağı
2-bit85-90%Aşağı
Two-level92-95%Orta
Tournament95-98%Yüksək
Neural97-99%Çox yüksək

Kod Optimizasiyası

1. Predictable Branches

// BAD: Unpredictable (random)
if (rand() % 2 == 0) {
// 50% chance
}

// GOOD: Predictable (loop)
for (int i = 0; i < N; i++) {
// Taken N-1 times, not taken once
}

2. Likely/Unlikely Hints

// GCC/Clang
if (__builtin_expect(rare_condition, 0)) {
// Unlikely path
}

// C++20
if (unlikely(rare_condition)) {
// Unlikely path
}

3. Branchless Code

// BAD: Branch
int max(int a, int b) {
if (a > b) return a;
else return b;
}

// GOOD: Branchless (using conditional move)
int max(int a, int b) {
return a > b ? a : b; // Compiler uses CMOV
}

// GOOD: Branchless (arithmetic)
int abs(int x) {
int mask = x >> 31; // -1 if negative, 0 if positive
return (x + mask) ^ mask;
}

4. Loop Unswitching

// BAD: Branch inside loop
for (int i = 0; i < N; i++) {
if (flag) {
a[i] = b[i] + c[i];
} else {
a[i] = b[i] - c[i];
}
}

// GOOD: Move branch outside
if (flag) {
for (int i = 0; i < N; i++) {
a[i] = b[i] + c[i];
}
} else {
for (int i = 0; i < N; i++) {
a[i] = b[i] - c[i];
}
}

5. Data-Driven Instead of Control-Driven

// BAD: Multiple branches
if (type == TYPE_A) process_a();
else if (type == TYPE_B) process_b();
else if (type == TYPE_C) process_c();

// GOOD: Function pointer table
void (*handlers[])(void) = {process_a, process_b, process_c};
handlers[type]();

Profiling Branch Behavior

Linux perf

# Branch statistics
perf stat -e branches,branch-misses ./program

# Branch prediction rate
perf stat -e cpu/event=0xc4,umask=0x00/ ./program

Output nümunəsi:

1,234,567,890  branches
61,234,567 branch-misses # 4.96% of all branches

Intel VTune

  • Branch misprediction analysis
  • Hot branch identification
  • Speculation impact

Real-World Impact

Performance Loss from Mispredictions

Program A: 95% prediction accuracy
- 5% misprediction × 20 cycle penalty = 1 cycle average loss per branch
- Branches every 5 instructions
- IPC loss: ~20%

Program B: 99% prediction accuracy
- 1% misprediction × 20 cycle penalty = 0.2 cycle average loss
- IPC loss: ~4%

Modern Branch Predictors

Intel

  • Pentium: 2-bit bimodal
  • Pentium Pro: Two-level
  • Core: Tournament
  • Skylake+: Neural-inspired (TAGE)

AMD

  • K8: Hybrid bimodal/two-level
  • Zen: Perceptron-based
  • Zen 3+: Advanced neural predictor

ARM

  • Cortex-A: Two-level adaptive
  • Cortex-X: Tournament + indirect predictor

Branch Prediction və Security

Spectre Attack

Speculative execution istifadə edərək gizli məlumat oxumaq.

// Victim code
if (index < array_size) {
value = array[index]; // Speculative load
temp = array2[value * 256]; // Side channel
}

Mitigations:

  • Speculation barriers
  • Bounds check masking
  • Microcode updates

Best Practices

  1. Predictable patterns prefer et

    • Loop-lar yaxşı predict olunur
    • Random behavior pis
  2. Branch-free kod yaz (mümkünsə)

    • Conditional moves
    • Arithmetic tricks
  3. Profiling et

    • Hansı branch-lər problematik?
    • Optimization targets
  4. Likely/unlikely annotations

    • Compiler-ə kömək et
    • Rare case-lər işarələ
  5. Code layout optimize et

    • Hot path sequential
    • Cold path ayrı

Əlaqəli Mövzular

  • Pipelining: Branch prediction pipeline efficiency üçün
  • Speculative Execution: Branch prediction əsasında
  • CPU Performance: Misprediction impact
  • Security: Spectre və Meltdown