mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-07 22:23:55 -05:00


George Hotz 55845f7de7 schedule: cache unbinds for consistent cache keys (#13664 )

* schedule: cache unbinds for consistent cache keys

strip BIND values before computing cache key so different bound values
(e.g. KV cache positions) hit the same schedule cache entry.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* spec: allow single-src BIND for schedule cache key normalization

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: add lessons learned to CLAUDE.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* more claude.md

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

2025-12-12 17:27:42 -05:00

5.9 KiB


Claude Code Guide for tinygrad

Architecture Overview

tinygrad compiles tensor operations into optimized kernels. The pipeline:

Tensor (tensor.py) - User-facing API, creates UOp graph
UOp (uop/ops.py) - Unified IR for all operations (both tensor and kernel level)
Schedule (engine/schedule.py, schedule/) - Converts tensor UOps to kernel UOps
Codegen (codegen/) - Converts kernel UOps to device code
Runtime (runtime/) - Device-specific execution

Key Concepts

UOp (Universal Operation)

Everything is a UOp - tensors, operations, buffers, kernels. Key properties:

op: The operation type (Ops enum)
dtype: Data type
src: Tuple of source UOps
arg: Operation-specific argument
tag: Optional tag for graph transformations

UOps are immutable and cached - creating the same UOp twice returns the same object (ucache).

PatternMatcher

Used extensively for graph transformations:

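from tinygrad.uop.ops import Ops, UPat, PatternMatcher, graph_rewrite

# x + x -> x * 2 for a constant x; reusing the name "x" means both sources must be the same UOp
pm = PatternMatcher([
  (UPat(Ops.ADD, src=(UPat.cvar("x"), UPat.cvar("x"))), lambda x: x * 2),
])
result = graph_rewrite(uop, pm)  # uop: an existing UOp graph to rewrite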
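To make the matching semantics concrete without depending on tinygrad internals, here is a toy sketch in plain Python. Node, Pat, match and rewrite are hypothetical stand-ins, not tinygrad's UOp, UPat or graph_rewrite, and Pat does not model cvar's "constants only" restriction. It illustrates two ideas assumed to carry over from the UPat example above: a pattern name used twice must bind the same node in both positions, and rewriting is repeated until no rule fires.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)
class Node:
  # hypothetical toy node, NOT tinygrad's UOp; frozen dataclasses compare by value
  op: str
  src: tuple = ()
  arg: object = None

@dataclass(frozen=True)
class Pat:
  # hypothetical toy pattern, NOT tinygrad's UPat
  op: Optional[str] = None     # None matches any op
  src: Optional[tuple] = None  # None matches any sources
  name: Optional[str] = None   # bind the matched node under this name

def match(pat: Pat, node: Node, store: dict) -> bool:
  if pat.op is not None and node.op != pat.op: return False
  # a name that appears twice must bind the same node both times
  if pat.name is not None and store.setdefault(pat.name, node) != node: return False
  if pat.src is None: return True
  return len(pat.src) == len(node.src) and all(match(p, s, store) for p, s in zip(pat.src, node.src))

def rewrite(node: Node, rules: list[tuple[Pat, Callable]]) -> Node:
  # rewrite sources first (bottom-up), then try each rule on this node
  node = Node(node.op, tuple(rewrite(s, rules) for s in node.src), node.arg)
  for pat, fxn in rules:
    store: dict = {}
    if match(pat, node, store) and (new := fxn(**store)) is not None:
      return rewrite(new, rules)  # keep rewriting until no rule fires
  return node

# x + x -> x * 2
rules = [(Pat("ADD", (Pat(name="x"), Pat(name="x"))), lambda x: Node("MUL", (x, Node("CONST", arg=2))))]
```

With `c = Node("CONST", arg=3)`, `rewrite(Node("ADD", (c, c)), rules)` returns `Node("MUL", (c, Node("CONST", arg=2)))`, while an ADD of two different constants is returned unchanged because the second "x" fails to bind the same node.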

Schedule Cache

Schedules are cached by graph structure. BIND nodes (variables with bound values) are unbound before the cache key is computed, so graphs that differ only in their bound values hit the same cache entry.

Directory Structure

tinygrad/
├── tensor.py          # Tensor class, user API
├── device.py          # Buffer, device management
├── dtype.py           # Data types
├── helpers.py         # Utilities, environment vars
├── uop/
│   ├── ops.py         # UOp class, Ops enum, PatternMatcher
│   ├── spec.py        # UOp type verification
│   └── symbolic.py    # Symbolic math simplification
├── engine/
│   ├── schedule.py    # Schedule creation, caching
│   ├── realize.py     # Tensor realization
│   ├── jit.py         # JIT compilation
│   └── memory.py      # Memory planning
├── schedule/
│   ├── rangeify.py    # Convert movements to ranges
│   └── indexing.py    # Index calculations
├── codegen/
│   ├── kernel.py      # Kernel optimization
│   └── uopgraph.py    # UOp graph transformations
├── renderer/          # Code generation (CUDA, Metal, etc.)
└── runtime/           # Device backends

Testing

# Run specific test
python -m pytest test/unit/test_schedule_cache.py -xvs

# Run with timeout
python -m pytest test/test_symbolic_ops.py -x --timeout=60

# Debug with verbose logging
DEBUG=2 python -m pytest test/test_schedule.py::test_name -xvs

# Visualize UOp graphs
VIZ=1 python -c "from tinygrad import Tensor; Tensor.ones(10).sum().realize()"

Common Environment Variables

DEBUG=1-4 - Increasing verbosity
VIZ=1 - Enable graph visualization
SPEC=1 - Enable UOp spec verification
NOOPT=1 - Disable optimizations
DEVICE=CPU/CUDA/AMD/METAL - Set default device

Debugging Tips

Print UOp graphs: print(tensor.uop) or print(tensor.uop.sink())
Check schedule: tensor.schedule() returns a list of ScheduleItems
Trace graph rewrites: Use VIZ=1 or add a print in PatternMatcher callbacks
Find UOps by type: [u for u in uop.toposort() if u.op is Ops.SOMETHING]
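The last tip relies on toposort() listing every node after its sources. The sketch below is a toy, not tinygrad's implementation: an iterative depth-first toposort over a hypothetical Node type, followed by the same filter-by-op comprehension. It assumes an acyclic graph, as UOp graphs are.

```python
from dataclasses import dataclass

@dataclass(frozen=True, eq=False)
class Node:
  # hypothetical toy node; eq=False keeps identity hashing, like hash-consed UOps
  op: str
  src: tuple = ()

def toposort(root: Node) -> list:
  """Iterative DFS: every node appears once, after all of its sources (assumes an acyclic graph)."""
  order: dict = {}  # insertion-ordered set of emitted nodes
  visited: set = set()
  stack = [(root, False)]
  while stack:
    node, sources_done = stack.pop()
    if sources_done:
      order[node] = None
      continue
    if node in visited: continue
    visited.add(node)
    stack.append((node, True))  # revisit to emit node once its sources are emitted
    stack.extend((s, False) for s in reversed(node.src))  # first source ends up on top
  return list(order)

# usage: find nodes by op, mirroring [u for u in uop.toposort() if u.op is Ops.SOMETHING]
d = Node("CONST"); b = Node("ADD", (d, d)); c = Node("MUL", (d,)); a = Node("ADD", (b, c))
adds = [n for n in toposort(a) if n.op == "ADD"]
```

Here `toposort(a)` yields `[d, b, c, a]`, so `adds` is `[b, a]`: the shared source `d` is emitted once, and the root comes last.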

Workflow Rules

NEVER commit without explicit user approval - always show the diff and wait for approval
Run tests before proposing commits
Test with SPEC=2 when modifying UOp-related code

Style Notes

2-space indentation, 150 char line limit
PatternMatchers should be defined at module level (slow to construct)
Prefer graph_rewrite over manual graph traversal
UOp methods like .replace() preserve tags unless explicitly changed
Use .rtag(value) to add tags to UOps

Lessons Learned

UOp ucache Behavior

UOps are cached by their contents - creating a UOp with identical (op, dtype, src, arg, tag) returns the same object. This means:

uop.replace(tag=None) on a tagged UOp returns the original untagged UOp if it exists in the cache
Two UOps with the same structure are the same object (is comparison works)
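The caching rule can be modeled in a few lines of plain Python. This is a hedged sketch, not tinygrad's code: ToyUOp is a hypothetical class that keys its cache on (op, dtype, src, arg, tag), so constructing the same node twice returns the same object and replace(tag=None) lands back on the cached untagged node. It keeps strong references for simplicity; a production cache would more likely hold weak references so unused nodes can be freed.

```python
class ToyUOp:
  """Hypothetical hash-consed node illustrating the ucache idea; NOT tinygrad's UOp."""
  __slots__ = ("op", "dtype", "src", "arg", "tag")
  _cache: dict = {}  # key tuple -> node (strong refs, so nothing is ever evicted)

  def __new__(cls, op, dtype="void", src=(), arg=None, tag=None):
    key = (op, dtype, src, arg, tag)
    if (hit := cls._cache.get(key)) is not None: return hit
    self = super().__new__(cls)
    for field, value in zip(cls.__slots__, key): object.__setattr__(self, field, value)
    cls._cache[key] = self
    return self

  def __setattr__(self, field, value): raise AttributeError("ToyUOp is immutable")

  def replace(self, **kwargs):
    # copy current fields, apply overrides, and go back through the cache
    fields = {f: getattr(self, f) for f in self.__slots__}
    fields.update(kwargs)
    return ToyUOp(**fields)
```

Because sources are themselves cached, identity hashing of `src` tuples is enough: `ToyUOp("ADD", "int", (a, a))` built twice from the same `a` returns one object, which is why `is` comparison works.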

Spec Validation

When adding new UOp patterns, update tinygrad/uop/spec.py. Test with:

SPEC=2 python3 test/unit/test_something.py

Spec issues appear as RuntimeError: SPEC ISSUE None: UOp(...).

Schedule Cache Key Normalization

The schedule cache strips values from BIND nodes so different bound values (e.g., KV cache positions) hit the same cache entry:

pm_pre_sched_cache: BIND(DEFINE_VAR, CONST) → BIND(DEFINE_VAR) for cache key
pm_post_sched_cache: restores original BIND from context
When accessing bind.src[1], check len(bind.src) > 1 first (might be stripped)
Extract var_vals from input_buffers dict after graph_rewrite (avoids extra toposort)
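The normalization can be illustrated with a toy model in plain Python. Node, strip_binds and get_schedule are hypothetical names, not the ones in tinygrad/engine/schedule.py: strip_binds plays the role of pm_pre_sched_cache (BIND(DEFINE_VAR, CONST) becomes BIND(DEFINE_VAR)) and collects the removed values during the same traversal, which a real implementation would use to restore the BINDs afterwards, as pm_post_sched_cache does.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
  # hypothetical toy graph node (NOT tinygrad's UOp); frozen dataclasses hash by value
  op: str
  src: tuple = ()
  arg: object = None

def strip_binds(node: Node, var_vals: dict) -> Node:
  """Copy node with every BIND(DEFINE_VAR, CONST) reduced to BIND(DEFINE_VAR), recording removed values."""
  src = tuple(strip_binds(s, var_vals) for s in node.src)
  if node.op == "BIND" and len(src) > 1:  # only strip if the value is still present
    var, val = src[0], src[1]
    var_vals[var.arg] = val.arg  # remember the value for restoring or passing at run time
    src = (var,)
  return Node(node.op, src, node.arg)

schedule_cache: dict = {}

def get_schedule(sink: Node, build):
  var_vals: dict = {}
  key = strip_binds(sink, var_vals)  # graphs differing only in bound values give equal keys
  if key not in schedule_cache: schedule_cache[key] = build(key)
  return schedule_cache[key], var_vals
```

Two graphs that differ only in a KV cache position, for example `pos=3` and `pos=7`, produce equal keys, so `build` runs once and each call gets back its own `var_vals`.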

Avoiding Extra Work

Use ctx dict from graph_rewrite to collect info during traversal instead of separate toposort
Only extract var_vals when schedule is non-empty (no kernels = no vars needed)
PatternMatchers are slow to construct - define at module level, not in functions

Testing LLM Changes

# Quick smoke test
echo "Hello" | DEBUG=1 python tinygrad/apps/llm.py --model "llama3.2:1b"

# Check cache hits (should see "cache hit" after warmup)
echo "Hello world" | DEBUG=1 python tinygrad/apps/llm.py --model "llama3.2:1b" 2>&1 | grep cache

# Test with beam search
echo "Hello" | BEAM=2 python tinygrad/apps/llm.py --model "llama3.2:1b"

Common Patterns

Graph Transformation

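from tinygrad.uop.ops import Ops, UPat, PatternMatcher, graph_rewrite

def my_transform(ctx, x):
  # ctx is the object passed to graph_rewrite; x is the UOp bound by name="x"
  # return a new UOp to replace x, or None to leave x unchanged
  if (new_arg := ctx.get(x.arg)) is None: return None
  return x.replace(arg=new_arg)

pm = PatternMatcher([
  (UPat(Ops.SOMETHING, name="x"), my_transform),  # Ops.SOMETHING is a placeholder
])
# illustrative ctx: maps old args to replacement args (empty here, so nothing changes)
result = graph_rewrite(input_uop, pm, ctx={})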

Finding Variables

# Get all variables in a UOp graph
variables = uop.variables()

# Get bound variable values
var, val = bind_uop.unbind()

Shape Handling

# Shapes can be symbolic (contain UOps)
shape = tensor.shape  # tuple[sint, ...] where sint = int | UOp
