Claude Code Guide for tinygrad
Architecture Overview
tinygrad compiles tensor operations into optimized kernels. The pipeline:
- Tensor (tensor.py) - User-facing API, creates UOp graph
- UOp (uop/ops.py) - Unified IR for all operations (both tensor and kernel level)
- Schedule (engine/schedule.py, schedule/) - Converts tensor UOps to kernel UOps
- Codegen (codegen/) - Converts kernel UOps to device code
- Runtime (runtime/) - Device-specific execution
Key Concepts
UOp (Universal Operation)
Everything is a UOp - tensors, operations, buffers, kernels. Key properties:
- op: The operation type (Ops enum)
- dtype: Data type
- src: Tuple of source UOps
- arg: Operation-specific argument
- tag: Optional tag for graph transformations
UOps are immutable and cached - creating the same UOp twice returns the same object (ucache).
PatternMatcher
Used extensively for graph transformations:
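from tinygrad.uop.ops import UPat, Ops, PatternMatcher, graph_rewrite
# Rewrite x + x (the same constant on both sides) into x * 2
pm = PatternMatcher([
  (UPat(Ops.ADD, src=(UPat.cvar("x"), UPat.cvar("x"))), lambda x: x * 2),
])
result = graph_rewrite(uop, pm)  # uop: an existing UOp graph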
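To make the idea of rule-based rewriting concrete, here is a minimal pure-Python sketch. It is a toy model, not tinygrad's implementation: expressions are nested tuples, and the names rule_add_self, RULES and toy_rewrite are hypothetical. It only illustrates the general shape of "match a pattern, return a replacement or None".

```python
# Toy model only: expressions are nested tuples like ("ADD", a, b); leaves are strings or ints.
# None of these names exist in tinygrad.

def rule_add_self(node):
  # ("ADD", x, x) -> ("MUL", x, 2); return None when the rule does not apply
  if isinstance(node, tuple) and node[0] == "ADD" and node[1] == node[2]:
    return ("MUL", node[1], 2)
  return None

RULES = [rule_add_self]  # built once at module level, not inside a function

def toy_rewrite(node, rules=RULES):
  # Rewrite children first (bottom-up), then apply rules to this node until none fires
  if isinstance(node, tuple):
    node = (node[0],) + tuple(toy_rewrite(c, rules) for c in node[1:])
  changed = True
  while changed:
    changed = False
    for rule in rules:
      new = rule(node)
      if new is not None:
        node, changed = new, True
        break
  return node

expr = ("ADD", ("ADD", "a", "a"), ("ADD", "a", "a"))
result = toy_rewrite(expr)
print(result)
```

Because children are rewritten before their parent, both inner additions become multiplications first, which is what lets the outer addition match the same rule.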
Schedule Cache
Schedules are cached by graph structure. BIND nodes (variables with bound values) are unbound before the cache key is computed, so graphs that differ only in their bound values hit the same cache entry.
Directory Structure
tinygrad/
├── tensor.py # Tensor class, user API
├── device.py # Buffer, device management
├── dtype.py # Data types
├── helpers.py # Utilities, environment vars
├── uop/
│ ├── ops.py # UOp class, Ops enum, PatternMatcher
│ ├── spec.py # UOp type verification
│ └── symbolic.py # Symbolic math simplification
├── engine/
│ ├── schedule.py # Schedule creation, caching
│ ├── realize.py # Tensor realization
│ ├── jit.py # JIT compilation
│ └── memory.py # Memory planning
├── schedule/
│ ├── rangeify.py # Convert movements to ranges
│ └── indexing.py # Index calculations
├── codegen/
│ ├── kernel.py # Kernel optimization
│ └── uopgraph.py # UOp graph transformations
├── renderer/ # Code generation (CUDA, Metal, etc.)
└── runtime/ # Device backends
Testing
# Run specific test
python -m pytest test/unit/test_schedule_cache.py -xvs
# Run with timeout
python -m pytest test/test_symbolic_ops.py -x --timeout=60
# Debug with print
DEBUG=2 python -m pytest test/test_schedule.py::test_name -xvs
# Visualize UOp graphs
VIZ=1 python -c "from tinygrad import Tensor; Tensor.ones(10).sum().realize()"
Common Environment Variables
- DEBUG=1-4 - Increasing verbosity
- VIZ=1 - Enable graph visualization
- SPEC=1 - Enable UOp spec verification
- NOOPT=1 - Disable optimizations
- DEVICE=CPU/CUDA/AMD/METAL - Set default device
Debugging Tips
- Print UOp graphs: print(tensor.uop) or print(tensor.uop.sink())
- Check schedule: tensor.schedule() returns list of ScheduleItems
- Trace graph rewrites: Use VIZ=1 or add print in PatternMatcher callbacks
- Find UOps by type: [u for u in uop.toposort() if u.op is Ops.SOMETHING]
Workflow Rules
- NEVER commit without explicit user approval - always show the diff and wait for approval
- Run tests before proposing commits
- Test with SPEC=2 when modifying UOp-related code
Style Notes
- 2-space indentation, 150 char line limit
- PatternMatchers should be defined at module level (slow to construct)
- Prefer graph_rewrite over manual graph traversal
- UOp methods like .replace() preserve tags unless explicitly changed
- Use .rtag(value) to add tags to UOps
Lessons Learned
UOp ucache Behavior
UOps are cached by their contents - creating a UOp with identical (op, dtype, src, arg) returns the same object. This means:
- uop.replace(tag=None) on a tagged UOp returns the original untagged UOp if it exists in cache
- Two UOps with same structure are identical (is comparison works)
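The two consequences above can be reproduced with a small content-keyed cache. This is a sketch under assumptions, not tinygrad's ucache: the Node class is hypothetical, it holds strong references, and it assumes the tag is part of the cache key (which is what the replace(tag=None) behavior implies).

```python
# Toy model of a content-keyed cache (hash-consing). Not tinygrad code.
class Node:
  _cache = {}  # maps (op, src, arg, tag) -> the unique Node with those contents

  def __new__(cls, op, src=(), arg=None, tag=None):
    key = (op, src, arg, tag)
    if key in cls._cache:
      return cls._cache[key]  # identical contents: hand back the existing object
    obj = super().__new__(cls)
    obj.op, obj.src, obj.arg, obj.tag = op, src, arg, tag
    cls._cache[key] = obj
    return obj

  def replace(self, **kwargs):
    # Build a node with some fields changed; the cache decides whether it is new
    fields = {"op": self.op, "src": self.src, "arg": self.arg, "tag": self.tag}
    fields.update(kwargs)
    return Node(**fields)

a = Node("CONST", arg=1)
b = Node("CONST", arg=1)
tagged = a.replace(tag="seen")
untagged_again = tagged.replace(tag=None)

print(a is b)               # same contents, same object
print(tagged is a)          # a different tag gives a different object
print(untagged_again is a)  # removing the tag lands back on the original
```

The practical takeaway is that removing a tag does not produce a fresh copy: any code holding the original untagged node will see the very same object.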
Spec Validation
When adding new UOp patterns, update tinygrad/uop/spec.py. Test with:
SPEC=2 python3 test/unit/test_something.py
Spec issues appear as RuntimeError: SPEC ISSUE None: UOp(...).
Schedule Cache Key Normalization
The schedule cache strips values from BIND nodes so different bound values (e.g., KV cache positions) hit the same cache entry:
- pm_pre_sched_cache: BIND(DEFINE_VAR, CONST) → BIND(DEFINE_VAR) for cache key
- pm_post_sched_cache: restores original BIND from context
- When accessing bind.src[1], check len(bind.src) > 1 first (might be stripped)
- Extract var_vals from input_buffers dict after graph_rewrite (avoids extra toposort)
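The stripping step can be sketched in pure Python. This is an illustrative toy, not the code in engine/schedule.py: graphs are nested tuples, and strip_binds, toy_schedule, schedule_cache and kv_graph are hypothetical names. It shows the bound value being recorded in a context dict during the same walk that builds the cache key, and the length check on an already stripped BIND.

```python
# Toy model: a bound variable is ("BIND", ("DEFINE_VAR", name), ("CONST", value)).
def strip_binds(node, ctx):
  if not isinstance(node, tuple):
    return node
  # Only a BIND that still has its value has a third element; a stripped BIND does not
  if node[0] == "BIND" and len(node) > 2:
    var, const = node[1], node[2]
    ctx[var[1]] = const[1]  # remember the value while traversing, no second pass
    return ("BIND", var)
  return (node[0],) + tuple(strip_binds(c, ctx) for c in node[1:])

schedule_cache = {}

def toy_schedule(graph):
  var_vals = {}
  key = strip_binds(graph, var_vals)  # key no longer depends on bound values
  if key not in schedule_cache:
    schedule_cache[key] = f"schedule#{len(schedule_cache)}"  # stands in for real scheduling work
  return schedule_cache[key], var_vals

def kv_graph(pos):
  return ("ADD", ("LOAD", "kv_cache"), ("BIND", ("DEFINE_VAR", "start_pos"), ("CONST", pos)))

s1, v1 = toy_schedule(kv_graph(5))
s2, v2 = toy_schedule(kv_graph(6))
print(s1 == s2, len(schedule_cache), v1, v2)
```

Both calls share one cache entry, while the per-call values survive in var_vals so they can be supplied again when the schedule runs.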
Avoiding Extra Work
- Use ctx dict from graph_rewrite to collect info during traversal instead of separate toposort
- Only extract var_vals when schedule is non-empty (no kernels = no vars needed)
- PatternMatchers are slow to construct - define at module level, not in functions
Testing LLM Changes
# Quick smoke test
echo "Hello" | DEBUG=1 python tinygrad/apps/llm.py --model "llama3.2:1b"
# Check cache hits (should see "cache hit" after warmup)
echo "Hello world" | DEBUG=1 python tinygrad/apps/llm.py --model "llama3.2:1b" 2>&1 | grep cache
# Test with beam search
echo "Hello" | BEAM=2 python tinygrad/apps/llm.py --model "llama3.2:1b"
Common Patterns
Graph Transformation
def my_transform(ctx, x):
  # Return a new UOp to rewrite x, or None to leave it unchanged
  return x.replace(arg=new_arg)  # new_arg: placeholder for the replacement argument
pm = PatternMatcher([
  (UPat(Ops.SOMETHING, name="x"), my_transform),
])
result = graph_rewrite(input_uop, pm, ctx={})
Finding Variables
# Get all variables in a UOp graph
variables = uop.variables()
# Get bound variable values
var, val = bind_uop.unbind()
Shape Handling
# Shapes can be symbolic (contain UOps)
shape = tensor.shape # tuple[sint, ...] where sint = int | UOp