# Claude Code Guide for tinygrad

## Architecture Overview

tinygrad compiles tensor operations into optimized kernels. The pipeline:

1. **Tensor** (`tensor.py`) - User-facing API, creates UOp graph
2. **UOp** (`uop/ops.py`) - Unified IR for all operations (both tensor and kernel level)
3. **Schedule** (`engine/schedule.py`, `schedule/`) - Converts tensor UOps to kernel UOps
4. **Codegen** (`codegen/`) - Converts kernel UOps to device code
5. **Runtime** (`runtime/`) - Device-specific execution

## Key Concepts

### UOp (Universal Operation)

Everything is a UOp - tensors, operations, buffers, kernels. Key properties:

- `op`: The operation type (Ops enum)
- `dtype`: Data type
- `src`: Tuple of source UOps
- `arg`: Operation-specific argument
- `tag`: Optional tag for graph transformations

UOps are **immutable and cached** - creating the same UOp twice returns the same object (ucache).

### PatternMatcher

Used extensively for graph transformations:

```python
pm = PatternMatcher([
  (UPat(Ops.ADD, src=(UPat.cvar("x"), UPat.cvar("x"))), lambda x: x * 2),
])
result = graph_rewrite(uop, pm)
```

### Schedule Cache

Schedules are cached by graph structure. BIND nodes (variables with bound values) are unbound before cache key computation, so different values hit the same cache entry. The sketches below illustrate the caching and bind/unbind behavior.
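A minimal sketch of the ucache behavior described above, assuming `UOp` and `Ops` import from `tinygrad.uop.ops` and `dtypes` from `tinygrad.dtype` (per the directory structure below):

```python
# Sketch: ucache dedupes structurally identical UOps into one object.
# Assumes UOp/Ops live in tinygrad.uop.ops and dtypes in tinygrad.dtype.
from tinygrad.uop.ops import UOp, Ops
from tinygrad.dtype import dtypes

a = UOp(Ops.CONST, dtypes.int, arg=1)
b = UOp(Ops.CONST, dtypes.int, arg=1)
assert a is b  # same (op, dtype, src, arg) -> the same cached object

c = a + b      # UOps support arithmetic; this builds an ADD node
assert c.src[0] is c.src[1]  # both sources are the one cached CONST
```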
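BIND nodes come from bound Variables. A sketch of the bind/unbind round trip the schedule cache relies on, assuming `Variable` is importable from the top-level `tinygrad` package:

```python
# Sketch: a Variable is a DEFINE_VAR UOp; .bind() wraps it in a BIND.
from tinygrad import Variable

v = Variable("pos", 0, 128)    # DEFINE_VAR with bounds [0, 128]
b = v.bind(17)                 # BIND(DEFINE_VAR, CONST)
var, val = b.unbind()          # strip the value back off
assert var is v and val == 17  # ucache: the same DEFINE_VAR object comes back
```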
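A hypothetical sketch of the kind of rewrite that strips BIND values before the cache key is computed (see "Schedule Cache Key Normalization" under Lessons Learned); the real `pm_pre_sched_cache` presumably lives in `engine/schedule.py` and differs in detail:

```python
# Hypothetical sketch, NOT the real pm_pre_sched_cache: strip the bound
# value so structurally equal graphs with different values share a key.
from tinygrad.uop.ops import Ops, UPat, PatternMatcher

def strip_bind(ctx, b):
  ctx[b.src[0]] = b.src[1]           # remember var -> value for restoring later
  return b.replace(src=(b.src[0],))  # BIND(DEFINE_VAR, CONST) -> BIND(DEFINE_VAR)

pm_strip = PatternMatcher([
  (UPat(Ops.BIND, src=(UPat(Ops.DEFINE_VAR), UPat(Ops.CONST)), name="b"), strip_bind),
])
# usage: graph_rewrite(sink_uop, pm_strip, ctx={})
```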
## Directory Structure

```
tinygrad/
├── tensor.py          # Tensor class, user API
├── device.py          # Buffer, device management
├── dtype.py           # Data types
├── helpers.py         # Utilities, environment vars
├── uop/
│   ├── ops.py         # UOp class, Ops enum, PatternMatcher
│   ├── spec.py        # UOp type verification
│   └── symbolic.py    # Symbolic math simplification
├── engine/
│   ├── schedule.py    # Schedule creation, caching
│   ├── realize.py     # Tensor realization
│   ├── jit.py         # JIT compilation
│   └── memory.py      # Memory planning
├── schedule/
│   ├── rangeify.py    # Convert movements to ranges
│   └── indexing.py    # Index calculations
├── codegen/
│   ├── kernel.py      # Kernel optimization
│   └── uopgraph.py    # UOp graph transformations
├── renderer/          # Code generation (CUDA, Metal, etc.)
└── runtime/           # Device backends
```

## Testing

```bash
# Run specific test
python -m pytest test/unit/test_schedule_cache.py -xvs

# Run with timeout
python -m pytest test/test_symbolic_ops.py -x --timeout=60

# Debug with print
DEBUG=2 python -m pytest test/test_schedule.py::test_name -xvs

# Visualize UOp graphs
VIZ=1 python -c "from tinygrad import Tensor; Tensor.ones(10).sum().realize()"
```

## Common Environment Variables

- `DEBUG=1-4` - Increasing verbosity
- `VIZ=1` - Enable graph visualization
- `SPEC=1` - Enable UOp spec verification
- `NOOPT=1` - Disable optimizations
- `DEVICE=CPU/CUDA/AMD/METAL` - Set default device

## Debugging Tips

1. **Print UOp graphs**: `print(tensor.uop)` or `print(tensor.uop.sink())`
2. **Check schedule**: `tensor.schedule()` returns a list of ScheduleItems
3. **Trace graph rewrites**: Use `VIZ=1` or add prints in PatternMatcher callbacks
4. **Find UOps by type**: `[u for u in uop.toposort() if u.op is Ops.SOMETHING]`

## Workflow Rules

- **NEVER commit without explicit user approval** - always show the diff and wait for approval
- Run tests before proposing commits
- Test with `SPEC=2` when modifying UOp-related code

## Style Notes

- 2-space indentation, 150-char line limit
- PatternMatchers should be defined at module level (they are slow to construct)
- Prefer `graph_rewrite` over manual graph traversal
- UOp methods like `.replace()` preserve tags unless explicitly changed
- Use `.rtag(value)` to add tags to UOps

## Lessons Learned

### UOp ucache Behavior

UOps are cached by their contents - creating a UOp with identical (op, dtype, src, arg) returns the **same object**. This means:

- `uop.replace(tag=None)` on a tagged UOp returns the original untagged UOp if it exists in cache
- Two UOps with the same structure are identical (`is` comparison works)

### Spec Validation

When adding new UOp patterns, update `tinygrad/uop/spec.py`. Test with:

```bash
SPEC=2 python3 test/unit/test_something.py
```

Spec issues appear as `RuntimeError: SPEC ISSUE None: UOp(...)`.

### Schedule Cache Key Normalization

The schedule cache strips values from BIND nodes so different bound values (e.g., KV cache positions) hit the same cache entry (see the BIND-stripping sketch under Key Concepts above):

- `pm_pre_sched_cache`: BIND(DEFINE_VAR, CONST) → BIND(DEFINE_VAR) for the cache key
- `pm_post_sched_cache`: restores the original BIND from context
- When accessing `bind.src[1]`, check `len(bind.src) > 1` first (it might be stripped)
- Extract var_vals from the `input_buffers` dict after `graph_rewrite` (avoids an extra toposort)

### Avoiding Extra Work

- Use the ctx dict from `graph_rewrite` to collect info during traversal instead of a separate toposort (see the sketch at the end of this guide)
- Only extract var_vals when the schedule is non-empty (no kernels = no vars needed)
- PatternMatchers are slow to construct - define them at module level, not inside functions

### Testing LLM Changes

```bash
# Quick smoke test
echo "Hello" | DEBUG=1 python tinygrad/apps/llm.py --model "llama3.2:1b"

# Check cache hits (should see "cache hit" after warmup)
echo "Hello world" | DEBUG=1 python tinygrad/apps/llm.py --model "llama3.2:1b" 2>&1 | grep cache

# Test with beam search
echo "Hello" | BEAM=2 python tinygrad/apps/llm.py --model "llama3.2:1b"
```

## Common Patterns

### Graph Transformation

```python
def my_transform(ctx, x):
  # Return a new UOp, or None to skip (leave the node unchanged)
  return x.replace(arg=new_arg)  # new_arg is a placeholder for your real argument

pm = PatternMatcher([
  (UPat(Ops.SOMETHING, name="x"), my_transform),
])
result = graph_rewrite(input_uop, pm, ctx={})
```

### Finding Variables

```python
# Get all variables in a UOp graph
variables = uop.variables()

# Get bound variable values
var, val = bind_uop.unbind()
```

### Shape Handling

```python
# Shapes can be symbolic (contain UOps)
shape = tensor.shape  # tuple[sint, ...] where sint = int | UOp
```
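A hedged sketch of a symbolic shape in practice: `shrink` with a bound `Variable` is one way a symbolic dimension arises (cf. `test/test_symbolic_ops.py` from the Testing section); other entry points may differ.

```python
# Sketch: a bound Variable as a dimension makes tensor.shape symbolic.
from tinygrad import Tensor, Variable

n = Variable("n", 1, 16).bind(8)               # symbolic size, currently 8
t = Tensor.ones(16, 4).shrink(((0, n), None))  # shape becomes (n, 4)
print(t.shape)                                 # mixes int and UOp entries
```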
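Tying together the ctx-dict advice from "Avoiding Extra Work" with the patterns above, a sketch of collecting var_vals during a rewrite instead of running a separate toposort pass; the pattern callback returns `None` so nothing is rewritten:

```python
# Sketch: observe BIND nodes via ctx while graph_rewrite walks the graph.
from tinygrad import Variable
from tinygrad.uop.ops import Ops, UPat, PatternMatcher, graph_rewrite

def record_bind(ctx, b):
  ctx[b.src[0]] = b.src[1].arg  # var UOp -> bound python value
  return None                   # None = no rewrite, traversal continues

collect_binds = PatternMatcher([(UPat(Ops.BIND, name="b"), record_bind)])

var_vals: dict = {}
graph_rewrite(Variable("i", 1, 10).bind(3).sink(), collect_binds, ctx=var_vals)
print(var_vals)  # {DEFINE_VAR("i"): 3}
```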