Commit Graph

794 Commits

Author SHA1 Message Date
George Hotz
ee4f696086 delete more tests (#12043)
* delete more tests

* delete and simplify

* flaky on windows

* a few more, those remained
2025-09-05 15:31:30 -07:00
George Hotz
12c7b1bb01 cleanup lin tests without Kernel (#12041)
* cleanup lin tests without Kernel

* no kernel.py there

* remove that test
2025-09-05 15:13:14 -07:00
Sieds Lykles
f5404ca53c Divmod combine - associative variations (#12017)
* add rule and test

* more rules and tests

* add all four variations

* fix test

* test fixed!

* adjust commment

* add new variations

* disable intel tensor core ops count test for bigger_matmul_half
2025-09-05 03:44:02 +02:00
chenyu
677220ae7e test_tesnor_data to unit/ (#12013) 2025-09-04 19:58:27 -04:00
qazal
da61b40604 some viz tests don't need track_rewrites (#12010) 2025-09-04 23:59:32 +03:00
qazal
be364a1adb viz: add default tracing group (#12009)
This enables seeing rewrites in unit tests like `VIZ=1 python3 test/test_uop_graph.py TestUOpGraph.test_in_bounds_access_gated_local` that call graph_rewrite directly.

`@track_rewrites` keeps existing as an optional helper to organize larger traces.
2025-09-04 23:29:56 +03:00
chenyu
dc8501af30 clean up wino tests (#12008)
removed the one that tests hcopt and added one for backward kernel counts
2025-09-04 16:14:55 -04:00
qazal
4996bb668b load all traces before asserting in test_viz (#12004) 2025-09-04 21:34:48 +03:00
George Hotz
09106e4aae refactor and split test_linearizer (#12001)
* refactor and split test_linearizer

* forget that file

* imports

* remove from docs

* test gen float4
2025-09-04 10:53:07 -07:00
Sieds Lykles
572a3c15c6 Move Ops.SPECIAL arg to src (#11918)
* initial moving bound to src

* arg to src

* remove import

* fixup linearizer

* arg to src

* fix test_uop_graph

* fix more tests

* fix python renderer

* get const value from const uop

* ssimplify uop estimates

* fix webgpu locals

* fix old test

* gate Ops.SPECIAL in linearizer

* use ssimplify() for local/global_size

* remove toposort gate_parents_instead_of_self

* fix rendering in comment

* cleanup

* rename and add comments

* add BottomUpGate with test
2025-09-04 09:31:44 +02:00
George Hotz
5cf42dc4db add Scheduler to replace Kernel with POSTOPT=2 (#11924)
* ** simple kernel to replace Kernel for postopt

* support old

* fix beam

* beaming

* beam on old

* bring tensor cores back

* raise

* postbeam

* test ops passes on mac

* skip that

* postopt default

* gate that

* fix tensor cores

* a few test fixes

* dsp fix

* tc fix

* loop

* support swap

* test_gemv

* fix beam for variable

* test opts from high level stuff

* range annoying

* compile slow

* metal slow

* better beam

* no POSTBEAM

* fix nolocals

* hc opt mostly works

* put that back

* lil

* some work

* fix that

* POSTOPT 2

* fix tests

* no postopt 2

* work

* back

* padded tensors cores

* shift_to

* postopt 0 passes?

* write PADTO

* fix padded tensor cores

* compare hcopt

* 18000 lines

* should pass tests

* fix rangeify

* put types back
2025-09-03 19:23:30 -07:00
chenyu
b13e071463 move test_winograd to unit test (#11993) 2025-09-03 21:47:32 -04:00
Jordan Chalupka
68e83b850f nbytes should raise an exception when size is unlimited (#11928)
* nbytes should raise an exception when size is unlimited

* adding a test
2025-09-03 07:06:20 -07:00
Sieds Lykles
86e908db57 cast parents of int64 alu to int32 if possible (#11977)
* add overflows helper

* add rules

* x -> y

* check overflow of u too

* cleaner

* use alu instead of replace to preserve vectorization

* just one rule

* add test
2025-09-03 11:05:04 +02:00
Sieds Lykles
033184b3cb parse_valid with non const rhs (#11957)
* const to using vmin/vmax

* add test

* convert to int

* remove left over part of and
2025-09-03 08:08:46 +02:00
Sieds Lykles
53eff8970a add Ops.GEP to _min_max (#11976) 2025-09-03 07:07:54 +02:00
chenyu
69dd1817d0 raise RuntimeError in merge_dicts instead of assert [pr] (#11965) 2025-09-02 17:18:44 -04:00
qazal
f750c15965 viz: add python marker (#11952)
* viz: add python marker

* remove duplicate
2025-09-02 23:44:00 +03:00
qazal
0a53e72f70 viz: fix trace duration in python test decoder (#11949) 2025-09-01 14:32:25 +03:00
qazal
27c9ed5a84 viz: more consistent naming of events (#11948)
* s/shapes/events in test_viz

* s/bufs/events in the memory packer
2025-09-01 14:16:47 +03:00
Sieds Lykles
a19d689481 fix vec dtype _min_max (#11944) 2025-09-01 03:24:07 +02:00
Sieds Lykles
d3252ccd85 fix special vmax when arg is UOp (#11930) 2025-08-31 06:54:39 +02:00
qazal
c27b99d68f viz: refactor to indexed rewrite traces (#11923) 2025-08-30 20:01:47 +03:00
qazal
bf0d055b39 viz: color by name (#11919) 2025-08-30 16:04:58 +03:00
Sieds Lykles
0bc34c000f simplify range mod its own upper bound (#11917)
* add rules

* add tests
2025-08-30 08:37:35 +02:00
George Hotz
afad7d0cd1 remove dtype from range, it will be dtypes.index soon [pr] (#11914)
* remove dtype from range, it will be dtypes.index soon [pr]

* a few more
2025-08-29 09:52:07 -07:00
Ben Waldron
ea1be2e4cd [bounty] Remove using reshape to register symbolic shape (#11771)
* Modify tests and start work towards removing symbolic reshape

* Refactor symbolic reshape

* fix small error

* much cleaner + fix more tests

* Can remove this now

* Update test_symbolic_ops and test_tiny

* Couple more tests

* Unused import

* More tests and add EXPAND to Tensor.empty

* Fix test beam search

* all int

* Fix rangeify by adding shrink

* Remove OOB check and so fix test_symbolic_jit

* test_symbolic_jit doesn't need OOB Context anymore either

* Should remove that test now

* Cleanups part 1

* fix linters

* Final cleanups

* Don't reassign inside for loop

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-28 12:30:49 -04:00
Nino Risteski
54be477152 rope cache optim for jit prune in llm.py (#11678)
* rope cache optim for jit prune

* rope test

* tests in test attention

* Revert "rope test"

This reverts commit 69ede543d0.

* lint
2025-08-28 08:31:29 -07:00
George Hotz
6d6f0dada7 support for tuple ranges (#11890)
* support for tuple ranges

* breaks it
2025-08-28 07:02:31 -07:00
Jordan Chalupka
e9789d8a70 Add mxfp4 support (#11873)
* bump ggml url

* map mxfp4 to tensor

* tests
2025-08-27 10:56:56 -07:00
Sieds Lykles
d39365809a add ctx to z3_renderer arg (#11867)
* add ctx to z3_renderer arg

* update symbolic fuzzer

* rewrite u1,u2,u3

* update fuzz_fast_idiv

* remove imports
2025-08-27 03:38:15 +02:00
Sieds Lykles
a3aeef45cc associative variation of where branch-merging (#11851)
* add rule and test

* change comment
2025-08-26 19:27:05 +02:00
b1tg
1dd613cb89 test float_to_bf16 round-to-even behavior (#11849)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-26 12:16:10 -04:00
b1tg
409399c609 fix nan in float_to_bf16 (#11843)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-26 11:42:25 -04:00
chenyu
f28f613f85 improved float_to_bf16 (#11848)
round instead of truncate
2025-08-26 11:14:06 -04:00
chenyu
ac3449b0c8 truncate_fp16 cleanup (#11838)
native `@` is default
2025-08-25 19:03:41 -04:00
qazal
a1f6823060 viz: memory layout in client side (#11830)
* viz: memory layout in client side

* update test_viz
2025-08-25 14:49:33 +03:00
George Hotz
6540bb32a6 move into codegen late [pr] (#11823) 2025-08-24 10:23:25 -07:00
Sieds Lykles
dd69114573 Revert "Better div nesting (#11811)" (#11818)
This reverts commit 952f729b07.
2025-08-24 18:11:24 +02:00
Sieds Lykles
952f729b07 Better div nesting (#11811)
* remove check

* use fold_divmod_congruence instead of simplify

* adjust tests

* shorten line
2025-08-24 04:17:40 +02:00
Sieds Lykles
e652062f92 tweak divmod_folding condition (#11810) 2025-08-24 02:59:02 +02:00
Sieds Lykles
07d4ed7e4c one more symbolic add variation (#11807) 2025-08-24 01:15:04 +02:00
qazal
0d86288bd7 viz: calculate timeline fixed points in client side (#11805)
* viz: calculate timeline fixed points in client side

* 26 bytes / event

* math
2025-08-24 01:44:40 +03:00
qazal
2407fecdae viz bytepack format (#11792)
* viz bytepack format

Training a 1B llama yields ~20M profiler events.

With JSON serialization, the browser tries to load 6GB to memory. This OOMs since each tab is limited to <3-4GB memory usage. Using a packed format, we only need ~600MB.

**Design decisions:**

- Timestamps are in microseconds relative to start time. They're stored in u32, which can express up to ~1 hr of trace events.
- Strings (kernel names, metadata, etc) are deduped.
- Buffer sizes are in u64 nbytes.

More optimization possible:

- The string lookup is a JSON dumped array, we can compress this.
- Can store less for memory by moving the layout to client.

**Results**

|  | Events | JSON | bytepack |
|----------------|---------|-------------|-------------|
| DP=8 llama 1B train (`command: [1]`) | 24M | 5.8GB | 640MB |
| examples/beautiful_mnist.py | 16K | 3.7MB | 745KB |
| examples/gpt2.py | 55K | 12.54MB | 1.40MB |

`[1]`: `VIZ=1 FAKEDATA=1 OFFLOAD_OPTIM=1 DP=8 BS=8 GRADIENT_ACC_STEPS=2 BLOCK_REORDER=0 LR=3e-4 TRAIN_ON_VAL=1 DEFAULT_FLOAT=bfloat16 OPTIM_DTYPE=bfloat16 LLAMA3_SIZE=1B WARMUP_STEPS=36 DECAY_STEPS=360 SEQLEN=8192 PYTHONPATH=. AMD=1 AMD_LLVM=0 MODEL=llama3 python3 examples/mlperf/model_train.py`

* python reference decoder

* 27 bytes / event, 1hr hard limit
2025-08-23 23:50:21 +03:00
qazal
b12d1d866c count bytes per kernel in test_viz (#11801)
Currently at ~100 bytes/kernel with JSON.
2025-08-23 23:35:27 +03:00
Sieds Lykles
6a50ab6b87 adjust idiv min_max (#11802)
* change div min_max

* add tests
2025-08-23 22:25:51 +02:00
George Hotz
aefabaf774 add AxisType to range (#11798)
* add AxisType to range

* missed them

* fix that test

* fix that test
2025-08-23 11:15:00 -07:00
qazal
b975830424 add profile loader helper in test_viz (#11797) 2025-08-23 19:20:29 +03:00
qazal
9ff03680ba viz: store relative timestamps (#11787)
* viz: store relative timestamps

* err

* update test
2025-08-22 19:30:21 +03:00
qazal
2e0eb88549 viz: add metadata to UOp tracing (#11772)
* viz: add metadata to UOp tracing

* place after tag

* optional field

* err, refcount of root must be 0
2025-08-22 00:18:45 +03:00