Commit Graph

10417 Commits

Author SHA1 Message Date
nimlgen
874c1db4af am: init support for aql (#11888) 2025-08-28 18:41:46 +03:00
Ben Waldron
17ecaf4682 Add test_variable_empty (#11889)
* Add test_variable_empty

* Move test and add TODO

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-28 11:38:27 -04:00
Nino Risteski
54be477152 rope cache optim for jit prune in llm.py (#11678)
* rope cache optim for jit prune

* rope test

* tests in test attention

* Revert "rope test"

This reverts commit 69ede543d0.

* lint
2025-08-28 08:31:29 -07:00
quortus
5f8fe9a331 Replace ASSIGN with STORE in test_linearizer (#11821) 2025-08-28 07:33:20 -07:00
geohotstan
4e8370309c Support onnx If OP (#11648)
* start

* tiny clean up

* whoops, didn't mean to accidentally fix this

* fix .to(device), kinda hacky and this fix makes it slower?

* merge properly

* FINALLY figured out slowness, also hack pylint for now

* add DEBUGONNX print for subgraph

* oops

* WOOOOOOOO SHAPE CACHE 50% SPEED INCREASE

* small fix, but maybe all deterministic Tensor creation in fp should be cached

* cache condition

* sliiiightly cleaner

* better abstraction?

* remove sam from model_benchmark

* remove shape cache speed up for now

* less lines

* isinstance fix

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-28 10:17:35 -04:00
George Hotz
6d6f0dada7 support for tuple ranges (#11890)
* support for tuple ranges

* breaks it
2025-08-28 07:02:31 -07:00
nimlgen
60dd9a162c memory: tiny tlsf cleanup (#11887) 2025-08-28 14:07:18 +03:00
chenyu
beb5982165 FUSE_ATTENTION (#11884) 2025-08-27 19:59:17 -04:00
George Hotz
cb5295168d postrange boilerplate work (#11881) 2025-08-27 15:22:59 -07:00
George Hotz
fd579433bc pre expander shouldn't go in gpudims (#11880) 2025-08-27 14:52:24 -07:00
nimlgen
44816218b5 memplan: fix large buffers planning (#11878)
* memplan: fix large buffers planning

* fix

* fix dsp
2025-08-27 23:54:27 +03:00
nimlgen
4006366752 Revert "memplan: fix large buffers planning (#11876)" (#11877)
This reverts commit 7f90497efc.
2025-08-27 22:36:14 +03:00
nimlgen
7f90497efc memplan: fix large buffers planning (#11876)
* memplan: fix large buffers planning

* fix
2025-08-27 22:04:15 +03:00
George Hotz
e4afdf9ea1 improve DEBUG=2 string with TB/s and TFLOPS [pr] (#11875) 2025-08-27 11:42:41 -07:00
Jordan Chalupka
e9789d8a70 Add mxfp4 support (#11873)
* bump ggml url

* map mxfp4 to tensor

* tests
2025-08-27 10:56:56 -07:00
qazal
884eb53e89 tracing: fix types (#11871)
* tracing: fix types

* /profiler isn't a thing

* return list
2025-08-27 15:50:43 +03:00
Sieds Lykles
d39365809a add ctx to z3_renderer arg (#11867)
* add ctx to z3_renderer arg

* update symbolic fuzzer

* rewrite u1,u2,u3

* update fuzz_fast_idiv

* remove imports
2025-08-27 03:38:15 +02:00
George Hotz
24c00a4061 darken hex on viz (#11865)
* darken hex on viz

* more readable
2025-08-26 15:57:50 -07:00
qazal
f38e4af226 viz: add custom zoom filter (#11861) 2025-08-27 01:30:29 +03:00
nimlgen
62df6c39af amd: correct handling of relocations (#11863)
* amd: correct handling of relocations

* ops

* add
2025-08-27 01:26:45 +03:00
George Hotz
d261458ecd add colors to range (#11860) 2025-08-26 14:32:12 -07:00
Sieds Lykles
7dfc7e4abc uops_to_z3 helper(#11859) 2025-08-26 22:58:05 +02:00
chenyu
1bbb578afd named expression for POW and MAX gradient (#11858) 2025-08-26 16:03:03 -04:00
chenyu
7028cb4167 clean up TestBitcastConstFolding (#11856) 2025-08-26 15:26:47 -04:00
George Hotz
d4154e0349 split devectorizing of buf/index (#11855) 2025-08-26 12:05:48 -07:00
George Hotz
b268755d51 small changes from postopt (#11854) 2025-08-26 11:56:16 -07:00
Sieds Lykles
a3aeef45cc associative variation of where branch-merging (#11851)
* add rule and test

* change comment
2025-08-26 19:27:05 +02:00
chenyu
aabe7756be fix type in fold_bitcast [pr] (#11853) 2025-08-26 13:22:30 -04:00
Jordan Chalupka
4785cd959a [TYPED=1] cvar should allow dtype as a tuple (#11770)
* cvar dtype:DType|tuple[DType, ...]|None=None

* fmt

* add a test

* list typeguard as a dep for CI

* extra step to install mypy

* fix venv

* ci fixes

* mv typeguard to testing install group

* simpler TYPED=1 test

* add typeguard to lint group
2025-08-26 12:49:51 -04:00
qazal
b111076301 viz: fixup click on overlay rect (#11850) 2025-08-26 19:25:42 +03:00
b1tg
1dd613cb89 test float_to_bf16 round-to-even behavior (#11849)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-26 12:16:10 -04:00
b1tg
409399c609 fix nan in float_to_bf16 (#11843)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-26 11:42:25 -04:00
qazal
43d5d66d34 viz: add UOp ports to edges (#11847)
* viz: add UOp ports to edges

* one edge label

* g.tag styling

* replace with NodeList
2025-08-26 18:31:52 +03:00
chenyu
f28f613f85 improved float_to_bf16 (#11848)
round instead of truncate
2025-08-26 11:14:06 -04:00
nimlgen
afe14ccbfa amd: aql default when several xccs (#11832) 2025-08-26 15:16:36 +03:00
qazal
3674c0754e viz: small uop click changes (#11846)
* also highlight self

* can always unselect by clicking outside

* less layout
2025-08-26 14:56:13 +03:00
qazal
f2a3c27372 viz: g.edges() once (#11845) 2025-08-26 13:29:59 +03:00
qazal
b0df3e62a8 viz: light up srcs and paths on UOp click (#11844)
* viz: light up srcs and paths on UOp click

* safari doesn't have context-stroke

* safari also has a bug

* safari acceptance
2025-08-26 09:03:09 +03:00
qazal
6236749867 viz: move rect styles to classes (#11842)
* viz: move rect styles to classes

* add rect
2025-08-26 07:55:34 +03:00
qazal
81ffa07439 viz: pass through nodes without a link (#11841) 2025-08-26 07:00:43 +03:00
Sieds Lykles
265d287615 add decomp for !x&!y -> !(x|y) (#11836) 2025-08-26 05:21:06 +02:00
chenyu
337e979a59 call dtypes.as_const in Tensor(list) (#11840) 2025-08-25 22:08:26 -04:00
George Hotz
215818379b new (post) group for reduce (#11837)
* new (post) group for reduce

* fixes

* leave if

* fix locals

* size

* no vectorized buf

* image fixes

* don't track that

* fix ptx

* name buffer with reduce range

* remove unused in lowerer

* yay DEFINE_REG refactor
2025-08-25 18:03:00 -07:00
chenyu
ac3449b0c8 truncate_fp16 cleanup (#11838)
native `@` is default
2025-08-25 19:03:41 -04:00
qazal
e146418f65 hotfix: profiler content-type is application/octet-stream (#11831) 2025-08-25 15:56:42 +03:00
qazal
a1f6823060 viz: memory layout in client side (#11830)
* viz: memory layout in client side

* update test_viz
2025-08-25 14:49:33 +03:00
George Hotz
a6dbb09058 changes for postrange (#11828) 2025-08-24 17:37:07 -07:00
George Hotz
27701ef823 add locals support to rangeify (#11826) 2025-08-24 14:03:12 -07:00
Sieds Lykles
a286a1a6f7 Fast idiv try removing factors of two before cast (#11824)
* try removing factors of two

* dont return if None

* add test
2025-08-24 20:04:25 +02:00
George Hotz
a03b930339 hotfix: green v2 in docs 2025-08-24 10:25:14 -07:00