Commit Graph

9999 Commits

Author SHA1 Message Date
George Hotz
0e0be99b55 Merge branch 'master' into simpler_postrange 2025-08-28 07:22:39 -07:00
geohotstan
4e8370309c Support onnx If OP (#11648)
* start

* tiny clean up

* whoops, didn't mean to accidentally fix this

* fix .to(device), kinda hacky and this fix makes it slower?

* merge properly

* FINALLY figured out slowness, also hack pylint for now

* add DEBUGONNX print for subgraph

* oops

* WOOOOOOOO SHAPE CACHE 50% SPEED INCREASE

* small fix, but maybe all deterministic Tensor creation in fp should be cached

* cache condition

* sliiiightly cleaner

* better abstraction?

* remove sam from model_benchmark

* remove shape cache speed up for now

* less lines

* isinstance fix

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-28 10:17:35 -04:00
George Hotz
6d6f0dada7 support for tuple ranges (#11890)
* support for tuple ranges

* breaks it
2025-08-28 07:02:31 -07:00
nimlgen
60dd9a162c memory: tiny tlsf cleanup (#11887) 2025-08-28 14:07:18 +03:00
chenyu
beb5982165 FUSE_ATTENTION (#11884) 2025-08-27 19:59:17 -04:00
George Hotz
cb5295168d postrange boilerplate work (#11881) 2025-08-27 15:22:59 -07:00
George Hotz
fd579433bc pre expander shouldn't go in gpudims (#11880) 2025-08-27 14:52:24 -07:00
nimlgen
44816218b5 memplan: fix large buffers planning (#11878)
* memplan: fix large buffers planning

* fix

* fix dsp
2025-08-27 23:54:27 +03:00
George Hotz
e9575c81e2 delete 2025-08-27 12:49:58 -07:00
George Hotz
ea1b853a60 delete 2025-08-27 12:49:58 -07:00
nimlgen
4006366752 Revert "memplan: fix large buffers planning (#11876)" (#11877)
This reverts commit 7f90497efc.
2025-08-27 22:36:14 +03:00
nimlgen
7f90497efc memplan: fix large buffers planning (#11876)
* memplan: fix large buffers planning

* fix
2025-08-27 22:04:15 +03:00
George Hotz
73f83e6fe6 Merge branch 'master' into simpler_postrange 2025-08-27 11:43:12 -07:00
George Hotz
e4afdf9ea1 improve DEBUG=2 string with TB/s and TFLOPS [pr] (#11875) 2025-08-27 11:42:41 -07:00
Jordan Chalupka
e9789d8a70 Add mxfp4 support (#11873)
* bump ggml url

* map mxfp4 to tensor

* tests
2025-08-27 10:56:56 -07:00
qazal
884eb53e89 tracing: fix types (#11871)
* tracing: fix types

* /profiler isn't a thing

* return list
2025-08-27 15:50:43 +03:00
George Hotz
99c8c37511 working double tc 2025-08-26 22:32:26 -07:00
George Hotz
195feb1b10 flash attention tc 2025-08-26 18:44:20 -07:00
Sieds Lykles
d39365809a add ctx to z3_renderer arg (#11867)
* add ctx to z3_renderer arg

* update symbolic fuzzer

* rewrite u1,u2,u3

* update fuzz_fast_idiv

* remove imports
2025-08-27 03:38:15 +02:00
George Hotz
68d7218f80 double gemm is failing 2025-08-26 17:27:47 -07:00
George Hotz
78e092d59d reorder 2025-08-26 17:10:06 -07:00
George Hotz
8b067e5dca Merge branch 'master' into simpler_postrange 2025-08-26 15:58:02 -07:00
George Hotz
24c00a4061 darken hex on viz (#11865)
* darken hex on viz

* more readable
2025-08-26 15:57:50 -07:00
qazal
f38e4af226 viz: add custom zoom filter (#11861) 2025-08-27 01:30:29 +03:00
nimlgen
62df6c39af amd: correct handling of relocations (#11863)
* amd: correct handling of relocations

* ops

* add
2025-08-27 01:26:45 +03:00
George Hotz
91ecb1532e Merge branch 'master' into simpler_postrange 2025-08-26 14:41:38 -07:00
George Hotz
d261458ecd add colors to range (#11860) 2025-08-26 14:32:12 -07:00
Sieds Lykles
7dfc7e4abc uops_to_z3 helper (#11859) 2025-08-26 22:58:05 +02:00
George Hotz
c94adb3594 Merge branch 'master' into simpler_postrange 2025-08-26 13:41:24 -07:00
chenyu
1bbb578afd named expression for POW and MAX gradient (#11858) 2025-08-26 16:03:03 -04:00
George Hotz
4836d6bc60 axis colors 2025-08-26 12:56:10 -07:00
chenyu
7028cb4167 clean up TestBitcastConstFolding (#11856) 2025-08-26 15:26:47 -04:00
George Hotz
03fb0c9ad0 Merge branch 'master' into simpler_postrange 2025-08-26 12:06:18 -07:00
George Hotz
d4154e0349 split devectorizing of buf/index (#11855) 2025-08-26 12:05:48 -07:00
George Hotz
f0f7437385 cleanups 2025-08-26 12:02:14 -07:00
George Hotz
15886fd513 Merge branch 'master' into simpler_postrange 2025-08-26 11:57:18 -07:00
George Hotz
b268755d51 small changes from postopt (#11854) 2025-08-26 11:56:16 -07:00
George Hotz
4a8d24f69b work 2025-08-26 11:30:44 -07:00
George Hotz
0c696e64ef work 2025-08-26 11:21:44 -07:00
George Hotz
b0698865b8 bufferize 2025-08-26 10:50:06 -07:00
Sieds Lykles
a3aeef45cc associative variation of where branch-merging (#11851)
* add rule and test

* change comment
2025-08-26 19:27:05 +02:00
chenyu
aabe7756be fix type in fold_bitcast [pr] (#11853) 2025-08-26 13:22:30 -04:00
Jordan Chalupka
4785cd959a [TYPED=1] cvar should allow dtype as a tuple (#11770)
* cvar dtype:DType|tuple[DType, ...]|None=None

* fmt

* add a test

* list typeguard as a dep for CI

* extra step to install mypy

* fix venv

* ci fixes

* mv typeguard to testing install group

* simpler TYPED=1 test

* add typeguard to lint group
2025-08-26 12:49:51 -04:00
qazal
b111076301 viz: fixup click on overlay rect (#11850) 2025-08-26 19:25:42 +03:00
b1tg
1dd613cb89 test float_to_bf16 round-to-even behavior (#11849)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-26 12:16:10 -04:00
b1tg
409399c609 fix nan in float_to_bf16 (#11843)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-26 11:42:25 -04:00
qazal
43d5d66d34 viz: add UOp ports to edges (#11847)
* viz: add UOp ports to edges

* one edge label

* g.tag styling

* replace with NodeList
2025-08-26 18:31:52 +03:00
chenyu
f28f613f85 improved float_to_bf16 (#11848)
round instead of truncate
2025-08-26 11:14:06 -04:00
nimlgen
afe14ccbfa amd: aql default when several xccs (#11832) 2025-08-26 15:16:36 +03:00
qazal
3674c0754e viz: small uop click changes (#11846)
* also highlight self

* can always unselect by clicking outside

* less layout
2025-08-26 14:56:13 +03:00
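
Note on the float_to_bf16 commits (f28f613f85, 409399c609, 1dd613cb89): their messages mention rounding instead of truncating, a NaN fix, and round-to-even behavior. The following is a minimal sketch of that general technique in plain Python. It is not the code from those commits; the function name float_to_bf16_bits and the choice to return a raw 16-bit integer are assumptions made for illustration, and the input is assumed to fit in float32 range (struct raises OverflowError for finite values that do not).

```python
import math
import struct

def float_to_bf16_bits(x: float) -> int:
    """Hypothetical helper: return the 16-bit bfloat16 pattern for x,
    rounding to nearest with ties going to the even mantissa."""
    # Reinterpret x as its 32-bit float32 bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if math.isnan(x):
        # Keep the top 16 bits and force a mantissa bit on, so a NaN whose
        # payload lives only in the low 16 bits does not collapse to inf.
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # bfloat16 keeps the top 16 bits. Adding 0x7FFF plus the lowest kept bit
    # rounds to nearest, and resolves exact ties toward an even result.
    lsb = (bits >> 16) & 1
    return ((bits + 0x7FFF + lsb) & 0xFFFFFFFF) >> 16

if __name__ == "__main__":
    for value in (1.0, 1.00390625, 1.01171875, float("inf")):
        print(value, hex(float_to_bf16_bits(value)))
```

Plain truncation (bits >> 16) would map 1.01171875 to 0x3f81; the rounding version picks the even neighbor 0x3f82 because the input sits exactly halfway between the two.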