George Hotz
0e0be99b55
Merge branch 'master' into simpler_postrange
2025-08-28 07:22:39 -07:00
geohotstan
4e8370309c
Support onnx If OP ( #11648 )
* start
* tiny clean up
* whoops, didn't mean to accidentally fix this
* fix .to(device), kinda hacky and this fix makes it slower?
* merge properly
* FINALLY figured out slowness, also hack pylint for now
* add DEBUGONNX print for subgraph
* oops
* WOOOOOOOO SHAPE CACHE 50% SPEED INCREASE
* small fix, but maybe all deterministic Tensor creation in fp should be cached
* cache condition
* sliiiightly cleaner
* better abstraction?
* remove sam from model_benchmark
* remove shape cache speed up for now
* less lines
* isinstance fix
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-28 10:17:35 -04:00
George Hotz
6d6f0dada7
support for tuple ranges ( #11890 )
* support for tuple ranges
* breaks it
2025-08-28 07:02:31 -07:00
nimlgen
60dd9a162c
memory: tiny tlsf cleanup ( #11887 )
2025-08-28 14:07:18 +03:00
chenyu
beb5982165
FUSE_ATTENTION ( #11884 )
2025-08-27 19:59:17 -04:00
George Hotz
cb5295168d
postrange boilerplate work ( #11881 )
2025-08-27 15:22:59 -07:00
George Hotz
fd579433bc
pre expander shouldn't go in gpudims ( #11880 )
2025-08-27 14:52:24 -07:00
nimlgen
44816218b5
memplan: fix large buffers planning ( #11878 )
* memplan: fix large buffers planning
* fix
* fix dsp
2025-08-27 23:54:27 +03:00
George Hotz
e9575c81e2
delete
2025-08-27 12:49:58 -07:00
George Hotz
ea1b853a60
delete
2025-08-27 12:49:58 -07:00
nimlgen
4006366752
Revert "memplan: fix large buffers planning ( #11876 )" ( #11877 )
This reverts commit 7f90497efc.
2025-08-27 22:36:14 +03:00
nimlgen
7f90497efc
memplan: fix large buffers planning ( #11876 )
* memplan: fix large buffers planning
* fix
2025-08-27 22:04:15 +03:00
George Hotz
73f83e6fe6
Merge branch 'master' into simpler_postrange
2025-08-27 11:43:12 -07:00
George Hotz
e4afdf9ea1
improve DEBUG=2 string with TB/s and TFLOPS [pr] ( #11875 )
2025-08-27 11:42:41 -07:00
Jordan Chalupka
e9789d8a70
Add mxfp4 support ( #11873 )
* bump ggml url
* map mxfp4 to tensor
* tests
2025-08-27 10:56:56 -07:00
qazal
884eb53e89
tracing: fix types ( #11871 )
* tracing: fix types
* /profiler isn't a thing
* return list
2025-08-27 15:50:43 +03:00
George Hotz
99c8c37511
working double tc
2025-08-26 22:32:26 -07:00
George Hotz
195feb1b10
flash attention tc
2025-08-26 18:44:20 -07:00
Sieds Lykles
d39365809a
add ctx to z3_renderer arg ( #11867 )
* add ctx to z3_renderer arg
* update symbolic fuzzer
* rewrite u1,u2,u3
* update fuzz_fast_idiv
* remove imports
2025-08-27 03:38:15 +02:00
George Hotz
68d7218f80
double gemm is failing
2025-08-26 17:27:47 -07:00
George Hotz
78e092d59d
reorder
2025-08-26 17:10:06 -07:00
George Hotz
8b067e5dca
Merge branch 'master' into simpler_postrange
2025-08-26 15:58:02 -07:00
George Hotz
24c00a4061
darken hex on viz ( #11865 )
* darken hex on viz
* more readable
2025-08-26 15:57:50 -07:00
qazal
f38e4af226
viz: add custom zoom filter ( #11861 )
2025-08-27 01:30:29 +03:00
nimlgen
62df6c39af
amd: correct handling of relocations ( #11863 )
* amd: correct handling of relocations
* ops
* add
2025-08-27 01:26:45 +03:00
George Hotz
91ecb1532e
Merge branch 'master' into simpler_postrange
2025-08-26 14:41:38 -07:00
George Hotz
d261458ecd
add colors to range ( #11860 )
2025-08-26 14:32:12 -07:00
Sieds Lykles
7dfc7e4abc
uops_to_z3 helper ( #11859 )
2025-08-26 22:58:05 +02:00
George Hotz
c94adb3594
Merge branch 'master' into simpler_postrange
2025-08-26 13:41:24 -07:00
chenyu
1bbb578afd
named expression for POW and MAX gradient ( #11858 )
2025-08-26 16:03:03 -04:00
George Hotz
4836d6bc60
axis colors
2025-08-26 12:56:10 -07:00
chenyu
7028cb4167
clean up TestBitcastConstFolding ( #11856 )
2025-08-26 15:26:47 -04:00
George Hotz
03fb0c9ad0
Merge branch 'master' into simpler_postrange
2025-08-26 12:06:18 -07:00
George Hotz
d4154e0349
split devectorizing of buf/index ( #11855 )
2025-08-26 12:05:48 -07:00
George Hotz
f0f7437385
cleanups
2025-08-26 12:02:14 -07:00
George Hotz
15886fd513
Merge branch 'master' into simpler_postrange
2025-08-26 11:57:18 -07:00
George Hotz
b268755d51
small changes from postopt ( #11854 )
2025-08-26 11:56:16 -07:00
George Hotz
4a8d24f69b
work
2025-08-26 11:30:44 -07:00
George Hotz
0c696e64ef
work
2025-08-26 11:21:44 -07:00
George Hotz
b0698865b8
bufferize
2025-08-26 10:50:06 -07:00
Sieds Lykles
a3aeef45cc
associative variation of where branch-merging ( #11851 )
* add rule and test
* change comment
2025-08-26 19:27:05 +02:00
chenyu
aabe7756be
fix type in fold_bitcast [pr] ( #11853 )
2025-08-26 13:22:30 -04:00
Jordan Chalupka
4785cd959a
[TYPED=1] cvar should allow dtype as a tuple ( #11770 )
* cvar dtype:DType|tuple[DType, ...]|None=None
* fmt
* add a test
* list typeguard as a dep for CI
* extra step to install mypy
* fix venv
* ci fixes
* mv typeguard to testing install group
* simpler TYPED=1 test
* add typeguard to lint group
2025-08-26 12:49:51 -04:00
qazal
b111076301
viz: fixup click on overlay rect ( #11850 )
2025-08-26 19:25:42 +03:00
b1tg
1dd613cb89
test float_to_bf16 round-to-even behavior ( #11849 )
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-26 12:16:10 -04:00
b1tg
409399c609
fix nan in float_to_bf16 ( #11843 )
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-26 11:42:25 -04:00
qazal
43d5d66d34
viz: add UOp ports to edges ( #11847 )
* viz: add UOp ports to edges
* one edge label
* g.tag styling
* replace with NodeList
2025-08-26 18:31:52 +03:00
chenyu
f28f613f85
improved float_to_bf16 ( #11848 )
round instead of truncate
2025-08-26 11:14:06 -04:00
nimlgen
afe14ccbfa
amd: aql default when several xccs ( #11832 )
2025-08-26 15:16:36 +03:00
qazal
3674c0754e
viz: small uop click changes ( #11846 )
* also highlight self
* can always unselect by clicking outside
* less layout
2025-08-26 14:56:13 +03:00