chenyu
6a40216724
correct bf16 fuzz input in test_dtype_alu (#11933)
it was using float16 inputs; now it uses uint16 inputs converted to bf16
2025-09-01 10:52:26 -04:00
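A minimal sketch of the idea behind #11933 (hypothetical plain Python, not the test's actual code): draw a uint16 bit pattern and reinterpret it as the high half of a float32, which is exactly a bfloat16, so every bf16 value, including infs and NaNs, can appear as a fuzz input.

    # hypothetical sketch, not tinygrad's fuzzer
    import random, struct

    def random_bf16() -> float:
        # any 16-bit pattern is a valid bf16: it is the top half of a float32
        bits = random.getrandbits(16)
        return struct.unpack("<f", struct.pack("<I", bits << 16))[0]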
chenyu
965ea59b16
test_dtype_alu use AMD_LLVM from helpers (#11950)
2025-09-01 10:03:17 -04:00
b1tg
a9f07c31bc
fix amd llvm sqrt (#11936)
* fix amd llvm sqrt
* lint
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-09-01 09:31:14 -04:00
qazal
0a53e72f70
viz: fix trace duration in python test decoder (#11949)
2025-09-01 14:32:25 +03:00
qazal
27c9ed5a84
viz: more consistent naming of events (#11948)
* s/shapes/events in test_viz
* s/bufs/events in the memory packer
2025-09-01 14:16:47 +03:00
Sieds Lykles
d9560a631c
remove cast between ints if safe (#11946)
2025-09-01 05:56:49 +02:00
Sieds Lykles
a19d689481
fix vec dtype _min_max (#11944)
2025-09-01 03:24:07 +02:00
Sieds Lykles
f32f3464d6
Can safely cast from certain ints to floats (#11941)
* add rule
* add some tests
* prevent infinite loop with bfloat16
* add some ints to double and float can_safe_cast
* add tests
2025-09-01 00:51:24 +02:00
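A hedged sketch of the predicate idea in #11941 (names hypothetical, not tinygrad's can_safe_cast): an int dtype casts to a float dtype losslessly when the float's mantissa is at least as wide as the int's magnitude.

    # hypothetical sketch, not tinygrad's implementation
    INT_MAGNITUDE_BITS = {"int8": 7, "uint8": 8, "int16": 15, "uint16": 16, "int32": 31}
    FLOAT_MANTISSA_BITS = {"bfloat16": 8, "float16": 11, "float32": 24, "float64": 53}

    def int_to_float_is_safe(i: str, f: str) -> bool:
        # every source integer must be exactly representable in the float
        return INT_MAGNITUDE_BITS[i] <= FLOAT_MANTISSA_BITS[f]

    assert int_to_float_is_safe("int16", "float32")      # 2**15 fits in 24 bits
    assert not int_to_float_is_safe("int32", "float32")  # 2**31 - 1 does not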
Sieds Lykles
1c6e43c203
Double cast is one cast if intermediate cast is safe (#11939)
* add rule
* add some tests
* prevent infinite loop with bfloat16
* prevent more infinite rewrites
2025-09-01 00:36:29 +02:00
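A plain-numpy illustration of the double-cast rule in #11939 (not tinygrad code): when the intermediate cast is safe, casting through it equals casting directly, so the middle cast can be dropped. An unsafe intermediate like bfloat16 would break the equality, which is why the rule must check safety.

    # illustration only, not tinygrad's rewrite rule
    import numpy as np

    x = np.arange(-2**15, 2**15, dtype=np.int16)
    # int16 -> float32 is safe (float32 holds every int16 exactly), so
    # int16 -> float32 -> float64 collapses to one int16 -> float64 cast
    assert (x.astype(np.float32).astype(np.float64) == x.astype(np.float64)).all()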
b1tg
c1eeb3b99c
only skip AMD_LLVM (#11934)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-31 18:15:47 +03:00
b1tg
75d380a77c
fix transcendentals in python renderer (#11932)
* fix transcendentals in python renderer
* add test
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-31 09:37:17 -04:00
Sieds Lykles
d3252ccd85
fix special vmax when arg is UOp (#11930)
2025-08-31 06:54:39 +02:00
chenyu
af89be317e
relax rtol for bfloat16 test_dtype_alu (#11926)
2025-08-30 17:16:08 -04:00
qazal
c27b99d68f
viz: refactor to indexed rewrite traces (#11923)
2025-08-30 20:01:47 +03:00
qazal
bf0d055b39
viz: color by name (#11919)
2025-08-30 16:04:58 +03:00
Sieds Lykles
0bc34c000f
simplify range mod its own upper bound (#11917)
* add rules
* add tests
2025-08-30 08:37:35 +02:00
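The arithmetic fact behind the #11917 rewrite, checked in plain Python: a range index r satisfies 0 <= r < n, so r % n is just r (and r // n is 0).

    # sanity check of the identity, not the rewrite rule itself
    n = 7
    for r in range(n):
        assert r % n == r and r // n == 0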
chenyu
561318fea7
Tensor.cos in test_dtype_alu (#11916)
* Tensor.cos in test_dtype_alu
* need this fix anyway
2025-08-29 20:26:36 -04:00
nimlgen
c6e342cdac
mockgpu: no hang if gpuocelot failed (#11915)
2025-08-30 00:44:49 +03:00
chenyu
26d03a86a1
test_symbolic_ops.py cleanup (#11895)
2025-08-29 17:11:59 -04:00
b1tg
b2cc06218a
python bfloat16 (#11912)
* python bf16
* _to_torch_storage_type
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-29 15:18:02 -04:00
George Hotz
afad7d0cd1
remove dtype from range, it will be dtypes.index soon [pr] (#11914)
* remove dtype from range, it will be dtypes.index soon [pr]
* a few more
2025-08-29 09:52:07 -07:00
George Hotz
394c2d1db1
update Kernel API in tests + move optimize_local_size (#11907)
2025-08-28 15:12:47 -07:00
nimlgen
fa695ac1ce
ci: mac gpuocelot (#11906)
* gm
* fix?
* ops
* imp
* xx
* add file
2025-08-28 23:29:43 +03:00
George Hotz
b9b438c516
small updates from postopt (#11903)
* tests from postopt
* modernize
* skip lin tests
* that's fixed?
* skip, not failure
2025-08-28 12:34:52 -07:00
Ben Waldron
ea1be2e4cd
[bounty] Remove using reshape to register symbolic shape (#11771)
* Modify tests and start work towards removing symbolic reshape
* Refactor symbolic reshape
* fix small error
* much cleaner + fix more tests
* Can remove this now
* Update test_symbolic_ops and test_tiny
* Couple more tests
* Unused import
* More tests and add EXPAND to Tensor.empty
* Fix test beam search
* all int
* Fix rangeify by adding shrink
* Remove OOB check and so fix test_symbolic_jit
* test_symbolic_jit doesn't need OOB Context anymore either
* Should remove that test now
* Cleanups part 1
* fix linters
* Final cleanups
* Don't reassign inside for loop
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-28 12:30:49 -04:00
Ben Waldron
17ecaf4682
Add test_variable_empty (#11889)
* Add test_variable_empty
* Move test and add TODO
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-28 11:38:27 -04:00
Nino Risteski
54be477152
rope cache optim for jit prune in llm.py (#11678)
* rope cache optim for jit prune
* rope test
* tests in test attention
* Revert "rope test"
This reverts commit 69ede543d0.
* lint
2025-08-28 08:31:29 -07:00
quortus
5f8fe9a331
Replace ASSIGN with STORE in test_linearizer (#11821)
2025-08-28 07:33:20 -07:00
geohotstan
4e8370309c
Support onnx If OP (#11648)
* start
* tiny clean up
* whoops, didn't mean to accidentally fix this
* fix .to(device), kinda hacky and this fix makes it slower?
* merge properly
* FINALLY figured out slowness, also hack pylint for now
* add DEBUGONNX print for subgraph
* oops
* WOOOOOOOO SHAPE CACHE 50% SPEED INCREASE
* small fix, but maybe all deterministic Tensor creation in fp should be cached
* cache condition
* sliiiightly cleaner
* better abstraction?
* remove sam from model_benchmark
* remove shape cache speed up for now
* less lines
* isinstance fix
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-28 10:17:35 -04:00
George Hotz
6d6f0dada7
support for tuple ranges (#11890)
* support for tuple ranges
* breaks it
2025-08-28 07:02:31 -07:00
chenyu
beb5982165
FUSE_ATTENTION (#11884)
2025-08-27 19:59:17 -04:00
nimlgen
44816218b5
memplan: fix large buffers planning (#11878)
* memplan: fix large buffers planning
* fix
* fix dsp
2025-08-27 23:54:27 +03:00
nimlgen
4006366752
Revert "memplan: fix large buffers planning ( #11876 )" ( #11877 )
...
This reverts commit 7f90497efc .
2025-08-27 22:36:14 +03:00
nimlgen
7f90497efc
memplan: fix large buffers planning (#11876)
* memplan: fix large buffers planning
* fix
2025-08-27 22:04:15 +03:00
Jordan Chalupka
e9789d8a70
Add mxfp4 support (#11873)
* bump ggml url
* map mxfp4 to tensor
* tests
2025-08-27 10:56:56 -07:00
Sieds Lykles
d39365809a
add ctx to z3_renderer arg (#11867)
* add ctx to z3_renderer arg
* update symbolic fuzzer
* rewrite u1,u2,u3
* update fuzz_fast_idiv
* remove imports
2025-08-27 03:38:15 +02:00
chenyu
7028cb4167
clean up TestBitcastConstFolding (#11856)
2025-08-26 15:26:47 -04:00
George Hotz
b268755d51
small changes from postopt (#11854)
2025-08-26 11:56:16 -07:00
Sieds Lykles
a3aeef45cc
associative variation of where branch-merging (#11851)
* add rule and test
* change comment
2025-08-26 19:27:05 +02:00
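For reference, the base branch-merging identity that #11851 adds an associative variant of (the variant's exact form isn't spelled out here), sanity-checked with plain Python values: an ALU op applied to two wheres on the same condition merges into a single where.

    # plain-Python sanity check, not tinygrad's rewrite rule
    def where(c, a, b): return a if c else b

    for c in (True, False):
        # alu(where(c,a,b), where(c,x,y)) == where(c, alu(a,x), alu(b,y))
        assert where(c, 1, 2) + where(c, 10, 20) == where(c, 1 + 10, 2 + 20)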
b1tg
1dd613cb89
test float_to_bf16 round-to-even behavior (#11849)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-26 12:16:10 -04:00
b1tg
409399c609
fix nan in float_to_bf16 (#11843)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-26 11:42:25 -04:00
chenyu
f28f613f85
improved float_to_bf16 (#11848)
round instead of truncate
2025-08-26 11:14:06 -04:00
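A minimal sketch of the rounding change in #11848/#11843 (not tinygrad's actual float_to_bf16): add a round-to-nearest-even bias before dropping the low 16 bits, and special-case NaN so the bias can't carry its mantissa to zero and turn it into infinity.

    # hypothetical sketch, not tinygrad's implementation
    import struct

    def float_to_bf16_bits(x: float) -> int:
        bits = struct.unpack("<I", struct.pack("<f", x))[0]
        if (bits & 0x7F800000) == 0x7F800000 and bits & 0x007FFFFF:
            return (bits >> 16) | 1          # NaN: force a non-zero mantissa
        bias = 0x7FFF + ((bits >> 16) & 1)   # round to nearest, ties to even
        return (bits + bias) >> 16

    assert float_to_bf16_bits(1.0) == 0x3F80
    assert float_to_bf16_bits(float("nan")) & 0x7F != 0  # NaN stays NaN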
chenyu
337e979a59
call dtypes.as_const in Tensor(list) (#11840)
2025-08-25 22:08:26 -04:00
chenyu
ac3449b0c8
truncate_fp16 cleanup (#11838)
native `@` is default
2025-08-25 19:03:41 -04:00
qazal
a1f6823060
viz: memory layout in client side (#11830)
* viz: memory layout in client side
* update test_viz
2025-08-25 14:49:33 +03:00
Sieds Lykles
a286a1a6f7
Fast idiv: try removing factors of two before cast (#11824)
* try removing factors of two
* don't return if None
* add test
2025-08-24 20:04:25 +02:00
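A hedged sketch of the factor-of-two trick in #11824 (hypothetical helper, not the PR's code): for nonnegative x and d = d' * 2**k, x // d == (x >> k) // d', so shifting out trailing zeros shrinks both operands before any fast-idiv magic-number step or narrowing cast.

    # hypothetical sketch, not tinygrad's implementation
    def idiv_strip_pow2(x: int, d: int) -> int:
        k = (d & -d).bit_length() - 1  # count of trailing zero bits in d
        return (x >> k) // (d >> k)

    assert all(idiv_strip_pow2(x, 24) == x // 24 for x in range(1000))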
George Hotz
6540bb32a6
move into codegen late [pr] (#11823)
2025-08-24 10:23:25 -07:00
Sieds Lykles
dd69114573
Revert "Better div nesting ( #11811 )" ( #11818 )
...
This reverts commit 952f729b07 .
2025-08-24 18:11:24 +02:00
Sieds Lykles
952f729b07
Better div nesting (#11811)
* remove check
* use fold_divmod_congruence instead of simplify
* adjust tests
* shorten line
2025-08-24 04:17:40 +02:00
Sieds Lykles
e652062f92
tweak divmod_folding condition (#11810)
2025-08-24 02:59:02 +02:00