George Hotz
44e4934167
fast pattern matcher [pr] (#9737)
* FastPatternMatcher
* works without that
* fix test pickle
* strict len
* compile match function
* dynamic compile
* fast
* faster
* compile
* track
* a lot faster
* clean up
* dup or
* faster and simpler
* fast match doesn't support store
* plane
* minor refactor
* real speed
* don't imply return None
* upat
* fix test
* heard you wanted more speed
* no generator
* split cf
* early fixup
* fxn fixup
* reconstruct_function
* Revert "reconstruct_function"
This reverts commit 37dac010ab.
* simpler stuff
* too big
* upat compile error
* cleanups
* don't cache that
* cleanups
* 10 -> 15
2025-04-14 15:24:41 +01:00
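The theme of these FastPatternMatcher commits is compiling the rule list ahead of time so matching no longer scans every pattern per node. A minimal sketch of the core idea, with hypothetical names (not tinygrad's actual UPat/PatternMatcher API): group rewrite rules by their root op at build time so per-node dispatch becomes a dict lookup over only the candidate rules.

```python
# Sketch: compile a list of (root_op, predicate, rewrite) rules into a
# per-op dispatch table, so matching a node tries only the rules that
# could possibly apply. Names here are illustrative, not tinygrad's API.
from collections import defaultdict

def compile_rules(rules):
    table = defaultdict(list)
    for op, pred, rw in rules:
        table[op].append((pred, rw))
    return dict(table)

def rewrite(node, table):
    # node is an (op, args) tuple; try only rules registered for node's op
    for pred, rw in table.get(node[0], ()):
        if pred(node):
            return rw(node)
    return node

rules = [
    ("ADD", lambda n: n[1][1] == 0, lambda n: n[1][0]),  # x + 0 -> x
    ("MUL", lambda n: n[1][1] == 1, lambda n: n[1][0]),  # x * 1 -> x
]
table = compile_rules(rules)
```

The real speedups in the commits above come from compiling the match function itself; this sketch only shows the dispatch-table step.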
qazal
e201bc3e93
process replay kernel asts in toposort order [pr] (#9869)
* process replay kernel asts in toposort order [pr]
* use HEAD replay
2025-04-13 17:20:34 +08:00
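Replaying kernel ASTs in toposort order means each AST is handled only after everything it depends on has been processed. A generic sketch of such an ordering (Kahn's algorithm; not the process-replay code itself):

```python
# Sketch: topological ordering via Kahn's algorithm. deps maps each node
# to the set of nodes it depends on; every node must appear as a key.
from collections import deque

def toposort(deps):
    indeg = {n: len(d) for n, d in deps.items()}
    users = {n: [] for n in deps}
    for n, d in deps.items():
        for p in d:
            users[p].append(n)
    q = deque(n for n, c in indeg.items() if c == 0)
    out = []
    while q:
        n = q.popleft()
        out.append(n)
        for u in users[n]:
            indeg[u] -= 1
            if indeg[u] == 0:
                q.append(u)
    assert len(out) == len(deps), "cycle detected"
    return out
```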
Alexey Zaytsev
7dda6aae7d
Skip CLOUD in external_test_example (#9857)
Closes #9814
2025-04-12 10:17:44 +08:00
George Hotz
dd52951dd0
fix single kernel softmax with cast (#9842)
* fix single kernel softmax with cast
* tolerate none
* 3e-4
* skip on dtype
2025-04-11 12:12:02 +08:00
chenyu
8c6299bced
move hand_coded_optimizations to heuristic.py [pr] (#9844)
* move hand_coded_optimizations to heuristic.py [pr]
also folded all long lines
* make a copy and rename self -> k
* fix test
2025-04-10 23:40:16 -04:00
chenyu
e0ec8be37d
use CPU for test_schedule_ring (#9843)
* use CPU for test_schedule_ring
* why pre-commit is good
2025-04-10 23:20:53 -04:00
qazal
fbc6aa53d4
script for local process_replay + fix viz name [pr] (#9837)
2025-04-11 00:39:18 +08:00
qazal
16956b79de
canonicalize Device.DEFAULT (#9835)
2025-04-10 23:02:11 +08:00
George Hotz
f666dd14eb
fix get reduce contraction with test (#9834)
2025-04-10 22:24:21 +08:00
chenyu
7fa5f29582
add test_embedding to test_softmax_fusion (#9832)
2025-04-10 08:25:34 -04:00
George Hotz
53f0b2aad7
fix infinite loop in flash attention (#9827)
* fix infinite loop in flash attention
* get_contraction_with_reduce
* skip that test
* SINGLE_KERNEL_SOFTMAX + fix multi
* default IGNORE_OOB
* print change
2025-04-10 20:06:44 +08:00
qazal
16afe04f45
move process replay to grouper (#9830)
* simpler
* sched
2025-04-10 18:27:42 +08:00
chenyu
c8f47c1d07
not_support_multi_device helper (#9831)
unify the test helper to skip ci device that does not support multi
2025-04-10 05:25:29 -04:00
chenyu
c462162db8
update benchmark bert scripts with BS and ACC_DTYPE (#9826)
BS=16, ACC_DTYPE=half for tinybox, BS=128, ACC_DTYPE=float for mi300x
2025-04-10 02:06:02 -04:00
qazal
498a2bf738
add err handling tests to viz + cleanups (#9825)
* cleanup
* add err handling tests to viz + cleanups
* lint
2025-04-10 14:05:05 +08:00
George Hotz
fce432d2e3
Ops.FUSE makes softmax a single kernel (#9808)
* KERNELIZE makes softmax a single kernel
* single kernel works
* softmax works
* broken
* correct
* skip that test
* kernelize tests
* rename to fuse
* better reduce_push_add_ones code
* correct now
* cleanups
* oops
* return None if we can't push ones
* rename + docs
* atol fixes group
* flash attention broken test
2025-04-09 22:56:28 +08:00
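Fusing softmax into one kernel requires computing the row max and the normalizer without materializing intermediates. One standard way to do the max and sum in a single accumulation pass is the online-normalizer trick; the sketch below illustrates that idea only, not how Ops.FUSE itself works:

```python
import math

def softmax_online(xs):
    # Single accumulation pass: keep a running max m and normalizer s.
    # When a new max appears, rescale the accumulated sum by exp(m - new_m).
    m, s = float("-inf"), 0.0
    for x in xs:
        new_m = max(m, x)
        s = s * math.exp(m - new_m) + math.exp(x - new_m)
        m = new_m
    # the final normalization still rereads xs; "single pass" refers to
    # fusing the max and sum reductions into one sweep
    return [math.exp(x - m) / s for x in xs]
```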
qazal
3bd992dc95
multi stage graph_rewrite_map (#9803)
* multistage graph_rewrite_map
* s/merge_map/input_map
* build up kernel_map from the tensor_map
2025-04-09 15:59:45 +08:00
George Hotz
78caf55154
Revert "FP8 support on NVIDIA (#8631)"
This reverts commit 2c8e4ea865.
2025-04-09 12:27:41 +08:00
George Hotz
d1505137ad
Revert "move TestOpsFp8s skipTest (#9797)"
This reverts commit a3aaf92b21.
2025-04-09 12:27:40 +08:00
chenyu
a3aaf92b21
move TestOpsFp8s skipTest (#9797)
so get_available_devices is not called when running other tests
2025-04-08 22:44:07 -04:00
pkotzbach
2c8e4ea865
FP8 support on NVIDIA (#8631)
* squashed fp8 commits
* tensorcore start
* minor changes
* pre-commit
* pylint
* Delete fp8mul.cu
* clean
* small bugfix
* fix test_dtype
* fix test_dtype_alu
* add EMULATE_CUDA_SM89
* fix ci
* fix test_linearizer
* fix test_linearizer
* fix swizzle
* add debug to simple_matmul
* fixed swizzle
* python emulator
* refactor python emulator
* setup fix
* numpy setup
* ml_dtypes only in emulate_cuda_sm89
* fix pylint
* fix tests
* fix mypy
* fix mypy
* fix ruff
* done python emulator
* add acc type
* tests
* mypy
* clean code
* add cuda tensor core tests to CI
* minor fix
* clean test_dtype.py
* clean cstyle.py
* clean test_ops.py
* fix test
* fix test
* whitespaces
* pylint
* pylint
* amd?
* amd?
* amd
* reduce lines
* mockgpu remove
* fix
* ruff
* ruff
* fix mypy
* ruff
* test only for cuda
* fixed formatting
* small fixes
* small fix
* least_upper_dtype if fp8s not supported
* log and reciprocal are supported for fp8s
* ops python fixes
* dtypes.fp8s use
* e4m3 + e5m2 result dtype test
* truncate linter fix
---------
Co-authored-by: pkotzbach <pawkotz@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-04-08 21:54:04 -04:00
qazal
f13e9cf2d9
move view_left to grouper.py + tiny reorders [pr] (#9780)
* move view_left to grouper.py [pr]
* reorder grouper
* test_schedule
2025-04-08 15:39:28 +08:00
chenyu
7a28133b37
failed test for single softmax backward (#9778)
getting RecursionError with DONT_GROUP_REDUCES=1
2025-04-08 02:36:32 -04:00
George Hotz
fefee5d3ab
single kernel softmax (#9776)
* real single kernel softmax
* cleanup
* fix blockend insertion
* add to bert test
2025-04-08 12:35:48 +08:00
qazal
9963bb51e0
grouper tests cleanups [pr] (#9777)
* grouper tests cleanups [pr]
* viz
* tuple
* whitespace
2025-04-08 12:33:11 +08:00
George Hotz
db22094d35
hotfix: update softmax fusion test
2025-04-08 11:23:19 +08:00
Eitan Turok
bb7922b95f
Vectorize Transcendental Regression Tests (#9753)
* init test
* cleanup
2025-04-08 01:27:39 +08:00
Sieds Lykles
07d1aefaf4
fast idiv (#9755)
* fast idiv with tests and fuzzer
* Add todo comment
* Add env variable to toggle fast_idiv
* Move env check
* Add fuzz fast_idiv to ci
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-04-07 08:32:24 -04:00
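"Fast idiv" conventionally means replacing integer division by a constant with a multiply and a shift. A sketch of that identity, with a brute-force fuzz check in the spirit of the commit's "tests and fuzzer" (this uses an oversized magic constant for simplicity; production kernels constrain the magic number to the register width):

```python
import random

def magic(d, bits=32):
    # With sh = 2*bits and m = ceil(2**sh / d), the identity
    # (n * m) >> sh == n // d holds for all 0 <= n < 2**bits, 1 <= d <= 2**bits.
    sh = 2 * bits
    m = -(-(1 << sh) // d)  # ceiling division
    return m, sh

def fast_idiv(n, d, bits=32):
    m, sh = magic(d, bits)
    return (n * m) >> sh

# fuzz check: compare against true floor division on random inputs
for _ in range(1000):
    n, d = random.randrange(1 << 32), random.randrange(1, 1 << 16)
    assert fast_idiv(n, d) == n // d
```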
nimlgen
fa888ee077
minor test cleanups (#9770)
* fix test_graph on max
* pcie5
2025-04-07 15:29:12 +03:00
qazal
891322fd51
split into grouper.py (#9768)
* split into grouper.py
* update tests
* reorder
2025-04-07 18:40:59 +08:00
qazal
8ddb1357c0
fix UPat.location after pickle (#9763)
* fix UPat.location after pickle [pr]
* named upat test
2025-04-07 15:16:42 +08:00
chenyu
b190d85ad7
benchmark script bert softmax (#9759)
2025-04-07 00:31:18 -04:00
Ignacio Sica
58785181a8
AMD bf16xf32 TC (#9717)
* dont test bf16 for emulated amd tc
* skip bf16 tc test in ci
* skip bf16 for AMD in test_tensor_cores_codegen
* add simple bf16 gemm test to benchmark
2025-04-07 11:41:04 +08:00
chenyu
43e4565148
weighted linear in external_benchmark_bert_matmuls (#9757)
include the linear to get qkv, and permute so that stride matches with the real run
2025-04-06 23:35:42 -04:00
chenyu
8a585dc5c1
benchmark script for matmuls in bert (#9752)
2 main matmuls in the bert layers. getting these to be fast makes bert fast
2025-04-06 19:34:25 +08:00
nimlgen
5f7c79676f
jit: prune independent copies (#9749)
* jit: prune independent copies
* linter
* check kernel cnt
2025-04-05 20:50:28 +03:00
nimlgen
c2573b247c
jit: rename optimize_weights -> replan_buffers_memory_layout (#9751)
2025-04-05 20:35:15 +03:00
chenyu
407ca54382
symbolic fold double where (#9436)
* symbolic fold double where
a.where(b.where(c, d), d) -> (a & b).where(c, d). a pattern in optimizer
* test case
2025-04-05 05:12:17 -04:00
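The fold above, a.where(b.where(c, d), d) -> (a & b).where(c, d), can be checked exhaustively over the boolean conditions; plain Python ternaries stand in for the symbolic where:

```python
def where(cond, x, y):
    # stand-in for a symbolic where: select x when cond holds, else y
    return x if cond else y

# a.where(b.where(c, d), d) == (a & b).where(c, d) for all boolean a, b
for a in (False, True):
    for b in (False, True):
        lhs = where(a, where(b, "c", "d"), "d")
        rhs = where(a and b, "c", "d")
        assert lhs == rhs
```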
Sieds Lykles
9c2fc695b5
cond.logical_not().where(a,b) -> cond.where(b,a) (#9741)
* Add rule for negation in where, simplifies arange patterns
* 0 becomes 0.0 again
* Only if cond is bool
* ne is never None
* Add a test
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-04-04 19:13:32 -04:00
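This rule rewrites cond.logical_not().where(a, b) into cond.where(b, a), and as one of the bullets notes it is only valid when cond is boolean; a quick exhaustive check with Python stand-ins:

```python
def where(cond, x, y):
    # stand-in for a symbolic where on a boolean condition
    return x if cond else y

# (not cond).where(a, b) == cond.where(b, a) for boolean cond
for cond in (False, True):
    assert where(not cond, "a", "b") == where(cond, "b", "a")
```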
chenyu
fe998798fb
linearizer failure test for OUT OF BOUNDS ACCESS (#9738)
2025-04-04 03:48:43 -04:00
George Hotz
8b5a523743
fix minimum length in pattern matcher (#9736)
2025-04-04 14:57:01 +08:00
George Hotz
926b0bcc57
cache folded upcast [pr] (#9733)
2025-04-04 11:23:19 +08:00
George Hotz
cac8bcf8b5
use Ops.REDUCE (#9721)
* decrease bert python time [pr]
* order copies
* Revert "order copies"
This reverts commit 3f62c8693b.
* rewrite count
* Ops.REDUCE
* acc first in the add chain
* Fix tensor core acc
* arange patterns look good
* fix multireduce gate
* reduce rewrite rule
* bump that to 15 minutes
* multiwmma isn't fusing
* gep through wmma is gep pushing
* bump that timeout too, it's all env setup
* add failing test
2025-04-04 10:14:34 +08:00
nimlgen
949459fdd6
jit: fix deallocate on unallocated buffers in free_intermediates (#9699)
2025-04-03 18:32:51 +03:00
geohotstan
ac713e04db
ONNX add output shape validation (#9720)
* add output shape validation and remove support for sequence_type
* nit better err msg
* add sequence_type back
* improve err msg
* Revert "improve err msg"
This reverts commit dc9eaea4bb.
* Revert "add sequence_type back"
This reverts commit 288170b2d9.
* do explicit shape equality
* small nit
2025-04-03 05:44:53 -04:00
chenyu
79145e3d40
cleanup truncate_bf16 [pr] (#9725)
use torch bfloat16 for groundtruth in test. also a TODO for discrepancy
2025-04-03 05:43:49 -04:00
Ignacio Sica
bc2d86195e
increase test tolerance (#9719)
2025-04-03 15:24:09 +08:00
George Hotz
49dafe6d43
add gc tests [pr] (#9718)
* add gc tests [pr]
* del
* more gc tests
* add NullGraph
2025-04-03 14:08:32 +08:00
Ignacio Sica
bc91fffc5d
fix gated store with index in python backend (#9703)
* add default gate in index
* assert store
* add TestRendererFailures
- move test_gated_store_with_alu to new TestRenderFailures class for tests that fail on multiple renderers
- add test_renderer_failures.py run on python CI
* add test for gated index in 2d
* test TestRenderFailures
2025-04-03 12:48:28 +08:00
geohotstan
e1d7e47cca
fix ONNX IsInf unintended dtype promotion (#9711)
* add IsInf
* add corresponding test
* that float16 is kinda silly
2025-04-02 22:46:15 -04:00