qazal
5171b098e5
merge_double_reduce without asserts [pr] ( #9650 )
2025-03-31 19:17:05 +08:00
Ignacio Sica
1444069c09
Uppercase K for dimension and lowercase k for kernel in linearizer tc helper test ( #9649 )
2025-03-31 19:05:36 +08:00
Ignacio Sica
baa67fd124
Uppercase N and M (standalone syntax change) ( #9647 )
2025-03-31 18:45:30 +08:00
chenyu
aca0f1befb
print idx when OUT OF BOUNDS ACCESS ( #9646 )
...
in some cases (if there's a where in idx) the vmin/vmax might not be tight
2025-03-31 06:12:44 -04:00
Priyank Patel
e2d9322d21
torch backend: partial fix for strided related test fails ( #9642 )
...
* partial fix for strided related test fails
* cleanup
* fix lint
2025-03-31 05:45:18 -04:00
qazal
76c1b1edf6
viz kernel list cleanup ( #9643 )
2025-03-31 15:53:39 +08:00
George Hotz
e4c545b396
linearizer fix from dsp branch ( #9641 )
...
* linearizer fix from dsp branch
* revert that
2025-03-31 14:26:39 +08:00
George Hotz
ec405b919f
Revert "Revert "do not block gc in UOp.toposort ( #9623 )" ( #9624 )" ( #9639 )
...
This reverts commit 7ef02d0e1c.
2025-03-31 14:03:38 +08:00
George Hotz
49b1c46d16
good changes from the dsp branch ( #9638 )
2025-03-31 13:02:53 +08:00
qazal
9d67d3a2f3
simpler viz codeblocks ( #9636 )
...
* simpler viz codeblocks
* err
2025-03-31 11:48:35 +08:00
chenyu
60eb0c4ed7
exclude slow tests on PYTHON ( #9634 )
2025-03-30 22:55:05 -04:00
chenyu
5012ba3f04
cumalu touchup [pr] ( #9632 )
2025-03-30 22:43:11 -04:00
chenyu
d8d7ac1bb1
fix bert free_intermediates ( #9633 )
...
fix when only run eval `TRAIN=0 BERT_SIZE=tiny examples/mlperf/training_submission_v5.0/tinycorp/benchmarks/bert/implementations/tinybox_green/dev_beam.sh`
2025-03-30 22:42:52 -04:00
qazal
ff984c807d
hotfix: less lines for viz helpers ( #9631 )
2025-03-31 10:10:34 +08:00
qazal
c206a7ae6d
refactor viz state updates ( #9630 )
...
* refactor viz state updates
* onclick
2025-03-31 09:54:54 +08:00
Yvon Manzi
6652003839
Add cumprod to Tensor ( #9629 )
...
* probably how cumprod should look like
* update _cumalu to work with MUL
* shorter
* cumprod testing
* clean
* more cleanup
* add cumprod to torch backend.
* make it look like cumsum
* mypy fix
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 21:49:18 -04:00
geohotstan
d52e91db7b
ONNX ops clean ups ( #9622 )
...
* combine work from remove numpy and onnx ops tests
* clippy
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 21:39:22 -04:00
uuuvn
962c0f65f8
Fix generate_am ( #9626 )
...
This should be a comment
2025-03-31 01:15:44 +08:00
uuuvn
2a4247b8c2
RDNA 3.5 support ( #9627 )
2025-03-31 01:15:20 +08:00
geohotstan
a08b07b4da
Bump onnx==1.17.0 ( #9618 )
...
* bump
* remove resize tf_crop_and_resize
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 03:21:51 -04:00
qazal
7ef02d0e1c
Revert "do not block gc in UOp.toposort ( #9623 )" ( #9624 )
...
This reverts commit f1a35bbb54.
2025-03-30 12:57:06 +08:00
qazal
f1a35bbb54
do not block gc in UOp.toposort ( #9623 )
2025-03-30 11:54:17 +08:00
nimlgen
54e1e59b44
am: rdna 4 support ( #9621 )
...
* hm
* fix
* return this
* fine
* g
* ruff
* fix
2025-03-29 23:16:27 +07:00
nimlgen
118bd1cbed
hotfix: amd imports ( #9620 )
2025-03-29 20:19:53 +07:00
uuuvn
5908b89f71
MI300X support (WIP) ( #9585 )
2025-03-29 19:46:42 +08:00
George Hotz
77f0d09ecf
hotfix: HIP supports parallel BEAM search
2025-03-29 11:49:53 +08:00
chenyu
162f286a0e
add a few Tensor method to doc ( #9614 )
...
* add a few Tensor method to doc
* clone
2025-03-28 13:47:16 -04:00
uuuvn
dd9aae02c3
Refactor ops_amd.py (MI300X prereq) ( #9428 )
2025-03-29 00:17:20 +07:00
uuuvn
3e1168ff5e
am: module import in common ( #9615 )
2025-03-28 21:29:34 +07:00
nimlgen
a8ff85369e
cpugraph for dsp ( #9601 )
...
* cpugraph init
* fixes
* no cpu for now
* mypy
* fix
2025-03-28 19:06:31 +07:00
nimlgen
fa0ebbd237
jit: optimize before pickle ( #9611 )
...
* jit: optimize before pickle
* optimize weights
* fix
* mypy
* mypy2
2025-03-28 19:06:09 +07:00
George Hotz
392a311312
Revert "add copy button in VIZ code-block ( #9605 )" ( #9610 )
...
This reverts commit d1e8598c81.
2025-03-28 17:05:44 +08:00
Harsh Natuskar
d1e8598c81
add copy button in VIZ code-block ( #9605 )
...
* works
* only second block has copy
* better function
* better
* ...
* smol function
* update copy-btn css
* updates
2025-03-28 16:52:21 +08:00
qazal
b4ea45b4a6
fix viz recenter + worker cleanup ( #9607 )
2025-03-28 15:24:53 +08:00
Andrew Furey
50dee4a7b3
add test for checking const gradients ( #9598 )
2025-03-27 15:17:37 -04:00
chenyu
5358b0904b
update uop_given_valid if a node becomes const ( #9604 )
...
* update uop_given_valid if a node becomes const
* cleanup
2025-03-27 14:57:46 -04:00
chenyu
a187dfd3df
bert BEAM_UOPS_MAX 3000->4000 ( #9603 )
...
more stable for the final step time
green 410ms (master) -> 397ms (BEAM=4) -> 392ms (this)
red 561ms (master) -> 550ms (this)
2025-03-27 11:58:47 -04:00
qazal
088a677e25
rescale to fit viz graph [pr] ( #9599 )
...
* zoom to fit the graph in viz [pr]
* always on screen fit graph
* space key recenters
2025-03-27 23:33:51 +08:00
nimlgen
3737821b9e
prepare for clang graph ( #9600 )
...
* prepare for clang graph
* emu
* ops
* ops2
* better type
* fix
2025-03-27 20:09:37 +07:00
qazal
bf94924d5a
fix viz with nested graph_rewrite ( #9595 )
2025-03-27 13:14:28 +08:00
qazal
c011751b41
statically define viz arrow heads ( #9594 )
2025-03-27 12:22:04 +08:00
qazal
0877497bad
hotfix: use captured uops in viz render [pr] ( #9593 )
...
* hotfix: use captured uops in viz render [pr]
* better error
2025-03-27 11:52:12 +08:00
qazal
e5ff7b23d7
refactor to @track_matches + add failing test_nested_rewrite ( #9592 )
...
* test_nested_rewrite
* refactor to track_matches
* positional arg
2025-03-27 11:11:56 +08:00
chenyu
62888614f6
lower bert eval bs to 24 ( #9590 )
...
oom during eval
2025-03-26 21:25:23 -04:00
nimlgen
dc9da1d917
memplan into one buffer ( #9526 )
...
* new memplanner
* new should works
* fix
* VALIDATE_MEMORY_PLANNER
* hm?
* ugh
* fix alignment
* fix2
* rm
* tiny fixes
* test
* comments and fixes
* fix2
* liiiinetr
* t
* fix
2025-03-27 01:46:50 +07:00
qazal
8b717c345c
cache viz worker at launch ( #9589 )
2025-03-27 01:10:02 +08:00
George Hotz
d62ced8981
symbolic -> symbolic_flat ( #9588 )
2025-03-26 23:34:43 +08:00
George Hotz
8aaa5e1ec5
generate the individual indexes ( #9587 )
2025-03-26 22:32:06 +08:00
George Hotz
5c6cd884e3
multiple simplifies is faster [pr] ( #9586 )
...
* multiple simplifies is faster [pr]
* cleanup
* cleanup
2025-03-26 21:42:52 +08:00
George Hotz
1e6e75e39a
little changes from dsp branch ( #9582 )
...
* little changes from dsp branch
* not that one
* need the where
* Revert "need the where"
This reverts commit 140f89c878.
2025-03-26 20:01:21 +08:00