uuuvn
2c32126fc8
am: AMRegister refactor ( #9572 )
2025-03-26 00:52:40 +07:00
chenyu
cddd750d68
add a failed test case for jit/nojit rand [pr] ( #9574 )
...
currently, adding jit produces different rand values
2025-03-25 13:32:44 -04:00
nimlgen
4cf2b68ca8
am_smi: fix init for newer versions ( #9559 )
2025-03-25 23:48:05 +07:00
qazal
a6a5c0aec5
add NULL=1 backend ( #9573 )
...
* add NULL=1 backend
* NullAllocator
* line
* metadata should still work
* it shouldn't have memory usage
* Revert "it shouldn't have memory usage"
This reverts commit a9080fdd43.
* back
* null flops
2025-03-25 22:20:52 +08:00
qazal
b60d9976b4
better yaxis formatting in viz memory graph ( #9570 )
...
* better bytes format
* pluralize
* 1 less line
2025-03-25 16:50:22 +08:00
qazal
faf3b5b245
display kernel metadata in memory viz ( #9569 )
...
* display kernel metadata in memory viz
* fix that
2025-03-25 13:14:54 +08:00
qazal
52301fe68e
move Buffer refcount increment out of schedule.py ( #9564 )
...
* move Buffer refcount increment out of schedule.py
* add TestGC.test_assign_refcount
* refcount refers to Ops.BUFFER UOps
2025-03-25 12:08:27 +08:00
qazal
262f5a2bd3
hotfix: replace link in viz/readme ( #9568 )
2025-03-25 10:24:49 +08:00
chenyu
6427272bf6
minor update to rand [pr] ( #9566 )
2025-03-24 18:49:50 -04:00
chenyu
b0e070e737
remove MOCKGPU workaround in rand ( #9565 )
...
also `requires_grad_` to save a line
2025-03-24 17:49:45 -04:00
qazal
d7c754ce49
failing test for UOp buffer ref count ( #9563 )
...
* failing test for UOp buffer ref count
* lint
2025-03-25 00:10:48 +08:00
b1tg
f90001e1a6
amd llvm render (no_comgr prereq) ( #9543 )
...
* amd llvm render
* skip test_div_rounding_mode
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-03-24 22:50:51 +08:00
Priyank Patel
4f5e03bd60
better fix inplace detach ( #9557 )
2025-03-24 22:50:28 +08:00
qazal
1c40873962
show buffer info in memory viz ( #9562 )
2025-03-24 22:12:30 +08:00
qazal
efaee75656
start viz of memory usage ( #9561 )
...
* start viz of memory usage
* polygons/bars + use d3
2025-03-24 19:05:35 +08:00
qazal
1cfe6d02fe
refactor uop_to_json to return a dict [pr] ( #9560 )
2025-03-24 16:38:17 +08:00
nimlgen
edf9e1bf8d
am: move out soc21 to a sep module ( #9551 )
...
* am: soc module is not part of am
* am: soc module is not part of am
2025-03-24 14:17:42 +07:00
George Hotz
74d98eafb8
add onnx frontend stub [pr] ( #9558 )
2025-03-24 12:24:34 +08:00
George Hotz
de7d6cec3a
hotfix: DEBUG 5 prints the ast
2025-03-24 11:43:11 +08:00
chenyu
ba41076e94
update embedding test to not use dtypes.long [pr] ( #9556 )
2025-03-23 21:33:38 -04:00
chenyu
c965f4c20b
update bert config ( #9555 )
...
BEAM 4->5 for green, 2% faster
use AMD driver instead of AM for red, 5% faster
2025-03-23 16:14:41 -04:00
chenyu
d734e24c01
minor WEBGPU_PATH cleanup [pr] ( #9552 )
...
also mypy recognizes `sys.platform == 'win32'` but does not recognize it if wrapped inside a helper...
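The mypy behaviour noted above can be illustrated with a minimal sketch. The function names and the library file names here are hypothetical and are not taken from the tinygrad source; only the `sys.platform` comparison is the point.

```python
import sys

def is_windows() -> bool:
    # Hypothetical helper. At runtime it is equivalent to the inline check,
    # but mypy does not treat a call to it as a platform check.
    return sys.platform == "win32"

def lib_name_direct() -> str:
    # Inline comparison: mypy understands this form and type-checks each
    # branch only for the platform it applies to.
    if sys.platform == "win32":
        return "example.dll"  # placeholder file name
    return "libexample.so"    # placeholder file name

def lib_name_via_helper() -> str:
    # Same runtime behaviour, but mypy type-checks both branches on every
    # platform because the check is hidden behind a function call.
    if is_windows():
        return "example.dll"
    return "libexample.so"
```

Both functions return the same value at runtime; the difference is visible only to the type checker, which is why the inline comparison is the form to keep when a branch touches platform-specific APIs.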
2025-03-23 09:10:02 -04:00
Ahmed Harmouche
7ce7fe0574
Refactor webgpu_dawn lib finding ( #9547 )
...
* Refactor webgpu_dawn lib finding
* Fix ruff
2025-03-23 08:23:29 -04:00
uuuvn
c631c72f22
HCQ: Increment timeline signal before submitting ( #9550 )
...
`AMDComputeQueue.__del__` frees `hw_page`. This is normally safe because
`AMDAllocator._free` calls `self.dev.synchronize()`, which is supposed
to wait for the IB to finish executing. However, that wait does not happen
if the `AMDComputeQueue` is dropped right after submit, before the timeline
signal is incremented, which is the case in most places. This leads to a
race if `.bind()` is also used (required for multi-xcc because a bug in the
MEC firmware treats all `PACKET3_PRED_EXEC`s outside IBs as if they had an
`EXEC_COUNT` of zero).
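The ordering problem described above can be sketched with a toy model. This is not tinygrad's HCQ code: `ToyDevice`, `free_page` and both submit functions are hypothetical, and the "GPU" only makes progress inside `synchronize()`. The model assumes that synchronizing waits for the signal to reach `timeline_value - 1`.

```python
class ToyDevice:
    """Toy stand-in for a device with a timeline signal (all names hypothetical)."""
    def __init__(self):
        self.timeline_value = 1  # value the next submission will signal
        self.signal = 0          # last value the toy "GPU" has signalled
        self.pending = []        # submitted (page, value) pairs not yet executed

    def submit(self, page, value):
        self.pending.append((page, value))

    def synchronize(self):
        # Run pending work until the signal reaches timeline_value - 1.
        target = self.timeline_value - 1
        while self.signal < target and self.pending:
            _, value = self.pending.pop(0)
            self.signal = value

def free_page(dev, page):
    # Synchronize, then "free" the page. Returns True if the page was still
    # referenced by unexecuted work, i.e. a use-after-free hazard.
    dev.synchronize()
    return any(p is page for p, _ in dev.pending)

def submit_then_increment(dev, page):
    dev.submit(page, dev.timeline_value)
    hazard = free_page(dev, page)  # queue dropped before the increment
    dev.timeline_value += 1
    return hazard

def increment_then_submit(dev, page):
    value = dev.timeline_value
    dev.timeline_value += 1        # increment first, as in the commit title
    dev.submit(page, value)
    return free_page(dev, page)

print(submit_then_increment(ToyDevice(), object()))  # True: page freed while in flight
print(increment_then_submit(ToyDevice(), object()))  # False: synchronize waited
```

In the first ordering, `synchronize()` sees a target that does not yet include the new submission and returns immediately, so the page is freed while work still refers to it. Incrementing first makes the same `synchronize()` call cover the submission.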
2025-03-23 18:30:38 +07:00
nimlgen
d5667419af
am: move out pte creation logic ( #9548 )
...
* am: move out pte creation logic
* emu
* ops
2025-03-23 18:29:10 +07:00
geohotstan
309afa20b7
add Tensor.max_unpool2d ( #9518 )
...
* why does max_unpool2d feel slower than out.gradient ...
* slightly cleaner
* what happened to ruff
* need to think about this some more
* slightly faster now?
* clean up, 1 more failing edge case
* ok good
* working TINY_BACKEND
* nit doc wording
* retry CI
2025-03-22 12:11:33 -04:00
quortus
bdd44d4255
Fix DSP transcendentals ( #9542 )
2025-03-22 11:08:18 +08:00
Ignacio Sica
eddafb84e5
Bugfix for TC=3 ( #9464 )
...
* wrong but uses less shared
* for size 8 tc1 with devectorize in 0 loads into local before wmma and works
* improvements over tc1 devectorize
* fix tc=3
* works for handcoded tc opts
* clean bugfix tc=3
* fix
* revert changes
2025-03-21 16:43:42 -07:00
chenyu
6da78164f9
assert Kernel ast.op to be Ops.SINK [pr] ( #9539 )
...
rest of the code assumes self.ast is defined anyway
2025-03-21 18:09:44 -04:00
chenyu
c33679c47b
increase size in test_multinomial_counterexample ( #9540 )
...
should be less flaky
2025-03-21 17:46:52 -04:00
Francis Lata
1a1087e3a0
cleanups on losses and dataset tests ( #9538 )
2025-03-21 17:03:18 -04:00
Francis Lata
8cbe4009fc
RetinaNet losses ( #9536 )
...
* add sigmoid_focal_loss and l1_loss
* update ref implementation comment
2025-03-21 15:52:54 -04:00
Francis Lata
e6389184c5
update comment for retinanet dataloader implementations ( #9534 )
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-21 15:07:45 -04:00
chenyu
ee3d313b34
Revert "update ruff to 0.11.2 ( #9531 )" ( #9535 )
...
This reverts commit d8d65e2747.
2025-03-21 14:52:25 -04:00
chenyu
b46b8ee15e
add a flag to log when beam surpassed max limit [pr] ( #9533 )
2025-03-21 13:37:02 -04:00
Francis Lata
eb95825eea
RetinaNet dataloader ( #9442 )
...
* retinanet dataloader
* remove batch_size from generate_anchors
* refactor kits19 dataset tests
* add tests for dataloader
* fix testing setup and cleanups
* remove unused import
2025-03-21 13:36:41 -04:00
b1tg
58206fa8a9
add amd llvm compiler ( #9519 )
...
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-21 23:13:27 +08:00
chenyu
d8d65e2747
update ruff to 0.11.2 ( #9531 )
...
0.11.2 fixed the false alert from 0.11.1. also pinned the version in setup for now to prevent broken CI from ruff upgrade
2025-03-21 10:32:59 -04:00
qazal
ee3ed73ed1
add reorder_view matcher to scheduler [pr] ( #9528 )
2025-03-21 17:46:20 +08:00
George Hotz
8e555c586c
switch quantization to unsigned/unsigned + add Ops.REDUCE ( #9527 )
...
* switch quantization to unsigned/unsigned + add Ops.REDUCE
* tests
* nhwc + replay pkl
2025-03-21 17:02:37 +08:00
nimlgen
a35b0a88bf
am: just rename and reorder ip init funcs ( #9504 )
2025-03-21 15:57:32 +08:00
nimlgen
8a131ab271
am: allow allocations as small as a page ( #9523 )
...
* am: fix allocs
* bettermsg
* comment
* next time
2025-03-21 15:53:32 +08:00
Sieds Lykles
3ad3ac4d1e
Change dtypes.int to dtypes.ints ( #9517 )
2025-03-20 17:24:26 -04:00
chenyu
b9fab9b914
pin ruff to 0.11.0 in CI ( #9520 )
...
0.11.1 had a bug https://github.com/astral-sh/ruff/issues/16874 that breaks ci
2025-03-20 13:12:50 -04:00
George Hotz
3c5161b4cb
add validation of the bounds of Ops.INDEX ( #9503 )
...
* add validation of the bounds of Ops.INDEX
* do mask properly
* more validation
* correct
* fix gated
* add CAST support to vmin/vmax
* fix ptx and image
* ptx no diff
* upat.index also stays
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2025-03-20 12:15:55 +08:00
qazal
0b20f91ce7
remove move_mask from the devectorizer ( #9511 )
...
* remove move_mask from the devectorizer
* add (wrong) ptx
* reason
* enable index addition in PTX, we won't have the INDEX anyways
* space
2025-03-20 11:53:12 +08:00
qazal
9302738263
hotfix: more consistent wgsl.py spacing + cleanups [pr] ( #9515 )
...
* hotfix: more consistent wgsl.py spacing + cleanups [pr]
* free things up
2025-03-20 11:07:15 +08:00
George Hotz
68053d0510
dsp stuff / sniff ioctls from snpe ( #9490 )
...
* sniff ioctls from snpe
* dump input buffers
* snpe logs from dsp
* NHWC support
* knum 3
* this run?
* revert those
---------
Co-authored-by: Comma Device <device@comma.ai>
2025-03-20 10:38:23 +08:00
qazal
2223b93338
add UPat.or_casted [pr] ( #9513 )
2025-03-20 10:08:32 +08:00
qazal
1839e8c9b3
place masks in INDEX for TestGatedStoreRewrite [pr] ( #9512 )
2025-03-20 09:46:53 +08:00