Commit Graph

9078 Commits

Author SHA1 Message Date
qazal
1ad8062591 more generic naming in VIZ [pr] (#10695)
* note

* rename kernel to ctx

* rename uop things to currentStep + expandSteps

* already destructured

* some things that were called ctx are steps

* still a kernel
2025-06-08 15:37:39 +03:00
qazal
c70486908e viz: clicking a KERNEL node can open codegen rewrite (#10683)
* work

* now it doesn't have 20% slowdown

* label like this

* closer

* ansiStrip

* remove

* better

* id is faster

* fix that
2025-06-08 13:11:03 +03:00
George Hotz
48eb7d76b1 use ALLOW_DEVICE_USAGE context variable instead of MainProcess check (#10693)
* use DISALLOW_DEVICE_OPEN context variable instead of MainProcess check

* device usage can be disallowed
2025-06-08 00:07:40 -07:00
geohotstan
dedff0e96c fix run huggingface onnx debug (#10679) 2025-06-08 00:59:20 -04:00
George Hotz
8c76250d31 speed up a few tests (#10692) 2025-06-07 20:39:25 -07:00
chenyu
e80870e27c BasicBlock2 -> BasicBlock [pr] (#10691) 2025-06-07 23:33:51 -04:00
George Hotz
7ff175c022 cache a venv to avoid pip usage (#10689)
* try built in pip caching

* try venv

* export venv

* set VIRTUAL_ENV

* revert that

* venv key

* fix

* ci cache hit?

* fix windows
2025-06-07 20:13:41 -07:00
ihar
40c1479267 added unit tests for 'argfix' (#10678) 2025-06-07 22:17:10 -04:00
ihar
74b849b5e1 remove unnecessary 'argfix' because 'view' is an alias to 'reshape'. all functionality must be inside 'reshape' (#10677)
* remove unnecessary 'argfix' because 'view' is an alias to 'reshape'. all functionality must be inside 'reshape'

* added the same set of unit tests for 'view' as for 'reshape' since 'view' is just an alias for 'reshape'

* improved tests for 'view' op
2025-06-07 22:15:31 -04:00
chenyu
e88fe41d37 update vits vctk model to use download from huggingface (#10688)
google drive points to a warning page that does not work
2025-06-07 20:47:28 -04:00
Sieds Lykles
c29a56dd51 Fix whisper OOB (#10685)
* fix whisper and test

* remove import
2025-06-07 20:23:50 -04:00
George Hotz
53ed64e133 ci speed work 1 (#10676)
* skip a few slow tests

* use a venv for python packages

* create venv

* no user, it's in venv

* ignore venv

* venv

* new cache key

* try that

* this

* version the python cache
2025-06-07 16:33:11 -07:00
George Hotz
db01c5a08a ramp.py file from stream (#10686) 2025-06-07 14:58:21 -07:00
Sieds Lykles
2f605eadf7 fix oob (#10666) 2025-06-07 11:32:03 -04:00
qazal
cb61774ab6 move shared viz fields out of serve.py [pr] (#10684)
* move shared viz fields out [pr]

* update javascript

* update test_viz
2025-06-07 17:18:18 +03:00
qazal
b515d796fb inline viz get_name [pr] (#10682)
* inline viz get_name [pr]

* changing name_fxn makes this simpler

* waitUntil dom
2025-06-07 11:16:16 +03:00
qazal
86a19e19e8 cleanup bits of viz [pr] (#10681) 2025-06-07 09:18:12 +03:00
wozeparrot
e3805171e2 feat: variable bs bitcast (#10674) 2025-06-06 17:21:53 -07:00
George Hotz
54db1f8ee8 prevent huge waste of multi ram (#10669)
* prevent huge waste of multi ram

* fix ram usage

* only define var

* add resolve

* fix tests

* fix cifar training

* remove that logic

* fix test without long
2025-06-06 17:17:21 -07:00
George Hotz
b68b7dbc2a test winograd is close to normal conv [pr] (#10557)
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-06 19:11:49 -04:00
nimlgen
85cea23557 nv: original bw qmd (#10672)
* nv: original bw qmd

* forgot
2025-06-07 01:43:22 +03:00
George Hotz
5ef7c5923f docs: remove unused METAL_XCODE env var (#10421) 2025-06-06 18:39:54 -04:00
Sidharth N. Babu
ef14dfb277 compile fixes (#10442) 2025-06-06 18:38:37 -04:00
leopf
eb7305e6a4 Tensor.keccak("sha3_256") (#7186)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-06-06 15:24:05 -07:00
nimlgen
346b8542da nv: fix inval from gpu_get_id_info_v2 (#10670) 2025-06-07 00:54:32 +03:00
chenyu
bdede4924e fix odd number in get_test_global_size (#10671)
factor might not be an integer if input global_size has an odd number in it
2025-06-06 17:31:35 -04:00
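A minimal sketch of the issue described in the note on `get_test_global_size` above, assuming the size is shrunk by floor-halving one dimension at a time. The helper `shrink_global_size` and its parameters are hypothetical and are not the repository's code; it only shows why the ratio between the original and the reduced size stops being an integer once an odd dimension is halved.

```python
from math import prod

def shrink_global_size(global_size, max_global_size):
    """Hypothetical helper (not the repository's implementation).

    Floor-halves dimensions, starting from the last one, until the total
    number of work items fits under max_global_size. Returns the reduced
    sizes and the ratio original/reduced as a float.
    """
    sizes = list(global_size)
    while prod(sizes) > max_global_size:
        for j in range(len(sizes) - 1, -1, -1):
            if sizes[j] > 1:
                sizes[j] //= 2  # floor division drops the remainder of an odd size
                break
        else:
            break  # every dimension is already 1, nothing left to shrink
    # Computing the factor from the actual products keeps it correct even
    # when a halved dimension was odd; it is then not a whole number.
    factor = prod(global_size) / prod(sizes)
    return sizes, factor

even_sizes, even_factor = shrink_global_size([4, 8], 8)
odd_sizes, odd_factor = shrink_global_size([3, 5], 8)
print(even_sizes, even_factor)
print(odd_sizes, odd_factor)
```

With only even dimensions the factor is a whole number; with an odd dimension it is fractional, so any code that assumes an integer factor would be wrong.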
George Hotz
bf4ffc054c mstack replaces scheduler complexity (#10654)
* mstack replaces scheduler complexity

* leave that one

* contiguous

* work

* upd

* minimal failing test

* simpler

* attention is broken

* fix transformer

* failing tests

* real fix for llama

* kv cache test

* jit multi assign test

* better tests

* comment

* fix jit issue

* traverse after buf_uop
2025-06-06 11:31:41 -07:00
George Hotz
7f0f97aa76 new test_multitensor tests (#10667)
* new test_multitensor tests

* cleanup scheduler
2025-06-06 10:26:28 -07:00
qazal
5170f387b3 remove UOp.metaop [pr] (#10664)
* little simpler UOp.const_like [pr]

* remove UOp.metaop

* bind

* remove

* min diff

* that comment is fine
2025-06-06 16:21:48 +03:00
chenyu
4a6d84c4c3 hotfix llama start_pos vmax is max_context-1 (#10659)
* hotfix llama start_pos vmax is max_context-1

fixed `IGNORE_OOB=0 python3 examples/llama3.py --size 1B --benchmark --temperature 0`

* hotfix: multitensor transformer test tests kv cache

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2025-06-06 00:41:25 -04:00
George Hotz
5eb6e1e65a Revert "hotfix: multitensor transformer test tests kv cache"
This reverts commit ad9f88419a.
2025-06-05 21:15:34 -07:00
George Hotz
ad9f88419a hotfix: multitensor transformer test tests kv cache 2025-06-05 21:08:57 -07:00
George Hotz
8325c4f192 tests for multi assign (#10658)
* tests for multi assign

* transformer tests

* add that assert
2025-06-05 20:56:40 -07:00
wozeparrot
0d86f8d375 fix failed threefry (#10646) 2025-06-05 17:17:42 -07:00
chenyu
e67642d430 update doc example for multinomial (#10657)
also added many `s` for consistency
2025-06-05 20:16:52 -04:00
Eitan Turok
61352b8aa2 Add some more docs (#10634)
* more docs

* Add multinomial to ops

* better doc
2025-06-05 19:40:37 -04:00
qazal
884b6cf288 remove gbarrier on const (#10656) 2025-06-06 02:36:52 +03:00
chenyu
ff1aad7b69 fix const float pow to int tensor (#10655)
was incorrectly cast to int
2025-06-05 19:15:12 -04:00
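A small plain-Python sketch of the kind of bug the note above describes, assuming the constant float base was truncated to an integer before the power was taken. Both helper names are hypothetical and no tensor library is involved; lists of integers stand in for an int tensor.

```python
def pow_const_base(base, exponents):
    """Hypothetical helper: raise a scalar base to each integer exponent."""
    return [base ** e for e in exponents]

def pow_const_base_truncated(base, exponents):
    """Hypothetical buggy variant: the float base is cast to int first."""
    return [int(base) ** e for e in exponents]

exponents = [1, 2, 3]
print(pow_const_base(2.5, exponents))
print(pow_const_base_truncated(2.5, exponents))
```

The truncated variant computes powers of 2 instead of powers of 2.5, so the results differ for every exponent.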
George Hotz
6619f17e26 force store to be contiguous (#10652) 2025-06-05 15:42:54 -07:00
wozeparrot
37e1ef1be3 feat: cleanup old AM processes (#10653) 2025-06-05 15:41:00 -07:00
George Hotz
baba274a76 minimal mstack pr to fix allreduce (#10649)
* minimal mstack pr to fix allreduce

* fix webgpu
2025-06-05 15:14:53 -07:00
George Hotz
4c315f8e17 MSTACK little non-functional changes (#10648) 2025-06-05 13:20:22 -07:00
b1tg
79d04d1baf AMD_LLVM: support mfma for mi300x (#10625)
* amd llvm: support mfma for mi300x

* don't pass self

* refactor wmma render

* arch as lambda arg

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-06-05 15:55:44 -04:00
chenyu
46811d0d3c minor external_model_benchmark cleanup (#10644) 2025-06-05 14:13:28 -04:00
qazal
26afbc954f delete redundant tests from test_schedule [pr] (#10643) 2025-06-05 20:08:39 +03:00
chenyu
80ebce421d remove metal buffer limit in external_model_benchmark [pr] (#10642)
not needed anymore
2025-06-05 13:00:51 -04:00
qazal
28c4997236 check for matching shape order in fused reduce (#10641)
* failing test

* shapes match with ones removed
2025-06-05 19:37:22 +03:00
qazal
1190062812 prevent grouper can_chase while fusing arange [pr] (#10623) 2025-06-05 18:50:21 +03:00
uuuvn
69f7778985 refactor renderer launch bounds [pr] (#10617) 2025-06-05 08:38:04 -07:00
qazal
8c5ea00522 push permutes through fused reduces (#10628)
* fix pushing reshapes through reduceops

* reduceop_view_right should assert on ndims mismatch

* update that, view.reshape asserts it
2025-06-05 16:14:04 +03:00