nimlgen
0d6fc0f571
jit: graphing in uops ( #15489 )
...
* jit: graphing as rewrite rule
* f
* +metal,cuda
* x
* cl
* x
* x
* simpler
* f
* m
* x
* revert?
* revert2
* back
* back
* t
* x
* m
* x
* c
* x
* l
* x
* comment
* smaller
* rv
* x
* x
2026-03-27 19:09:02 +03:00
chenyu
30ebbe7f17
few more fold valid tests ( #15509 )
...
from remove CORRECT_DIVMOD_FOLDING attempt
2026-03-27 10:38:42 -04:00
Christopher Milan
9e0cc5c6ae
create image buffers in late codegen ( #15493 )
2026-03-27 04:50:53 -04:00
chenyu
1198d6e908
move pow to mixin ( #15507 )
2026-03-27 03:16:40 -04:00
chenyu
323fcefd7d
Revert "DEV is a ContextVar ( #15505 )" ( #15506 )
...
This reverts commit fdb30cba96.
2026-03-27 02:22:40 -04:00
Christopher Milan
fdb30cba96
DEV is a ContextVar ( #15505 )
2026-03-27 00:57:09 -04:00
wozeparrot
a65e958be9
llama: new apply_grad ( #15503 )
2026-03-26 19:39:25 -07:00
Christopher Milan
67a50fb738
move where on load with casts ( #15492 )
2026-03-26 22:11:27 -04:00
qazal
586c49642f
viz/cli: test in CI ( #15501 )
...
* viz cli work
* baseline test
* make cli test work without subprocess
* more checks
* check itrace
* s/return/return None
* change
* minimal
* colored
2026-03-27 06:47:15 +09:00
qazal
3f9f0fa846
viz: yield sqtt alt events ( #15500 )
...
* yield other
* less
* work
* less
2026-03-27 04:43:41 +09:00
qazal
237c25031f
sqtt: construct OTHER_SIMD op types with for loop ( #15495 )
...
* other-lds from amd_copy_matmul
* more other
* other simd work
2026-03-26 23:07:18 +09:00
nimlgen
7193f90746
test view input in jit ( #15497 )
...
* will anything fail?
* add test
2026-03-26 16:59:47 +03:00
nimlgen
de24b3fe37
jit: pass init params straight to base ( #15496 )
...
* jit: pass init params straight to base
* linter
2026-03-26 16:59:10 +03:00
qazal
ec5b7a249e
viz: refactor sqtt timeline builder ( #15494 )
...
* viz: refactor sqtt timeline builder
* barrier maps to waves
* clean up cli
2026-03-26 21:16:15 +09:00
Christopher Milan
313937ad6d
fix IMAGE TestEnd2End.test_linear_mnist ( #15488 )
2026-03-26 04:12:47 -04:00
Christopher Milan
bc180a963c
deprecate <dev>=1 in favor of DEV=<dev> ( #15467 )
...
* start work on target
* add test
* update actions to use DEV
* update docs
* update readmes
* tests need that too
* update example
* update tests (comments)
* fix that test
* ruff
* mypy
* oops
* remove getenvs
* don't add Target yet
* and the test
* lint
* and docs
* more stuff
* assert
* few more fixes
* test assert
2026-03-26 03:48:03 -04:00
chenyu
8426f820a1
Tensor.sub to mixin ( #15486 )
...
also _broadcasted skipped broadcasting shape if it does not have shape
2026-03-25 23:20:56 -04:00
wozeparrot
1ca178f379
llama: stochastic rounding ( #15456 )
2026-03-25 18:16:31 -07:00
chenyu
7c8f992894
move EXPAND dtype cast back to gradient.py ( #15481 )
...
only a concern for gradient, not mixin
2026-03-25 19:25:26 -04:00
nimlgen
9d2d0774b4
remote: disk copies ( #15482 )
...
* remote: disk copies
* linter
* r
* nv
* x
2026-03-25 22:14:25 +03:00
qazal
7c2c8d3905
viz: small ux improvements ( #15483 )
...
* test
* better
* work
2026-03-26 03:18:25 +09:00
qazal
737d5f67f9
viz: compute canvas dims for auto zoom ( #15474 )
2026-03-26 00:05:23 +09:00
qazal
60bd546593
sqtt: add cycle count to rdna3 enums ( #15473 )
...
* update rdna3 sqtt enums to include cycle_count
* dispatch_to_exec
2026-03-25 23:19:54 +09:00
chenyu
142bf11926
logical_not to mixin [pr] ( #15472 )
...
also UPat.cast skips same dtype
2026-03-25 09:16:45 -04:00
George Hotz
25ff7146f2
add a status line to REMOTE with DEBUG=1 ( #15471 )
...
* python speedups of hot paths
* add a status line to REMOTE with DEBUG=1
* pc
* t
2026-03-25 20:54:56 +08:00
qazal
c973b508b8
viz/cli: pass ctrlc ( #15470 )
2026-03-25 21:13:28 +09:00
George Hotz
c1a7d90ccc
python speedups of hot paths ( #15469 )
2026-03-25 20:02:42 +08:00
George Hotz
ae7090b13b
print function timing with DEBUG=2 ( #15468 )
...
* add DEBUG=2 function timing
* remove those functions, they aren't useful
* fix spec
2026-03-25 19:07:32 +08:00
Christopher Milan
e7f389efda
fix height=1 images on macos ( #15460 )
2026-03-25 05:59:56 -04:00
George Hotz
789628df2e
hotfix: add USE_BOT flag to ASM24 USB
2026-03-25 15:00:08 +08:00
George Hotz
cd1a276f47
llm: support gguf path or url ( #15464 )
...
* llm: support gguf path or url
* one line
2026-03-25 14:43:19 +08:00
chenyu
713b322e70
add weakint to promo_lattice ( #15463 )
...
sits between bool and smallest int
2026-03-25 00:27:34 -04:00
chenyu
02878c5a2f
move _broadcasted to OpMixin ( #15461 )
...
it needs both ElementwiseMixin and MovementMixin
2026-03-24 23:56:01 -04:00
chenyu
519ba22470
more Tensor._broadcasted cleanup ( #15459 )
...
prep moving to mixin
2026-03-24 22:55:45 -04:00
George Hotz
fe2690399b
llm: support assistant prefill + refactor to TransformerConfig ( #15457 )
...
* llm: support assistant prefill
* refactor to ModelConfig
* TransformerConfig
* more
2026-03-25 10:50:48 +08:00
Christopher Milan
fd92aec094
cleanup unused image pitch code ( #15458 )
2026-03-24 22:47:16 -04:00
chenyu
f6ed4da268
Tensor.ufix ( #15452 )
...
* Tensor.ufix
prep moving _broadcasted to mixin
* remove backward_cast
2026-03-24 22:34:43 -04:00
qazal
1b3d00d6ac
viz/cli: remove --offset and --limit flags ( #15439 )
...
* work
* also no more no-color
* reorder
* update llama
* sqtt readme
* itertools
* rm that
* signals back
2026-03-25 09:52:27 +09:00
wozeparrot
da2031266a
llama: correct 8b init ( #15397 )
2026-03-24 13:41:41 -07:00
qazal
652bab8aad
viz: support nested track_rewrites ( #15454 )
...
* simple test
* stack active groups
2026-03-25 05:01:30 +09:00
qazal
41eb2cc41b
viz: preserve zoom between re-renders ( #15451 )
2026-03-25 03:11:10 +09:00
Salman Chishti
84049fdc07
Upgrade GitHub Actions to latest versions ( #15446 )
...
Signed-off-by: Salman Muin Kayser Chishti <13schishti@gmail.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-03-24 10:28:49 -04:00
Salman Chishti
9567075e20
Upgrade GitHub Actions for Node 24 compatibility ( #15445 )
...
Signed-off-by: Salman Muin Kayser Chishti <13schishti@gmail.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-03-24 10:28:19 -04:00
chenyu
b7960841af
support shape broadcast in UOp.alu ( #15442 )
...
i think it can integrate tighter, but now Tensor also does ufix from UOp and implicit dtype upcast
2026-03-24 10:14:57 -04:00
George Hotz
a33ac869aa
llm server: temperature + test client ( #15444 )
...
* improvements to the llm server
* eval script
* eval llm
* better eval gets 58.71
* cleanups
* add temperature, but multinomial is absurdly slow
* claude is so smart
* lint
* remove slop
* no more stop
2026-03-24 21:07:15 +08:00
nimlgen
9db5d677c7
jit in viz ( #15447 )
2026-03-24 18:23:53 +08:00
Christopher Milan
2e4fbbcc9c
ir3: fix texture mapping and benchmark ( #15443 )
2026-03-24 04:52:54 -04:00
Christopher Milan
d5320a9ddf
QCOM cleanups ( #15435 )
2026-03-23 22:18:38 -04:00
George Hotz
85dee83f5d
amd flash attention cleanups + emulator fixes ( #15431 )
...
* amd flash attention cleanups
* simpler
* params
* fix emulator bugs
* fix idiv bug
* remove that test
* more emu fixes
2026-03-24 10:10:46 +08:00
chenyu
018a9e2d3c
remove match_dtype arg in Tensor._broadcasted ( #15440 )
...
reworked Tensor.where to not need it, also updated dtypes.from_py to use isinstance because ConstFloat issues
2026-03-23 22:10:39 -04:00