nimlgen
486d53d646
device: call free for external_ptr ( #14448 )
...
* device: call free for external_ptr
* lin
2026-01-30 23:53:17 +03:00
nimlgen
e0978498dc
amd: read_ptr/write_ptr/doorbells are not lists ( #14445 )
2026-01-30 23:11:57 +03:00
Christopher Milan
1803ee939d
EMULATED_DTYPES=long works with CPU_LLVM ( #14446 )
2026-01-30 13:54:43 -05:00
chenyu
03613e83ad
update TestTensorMetadata ( #14443 )
...
run with SCACHE=0 some more TODOs
2026-01-30 12:39:01 -05:00
George Hotz
cbb1eed57b
hotfix: partial revert of 9eb449f88, caused llama NaN
2026-01-30 17:19:27 +00:00
chenyu
26f5c00265
move TestTensorMetadata to unit ( #14442 )
2026-01-30 12:14:21 -05:00
chenyu
c05a0b85ae
flip unique const src order [pr] ( #14441 )
...
* flip unique const src order [pr]
matches buffer, simplifies replace_input_buffer
* combine rules
2026-01-30 11:44:18 -05:00
George Hotz
ee2c78709d
mlperf/llama: disable USE_ATOMICS for now
2026-01-31 00:42:08 +08:00
chenyu
beecac4d85
expand ranges -> unroll outer ranges [pr] ( #14440 )
2026-01-30 11:26:05 -05:00
chenyu
9eb449f882
clean up toposort sched_sink [pr] ( #14439 )
2026-01-30 10:18:28 -05:00
George Hotz
838cd078bc
use atomics for embedding backward ( #14400 )
...
* embedding is slow
* failing
* float is fine
* null
* it fails
* simplify embedding with broadcasting
* ATOMIC_ADD incoming
* min change
* simpler test
* better test
* fix test
* real test
* simpler
* cleanups
* types and names
* _zero_kernel
* grad multi
* hack
* none
* multi unshard
* more for call
* don't tag in call
* good
* call_multi
* call_multi wow claude is useless
* embedding backward multi test
* test passes
* fix as_param
* shape_to_shape_arg
* add clip
* before cast
* fix spec=2, use atomics
2026-01-30 18:10:59 +08:00
nimlgen
1998e0bb28
nv: add prof props to dev ( #14437 )
2026-01-30 12:51:43 +03:00
George Hotz
7a9dee4e50
add call/param UOps ( #14433 )
...
* add call/param UOps
* resolve call
* skip that for now
* grad on call
* fix tests
2026-01-30 14:51:45 +08:00
qazal
66d6a68016
viz: sqtt work from cdna gemm ( #14434 )
...
* it's the tag
* initialize rows based on the disasm
* test_cfg with Ops.BINARY
* pyremu wants s_code_end?
* test_diamond
* diff cleanup
2026-01-30 14:00:56 +09:00
Christopher Milan
88caf57ef4
ci: unify python versions ( #14430 )
2026-01-29 21:42:03 -05:00
chenyu
86a204d22a
allow Tensor setitem input to be list/tuple ( #14432 )
...
matches assign, and generally matches numpy
2026-01-29 21:26:58 -05:00
chenyu
4a80319093
clean up split_store final logic [pr] ( #14429 )
...
explicitly check the structure
2026-01-29 18:40:07 -05:00
Christopher Milan
e47f12f671
ci: replace testing_minimal with testing_unit ( #14427 )
2026-01-29 18:02:43 -05:00
wozeparrot
c2fb8b208f
fa: 32 block size ( #14416 )
2026-01-29 13:59:13 -08:00
chenyu
a979fafae5
cleanup around disk buffer [pr] ( #14428 )
...
style change, prep for refactor
2026-01-29 16:18:44 -05:00
nimlgen
dc977a03b0
nv_pma: bw decoder ( #14424 )
...
* nv_pma: bw decoder
* decoder fix
* better
2026-01-30 00:12:39 +03:00
chenyu
ddc041854b
failed test case for disk setitem ( #14426 )
...
strided setitem is wrong
2026-01-29 14:54:19 -05:00
chenyu
31706bf6bc
add few more types [pr] ( #14425 )
2026-01-29 14:04:09 -05:00
nimlgen
2d5c24879f
nv: pma for 5090 ( #14420 )
...
* nv: pma for 5090
* hm
* 4090
2026-01-29 20:06:01 +03:00
nimlgen
c8dc6332d2
memory: read_fields is not universal ( #14348 )
2026-01-29 20:00:00 +03:00
chenyu
dbe8f034a7
pass z3.Context in validate ctx [pr] ( #14423 )
...
does not need to pass the whole solver
2026-01-29 11:11:47 -05:00
chenyu
033ce1b885
types for validate.py ( #14422 )
2026-01-29 10:56:50 -05:00
nimlgen
230d08ec70
test for am recovery and faults handling ( #14421 )
...
* test for am recovery and faults handling
* linter
2026-01-29 17:11:24 +03:00
George Hotz
793afbd473
simplify nn.Embedding, support AFTER in CUSTOM_KERNEL ( #14419 )
2026-01-29 17:22:13 +08:00
Christopher Milan
0c855d6149
ci: remove unused pydeps ( #14418 )
2026-01-29 01:51:26 -05:00
wozeparrot
4845e42135
llama3 gradacc fixes ( #14414 )
2026-01-28 19:12:39 -08:00
chenyu
37cde4a01a
add one line mypy report ( #14415 )
2026-01-28 20:39:32 -05:00
chenyu
15aed51544
return types for all math.py functions ( #14413 )
...
calling int() on sint -> int; I think this is better supported since some UOps can be safely cast to int
2026-01-28 20:10:11 -05:00
nimlgen
aec1ae0de1
llama: set manual_seed ( #14409 )
2026-01-28 14:40:00 -08:00
chenyu
0870ed28b1
add Self type to MathMixin ( #14411 )
...
these don't cause errors
2026-01-28 16:59:38 -05:00
chenyu
079f33c208
fix type in Tensor.mean and Tensor.var ( #14410 )
...
use Tensor.from_uop to wrap UOp from symbolic shape, kernels are the same
2026-01-28 15:24:02 -05:00
chenyu
2b5e99ccc1
minor type cleanups [pr] ( #14408 )
...
mypy --warn-redundant-casts has false negative
2026-01-28 14:11:50 -05:00
chenyu
726415dbc8
import sint directly in movement.py TYPE_CHECKING ( #14406 )
...
avoid creating string TypeAlias, fixed warning in `TYPED=1 python test/test_tiny.py`
2026-01-28 12:47:26 -05:00
nimlgen
acb2fc36ba
nv_pma: add decoder ( #14404 )
...
* nv_pma: add decoder
* cl
2026-01-28 20:44:02 +03:00
chenyu
7b9bc1d8cf
_MockMemoryviewMeta for mockgpu ( #14405 )
...
fixed `PYTHONPATH=. TYPED=1 DEV=AMD MOCKGPU=1 python test/test_tiny.py`. basically make `isinstance(TrackedMemoryView_instance, memoryview)` true
2026-01-28 11:59:00 -05:00
chenyu
93793a645b
use cl.cl_mem instead of internal ctypes._CData ( #14403 )
...
fixed `CHECK_OOB=0 DEV=CL TYPED=1 python test/test_tiny.py`
2026-01-28 10:56:41 -05:00
chenyu
a9b44070a8
fix webgpu runtime types ( #14402 )
...
`CHECK_OOB=0 DEV=WEBGPU TYPED=1 python test/test_tiny.py` passed, also skip tests that failed locally
2026-01-28 10:37:25 -05:00
George Hotz
0c6b3f50aa
add marker to llama training ( #14401 )
2026-01-28 22:44:28 +08:00
Jakob Sachs
2b7c00d3d2
fix sd-example dtype for CLIP embeddings ( #14397 )
2026-01-28 09:07:19 -05:00
qazal
a5a9ce3fdf
viz: disasm cleanups from null emulate ( #14399 )
...
* it's AMDHIPRenderer
* don't need that indent
* less assignment stuff
* that arg order did not make sense
* pmc
2026-01-28 22:03:30 +09:00
nimlgen
544928766d
hcq_smi: kill mac pids ( #14398 )
2026-01-28 15:00:28 +03:00
George Hotz
202b74b369
assembly/amd: continue refactors ( #14386 )
...
* simpler
* merge
* flat
* no ctx
* use the correct apis
* dup code
* write clean code
* remove bad helpers
* bits junk remove
* junk remove
* smem test
* fix tests
* correct fix + tests
* Fmt matters it seems
* wmma refactor
* a lil more
* kimi cleanups
* line
2026-01-28 17:33:03 +08:00
qazal
5bffa17f82
llama train: better NULL=1 EMULATE=AMD_CDNA4 dev experience ( #14395 )
...
* beam opens devices
* switch to hip renderer
* amd: true?
* llvm true is for test_autogen
2026-01-28 17:31:22 +09:00
qazal
0294014108
fix bufferize cost function for multi, improve VIZ=-1 cli ( #14394 )
...
* improve cli
* remove_bufferize change
2026-01-28 15:53:18 +09:00
qazal
c158acea29
failing multi ram usage test from llama gemm ( #14392 )
2026-01-28 14:32:32 +09:00