nimlgen
abd830b260
am: setup_rinf returns only doorbell (#15112)
2026-03-03 19:27:41 +03:00
nimlgen
4b42bb54aa
am: reset sdma to start from 0 (#15109)
2026-03-03 18:14:46 +03:00
George Hotz
01ddb4c267
add precompile to call (#15099)
* add precompile to call
* put get back
* something
* after structure
* alt
* keep it call
* resolve call
* resolve linear call
* precompile works with llm
* revert rangeify
* color for debugging
* getenv PRECOMPILE
* clean up deco pattern
* fully recursive sink scheduling
* revert llama
* fix SPEC=2
2026-03-03 22:32:42 +08:00
qazal
c7f908b788
sqtt: fix rdna4 structs (#15111)
* work
* DEBUG=2
2026-03-03 23:32:14 +09:00
qazal
8dd691761d
sqtt: remove old files (#15108)
2026-03-03 22:43:24 +09:00
Christopher Milan
de043226ba
benchmark comma usbgpu driving_vision step and load time (#15103)
Co-authored-by: Comma Device <device@comma.ai>
2026-03-03 06:08:03 -05:00
Christopher Milan
5f6b610da1
FLOAT16 logic for IMAGE==1 goes back to image_conv2d (#15105)
2026-03-03 05:37:57 -05:00
wozeparrot
529318259c
fix: fix null tests to actually use null device (#15104)
2026-03-03 02:05:47 -08:00
George Hotz
7d025089e3
no after removal (#15102)
* no after removal
* we are using walk
* null schedule test
* pytest deps
* Revert "pytest deps"
This reverts commit 5e1c5304ec.
* Revert "null schedule test"
This reverts commit 02da66053e.
* clean null tests
2026-03-03 17:50:31 +08:00
wozeparrot
92c16810ac
feat: per device mem_used (#15100)
2026-03-03 01:31:28 -08:00
qazal
e3a0598d0b
viz: the whole pc should be in view (#15101)
2026-03-03 17:17:53 +09:00
b1tg
a9ea36de79
assembly/amd: v_cmp_lg_f32 is ordered not-equal (#14982)
2026-03-03 15:37:48 +08:00
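For context on the fix above: in IEEE-754 terms, LG ("less than or greater than") is the ordered not-equal, which is false whenever either operand is NaN, whereas NEQ is the unordered not-equal. A minimal Python sketch of the distinction (illustrative only, not the project's actual code):

```python
import math

def v_cmp_lg_f32(a: float, b: float) -> bool:
    # ordered not-equal: a NaN in either operand makes the comparison false
    return not math.isnan(a) and not math.isnan(b) and a != b

def v_cmp_neq_f32(a: float, b: float) -> bool:
    # unordered not-equal: a NaN in either operand makes the comparison true
    return math.isnan(a) or math.isnan(b) or a != b

assert v_cmp_lg_f32(1.0, 2.0) and v_cmp_neq_f32(1.0, 2.0)
assert not v_cmp_lg_f32(float("nan"), 1.0)  # ordered: false on NaN
assert v_cmp_neq_f32(float("nan"), 1.0)     # unordered: true on NaN
```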
wozeparrot
c35de9bd68
asm_gemm: support more sharding (#15002)
2026-03-02 23:16:37 -08:00
wozeparrot
824ba4386a
llama3 dp fix (#15098)
2026-03-02 22:43:07 -08:00
chenyu
5dcf29b1a0
use clone in test_swap_slices (#15096)
2026-03-02 22:05:12 -05:00
Christopher Milan
c70e8af068
move IMAGE FLOAT16 logic to allocations (#15095)
* FLOAT16 logic in allocations
* cleanup
* separate that
* only apply when IMAGE == 1
* test passing now
* create image buffers earlier
2026-03-02 22:00:05 -05:00
George Hotz
d483e4153a
buffer view is like buffer (#15082)
* buffer view is like buffer
* fix
* swap_reshape_shrink
* contiguous on gguf, fix overlap
* revert that
* _device_supports_view
* this
* fix that test
* 0 buffers
* that test was wrong
* this
* check correct size
* contig BUFFER_VIEW
* this
* fix tests
* buffer view tests
* om
* fix torch
* no MOCKGPU
* skip
2026-03-03 09:52:33 +08:00
qazal
62ee976c1b
gemm/asm: cleanup repeated patterns to helper functions (#15094)
2026-03-03 08:14:47 +09:00
qazal
848f5cea96
viz: sqtt instruction packet trace (#15065)
2026-03-03 07:55:04 +09:00
chenyu
14d1c5fdfd
assign fusion tests on detach and contiguous_backward (#15092)
2026-03-02 15:21:51 -05:00
nimlgen
dfa180413d
tbgpu: sign nv (#15087)
2026-03-02 22:58:30 +03:00
chenyu
71f228f80f
test exact kernel count in torch_backend/test_kernel_fusion (#15091)
2026-03-02 14:26:32 -05:00
chenyu
f80b1033c5
simpler Tensor.all (#15089)
same generated kernel
2026-03-02 11:08:55 -05:00
chenyu
4008f7d4e8
move Tensor.one_hot +1 to python (#15088)
2026-03-02 10:56:41 -05:00
nimlgen
dafbe9733a
am: cleanup (#15086)
2026-03-02 17:06:21 +03:00
qazal
f7aeff6061
viz: cli.py cleanups, do not require PYTHONPATH (#15085)
* cleanup the print
* sys.exit
* equal check
* cleanup unpacker
* cli doesn't need PYTHONPATH
* no semicolons
* %s/PYTHONPATH=. //g
2026-03-02 19:24:38 +09:00
George Hotz
5ff278446c
add contiguous_view_offset (#15084)
* add contiguous_view_offset
* no int
2026-03-02 18:05:04 +08:00
Christopher Milan
977c270774
IMAGE=1 kernel count failing tests (#15083)
2026-03-02 04:35:26 -05:00
George Hotz
3539693555
Support triu variable on diagonal + SDPA symbolic (#15081)
* triu variable
* fails
* dumbbb
* no commutative in reshape
* real fix
* revert that
* sdpa symbolic tests
2026-03-02 12:19:48 +08:00
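A hedged usage sketch of what the commit above enables; the exact call pattern is an assumption from the title (Variable is tinygrad's symbolic integer, and triu's diagonal argument is assumed to now accept a bound Variable):

```python
from tinygrad import Tensor, Variable  # top-level Variable export assumed

d = Variable("d", 0, 4).bind(1)    # symbolic diagonal, bound to 1 at runtime
x = Tensor.ones(4, 4)
print(x.triu(diagonal=d).numpy())  # upper triangle shifted by the symbolic d
```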
wozeparrot
a4f6365929
llama3: fstep takes grads (#15069)
2026-03-01 20:05:07 -08:00
Nick
8e8e9f6ff6
assert removal for _tri() + tests (#15073)
* assert removal for _tri() and tests
* removed import
* tests triu/tril like in prefill
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-03-02 10:34:28 +08:00
nimlgen
ccbbca05ef
beam: add dev_timeout for am (#15063)
* beam: add dev_timeout for am
* all covered
* fk
* x
* fuzz
* reset
* f
2026-03-01 16:57:29 +03:00
chenyu
8cb4368967
delete unused END NOOP rule [pr] (#15077)
2026-03-01 00:09:05 -05:00
chenyu
efce99adc9
skip isComposing key press in llm.py (#15076)
for CJK input users
2026-02-28 20:31:53 -05:00
chenyu
103ea16ec0
add contiguous back to svd (#15074)
can cause an infinite loop
2026-02-28 16:49:26 -05:00
chenyu
fe0fa8333b
Revert "improve Tensor.sort indices ( #15070 )" ( #15072 )
...
This reverts commit e3003631f2 .
2026-02-28 14:40:30 -05:00
chenyu
e3003631f2
improve Tensor.sort indices (#15070)
* improve Tensor.sort indices
instead of an N^2 match at the end, start with an arange and go through the same N(log N)^2 path
* contiguous
2026-02-28 14:16:16 -05:00
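The idea in #15070 (reverted above in #15072): rather than recovering indices with an N^2 match of sorted values against the input, pair each value with its position from an arange up front so the indices ride through the sort itself. A plain-Python sketch of the pairing trick (not the tensor-level implementation, which the message says reuses the N(log N)^2 sort path):

```python
def sort_with_indices(xs: list[float]) -> tuple[list[float], list[int]]:
    # attach each value's original position (the "arange") before sorting,
    # so no post-hoc matching of sorted values back to the input is needed
    paired = sorted(zip(xs, range(len(xs))))
    return [v for v, _ in paired], [i for _, i in paired]

print(sort_with_indices([3.0, 1.0, 2.0]))  # ([1.0, 2.0, 3.0], [1, 0, 2])
```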
wozeparrot
cfc5cf65ad
llama3: vocab padding fix + jit copies on fakedata (#15067)
2026-02-28 08:44:55 -08:00
chenyu
76170d035a
relax atol for test_xlm_roberta_large (#15066)
2026-02-28 11:22:35 -05:00
qazal
cfb8e6922d
viz: arrow keys move through time (#15064)
* work
* automatic zoom, keeping scale
* the whole shape should be out of view
2026-02-28 23:52:36 +09:00
nimlgen
9b3450c9da
test gpu crash on cdna (#15062)
2026-02-28 13:17:59 +03:00
nimlgen
6bbf813dd3
ci: switch to tinygrad/amdcomgr_dylib (#15061)
2026-02-28 13:09:39 +03:00
nimlgen
77846300b2
am: reset vm fault (#15060)
2026-02-28 12:58:56 +03:00
George Hotz
dc54441e1f
add better printing to tinygrad.apps.llm (#15059)
* add better printing to tinygrad.apps.llm
* add gc.collect
* comment
2026-02-28 16:38:50 +08:00
George Hotz
bb84e389cf
functions for llama trainer (#15045)
* functions for llama trainer
* function there
* axis match
* fix multi
* lil cleaner
* there's a bug with HK_FLASH_ATTENTION
* training functions
* for commit
2026-02-28 12:15:18 +08:00
chenyu
9b4ba3f838
remove ReduceContext.range_to_ends [pr] (#15055)
* remove ReduceContext.range_to_ends [pr]
make merge_reduce_ends pure. this state causes issues when introducing more reduce-merging rewrites
* tag
2026-02-27 22:15:44 -05:00
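A sketch of the refactor pattern described above, with invented names (not the actual ReduceContext code): a helper that accumulates results on shared mutable state makes rewrite order matter, while a pure helper recomputes from its inputs and stays correct as more reduce-merging rewrites are added:

```python
class ReduceContextStateful:
    """Stateful variant: each call's answer depends on every earlier call."""
    def __init__(self):
        self.range_to_ends: dict[str, list[str]] = {}

    def merge_reduce_ends(self, rng: str, end: str) -> list[str]:
        # hidden accumulation: a new rewrite firing in between changes the result
        self.range_to_ends.setdefault(rng, []).append(end)
        return self.range_to_ends[rng]

def merge_reduce_ends(ends_by_range: dict[str, list[str]], rng: str, end: str) -> list[str]:
    # pure variant: same inputs always yield the same output, no order dependence
    return ends_by_range.get(rng, []) + [end]
```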
chenyu
151608aa90
update test_multiple_to_single_device (#15056)
follow-up to #14482, add SCACHE=0 to the test
2026-02-27 21:44:33 -05:00
chenyu
5fd06f4f02
differentiable setitem (#15054)
* differentiable setitem
go through the where path for bw
* no return
2026-02-27 17:25:15 -05:00
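A minimal sketch of the "where path" named above (illustrative usage, not the PR's code): expressing the update as a pure where makes the overwrite differentiable, routing gradient to the untouched elements of the base and to the written value:

```python
from tinygrad import Tensor

base  = Tensor([1.0, 2.0, 3.0], requires_grad=True)
value = Tensor([10.0], requires_grad=True)
mask  = Tensor([False, True, False])  # "set index 1"

out = mask.where(value, base)         # functional form of base[1] = value
out.sum().backward()
print(base.grad.numpy())   # [1. 0. 1.] -- no gradient through the overwritten slot
print(value.grad.numpy())  # [1.]      -- gradient reaches the written value
```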
chenyu
db6b3e1edc
fix mixed setitem with both basic and tensor indexing (#15050)
2026-02-27 15:35:48 -05:00
chenyu
c9f6d8751b
don't remove_bufferize for Invalid (#15053)
* don't remove_bufferize for Invalid
* replaced
2026-02-27 15:16:09 -05:00