qazal
4976544bf9
multi ram usage tests on the NULL device ( #14457 )
2026-01-31 14:14:53 +09:00
chenyu
99b44121bc
failed test case for non-consecutive disk read ( #14455 )
...
silently fail now
2026-01-30 23:44:04 -05:00
Christopher Milan
e575dd8275
prevent UB in long decomp and more emulated tests ( #14447 )
2026-01-30 19:38:41 -05:00
chenyu
3204f94454
correct var_vals schedule filter ( #14451 )
...
complete_create_schedule_with_vars returns var_vals that's used in schedule
2026-01-30 17:10:07 -05:00
chenyu
cfcd1debb5
test schedule with multiple AFTER ( #14449 )
2026-01-30 15:59:00 -05:00
chenyu
03613e83ad
update TestTensorMetadata ( #14443 )
...
run with SCACHE=0 some more TODOs
2026-01-30 12:39:01 -05:00
chenyu
26f5c00265
move TestTensorMetadata to unit ( #14442 )
2026-01-30 12:14:21 -05:00
George Hotz
838cd078bc
use atomics for embedding backward ( #14400 )
...
* embedding is slow
* failing
* float is fine
* null
* it fails
* simplify embedding with broadcasting
* ATOMIC_ADD incoming
* min change
* simpler test
* better test
* fix test
* real test
* simpler
* cleanups
* types and names
* _zero_kernel
* grad multi
* hack
* none
* multi unshard
* more for call
* don't tag in call
* good
* call_multi
* call_multi wow claude is useless
* embedding backward multi test
* test passes
* fix as_param
* shape_to_shape_arg
* add clip
* before cast
* fix spec=2, use atomics
2026-01-30 18:10:59 +08:00
George Hotz
7a9dee4e50
add call/param UOps ( #14433 )
...
* add call/param UOps
* resolve call
* skip that for now
* grad on call
* fix tests
2026-01-30 14:51:45 +08:00
qazal
66d6a68016
viz: sqtt work from cdna gemm ( #14434 )
...
* it's the tag
* initialize rows based on the disasm
* test_cfg with Ops.BINARY
* pyremu wants s_code_end?
* test_diamond
* diff cleanup
2026-01-30 14:00:56 +09:00
chenyu
86a204d22a
allow Tensor setitem input to be list/tuple ( #14432 )
...
matches assign, and generally matches numpy
2026-01-29 21:26:58 -05:00
chenyu
ddc041854b
failed test case for disk setitem ( #14426 )
...
strided setitem is wrong
2026-01-29 14:54:19 -05:00
nimlgen
230d08ec70
test for am recovery and faults handling ( #14421 )
...
* test for am recovery and faults handling
* linter
2026-01-29 17:11:24 +03:00
chenyu
2b5e99ccc1
minor type cleanups [pr] ( #14408 )
...
mypy --warn-redundant-casts has a false negative
2026-01-28 14:11:50 -05:00
chenyu
7b9bc1d8cf
_MockMemoryviewMeta for mockgpu ( #14405 )
...
fixed `PYTHONPATH=. TYPED=1 DEV=AMD MOCKGPU=1 python test/test_tiny.py`. basically make `isinstance(TrackedMemoryView_instance, memoryview)` true
2026-01-28 11:59:00 -05:00
chenyu
a9b44070a8
fix webgpu runtime types ( #14402 )
...
`CHECK_OOB=0 DEV=WEBGPU TYPED=1 python test/test_tiny.py` passed, also skip tests that failed locally
2026-01-28 10:37:25 -05:00
qazal
0294014108
fix bufferize cost function for multi, improve VIZ=-1 cli ( #14394 )
...
* improve cli
* remove_bufferize change
2026-01-28 15:53:18 +09:00
qazal
c158acea29
failing multi ram usage test from llama gemm ( #14392 )
2026-01-28 14:32:32 +09:00
Christopher Milan
5e36482314
decompose long to ints where unsupported, try 2 ( #14383 )
2026-01-27 23:20:43 -05:00
George Hotz
88bc5ee212
assembly/amd: rename to better names ( #14384 )
...
* assembly/amd: rename to better names
* might help fuzzing segfault
* emu2 -> emu
2026-01-28 10:00:54 +08:00
George Hotz
065b95cfb0
Revert "add retry to fetch ( #14370 )" ( #14385 )
...
This reverts commit dc4d7f2d55.
2026-01-28 09:35:37 +08:00
Eitan Turok
dc4d7f2d55
add retry to fetch ( #14370 )
2026-01-27 14:04:25 -08:00
chenyu
8d1f3c8885
fix copysign for inf input ( #14381 )
...
* fix copysign for inf input
* llvm olt
2026-01-27 16:45:48 -05:00
Christopher Milan
289a3e415e
also skip test_nonoverlapping_shrink_assignment ( #14382 )
2026-01-27 16:26:26 -05:00
Christopher Milan
f34efc1ad1
DISABLE_FAST_IDIV actually works as a ContextVar ( #14378 )
2026-01-27 16:12:42 -05:00
chenyu
8c899e4aaf
fix copysign for -0 ( #14380 )
...
test that both x and 1/x < 0 work too, and found another bug with the * 0 hack
2026-01-27 15:44:58 -05:00
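The `x` and `1/x < 0` checks in the commit above come down to an IEEE 754 detail: `-0.0 < 0` is false, so a plain comparison cannot see the sign of a negative zero, while the reciprocal of `-0.0` is negative infinity. A minimal pure-Python sketch of a sign-aware copysign follows. The helper names are hypothetical and this is not tinygrad's implementation. Python raises `ZeroDivisionError` on float division by zero, so the sketch reads the sign bit directly rather than forming `1/x`.

```python
import math
import struct

def is_negative(x: float) -> bool:
    """Return True when the IEEE 754 sign bit of x is set, including for -0.0."""
    # Reinterpret the 64-bit float as an unsigned integer and test the top bit.
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return bool(bits >> 63)

def copysign_sketch(magnitude: float, sign: float) -> float:
    """Hypothetical helper: magnitude of the first argument, sign of the second."""
    m = abs(magnitude)
    return -m if is_negative(sign) else m

# A naive `sign < 0` test misses negative zero; the sign-bit test does not.
naive_sees_negative_zero = (-0.0 < 0)
bitwise_sees_negative_zero = is_negative(-0.0)
```

Comparing against `math.copysign` on zero and infinite inputs is a quick way to sanity-check a helper like this.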
chenyu
62884585a7
failed test case for copysign -0.0 ( #14379 )
...
* failed test case for copysign -0.0
* skip those
2026-01-27 14:37:17 -05:00
chenyu
db010a31be
IGNORE_OOB -> CHECK_OOB [pr] ( #14374 )
...
flip the meaning
2026-01-27 12:20:59 -05:00
chenyu
c22667b0c4
also skip test_overlapping_shrink_assignment_reverse ( #14375 )
...
crashing
2026-01-27 12:20:39 -05:00
George Hotz
0ced258726
HOTFIX: skip crashing assign test
2026-01-27 20:35:17 +08:00
imaolo
14574c68fa
Add ContextVar to disable the scheduler cache ( #14257 )
...
* add scheduler cache ContextVar
* test scheduler cache context var
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-01-27 19:55:29 +08:00
George Hotz
bfc88bcfb8
assembly/amd: emu refactors + enable PYTHON_REMU by default ( #14361 )
...
* assembly/amd: start refactors
* cleanups
* those are global
* methods on ctx
* const cleanup
* range helper
* types and imports
* cleanups
* cleanups
* remove stale name
* fix emu2 types
* more typing
* more mypy
* cleanups
* fxns
* scc cleanup
* cleanups
* cleanups
* simpler parse_pcode
* laneid
* no defaults for pcode
* pcode is not optional
* cleanups
* functions cleanup
* splat
* expr_parser functions
* single tok
* invert global loops
* try_eat
* minor
* run parser on all
* no silent 0
* tests
2026-01-27 17:42:24 +08:00
Christopher Milan
2e72625652
Revert "decompose dtypes.long to ints where unsupported ( #14261 )" ( #14362 )
2026-01-27 02:04:59 -05:00
Christopher Milan
0793319929
decompose dtypes.long to ints where unsupported ( #14261 )
...
* add works
* use carry not overflow
* bitwise ops
* use tag instead of vec
* cleaner
* mul somewhat works
* mul actually works
* SUB and NEG work
* SHL/SHR
* ulong support
* this should work?
* oops
* fix indexing
* all ALU mostly works
* refactor
* test_dtype passing
* signed division works
* format
* clean
* some tests
* ruff
2026-01-26 18:34:13 -05:00
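The "use carry not overflow" bullet in the commit above presumably refers to how a 64-bit add is built from 32-bit halves; that reading is an assumption. The general technique, sketched in plain Python with hypothetical names and not taken from tinygrad's code, is to add the low words, detect the unsigned wrap-around as a carry, and feed that carry into the high-word add.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def split64(x: int) -> tuple[int, int]:
    """Split a 64-bit value (taken modulo 2**64) into (low, high) 32-bit words."""
    x &= MASK64
    return x & MASK32, x >> 32

def add64_via_32(a: int, b: int) -> int:
    """Hypothetical helper: 64-bit wrapping add using only 32-bit word arithmetic."""
    a_lo, a_hi = split64(a)
    b_lo, b_hi = split64(b)
    lo = (a_lo + b_lo) & MASK32
    # The low add wrapped exactly when the truncated sum is smaller than an operand.
    carry = 1 if lo < a_lo else 0
    hi = (a_hi + b_hi + carry) & MASK32
    return (hi << 32) | lo
```

The carry is an unsigned notion (the low word wrapped past 2**32), which is why it works the same for signed and unsigned 64-bit values in two's complement; signed overflow of the low word would be the wrong signal.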
chenyu
d641e63189
improve min/max for AND ( #14356 )
2026-01-26 15:44:18 -05:00
chenyu
f16372487a
fix assign hazard on shrink ( #14355 )
...
* fix assign hazard on shrink
possible to have a race if both assign src and dest are shrinks
* test_nonoverlapping_shrink_assignment
2026-01-26 14:46:30 -05:00
chenyu
823bc17fb5
failed test case for shrink overlap assigns ( #14350 )
...
* failed test case for shrink overlap assigns
current logic can create a race resulting in wrong output
* skip for now
2026-01-26 11:58:45 -05:00
George Hotz
204f51e739
assembly/amd: bug fixes for PYTHON_REMU ( #14347 )
...
* default PYTHON_REMU to 1
* mockgpu
* less size
* normal compile path
* unique
* more
* fix clamp
* Change PYTHON_REMU default to 0 in _try_dlopen_remu
2026-01-27 00:48:22 +08:00
chenyu
231305603d
remove REAL_DEV [pr] ( #14337 )
...
it's just Device.DEFAULT now
2026-01-26 10:08:16 -05:00
George Hotz
3b43d26f10
assembly/amd: emu speed ( #14344 )
...
* assembly/amd: emu speed
* fix spec
* go
* don't do this
* simpler
* no stupid consts
* hack
* simpler
* no index
* no where
* faster linearizer
* fix spec
* no index dtype
2026-01-26 22:21:34 +08:00
George Hotz
774a454bb5
assembly/amd: fix scratch SVE ( #14340 )
...
* assembly/amd: default python REMU
* mem_used
* no lane
* sve
* remove that
* needs s_code_end in tests
2026-01-26 21:03:51 +08:00
George Hotz
be23776ba7
assembly/amd: replace pcode with ucode ( #14002 )
...
* a bunch of todos for my boy claude
* uops have types
* lil cleanups
* simpler ucode
* isNAN
* calls
* move more
* cleanup pcode_parse
* cvt functions
* fix parser bugs
* no void
* minmax
* more pcode parse
* pretty print
* transform
* comments
* move to transform
* assign/declare
* simpler norm
* single PM
* just Uops
* simpler
* more typed
* all rewrite
* less verbose
* work
* spec
* transform
* work
* simpler spec
* less spec
* bitcast
* simpler
* simp ucode
* work
* more in pcode_transform
* remove junk
* more functions
* bug
* no void assign
* load/store
* wave
* fixes
* move denorm
* move more functions
* tests
* cat is shape None
* uop syntax
* move a few more
* program_spec
* cat stuff
* assign fix clear
* unused
* nans
* fp bits
* works with simplify
* remove junk
* special
* meh
* more
* more
* update test pcode parse
* improve parser
* parse some for loops
* merge master
* dead files
* tests pass
* emu2
* better emu2
* test_plus works
* uselessly write more instructions
* use pcode
* something
* something
* bench_emu
* progress
* ds works
* work
* work
* more passing
* run compare
* bench_emu
* more pcode
* a few more
* bugfixes
* bugfix
* test fixes
* tests pass without USE_HW
* all hw tests pass
* add more hw tests
* new hw tests
* bit
* less handcode
* parse more
* consolidate pcode
* fixes
* rsrc
* lane pcode
* cleanups
* simpler
* emu bugs
* one cmp test fails
* fix decode and upd name
* fix name and test harness
* _ftz_f32
* fix denorm
* fix VOPD and use load
* fix carry bug
* no load where / just invalid
* clean
* simpler
* merge sops
* refactoring
* simplifications
* bugfixes
* new tests
* f16 sin fix
* assertion and hw tests
* cvt functions
* one more failure
* bugfixes
* bugfix + regression
* more tests
* fmac
* no manual unrolling
* ordering
* LLVM backend is a lot faster
* compile inst
* more bugs
* f16
* bugfix
* fix regression
* one clang call
* 1M inst
* scratch works
* do scratch correctly
* cleanup
* regression
* cmp
* fmamk fixes
* merge
* fix vcmpx
* unify memory
* remove unused code
* ignore oob for test
* cleanups
* fix mbs
* unify cmp
* test
* minor cleanups
* bump timeout
* fix tests
* revert the CMPLE stuff
* remove opt
* less diff
* simpler
* revert
* support multiple backends
* memset is a lot faster
* split out in bench emu
* improve timing
* timing
* cache that
* cache that
* simpler and faster
* tokenize
* binop table
* simpler
* move to parser
* tok for lambda
* refactor
* expr_parser
* delete emu2_pcode
* import cleanup
* lil
* if parse
* work
* simpler
* no v
* trig preop is faster
* durations for tests
* fix cmp bug
* sdst
* remove scratch_size hack
* null behavior
* _MXCSRContext
* bugfixes
* DEBUG >= 3
* test smem crashes my gpu
* debug
* test
* test smem
* profiler
* full inst
* bugfix
* rtag(1)
* pc is 64-bit and word
* pc is real code now
* dynamic
* more dynamic
* fix oob access
* fix crash, more dyn
* all dyn
* really all dyn
* correct null mask
* lit + format
* 21s on the tests
* 13s on the tests
* canonical name
* simm16
* more dyn
* 14s
* proper saddr dedup
* dyn
* debug 5
* better 5
* revert dynamic stuff
* that can be dyn
* negative offsets
* dyn wmma
* f16 wmma support / ops / dtype / dtype_alu
* symbolic changes not needed
* ConstFloat
* more uop.const
* __eq__
* uop tests
* fix f16
* bf16 tensor cores
* whitespace
* remove cast roundtrip
* Revert "remove cast roundtrip"
This reverts commit c5bb0381c3.
* just the fix
* remove dead paths
* llvm runs
2026-01-26 18:04:29 +08:00
George Hotz
984cdc4840
add wrapper class for the -0.0 != 0.0 issue ( #14339 )
...
* add wrapper class for the -0.0 != 0.0 issue
* fixes
* spec fix
* missed one
2026-01-26 16:52:37 +08:00
George Hotz
cc49e47ea2
tinygrad changes from ucode ( #14336 )
...
* tinygrad changes from ucode
* dtype
2026-01-26 11:30:18 +08:00
nimlgen
21ab23ae18
nv: add pma for ada ( #14328 )
...
* nv: add pma for ada
* um
* fix
* shorter
* mock
2026-01-25 17:33:37 +03:00
qazal
bf2d9d138f
viz: simplify amdgpu cfg ( #14326 )
...
* viz: replace llvm disasm with our disasm
* it starts with more code
* then it becomes less
* simpler, cdna disassembles with decimal simm16
* s_branch is upper case, add test
* simm16s and others
2026-01-25 15:21:45 +09:00
chenyu
cb69b7b2b2
comment out fold_where_closure ( #14316 )
2026-01-24 10:15:42 -05:00
wozeparrot
d74587f16d
fa multi fix 2 ( #14314 )
2026-01-23 23:35:02 -08:00
Christopher Milan
e782d44918
WEBGPU/NIR truncates ints ( #14307 )
...
* WEBGPU truncates ints
* nir has this bug too
2026-01-23 19:28:06 -05:00
nimlgen
26220a472e
no core_id ( #14265 )
...
* no core_id
* kwargs
* est
* linters
* ugh
* revert this
* deps
* glb
* should work?
* nn
* line
* fx
* ym
* z
* d
* um?
* revert
* this one?
* first half
* um p2
* all?
* um
* cleaner
* um
2026-01-23 21:30:12 +03:00