Commit Graph

4954 Commits

Author SHA1 Message Date
qazal
4976544bf9 multi ram usage tests on the NULL device (#14457) 2026-01-31 14:14:53 +09:00
chenyu
99b44121bc failed test case for non-consecutive disk read (#14455)
silently fail now
2026-01-30 23:44:04 -05:00
Christopher Milan
e575dd8275 prevent UB in long decomp and more emulated tests (#14447) 2026-01-30 19:38:41 -05:00
chenyu
3204f94454 correct var_vals schedule filter (#14451)
complete_create_schedule_with_vars returns var_vals that's used in schedule
2026-01-30 17:10:07 -05:00
chenyu
cfcd1debb5 test schedule with multiple AFTER (#14449) 2026-01-30 15:59:00 -05:00
chenyu
03613e83ad update TestTensorMetadata (#14443)
run with SCACHE=0 some more TODOs
2026-01-30 12:39:01 -05:00
chenyu
26f5c00265 move TestTensorMetadata to unit (#14442) 2026-01-30 12:14:21 -05:00
George Hotz
838cd078bc use atomics for embedding backward (#14400)
* embedding is slow

* failing

* float is fine

* null

* it fails

* simplify embedding with broadcasting

* ATOMIC_ADD incoming

* min change

* simpler test

* better test

* fix test

* real test

* simpler

* cleanups

* types and names

* _zero_kernel

* grad multi

* hack

* none

* multi unshard

* more for call

* don't tag in call

* good

* call_multi

* call_multi wow claude is useless

* embedding backward multi test

* test passes

* fix as_param

* shape_to_shape_arg

* add clip

* before cast

* fix spec=2, use atomics
2026-01-30 18:10:59 +08:00
George Hotz
7a9dee4e50 add call/param UOps (#14433)
* add call/param UOps

* resolve call

* skip that for now

* grad on call

* fix tests
2026-01-30 14:51:45 +08:00
qazal
66d6a68016 viz: sqtt work from cdna gemm (#14434)
* it's the tag

* initialize rows based on the disasm

* test_cfg with Ops.BINARY

* pyremu wants s_code_end?

* test_diamond

* diff cleanup
2026-01-30 14:00:56 +09:00
chenyu
86a204d22a allow Tensor setitem input to be list/tuple (#14432)
matches assign, and generally matches numpy
2026-01-29 21:26:58 -05:00
chenyu
ddc041854b failed test case for disk setitem (#14426)
strided setitem is wrong
2026-01-29 14:54:19 -05:00
nimlgen
230d08ec70 test for am recovery and faults handling (#14421)
* test for am recovery and faults handling

* linter
2026-01-29 17:11:24 +03:00
chenyu
2b5e99ccc1 minor type cleanups [pr] (#14408)
mypy --warn-redundant-casts has false negative
2026-01-28 14:11:50 -05:00
chenyu
7b9bc1d8cf _MockMemoryviewMeta for mockgpu (#14405)
fixed `PYTHONPATH=. TYPED=1 DEV=AMD MOCKGPU=1 python test/test_tiny.py`. basically make `isinstance(TrackedMemoryView_instance, memoryview)` true
2026-01-28 11:59:00 -05:00
chenyu
a9b44070a8 fix webgpu runtime types (#14402)
`CHECK_OOB=0 DEV=WEBGPU TYPED=1 python test/test_tiny.py` passed, also skip tests that failed locally
2026-01-28 10:37:25 -05:00
qazal
0294014108 fix bufferize cost function for multi, improve VIZ=-1 cli (#14394)
* improve cli

* remove_bufferize change
2026-01-28 15:53:18 +09:00
qazal
c158acea29 failing multi ram usage test from llama gemm (#14392) 2026-01-28 14:32:32 +09:00
Christopher Milan
5e36482314 decompose long to ints where unsupported, try 2 (#14383) 2026-01-27 23:20:43 -05:00
George Hotz
88bc5ee212 assembly/amd: rename to better names (#14384)
* assembly/amd: rename to better names

* might help fuzzing segfault

* emu2 -> emu
2026-01-28 10:00:54 +08:00
George Hotz
065b95cfb0 Revert "add retry to fetch (#14370)" (#14385)
This reverts commit dc4d7f2d55.
2026-01-28 09:35:37 +08:00
Eitan Turok
dc4d7f2d55 add retry to fetch (#14370) 2026-01-27 14:04:25 -08:00
chenyu
8d1f3c8885 fix copysign for inf input (#14381)
* fix copysign for inf input

* llvm olt
2026-01-27 16:45:48 -05:00
Christopher Milan
289a3e415e also skip test_nonoverlapping_shrink_assignment (#14382) 2026-01-27 16:26:26 -05:00
Christopher Milan
f34efc1ad1 DISABLE_FAST_IDIV actually works as a ContextVar (#14378) 2026-01-27 16:12:42 -05:00
chenyu
8c899e4aaf fix copysign for -0 (#14380)
test both x and 1/x < 0 work too. and found another bug with the * 0 hack
2026-01-27 15:44:58 -05:00
chenyu
62884585a7 failed test case for copysign -0.0 (#14379)
* failed test case for copysign -0.0

* skip those
2026-01-27 14:37:17 -05:00
chenyu
db010a31be IGNORE_OOB -> CHECK_OOB [pr] (#14374)
flip the meaning
2026-01-27 12:20:59 -05:00
chenyu
c22667b0c4 also skip test_overlapping_shrink_assignment_reverse (#14375)
crashing
2026-01-27 12:20:39 -05:00
George Hotz
0ced258726 HOTFIX: skip crashing assign test 2026-01-27 20:35:17 +08:00
imaolo
14574c68fa Add ContextVar to disable the scheduler cache (#14257)
* add scheduler cache ContextVar

* test scheduler cache context var

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-01-27 19:55:29 +08:00
George Hotz
bfc88bcfb8 assembly/amd: emu refactors + enable PYTHON_REMU by default (#14361)
* assembly/amd: start refactors

* cleanups

* those are global

* methods on ctx

* const cleanup

* range helper

* types and imports

* cleanups

* cleanups

* remove stale name

* fix emu2 types

* more typing

* more mypy

* cleanups

* fxns

* scc cleanup

* cleanups

* cleanups

* simpler parse_pcode

* laneid

* no defaults for pcode

* pcode is not optional

* cleanups

* functions cleanup

* splat

* expr_parser functions

* single tok

* invert global loops

* try_eat

* minor

* run parser on all

* no silent 0

* tests
2026-01-27 17:42:24 +08:00
Christopher Milan
2e72625652 Revert "decompose dtypes.long to ints where unsupported (#14261)" (#14362) 2026-01-27 02:04:59 -05:00
Christopher Milan
0793319929 decompose dtypes.long to ints where unsupported (#14261)
* add works

* use carry not overflow

* bitwise ops

* use tag instead of vec

* cleaner

* mul somewhat works

* mul actually works

* SUB and NEG work

* SHL/SHR

* ulong support

* this should work?

* oops

* fix indexing

* all ALU mostly works

* refactor

* test_dtype passing

* signed division works

* format

* clean

* some tests

* ruff
2026-01-26 18:34:13 -05:00
chenyu
d641e63189 improve min/max for AND (#14356) 2026-01-26 15:44:18 -05:00
chenyu
f16372487a fix assign hazard on shrink (#14355)
* fix assign hazard on shrink

possible to have race if both assign src and dest are shrink

* test_nonoverlapping_shrink_assignment
2026-01-26 14:46:30 -05:00
chenyu
823bc17fb5 failed test case for shrink overlap assigns (#14350)
* failed test case for shrink overlap assigns

current logic can create a race resulting in wrong output

* skip for now
2026-01-26 11:58:45 -05:00
George Hotz
204f51e739 assembly/amd: bug fixes for PYTHON_REMU (#14347)
* default PYTHON_REMU to 1

* mockgpu

* less size

* normal compile path

* unique

* more

* fix clamp

* Change PYTHON_REMU default to 0 in _try_dlopen_remu
2026-01-27 00:48:22 +08:00
chenyu
231305603d remove REAL_DEV [pr] (#14337)
it's just Device.DEFAULT now
2026-01-26 10:08:16 -05:00
George Hotz
3b43d26f10 assembly/amd: emu speed (#14344)
* assembly/amd: emu speed

* fix spec

* go

* don't do this

* simpler

* no stupid consts

* hack

* simpler

* no index

* no where

* faster linearizer

* fix spec

* no index dtype
2026-01-26 22:21:34 +08:00
George Hotz
774a454bb5 assembly/amd: fix scratch SVE (#14340)
* assembly/amd: default python REMU

* mem_used

* no lane

* sve

* remove that

* needs s_code_end in tests
2026-01-26 21:03:51 +08:00
George Hotz
be23776ba7 assembly/amd: replace pcode with ucode (#14002)
* a bunch of todos for my boy claude

* uops have types

* lil cleanups

* simpler ucode

* isNAN

* calls

* move more

* cleanup pcode_parse

* cvt functions

* fix parser bugs

* no void

* minmax

* more pcode parse

* pretty print

* transform

* comments

* move to transform

* assign/declare

* simpler norm

* single PM

* just Uops

* simpler

* more typed

* all rewrite

* less verbose

* work

* spec

* transform

* work

* simpler spec

* less spec

* bitcast

* simpler

* simp ucode

* work

* more in pcode_transform

* remove junk

* more functions

* bug

* no void assign

* load/store

* wave

* fixes

* move denorm

* move more functions

* tests

* cat is shape None

* uop syntax

* move a few more

* program_spec

* cat stuff

* assign fix clear

* unused

* nans

* fp bits

* works with simplify

* remove junk

* special

* meh

* more

* more

* update test pcode parse

* improve parser

* parse some for loops

* merge master

* dead files

* tests pass

* emu2

* better emu2

* test_plus works

* uselessly write more instructions

* use pcode

* something

* something

* bench_emu

* progress

* ds works

* work

* work

* more passing

* run compare

* bench_emu

* more pcode

* a few more

* bugfixes

* bugfix

* test fixes

* tests pass without USE_HW

* all hw tests pass

* add more hw tests

* new hw tests

* bit

* less handcode

* parse more

* consolidate pcode

* fixes

* rsrc

* lane pcode

* cleanups

* simpler

* emu bugs

* one cmp test fails

* fix decode and upd name

* fix name and test harness

* _ftz_f32

* fix denorm

* fix VOPD and use load

* fix carry bug

* no load where / just invalid

* clean

* simpler

* merge sops

* refactoring

* simplifications

* bugfixes

* new tests

* f16 sin fix

* assertion and hw tests

* cvt functions

* one more failure

* bugfixes

* bugfix + regression

* more tests

* fmac

* no manual unrolling

* ordering

* LLVM backend is a lot faster

* compile inst

* more bugs

* f16

* bugfix

* fix regression

* one clang call

* 1M inst

* scratch works

* do scratch correctly

* cleanup

* regression

* cmp

* fmamk fixes

* merge

* fix vcmpx

* unify memory

* remove unused code

* ignore oob for test

* cleanups

* fix mbs

* unify cmp

* test

* minor cleanups

* bump timeout

* fix tests

* revert the CMPLE stuff

* remove opt

* less diff

* simpler

* revert

* support multiple backends

* memset is a lot faster

* split out in bench emu

* improve timing

* timing

* cache that

* cache that

* simpler and faster

* tokenize

* binop table

* simpler

* move to parser

* tok for lambda

* refactor

* expr_parser

* delete emu2_pcode

* import cleanup

* lil

* if parse

* work

* simpler

* no v

* trig preop is faster

* durations for tests

* fix cmp bug

* sdst

* remove scratch_size hack

* null behavior

* _MXCSRContext

* bugfixes

* DEBUG >= 3

* test smem crashes my gpu

* debug

* test

* test smem

* profiler

* full inst

* bugfix

* rtag(1)

* pc is 64-bit and word

* pc is real code now

* dynamic

* more dynamic

* fix oob access

* fix crash, more dyn

* all dyn

* really all dyn

* correct null mask

* lit + format

* 21s on the tests

* 13s on the tests

* canonical name

* simm16

* more dyn

* 14s

* proper saddr dedup

* dyn

* debug 5

* better 5

* revert dynamic stuff

* that can be dyn

* negative offsets

* dyn wmma

* f16 wmma support / ops / dtype / dtype_alu

* symbolic changes not needed

* ConstFloat

* more uop.const

* __eq__

* uop tests

* fix f16

* bf16 tensor cores

* whitespace

* remove cast roundtrip

* Revert "remove cast roundtrip"

This reverts commit c5bb0381c3.

* just the fix

* remove dead paths

* llvm runs
2026-01-26 18:04:29 +08:00
George Hotz
984cdc4840 add wrapper class for the -0.0 != 0.0 issue (#14339)
* add wrapper class for the -0.0 != 0.0 issue

* fixes

* spec fix

* missed one
2026-01-26 16:52:37 +08:00
George Hotz
cc49e47ea2 tinygrad changes from ucode (#14336)
* tinygrad changes from ucode

* dtype
2026-01-26 11:30:18 +08:00
nimlgen
21ab23ae18 nv: add pma for ada (#14328)
* nv: add pma for ada

* um

* fix

* shorter

* mock
2026-01-25 17:33:37 +03:00
qazal
bf2d9d138f viz: simplify amdgpu cfg (#14326)
* viz: replace llvm disasm with our disasm

* it starts with more code

* then it becomes less

* simpler, cdna disassembles with decimal simm16

* s_branch is upper case, add test

* simm16s and others
2026-01-25 15:21:45 +09:00
chenyu
cb69b7b2b2 comment out fold_where_closure (#14316) 2026-01-24 10:15:42 -05:00
wozeparrot
d74587f16d fa multi fix 2 (#14314) 2026-01-23 23:35:02 -08:00
Christopher Milan
e782d44918 WEBGPU/NIR truncates ints (#14307)
* WEBGPU truncates ints

* nir has this bug too
2026-01-23 19:28:06 -05:00
nimlgen
26220a472e no core_id (#14265)
* no core_id

* kwargs

* est

* linters

* ugh

* revert this

* deps

* glb

* should work?

* nn

* line

* fx

* ym

* z

* d

* um?

* revert

* this one?

* first half

* um p2

* all?

* um

* cleaner

* um
2026-01-23 21:30:12 +03:00