12148 Commits

Author SHA1 Message Date
George Hotz
befc1e800c assembly/amd: disasm is test only (#14694)
* assembly/amd: disasm is test only

* viz uses str
2026-02-12 12:33:46 +08:00
George Hotz
c331798201 move tests to test/backend (#14691)
* move tests to test/backend

* fix imports

* fix CI

* revert that one

* Fix formatting in README for test command
2026-02-12 11:09:44 +08:00
wozeparrot
4b5d3bda1f llama3: data seed (#14681) 2026-02-11 19:04:40 -08:00
chenyu
0c63f63ee4 recursive resolve assign dependency (#14688)
remove the .realize in llm.py
2026-02-11 17:41:05 -05:00
nimlgen
869083e373 nv: pciiface pma (#14686)
* x

* w

* z

* clean

* o

* r

* x

* c

* r

* list

* deanon

* b
2026-02-11 23:29:07 +03:00
chenyu
cbbc2fdea5 update test_assign_slice_then_read (#14687)
passes locally now
2026-02-11 15:02:44 -05:00
chenyu
7465b22ba0 handle setitem target in rangeify (#14685) 2026-02-11 11:38:59 -05:00
chenyu
0d215b962e few setitem test cases diff from numpy (#14684)
had claude fuzz the frontend; it found some real bugs
2026-02-11 08:41:03 -05:00
nimlgen
df8b21eeb5 add real self assign test (#14683)
* self assign fix

* no
2026-02-11 12:41:53 +03:00
wozeparrot
a60220bed9 llama3: move dl to numpy & jit more (#14677)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-02-10 18:16:40 -08:00
George Hotz
4565958792 some lil speedups (#14679) 2026-02-11 10:01:58 +08:00
George Hotz
2d4ad9e739 add a waitlist for graph rewrite (#14678)
* add a waitlist for graph rewrite

* cleaner

* one context on spec check
2026-02-11 09:30:13 +08:00
Christopher Milan
389e2eeda1 Revert "transcendental works with long decomp" (#14676) 2026-02-10 19:46:34 -05:00
Christopher Milan
0662c8037d transcendental works with long decomp (#14672) 2026-02-10 19:30:24 -05:00
George Hotz
3fab43c57c add cache to asm gemm (#14675) 2026-02-11 08:26:30 +08:00
chenyu
ebef63dba0 update test_self_assign_same_device_copy (#14673)
that test would have passed without the optimization because of the .to shortcut
2026-02-10 17:23:43 -05:00
nimlgen
aafa9dcb5b eliminate same-device copy self-assigns (#14671)
* eliminate same-device copy self-assigns

* ugh
2026-02-10 22:54:51 +03:00
chenyu
494eec2694 test_setitem_const_fused (#14668)
did not realize #14640 also fixed #10690, so added a test for it
2026-02-10 08:33:02 -05:00
nimlgen
42ded7c34d amd: bind aql (#14666)
* amd: bind to aql

* bind

* x

* f
2026-02-10 16:28:11 +03:00
George Hotz
82974929b7 use PARAM in schedule (#14665)
* use PARAM in schedule

* create_new_buffer
2026-02-10 19:18:40 +08:00
George Hotz
8dc46dde07 everything has dtype.long now (#14661)
* everything has dtype.long now

* int64/uint64 are everywhere now

* that doesn't work
2026-02-10 15:08:50 +08:00
Christopher Milan
cdb78954cb better cl compiler name (#14660)
cl_compiler instead of compiler because overriding Compiled.compiler seems more confusing
2026-02-10 01:03:46 -05:00
George Hotz
cc9bf8ccbc move more to null/unit tests (#14658)
* move more to null tests

* move test_gc

* no test fusion op
2026-02-10 13:35:17 +08:00
chenyu
83f6d28579 two less realize in setitem (#14655) 2026-02-09 23:45:24 -05:00
wozeparrot
69574542ab fix: use correct fa implementation in eval (#14651) 2026-02-09 18:20:44 -08:00
chenyu
0dedf4063c minor test_setitem cleanup (#14654) 2026-02-09 20:40:29 -05:00
Christopher Milan
b36b62eb59 don't push docker cache for PRs (#14652) 2026-02-09 19:55:55 -05:00
Christopher Milan
e6562a5061 remove CompilerPair (#14638) 2026-02-09 19:51:18 -05:00
Christopher Milan
396e1320fb bump cache version for z3 (#14650) 2026-02-09 19:32:07 -05:00
chenyu
9e3f24db9f assign realize fix (#14649)
fix the need for explicit assign. track pending assigns for each buffer, and run those before the main realize in order
2026-02-09 17:46:46 -05:00
chenyu
0913c068ea clean up setitem disk path (#14648) 2026-02-09 15:58:04 -05:00
chenyu
205a1212b7 delegate non Tensor src setitem to assign (#14647)
cannot do this for DISK in the unified path
2026-02-09 13:53:20 -05:00
chenyu
e9f40f49d4 explicitly check advanced setitem (#14644)
advanced setitem on DISK would fail in rangeify with a bad error; now it's checked directly in setitem. eventually DISK can use the regular setitem path
2026-02-09 13:36:46 -05:00
chenyu
20a132b1c4 relax atol for test_uop_scan_matmul (#14646)
flaky, also log max diff
2026-02-09 13:25:19 -05:00
qazal
50d3f6cea5 EVAL_BS=0 in llama profile (#14643) 2026-02-10 00:49:43 +09:00
chenyu
8a2c23d3dc raise RuntimeError for setitem dtype mismatch (#14642) 2026-02-09 10:37:08 -05:00
qazal
80b0119cef llama: add new asm gemm shape (#14611)
* llama: add new asm gemm shape

* work

* cleanup

* half dtype

* more comment
2026-02-10 00:34:29 +09:00
chenyu
a49e038c0c dont manually broadcast in setitem (#14641)
handled by assign
2026-02-09 09:34:09 -05:00
chenyu
2c3e3559eb remove a contiguous in basic setitem (#14640)
handled in rangeify
2026-02-09 09:19:46 -05:00
chenyu
6c0c8e2ac3 setitem push a realize to basic setitem (#14637)
advanced setitem does not need it
2026-02-09 08:54:07 -05:00
nimlgen
e087c58ae0 print tables in llama/profile.sh (#14639) 2026-02-09 12:32:54 +03:00
Christopher Milan
27f7ea478b new style DSP renderer (#14636)
* new style DSP renderer

* cleanup
2026-02-09 00:39:03 -05:00
Christopher Milan
efac5b9ef6 new style NV/CUDA renderers, try 2 (#14634)
* new style NV/CUDA renderers, try 2

* fix diskcache
2026-02-08 22:58:48 -05:00
Christopher Milan
0ebb508b85 new style metal compiler (#14632) 2026-02-08 21:58:25 -05:00
Christopher Milan
9eef9f38ad new style python renderer (#14631) 2026-02-08 21:45:07 -05:00
Christopher Milan
5f2f2cc956 Revert "new style NV/CUDA renderers (#14627)" (#14633)
This reverts commit 0e505951b0.
2026-02-08 21:16:03 -05:00
Christopher Milan
4ad787ece2 new style CPULLVMRenderer (#14629) 2026-02-08 21:05:01 -05:00
Christopher Milan
0e505951b0 new style NV/CUDA renderers (#14627)
* new style NV/CUDA renderers

* fix pickle

* oops

* fix CUDA_CC=NVCC

* mockgpu uses PTXCompiler

* oops

* ruff

* dont discard stderr

* ugh
2026-02-08 21:04:51 -05:00
Filip Brzek
1667669c46 fix: python3 -m tinygrad.device reporting on AMD/CPU (#14622)
* test: device module expects PASS in -m tinygrad.device for CPU

* fix: use device._compiler_name instead of unwrap_class_type(compiler).__name__ in enumerate_devices_str
2026-02-08 20:22:35 +03:00
nimlgen
01a4ee4d66 do not hive_reset when amdgpu (#14624) 2026-02-08 19:14:13 +03:00