Christopher Milan
396e1320fb
bump cache version for z3 ( #14650 )
2026-02-09 19:32:07 -05:00
chenyu
9e3f24db9f
assign realize fix ( #14649 )
...
fix the need for explicit assign. track pending assigns for each buffer, and run those before the main realize in order
2026-02-09 17:46:46 -05:00
chenyu
0913c068ea
clean up setitem disk path ( #14648 )
2026-02-09 15:58:04 -05:00
chenyu
205a1212b7
delegate non Tensor src setitem to assign ( #14647 )
...
cannot do this for DISK in the unified path
2026-02-09 13:53:20 -05:00
chenyu
e9f40f49d4
explicitly check advanced setitem ( #14644 )
...
advanced setitem DISK would fail in rangeify with a bad error, now it's checked directly in setitem. eventually DISK can use regular setitem path
2026-02-09 13:36:46 -05:00
chenyu
20a132b1c4
relax atol for test_uop_scan_matmul ( #14646 )
...
flaky, also log max diff
2026-02-09 13:25:19 -05:00
qazal
50d3f6cea5
EVAL_BS=0 in llama profile ( #14643 )
2026-02-10 00:49:43 +09:00
chenyu
8a2c23d3dc
raise RuntimeError for setitem dtype mismatch ( #14642 )
2026-02-09 10:37:08 -05:00
qazal
80b0119cef
llama: add new asm gemm shape ( #14611 )
...
* llama: add new asm gemm shape
* work
* cleanup
* half dtype
* more comment
2026-02-10 00:34:29 +09:00
chenyu
a49e038c0c
dont manually broadcast in setitem ( #14641 )
...
handled by assign
2026-02-09 09:34:09 -05:00
chenyu
2c3e3559eb
remove a contiguous in basic setitem ( #14640 )
...
handled in rangeify
2026-02-09 09:19:46 -05:00
chenyu
6c0c8e2ac3
setitem push a realize to basic setitem ( #14637 )
...
advanced setitem does not need it
2026-02-09 08:54:07 -05:00
nimlgen
e087c58ae0
print tables in llama/profile.sh ( #14639 )
2026-02-09 12:32:54 +03:00
Christopher Milan
27f7ea478b
new style DSP renderer ( #14636 )
...
* new style DSP renderer
* cleanup
2026-02-09 00:39:03 -05:00
Christopher Milan
efac5b9ef6
new style NV/CUDA renderers, try 2 ( #14634 )
...
* new style NV/CUDA renderers, try 2
* fix diskcache
2026-02-08 22:58:48 -05:00
Christopher Milan
0ebb508b85
new style metal compiler ( #14632 )
2026-02-08 21:58:25 -05:00
Christopher Milan
9eef9f38ad
new style python renderer ( #14631 )
2026-02-08 21:45:07 -05:00
Christopher Milan
5f2f2cc956
Revert "new style NV/CUDA renderers ( #14627 )" ( #14633 )
...
This reverts commit 0e505951b0 .
2026-02-08 21:16:03 -05:00
Christopher Milan
4ad787ece2
new style CPULLVMRenderer ( #14629 )
2026-02-08 21:05:01 -05:00
Christopher Milan
0e505951b0
new style NV/CUDA renderers ( #14627 )
...
* new style NV/CUDA renderers
* fix pickle
* oops
* fix CUDA_CC=NVCC
* mockgpu uses PTXCompiler
* oops
* ruff
* dont discard stderr
* ugh
2026-02-08 21:04:51 -05:00
Filip Brzek
1667669c46
fix: python3 -m tinygrad.device reporting on AMD/CPU ( #14622 )
...
* test: device module expects PASS in -m tinygrad.device for CPU
* fix: use device._compiler_name instead of unwrap_class_type(compiler).__name__ in enumerate_devices_str
2026-02-08 20:22:35 +03:00
nimlgen
01a4ee4d66
do not hive_reset when amdgpu ( #14624 )
2026-02-08 19:14:13 +03:00
nimlgen
a615b9d781
am: f8_mode for gfx94x only ( #14620 )
2026-02-08 17:38:48 +03:00
chenyu
c28f7d0167
remove realize in Tensor.svd ( #14623 )
2026-02-08 09:36:31 -05:00
qazal
087dab4c3b
gemm/asm: split out cdna tests from CI ( #14619 )
...
* gemm/asm: split out cdna tests from CI
* reorder
* work
2026-02-08 21:33:42 +09:00
George Hotz
183d38b128
remove CUSTOM_KERNEL / directly construct it ( #14604 )
...
* remove CUSTOM_KERNEL / directly construct it
* clean that up
* simpler multi
* custom kernel spec
* remove Kernel
* fix multi
* use sharded shape
* explicit regression test
2026-02-08 18:43:33 +08:00
nimlgen
e29a88ca09
hive_reset respects lock ( #14618 )
2026-02-08 10:47:25 +03:00
qazal
b10802eb53
use existing VIZ ContextVar instead of getenv ( #14610 )
2026-02-08 15:37:55 +09:00
chenyu
510b65489e
style change rangeify assign [pr] ( #14616 )
...
consistent naming, also a standalone function to replace complicated lambda
2026-02-07 15:47:32 -05:00
chenyu
b7afd4471c
use arg instead of 3rd op for ASSIGN [pr] ( #14613 )
2026-02-07 14:17:10 -05:00
nimlgen
88c3022223
amd: kfd iface early exit ( #14612 )
...
* amd: kfd iface early exit
* l
* revert
2026-02-07 18:57:10 +03:00
nimlgen
ce7bfc6ce8
nv: use nv_flags for all fields ( #14607 )
2026-02-07 15:01:38 +03:00
qazal
c2544e2252
viz: remove outdated comment ( #14608 )
2026-02-07 20:05:24 +09:00
nimlgen
6838b35cff
mockgpu: hevc ( #14606 )
...
* mockgpu: hevc
* eng
2026-02-07 12:27:55 +03:00
chenyu
884592f6c8
pin z3-solver version ( #14605 )
...
found exact input that crashes z3 4.15.4
2026-02-06 22:49:31 -05:00
George Hotz
7a2a3b5c71
Remove Ops.KERNEL, it's all Ops.CALL now ( #14603 )
2026-02-07 10:21:54 +08:00
George Hotz
ca6604eae2
kernel is call ( #14577 )
...
* call is kernel
* closer
* fix bugs
* dedup
* pm_gate_kernel_sink
* better
* Revert "better"
This reverts commit b4c799b810 .
* Reapply "better"
This reverts commit e53f094ce7 .
* cleanups
* work
* remove junk
* subtle fix
* index
* viz cleanups
* disable assert for now
2026-02-07 10:10:14 +08:00
wozeparrot
d87ae1c84c
feat: tinyfs load test in benchmark ( #14602 )
2026-02-06 18:00:00 -08:00
ttomsa
462b455562
cleanup linearize ( #14523 )
2026-02-07 08:54:02 +08:00
ttomsa
d5652e4da2
new dtype aliases ( #14596 )
2026-02-07 08:53:35 +08:00
Christopher Milan
ad9e2f0de7
decompose bf16 ( #14601 )
2026-02-06 19:24:09 -05:00
Christopher Milan
7bb45e7df0
decompose fp8 to bigger floats [skip_process_replay] ( #14554 )
...
* decompose fp8 also
* it works
* cleanup
* no shift required
* default to float
* cleanup
* fixes
* fp8e5m2
* don't rely on behavior comparing nans
* cleanup
2026-02-06 19:05:40 -05:00
chenyu
81f6cdb4ab
delete realize_assign [pr] ( #14575 )
...
use realize and realize_srcs like COPY and STORE. src[0] always has BUFFER for base
2026-02-06 17:12:33 -05:00
chenyu
7d193a6e26
fix wgsl bitcast ( #14600 )
...
was wrong for signed int
2026-02-06 16:57:36 -05:00
chenyu
b9fe8b7591
fix opt in process replay [pr] ( #14599 )
2026-02-06 16:49:56 -05:00
chenyu
197ebcbbbc
log seed with flush=True in fuzz_symbolic ( #14597 )
...
* log seed with flush=True in fuzz_symbolic
i think z3 can crash. added reading seed from argv to see if we repro later
* fuzz_symbolic_symbolic_div
2026-02-06 15:03:57 -05:00
nimlgen
fbb67a3f95
am_smi: fix after regen ( #14594 )
2026-02-06 20:57:41 +03:00
qazal
a80fb4e641
viz: better ordering of device engines in profiler ( #14590 )
2026-02-06 23:08:09 +09:00
qazal
b7e3fbe07e
llama: add VIZ=-1 to dev_run ( #14583 )
...
* llama: add VIZ=-1 to dev_run
* readme
* cleaner
* add profile.sh script
* better grouping of options
* add other row
* readme edits
* work
2026-02-06 22:59:22 +09:00
nimlgen
fbeb978170
diff devices for sdma ( #14589 )
...
* start
* x
* fix
* sdma
* c
* clean
* x
* hm
* cleaner
2026-02-06 16:39:12 +03:00