10417 Commits

Author SHA1 Message Date
George Hotz
550cf2ca7f tests from postopt (#11964)
* tests from postopt

* reraise is fine
2025-09-02 13:34:17 -07:00
qazal
b977ec0813 viz: axes domains cleanup (#11962) 2025-09-02 19:30:45 +03:00
nimlgen
897254ad6c ci: add dev<->cpu copy speeds (#11959) 2025-09-02 15:22:44 +03:00
George Hotz
74040663bf make ptrdtype a UOp property (#11955) 2025-09-01 16:35:43 -07:00
George Hotz
0dfca4e74b add failing test for rangeify setitem (#11954) 2025-09-01 16:24:35 -07:00
wozeparrot
7c21271a5f feat: end_lr envvar (#11953) 2025-09-01 14:53:07 -07:00
chenyu
6a40216724 correct bf16 fuzz input in test_dtype_alu (#11933)
it was using float16 inputs, now it's uint16 then convert to bf16
2025-09-01 10:52:26 -04:00
chenyu
965ea59b16 test_dtype_alu use AMD_LLVM from helpers (#11950) 2025-09-01 10:03:17 -04:00
b1tg
a9f07c31bc fix amd llvm sqrt (#11936)
* fix amd llvm sqrt

* lint

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-09-01 09:31:14 -04:00
qazal
0a53e72f70 viz: fix trace duration in python test decoder (#11949) 2025-09-01 14:32:25 +03:00
qazal
27c9ed5a84 viz: more consistent naming of events (#11948)
* s/shapes/events in test_viz

* s/bufs/events in the memory packer
2025-09-01 14:16:47 +03:00
qazal
c7bb561ef9 remu: add v_rsq_f32_e32 instruction (#11947)
https://github.com/tinygrad/tinygrad/pull/11936 introduces a change to
the AMD LLVM renderer that outputs this instruction. Adding both 32 and
64 bit variants.
2025-09-01 11:29:31 +03:00
Sieds Lykles
d9560a631c remove cast between ints if safe (#11946) 2025-09-01 05:56:49 +02:00
Sieds Lykles
a19d689481 fix vec dtype _min_max (#11944) 2025-09-01 03:24:07 +02:00
Sieds Lykles
f32f3464d6 Can safe cast from certain ints to floats (#11941)
* add rule

* add some tests

* prevent infinite loop with bfloat16

* add some ints to double and float can_safe_cast

* add tests
2025-09-01 00:51:24 +02:00
Sieds Lykles
1c6e43c203 Double cast is one cast if intermediate cast is safe (#11939)
* add rule

* add some tests

* prevent infinite loop with bfloat16

* prevent more infinite rewrite
2025-09-01 00:36:29 +02:00
wozeparrot
7e68045fb2 feat: small llama3 training (#11829) 2025-08-31 13:41:47 -07:00
nimlgen
020abe0556 hcq: finalize without synchronization when in error state (#11872)
* hcq: finalize without synchronization when in error state

* ooops

* fix

* fix

* fix
2025-08-31 18:39:13 +03:00
qazal
2004c9757d tracing: add default clock (#11935) 2025-08-31 18:24:44 +03:00
b1tg
c1eeb3b99c only skip AMD_LLVM (#11934)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-31 18:15:47 +03:00
b1tg
75d380a77c fix transcendentals in python renderer (#11932)
* fix transcendentals in python renderer

* add test

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-31 09:37:17 -04:00
Sieds Lykles
61e4dc6ad5 render special arg in cstyle if arg is UOp (#11931) 2025-08-31 07:01:29 +02:00
Sieds Lykles
d3252ccd85 fix special vmax when arg is UOp (#11930) 2025-08-31 06:54:39 +02:00
qazal
0bacd9fc9b viz: give disassembly its own node (#11927) 2025-08-31 00:28:52 +03:00
chenyu
af89be317e relax rtol for bfloat16 test_dtype_alu (#11926) 2025-08-30 17:16:08 -04:00
George Hotz
632c2fb119 lowerer works on rangeified + print exception (#11925) 2025-08-30 12:05:44 -07:00
qazal
c27b99d68f viz: refactor to indexed rewrite traces (#11923) 2025-08-30 20:01:47 +03:00
qazal
9aff00a6ea switch viz command line args to pathlib (#11922) 2025-08-30 18:13:47 +03:00
qazal
c86ee5bfaf viz: canonicalize device name colors (#11921) 2025-08-30 18:12:30 +03:00
nimlgen
a4f05ebd1a ci: rebuild gpuocelot with boost libs (#11920) 2025-08-30 17:24:19 +03:00
qazal
bf0d055b39 viz: color by name (#11919) 2025-08-30 16:04:58 +03:00
Sieds Lykles
0bc34c000f simplify range mod its own upper bound (#11917)
* add rules

* add tests
2025-08-30 08:37:35 +02:00
chenyu
561318fea7 Tensor.cos in test_dtype_alu (#11916)
* Tensor.cos in test_dtype_alu

* need this fix anyway
2025-08-29 20:26:36 -04:00
NoahKusaba
0838021753 remove np from beautiful_cifar (#10988)
* remove np from beautiful_cifar

* remove np from cifar

* rename variable and rename tensor.arrange to just tensor.randperm

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-29 19:34:16 -04:00
nimlgen
cf9d8c8142 ci: pin boost for macos runners (#11910) 2025-08-30 01:38:06 +03:00
nimlgen
c6e342cdac mockgpu: no hang if gpuocelot failed (#11915) 2025-08-30 00:44:49 +03:00
chenyu
26d03a86a1 test_symbolic_ops.py cleanup (#11895) 2025-08-29 17:11:59 -04:00
b1tg
b2cc06218a python bfloat16 (#11912)
* python bf16

* _to_torch_storage_type

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-29 15:18:02 -04:00
George Hotz
afad7d0cd1 remove dtype from range, it will be dtypes.index soon [pr] (#11914)
* remove dtype from range, it will be dtypes.index soon [pr]

* a few more
2025-08-29 09:52:07 -07:00
qazal
30e72d5820 multi device and copy tracing for NULL device (#11913)
* add device name to NULL programs

* trace transfers
2025-08-29 15:31:00 +03:00
qazal
d8e1e4dc61 tracing: show NULL programs (#11911) 2025-08-29 14:09:33 +03:00
nimlgen
75678b2cbe amd: retire pm4 xcc sync (#11835)
* amd: aql default when several xccs

* amd: retire pm4 xcc sync

* remove more

* more

* more
2025-08-29 09:56:27 +03:00
George Hotz
394c2d1db1 update Kernel API in tests + move optimize_local_size (#11907) 2025-08-28 15:12:47 -07:00
nimlgen
fa695ac1ce ci: mac gpuocelot (#11906)
* gm

* fix?

* ops

* imp

* xx

* add file
2025-08-28 23:29:43 +03:00
George Hotz
b9b438c516 small updates from postopt (#11903)
* tests from postopt

* modernize

* skip lin tests

* that's fixed?

* skip, not failure
2025-08-28 12:34:52 -07:00
nimlgen
bb55a3001f nv: flush reset message (#11897) 2025-08-28 22:17:20 +03:00
nimlgen
e8289c75b1 ci: do not reinstall existing pkgs in macos (#11900) 2025-08-28 21:20:15 +03:00
chenyu
134cf56904 update cache name for gpuocelot (#11896) 2025-08-28 13:11:10 -04:00
Ben Waldron
ea1be2e4cd [bounty] Remove using reshape to register symbolic shape (#11771)
* Modify tests and start work towards removing symbolic reshape

* Refactor symbolic reshape

* fix small error

* much cleaner + fix more tests

* Can remove this now

* Update test_symbolic_ops and test_tiny

* Couple more tests

* Unused import

* More tests and add EXPAND to Tensor.empty

* Fix test beam search

* all int

* Fix rangeify by adding shrink

* Remove OOB check and so fix test_symbolic_jit

* test_symbolic_jit doesn't need OOB Context anymore either

* Should remove that test now

* Cleanups part 1

* fix linters

* Final cleanups

* Don't reassign inside for loop

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-28 12:30:49 -04:00
qazal
53853ae49b viz: switch to Path2D (#11892) 2025-08-28 18:58:16 +03:00