Commit Graph

7098 Commits

Author SHA1 Message Date
George Hotz
1d0b114a7b flip that 2025-10-06 19:25:05 +08:00
George Hotz
51301c3b22 no locals 2025-10-06 19:18:26 +08:00
George Hotz
17644fc304 gate pipeline 2025-10-06 19:02:21 +08:00
George Hotz
bf59379741 pipeline works on tc 2025-10-06 18:59:51 +08:00
George Hotz
97f122b591 tensor core works 2025-10-06 18:34:12 +08:00
George Hotz
afe31cc92a it works 2025-10-06 18:34:12 +08:00
George Hotz
3444e414f6 Merge branch 'master' into add_local_buffer 2025-10-06 16:17:08 +08:00
George Hotz
0c015a24fe use recursive_property to prevent RecursionError (#12465)
* use recursive_property to prevent RecursionError

* not slower

* fix tests

* faster

* simpler
2025-10-06 15:59:18 +08:00
George Hotz
39d8459ff2 comments 2025-10-06 14:13:31 +08:00
George Hotz
fdc0489e18 pipelining wip 2025-10-06 12:31:20 +08:00
qazal
1b1978b9c0 early copy fixup (#12463)
* simple failing test

* early copy fixup
2025-10-06 06:38:29 +03:00
chenyu
1823a5043f don't check MAX_BUFFER_SIZE on NULL (#12461) 2025-10-05 22:09:29 -04:00
George Hotz
46e8ea15c1 split pm_substitute_recurse (#12460) 2025-10-05 21:35:50 -04:00
George Hotz
df1b379a36 Merge branch 'master' into add_local_buffer 2025-10-06 08:58:46 +08:00
nimlgen
1216fff781 remote: raise runtimeerror in checkz (#12453) 2025-10-05 21:22:53 +08:00
George Hotz
a976ace404 minor improvements to rewrite (#12454)
* minor improvements to rewrite

* need that continue

* faster
2025-10-05 18:09:32 +08:00
qazal
4b60121498 fix bmnist torch with RANGEIFY=1 (#12442)
* fix bmnist torch with RANGEIFY=1

* alt

* test and comment

* this was always wrong

* simple failing test for rangeify

* simple upat to match the old behavior
2025-10-05 12:34:27 +03:00
George Hotz
b5f31d7505 earlier seen children (#12451) 2025-10-05 15:55:13 +08:00
Sieds Lykles
e74be4a140 UOp.factor and add chain sorting (#12413)
* add ordering

* fix some tests

* fix more tests

* shorten comment

* update test

* add rule and test

* add rule and test

* remove check

* use fold_divmod_congruence instead of simplify

* adjust tests

* shorten line

* new algo

* add test

* add function to un-nest the div

* add UOp.factor

* test UOp.factor

* uop_given_valid tries to factor simplex expression

* shorten line

* symbolic_flat is back

* change that back

* fix those new tests

* new rule for ordering

* factor multiple factors

* no symbolic_flat

* symbolic_flat to there

* move that back

* fix imports

* merge correctly

* linter happy

* add rule

* add a test

* cleanup

* revert that for now

* UOp.factor returns self instead of None

* try all_candidates

* remove or_else

* post index symbolic

* add test

* make this closer to the original

* increase mac hlb_cifar min step time

* add some ordering tests

* cleanup

* increase pytest timeout time

* check dtype
2025-10-04 06:05:38 +02:00
Sieds Lykles
394dc24110 post index symbolic (#12446)
* post index symbolic

* add test
2025-10-03 23:23:03 +02:00
George Hotz
0b534f71c2 recursive substitute should be O(n) (#12444)
* recursive substitute

* even faster

* make that a single rewrite
2025-10-03 18:29:59 +08:00
chenyu
940a8d5ba9 default IGNORE_OOB=1 (#12441)
* default IGNORE_OOB=1

z3 can get very slow with RANGEIFY, also update some kernel numbers to what it is

* add to test
2025-10-03 04:16:19 -04:00
George Hotz
d290e77a5b pend substitutes for speed (#12440) 2025-10-03 15:49:19 +08:00
nimlgen
23d310bcc1 ptx: handle i8/u8 casts correctly (#12439)
* ptx: handle casts correctly

* notsetp
2025-10-03 15:34:15 +08:00
George Hotz
c7849ac593 fix test lil model (#12437)
* fix test lil model

* 4 not 3
2025-10-03 02:28:37 -04:00
chenyu
0f82d92b9d use float for softmax in llm.py (#12438)
fixed numerical issue in `CPU=1 RANGEIFY=1 python3 -m tinygrad.apps.llm`
2025-10-03 02:27:56 -04:00
George Hotz
b9f7a7e218 Merge branch 'master' into add_local_buffer 2025-10-03 13:06:14 +08:00
George Hotz
4c63f7e786 skip copies of reshaped buffers (#12430)
* skip copies of reshaped buffers

* always run NOOP

* comment

* comment
2025-10-03 13:05:47 +08:00
Sieds Lykles
0047bcc535 undo loaded comparison swap (#12436)
* add rule

* add a test
2025-10-03 06:57:29 +02:00
George Hotz
9273d7d404 adding a local buffer is very simple now 2025-10-03 11:17:57 +08:00
George Hotz
a734437da8 skip copies of reshaped buffers 2025-10-03 10:55:58 +08:00
George Hotz
9cd365c12e little changes from double gemm (#12429)
* little changes from double gemm

* split pm_group_for_reduce

* pm_add_buffers_local

* Revert "pm_add_buffers_local"

This reverts commit 4d30a91db2.
2025-10-03 10:31:51 +08:00
chenyu
2d24af888b REWRITE_STACK_LIMIT (#12426) 2025-10-02 21:51:04 -04:00
hooved
1b58ef0d60 Increase stack size limit in unified_rewrite (#12424)
* increase stack size limit

* rerun CI due to random tqdm test fail
2025-10-03 09:06:47 +08:00
qazal
17d36d0952 don't tag MSTACK/MSELECT on global buffers (#12423)
* don't tag MSTACK/MSELECT

* fix
2025-10-02 13:32:15 +03:00
qazal
f21851b099 ops: n^2 .device property fix (#12419)
* test case for a long rand chain

currently failing with RANGEIFY because device propagates too deep

* skip

* ops: n^2 .device property fix

* unskip

---------

Co-authored-by: Chen-Yu Yang <chenyu@fastmail.com>
2025-10-02 03:28:12 -04:00
b1tg
ec177c80c2 rangeify: fix test_where_fold (llvm) (#12416)
* rangeify: fix test_where_fold (AMD_LLVM)

* rm comment
2025-10-02 02:57:49 -04:00
qazal
13a25b2e67 rangeify: don't shape INDEX on kernelize (#12417) 2025-10-02 09:45:37 +03:00
George Hotz
0eee93f0c0 hotfix: disable split ranges for non rangeify 2025-10-02 13:15:24 +08:00
George Hotz
583553f467 split ranges (#12411)
* split ranges

* simpler

* split ranges

* range str

* fix test

* oops

* faster

* no group 2

* tests

* dont_sub_ranges_for_image

* revert that
2025-10-02 12:57:22 +08:00
qazal
6fc6b51b59 fix limit_bufs with kernelize (#12415) 2025-10-02 07:49:11 +03:00
qazal
d1c868f990 fix limit_bufs with multi (#12414) 2025-10-02 05:51:56 +03:00
George Hotz
3770dd9d80 annotate bufferize in viz 2025-10-02 09:20:50 +08:00
qazal
5b649616ff rangeify: detect and assert cycles (#12405)
* rangeify: assert cycles

* rng=2

* any
2025-10-02 03:39:43 +03:00
Sieds Lykles
9a64fc0d28 Load alt value with cast try 2 (#12407)
* add or_casted

* add tests and fix old tests

* cast load

* move that to pm_render

* add allow_any_len to gated load patterns in renderers

* slice [:2]
2025-10-02 00:55:29 +02:00
Sieds Lykles
2f8ac77c25 add allow_any_len to gated load patterns in renderers (#12406) 2025-10-01 20:35:32 +02:00
George Hotz
89bed28716 split reduceop (#12404)
* some rangeify tests fixed

* bring split reduceop to rangeify

* fix tests
2025-10-01 18:45:16 +08:00
b1tg
ac3d457d5e rangeify: TestReduceOpsConstFolding (#12397)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-10-01 17:58:19 +08:00
George Hotz
60e52fbe36 support opts in contig, simpler (#12400) 2025-10-01 17:20:04 +08:00
chenyu
adc8c3b28f Revert "load alt value with cast (#12384)" (#12392)
This reverts commit 05e91a248d.
2025-10-01 03:20:04 -04:00