George Hotz
1d0b114a7b
flip that
2025-10-06 19:25:05 +08:00
George Hotz
51301c3b22
no locals
2025-10-06 19:18:26 +08:00
George Hotz
17644fc304
gate pipeline
2025-10-06 19:02:21 +08:00
George Hotz
bf59379741
pipeline works on tc
2025-10-06 18:59:51 +08:00
George Hotz
97f122b591
tensor core works
2025-10-06 18:34:12 +08:00
George Hotz
afe31cc92a
it works
2025-10-06 18:34:12 +08:00
George Hotz
3444e414f6
Merge branch 'master' into add_local_buffer
2025-10-06 16:17:08 +08:00
George Hotz
0c015a24fe
use recursive_property to prevent RecursionError ( #12465 )
* use recursive_property to prevent RecursionError
* not slower
* fix tests
* faster
* simpler
2025-10-06 15:59:18 +08:00
George Hotz
39d8459ff2
comments
2025-10-06 14:13:31 +08:00
George Hotz
fdc0489e18
pipelining wip
2025-10-06 12:31:20 +08:00
qazal
1b1978b9c0
early copy fixup ( #12463 )
* simple failing test
* early copy fixup
2025-10-06 06:38:29 +03:00
chenyu
1823a5043f
don't check MAX_BUFFER_SIZE on NULL ( #12461 )
2025-10-05 22:09:29 -04:00
George Hotz
46e8ea15c1
split pm_substitute_recurse ( #12460 )
2025-10-05 21:35:50 -04:00
George Hotz
df1b379a36
Merge branch 'master' into add_local_buffer
2025-10-06 08:58:46 +08:00
nimlgen
1216fff781
remote: raise runtimeerror in checkz ( #12453 )
2025-10-05 21:22:53 +08:00
George Hotz
a976ace404
minor improvements to rewrite ( #12454 )
* minor improvements to rewrite
* need that continue
* faster
2025-10-05 18:09:32 +08:00
qazal
4b60121498
fix bmnist torch with RANGEIFY=1 ( #12442 )
* fix bmnist torch with RANGEIFY=1
* alt
* test and comment
* this was always wrong
* simple failing test for rangeify
* simple upat to match the old behavior
2025-10-05 12:34:27 +03:00
George Hotz
b5f31d7505
earlier seen children ( #12451 )
2025-10-05 15:55:13 +08:00
Sieds Lykles
e74be4a140
UOp.factor and add chain sorting ( #12413 )
* add ordering
* fix some tests
* fix more tests
* shorten comment
* update test
* add rule and test
* add rule and test
* remove check
* use fold_divmod_congruence instead of simplify
* adjust tests
* shorten line
* new algo
* add test
* add function to un-nest the div
* add UOp.factor
* test UOp.factor
* uop_given_valid tries to factor simplex expression
* shorten line
* symbolic_flat is back
* change that back
* fix those new tests
* new rule for ordering
* factor multiple factors
* no symbolic_flat
* symbolic_flat to there
* move that back
* fix imports
* merge correctly
* linter happy
* add rule
* add a test
* cleanup
* revert that for now
* UOp.factor returns self instead of None
* try all_candidates
* remove or_else
* post index symbolic
* add test
* make this closer to the original
* increase mac hlb_cifar min step time
* add some ordering tests
* cleanup
* increase pytest timeout time
* check dtype
2025-10-04 06:05:38 +02:00
Sieds Lykles
394dc24110
post index symbolic ( #12446 )
* post index symbolic
* add test
2025-10-03 23:23:03 +02:00
George Hotz
0b534f71c2
recursive substitute should be O(n) ( #12444 )
* recursive substitute
* even faster
* make that a single rewrite
2025-10-03 18:29:59 +08:00
chenyu
940a8d5ba9
default IGNORE_OOB=1 ( #12441 )
* default IGNORE_OOB=1
z3 can get very slow with RANGEIFY, also update some kernel numbers to what it is
* add to test
2025-10-03 04:16:19 -04:00
George Hotz
d290e77a5b
pend substitutes for speed ( #12440 )
2025-10-03 15:49:19 +08:00
nimlgen
23d310bcc1
ptx: handle i8/u8 casts correctly ( #12439 )
* ptx: handle casts correctly
* notsetp
2025-10-03 15:34:15 +08:00
George Hotz
c7849ac593
fix test lil model ( #12437 )
* fix test lil model
* 4 not 3
2025-10-03 02:28:37 -04:00
chenyu
0f82d92b9d
use float for softmax in llm.py ( #12438 )
fixed numerical issue in `CPU=1 RANGEIFY=1 python3 -m tinygrad.apps.llm`
2025-10-03 02:27:56 -04:00
George Hotz
b9f7a7e218
Merge branch 'master' into add_local_buffer
2025-10-03 13:06:14 +08:00
George Hotz
4c63f7e786
skip copies of reshaped buffers ( #12430 )
* skip copies of reshaped buffers
* always run NOOP
* comment
* comment
2025-10-03 13:05:47 +08:00
Sieds Lykles
0047bcc535
undo loaded comparison swap ( #12436 )
* add rule
* add a test
2025-10-03 06:57:29 +02:00
George Hotz
9273d7d404
adding a local buffer is very simple now
2025-10-03 11:17:57 +08:00
George Hotz
a734437da8
skip copies of reshaped buffers
2025-10-03 10:55:58 +08:00
George Hotz
9cd365c12e
little changes from double gemm ( #12429 )
* little changes from double gemm
* split pm_group_for_reduce
* pm_add_buffers_local
* Revert "pm_add_buffers_local"
This reverts commit 4d30a91db2.
2025-10-03 10:31:51 +08:00
chenyu
2d24af888b
REWRITE_STACK_LIMIT ( #12426 )
2025-10-02 21:51:04 -04:00
hooved
1b58ef0d60
Increase stack size limit in unified_rewrite ( #12424 )
* increase stack size limit
* rerun CI due to random tqdm test fail
2025-10-03 09:06:47 +08:00
qazal
17d36d0952
don't tag MSTACK/MSELECT on global buffers ( #12423 )
* don't tag MSTACK/MSELECT
* fix
2025-10-02 13:32:15 +03:00
qazal
f21851b099
ops: n^2 .device property fix ( #12419 )
* test case for a long rand chain
currently failing with RANGEIFY because device propagates too deep
* skip
* ops: n^2 .device property fix
* unskip
---------
Co-authored-by: Chen-Yu Yang <chenyu@fastmail.com>
2025-10-02 03:28:12 -04:00
b1tg
ec177c80c2
rangeify: fix test_where_fold (llvm) ( #12416 )
* rangeify: fix test_where_fold (AMD_LLVM)
* rm comment
2025-10-02 02:57:49 -04:00
qazal
13a25b2e67
rangeify: don't shape INDEX on kernelize ( #12417 )
2025-10-02 09:45:37 +03:00
George Hotz
0eee93f0c0
hotfix: disable split ranges for non rangeify
2025-10-02 13:15:24 +08:00
George Hotz
583553f467
split ranges ( #12411 )
* split ranges
* simpler
* split ranges
* range str
* fix test
* oops
* faster
* no group 2
* tests
* dont_sub_ranges_for_image
* revert that
2025-10-02 12:57:22 +08:00
qazal
6fc6b51b59
fix limit_bufs with kernelize ( #12415 )
2025-10-02 07:49:11 +03:00
qazal
d1c868f990
fix limit_bufs with multi ( #12414 )
2025-10-02 05:51:56 +03:00
George Hotz
3770dd9d80
annotate bufferize in viz
2025-10-02 09:20:50 +08:00
qazal
5b649616ff
rangeify: detect and assert cycles ( #12405 )
* rangeify: assert cycles
* rng=2
* any
2025-10-02 03:39:43 +03:00
Sieds Lykles
9a64fc0d28
Load alt value with cast try 2 ( #12407 )
* add or_casted
* add tests and fix old tests
* cast load
* move that to pm_render
* add allow_any_len to gated load patterns in renderers
* slice [:2]
2025-10-02 00:55:29 +02:00
Sieds Lykles
2f8ac77c25
add allow_any_len to gated load patterns in renderers ( #12406 )
2025-10-01 20:35:32 +02:00
George Hotz
89bed28716
split reduceop ( #12404 )
* some rangeify tests fixed
* bring split reduceop to rangeify
* fix tests
2025-10-01 18:45:16 +08:00
b1tg
ac3d457d5e
rangeify: TestReduceOpsConstFolding ( #12397 )
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-10-01 17:58:19 +08:00
George Hotz
60e52fbe36
support opts in contig, simpler ( #12400 )
2025-10-01 17:20:04 +08:00
chenyu
adc8c3b28f
Revert "load alt value with cast ( #12384 )" ( #12392 )
This reverts commit 05e91a248d.
2025-10-01 03:20:04 -04:00