chenyu
a0cbbc35ad
remove LLAMA_LAYERS in ci ( #12562 )
2025-10-09 04:46:41 -04:00
nimlgen
658c566e22
vars in gated_read_image_count ( #12486 )
...
* vars in gated_read_image_count
* nc
2025-10-09 14:54:15 +08:00
chenyu
942022c309
smaller LLAMA_LAYER in Test llama 3 training ( #12516 )
...
very slow now
2025-10-08 05:10:51 -04:00
chenyu
e701106a64
remove FUSE_ARANGE ( #12511 )
...
it was the default already
2025-10-08 04:54:07 -04:00
chenyu
da1f46ff3f
remove RANGEIFY specific test jobs ( #12507 )
2025-10-08 04:12:04 -04:00
George Hotz
403fdfcfd4
check spec in test, cleanup vectorize render ( #12484 )
2025-10-07 17:05:50 +08:00
chenyu
8ad5f9e74f
skip slow benchmarks ( #12481 )
...
* skip slow benchmarks
padded tc is already slow, rest are slow with rangeify (correct if run locally)
* relax more
2025-10-07 03:28:56 -04:00
chenyu
1823a5043f
don't check MAX_BUFFER_SIZE on NULL ( #12461 )
2025-10-05 22:09:29 -04:00
chenyu
74b04f7dca
test beautiful_mnist_multigpu ( #12455 )
...
* test beautiful_mnist_multigpu
another example that fails with RANGEIFY
* now i remember
* MAX_BUFFER_SIZE=0
2025-10-05 08:45:01 -04:00
chenyu
98163832e4
update RANGEIFY test_cast_padded ( #12421 )
...
* update RANGEIFY test_cast_padded
* update test
2025-10-02 04:37:35 -04:00
chenyu
37beef6de3
add null bert training test in ci ( #12420 )
...
fails with RANGEIFY `RuntimeError: children not making progress`
2025-10-02 04:05:19 -04:00
b1tg
ec177c80c2
rangeify: fix test_where_fold (llvm) ( #12416 )
...
* rangeify: fix test_where_fold (AMD_LLVM)
* rm comment
2025-10-02 02:57:49 -04:00
qazal
d1c868f990
fix limit_bufs with multi ( #12414 )
2025-10-02 05:51:56 +03:00
qazal
5b649616ff
rangeify: detect and assert cycles ( #12405 )
...
* rangeify: assert cycles
* rng=2
* any
2025-10-02 03:39:43 +03:00
b1tg
ac3d457d5e
rangeify: TestReduceOpsConstFolding ( #12397 )
...
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-10-01 17:58:19 +08:00
chenyu
6c95b1f39d
explicitly set device for CI unit test ( #12399 )
2025-10-01 05:16:54 -04:00
chenyu
689ab9151b
more RANGEIFY tests ( #12393 )
...
would have caught the load alt regression without adding too many tests
2025-10-01 03:43:58 -04:00
b1tg
154d114364
rangeify: fix abstractions2.py ( #12386 )
...
* rangeify: fix abstractions2.py
* tests
* lint
* only abstractions2
* base
2025-10-01 09:58:56 +03:00
b1tg
da52006bde
rangeify: fix test_scatter_reduce ( #12380 )
...
* rangeify: fix test_scatter_reduce
* ext_vector_type
* set alignment=1 on boolean
2025-09-30 23:26:36 -04:00
chenyu
8def8145e4
ALLOWED_KERNEL_COUNT openpilot 0.9.4 with RANGEIFY ( #12381 )
2025-09-30 22:58:59 -04:00
qazal
26247573e1
rangeify multi tests on gpu ( #12376 )
...
* rangeify multi tests on gpu
* fix limit_bufs
2025-10-01 04:53:04 +03:00
chenyu
b4a4817c9c
fix rangeify test_linalg ( #12365 )
2025-09-30 06:28:35 -04:00
b1tg
c9ef5d8fe5
rangeify: fix test_tensor_index_overflow (CPU_LLVM=1) ( #12362 )
...
* rangeify: fix test_tensor_index_overflow (CPU_LLVM=1)
* add test
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-09-30 05:55:15 -04:00
qazal
6a56d3c859
rangeify: only test correctness in multi ( #12339 )
...
* work
* more work
* back here
* skip tests
* work
2025-09-30 09:55:59 +03:00
George Hotz
ab6b0d3a21
enable cleanup_dead_axes ( #12351 )
...
* enable cleanup_dead_axes
* don't mess with user contig
* correct tag behavior
* double reshape isn't correct
* block on assign too
* skip messing with symbolic
* Fix tests
* disable RANGEIFY=2
* test w rangeify
2025-09-30 14:09:39 +08:00
qazal
2a7310ab59
rangeify: fix remaining multi correctness issue ( #12354 )
2025-09-30 08:08:27 +03:00
chenyu
881709cd33
don't skip rangeify test_instancenorm_3d ( #12350 )
...
seems fine now
2025-09-30 00:05:59 -04:00
hooved
39aae679e4
Support bfloat16 on NULL backend ( #12340 )
...
* add failing test
* move test
* only run test with NULL default
* add skip reason
* add fix
2025-09-30 00:02:30 -04:00
chenyu
af935e7d32
Revert "reduce const folding ( #12344 )" ( #12349 )
...
This reverts commit 8e508a9927.
2025-09-29 23:45:30 -04:00
qazal
05275c9ec3
rangeify: enable assign to mstack target ( #12345 )
2025-09-30 06:27:57 +03:00
chenyu
8e508a9927
reduce const folding ( #12344 )
2025-09-29 23:08:56 -04:00
qazal
32d69d07d7
rangeify: enable multitensor TestBatchNorm ( #12342 )
2025-09-30 06:05:00 +03:00
Sieds Lykles
c38f6ce140
unified_rewrite: use deque and don't add nodes to the stack multiple times ( #12320 )
...
* use deque instead of list
* increase ctx.progress and max stack_len
* add openpilot
* prevent placing uops on stack many times
* revert increasing ctx.progress and stack length limit
* don't block adding to the stack there
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-09-30 10:02:28 +08:00
hooved
c2689c505e
Clip model updates for Stable Diffusion mlperf training ( #12313 )
...
* stable diffusion mlperf clip changes
* add clip tests
* set gelu as attribute
* add more tests
* factor out GPUS
* rerun CI
* add imports to if blocks
* remove unneeded axis
* add clip tests to CI
* move clip tests
* add deps, disable max buf size
2025-09-29 21:50:14 -04:00
qazal
250cb10e8f
rangeify permuted assign ( #12299 )
...
* enable RANGEIFY=1 test_assign
* work
* rangeify=0 asserts this ast
* remove that
* beta test, it's correct though
* skip multi
* matches torch/np output
* memcopy without memcopy
* can remove this
* rangeify isn't silently wrong anymore
* diff cleanup
* use UOp toposort instead of global tags
* actual assert TestRangeifyAssign
* step
* work
* this isn't optimizing away now
* some todos
* test fusion schedule
* typo
* dedup idxs
* cleaner
* pre
* work
* diff
2025-09-29 07:27:57 +03:00
Sieds Lykles
ed90de6583
Revert "Bufferize early, fix "children not making progress" on big graphs (#1…" ( #12318 )
...
This reverts commit 6f1cf717de.
2025-09-28 19:10:21 +02:00
Sieds Lykles
6f1cf717de
Bufferize early, fix "children not making progress" on big graphs ( #12308 )
...
* bufferize children early
* cleaner
* fix types
* lower number of reduceops
* test openpilot
2025-09-27 04:17:15 +02:00
qazal
8b2e0930d7
rangeify: enable passing multi test ( #12301 )
2025-09-26 08:31:13 +03:00
Sieds Lykles
74411984fc
Rangeify IMAGE ( #12304 )
...
* add imagedtype to rangeify
* enable some image tests
* move the tests
* image upcast before locals
* add if statement
* rangeify image_dtype test
* decrease read_image count
2025-09-26 07:21:02 +02:00
chenyu
17cec8d645
RANGEIFY winograd test ( #12297 )
...
speed seems fine
2025-09-24 23:42:32 -04:00
qazal
38ecefaacb
RANGEIFY=1 allreduce ( #12260 )
...
* ci
* extract mops
* work
* assert early
* port this?
* can realize shard
* allreduce passing
* notes
* better handling of shard
* err
* outerworld allreduce twice
* work
* don't tag movement ops
* don't tag movement ops
* delete old logic
* 19 failing + ram
* cleanup
* reset stuff
* simplest failing test
* diff
* test_ones
* allreduce work
* allreduce more work
* down to 22 failing tests
* port _device_num
* replace creates a new UOp here
* pour symbolic everywhere
* 7 failing
* focus on allreduce
* work
* cleanup
* more ci
* fix test_schedule_ring
* post index const shape
* much better
* diff cleanup
2025-09-24 18:13:08 +03:00
qazal
1400ce105f
rangeify: fix sharding ( #12288 )
2025-09-24 14:33:56 +03:00
qazal
154c865966
rangeify: fix ram usage in multi ( #12286 )
2025-09-24 13:48:58 +03:00
qazal
ad7c8c21ea
rangeify: INDEX doesn't passthrough MSELECT ( #12279 )
2025-09-23 21:36:50 +03:00
nimlgen
02a7b7fe48
rangeify: fix test_setitem ( #12269 )
...
* rangeify: fix test_setitem
* um?
* better?
* simple where folding
* f
* revert
* x
2025-09-23 20:42:36 +03:00
qazal
2f145a98e0
rangeify: fix contiguous multi ( #12278 )
...
* rangeify: fix contiguous multi
* when it's changing root, it should construct a new UOp
2025-09-23 20:05:29 +03:00
nimlgen
5f4eeb054c
rangeify: passes now ( #12277 )
2025-09-23 18:46:49 +03:00
chenyu
51b88b2265
process replay tests in rangeify ( #12274 )
2025-09-23 01:30:06 -04:00
chenyu
b03ceb806e
move test_sample to test_randomness ( #12266 )
2025-09-21 21:11:32 -04:00
nimlgen
b53a266254
rangeify: fix test_optim ( #12262 )
...
* rangeify: fix test_optim
* add to cl?
* these are good now
2025-09-21 18:08:35 +03:00