1147 Commits

Author SHA1 Message Date
b1tg
ac3d457d5e rangeify: TestReduceOpsConstFolding (#12397)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-10-01 17:58:19 +08:00
chenyu
6c95b1f39d explicitly set device for CI unit test (#12399)
2025-10-01 05:16:54 -04:00
chenyu
689ab9151b more RANGEIFY tests (#12393)
would have caught the load alt regression without adding too many tests
2025-10-01 03:43:58 -04:00
b1tg
154d114364 rangeify: fix abstractions2.py (#12386)
* rangeify: fix abstractions2.py

* tests

* lint

* only abstractions2

* base
2025-10-01 09:58:56 +03:00
b1tg
da52006bde rangeify: fix test_scatter_reduce (#12380)
* rangeify: fix test_scatter_reduce

* ext_vector_type

* set alignment=1 on boolean
2025-09-30 23:26:36 -04:00
chenyu
8def8145e4 ALLOWED_KERNEL_COUNT openpilot 0.9.4 with RANGEIFY (#12381)
2025-09-30 22:58:59 -04:00
qazal
26247573e1 rangeify multi tests on gpu (#12376)
* rangeify multi tests on gpu

* fix limit_bufs
2025-10-01 04:53:04 +03:00
chenyu
b4a4817c9c fix rangeify test_linalg (#12365)
2025-09-30 06:28:35 -04:00
b1tg
c9ef5d8fe5 rangeify: fix test_tensor_index_overflow (CPU_LLVM=1) (#12362)
* rangeify: fix test_tensor_index_overflow (CPU_LLVM=1)

* add test

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-09-30 05:55:15 -04:00
qazal
6a56d3c859 rangeify: only test correctness in multi (#12339)
* work

* more work

* back here

* skip tests

* work
2025-09-30 09:55:59 +03:00
George Hotz
ab6b0d3a21 enable cleanup_dead_axes (#12351)
* enable cleanup_dead_axes

* don't mess with user contig

* correct tag behavior

* double reshape isn't correct

* block on assign too

* skip messing with symbolic

* Fix tests

* disable RANGEIFY=2

* test w rangeify
2025-09-30 14:09:39 +08:00
qazal
2a7310ab59 rangeify: fix remaining multi correctness issue (#12354)
2025-09-30 08:08:27 +03:00
chenyu
881709cd33 don't skip rangeify test_instancenorm_3d (#12350)
seems fine now
2025-09-30 00:05:59 -04:00
hooved
39aae679e4 Support bfloat16 on NULL backend (#12340)
* add failing test

* move test

* only run test with NULL default

* add skip reason

* add fix
2025-09-30 00:02:30 -04:00
chenyu
af935e7d32 Revert "reduce const folding (#12344)" (#12349)
This reverts commit 8e508a9927.
2025-09-29 23:45:30 -04:00
qazal
05275c9ec3 rangeify: enable assign to mstack target (#12345)
2025-09-30 06:27:57 +03:00
chenyu
8e508a9927 reduce const folding (#12344)
2025-09-29 23:08:56 -04:00
qazal
32d69d07d7 rangeify: enable multitensor TestBatchNorm (#12342)
2025-09-30 06:05:00 +03:00
Sieds Lykles
c38f6ce140 unified_rewrite: use deque and dont add nodes to the stack multiple times (#12320)
* use deque instead of list

* increase ctx.progress and max stack_len

* add openpilot

* prevent placing uops on stack many times

* revert increasing ctx.progress and stack length limit

* dont block adding to the stack there

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-09-30 10:02:28 +08:00
hooved
c2689c505e Clip model updates for Stable Diffusion mlperf training (#12313)
* stable diffusion mlperf clip changes

* add clip tests

* set gelu as attribute

* add more tests

* factor out GPUS

* rerun CI

* add imports to if blocks

* remove unneeded axis

* add clip tests to CI

* move clip tests

* add deps, disable max buf size
2025-09-29 21:50:14 -04:00
qazal
250cb10e8f rangeify permuted assign (#12299)
* enable RANGEIFY=1 test_assign

* work

* rangeify=0 asserts this ast

* remove that

* beta test, it's correct though

* skip multi

* matches torch/np output

* memcopy without memcopy

* can remove this

* rangeify isn't silently wrong anymore

* diff cleanup

* use UOp toposort instead of global tags

* actual assert TestRangeifyAssign

* step

* work

* this isn't optimizing away now

* some todos

* test fusion schedule

* typo

* dedup idxs

* cleaner

* pre

* work

* diff
2025-09-29 07:27:57 +03:00
Sieds Lykles
ed90de6583 Revert "Bufferize early, fix "children not making progress" on big graphs (#1…" (#12318)
This reverts commit 6f1cf717de.
2025-09-28 19:10:21 +02:00
Sieds Lykles
6f1cf717de Bufferize early, fix "children not making progress" on big graphs (#12308)
* bufferize children early

* cleaner

* fix types

* lower number of reduceops

* test openpilot
2025-09-27 04:17:15 +02:00
qazal
8b2e0930d7 rangeify: enable passing multi test (#12301)
2025-09-26 08:31:13 +03:00
Sieds Lykles
74411984fc Rangeify IMAGE (#12304)
* add imagedtype to rangeify

* enable some image tests

* move the tests

* image upcast before locals

* add if statement

* rangeify image_dtype test

* decrease read_image count
2025-09-26 07:21:02 +02:00
chenyu
17cec8d645 RANGEIFY winograd test (#12297)
speed seems fine
2025-09-24 23:42:32 -04:00
qazal
38ecefaacb RANGEIFY=1 allreduce (#12260)
* ci

* extract mops

* work

* assert early

* port this?

* can realize shard

* allreduce passing

* notes

* better handling of shard

* err

* outerworld allreduce twice

* work

* don't tag movement ops

* don't tag movement ops

* delete old logic

* 19 failing + ram

* cleanup

* reset stuff

* simplest failing test

* diff

* test_ones

* allreduce work

* allreduce more work

* down to 22 failing tests

* port _device_num

* replace creates a new UOp here

* pour symbolic everywhere

* 7 failing

* focus on allreduce

* work

* cleanup

* more ci

* fix test_schedule_ring

* post index const shape

* much better

* diff cleanup
2025-09-24 18:13:08 +03:00
qazal
1400ce105f rangeify: fix sharding (#12288)
2025-09-24 14:33:56 +03:00
qazal
154c865966 rangeify: fix ram usage in multi (#12286)
2025-09-24 13:48:58 +03:00
qazal
ad7c8c21ea rangeify: INDEX doesn't passthrough MSELECT (#12279)
2025-09-23 21:36:50 +03:00
nimlgen
02a7b7fe48 rangeify: fix test_setitem (#12269)
* rangeify: fix test_setitem

* um?

* better?

* simple where folding

* f

* revert

* x
2025-09-23 20:42:36 +03:00
qazal
2f145a98e0 rangeify: fix contiguous multi (#12278)
* rangeify: fix contiguous multi

* when it's changing root, it should construct a new UOp
2025-09-23 20:05:29 +03:00
nimlgen
5f4eeb054c rangeify: passes now (#12277)
2025-09-23 18:46:49 +03:00
chenyu
51b88b2265 process replay tests in rangeify (#12274)
2025-09-23 01:30:06 -04:00
chenyu
b03ceb806e move test_sample to test_randomness (#12266)
2025-09-21 21:11:32 -04:00
nimlgen
b53a266254 rangeify: fix test_optim (#12262)
* rangeify: fix test_optim

* add to cl?

* these are good now
2025-09-21 18:08:35 +03:00
qazal
57c7e0a8f8 RANGEIFY=1 test_jit (#12254)
* RANGEIFY=1 test_jit

* don't do any of that

* disk

* simple disk tensor

* more work

* run more tests

* it also doesn't copy everytime

* skip tests that hang everything
2025-09-20 17:34:32 +03:00
chenyu
393c6b236c test case to sum twice in different order (#12253)
* test case to sum twice in different order

fixed by #12251

* try metal
2025-09-20 10:11:57 -04:00
Sieds Lykles
7e06d3ebba enable test_symbolic_jit (#12245)
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-09-19 20:23:42 +02:00
chenyu
a531a649fb test_resize_upsample_scales_cubic_align_corners_cpu is fixed (#12244)
2025-09-18 20:55:26 -04:00
chenyu
cff1065f5e test CL=1 RANGEIFY=1 onnx (#12240)
all except test_resize_upsample_scales_cubic_align_corners_cpu runs
2025-09-18 16:49:46 -04:00
chenyu
f82b16a0e9 RANGEIFY test_tensor (#12235)
2025-09-18 10:35:43 -04:00
chenyu
7487c13b61 truncate_fp16 -> float_to_fp16 (#12234)
match float_to_bf16 and float_to_fp8
2025-09-18 09:48:27 -04:00
Sieds Lykles
f1108f1cbe Enable test_symbolic_ops on rangeify (#12230)
* enable

* merge correctly
2025-09-18 02:12:36 +02:00
Sieds Lykles
812f485cd7 Enable threefry_doesnt_use_long test on rangeify (#12229)
* dont bufferize rangeify

* enable doesnt_use_long test
2025-09-18 01:58:34 +02:00
qazal
525f80e0d2 rangeify: enable putting consts back in the tensor graph (#12225)
* rangeify: enable putting consts back in the tensor graph

* work

* sym in ci
2025-09-17 19:45:04 +03:00
qazal
d917895569 map out rangeify errors in test_schedule (#12211)
* map out rangeify errors in test_schedule

* skip that

* add to ci
2025-09-17 09:10:28 +03:00
chenyu
5b12764b83 add arange cat arange test (#12217)
simple test case to catch wrong reduce const folding. also clean up the old arange complexity test
2025-09-16 17:12:32 -04:00
chenyu
494bb12500 skip slow cifar bf16 on red benchmark (#12213)
very slow to compile the fake bf16
2025-09-16 14:55:01 -04:00
chenyu
419e997187 increase benchmark timeout (#12212)
account for compile cache, and it's annoying that job died due to timeout also messes the machine
2025-09-16 14:09:02 -04:00