hooved
c2689c505e
Clip model updates for Stable Diffusion mlperf training ( #12313 )
...
* stable diffusion mlperf clip changes
* add clip tests
* set gelu as attribute
* add more tests
* factor out GPUS
* rerun CI
* add imports to if blocks
* remove unneeded axis
* add clip tests to CI
* move clip tests
* add deps, disable max buf size
2025-09-29 21:50:14 -04:00
qazal
250cb10e8f
rangeify permuted assign ( #12299 )
...
* enable RANGEIFY=1 test_assign
* work
* rangeify=0 asserts this ast
* remove that
* beta test, it's correct though
* skip multi
* matches torch/np output
* memcopy without memcopy
* can remove this
* rangeify isn't silently wrong anymore
* diff cleanup
* use UOp toposort instead of global tags
* actual assert TestRangeifyAssign
* step
* work
* this isn't optimizing away now
* some todos
* test fusion schedule
* typo
* dedup idxs
* cleaner
* pre
* work
* diff
2025-09-29 07:27:57 +03:00
Sieds Lykles
ed90de6583
Revert "Bufferize early, fix "children not making progress" on big graphs (#1…" ( #12318 )
...
This reverts commit 6f1cf717de.
2025-09-28 19:10:21 +02:00
Sieds Lykles
6f1cf717de
Bufferize early, fix "children not making progress" on big graphs ( #12308 )
...
* bufferize children early
* cleaner
* fix types
* lower number of reduceops
* test openpilot
2025-09-27 04:17:15 +02:00
qazal
8b2e0930d7
rangeify: enable passing multi test ( #12301 )
2025-09-26 08:31:13 +03:00
Sieds Lykles
74411984fc
Rangeify IMAGE ( #12304 )
...
* add imagedtype to rangeify
* enable some image tests
* move the tests
* image upcast before locals
* add if statement
* rangeify image_dtype test
* decrease read_image count
2025-09-26 07:21:02 +02:00
chenyu
17cec8d645
RANGEIFY winograd test ( #12297 )
...
speed seems fine
2025-09-24 23:42:32 -04:00
qazal
38ecefaacb
RANGEIFY=1 allreduce ( #12260 )
...
* ci
* extract mops
* work
* assert early
* port this?
* can realize shard
* allreduce passing
* notes
* better handling of shard
* err
* outerworld allreduce twice
* work
* don't tag movement ops
* don't tag movement ops
* delete old logic
* 19 failing + ram
* cleanup
* reset stuff
* simplest failing test
* diff
* test_ones
* allreduce work
* allreduce more work
* down to 22 failing tests
* port _device_num
* replace creates a new UOp here
* pour symbolic everywhere
* 7 failing
* focus on allreduce
* work
* cleanup
* more ci
* fix test_schedule_ring
* post index const shape
* much better
* diff cleanup
2025-09-24 18:13:08 +03:00
qazal
1400ce105f
rangeify: fix sharding ( #12288 )
2025-09-24 14:33:56 +03:00
qazal
154c865966
rangeify: fix ram usage in multi ( #12286 )
2025-09-24 13:48:58 +03:00
qazal
ad7c8c21ea
rangeify: INDEX doesn't passthrough MSELECT ( #12279 )
2025-09-23 21:36:50 +03:00
nimlgen
02a7b7fe48
rangeify: fix test_setitem ( #12269 )
...
* rangeify: fix test_setitem
* um?
* better?
* simple where folding
* f
* revert
* x
2025-09-23 20:42:36 +03:00
qazal
2f145a98e0
rangeify: fix contiguous multi ( #12278 )
...
* rangeify: fix contiguous multi
* when it's changing root, it should construct a new UOp
2025-09-23 20:05:29 +03:00
nimlgen
5f4eeb054c
rangeify: passes now ( #12277 )
2025-09-23 18:46:49 +03:00
chenyu
51b88b2265
process replay tests in rangeify ( #12274 )
2025-09-23 01:30:06 -04:00
chenyu
b03ceb806e
move test_sample to test_randomness ( #12266 )
2025-09-21 21:11:32 -04:00
nimlgen
b53a266254
rangeify: fix test_optim ( #12262 )
...
* rangeify: fix test_optim
* add to cl?
* these are good now
2025-09-21 18:08:35 +03:00
qazal
57c7e0a8f8
RANGEIFY=1 test_jit ( #12254 )
...
* RANGEIFY=1 test_jit
* don't do any of that
* disk
* simple disk tensor
* more work
* run more tests
* it also doesn't copy every time
* skip tests that hang everything
2025-09-20 17:34:32 +03:00
chenyu
393c6b236c
test case to sum twice in different order ( #12253 )
...
* test case to sum twice in different order
fixed by #12251
* try metal
2025-09-20 10:11:57 -04:00
Sieds Lykles
7e06d3ebba
enable test_symbolic_jit ( #12245 )
...
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-09-19 20:23:42 +02:00
chenyu
a531a649fb
test_resize_upsample_scales_cubic_align_corners_cpu is fixed ( #12244 )
2025-09-18 20:55:26 -04:00
chenyu
cff1065f5e
test CL=1 RANGEIFY=1 onnx ( #12240 )
...
all except test_resize_upsample_scales_cubic_align_corners_cpu runs
2025-09-18 16:49:46 -04:00
chenyu
f82b16a0e9
RANGEIFY test_tensor ( #12235 )
2025-09-18 10:35:43 -04:00
chenyu
7487c13b61
truncate_fp16 -> float_to_fp16 ( #12234 )
...
match float_to_bf16 and float_to_fp8
2025-09-18 09:48:27 -04:00
Sieds Lykles
f1108f1cbe
Enable test_symbolic_ops on rangeify ( #12230 )
...
* enable
* merge correctly
2025-09-18 02:12:36 +02:00
Sieds Lykles
812f485cd7
Enable threefry_doesnt_use_long test on rangeify ( #12229 )
...
* dont bufferize rangeify
* enable doesnt_use_long test
2025-09-18 01:58:34 +02:00
qazal
525f80e0d2
rangeify: enable putting consts back in the tensor graph ( #12225 )
...
* rangeify: enable putting consts back in the tensor graph
* work
* sym in ci
2025-09-17 19:45:04 +03:00
qazal
d917895569
map out rangeify errors in test_schedule ( #12211 )
...
* map out rangeify errors in test_schedule
* skip that
* add to ci
2025-09-17 09:10:28 +03:00
chenyu
5b12764b83
add arange cat arange test ( #12217 )
...
simple test case to catch wrong reduce const folding. also clean up the old arange complexity test
2025-09-16 17:12:32 -04:00
chenyu
e555748807
test rangeify const folding ( #12200 )
...
* test rangeify const folding
reduce i know how to fix, multi and test_cast_padded tbd
* test_instancenorm_3d is very slow
2025-09-15 20:03:48 -04:00
chenyu
f732f66709
rangeify test_nn almost pass ( #12198 )
...
* rangeify test_nn almost pass
* issue with jit
* flaky
2025-09-15 17:49:20 -04:00
chenyu
82e037aad5
ci test.yml updates ( #12197 )
...
* ci test.yml updates
move docs together and external_benchmark_schedule to unit
* torch
2025-09-15 17:09:02 -04:00
chenyu
146c31586d
split RANGEIFY ci ( #12196 )
...
one CPU and one CL for speed
2025-09-15 15:41:10 -04:00
chenyu
72e010d816
fix rangeify ci ( #12189 )
...
CL=1, and multitensor needs to test with CPU since CL does not support multi in CI
2025-09-15 10:24:57 -04:00
qazal
f1bd06134d
test fuse with RANGEIFY=2 ( #12187 )
2025-09-15 15:51:23 +03:00
qazal
a388d2cb1a
remove PROFILE=1 option, it's just VIZ=1 [pr] ( #12176 )
...
* remove PROFILE=1 option, it's just VIZ=1 [pr]
* sqtt
* sqtt 2
* return last
* rename
2025-09-15 12:51:50 +03:00
George Hotz
d5bc27797b
fix some multitensor on rangeify ( #12162 )
...
* fix some multitensor on rangeify
* rangeify multi hacks
* copy on const
2025-09-14 14:31:57 +08:00
chenyu
aac3dceaf6
merge two PYTHON backend ci job ( #12143 )
...
* merge two PYTHON backend ci job
and mark anything that takes > 10 in test_ops slow
* two more
2025-09-12 17:36:46 -04:00
George Hotz
a2f502b89e
fix rangeify=1 ops on GPU ( #12130 )
2025-09-12 11:17:37 +08:00
chenyu
e5ef9ec5b1
remove IGNORE_OOB=0 in ci tests ( #12117 )
2025-09-11 15:05:04 -04:00
chenyu
520e2e0727
actually run unit tests in ci MacOS (unit) ( #12122 )
...
* actually run unit tests in ci MacOS (unit)
* that's always wrong
2025-09-11 13:32:30 -04:00
nimlgen
acb700fc26
ci: fix ptx env ( #12120 )
2025-09-11 12:42:15 -04:00
chenyu
20cd7177de
delete test_bert_fuse_arange ( #12121 )
...
* delete test_bert_fuse_arange
it's the default now and we are not interested in FUSE_ARANGE=0 version
* remove -v
2025-09-11 12:35:51 -04:00
chenyu
b07f962058
split metal model tests ( #12119 )
...
* split metal model tests
* llama too
2025-09-11 12:20:12 -04:00
chenyu
66593f135f
remove duplicated test_real_world ( #12118 )
...
included in the test/models right below
2025-09-11 11:57:14 -04:00
chenyu
0e266f376c
ops_gpu -> ops_cl ( #12103 )
2025-09-10 15:15:48 -04:00
nimlgen
fb96394ff5
auto-select available compilers ( #12094 )
...
* device: auto select compilers
* fix
* metal+opencl
* nv/cuda
* test without ptx
* ptx
* fix tests
* fix
* fix test
* rename
* test + cleaner
* xx
* ops
* better test
* win?
* um?
* types
* debug
* win??
* sep rung
* wtf?
* debug
* skip win
* revert this
* types
2025-09-10 19:52:01 +03:00
nimlgen
1c6c42715f
unify cpu and llvm ( #11982 )
...
* try unify cpu and llvm
* fixes
* fix
* ops
* no llvm
* fix
* rm
* lvmm is ot
* oops
* override
* no llvm
* ignore
* skip llvm
* ooops
2025-09-09 13:54:44 +03:00
chenyu
2bd1fff79c
ci GPU misc cleanups ( #12078 )
2025-09-08 16:47:29 -04:00
chenyu
1781d5bced
remove PYTHONPATH in test.yml ( #12077 )
...
set globally already
2025-09-08 15:41:47 -04:00