10490 Commits

Author SHA1 Message Date
chenyu
87707ef0b8 unify range_start [pr] (#12236) 2025-09-18 13:52:54 -04:00
qazal
825f148469 rangeify: fix copy size mismatch errs (#12232)
* rangeify: fix copy size mismatch errs

* const folding can happen in sym

assert it

* shippable

* rangeify copy is completely wrong

* pre_bufferize

* tag bufferize

* pre back
2025-09-18 18:23:32 +03:00
chenyu
f82b16a0e9 RANGEIFY test_tensor (#12235) 2025-09-18 10:35:43 -04:00
chenyu
7487c13b61 truncate_fp16 -> float_to_fp16 (#12234)
match float_to_bf16 and float_to_fp8
2025-09-18 09:48:27 -04:00
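For readers unfamiliar with what a float-to-fp16 helper does, here is a minimal sketch. It assumes the helper rounds a Python float to the nearest IEEE half-precision value and saturates to infinity on overflow. The name `float_to_fp16_sketch` is hypothetical, and the body is not taken from the commit above.

```python
import math
import struct

def float_to_fp16_sketch(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value (hypothetical helper)."""
    try:
        # The "e" struct format packs a float as binary16; unpacking it
        # yields the rounded value as a regular Python float.
        return struct.unpack("e", struct.pack("e", float(x)))[0]
    except OverflowError:
        # Values too large for fp16 are mapped to a signed infinity here.
        return math.copysign(math.inf, x)
```

With this sketch, values that fp16 can represent exactly (such as 1.0) come back unchanged, and 0.1 comes back as the nearest representable half-precision value.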
b1tg
54c15d74a4 python float8 support (#11960)
* basic support

* alu

* nan in exec_alu

* rand_for_dtype

* inf + 0.0

* finfo

* revert rand_for_dtype

* clean

* truncate fp8s inf

* spec ok

* float_to_fp8 nan/inf

* least_upper_dtype

* clean up

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-09-18 09:17:09 -04:00
qazal
dbbc261075 rangeify: fix COPY simplifier (#12233) 2025-09-18 14:35:33 +03:00
Sieds Lykles
f1108f1cbe Enable test_symbolic_ops on rangeify (#12230)
* enable

* merge correctly
2025-09-18 02:12:36 +02:00
Sieds Lykles
812f485cd7 Enable threefry_doesnt_use_long test on rangeify (#12229)
* dont bufferize rangeify

* enable doesnt_use_long test
2025-09-18 01:58:34 +02:00
nimlgen
3c5b8bf50c am: bump fw to rocm7 (#12226) 2025-09-17 21:20:22 +03:00
qazal
525f80e0d2 rangeify: enable putting consts back in the tensor graph (#12225)
* rangeify: enable putting consts back in the tensor graph

* work

* sym in ci
2025-09-17 19:45:04 +03:00
chenyu
edffc246ed MUL in reduce_unparented (#12223)
* MUL in reduce_unparented

* some test
2025-09-17 11:56:39 -04:00
qazal
7733c217c5 remove spam comments in test_schedule (#12224) 2025-09-17 18:24:55 +03:00
qazal
d917895569 map out rangeify errors in test_schedule (#12211)
* map out rangeify errors in test_schedule

* skip that

* add to ci
2025-09-17 09:10:28 +03:00
Sieds Lykles
158506b91e Upgrade some divmod folding for symbolic divs (#12216)
* use const_factor() instead of arg

* add test

* change div min_max

* add tests

* add divide_by_symbolic_gcd

* add tests

* one more test

* Slice to unbind symbolic

* deal with const factor properly

* minor cleanup

* divide_by_symbolic_gcd becomes UOp.gcd and UOp.divide_exact

* add tests

* add gcd_without_const

* fix divide_exact bug

* add factor_remainder

* add tests

* fix imports

* elif -> if

* remove expectedFailure

* add more tests

* add more unwrap

* fix signature of pop_const

* remove that

* remove that
2025-09-17 03:00:50 +02:00
Sieds Lykles
328bfe6b9b fix map_expand for symbolic shapes (#12218)
fix incorrect default argument in resolve
2025-09-17 01:20:18 +02:00
chenyu
5b12764b83 add arange cat arange test (#12217)
simple test case to catch wrong reduce const folding. also clean up the old arange complexity test
2025-09-16 17:12:32 -04:00
nimlgen
53655a4ee5 cuda: cleanup old comment (#12215) 2025-09-16 23:11:32 +03:00
chenyu
6b808c5fe6 update TestSymbolicJit.test_plus1_pad (#12214)
was failing because movement was not captured
2025-09-16 15:57:50 -04:00
Shun Usami
2a72b00679 Add test for 2D tensor indexing in setitem (#12193)
* Add test for 2D tensor indexing in setitem

* Fix _masked_setitem to handle multi dim indexing correctly

* Fix indent

* Add fuzz test for 3D tensor indexing in setitem

* Skip indexing fuzz test (slow)
2025-09-16 14:57:25 -04:00
chenyu
c7b03457d7 Revert "Revert "more llvm intrinsics (#11961)" (#12194)" (#12195)
This reverts commit df1c183e46.
2025-09-16 14:55:31 -04:00
chenyu
494bb12500 skip slow cifar bf16 on red benchmark (#12213)
very slow to compile the fake bf16
2025-09-16 14:55:01 -04:00
chenyu
419e997187 increase benchmark timeout (#12212)
account for compile cache, and it's annoying that job died due to timeout also messes the machine
2025-09-16 14:09:02 -04:00
chenyu
84d2d047ea Tensor.pad_to and Tensor.shrink_to (#12210)
most of the time i want this instead of spelling out the args

also add more input validation to shrink
2025-09-16 12:24:55 -04:00
qazal
122a50fe8c assert kernel count (#12205) 2025-09-16 14:24:39 +03:00
chenyu
e555748807 test rangeify const folding (#12200)
* test rangeify const folding

reduce i know how to fix, multi and test_cast_padded tbd

* test_instancenorm_3d is very slow
2025-09-15 20:03:48 -04:00
chenyu
f732f66709 rangeify test_nn almost pass (#12198)
* rangeify test_nn almost pass

* issue with jit

* flaky
2025-09-15 17:49:20 -04:00
chenyu
82e037aad5 ci test.yml updates (#12197)
* ci test.yml updates

move docs together and external_benchmark_schedule to unit

* torch
2025-09-15 17:09:02 -04:00
chenyu
146c31586d split RANGEIFY ci (#12196)
one CPU and one CL for speed
2025-09-15 15:41:10 -04:00
chenyu
df1c183e46 Revert "more llvm intrinsics (#11961)" (#12194)
This reverts commit d01e3d7719.
2025-09-15 13:56:43 -04:00
b1tg
d01e3d7719 more llvm intrinsics (#11961)
* more llvm intrinsics

* assert nan

* skip test_log_nan on metal

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-09-15 13:05:23 -04:00
nimlgen
b63bd02969 update runtime docs (#12191) 2025-09-15 17:46:20 +03:00
qazal
57e8bf61e8 viz: fix Specificity for rect styling (#12190) 2025-09-15 17:33:37 +03:00
chenyu
72e010d816 fix rangeify ci (#12189)
CL=1, and multitensor needs to test with CPU since CL does not support multi in CI
2025-09-15 10:24:57 -04:00
qazal
f1bd06134d test fuse with RANGEIFY=2 (#12187) 2025-09-15 15:51:23 +03:00
qazal
ef0ef705fe viz: remove async from event listener (#12186) 2025-09-15 15:08:28 +03:00
qazal
d8855ec266 viz/serve.py cleanups (#12185)
* don't assign unused variable

* *path to
2025-09-15 13:43:26 +03:00
qazal
b8a74c1569 cpu: add disassembler err message (#12184)
* cpu: add disassembler err message

* print msg
2025-09-15 13:29:44 +03:00
qazal
a388d2cb1a remove PROFILE=1 option, it's just VIZ=1 [pr] (#12176)
* remove PROFILE=1 option, it's just VIZ=1 [pr]

* sqtt

* sqtt 2

* return last

* rename
2025-09-15 12:51:50 +03:00
George Hotz
65397bfdeb set testpath on pytest (#12183) 2025-09-15 16:13:05 +08:00
George Hotz
ae0edc8a67 renumber ranges (#12182)
* enable rangeify const folding

* renumber ranges for kernel deduping
2025-09-15 13:03:39 +08:00
hooved
e1fef895b1 don't hardcode weights path (#12171) 2025-09-15 00:33:47 -04:00
hooved
3a9db08b49 download data and ckpts for sd train/eval (#12170) 2025-09-15 00:31:45 -04:00
chenyu
bdb3afd566 failed test case for symbolic pad (#12179) 2025-09-15 00:25:21 -04:00
George Hotz
9fcc87761e enable rangeify const folding (#12181) 2025-09-15 12:02:19 +08:00
George Hotz
1353250b6c tags on bufferize are the tensor tags (#12180) 2025-09-15 11:46:03 +08:00
George Hotz
60d7db093e delete bufferized consts + output noops (#12163)
* bring const folding to rangeify

* comment that
2025-09-15 11:07:44 +08:00
qazal
525c20dc7e viz: remove unused runtime_stats feature (#12177) 2025-09-15 02:53:05 +03:00
qazal
75ff9b7a9a viz: add buffer lifetime to tooltip (#12175) 2025-09-15 02:33:50 +03:00
chenyu
15b166ce6d bump test_module_runs to 30 seconds (#12174)
25 seconds sometimes
2025-09-14 16:48:40 -04:00
ttomsa
943236ef74 move cast pat out of symbolic_simple (#11945)
* move pat

* move it here

* rm extra check

---------

Co-authored-by: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
2025-09-14 21:39:48 +02:00