chenyu
87707ef0b8
unify range_start [pr] (#12236)
2025-09-18 13:52:54 -04:00
qazal
825f148469
rangeify: fix copy size mismatch errs (#12232)
* rangeify: fix copy size mismatch errs
* const folding can happen in sym
assert it
* shippable
* rangeify copy is completely wrong
* pre_bufferize
* tag bufferize
* pre back
2025-09-18 18:23:32 +03:00
chenyu
f82b16a0e9
RANGEIFY test_tensor (#12235)
2025-09-18 10:35:43 -04:00
chenyu
7487c13b61
truncate_fp16 -> float_to_fp16 (#12234)
match float_to_bf16 and float_to_fp8
2025-09-18 09:48:27 -04:00
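For readers unfamiliar with what a float-to-fp16 helper does, here is a minimal sketch using only the Python standard library. It is not taken from the repository: the name `float_to_fp16_sketch` is hypothetical, and the choice to map out-of-range values to a signed infinity is an assumption made for illustration.

```python
import math
import struct

def float_to_fp16_sketch(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value.

    Hypothetical helper for illustration; not the repository's code.
    """
    try:
        # "e" is the struct format code for a 16-bit half-precision float.
        # Packing rounds to fp16, unpacking widens back to a Python float.
        return struct.unpack("e", struct.pack("e", float(x)))[0]
    except OverflowError:
        # struct refuses finite values too large for fp16.
        # Assumption: saturate to infinity with the sign of the input.
        return math.copysign(math.inf, x)
```

Values that fp16 can represent exactly, such as `1.0` or `65504.0`, pass through unchanged, while a value like `0.1` comes back as the closest half-precision neighbor.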
b1tg
54c15d74a4
python float8 support (#11960)
* basic support
* alu
* nan in exec_alu
* rand_for_dtype
* inf + 0.0
* finfo
* revert rand_for_dtype
* clean
* truncate fp8s inf
* spec ok
* float_to_fp8 nan/inf
* least_upper_dtype
* clean up
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-09-18 09:17:09 -04:00
qazal
dbbc261075
rangeify: fix COPY simplifier (#12233)
2025-09-18 14:35:33 +03:00
Sieds Lykles
f1108f1cbe
Enable test_symbolic_ops on rangeify (#12230)
* enable
* merge correctly
2025-09-18 02:12:36 +02:00
Sieds Lykles
812f485cd7
Enable threefry_doesnt_use_long test on rangeify (#12229)
* dont bufferize rangeify
* enable doesnt_use_long test
2025-09-18 01:58:34 +02:00
nimlgen
3c5b8bf50c
am: bump fw to rocm7 (#12226)
2025-09-17 21:20:22 +03:00
qazal
525f80e0d2
rangeify: enable putting consts back in the tensor graph (#12225)
* rangeify: enable putting consts back in the tensor graph
* work
* sym in ci
2025-09-17 19:45:04 +03:00
chenyu
edffc246ed
MUL in reduce_unparented (#12223)
* MUL in reduce_unparented
* some test
2025-09-17 11:56:39 -04:00
qazal
7733c217c5
remove spam comments in test_schedule (#12224)
2025-09-17 18:24:55 +03:00
qazal
d917895569
map out rangeify errors in test_schedule (#12211)
* map out rangeify errors in test_schedule
* skip that
* add to ci
2025-09-17 09:10:28 +03:00
Sieds Lykles
158506b91e
Upgrade some divmod folding for symbolic divs (#12216)
* use const_factor() instead of arg
* add test
* change div min_max
* add tests
* add divide_by_symbolic_gcd
* add tests
* one more test
* Slice to unbind symbolic
* deal with const factor properly
* minor cleanup
* divide_by_symbolic_gcd becomes UOp.gcd and UOp.divide_exact
* add tests
* add gcd_without_const
* fix divide_exact bug
* add factor_remainder
* add tests
* fix imports
* elif -> if
* remove expectedFailure
* add more tests
* add more unwrap
* fix signature of pop_const
* remove that
* remove that
2025-09-17 03:00:50 +02:00
Sieds Lykles
328bfe6b9b
fix map_expand for symbolic shapes (#12218)
fix incorrect default argument in resolve
2025-09-17 01:20:18 +02:00
chenyu
5b12764b83
add arange cat arange test (#12217)
simple test case to catch wrong reduce const folding. also clean up the old arange complexity test
2025-09-16 17:12:32 -04:00
nimlgen
53655a4ee5
cuda: cleanup old comment (#12215)
2025-09-16 23:11:32 +03:00
chenyu
6b808c5fe6
update TestSymbolicJit.test_plus1_pad (#12214)
was failing because movement was not captured
2025-09-16 15:57:50 -04:00
Shun Usami
2a72b00679
Add test for 2D tensor indexing in setitem (#12193)
* Add test for 2D tensor indexing in setitem
* Fix _masked_setitem to handle multi dim indexing correctly
* Fix indent
* Add fuzz test for 3D tensor indexing in setitem
* Skip indexing fuzz test (slow)
2025-09-16 14:57:25 -04:00
chenyu
c7b03457d7
Revert "Revert "more llvm intrinsics (#11961)" (#12194)" (#12195)
This reverts commit df1c183e46.
2025-09-16 14:55:31 -04:00
chenyu
494bb12500
skip slow cifar bf16 on red benchmark (#12213)
very slow to compile the fake bf16
2025-09-16 14:55:01 -04:00
chenyu
419e997187
increase benchmark timeout (#12212)
account for compile cache; it's also annoying that a job dying due to timeout messes up the machine
2025-09-16 14:09:02 -04:00
chenyu
84d2d047ea
Tensor.pad_to and Tensor.shrink_to (#12210)
most of the time i want this instead of spelling out the args
also add more input validation to shrink
2025-09-16 12:24:55 -04:00
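The convenience described above, giving a target shape instead of spelling out per-dimension arguments, can be sketched in plain Python. This is an illustration only: the helper names are hypothetical, and it assumes that padding and shrinking happen at the end of each dimension and that `None` leaves a dimension untouched. The log does not confirm either behavior.

```python
from typing import Optional, Sequence

Arg = Optional[tuple[int, int]]

def pad_to_args(shape: Sequence[int], target: Sequence[Optional[int]]) -> tuple[Arg, ...]:
    """Per-dimension (before, after) padding amounts that grow `shape` to `target`."""
    if len(shape) != len(target):
        raise ValueError(f"rank mismatch: {len(shape)} vs {len(target)}")
    args: list[Arg] = []
    for size, new_size in zip(shape, target):
        if new_size is None:
            args.append(None)                    # leave this dimension alone
        elif new_size < size:
            raise ValueError(f"cannot pad {size} down to {new_size}")
        else:
            args.append((0, new_size - size))    # assumption: pad at the end only
    return tuple(args)

def shrink_to_args(shape: Sequence[int], target: Sequence[Optional[int]]) -> tuple[Arg, ...]:
    """Per-dimension (start, end) bounds that shrink `shape` to `target`."""
    if len(shape) != len(target):
        raise ValueError(f"rank mismatch: {len(shape)} vs {len(target)}")
    args: list[Arg] = []
    for size, new_size in zip(shape, target):
        if new_size is None:
            args.append(None)
        elif not 0 <= new_size <= size:
            raise ValueError(f"cannot shrink {size} to {new_size}")
        else:
            args.append((0, new_size))           # assumption: keep the leading elements
    return tuple(args)
```

For example, `pad_to_args((2, 3), (4, None))` yields the explicit arguments a caller would otherwise have to write by hand, and the range checks mirror the kind of input validation the commit message mentions for shrink.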
qazal
122a50fe8c
assert kernel count (#12205)
2025-09-16 14:24:39 +03:00
chenyu
e555748807
test rangeify const folding (#12200)
* test rangeify const folding
reduce i know how to fix, multi and test_cast_padded tbd
* test_instancenorm_3d is very slow
2025-09-15 20:03:48 -04:00
chenyu
f732f66709
rangeify test_nn almost pass (#12198)
* rangeify test_nn almost pass
* issue with jit
* flaky
2025-09-15 17:49:20 -04:00
chenyu
82e037aad5
ci test.yml updates (#12197)
* ci test.yml updates
move docs together and external_benchmark_schedule to unit
* torch
2025-09-15 17:09:02 -04:00
chenyu
146c31586d
split RANGEIFY ci (#12196)
one CPU and one CL for speed
2025-09-15 15:41:10 -04:00
chenyu
df1c183e46
Revert "more llvm intrinsics (#11961)" (#12194)
This reverts commit d01e3d7719.
2025-09-15 13:56:43 -04:00
b1tg
d01e3d7719
more llvm intrinsics (#11961)
* more llvm intrinsics
* assert nan
* skip test_log_nan on metal
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-09-15 13:05:23 -04:00
nimlgen
b63bd02969
update runtime docs (#12191)
2025-09-15 17:46:20 +03:00
qazal
57e8bf61e8
viz: fix Specificity for rect styling (#12190)
2025-09-15 17:33:37 +03:00
chenyu
72e010d816
fix rangeify ci (#12189)
CL=1, and multitensor needs to test with CPU since CL does not support multi in CI
2025-09-15 10:24:57 -04:00
qazal
f1bd06134d
test fuse with RANGEIFY=2 (#12187)
2025-09-15 15:51:23 +03:00
qazal
ef0ef705fe
viz: remove async from event listener (#12186)
2025-09-15 15:08:28 +03:00
qazal
d8855ec266
viz/serve.py cleanups (#12185)
* don't assign unused variable
* *path to
2025-09-15 13:43:26 +03:00
qazal
b8a74c1569
cpu: add disassembler err message (#12184)
* cpu: add disassembler err message
* print msg
2025-09-15 13:29:44 +03:00
qazal
a388d2cb1a
remove PROFILE=1 option, it's just VIZ=1 [pr] (#12176)
* remove PROFILE=1 option, it's just VIZ=1 [pr]
* sqtt
* sqtt 2
* return last
* rename
2025-09-15 12:51:50 +03:00
George Hotz
65397bfdeb
set testpath on pytest (#12183)
2025-09-15 16:13:05 +08:00
George Hotz
ae0edc8a67
renumber ranges (#12182)
* enable rangeify const folding
* renumber ranges for kernel deduping
2025-09-15 13:03:39 +08:00
hooved
e1fef895b1
don't hardcode weights path (#12171)
2025-09-15 00:33:47 -04:00
hooved
3a9db08b49
download data and ckpts for sd train/eval (#12170)
2025-09-15 00:31:45 -04:00
chenyu
bdb3afd566
failed test case for symbolic pad (#12179)
2025-09-15 00:25:21 -04:00
George Hotz
9fcc87761e
enable rangeify const folding (#12181)
2025-09-15 12:02:19 +08:00
George Hotz
1353250b6c
tags on bufferize are the tensor tags (#12180)
2025-09-15 11:46:03 +08:00
George Hotz
60d7db093e
delete bufferized consts + output noops (#12163)
* bring const folding to rangeify
* comment that
2025-09-15 11:07:44 +08:00
qazal
525c20dc7e
viz: remove unused runtime_stats feature (#12177)
2025-09-15 02:53:05 +03:00
qazal
75ff9b7a9a
viz: add buffer lifetime to tooltip (#12175)
2025-09-15 02:33:50 +03:00
chenyu
15b166ce6d
bump test_module_runs to 30 seconds (#12174)
25 seconds sometimes
2025-09-14 16:48:40 -04:00
ttomsa
943236ef74
move cast pat out of symbolic_simple (#11945)
* move pat
* move it here
* rm extra check
---------
Co-authored-by: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
2025-09-14 21:39:48 +02:00