George Hotz
15ee742afa
add get_children_map to uop (#9470)
* add get_children_map to uop
* update_children
* fix new children
2025-03-17 14:36:13 +08:00
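A minimal sketch of what a children map over a UOp-style graph usually holds, assuming get_children_map inverts each node's `src` edges so that every node maps to the nodes that consume it. The `Node` class and this `get_children_map` function are hypothetical stand-ins, not tinygrad's UOp code.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical graph node: `src` holds the nodes this node reads from.
# eq=False keeps identity-based hashing, so structurally equal nodes stay distinct.
@dataclass(eq=False)
class Node:
    name: str
    src: tuple = ()

def get_children_map(root: Node) -> dict:
    """Map each node reachable from `root` to the list of nodes that consume it."""
    children: defaultdict = defaultdict(list)
    seen: set = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        # dict.fromkeys dedupes repeated inputs such as src=(a, a) while keeping order
        for s in dict.fromkeys(node.src):
            children[s].append(node)
            stack.append(s)
    return dict(children)
```

Building the map once lets a rewrite ask "who reads this node?" with a lookup instead of rescanning the whole graph.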
George Hotz
cb7a7f69c7
quantization preprocessor from DSP, should be universal (#9437)
* quantization preprocessor from DSP, should be universal
* touchups
* fix tests
2025-03-15 07:49:37 +08:00
chenyu
99b0287e4e
add GROUP and GROUPTOP to test_arange (#9432)
it does not grow quadratically, but it's not 0 ops now
2025-03-13 11:28:38 -04:00
qazal
90ffa9bd45
swizzle without buffer ops try 2 [pr] (#9427)
* add DONT_PUSH_VIEWS to matchers
* swizzle without buffer ops try 2 [pr]
* swizzle reduceop
* simple failing test
* fix failing test
* s/on/for
2025-03-13 10:00:40 +01:00
chenyu
22fc0a2e36
bert sum acc in half (#9412)
also BS=96
2025-03-11 23:03:15 -04:00
George Hotz
e174c6c3bc
new devectorizer (#9331)
* new devectorizer
* lidx
* test linearizer passes
* fix images
* fix unfoldable image load
* delete unused
* improve fix_unfoldable_image_load
* working for image
* fixup types
* fixup transcendental
* cast_vec
* cleaner transcendental
* skip failing test
* err, flip that
* not devec
* sqrt
2025-03-11 18:47:56 +08:00
George Hotz
2780e2027e
devectorize prereqs [pr] (#9404)
2025-03-11 12:33:29 +08:00
chenyu
01e8b60911
acc_dtype -> dtype (#9402)
matched numpy and torch
2025-03-10 16:05:30 -04:00
qazal
59dfb234eb
replace hardcoded ast with tensors in TestSwizzle [pr] (#9401)
2025-03-10 19:33:57 +01:00
geohotstan
1d64c12f2b
add Topk to tensor (#9343)
* terrible but somewhat working impl
* linux behaves differently than macos?
* slightly better impl
* small clean up; haven't figured this out yet
* better
* torch has different behavior on linux and macos for duplicated values
* add sum docs
* fix test
* add torch return_type test
* add an exception test
* wrap_fxn instead, and move op lower in order
* better repeated values test
* rerun ci
2025-03-09 20:01:42 -04:00
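The bullets note that torch orders duplicated values differently on Linux and macOS. A minimal 1-D sketch of top-k on a plain Python list, assuming ties should resolve deterministically to the lower index; the `topk` helper and that tie rule are illustrative assumptions, not the signature or behavior of tinygrad's Tensor.topk.

```python
def topk(xs: list, k: int, largest: bool = True) -> tuple:
    """Return (values, indices) of the k largest (or smallest) items of a 1-D list.

    Ties resolve to the lower index first (an assumed deterministic rule).
    """
    if not 0 <= k <= len(xs):
        raise ValueError(f"k={k} is out of range for a list of length {len(xs)}")
    sign = -1 if largest else 1
    # Sorting indices by (signed value, index) puts the wanted items first
    # and breaks ties by position.
    order = sorted(range(len(xs)), key=lambda i: (sign * xs[i], i))[:k]
    return [xs[i] for i in order], order
```

For example, `topk([1, 3, 3, 2], 2)` returns both 3s with indices `[1, 2]`; a test comparing indices against torch on duplicated values would have to pin down the same rule or compare only the values.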
qazal
a1f41fadf6
test_schedule cleanups + add DONT_GROUP_REDUCES [pr] (#9392)
* test_schedule cleanups + add DONT_GROUP_REDUCES [pr]
* replace with test_swizzle_reduceop
* delete duplicate tests
* test_allow_push_permutes
* one kernel tests
2025-03-09 15:01:08 +01:00
qazal
286b480f82
do not replace assign with the offset buffer [pr] (#9387)
2025-03-08 11:57:44 +01:00
qazal
0d2762c010
prep refactor for adding buffer ops last [pr] (#9383)
* prep refactor for adding buffer ops last [pr]
* freeze buffers
* add swizzle_reduceop
* shape for reduceop_view_right
* simpler elementwise_view_right
* add shapetracker to const
* only const
* from process replay
2025-03-08 08:00:14 +01:00
nimlgen
243078dda9
am: optimize tlb usage (#9049)
* am: optimize tlb usage
* fixes
* comments
* tiny
2025-03-07 19:37:29 +03:00
geohotstan
088d86691b
fix onnx gather and onnx auto_pad VALID mode (#9375)
* fix gather and auto_pad
* long -> int64
2025-03-07 10:27:23 -05:00
hooved
136cf7b8b1
hotfix: load >2 GiB from disk on macOS (#9361)
* enable loading >2 GiB buffer from disk on macOS
* handle None case raised by mypy
* add test
* revert fix to repro bug in CI
* tell CI to run a unit test for macOS
* reapply fix
2025-03-07 14:51:58 +08:00
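The log does not say how the macOS fix works. One common way around a per-call size limit on reads (assumed here to be the cause, at roughly 2 GiB on macOS) is to fill a preallocated buffer in bounded chunks; the `read_exact` helper and the 1 GiB chunk size below are hypothetical.

```python
import os

CHUNK = 1 << 30  # 1 GiB per call, below an assumed ~2 GiB per-read limit

def read_exact(path: str, size: int, offset: int = 0, chunk: int = CHUNK) -> bytearray:
    """Read `size` bytes starting at `offset`, never requesting more than `chunk` per call."""
    buf = bytearray(size)
    mv = memoryview(buf)
    with open(path, "rb", buffering=0) as f:
        f.seek(offset)
        pos = 0
        while pos < size:
            # readinto fills the slice in place; slicing past the end is clamped
            n = f.readinto(mv[pos:pos + chunk])
            if not n:
                raise EOFError(f"expected {size} bytes, got {pos}")
            pos += n
    return buf
```

Reading into a memoryview slice avoids building intermediate bytes objects, so peak memory stays at one copy of the buffer regardless of chunk size.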
nimlgen
9bd13de44c
lower test_gemv_4096_16384 to 750 for red (#9367)
2025-03-05 22:44:48 +03:00
uuuvn
b75f307234
amd: autogen ip bases (#9360)
2025-03-05 22:30:38 +03:00
chenyu
2cb2fce8d9
lower test_gemm_8192 amd_tflops to 65 (#9364)
2025-03-05 14:06:11 -05:00
nimlgen
14c88abf27
add some options to allreduce bench (#9348)
2025-03-04 23:46:36 +03:00
Anish Umale
bafa40fe12
Tiny backend test_ops fix part1 (#9338)
* extract name methods from https://github.com/tinygrad/tinygrad/pull/9302
* t.grad.numpy() -> t.grad.cpu().numpy()
* revert TORCH_DEBUG change
* revert dtype change in aten.sum
2025-03-03 12:36:51 -05:00
George Hotz
0d4ba7dd87
import tinygrad.frontend.torch (#9337)
* import tinygrad.frontend.torch
* type ignore
2025-03-04 00:15:29 +08:00
qazal
23084fd850
merge merge_views and remove_movement_ops [pr] (#9333)
* merge merge_views and remove_movement_ops [pr]
* fix that assert
2025-03-03 12:38:59 +01:00
George Hotz
ece0a0f305
use empty for test instead of rand (#9332)
2025-03-03 16:19:06 +08:00
George Hotz
2cc4cb74f0
reorder binops (#9328)
* reorder binops
* test improvements + fix string tests
* ugh, okay this
2025-03-03 14:58:18 +08:00
chenyu
146eb73790
fix Tensor.view with a tuple arg (#9330)
2025-03-02 23:35:23 -05:00
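torch's Tensor.view accepts the target shape either as separate ints, `view(2, 3)`, or as a single tuple, `view((2, 3))`. A sketch of the argument normalization such a fix needs; `normalize_shape` is a hypothetical helper, not tinygrad's code.

```python
def normalize_shape(*shape) -> tuple:
    """Accept both view(2, 3) and view((2, 3)) call styles; return a flat tuple of ints."""
    # A single tuple/list argument is the shape itself,
    # not a 1-D shape whose only entry is a tuple.
    if len(shape) == 1 and isinstance(shape[0], (tuple, list)):
        shape = tuple(shape[0])
    return tuple(int(d) for d in shape)
```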
chenyu
ba4b8c2c23
Tensor.copysign (#9329)
2025-03-02 21:33:49 -05:00
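copysign(a, b) returns the magnitude of a with the sign of b. The subtle case is signed zero: b = -0.0 must give a negative result even though `-0.0 < 0` is False, so the sign bit has to be read rather than compared. A scalar sketch that reads the IEEE 754 sign bit, checked against `math.copysign`; it illustrates the semantics only and is not tinygrad's implementation.

```python
import struct

def copysign(mag: float, sgn: float) -> float:
    """Return |mag| carrying the sign bit of sgn (so -0.0 counts as negative)."""
    # Reinterpret the float64 as a 64-bit unsigned int; bit 63 is the IEEE 754 sign bit.
    (bits,) = struct.unpack("<Q", struct.pack("<d", float(sgn)))
    magnitude = abs(float(mag))
    return -magnitude if bits >> 63 else magnitude
```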
nimlgen
8cae00833c
flaky test in ci (#9321)
2025-03-02 16:27:22 +03:00
Ali Ladjevardi
00028e87bb
Failing test for not realizing intermediate expand in multi-GPU (#9320)
2025-03-02 12:54:48 +01:00
George Hotz
ba97fd0b9c
hotfix: add test/external/external_benchmark_disk_raw
2025-03-02 02:32:15 +00:00
chenyu
cc2bbb0bf1
Tensor.isfinite (#9316)
2025-03-01 19:58:56 -05:00
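A branch-free way to express isfinite using only subtraction and comparison, which maps naturally onto elementwise tensor ops. The log does not say whether Tensor.isfinite uses this formulation or the more direct `not (isnan(x) or isinf(x))`; this is a scalar sketch.

```python
def isfinite(x: float) -> bool:
    # For finite x, x - x is exactly 0.0. For +-inf it is inf - inf = nan,
    # and for nan it stays nan. Since nan == 0 is False, only finite values pass.
    return (x - x) == 0
```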
geohotstan
d9ec05cea6
Test Onnx quantization behavior (#9301)
* add DynamicDequantizeLinear and corresponding tests
* wow, qlinearops round away from zero
* this passes locally...
* again
* try
* try separate test
* round to even again
* also add QLinearMul
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-01 19:21:58 -05:00
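ONNX defines QuantizeLinear as y = saturate(round(x / y_scale) + y_zero_point) with ties rounded to even, and the bullets above show the PR switching between round-away-from-zero and round-to-even while matching the QLinear ops. A sketch of how the two tie rules diverge on exact halves, assuming uint8 output; `quantize_linear` and both rounding helpers are hypothetical names.

```python
import math

def round_half_even(v: float) -> int:
    return round(v)  # Python's round() on floats sends ties to the even neighbor

def round_half_away(v: float) -> int:
    a = abs(v)
    r = math.floor(a)
    if a - r >= 0.5:  # a - floor(a) is exact in binary floating point for a >= 0
        r += 1
    return int(math.copysign(r, v))

def quantize_linear(x, scale, zero_point, rounding=round_half_even, lo=0, hi=255):
    """QuantizeLinear-style uint8 quantization: saturate(round(x / scale) + zero_point)."""
    return [min(hi, max(lo, rounding(v / scale) + zero_point)) for v in x]
```

On `[0.5, 1.5, 2.5, -0.5]` with scale 1 and zero point 128, the two rules disagree on three of the four inputs, which is why a test suite has to pick one.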
chenyu
fe0f860209
update test_ops for tensors from torch (#9308)
a few detach().numpy() -> detach().cpu().numpy()
2025-02-28 15:57:25 -05:00
chenyu
38d7aae3b7
onnx fmod (#9307)
2025-02-28 14:09:22 -05:00
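ONNX Mod has two behaviors selected by its `fmod` attribute. With fmod=0 (integer inputs) the remainder takes the sign of the divisor, like Python's `%`. With fmod=1 it takes the sign of the dividend, like C `fmod`, and floating-point inputs require fmod=1. A scalar sketch of those rules; `onnx_mod` is a hypothetical helper, not tinygrad's ONNX runner.

```python
import math

def onnx_mod(a, b, fmod: int = 0):
    """Scalar ONNX Mod semantics for the two values of the fmod attribute."""
    if fmod:
        # Truncated remainder: sign follows the dividend a (C fmod).
        return math.fmod(a, b)
    if isinstance(a, float) or isinstance(b, float):
        raise ValueError("ONNX Mod requires fmod=1 for floating-point inputs")
    # Floored remainder: sign follows the divisor b (Python %).
    return a % b
```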
chenyu
7c7db78feb
support float mod (#9306)
also added a spec check restricting Ops.MOD to ints only
2025-02-28 13:33:58 -05:00
chenyu
90808e2dd0
div rounding_mode (#9304)
2025-02-28 11:38:25 -05:00
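torch.div's `rounding_mode` accepts None (true division), "trunc" (round toward zero), or "floor" (round toward negative infinity); the two rounding modes differ only when the quotient is negative and non-integral. A scalar sketch, assuming tinygrad's parameter mirrors those torch semantics; `div` here is a hypothetical helper that relies on float true division, so it is intended for operands well below 2**53 in magnitude.

```python
import math

def div(a, b, rounding_mode=None):
    """torch.div-style scalar division (hypothetical helper for illustration)."""
    q = a / b  # b == 0 raises ZeroDivisionError here, unlike tensor division
    if rounding_mode is None:
        return q
    if rounding_mode == "trunc":
        return math.trunc(q)  # -3.5 -> -3
    if rounding_mode == "floor":
        return math.floor(q)  # -3.5 -> -4
    raise ValueError(f"unknown rounding_mode {rounding_mode!r}")
```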
chenyu
3ae66e59a3
least_upper_float is at least default_float (#9303)
* least_upper_float is at least default_float
en route to div rounding mode. The dtype of true int division would change from int32 to default_float, which also matches torch.
* fix bert acc
2025-02-28 10:41:56 -05:00
Eitan Turok
d657d5f754
[Bounty] Vectorize Transcendental (#9058)
* init
* cast everything right
* more casting
* install pillow in test
* quick tests
* simplify
* quick tests
* delete test
* tests
* fix import error
* add vec to ldexp3k
* vec for bitcast
* some helper tests
* high level tests
* clean tests
* change tolerance so cuda passes
* ruff passes
* remove tests for transcendental helpers
* ruff passes
* make exponent in power vectorized
* fix pow test
* add newline
* add vec dtype to ilogb2k
* comment + clean up
* ruff
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-28 15:47:25 +08:00
qazal
cdf66cc67f
test: recompute expanded CAST (#9286)
* those views should merge
* diff cleanup
* gpu
* put it behind CAST_AFTER_EXPAND
2025-02-27 19:22:17 +01:00
chenyu
4342300eff
lower test_gemm_8192 amd to 70 (#9277)
flaky
2025-02-26 16:32:08 -05:00
Francis Lata
86b737a120
leakyrelu to leaky_relu (#9270)
2025-02-26 13:22:08 -05:00
chenyu
cd822bbe11
hotfix torch_grad.detach().cpu().numpy() in test_ops (#9268)
2025-02-26 12:27:35 -05:00
chenyu
49ca90df75
update test_ops backward tests (#9267)
instead of `(out+1).square().mean().backward()`, use `forward.sum().gradient` to get closer to the gradients
2025-02-26 12:09:24 -05:00
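Why a plain sum gives a cleaner gradient check: writing the tested op's outputs as y_i(x) (notation introduced here, not from the source), the old loss reweights each output's gradient before it reaches x, while the sum passes every output's gradient through with weight 1.

```latex
\begin{aligned}
L_{\text{old}} &= \frac{1}{N}\sum_{i=1}^{N}(y_i+1)^2,
& \frac{\partial L_{\text{old}}}{\partial x} &= \sum_{i=1}^{N}\frac{2\,(y_i+1)}{N}\,\frac{\partial y_i}{\partial x},\\
L_{\text{new}} &= \sum_{i=1}^{N} y_i,
& \frac{\partial L_{\text{new}}}{\partial x} &= \sum_{i=1}^{N}\frac{\partial y_i}{\partial x}.
\end{aligned}
```

Under the old loss each term is scaled by 2(y_i+1)/N, which shrinks as N grows and is exactly zero wherever y_i = -1, so a wrong ∂y_i/∂x at such an output would go unnoticed; the sum removes that weighting.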
chenyu
aaf0a8069f
xor -> bitwise_xor (#9264)
2025-02-26 10:21:14 -05:00
qazal
e162aa862d
is_realized only if buffer is allocated (#9253)
* is_realized only if the buffer is allocated
* fix the image check too
* assert test_lil_model after ExecItems run
2025-02-26 08:58:08 +01:00
George Hotz
3f4eb9006a
test for device mismatch [pr] (#9250)
* test for device mismatch [pr]
* fix bert
2025-02-26 13:06:33 +08:00
Sieds Lykles
9c4d9d9f10
Acc first (#9232)
* put acc in front of the add chain
* handle the other case
* Make loop collapse more generic
* Remove mulacc_unrolled
* Actually remove it
---------
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-25 22:10:15 -05:00
nimlgen
70db8c3003
hcq: dyn alloc signals (#9238)
* hcq: dyn alloc signals
* types and unique devs
* typing
* mypy
* mypy one more time
* test
* make fds not intersect in mockgpu between drivers
2025-02-25 17:22:24 +03:00
nimlgen
b4c3780df0
hotfix: interop example (#9237)
* hotfix: interop example
* rm this
* fix
* fix ci mps
* atol rtol
* no uaf
2025-02-25 10:32:00 +03:00
Sieds Lykles
990c240b82
Stable pow gradient (#9226)
* Stable gradient
* More efficient
* Fix and test for +-inf
* cleaner
* skip webgpu test
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-24 20:54:26 -05:00