Commit Graph

4667 Commits

Author SHA1 Message Date
George Hotz
15ee742afa add get_children_map to uop (#9470)
* add get_children_map to uop

* update_children

* fix new children
2025-03-17 14:36:13 +08:00
George Hotz
cb7a7f69c7 quantization preprocessor from DSP, should be universal (#9437)
* quantization preprocessor from DSP, should be universal

* touchups

* fix tests
2025-03-15 07:49:37 +08:00
chenyu
99b0287e4e add GROUP and GROUPTOP to test_arange (#9432)
it does not grow quadratically, but it's not 0 ops now
2025-03-13 11:28:38 -04:00
qazal
90ffa9bd45 swizzle without buffer ops try 2 [pr] (#9427)
* add DONT_PUSH_VIEWS to matchers

* swizzle without buffer ops try 2 [pr]

* swizzle reduceop

* simple failing test

* fix failing test

* s/on/for
2025-03-13 10:00:40 +01:00
chenyu
22fc0a2e36 bert sum acc in half (#9412)
also BS=96
2025-03-11 23:03:15 -04:00
George Hotz
e174c6c3bc new devectorizer (#9331)
* new devectorizer

* lidx

* test linearizer passes

* fix images

* fix unfoldable image load

* delete unused

* improve fix_unfoldable_image_load

* working for image

* fixup types

* fixup transcendental

* cast_vec

* cleaner transcendental

* skip failing test

* err, flip that

* not devec

* sqrt
2025-03-11 18:47:56 +08:00
George Hotz
2780e2027e devectorize prereqs [pr] (#9404)
2025-03-11 12:33:29 +08:00
chenyu
01e8b60911 acc_dtype -> dtype (#9402)
matched numpy and torch
2025-03-10 16:05:30 -04:00
qazal
59dfb234eb replace hardcoded ast with tensors in TestSwizzle [pr] (#9401)
2025-03-10 19:33:57 +01:00
geohotstan
1d64c12f2b add Topk to tensor (#9343)
* terrible but somewhat working impl

* linux behaves differently than macos?

* slightly better impl

* small clean up; haven't figured this out yet

* better

* torch has different behavior on linux and macos for duplicated values

* add sum docs

* fix test

* add torch return_type test

* add an exception test

* wrap_fxn instead, and move op lower in order

* better repeated values test

* rerun ci
2025-03-09 20:01:42 -04:00
qazal
a1f41fadf6 test_schedule cleanups + add DONT_GROUP_REDUCES [pr] (#9392)
* test_schedule cleanups + add DONT_GROUP_REDUCES [pr]

* replace with test_swizzle_reduceop

* delete duplicate tests

* test_allow_push_permutes

* one kernel tests
2025-03-09 15:01:08 +01:00
qazal
286b480f82 do not replace assign with the offset buffer [pr] (#9387)
2025-03-08 11:57:44 +01:00
qazal
0d2762c010 prep refactor for adding buffer ops last [pr] (#9383)
* prep refactor for adding buffer ops last [pr]

* freeze buffers

* add swizzle_reduceop

* shape for reduceop_view_right

* simpler elementwise_view_right

* add shapetracker to const

* only const

* from process replay
2025-03-08 08:00:14 +01:00
nimlgen
243078dda9 am: optimize tlb usage (#9049)
* am: optimize tlb usage

* fixes

* comments

* tiny
2025-03-07 19:37:29 +03:00
geohotstan
088d86691b fix onnx gather and onnx auto_pad VALID mode (#9375)
* fix gather and auto_pad

* long -> int64
2025-03-07 10:27:23 -05:00
hooved
136cf7b8b1 hotfix: load >2 GiB from disk on macOS (#9361)
* enable loading >2 GiB buffer from disk on macOS

* handle None case raised by mypy

* add test

* revert fix to repro bug in CI

* tell CI to run a unit test for macOS

* reapply fix
2025-03-07 14:51:58 +08:00
nimlgen
9bd13de44c lower test_gemv_4096_16384 to 750 for red (#9367)
2025-03-05 22:44:48 +03:00
uuuvn
b75f307234 amd: autogen ip bases (#9360)
2025-03-05 22:30:38 +03:00
chenyu
2cb2fce8d9 lower test_gemm_8192 amd_tflops to 65 (#9364)
2025-03-05 14:06:11 -05:00
nimlgen
14c88abf27 add some options to allreduce bench (#9348)
2025-03-04 23:46:36 +03:00
Anish Umale
bafa40fe12 Tiny backend test_ops fix part1 (#9338)
* extract name methods from https://github.com/tinygrad/tinygrad/pull/9302

* t.grad.numpy() -> t.grad.cpu().numpy()

* revert TORCH_DEBUG change

* revert dtype change in aten.sum
2025-03-03 12:36:51 -05:00
George Hotz
0d4ba7dd87 import tinygrad.frontend.torch (#9337)
* import tinygrad.frontend.torch

* type ignore
2025-03-04 00:15:29 +08:00
qazal
23084fd850 merge merge_views and remove_movement_ops [pr] (#9333)
* merge merge_views and remove_movement_ops [pr]

* fix that assert
2025-03-03 12:38:59 +01:00
George Hotz
ece0a0f305 use empty for test instead of rand (#9332)
2025-03-03 16:19:06 +08:00
George Hotz
2cc4cb74f0 reorder binops (#9328)
* reorder binops

* test improvements + fix string tests

* ugh, okay this
2025-03-03 14:58:18 +08:00
chenyu
146eb73790 fix Tensor.view with a tuple arg (#9330)
2025-03-02 23:35:23 -05:00
chenyu
ba4b8c2c23 Tensor.copysign (#9329)
2025-03-02 21:33:49 -05:00
nimlgen
8cae00833c flaky test in ci (#9321)
2025-03-02 16:27:22 +03:00
Ali Ladjevardi
00028e87bb Failing test for not realizing intermediate expand in multi-GPU (#9320)
2025-03-02 12:54:48 +01:00
George Hotz
ba97fd0b9c hotfix: add test/external/external_benchmark_disk_raw
2025-03-02 02:32:15 +00:00
chenyu
cc2bbb0bf1 Tensor.isfinite (#9316)
2025-03-01 19:58:56 -05:00
geohotstan
d9ec05cea6 Test Onnx quantization behavior (#9301)
* add DynamicDequantizeLinear and corresponding tests

* wow qlinearops are round away from zero

* this passes locally...

* again

* try

* try separate test

* round to even again

* also add QLinearMul

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-01 19:21:58 -05:00
chenyu
fe0f860209 update test_ops for tensors from torch (#9308)
a few detach().numpy() -> detach().cpu().numpy()
2025-02-28 15:57:25 -05:00
chenyu
38d7aae3b7 onnx fmod (#9307)
2025-02-28 14:09:22 -05:00
chenyu
7c7db78feb support float mod (#9306)
also added spec check on Ops.MOD to be ints only
2025-02-28 13:33:58 -05:00
chenyu
90808e2dd0 div rounding_mode (#9304)
2025-02-28 11:38:25 -05:00
chenyu
3ae66e59a3 least_upper_float is at least default_float (#9303)
* least_upper_float is at least default_float

en route for div rounding mode. dtype of true int division would change from int32 to default_float, which matches torch too.

* fix bert acc
2025-02-28 10:41:56 -05:00
Eitan Turok
d657d5f754 [Bounty] Vectorize Transcendental (#9058)
* init

* cast everything right

* more casting

* install pillow in test

* quick tests

* simplify

* quick tests

* delete test

* tests

* fix import error

* add vec to ldexp3k

* vec for bitcast

* some helper tests

* high level tests

* clean tests

* change tolerance so cuda passes

* ruff passes

* remove tests for transcendental helpers

* ruff passes

* make exponent in power vectorized

* fix pow test

* add newline

* add vec dtype to ilogb2k

* comment + clean up

* ruff

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-28 15:47:25 +08:00
qazal
cdf66cc67f test: recompute expanded CAST (#9286)
* those views should merge

* diff cleanup

* gpu

* put it behind CAST_AFTER_EXPAND
2025-02-27 19:22:17 +01:00
chenyu
4342300eff lower test_gemm_8192 amd to 70 (#9277)
flaky
2025-02-26 16:32:08 -05:00
Francis Lata
86b737a120 leakyrelu to leaky_relu (#9270)
2025-02-26 13:22:08 -05:00
chenyu
cd822bbe11 hotfix torch_grad.detach().cpu().numpy() in test_ops (#9268)
2025-02-26 12:27:35 -05:00
chenyu
49ca90df75 update test_ops backward tests (#9267)
instead of `(out+1).square().mean().backward()`, use forward.sum().gradient to get closer to the gradients
2025-02-26 12:09:24 -05:00
chenyu
aaf0a8069f xor -> bitwise_xor (#9264)
2025-02-26 10:21:14 -05:00
qazal
e162aa862d is_realized only if buffer is allocated (#9253)
* is_realized only if the buffer is allocated

* fix the image check too

* assert test_lil_model after ExecItems run
2025-02-26 08:58:08 +01:00
George Hotz
3f4eb9006a test for device mismatch [pr] (#9250)
* test for device mismatch [pr]

* fix bert
2025-02-26 13:06:33 +08:00
Sieds Lykles
9c4d9d9f10 Acc first (#9232)
* put acc in front of the add chain

* handle the other case

* Make loop collapse more generic

* Remove mulacc_unrolled

* Actually remove it

---------

Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-25 22:10:15 -05:00
nimlgen
70db8c3003 hcq: dyn alloc signals (#9238)
* hcq: dyn alloc signals

* types and uniqueue devs

* typing

* mypy

* mypy one more time

* test

* make fds to not intersect in mockgpu between drivers
2025-02-25 17:22:24 +03:00
nimlgen
b4c3780df0 hotfix: interop example (#9237)
* hotfix: interop example

* rm this

* fix

* fix ci mps

* atol rtol

* no uaf
2025-02-25 10:32:00 +03:00
Sieds Lykles
990c240b82 Stable pow gradient (#9226)
* Stable gradient

* More efficient

* Fix and test for +-inf

* cleaner

* skip webgpu test

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-24 20:54:26 -05:00