nimlgen
9bd13de44c
lower test_gemv_4096_16384 to 750 for red (#9367)
2025-03-05 22:44:48 +03:00
uuuvn
b75f307234
amd: autogen ip bases (#9360)
2025-03-05 22:30:38 +03:00
chenyu
2cb2fce8d9
lower test_gemm_8192 amd_tflops to 65 (#9364)
2025-03-05 14:06:11 -05:00
nimlgen
14c88abf27
add some options to allreduce bench (#9348)
2025-03-04 23:46:36 +03:00
Anish Umale
bafa40fe12
Tiny backend test_ops fix part1 (#9338)
* extract name methods from https://github.com/tinygrad/tinygrad/pull/9302
* t.grad.numpy() -> t.grad.cpu().numpy()
* revert TORCH_DEBUG change
* revert dtype change in aten.sum
2025-03-03 12:36:51 -05:00
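[Editor's note: the `.cpu()` change in #9338 is the standard torch idiom; `.numpy()` only works on host tensors. A minimal illustration:]

```python
import torch

t = torch.ones(3, requires_grad=True)
t.sum().backward()
# .numpy() requires a CPU tensor; .cpu() is a no-op when already on the host,
# so t.grad.cpu().numpy() is safe for both CPU and accelerator backends.
grad = t.grad.cpu().numpy()
```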
George Hotz
0d4ba7dd87
import tinygrad.frontend.torch (#9337)
* import tinygrad.frontend.torch
* type ignore
2025-03-04 00:15:29 +08:00
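[Editor's note: a hedged usage sketch of the import in #9337; the "tiny" device name is an assumption based on the TINY_BACKEND tests mentioned elsewhere in this log, not stated in the commit itself:]

```python
import torch
import tinygrad.frontend.torch  # importing registers the tinygrad backend with torch

# "tiny" as the torch device name is an assumption for illustration
x = torch.ones(4, device="tiny")
print((x + x).cpu().numpy())
```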
qazal
23084fd850
merge merge_views and remove_movement_ops [pr] (#9333)
* merge merge_views and remove_movement_ops [pr]
* fix that assert
2025-03-03 12:38:59 +01:00
George Hotz
ece0a0f305
use empty for test instead of rand (#9332)
2025-03-03 16:19:06 +08:00
George Hotz
2cc4cb74f0
reorder binops (#9328)
* reorder binops
* test improvements + fix string tests
* ugh, okay this
2025-03-03 14:58:18 +08:00
chenyu
146eb73790
fix Tensor.view with a tuple arg (#9330)
2025-03-02 23:35:23 -05:00
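[Editor's note: a minimal sketch of the #9330 fix, assuming torch-like `view` semantics where both call styles are accepted:]

```python
from tinygrad import Tensor

t = Tensor.arange(6)
# both the varargs form and the tuple form should now produce the same shape
assert t.view(2, 3).shape == t.view((2, 3)).shape == (2, 3)
```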
chenyu
ba4b8c2c23
Tensor.copysign (#9329)
2025-03-02 21:33:49 -05:00
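[Editor's note: a hedged sketch of the new op in #9329, assuming torch-like semantics: magnitude from self, sign from the other operand:]

```python
from tinygrad import Tensor

out = Tensor([1.0, -2.0, 3.0]).copysign(Tensor([-1.0, 1.0, -1.0]))
print(out.numpy())  # expected: [-1.  2. -3.]
```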
nimlgen
8cae00833c
flaky test in ci (#9321)
2025-03-02 16:27:22 +03:00
Ali Ladjevardi
00028e87bb
Failing test for not realizing intermediate expand in multi-GPU (#9320)
2025-03-02 12:54:48 +01:00
George Hotz
ba97fd0b9c
hotfix: add test/external/external_benchmark_disk_raw
2025-03-02 02:32:15 +00:00
chenyu
cc2bbb0bf1
Tensor.isfinite (#9316)
2025-03-01 19:58:56 -05:00
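[Editor's note: a hedged sketch of #9316, assuming numpy/torch-like semantics (False for inf and nan):]

```python
from tinygrad import Tensor

x = Tensor([1.0, float("inf"), float("-inf"), float("nan")])
print(x.isfinite().numpy())  # expected: [ True False False False]
```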
geohotstan
d9ec05cea6
Test Onnx quantization behavior (#9301)
* add DynamicDequantizeLinear and corresponding tests
* wow qlinearops are round away from zero
* this passes locally...
* again
* try
* try separate test
* round to even again
* also add QLinearMul
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-01 19:21:58 -05:00
chenyu
fe0f860209
update test_ops for tensors from torch (#9308)
a few detach().numpy() -> detach().cpu().numpy()
2025-02-28 15:57:25 -05:00
chenyu
38d7aae3b7
onnx fmod (#9307)
2025-02-28 14:09:22 -05:00
chenyu
7c7db78feb
support float mod (#9306)
also added spec check on Ops.MOD to be ints only
2025-02-28 13:33:58 -05:00
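[Editor's note: the fmod/mod distinction in #9307 and #9306 comes down to sign conventions; plain Python shows the difference (an illustration of the semantics, not tinygrad's implementation):]

```python
import math

# mod follows the sign of the divisor (Python %, ONNX Mod with fmod=0)
print((-7.5) % 2.0)          # 0.5
# fmod follows the sign of the dividend (C fmod, ONNX Mod with fmod=1)
print(math.fmod(-7.5, 2.0))  # -1.5
```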
chenyu
90808e2dd0
div rounding_mode (#9304)
2025-02-28 11:38:25 -05:00
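[Editor's note: a hedged sketch of the #9304 API, assuming torch-compatible rounding_mode values:]

```python
from tinygrad import Tensor

a, b = Tensor([7, -7]), Tensor([2, 2])
print(a.div(b).numpy())                         # true division: [ 3.5 -3.5]
print(a.div(b, rounding_mode="trunc").numpy())  # round toward zero: [ 3 -3]
print(a.div(b, rounding_mode="floor").numpy())  # round toward -inf: [ 3 -4]
```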
chenyu
3ae66e59a3
least_upper_float is at least default_float (#9303)
* least_upper_float is at least default_float
en route for div rounding mode. dtype of true int division would change from int32 to default_float, which matches torch too.
* fix bert acc
2025-02-28 10:41:56 -05:00
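[Editor's note: a small sketch of the promotion rule #9303 describes; the printed dtype assumes the default float is float32:]

```python
from tinygrad import Tensor, dtypes

a = Tensor([1], dtype=dtypes.int32)
b = Tensor([2], dtype=dtypes.int32)
# true division of ints now promotes to the default float, matching torch
print((a / b).dtype)  # expected: dtypes.float32
```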
Eitan Turok
d657d5f754
[Bounty] Vectorize Transcendental (#9058)
* init
* cast everything right
* more casting
* install pillow in test
* quick tests
* simplify
* quick tests
* delete test
* tests
* fix import error
* add vec to ldexp3k
* vec for bitcast
* some helper tests
* high level tests
* clean tests
* change tolerance so cuda passes
* ruff passes
* remove tests for transcendental helpers
* ruff passes
* make exponent in power vectorized
* fix pow test
* add newline
* add vec dtype to ilogb2k
* comment + clean up
* ruff
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-28 15:47:25 +08:00
qazal
cdf66cc67f
test: recompute expanded CAST (#9286)
* those views should merge
* diff cleanup
* gpu
* put it behind CAST_AFTER_EXPAND
2025-02-27 19:22:17 +01:00
chenyu
4342300eff
lower test_gemm_8192 amd to 70 (#9277)
flaky
2025-02-26 16:32:08 -05:00
Francis Lata
86b737a120
leakyrelu to leaky_relu (#9270)
2025-02-26 13:22:08 -05:00
chenyu
cd822bbe11
hotfix torch_grad.detach().cpu().numpy() in test_ops (#9268)
2025-02-26 12:27:35 -05:00
chenyu
49ca90df75
update test_ops backward tests (#9267)
instead of `(out+1).square().mean().backward()`, use forward.sum().gradient to get closer to the gradients
2025-02-26 12:09:24 -05:00
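[Editor's note: a hedged sketch of the pattern #9267 describes, assuming tinygrad's `Tensor.gradient` API:]

```python
from tinygrad import Tensor

x = Tensor([1.0, 2.0, 3.0])
out = x.relu()
# gradient of out.sum() w.r.t. x, without routing through an extra loss transform
dx, = out.sum().gradient(x)
print(dx.numpy())  # expected: [1. 1. 1.]
```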
chenyu
aaf0a8069f
xor -> bitwise_xor (#9264)
2025-02-26 10:21:14 -05:00
qazal
e162aa862d
is_realized only if buffer is allocated (#9253)
* is_realized only if the buffer is allocated
* fix the image check too
* assert test_lil_model after ExecItems run
2025-02-26 08:58:08 +01:00
George Hotz
3f4eb9006a
test for device mismatch [pr] (#9250)
* test for device mismatch [pr]
* fix bert
2025-02-26 13:06:33 +08:00
Sieds Lykles
9c4d9d9f10
Acc first (#9232)
* put acc in front of the add chain
* handle the other case
* Make loop collapse more generic
* Remove mulacc_unrolled
* Actually remove it
---------
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-25 22:10:15 -05:00
nimlgen
70db8c3003
hcq: dyn alloc signals (#9238)
* hcq: dyn alloc signals
* types and unique devs
* typing
* mypy
* mypy one more time
* test
* make fds not intersect in mockgpu between drivers
2025-02-25 17:22:24 +03:00
nimlgen
b4c3780df0
hotfix: interop example (#9237)
* hotfix: interop example
* rm this
* fix
* fix ci mps
* atol rtol
* no uaf
2025-02-25 10:32:00 +03:00
Sieds Lykles
990c240b82
Stable pow gradient (#9226)
* Stable gradient
* More efficient
* Fix and test for +-inf
* cleaner
* skip webgpu test
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-24 20:54:26 -05:00
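[Editor's note: context for #9226: d/dx x**y = y*x**(y-1), but a formulation that routes through exp(y*log(x)) is undefined for x <= 0, so the gradient needs a stable form there. A hedged check:]

```python
from tinygrad import Tensor

x = Tensor([-2.0, 0.0, 3.0], requires_grad=True)
(x ** 2.0).sum().backward()
# d/dx x**2 = 2x; a naive exp(y*log(x)) gradient would produce NaN at x <= 0
print(x.grad.numpy())  # expected: [-4.  0.  6.]
```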
qazal
cbfe95d306
bring cast before view back (#9230)
* bring cast before view back
* tune it to only trigger on expands
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-25 01:50:39 +02:00
chenyu
90c3ed17c5
move cast to before softmax in attention (#9213)
* move cast to before softmax in attention
saved some memory because exp (which is used for backward) is done in half. training bert seems fine and can fit BS=78 now (from 66)
* test
2025-02-24 17:24:59 -05:00
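[Editor's note: a hedged sketch of the reordering #9213 describes, with illustrative shapes:]

```python
from tinygrad import Tensor, dtypes

scores = Tensor.randn(8, 8)
before = scores.softmax(-1).cast(dtypes.half)  # softmax (and the exp kept for backward) in float32
after  = scores.cast(dtypes.half).softmax(-1)  # exp now runs in half -> smaller buffers kept for backward
```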
geohotstan
f0b24d230c
add test_onnx_ops.py (#8569)
* boom
* fix webgpu
* use exact variable names in test so that AI can read easier
* add tag for specific test name like test a specific dtype
* fix ruff
* astype everything
* dtype in array creation
* just arange
* is 67% considered fixed?
* move test up
* small cleanups
* share function
* add qgemm as well
* add qgemm too
* make sure qgemm comes out as int
* take out qgemm for now
* fixed test
* add correct qgemm
* addressing feedback here too, early naive fix for now
* simplify bias and c to be minimalistic enough to test correctness
* refactored qlinearops
* maybe these asserts aren't the best..
* fix test
* updated tests to cover new ops
* try to add to CI
* move test_onnx_ops into testextra/
* more attention tests
* qlinear_add atol=1
* attention still not fullllllly correct
* it is what it is
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-24 16:15:22 -05:00
George Hotz
c9493e41a6
reorder expand (#9051)
* reorder expand
* symbolic ops needs resolve here
* s/arg/st + whitespace
* viz
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2025-02-24 13:55:47 +01:00
qazal
14aa2395d0
allow VIEW(BUFFER) in Tensor UOps [pr] (#9210)
* allow VIEW(BUFFER) in Tensor UOps [pr]
* still reshapes
* update becomes_map tests
* bring copy folder to the scheduler
* lint
* only sgd left
* optimizer assign
* 13 kernels
* rename to test_reorder_expand + assert VIEW
2025-02-24 13:06:15 +01:00
qazal
d12efc95d4
support custom name function in viz [pr] (#9219)
* support custom name function in viz [pr]
* title case
* assert name count in test_track_rewrites_name_fxn
2025-02-24 03:03:25 +02:00
chenyu
b3ae664d5d
fix gradient of pow(t, int) (#9217)
semi revert some pow logic back to tensor. added direct gradient check because the backward in test_ops passed by luck
2025-02-23 17:42:09 -05:00
qazal
9db0ec46a7
simpler buf_uop [pr] (#9215)
* simpler buf_uop [pr]
* assert after realize it's buffer
2025-02-23 19:23:14 +01:00
qazal
81a71ae0f6
hotfix: skip test_exclude_const_metadata (#9208)
2025-02-22 23:26:04 +02:00
qazal
4578c3e8fd
simpler tensor metadata mapping + tests [pr] (#9203)
* simpler tensor metadata mapping + tests [pr]
* remove kernel metadata
* don't map nones
2025-02-22 20:18:46 +01:00
George Hotz
4e6665bda5
different way to write torch backend (#9197)
* different way to write torch backend
* both backends
* more work
* simpler code
* more work
* test both
* imply unwrap/wrap
* FORWARD_ONLY=1 TINY_BACKEND=1 python3 test/test_ops.py TestOps.test_add works
* ready to start making test_ops work in torch backend
* backward pass, TINY_BACKEND=1 python3 test/test_ops.py TestOps.test_add works
* FORWARD_ONLY=1 TINY_BACKEND=1 python3 test/test_ops.py TestOps.test_simple_conv2d works
* matmul backward is broken with as_strided
2025-02-22 14:42:26 +08:00
qazal
2eab8021fb
remove inputs+outputs attributes from ScheduleItem [pr] (#9192)
* remove inputs/outputs from ScheduleItem
* fix test_linearizer
* fix test_conv_shapetracker
* fix test_schedule + lint
* test_image_dtype + multitensor + search
2025-02-21 13:48:11 +01:00
chenyu
2e7c2780a9
CLANG -> CPU (#9189)
2025-02-20 18:03:09 -05:00
chenyu
3e22747799
run unit test on windows ci (#9187)
* factor out testing_minimal in setup.py [pr]
* testing_unit + windows
2025-02-20 14:40:41 -05:00
chenyu
287de4ecc6
use torch in test_gradient (#9186)
used torch.autograd.grad, but not sure if it can be a template like jax
2025-02-20 12:26:11 -05:00
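[Editor's note: a minimal sketch of the cross-check #9186 describes, using torch.autograd.grad as the reference:]

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
(dx,) = torch.autograd.grad((x * x).sum(), [x])
print(dx)  # tensor([2., 4., 6.])
```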
George Hotz
caee42e8a6
Revert "name from uops [pr] ( #9151 )" ( #9154 )
This reverts commit 28897be9a2.
2025-02-18 16:06:44 +08:00