Francis Lata
dc394e8214
Merge branch 'master' into retinanet_mlperf
2025-02-27 15:33:20 -05:00
qazal
cdf66cc67f
test: recompute expanded CAST ( #9286 )
...
* those views should merge
* diff cleanup
* gpu
* put it behind CAST_AFTER_EXPAND
2025-02-27 19:22:17 +01:00
chenyu
4342300eff
lower test_gemm_8192 amd to 70 ( #9277 )
...
flaky
2025-02-26 16:32:08 -05:00
Francis Lata
4fa62ba304
Merge branch 'master' into retinanet_mlperf
2025-02-26 13:27:35 -05:00
Francis Lata
86b737a120
leakyrelu to leaky_relu ( #9270 )
2025-02-26 13:22:08 -05:00
chenyu
cd822bbe11
hotfix torch_grad.detach().cpu().numpy() in test_ops ( #9268 )
2025-02-26 12:27:35 -05:00
chenyu
49ca90df75
update test_ops backward tests ( #9267 )
...
instead of `(out+1).square().mean().backward()`, use forward.sum().gradient to get closer to the gradients
2025-02-26 12:09:24 -05:00
Francis Lata
e0e50fc482
Merge branch 'master' into retinanet_mlperf
2025-02-26 15:43:05 +00:00
chenyu
aaf0a8069f
xor -> bitwise_xor ( #9264 )
2025-02-26 10:21:14 -05:00
qazal
e162aa862d
is_realized only if buffer is allocated ( #9253 )
...
* is_realized only if the buffer is allocated
* fix the image check too
* assert test_lil_model after ExecItems run
2025-02-26 08:58:08 +01:00
Francis Lata
e006ae24ea
Merge branch 'master' into retinanet_mlperf
2025-02-26 07:31:32 +00:00
George Hotz
3f4eb9006a
test for device mismatch [pr] ( #9250 )
...
* test for device mismatch [pr]
* fix bert
2025-02-26 13:06:33 +08:00
Sieds Lykles
9c4d9d9f10
Acc first ( #9232 )
...
* put acc in front of the add chain
* handle the other case
* Make loop collapse more generic
* Remove mulacc_unrolled
* Actually remove it
---------
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-25 22:10:15 -05:00
nimlgen
70db8c3003
hcq: dyn alloc signals ( #9238 )
...
* hcq: dyn alloc signals
* types and unique devs
* typing
* mypy
* mypy one more time
* test
* make fds not intersect between drivers in mockgpu
2025-02-25 17:22:24 +03:00
Francis Lata
30d5daa121
Merge branch 'master' into retinanet_mlperf
2025-02-25 10:32:34 +00:00
nimlgen
b4c3780df0
hotfix: interop example ( #9237 )
...
* hotfix: interop example
* rm this
* fix
* fix ci mps
* atol rtol
* no uaf
2025-02-25 10:32:00 +03:00
Sieds Lykles
990c240b82
Stable pow gradient ( #9226 )
...
* Stable gradient
* More efficient
* Fix and test for +-inf
* cleaner
* skip webgpu test
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-24 20:54:26 -05:00
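The stable-gradient fix above concerns the textbook formulas for `x**y`, whose gradient with respect to the exponent involves `ln(x)` and so produces nan/inf for non-positive bases. A minimal plain-Python sketch of the problem and one possible guard (illustrative only; the guarded convention here is an assumption, not tinygrad's exact rule):

```python
import math

def pow_grads(x: float, y: float) -> tuple[float, float]:
    # Textbook gradients of f(x, y) = x**y:
    #   df/dx = y * x**(y-1)
    #   df/dy = x**y * ln(x)
    # ln(x) is undefined for x <= 0: in IEEE float arithmetic,
    # 0**y * ln(0) evaluates to 0 * -inf = nan, so that branch is
    # guarded here (one possible convention for illustration).
    gx = y * x ** (y - 1)
    gy = x ** y * math.log(x) if x > 0 else 0.0
    return gx, gy

gx, gy = pow_grads(2.0, 3.0)
print(gx)  # 12.0 (= 3 * 2**2)
```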
qazal
cbfe95d306
bring cast before view back ( #9230 )
...
* bring cast before view back
* tune it to only trigger on expands
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-25 01:50:39 +02:00
chenyu
90c3ed17c5
move cast to before softmax in attention ( #9213 )
...
* move cast to before softmax in attention
saved some memory because exp (which is used for backward) is done in half. training bert seems fine and can fit BS=78 now (from 66)
* test
2025-02-24 17:24:59 -05:00
geohotstan
f0b24d230c
add test_onnx_ops.py ( #8569 )
...
* boom
* fix webgpu
* use exact variable names in test so that AI can read easier
* add tag for specific test name like test a specific dtype
* fix ruff
* astype everything
* dtype in array creation
* just arange
* is 67% considered fixed?
* move test up
* small cleanups
* share function
* add qgemm as well
* add qgemm too
* make sure qgemm comes out as int
* take out qgemm for now
* fixed test
* add correct qgemm
* addressing feedback here too, early naive fix for now
* simplify bias and c to be minimalistic enough to test correctness
* refactored qlinearops
* maybe these asserts aren't the best..
* fix test
* updated tests to cover new ops
* try to add to CI
* move test_onnx_ops into testextra/
* more attention tests
* qlinear_add atol=1
* attention still not fullllllly correct
* it is what it is
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-24 16:15:22 -05:00
George Hotz
c9493e41a6
reorder expand ( #9051 )
...
* reorder expand
* symbolic ops needs resolve here
* s/arg/st + whitespace
* viz
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2025-02-24 13:55:47 +01:00
qazal
14aa2395d0
allow VIEW(BUFFER) in Tensor UOps [pr] ( #9210 )
...
* allow VIEW(BUFFER) in Tensor UOps [pr]
* still reshapes
* update becomes_map tests
* bring copy folder to the scheduler
* lint
* only sgd left
* optimizer assign
* 13 kernels
* rename to test_reorder_expand + assert VIEW
2025-02-24 13:06:15 +01:00
qazal
d12efc95d4
support custom name function in viz [pr] ( #9219 )
...
* support custom name function in viz [pr]
* title case
* assert name count in test_track_rewrites_name_fxn
2025-02-24 03:03:25 +02:00
chenyu
b3ae664d5d
fix gradient of pow(t, int) ( #9217 )
...
semi-revert some pow logic back to tensor. added a direct gradient check because the backward in test_ops passed by luck
2025-02-23 17:42:09 -05:00
Francis Lata
2c3417dfce
Merge branch 'master' into retinanet_mlperf
2025-02-23 21:23:28 +00:00
qazal
9db0ec46a7
simpler buf_uop [pr] ( #9215 )
...
* simpler buf_uop [pr]
* assert that after realize it's a buffer
2025-02-23 19:23:14 +01:00
qazal
81a71ae0f6
hotfix: skip test_exclude_const_metadata ( #9208 )
2025-02-22 23:26:04 +02:00
qazal
4578c3e8fd
simpler tensor metadata mapping + tests [pr] ( #9203 )
...
* simpler tensor metadata mapping + tests [pr]
* remove kernel metadata
* don't map nones
2025-02-22 20:18:46 +01:00
George Hotz
4e6665bda5
different way to write torch backend ( #9197 )
...
* different way to write torch backend
* both backends
* more work
* simpler code
* more work
* test both
* imply unwrap/wrap
* FORWARD_ONLY=1 TINY_BACKEND=1 python3 test/test_ops.py TestOps.test_add works
* ready to start making test_ops work in torch backend
* backward pass, TINY_BACKEND=1 python3 test/test_ops.py TestOps.test_add works
* FORWARD_ONLY=1 TINY_BACKEND=1 python3 test/test_ops.py TestOps.test_simple_conv2d works
* matmul backward is broken with as_strided
2025-02-22 14:42:26 +08:00
qazal
2eab8021fb
remove inputs+outputs attributes from ScheduleItem [pr] ( #9192 )
...
* remove inputs/outputs from ScheduleItem
* fix test_linearizer
* fix test_conv_shapetracker
* fix test_schedule + lint
* test_image_dtype + multitensor + search
2025-02-21 13:48:11 +01:00
chenyu
2e7c2780a9
CLANG -> CPU ( #9189 )
2025-02-20 18:03:09 -05:00
chenyu
3e22747799
run unit test on windows ci ( #9187 )
...
* factor out testing_minimal in setup.py [pr]
* testing_unit + windows
2025-02-20 14:40:41 -05:00
chenyu
287de4ecc6
use torch in test_gradient ( #9186 )
...
used torch.autograd.grad, but not sure if it can be a template like jax
2025-02-20 12:26:11 -05:00
George Hotz
caee42e8a6
Revert "name from uops [pr] ( #9151 )" ( #9154 )
...
This reverts commit 28897be9a2.
2025-02-18 16:06:44 +08:00
George Hotz
28897be9a2
name from uops [pr] ( #9151 )
2025-02-18 15:52:03 +08:00
George Hotz
a4dab3ec3f
add name uop ( #9149 )
...
* add name uop, TODO: refactor renderer to use
* renderer uses name uop
* fix tests
* render
* ptx
2025-02-18 15:26:58 +08:00
George Hotz
df3b320f46
rewriter -> devectorizer [pr] ( #9147 )
2025-02-18 12:42:08 +08:00
chenyu
465421b525
fix Tensor.isclose ( #9143 )
...
many corner cases around inf and nan
2025-02-17 12:03:12 -05:00
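The inf/nan corner cases behind the `Tensor.isclose` fix above can be sketched in plain Python. This is a hypothetical reference implementation of the usual `|a - b| <= atol + rtol * |b|` rule with the `equal_nan` flag, not tinygrad's actual code:

```python
import math

def isclose(a: float, b: float, rtol: float = 1e-5, atol: float = 1e-8,
            equal_nan: bool = False) -> bool:
    # NaN compares unequal to everything unless equal_nan is set.
    if math.isnan(a) or math.isnan(b):
        return equal_nan and math.isnan(a) and math.isnan(b)
    # An infinity is only "close" to the identical infinity; the
    # |a - b| formula below would produce nan for inf - inf.
    if math.isinf(a) or math.isinf(b):
        return a == b
    return abs(a - b) <= atol + rtol * abs(b)

print(isclose(float("inf"), float("inf")))                  # True
print(isclose(float("inf"), float("-inf")))                 # False
print(isclose(float("nan"), float("nan")))                  # False
print(isclose(float("nan"), float("nan"), equal_nan=True))  # True
```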
qazal
36741cbbc1
enable real_size assert for test_conv_2x2_backward_one_view [pr] ( #9142 )
2025-02-17 17:53:44 +01:00
Ali Ladjevardi
35e9c4657b
Use proper units when printing beam time ( #9103 )
...
* use proper units when printing beam time
* refactor DEBUG=2
2025-02-17 23:41:38 +08:00
Clément Verrier
a7f91224eb
add Tensor.isclose() ( #8844 )
...
* add `Tensor.isclose()`
* support `equal_nan`
so as to match PyTorch's behavior
* update unit tests
* remove some tests temporarily
* re-enable one test
* re-enable other test
* try to fix failing tests during CI
* save one line of code
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-17 10:11:40 -05:00
qazal
660c034da6
KERNEL op try 3 ( #9061 )
...
* work
* tolerate shape, maybe this is ASSIGN(RESHAPE(BUF), KERNEL)
* err, it's not ASSIGN(BUF, KERNEL), it's ASSIGN(VIEW(BUF), KERNEL)
* burn the boats
* assign slightly works
* assign works
* cleanup + var_vals can exist
* fine image + fix metadata
* metadata, without making everything 30% slower
* diff pruning
* faster assign schedule
* add_buffer_ops stage
* add kernel_spec back
* add viz display
* more strict kernel_spec
2025-02-17 14:47:54 +01:00
George Hotz
4dd10d03b7
move is_increasing to ops [pr] ( #9134 )
2025-02-17 19:27:48 +08:00
George Hotz
1bf66d62cf
symbolic gets its own file [pr] ( #9132 )
2025-02-17 18:55:21 +08:00
George Hotz
bd694faf6c
factor out the expander logic [pr] ( #9131 )
2025-02-17 18:09:48 +08:00
quortus
5bdf0c7951
Bitcast constant folding 2.0 ( #9089 )
...
* Prevent const folding in test_payne_hanek_reduction
* Do not use list as a default parameter
* Bitcast constant folding
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-17 18:08:20 +08:00
quortus
2be4529f14
Test broken const folding wraparound behavior ( #9080 )
...
* Test broken const folding wraparound behavior
* Add repro for test_payne_hanek_reduction const folding bug
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-17 17:44:56 +08:00
quortus
638d925e4e
Prevent const folding in test_payne_hanek_reduction ( #9088 )
...
* Prevent const folding in test_payne_hanek_reduction
* Do not use list as a default parameter
2025-02-17 17:31:10 +08:00
George Hotz
9289425170
add ast to ProgramSpec + pre matcher [pr] ( #9128 )
...
* add ast to ProgramSpec + pre matcher [pr]
* cleaner cast + test fix
2025-02-17 16:39:14 +08:00
quortus
edf7213f34
Make bitcast to the same dtype noop ( #9121 )
2025-02-16 20:28:44 -05:00
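The same-dtype no-op case above can be illustrated with a minimal `struct`-based bitcast (an illustrative sketch, not tinygrad's implementation):

```python
import struct

def bitcast_f32_to_i32(x: float) -> int:
    # Reinterpret the raw 32 bits of a float32 as a signed int32.
    return struct.unpack("<i", struct.pack("<f", x))[0]

def bitcast_f32_to_f32(x: float) -> float:
    # Bits in, identical bits out: a bitcast between identical dtypes
    # is the identity, which is why it can be folded to a no-op.
    return struct.unpack("<f", struct.pack("<f", x))[0]

print(bitcast_f32_to_i32(1.0))  # 1065353216 (0x3f800000)
print(bitcast_f32_to_f32(1.0))  # 1.0
```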