Commit Graph

8012 Commits

Author SHA1 Message Date
Francis Lata
86b737a120 leakyrelu to leaky_relu (#9270) 2025-02-26 13:22:08 -05:00
chenyu
cd822bbe11 hotfix torch_grad.detach().cpu().numpy() in test_ops (#9268) 2025-02-26 12:27:35 -05:00
chenyu
49ca90df75 update test_ops backward tests (#9267)
instead of `(out+1).square().mean().backward()`, use `forward.sum().gradient` to get closer to the gradients
2025-02-26 12:09:24 -05:00
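The change is easier to see side by side; a minimal sketch, assuming tinygrad's `Tensor.gradient` API (the variable names are illustrative, not the actual test code):

```python
from tinygrad import Tensor

x = Tensor.randn(3, 3, requires_grad=True)
out = x.relu()

# old pattern: backward through an extra (out+1).square().mean() wrapper
(out + 1).square().mean().backward()
grad_old = x.grad

# new pattern: take the gradient of forward.sum() directly
grad_new = out.sum().gradient(x)[0]
```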
chenyu
aaf0a8069f xor -> bitwise_xor (#9264) 2025-02-26 10:21:14 -05:00
George Hotz
2158dc4849 full fix for as_strided in torch backend (#9257)
* fixes from chatgpt for torch backend

* shrink support

* add stride support

* comment cleanup

* a few more

* work

* import the stream hack

* llvm multi auto
2025-02-26 22:34:05 +08:00
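For context, a minimal sketch of the `as_strided` semantics (explicit strides plus a storage offset, here producing overlapping views) that the torch backend's shrink and stride support has to reproduce; the values are illustrative:

```python
import torch

x = torch.arange(9.0)  # [0., 1., ..., 8.]
# shape (2, 2), strides (3, 1), storage offset 1: rows start at elements 1 and 4
y = torch.as_strided(x, (2, 2), (3, 1), 1)
print(y)  # tensor([[1., 2.], [4., 5.]])
```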
qazal
f60f997bf7 viz ui fixes [pr] (#9261) 2025-02-26 14:52:18 +01:00
qazal
bfd1e55bda show zoom to fit button in VIZ if graph isn't in view [pr] (#9258)
* show zoom to fit button in VIZ if graph isn't in view [pr]

* select #render
2025-02-26 14:20:39 +01:00
qazal
f70bad42ce minor becomes_map cleanup + comments [pr] (#9256)
* substitute assign source for KERNEL + comments [pr]

* minor becomes_map cleanup + comments [pr]
2025-02-26 12:36:27 +01:00
George Hotz
7780393460 rig up torch's testing framework [pr] (#9254)
* rig up torch's testing framework [pr]

* support more movement ops

* dec on expand

* fix tests

* work

* fix tests

* a few more

* decomps + opt hook

* installed pytest
2025-02-26 18:46:22 +08:00
qazal
b3755370ae substitute assign source for KERNEL + comments [pr] (#9255) 2025-02-26 11:44:29 +01:00
qazal
941559098b do not lock up VIZ when rendering big graphs [pr] (#8795)
* new viz renderer

* aesthetics

* progress message

* pruning + timeout at 2s
2025-02-26 09:15:26 +01:00
qazal
e162aa862d is_realized only if buffer is allocated (#9253)
* is_realized only if the buffer is allocated

* fix the image check too

* assert test_lil_model after ExecItems run
2025-02-26 08:58:08 +01:00
George Hotz
b603af373e run some tests from torch [pr] (#9252)
* run some tests from torch [pr]

* yml

* wrap_out

* clean up for the new people

* a lil more
2025-02-26 15:42:22 +08:00
George Hotz
3f4eb9006a test for device mismatch [pr] (#9250)
* test for device mismatch [pr]

* fix bert
2025-02-26 13:06:33 +08:00
Sieds Lykles
9c4d9d9f10 Acc first (#9232)
* put acc in front of the add chain

* handle the other case

* Make loop collapse more generic

* Remove mulacc_unrolled

* Actually remove it

---------

Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-25 22:10:15 -05:00
chenyu
979e84f30e RESET_STEP in bert setup and beam (#9248)
running dev_beam might OOM without it, but it runs fine in a real run.
2025-02-25 19:15:10 -05:00
nimlgen
2676c9d46e dsp: raise exec errors as RuntimeError for beam (#9246) 2025-02-25 19:22:35 +03:00
nimlgen
70db8c3003 hcq: dyn alloc signals (#9238)
* hcq: dyn alloc signals

* types and unique devs

* typing

* mypy

* mypy one more time

* test

* make fds not intersect in mockgpu between drivers
2025-02-25 17:22:24 +03:00
chenyu
6610ad58ab hotfix bert no shard with only one device (#9243)
`LLVM=1 BERT_SIZE="tiny" DEFAULT_FLOAT=HALF BENCHMARK=5 MODEL="bert" python3 examples/mlperf/model_train.py` runs for me with this. It should not fail with a single-device shard though.
2025-02-25 09:05:11 -05:00
qazal
bba9c22f53 implement the new subbuffer spec for DISK [pr] (#9241) 2025-02-25 13:36:23 +01:00
qazal
48dfed064a remove const/var from the kernel graph [pr] (#9240) 2025-02-25 12:21:55 +01:00
nimlgen
b4c3780df0 hotfix: interop example (#9237)
* hotfix: interop example

* rm this

* fix

* fix ci mps

* atol rtol

* no uaf
2025-02-25 10:32:00 +03:00
chenyu
8c7be428e5 update bert BS to 78 (#9236)
fits BS=78 now; about 215 TFLOPS on green
2025-02-24 22:47:35 -05:00
Sieds Lykles
990c240b82 Stable pow gradient (#9226)
* Stable gradient

* More efficient

* Fix and test for +-inf

* cleaner

* skip webgpu test

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-24 20:54:26 -05:00
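The instability being fixed comes from the base gradient d/dx x**a = a * x**(a-1), which blows up at x = 0 for a < 1 and at ±inf; a minimal sketch of the masking idea, assuming a `where`-based guard (illustrative, not the PR's actual rewrite):

```python
from tinygrad import Tensor

def pow_grad_base(x: Tensor, a: float, grad_out: Tensor) -> Tensor:
  raw = a * x ** (a - 1)                               # d/dx x**a
  safe = (x == 0).where(Tensor.zeros_like(raw), raw)   # mask the x == 0 blow-up
  return grad_out * safe
```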
chenyu
731d14e718 hotfix bump testmetal2 timeout-minutes to 20 (#9235)
setup is taking too long
2025-02-24 20:23:56 -05:00
qazal
cbfe95d306 bring cast before view back (#9230)
* bring cast before view back

* tune it to only trigger on expands

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-25 01:50:39 +02:00
chenyu
90c3ed17c5 move cast to before softmax in attention (#9213)
* move cast to before softmax in attention

saves some memory because exp (which is used for backward) is done in half. Training bert seems fine and can fit BS=78 now (up from 66)

* test
2025-02-24 17:24:59 -05:00
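A minimal sketch of the idea (not the actual attention code): casting the logits to half before softmax means the exp result saved for backward is stored in half:

```python
from tinygrad import Tensor, dtypes

def attn_weights(q: Tensor, k: Tensor) -> Tensor:
  logits = q @ k.transpose(-2, -1) * (q.shape[-1] ** -0.5)
  return logits.cast(dtypes.half).softmax(-1)  # cast moved before softmax, not after
```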
geohotstan
f0b24d230c add test_onnx_ops.py (#8569)
* boom

* fix webgpu

* use exact variable names in tests so that AI can read them more easily

* add tag for specific test name like test a specific dtype

* fix ruff

* astype everything

* dtype in array creation

* just arange

* is 67% considered fixed?

* move test up

* small cleanups

* share function

* add qgemm as well

* add qgemm too

* make sure qgemm comes out as int

* take out qgemm for now

* fixed test

* add correct qgemm

* addressing feedback here too, early naive fix for now

* simplify bias and c to be minimalistic enough to test correctness

* refactored qlinearops

* maybe these asserts aren't the best..

* fix test

* updated tests to cover new ops

* try to add to CI

* move test_onnx_ops into testextra/

* more attention tests

* qlinear_add atol=1

* attention still not fullllllly correct

* it is what it is

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-24 16:15:22 -05:00
nimlgen
56288243e6 metal PyTorch interop (#9229)
* add from_blob support to mps cuda

* objc_id

* metal pytorch interop

* fix comments

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2025-02-24 22:36:08 +03:00
qazal
687d157906 delete cast early folding from ops [pr] (#9228) 2025-02-24 19:00:51 +01:00
George Hotz
c9493e41a6 reorder expand (#9051)
* reorder expand

* symbolic ops needs resolve here

* s/arg/st + whitespace

* viz

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-02-24 13:55:47 +01:00
qazal
14aa2395d0 allow VIEW(BUFFER) in Tensor UOps [pr] (#9210)
* allow VIEW(BUFFER) in Tensor UOps [pr]

* still reshapes

* update becomes_map tests

* bring copy folder to the scheduler

* lint

* only sgd left

* optimizer assign

* 13 kernels

* rename to test_reorder_expand + assert VIEW
2025-02-24 13:06:15 +01:00
nimlgen
1d06d61b16 from_blob for cuda (#9223)
* from_blob for cuda

* maybe docs?

* minor docs

* example

* waiting 9224

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-24 14:02:06 +03:00
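A minimal sketch of the interop, assuming `Tensor.from_blob` wraps a raw device pointer with a given shape/dtype/device (see the docs added in this PR for the exact signature):

```python
import torch
from tinygrad import Tensor, dtypes

a = torch.ones(4, 4, device="cuda")
torch.cuda.synchronize()  # make sure torch has finished writing the buffer
b = Tensor.from_blob(a.data_ptr(), (4, 4), dtype=dtypes.float32, device="CUDA")
print(b.numpy())  # shares memory with the torch tensor, no copy
```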
George Hotz
fc32ff80d6 torch and numpy dtype interop [pr] (#9224)
* torch and numpy dtype interop [pr]

* less lines

* order
2025-02-24 18:26:49 +08:00
George Hotz
24615db5f5 hotfix: torch cuda interop example 2025-02-24 09:02:48 +00:00
George Hotz
fd731e740a hotfix: add note on backend2.py 2025-02-24 11:23:03 +08:00
albanD
f2dd9c1562 simplify c++ code (#9221) 2025-02-24 11:04:41 +08:00
qazal
d12efc95d4 support custom name function in viz [pr] (#9219)
* support custom name function in viz [pr]

* title case

* assert name count in test_track_rewrites_name_fxn
2025-02-24 03:03:25 +02:00
chenyu
b3ae664d5d fix gradient of pow(t, int) (#9217)
semi-reverts some pow logic back to tensor. Added a direct gradient check because the backward in test_ops passed by luck
2025-02-23 17:42:09 -05:00
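A minimal sketch of such a direct gradient check, assuming `Tensor.gradient`; since d/dt t**3 = 3*t**2, the expected gradient at [2., 3.] is [12., 27.]:

```python
from tinygrad import Tensor

t = Tensor([2.0, 3.0], requires_grad=True)
g = (t ** 3).sum().gradient(t)[0]
print(g.numpy())  # ~[12., 27.]
```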
qazal
12b5b83821 set TRACK_MATCH_STATS=0 for real_strides [pr] (#9216) 2025-02-23 23:26:31 +02:00
qazal
9db0ec46a7 simpler buf_uop [pr] (#9215)
* simpler buf_uop [pr]

* assert after realize it's a buffer
2025-02-23 19:23:14 +01:00
qazal
898aafe6fd move split_reduceop to scheduler + enable it for multi (#9214)
* move split_reduceop to scheduler + enable it for multi

* merge r and _reduceop
2025-02-23 17:30:04 +01:00
ShikChen
05e3202fba remove unused memsize_to_str and minor cleanups [pr] (#9211)
* fix edge cases in memsize_to_str()

Inputs <= 1 now return "0.00 B" for 0 and "1.00 B" for 1, avoiding an
IndexError. Also, memsize_to_str(1000) now returns "1.00 KB" instead of
"1000.00 B".

Replaced the list comprehension with a next(...) generator for conciseness
and efficiency.

* simplify code using idiomatic python

- Remove the unused `memsize_to_str()` function in helpers.
- Use a tuple for checking multiple string prefixes/suffixes.
- Avoid unnecessary list construction by using iterables directly.
- Check None in @diskcache to ensure proper caching of falsy values.

* revert generators back to list comprehension

Sometimes building the list first can be faster; keep it as is.
2025-02-23 09:58:37 -05:00
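A minimal sketch of the fixed helper described in the first bullet (the function was ultimately removed as unused); the thresholds and formatting mirror the description above:

```python
def memsize_to_str(n: int) -> str:
  # next(...) over (divisor, prefix) pairs; defaults to "0.00 B" for n < 1
  return next((f"{n/d:.2f} {p}B" for d, p in [(1e9, "G"), (1e6, "M"), (1e3, "K"), (1, "")] if n >= d), "0.00 B")
```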
qazal
81a71ae0f6 hotfix: skip test_exclude_const_metadata (#9208) 2025-02-22 23:26:04 +02:00
chenyu
e0adb1fc76 really run test_ops with TINY_BACKEND in ci (#9206)
was failing with `line 1: pytest: command not found`
2025-02-22 15:51:24 -05:00
qazal
e6d20c47e3 simpler becomes_map update [pr] (#9201)
* simpler becomes_map update

* err, no metadata for device

* simpler tensor metadata mapping + tests [pr]

* remove kernel metadata

* don't map nones

* pruning

* linter
2025-02-22 20:50:58 +01:00
qazal
4578c3e8fd simpler tensor metadata mapping + tests [pr] (#9203)
* simpler tensor metadata mapping + tests [pr]

* remove kernel metadata

* don't map nones
2025-02-22 20:18:46 +01:00
qazal
b711c6343a no early return + allow childless const/bind/var in kernel graph [pr] (#9202) 2025-02-22 19:28:22 +01:00
George Hotz
97bc723538 torch backend works for ResNet-18 (#9200)
* torch backend progress, a few more functions

* resnet works

* pillow

* tv
2025-02-22 22:16:23 +08:00
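A minimal sketch of exercising the backend, assuming the `extra.torch_backend.backend` import path and the "tiny" device name (both are assumptions about how the backend registers itself):

```python
import torch
import extra.torch_backend.backend  # assumed path; registers tinygrad as a torch device

x = torch.randn(1, 3, 224, 224, device="tiny")
y = x.relu()
print(y.cpu().shape)  # torch.Size([1, 3, 224, 224])
```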
George Hotz
f92820d30d torch backend tests (#9198)
* torch backend tests

* pythonpath

* install ninja
2025-02-22 16:01:49 +08:00