chenyu
184030168d
fix aten.reflection_pad2d ( #9289 )
...
tested the torch doc example
2025-02-27 15:53:46 -05:00
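The reflection-pad semantics being fixed here can be illustrated with NumPy's `np.pad(..., mode="reflect")`, which follows the same convention as `aten.reflection_pad2d` (interior values are mirrored across each edge, the edge value itself is not repeated). A minimal sketch, not the actual backend code:

```python
import numpy as np

# Reflection padding mirrors interior values across each edge without
# repeating the edge value itself (same convention as aten.reflection_pad2d).
x = np.arange(9, dtype=np.float32).reshape(3, 3)
padded = np.pad(x, 1, mode="reflect")
print(padded.shape)  # (5, 5)
print(padded[0])     # top row is row 1 of x reflected: [4. 3. 4. 5. 4.]
```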
chenyu
0de6585df0
fix aten.normal_ arg ( #9288 )
...
should be mean and std.
2025-02-27 15:36:25 -05:00
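For context on the arg order: the mean comes first, then the standard deviation. A quick NumPy sketch (the `normal_` helper here is hypothetical, not tinygrad code) checks that order statistically:

```python
import numpy as np

rng = np.random.default_rng(0)

def normal_(shape, mean=0.0, std=1.0):
    # arg order matters: loc is the mean, scale is the std
    return rng.normal(loc=mean, scale=std, size=shape)

s = normal_((100_000,), mean=3.0, std=2.0)
print(s.mean(), s.std())  # close to 3.0 and 2.0
```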
chenyu
8ee2b460ee
Tensor.var_mean ( #9287 )
2025-02-27 15:15:31 -05:00
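`Tensor.var_mean` presumably mirrors `torch.var_mean`: one call returning both the variance (with Bessel's correction by default) and the mean. A NumPy sketch of the assumed semantics:

```python
import numpy as np

def var_mean(x, correction=1):
    # torch.var_mean-style: returns (variance, mean); correction=1 gives
    # the unbiased sample variance (divide by N - 1)
    return x.var(ddof=correction), x.mean()

v, m = var_mean(np.array([1.0, 2.0, 3.0, 4.0]))
print(v, m)  # 1.666... 2.5
```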
qazal
cdf66cc67f
test: recompute expanded CAST ( #9286 )
...
* those views should merge
* diff cleanup
* gpu
* put it behind CAST_AFTER_EXPAND
2025-02-27 19:22:17 +01:00
nimlgen
43e60914f3
init torch hooking ( #9284 )
...
* smth
* mv
* prof wk
* revert and move
* fix
* nvprof
* fix and no print much
2025-02-27 19:36:55 +03:00
George Hotz
387ea41e99
increase speed of torch mnist: use gradient api ( #9282 )
2025-02-27 11:57:41 +08:00
Priyank Patel
a0764f0dc0
(bounty) Make mnist training run with torch backend ( #9233 )
...
* yml changes
* torch backend remove meta decomps and add test
* torch backend bump timeout for tests
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-27 11:32:25 +08:00
George Hotz
67ba073c55
hotfix: test accuracy in beautiful_mnist_torch
2025-02-27 11:18:59 +08:00
George Hotz
9088125a6a
a lil more torch ( #9280 )
2025-02-27 11:12:20 +08:00
George Hotz
b6a14911c8
start torch.compile support ( #9279 )
2025-02-27 10:29:51 +08:00
chenyu
4342300eff
lower test_gemm_8192 amd to 70 ( #9277 )
...
flaky
2025-02-26 16:32:08 -05:00
nimlgen
c4c29c8acc
nv: parse elf attrs ( #9275 )
...
* better
* hm
* hm
* fixed
2025-02-26 23:21:57 +03:00
chenyu
6350725e2d
simpler leaky_relu ( #9271 )
...
rendered as `*(data0+alu0) = ((val0<0.0f)?(0.01f*val0):val0);` instead of two wheres.
possible to update rewrite rules too
2025-02-26 13:43:48 -05:00
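The single-conditional form behind that rendered C expression can be sketched in NumPy (illustrative only, not tinygrad's implementation):

```python
import numpy as np

def leaky_relu(x, neg_slope=0.01):
    # one conditional per element, matching the rendered
    # ((val<0.0f)?(0.01f*val):val) instead of two wheres
    return np.where(x < 0, neg_slope * x, x)

print(leaky_relu(np.array([-2.0, 0.0, 3.0])))  # -0.02, 0.0, 3.0
```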
Francis Lata
86b737a120
leakyrelu to leaky_relu ( #9270 )
2025-02-26 13:22:08 -05:00
chenyu
cd822bbe11
hotfix torch_grad.detach().cpu().numpy() in test_ops ( #9268 )
2025-02-26 12:27:35 -05:00
chenyu
49ca90df75
update test_ops backward tests ( #9267 )
...
instead of `(out+1).square().mean().backward()`, use forward.sum().gradient to get closer to the gradients
2025-02-26 12:09:24 -05:00
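The reasoning behind testing via the gradient of a summed forward pass: for an elementwise op, the gradient of `f(x).sum()` with respect to each input element is exactly the per-element derivative, so tests can compare against it directly. A finite-difference sketch in pure NumPy (not the tinygrad API):

```python
import numpy as np

def grad_of_sum(f, x, eps=1e-6):
    # central finite differences of f(x).sum() with respect to each element
    g = np.zeros_like(x)
    for i in range(x.size):
        xp, xm = x.copy(), x.copy()
        xp.flat[i] += eps
        xm.flat[i] -= eps
        g.flat[i] = (f(xp).sum() - f(xm).sum()) / (2 * eps)
    return g

x = np.array([1.0, 2.0, 3.0])
print(grad_of_sum(np.square, x))  # close to [2. 4. 6.], the elementwise derivative 2*x
```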
chenyu
aaf0a8069f
xor -> bitwise_xor ( #9264 )
2025-02-26 10:21:14 -05:00
George Hotz
2158dc4849
full fix for as_strided in torch backend ( #9257 )
...
* fixes from chatgpt for torch backend
* shrink support
* add stride support
* comment cleanup
* a few more
* work
* import the stream hack
* llvm multi auto
2025-02-26 22:34:05 +08:00
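`as_strided` reinterprets an existing buffer with an arbitrary shape and strides, so a view can alias the same elements repeatedly. NumPy's `np.lib.stride_tricks.as_strided` shows the semantics the torch backend has to reproduce (a sketch of the concept, unrelated to the actual fix):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# A sliding window over a 1-D buffer: both dims step by one element,
# so rows overlap and alias the same memory.
x = np.arange(5)
s = x.strides[0]  # bytes between consecutive elements
win = as_strided(x, shape=(3, 3), strides=(s, s))
print(win)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]
```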
qazal
f60f997bf7
viz ui fixes [pr] ( #9261 )
2025-02-26 14:52:18 +01:00
qazal
bfd1e55bda
show zoom to fit button in VIZ if graph isn't in view [pr] ( #9258 )
...
* show zoom to fit button in VIZ if graph isn't in view [pr]
* select #render
2025-02-26 14:20:39 +01:00
qazal
f70bad42ce
minor becomes_map cleanup + comments [pr] ( #9256 )
...
* substitute assign source for KERNEL + comments [pr]
* minor becomes_map cleanup + comments [pr]
2025-02-26 12:36:27 +01:00
George Hotz
7780393460
rig up torch's testing framework [pr] ( #9254 )
...
* rig up torch's testing framework [pr]
* support more movement ops
* dec on expand
* fix tests
* work
* fix tests
* a few more
* decomps + opt hook
* installed pytest
2025-02-26 18:46:22 +08:00
qazal
b3755370ae
substitute assign source for KERNEL + comments [pr] ( #9255 )
2025-02-26 11:44:29 +01:00
qazal
941559098b
do not lockup VIZ when rendering big graphs [pr] ( #8795 )
...
* new viz renderer
* aesthetics
* progress message
* pruning + timeout at 2s
2025-02-26 09:15:26 +01:00
qazal
e162aa862d
is_realized only if buffer is allocated ( #9253 )
...
* is_realized only if the buffer is allocated
* fix the image check too
* assert test_lil_model after ExecItems run
2025-02-26 08:58:08 +01:00
George Hotz
b603af373e
run some tests from torch [pr] ( #9252 )
...
* run some tests from torch [pr]
* yml
* wrap_out
* clean up for the new people
* a lil more
2025-02-26 15:42:22 +08:00
George Hotz
3f4eb9006a
test for device mismatch [pr] ( #9250 )
...
* test for device mismatch [pr]
* fix bert
2025-02-26 13:06:33 +08:00
Sieds Lykles
9c4d9d9f10
Acc first ( #9232 )
...
* put acc in front of the add chain
* handle the other case
* Make loop collapse more generic
* Remove mulacc_unrolled
* Actually remove it
---------
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-25 22:10:15 -05:00
chenyu
979e84f30e
RESET_STEP in bert setup and beam ( #9248 )
...
running dev_beam might OOM without it, but it runs fine in a real run.
2025-02-25 19:15:10 -05:00
nimlgen
2676c9d46e
dsp: raise exec errors as RuntimeError for beam ( #9246 )
2025-02-25 19:22:35 +03:00
nimlgen
70db8c3003
hcq: dyn alloc signals ( #9238 )
...
* hcq: dyn alloc signals
* types and unique devs
* typing
* mypy
* mypy one more time
* test
* make fds not intersect between drivers in mockgpu
2025-02-25 17:22:24 +03:00
chenyu
6610ad58ab
hotfix bert no shard with only one device ( #9243 )
...
`LLVM=1 BERT_SIZE="tiny" DEFAULT_FLOAT=HALF BENCHMARK=5 MODEL="bert" python3 examples/mlperf/model_train.py` runs for me with this. It should not have failed with a single-device shard, though.
2025-02-25 09:05:11 -05:00
qazal
bba9c22f53
implement the new subbuffer spec for DISK [pr] ( #9241 )
2025-02-25 13:36:23 +01:00
qazal
48dfed064a
remove const/var from the kernel graph [pr] ( #9240 )
2025-02-25 12:21:55 +01:00
nimlgen
b4c3780df0
hotfix: interop example ( #9237 )
...
* hotfix: interop example
* rm this
* fix
* fix ci mps
* atol rtol
* no uaf
2025-02-25 10:32:00 +03:00
chenyu
8c7be428e5
update bert BS to 78 ( #9236 )
...
fits 78 now. about 215 tflops on green
2025-02-24 22:47:35 -05:00
Sieds Lykles
990c240b82
Stable pow gradient ( #9226 )
...
* Stable gradient
* More efficient
* Fix and test for +-inf
* cleaner
* skip webgpu test
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-24 20:54:26 -05:00
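One way pow gradients blow up: the derivative of `x**y` with respect to `y` is `x**y * log(x)`, which produces `nan` once `x <= 0` even when the limit is finite. A masking sketch in NumPy (illustrative only; the actual rewrite may differ):

```python
import numpy as np

def pow_grad_wrt_y(x, y):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    # np.where evaluates both branches, so feed log() a safe input first
    safe_x = np.where(x > 0, x, 1.0)
    return np.where(x > 0, x**y * np.log(safe_x), 0.0)

print(float(pow_grad_wrt_y(0.0, 3.0)))  # 0.0, where the naive formula gives nan
print(float(pow_grad_wrt_y(2.0, 3.0)))  # 8 * ln(2), about 5.545
```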
chenyu
731d14e718
hotfix bump testmetal2 timeout-minutes to 20 ( #9235 )
...
setup is taking too long
2025-02-24 20:23:56 -05:00
qazal
cbfe95d306
bring cast before view back ( #9230 )
...
* bring cast before view back
* tune it to only trigger on expands
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-25 01:50:39 +02:00
chenyu
90c3ed17c5
move cast to before softmax in attention ( #9213 )
...
* move cast to before softmax in attention
saves some memory because exp (which is kept for backward) is done in half. training bert seems fine and can fit BS=78 now (up from 66)
* test
2025-02-24 17:24:59 -05:00
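The memory saving comes from the exp activations kept for backward being stored at half width once the input is cast before softmax. A NumPy sketch of the idea (not the attention code itself):

```python
import numpy as np

def softmax(x):
    # stable softmax; intermediates (including the saved exp) stay in x's dtype
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

x = np.random.default_rng(0).standard_normal((4, 128)).astype(np.float32)
out_full = softmax(x)                     # float32 intermediates
out_half = softmax(x.astype(np.float16))  # cast first: half the bytes

print(out_half.nbytes, out_full.nbytes)   # 1024 2048
```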
geohotstan
f0b24d230c
add test_onnx_ops.py ( #8569 )
...
* boom
* fix webgpu
* use exact variable names in test so that AI can read easier
* add tag for specific test name like test a specific dtype
* fix ruff
* astype everything
* dtype in array creation
* just arange
* is 67% considered fixed?
* move test up
* small cleanups
* share function
* add qgemm as well
* add qgemm too
* make sure qgemm comes out as int
* take out qgemm for now
* fixed test
* add correct qgemm
* addressing feedback here too, early naive fix for now
* simplify bias and c to be minimalistic enough to test correctness
* refactored qlinearops
* maybe these asserts aren't the best..
* fix test
* updated tests to cover new ops
* try to add to CI
* move test_onnx_ops into testextra/
* more attention tests
* qlinear_add atol=1
* attention still not fullllllly correct
* it is what it is
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-24 16:15:22 -05:00
nimlgen
56288243e6
metal PyTorch interop ( #9229 )
...
* add from_blob support to mps cuda
* objc_id
* metal pytorch interop
* fix comments
---------
Co-authored-by: George Hotz <geohot@gmail.com>
2025-02-24 22:36:08 +03:00
qazal
687d157906
delete cast early folding from ops [pr] ( #9228 )
2025-02-24 19:00:51 +01:00
George Hotz
c9493e41a6
reorder expand ( #9051 )
...
* reorder expand
* symbolic ops needs resolve here
* s/arg/st + whitespace
* viz
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2025-02-24 13:55:47 +01:00
qazal
14aa2395d0
allow VIEW(BUFFER) in Tensor UOps [pr] ( #9210 )
...
* allow VIEW(BUFFER) in Tensor UOps [pr]
* still reshapes
* update becomes_map tests
* bring copy folder to the scheduler
* lint
* only sgd left
* optimizer assign
* 13 kernels
* rename to test_reorder_expand + assert VIEW
2025-02-24 13:06:15 +01:00
nimlgen
1d06d61b16
from_blob for cuda ( #9223 )
...
* from_blob for cuda
* maybe docs?
* minor docs
* example
* waiting 9224
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-24 14:02:06 +03:00
George Hotz
fc32ff80d6
torch and numpy dtype interop [pr] ( #9224 )
...
* torch and numpy dtype interop [pr]
* less lines
* order
2025-02-24 18:26:49 +08:00
George Hotz
24615db5f5
hotfix: torch cuda interop example
2025-02-24 09:02:48 +00:00
George Hotz
fd731e740a
hotfix: add note on backend2.py
2025-02-24 11:23:03 +08:00
albanD
f2dd9c1562
simplify c++ code ( #9221 )
2025-02-24 11:04:41 +08:00