Commit Graph

7991 Commits

nimlgen
b4c3780df0 hotfix: interop example (#9237)
* hotfix: interop example

* rm this

* fix

* fix ci mps

* atol rtol

* no uaf
2025-02-25 10:32:00 +03:00
chenyu
8c7be428e5 update bert BS to 78 (#9236)
fits 78 now. about 215 TFLOPS on green
2025-02-24 22:47:35 -05:00
Sieds Lykles
990c240b82 Stable pow gradient (#9226)
* Stable gradient

* More efficient

* Fix and test for +-inf

* cleaner

* skip webgpu test

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-24 20:54:26 -05:00
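A note on the math being stabilized (my gloss, assuming the usual exp/log lowering of pow; not text from the PR):

$$\frac{\partial}{\partial x}x^y = y\,x^{y-1},\qquad \frac{\partial}{\partial y}x^y = x^y \ln x$$

If pow is computed as exp(y log x), the naive x-gradient comes out as x^y * y / x, which divides by zero at x = 0 and turns finite true gradients into NaN at x = ±inf; a stable form computes y*x^(y-1) directly and masks the ln x term where it is undefined.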
chenyu
731d14e718 hotfix bump testmetal2 timeout-minutes to 20 (#9235)
setup is taking too long
2025-02-24 20:23:56 -05:00
qazal
cbfe95d306 bring cast before view back (#9230)
* bring cast before view back

* tune it to only trigger on expands

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-25 01:50:39 +02:00
chenyu
90c3ed17c5 move cast to before softmax in attention (#9213)
* move cast to before softmax in attention

saves some memory because exp (whose result is kept for backward) is done in half. training bert seems fine and can fit BS=78 now (from 66)

* test
2025-02-24 17:24:59 -05:00
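A minimal sketch of the idea (illustrative, not tinygrad's attention code): casting the scores to half before the softmax means the exp intermediate kept for backward is stored in half rather than float.

```python
from tinygrad import Tensor, dtypes

def attend(q:Tensor, k:Tensor, v:Tensor) -> Tensor:
  # cast before softmax: the exp result saved for backward is half, not float
  scores = (q @ k.transpose(-2, -1)) * (q.shape[-1] ** -0.5)
  return scores.cast(dtypes.half).softmax(-1) @ v
```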
geohotstan
f0b24d230c add test_onnx_ops.py (#8569)
* boom

* fix webgpu

* use exact variable names in tests so that AI can read them more easily

* add tag for specific test name like test a specific dtype

* fix ruff

* astype everything

* dtype in array creation

* just arange

* is 67% considered fixed?

* move test up

* small cleanups

* share function

* add qgemm as well

* add qgemm too

* make sure qgemm comes out as int

* take out qgemm for now

* fixed test

* add correct qgemm

* addressing feedback here too, early naive fix for now

* simplify bias and c to be minimalistic enough to test correctness

* refactored qlinearops

* maybe these asserts aren't the best..

* fix test

* updated tests to cover new ops

* try to add to CI

* move test_onnx_ops into testextra/

* more attention tests

* qlinear_add atol=1

* attention still not fullllllly correct

* it is what it is

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-24 16:15:22 -05:00
nimlgen
56288243e6 metal PyTorch interop (#9229)
* add from_blob support to mps cuda

* objc_id

* metal pytorch interop

* fix comments

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2025-02-24 22:36:08 +03:00
qazal
687d157906 delete cast early folding from ops [pr] (#9228)
2025-02-24 19:00:51 +01:00
George Hotz
c9493e41a6 reorder expand (#9051)
* reorder expand

* symbolic ops needs resolve here

* s/arg/st + whitespace

* viz

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-02-24 13:55:47 +01:00
qazal
14aa2395d0 allow VIEW(BUFFER) in Tensor UOps [pr] (#9210)
* allow VIEW(BUFFER) in Tensor UOps [pr]

* still reshapes

* update becomes_map tests

* bring copy folder to the scheduler

* lint

* only sgd left

* optimizer assign

* 13 kernels

* rename to test_reorder_expand + assert VIEW
2025-02-24 13:06:15 +01:00
nimlgen
1d06d61b16 from_blob for cuda (#9223)
* from_blob for cuda

* maybe docs?

* minor docs

* example

* waiting 9224

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-24 14:02:06 +03:00
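A hedged usage sketch, assuming the Tensor.from_blob(ptr, shape, dtype=..., device=...) shape this PR adds; the pointer's owner (here torch) must keep the memory alive:

```python
import torch
from tinygrad import Tensor, dtypes

src = torch.arange(16, dtype=torch.float32, device="cuda").reshape(4, 4)
# zero-copy wrap of torch-owned CUDA memory; src must outlive t
t = Tensor.from_blob(src.data_ptr(), (4, 4), dtype=dtypes.float32, device="CUDA")
print((t + 1).numpy())
```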
George Hotz
fc32ff80d6 torch and numpy dtype interop [pr] (#9224)
* torch and numpy dtype interop [pr]

* less lines

* order
2025-02-24 18:26:49 +08:00
George Hotz
24615db5f5 hotfix: torch cuda interop example
2025-02-24 09:02:48 +00:00
George Hotz
fd731e740a hotfix: add note on backend2.py
2025-02-24 11:23:03 +08:00
albanD
f2dd9c1562 simplify c++ code (#9221)
2025-02-24 11:04:41 +08:00
qazal
d12efc95d4 support custom name function in viz [pr] (#9219)
* support custom name function in viz [pr]

* title case

* assert name count in test_track_rewrites_name_fxn
2025-02-24 03:03:25 +02:00
chenyu
b3ae664d5d fix gradient of pow(t, int) (#9217)
semi-revert some pow logic back to tensor. added a direct gradient check because the backward in test_ops passed by luck
2025-02-23 17:42:09 -05:00
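A direct check in the spirit of the one added (my sketch, not the test itself):

```python
from tinygrad import Tensor

t = Tensor([-2.0, 0.0, 3.0], requires_grad=True)
t.pow(3).sum().backward()
print(t.grad.numpy())  # expect 3*t**2 = [12., 0., 27.]
```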
qazal
12b5b83821 set TRACK_MATCH_STATS=0 for real_strides [pr] (#9216)
2025-02-23 23:26:31 +02:00
qazal
9db0ec46a7 simpler buf_uop [pr] (#9215)
* simpler buf_uop [pr]

* assert after realize it's buffer
2025-02-23 19:23:14 +01:00
qazal
898aafe6fd move split_reduceop to scheduler + enable it for multi (#9214)
* move split_reduceop to scheduler + enable it for multi

* merge r and _reduceop
2025-02-23 17:30:04 +01:00
ShikChen
05e3202fba remove unused memsize_to_str and minor cleanups [pr] (#9211)
* fix edge cases in memsize_to_str()

Inputs <= 1 now return "0.00 B" for 0 and "1.00 B" for 1, avoiding an
IndexError. Also, memsize_to_str(1000) now returns "1.00 KB" instead of
"1000.00 B".

Replaced the list comprehension with a next(...) generator for conciseness
and efficiency.

* simplify code using idiomatic python

- Remove the unused `memsize_to_str()` function in helpers.
- Use a tuple for checking multiple string prefixes/suffixes.
- Avoid unnecessary list construction by using iterables directly.
- Check None in @diskcache to ensure proper caching of falsy values.

* revert generators back to list comprehension

Sometimes building the list first can be faster. Keep it as is.
2025-02-23 09:58:37 -05:00
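A reconstruction of the next(...)-based version the first message describes (the function was ultimately removed as unused, so this is illustrative):

```python
def memsize_to_str(b:int) -> str:
  # 0 -> "0.00 B", 1 -> "1.00 B", 1000 -> "1.00 KB"
  return next(f"{b/d:.2f} {pre}B" for d, pre in [(1e12,"T"), (1e9,"G"), (1e6,"M"), (1e3,"K"), (1,"")] if b >= d or d == 1)
```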
qazal
81a71ae0f6 hotfix: skip test_exclude_const_metadata (#9208)
2025-02-22 23:26:04 +02:00
chenyu
e0adb1fc76 really run test_ops with TINY_BACKEND in ci (#9206)
was failing with `line 1: pytest: command not found`
2025-02-22 15:51:24 -05:00
qazal
e6d20c47e3 simpler becomes_map update [pr] (#9201)
* simpler becomes_map update

* err, no metadata for device

* simpler tensor metadata mapping + tests [pr]

* remove kernel metadata

* don't map nones

* pruning

* linter
2025-02-22 20:50:58 +01:00
qazal
4578c3e8fd simpler tensor metadata mapping + tests [pr] (#9203)
* simpler tensor metadata mapping + tests [pr]

* remove kernel metadata

* don't map nones
2025-02-22 20:18:46 +01:00
qazal
b711c6343a no early return + allow childless const/bind/var in kernel graph [pr] (#9202)
2025-02-22 19:28:22 +01:00
George Hotz
97bc723538 torch backend works for ResNet-18 (#9200)
* torch backend progress, a few more functions

* resnet works

* pillow

* tv
2025-02-22 22:16:23 +08:00
George Hotz
f92820d30d torch backend tests (#9198)
* torch backend tests

* pythonpath

* install ninja
2025-02-22 16:01:49 +08:00
George Hotz
4e6665bda5 different way to write torch backend (#9197)
* different way to write torch backend

* both backends

* more work

* simpler code

* more work

* test both

* imply unwrap/wrap

* FORWARD_ONLY=1 TINY_BACKEND=1 python3 test/test_ops.py TestOps.test_add works

* ready to start making test_ops work in torch backend

* backward pass, TINY_BACKEND=1 python3 test/test_ops.py TestOps.test_add works

* FORWARD_ONLY=1 TINY_BACKEND=1 python3 test/test_ops.py TestOps.test_simple_conv2d works

* matmul backward is broken with as_strided
2025-02-22 14:42:26 +08:00
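The mechanism a torch backend like this can build on is torch's PrivateUse1 dispatch key; a hedged skeleton (the "tiny" name and the stub impl are illustrative, not tinygrad's actual code):

```python
import torch

# claim the out-of-tree backend slot under a friendly name
torch.utils.rename_privateuse1_backend("tiny")
aten = torch.library.Library("aten", "IMPL", "PrivateUse1")

def add_tensor(a, b, alpha=1):
  # a real backend would unwrap to tinygrad Tensors, run the op, and rewrap
  raise NotImplementedError

aten.impl("add.Tensor", add_tensor)
```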
nimlgen
041b6d5678 am: load fw in batches (#9185)
* am: load fw in batches

* am: 1mb less fw copies

* mypy

* list
2025-02-21 23:21:31 +03:00
qazal
1db4341e9f move viz graph to lib/graph [pr] (#9196)
* move viz graph to lib/graph [pr]

* add package

* share with program
2025-02-21 21:04:07 +01:00
geohotstan
6587c7879b simple fixes to onnx (#9195)
* uncontroversial changes

* cleaner _prepare_quantize
2025-02-21 13:10:06 -05:00
Simon R
2318d7ac51 Add missing tinygrad.runtime.autogen.am to packages (#9194)
2025-02-21 15:38:24 +02:00
qazal
8bb80b6e5e reorder AST matchers + comments [pr] (#9193)
2025-02-21 14:31:15 +01:00
qazal
2eab8021fb remove inputs+outputs attributes from ScheduleItem [pr] (#9192)
* remove inputs/outputs from ScheduleItem

* fix test_linearizer

* fix test_conv_shapetracker

* fix test_schedule + lint

* test_image_dtype + multitensor + search
2025-02-21 13:48:11 +01:00
George Hotz
e87be0131e torch backend start (#9191)
* start torch backend

* progress

* ugh, you need cpp crap

* 1+1 works

* 1+1 works

* becoming a real backend

* ready to merge?
2025-02-21 16:57:28 +08:00
George Hotz
d3a21cced2 hotfix: bump version to 0.10.2 (tag: v0.10.2)
2025-02-21 10:43:49 +08:00
chenyu
2e7c2780a9 CLANG -> CPU (#9189)
2025-02-20 18:03:09 -05:00
nimlgen
f986e12f91 metal: choose compile spec based on macos (#9188)
* metal: choose compile spec based on macos

* correction
2025-02-21 00:43:39 +03:00
chenyu
3e22747799 run unit test on windows ci (#9187)
* factor out testing_minimal in setup.py [pr]

* testing_unit + windows
2025-02-20 14:40:41 -05:00
chenyu
287de4ecc6 use torch in test_gradient (#9186)
used torch.autograd.grad, but not sure if it can be a template like jax
2025-02-20 12:26:11 -05:00
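For reference, a torch.autograd.grad reference-gradient helper looks roughly like this (a sketch, not the test's code):

```python
import torch

def torch_grad(f, *xs):
  # reference gradients in float64 for comparison against tinygrad's
  ts = [torch.tensor(x, dtype=torch.float64, requires_grad=True) for x in xs]
  return [g.numpy() for g in torch.autograd.grad(f(*ts).sum(), ts)]
```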
qazal
574a905291 Fix running VIZ=1 after package installation + test (#9183)
* test running viz from pip install

* add pkg

* do 10 connection attempts

* include assets in package_data

* quiet curl

* better print
2025-02-20 15:02:00 +01:00
chenyu
1692087db5 _one_hot_along_dim input needs to be int (#9179)
* _one_hot_along_dim input needs to be int

indexing and one-hot compare against an arange, so a non-int dtype is likely a bug
2025-02-20 09:00:43 -05:00
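The arange comparison it refers to works like this (a sketch; _one_hot_along_dim itself is more general):

```python
from tinygrad import Tensor

idx = Tensor([0, 2, 1])  # must be an int dtype: float indices compared
                         # against an int arange can silently misbehave
onehot = (Tensor.arange(3) == idx.unsqueeze(-1))  # (3, 3) bool
print(onehot.numpy())
```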
George Hotz
bf36967883 cuda hooking (#9180)
* cuda hooking

* progress

* more hook cuda

* fix params

* compile + cuMemHostAlloc hook

* work

* revert that
2025-02-20 19:20:01 +08:00
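One way to hook driver calls at the python level (purely illustrative; the PR's hooking is its own machinery, and the module/symbol names below are assumptions about tinygrad's autogen layout):

```python
import functools
import tinygrad.runtime.autogen.cuda as cuda  # assumption: ctypes bindings live here

def hook(fn):
  @functools.wraps(fn)
  def wrapped(*args):
    print(f"{fn.__name__}{args}")  # log every driver call before forwarding it
    return fn(*args)
  return wrapped

cuda.cuMemHostAlloc = hook(cuda.cuMemHostAlloc)  # assumption: symbol name
```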
chenyu
3b37cc898b add bert tiny config (#9177)
set with BERT_SIZE=tiny. easier to study embedding and fusion
2025-02-19 14:57:03 -05:00
qazal
5662c898f1 correctly step through bottom_up_rewrites in viz [pr] (#9176)
2025-02-19 19:20:57 +01:00
peppingdore
b1ddb2a1a6 fix win32 CPUProgram missing cache flush (#9171)
* win32: fix missing inst cache flush, rename ptr->self.mem for consistency with posix code

* fix types, remove assert

* fix memory leak

* rm whitespace
2025-02-19 21:38:51 +08:00
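The flush in question is the Win32 FlushInstructionCache call; a minimal sketch of the pattern (not the PR's code):

```python
import ctypes

def flush_icache(addr:int, size:int):
  # without this, newly written JIT code may execute stale icache contents
  k32 = ctypes.windll.kernel32
  if not k32.FlushInstructionCache(k32.GetCurrentProcess(),
                                   ctypes.c_void_p(addr), ctypes.c_size_t(size)):
    raise ctypes.WinError()
```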
qazal
1bb9d78c7a hotfix: add output buffer back to kernel parents + comment [pr] (#9174)
2025-02-19 14:22:01 +01:00
chenyu
975c318dbc bert use int32 for input ids (#9173)
original data was int32 for these. float might have caused precision issues
2025-02-19 08:17:27 -05:00