nimlgen
b4c3780df0
hotfix: interop example (#9237)
* hotfix: interop example
* rm this
* fix
* fix ci mps
* atol rtol
* no uaf
2025-02-25 10:32:00 +03:00
chenyu
8c7be428e5
update bert BS to 78 (#9236)
fits BS=78 now. about 215 TFLOPS on green
2025-02-24 22:47:35 -05:00
Sieds Lykles
990c240b82
Stable pow gradient (#9226)
* Stable gradient
* More efficient
* Fix and test for +-inf
* cleaner
* skip webgpu test
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-24 20:54:26 -05:00
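The instability being fixed is easy to see in the gradient of the base: writing it as y*x**y/x divides 0 by 0 at x == 0. A minimal sketch of the idea (not the PR's actual rewrite, which also has to fix and test the +-inf cases):

```python
from tinygrad import Tensor

x, y = Tensor([0.0, 2.0]), 3.0
naive = y * x.pow(y) / x     # 0/0 at x == 0 gives nan, though the true gradient there is 0
stable = y * x.pow(y - 1.0)  # d/dx x**y = y * x**(y-1), well-defined at x == 0
print(naive.numpy(), stable.numpy())
```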
chenyu
731d14e718
hotfix bump testmetal2 timeout-minutes to 20 (#9235)
setup is taking too long
2025-02-24 20:23:56 -05:00
qazal
cbfe95d306
bring cast before view back (#9230)
* bring cast before view back
* tune it to only trigger on expands
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-25 01:50:39 +02:00
chenyu
90c3ed17c5
move cast to before softmax in attention (#9213)
* move cast to before softmax in attention
saved some memory because exp (which is kept for backward) is done in half. training bert seems fine and can fit BS=78 now (up from 66)
* test
2025-02-24 17:24:59 -05:00
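The reordering amounts to something like this sketch (shapes illustrative, not the PR's code):

```python
from tinygrad import Tensor, dtypes

scores = Tensor.randn(8, 16, 64, 64)   # illustrative attention scores
# before: scores.softmax(-1).cast(dtypes.half) -- the exp saved for backward stays in float32
# after: cast first, so the saved exp is half and activation memory shrinks
attn = scores.cast(dtypes.half).softmax(-1)
```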
geohotstan
f0b24d230c
add test_onnx_ops.py (#8569)
* boom
* fix webgpu
* use exact variable names in test so that AI can read it more easily
* add tag for specific test name like test a specific dtype
* fix ruff
* astype everything
* dtype in array creation
* just arange
* is 67% considered fixed?
* move test up
* small cleanups
* share function
* add qgemm as well
* add qgemm too
* make sure qgemm comes out as int
* take out qgemm for now
* fixed test
* add correct qgemm
* addressing feedback here too, early naive fix for now
* simplify bias and c to be minimalistic enough to test correctness
* refactored qlinearops
* maybe these asserts aren't the best..
* fix test
* updated tests to cover new ops
* try to add to CI
* move test_onnx_ops into testextra/
* more attention tests
* qlinear_add atol=1
* attention still not fullllllly correct
* it is what it is
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-24 16:15:22 -05:00
nimlgen
56288243e6
metal PyTorch interop (#9229)
* add from_blob support to mps cuda
* objc_id
* metal pytorch interop
* fix comments
---------
Co-authored-by: George Hotz <geohot@gmail.com>
2025-02-24 22:36:08 +03:00
qazal
687d157906
delete cast early folding from ops [pr] (#9228)
2025-02-24 19:00:51 +01:00
George Hotz
c9493e41a6
reorder expand (#9051)
* reorder expand
* symbolic ops needs resolve here
* s/arg/st + whitespace
* viz
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2025-02-24 13:55:47 +01:00
qazal
14aa2395d0
allow VIEW(BUFFER) in Tensor UOps [pr] (#9210)
* allow VIEW(BUFFER) in Tensor UOps [pr]
* still reshapes
* update becomes_map tests
* bring copy folder to the scheduler
* lint
* only sgd left
* optimizer assign
* 13 kernels
* rename to test_reorder_expand + assert VIEW
2025-02-24 13:06:15 +01:00
nimlgen
1d06d61b16
from_blob for cuda (#9223)
* from_blob for cuda
* maybe docs?
* minor docs
* example
* waiting 9224
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-24 14:02:06 +03:00
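Together with the metal interop above, the pattern these changes enable looks roughly like this (assuming a Tensor.from_blob(ptr, shape, dtype=..., device=...) signature; see the docs added in this PR):

```python
import torch
from tinygrad import Tensor, dtypes

t = torch.zeros(4, 4, device="cuda")
# wrap the torch allocation without copying; t must stay alive while `a` is used (no use-after-free)
a = Tensor.from_blob(t.data_ptr(), tuple(t.shape), dtype=dtypes.float32, device="CUDA")
print((a + 1).numpy())
```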
George Hotz
fc32ff80d6
torch and numpy dtype interop [pr] (#9224)
* torch and numpy dtype interop [pr]
* less lines
* order
2025-02-24 18:26:49 +08:00
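Conceptually this is a small lookup between the three dtype systems; a hypothetical illustration (these table names are not the PR's):

```python
import numpy as np, torch
from tinygrad import dtypes

# hypothetical mapping tables, for illustration only
_TORCH_TO_TINY = {torch.float32: dtypes.float32, torch.float16: dtypes.float16, torch.int32: dtypes.int32}
_NP_TO_TINY = {np.dtype(np.float32): dtypes.float32, np.dtype(np.int32): dtypes.int32}
assert _TORCH_TO_TINY[torch.float16] == dtypes.float16
```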
George Hotz
24615db5f5
hotfix: torch cuda interop example
2025-02-24 09:02:48 +00:00
George Hotz
fd731e740a
hotfix: add note on backend2.py
2025-02-24 11:23:03 +08:00
albanD
f2dd9c1562
simplify c++ code (#9221)
2025-02-24 11:04:41 +08:00
qazal
d12efc95d4
support custom name function in viz [pr] (#9219)
* support custom name function in viz [pr]
* title case
* assert name count in test_track_rewrites_name_fxn
2025-02-24 03:03:25 +02:00
chenyu
b3ae664d5d
fix gradient of pow(t, int) (#9217)
semi-revert some pow logic back to tensor. added a direct gradient check because the backward in test_ops passed by luck
2025-02-23 17:42:09 -05:00
qazal
12b5b83821
set TRACK_MATCH_STATS=0 for real_strides [pr] (#9216)
2025-02-23 23:26:31 +02:00
qazal
9db0ec46a7
simpler buf_uop [pr] (#9215)
* simpler buf_uop [pr]
* assert after realize it's buffer
2025-02-23 19:23:14 +01:00
qazal
898aafe6fd
move split_reduceop to scheduler + enable it for multi (#9214)
* move split_reduceop to scheduler + enable it for multi
* merge r and _reduceop
2025-02-23 17:30:04 +01:00
ShikChen
05e3202fba
remove unused memsize_to_str and minor cleanups [pr] (#9211)
* fix edge cases in memsize_to_str()
Inputs <= 1 now return "0.00 B" for 0 and "1.00 B" for 1, avoiding an
IndexError. Also, memsize_to_str(1000) now returns "1.00 KB" instead of
"1000.00 B".
Replaced the list comprehension with a next(...) generator for conciseness
and efficiency.
* simplify code using idiomatic python
- Remove the unused `memsize_to_str()` function in helpers.
- Use a tuple for checking multiple string prefixes/suffixes.
- Avoid unnecessary list construction by using iterables directly.
- Check None in @diskcache to ensure proper caching of falsy values.
* revert generators back to list comprehension
Sometimes building the list first can be faster. Keep it as is.
2025-02-23 09:58:37 -05:00
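The fixed helper described above looked roughly like this before it was removed (a sketch reconstructed from the commit message, not the exact code):

```python
def memsize_to_str(n: int) -> str:
  # next(...) takes the first unit that fits; the default covers n <= 1 without an IndexError
  return next((f"{n/d:.2f} {u}" for d, u in [(1e9, "GB"), (1e6, "MB"), (1e3, "KB")] if n >= d), f"{n:.2f} B")

assert memsize_to_str(0) == "0.00 B" and memsize_to_str(1000) == "1.00 KB"
```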
qazal
81a71ae0f6
hotfix: skip test_exclude_const_metadata (#9208)
2025-02-22 23:26:04 +02:00
chenyu
e0adb1fc76
really run test_ops with TINY_BACKEND in ci (#9206)
was failing with `line 1: pytest: command not found`
2025-02-22 15:51:24 -05:00
qazal
e6d20c47e3
simpler becomes_map update [pr] (#9201)
* simpler becomes_map update
* err, no metadata for device
* simpler tensor metadata mapping + tests [pr]
* remove kernel metadata
* don't map nones
* pruning
* linter
2025-02-22 20:50:58 +01:00
qazal
4578c3e8fd
simpler tensor metadata mapping + tests [pr] (#9203)
* simpler tensor metadata mapping + tests [pr]
* remove kernel metadata
* don't map nones
2025-02-22 20:18:46 +01:00
qazal
b711c6343a
no early return + allow childless const/bind/var in kernel graph [pr] (#9202)
2025-02-22 19:28:22 +01:00
George Hotz
97bc723538
torch backend works for ResNet-18 (#9200)
* torch backend progress, a few more functions
* resnet works
* pillow
* tv
2025-02-22 22:16:23 +08:00
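Usage at this point was along these lines (the backend module path and the "tiny" device name are assumptions based on the tree at the time):

```python
import torch, torchvision
import extra.torch_backend.backend  # noqa: F401  (assumed path; registers the tinygrad torch device)

model = torchvision.models.resnet18(weights="DEFAULT").eval().to("tiny")  # "tiny" device name assumed
x = torch.randn(1, 3, 224, 224, device="tiny")
print(model(x).argmax(-1).item())
```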
George Hotz
f92820d30d
torch backend tests (#9198)
* torch backend tests
* pythonpath
* install ninja
2025-02-22 16:01:49 +08:00
George Hotz
4e6665bda5
different way to write torch backend (#9197)
* different way to write torch backend
* both backends
* more work
* simpler code
* more work
* test both
* imply unwrap/wrap
* FORWARD_ONLY=1 TINY_BACKEND=1 python3 test/test_ops.py TestOps.test_add works
* ready to start making test_ops work in torch backend
* backward pass, TINY_BACKEND=1 python3 test/test_ops.py TestOps.test_add works
* FORWARD_ONLY=1 TINY_BACKEND=1 python3 test/test_ops.py TestOps.test_simple_conv2d works
* matmul backward is broken with as_strided
2025-02-22 14:42:26 +08:00
nimlgen
041b6d5678
am: load fw in batches (#9185)
* am: load fw in batches
* am: 1mb less fw copies
* mypy
* list
2025-02-21 23:21:31 +03:00
qazal
1db4341e9f
move viz graph to lib/graph [pr] (#9196)
* move viz graph to lib/graph [pr]
* add package
* share with program
2025-02-21 21:04:07 +01:00
geohotstan
6587c7879b
simple fixes to onnx (#9195)
* uncontroversial changes
* cleaner _prepare_quantize
2025-02-21 13:10:06 -05:00
Simon R
2318d7ac51
Add missing tinygrad.runtime.autogen.am to packages (#9194)
2025-02-21 15:38:24 +02:00
qazal
8bb80b6e5e
reorder AST matchers + comments [pr] (#9193)
2025-02-21 14:31:15 +01:00
qazal
2eab8021fb
remove inputs+outputs attributes from ScheduleItem [pr] (#9192)
* remove inputs/outputs from ScheduleItem
* fix test_linearizer
* fix test_conv_shapetracker
* fix test_schedule + lint
* test_image_dtype + multitensor + search
2025-02-21 13:48:11 +01:00
George Hotz
e87be0131e
torch backend start (#9191)
* start torch backend
* progress
* ugh, you need cpp crap
* 1+1 works
* 1+1 works
* becoming a real backend
* ready to merge?
2025-02-21 16:57:28 +08:00
George Hotz
d3a21cced2
hotfix: bump version to 0.10.2
v0.10.2
2025-02-21 10:43:49 +08:00
chenyu
2e7c2780a9
CLANG -> CPU (#9189)
2025-02-20 18:03:09 -05:00
nimlgen
f986e12f91
metal: choose compile spec based on macos (#9188)
* metal: choose compile spec based on macos
* correction
2025-02-21 00:43:39 +03:00
chenyu
3e22747799
run unit test on windows ci (#9187)
* factor out testing_minimal in setup.py [pr]
* testing_unit + windows
2025-02-20 14:40:41 -05:00
chenyu
287de4ecc6
use torch in test_gradient (#9186)
used torch.autograd.grad, but not sure if it can be made a template like the jax one
2025-02-20 12:26:11 -05:00
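For reference, the torch side of such a check is a one-liner with torch.autograd.grad (a generic sketch, not the repo's test):

```python
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)
(g,) = torch.autograd.grad((x ** 3).sum(), x)
assert torch.allclose(g, 3 * x.detach() ** 2)  # analytic gradient of x**3
```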
qazal
574a905291
Fix running VIZ=1 after package installation + test (#9183)
* test running viz from pip install
* add pkg
* do 10 connection attempts
* include assets in package_data
* quiet curl
* better print
2025-02-20 15:02:00 +01:00
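"include assets in package_data" is the standard setuptools mechanism; a minimal sketch with assumed globs:

```python
from setuptools import setup

setup(
  name="tinygrad",
  packages=["tinygrad"],
  # ship the viz frontend in the wheel so VIZ=1 works from a pip install (globs assumed)
  package_data={"tinygrad": ["viz/index.html", "viz/assets/*"]},
)
```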
chenyu
1692087db5
_one_hot_along_dim input needs to be int (#9179)
* _one_hot_along_dim input needs to be int
indexing and one-hot compare against arange, and a non-int dtype is likely a bug
2025-02-20 09:00:43 -05:00
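The arange comparison referenced above is essentially this sketch (the idea, not the internal helper itself):

```python
from tinygrad import Tensor

idx = Tensor([2, 0, 1])                          # integer class ids
one_hot = Tensor.arange(3) == idx.unsqueeze(-1)  # broadcast compare along the class dim
print(one_hot.numpy())  # float ids would make this equality check silently wrong
```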
George Hotz
bf36967883
cuda hooking (#9180)
* cuda hooking
* progress
* more hook cuda
* fix params
* compile + cuMemHostAlloc hook
* work
* revert that
2025-02-20 19:20:01 +08:00
chenyu
3b37cc898b
add bert tiny config (#9177)
set with BERT_SIZE=tiny. easier to study embedding and fusion
2025-02-19 14:57:03 -05:00
qazal
5662c898f1
correctly step through bottom_up_rewrites in viz [pr] (#9176)
2025-02-19 19:20:57 +01:00
peppingdore
b1ddb2a1a6
fix win32 CPUProgram missing cache flush (#9171)
* win32: fix missing inst cache flush, rename ptr->self.mem for consistency with posix code
* fix types, remove assert
* fix memory leak
* rm whitespace
2025-02-19 21:38:51 +08:00
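The missing flush is the Win32 FlushInstructionCache call; a standalone sketch of the ingredient (not the PR's exact code):

```python
import ctypes

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)

def flush_icache(addr: int, size: int) -> None:
  # without this, the CPU can execute stale bytes from freshly written JIT memory
  if not kernel32.FlushInstructionCache(kernel32.GetCurrentProcess(), ctypes.c_void_p(addr), ctypes.c_size_t(size)):
    raise ctypes.WinError(ctypes.get_last_error())
```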
qazal
1bb9d78c7a
hotfix: add output buffer back to kernel parents + comment [pr] (#9174)
2025-02-19 14:22:01 +01:00
chenyu
975c318dbc
bert use int32 for input ids (#9173)
original data was int32 for these. float might have caused precision issues
2025-02-19 08:17:27 -05:00