Commit Graph

10633 Commits

Author · SHA1 · Message · Date
qazal
687d157906 delete cast early folding from ops [pr] (#9228) 2025-02-24 19:00:51 +01:00
George Hotz
c9493e41a6 reorder expand (#9051)
* reorder expand

* symbolic ops needs resolve here

* s/arg/st + whitespace

* viz

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-02-24 13:55:47 +01:00
qazal
14aa2395d0 allow VIEW(BUFFER) in Tensor UOps [pr] (#9210)
* allow VIEW(BUFFER) in Tensor UOps [pr]

* still reshapes

* update becomes_map tests

* bring copy folder to the scheduler

* lint

* only sgd left

* optimizer assign

* 13 kernels

* rename to test_reorder_expand + assert VIEW
2025-02-24 13:06:15 +01:00
nimlgen
1d06d61b16 from_blob for cuda (#9223)
* from_blob for cuda

* maybe docs?

* minor docs

* example

* waiting 9224

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-24 14:02:06 +03:00
George Hotz
fc32ff80d6 torch and numpy dtype interop [pr] (#9224)
* torch and numpy dtype interop [pr]

* less lines

* order
2025-02-24 18:26:49 +08:00
George Hotz
24615db5f5 hotfix: torch cuda interop example 2025-02-24 09:02:48 +00:00
George Hotz
fd731e740a hotfix: add note on backend2.py 2025-02-24 11:23:03 +08:00
albanD
f2dd9c1562 simplify c++ code (#9221) 2025-02-24 11:04:41 +08:00
qazal
d12efc95d4 support custom name function in viz [pr] (#9219)
* support custom name function in viz [pr]

* title case

* assert name count in test_track_rewrites_name_fxn
2025-02-24 03:03:25 +02:00
chenyu
b3ae664d5d fix gradient of pow(t, int) (#9217)
Semi-reverts some pow logic back to tensor; adds a direct gradient check because the backward test in test_ops passed by luck.
2025-02-23 17:42:09 -05:00
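The "direct gradient check" above can be sketched as a numerical comparison against the analytic pow gradient, d/dt t**n = n * t**(n-1). This is an illustrative standalone check, not tinygrad's test code:

```python
# Hedged sketch (not tinygrad's code): verify the analytic pow gradient
# against a central finite difference.
def pow_grad(t: float, n: int) -> float:
    # analytic gradient of t**n with respect to t
    return n * t ** (n - 1)

def numeric_grad(f, t: float, eps: float = 1e-6) -> float:
    # central difference approximation of f'(t)
    return (f(t + eps) - f(t - eps)) / (2 * eps)

t, n = 3.0, 4
assert abs(pow_grad(t, n) - numeric_grad(lambda x: x ** n, t)) < 1e-3
```

A direct check like this catches a wrong gradient even when a higher-level backward test happens to pass.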
qazal
12b5b83821 set TRACK_MATCH_STATS=0 for real_strides [pr] (#9216) 2025-02-23 23:26:31 +02:00
qazal
9db0ec46a7 simpler buf_uop [pr] (#9215)
* simpler buf_uop [pr]

* assert after realize it's buffer
2025-02-23 19:23:14 +01:00
qazal
898aafe6fd move split_reduceop to scheduler + enable it for multi (#9214)
* move split_reduceop to scheduler + enable it for multi

* merge r and _reduceop
2025-02-23 17:30:04 +01:00
ShikChen
05e3202fba remove unused memsize_to_str and minor cleanups [pr] (#9211)
* fix edge cases in memsize_to_str()

Inputs <= 1 now return "0.00 B" for 0 and "1.00 B" for 1, avoiding an
IndexError. Also, memsize_to_str(1000) now returns "1.00 KB" instead of
"1000.00 B".

Replaced the list comprehension with a next(...) generator for conciseness
and efficiency.

* simplify code using idiomatic python

- Remove the unused `memsize_to_str()` function in helpers.
- Use a tuple for checking multiple string prefixes/suffixes.
- Avoid unnecessary list construction by using iterables directly.
- Check None in @diskcache to ensure proper caching of falsy values.

* revert generators back to list comprehension

Sometimes building the list first can be faster; keep it as is.
2025-02-23 09:58:37 -05:00
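The edge-case fix described above (before its removal as unused) can be sketched as follows; the divisor of 1000 is an assumption based on the "1.00 KB" example in the commit message, and this is not the repo's actual implementation:

```python
# Hedged sketch of the fixed memsize_to_str behavior: sizes <= 1 fall
# through to "B" instead of raising IndexError, and 1000 becomes
# "1.00 KB" rather than "1000.00 B".
def memsize_to_str(n: int) -> str:
    units = ["B", "KB", "MB", "GB", "TB"]
    # next(...) picks the first unit where the value stays below 1000,
    # falling back to the largest unit for very big inputs
    return next(f"{n / 1000**i:.2f} {u}" for i, u in enumerate(units)
                if n < 1000 ** (i + 1) or u == units[-1])

assert memsize_to_str(0) == "0.00 B"
assert memsize_to_str(1000) == "1.00 KB"
```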
qazal
81a71ae0f6 hotfix: skip test_exclude_const_metadata (#9208) 2025-02-22 23:26:04 +02:00
chenyu
e0adb1fc76 really run test_ops with TINY_BACKEND in ci (#9206)
CI was failing with `line 1: pytest: command not found`.
2025-02-22 15:51:24 -05:00
qazal
e6d20c47e3 simpler becomes_map update [pr] (#9201)
* simpler becomes_map update

* err, no metadata for device

* simpler tensor metadata mapping + tests [pr]

* remove kernel metadata

* don't map nones

* pruning

* linter
2025-02-22 20:50:58 +01:00
qazal
4578c3e8fd simpler tensor metadata mapping + tests [pr] (#9203)
* simpler tensor metadata mapping + tests [pr]

* remove kernel metadata

* don't map nones
2025-02-22 20:18:46 +01:00
qazal
b711c6343a no early return + allow childless const/bind/var in kernel graph [pr] (#9202) 2025-02-22 19:28:22 +01:00
George Hotz
97bc723538 torch backend works for ResNet-18 (#9200)
* torch backend progress, a few more functions

* resnet works

* pillow

* tv
2025-02-22 22:16:23 +08:00
George Hotz
f92820d30d torch backend tests (#9198)
* torch backend tests

* pythonpath

* install ninja
2025-02-22 16:01:49 +08:00
George Hotz
4e6665bda5 different way to write torch backend (#9197)
* different way to write torch backend

* both backends

* more work

* simpler code

* more work

* test both

* imply unwrap/wrap

* FORWARD_ONLY=1 TINY_BACKEND=1 python3 test/test_ops.py TestOps.test_add works

* ready to start making test_ops work in torch backend

* backward pass, TINY_BACKEND=1 python3 test/test_ops.py TestOps.test_add works

* FORWARD_ONLY=1 TINY_BACKEND=1 python3 test/test_ops.py TestOps.test_simple_conv2d works

* matmul backward is broken with as_strided
2025-02-22 14:42:26 +08:00
nimlgen
041b6d5678 am: load fw in batches (#9185)
* am: load fw in batches

* am: 1mb less fw copies

* mypy

* list
2025-02-21 23:21:31 +03:00
qazal
1db4341e9f move viz graph to lib/graph [pr] (#9196)
* move viz graph to lib/graph [pr]

* add package

* share with program
2025-02-21 21:04:07 +01:00
geohotstan
6587c7879b simple fixes to onnx (#9195)
* uncontroversial changes

* cleaner _prepare_quantize
2025-02-21 13:10:06 -05:00
Simon R
2318d7ac51 Add missing tinygrad.runtime.autogen.am to packages (#9194) 2025-02-21 15:38:24 +02:00
qazal
8bb80b6e5e reorder AST matchers + comments [pr] (#9193) 2025-02-21 14:31:15 +01:00
qazal
2eab8021fb remove inputs+outputs attributes from ScheduleItem [pr] (#9192)
* remove inputs/outputs from ScheduleItem

* fix test_linearizer

* fix test_conv_shapetracker

* fix test_schedule + lint

* test_image_dtype + multitensor + search
2025-02-21 13:48:11 +01:00
George Hotz
e87be0131e torch backend start (#9191)
* start torch backend

* progress

* ugh, you need cpp crap

* 1+1 works

* 1+1 works

* becoming a real backend

* ready to merge?
2025-02-21 16:57:28 +08:00
George Hotz
d3a21cced2 hotfix: bump version to 0.10.2 (tag: v0.10.2) 2025-02-21 10:43:49 +08:00
chenyu
2e7c2780a9 CLANG -> CPU (#9189) 2025-02-20 18:03:09 -05:00
nimlgen
f986e12f91 metal: choose compile spec based on macos (#9188)
* metal: choose compile spec based on macos

* correction
2025-02-21 00:43:39 +03:00
chenyu
3e22747799 run unit test on windows ci (#9187)
* factor out testing_minimal in setup.py [pr]

* testing_unit + windows
2025-02-20 14:40:41 -05:00
chenyu
287de4ecc6 use torch in test_gradient (#9186)
Uses torch.autograd.grad, but it is unclear whether it can be templated like the jax version.
2025-02-20 12:26:11 -05:00
qazal
574a905291 Fix running VIZ=1 after package installation + test (#9183)
* test running viz from pip install

* add pkg

* do 10 connection attempts

* include assets in package_data

* quiet curl

* better print
2025-02-20 15:02:00 +01:00
chenyu
1692087db5 _one_hot_along_dim input needs to be int (#9179)
* _one_hot_along_dim input needs to be int

indexing and one-hot compare against an arange, so a non-int dtype is likely a bug
2025-02-20 09:00:43 -05:00
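The arange-compare pattern this commit refers to can be illustrated in pure Python (this is an illustrative sketch, not tinygrad's implementation): each index is compared against the class range, which is only meaningful for integer indices.

```python
# Illustrative one-hot via comparison against range(num_classes);
# a non-int index makes the equality test silently always-false.
def one_hot(indices: list, num_classes: int) -> list:
    assert all(isinstance(i, int) for i in indices), "non-int dtype is likely a bug"
    return [[int(i == j) for j in range(num_classes)] for i in indices]

assert one_hot([0, 2], 3) == [[1, 0, 0], [0, 0, 1]]
```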
George Hotz
bf36967883 cuda hooking (#9180)
* cuda hooking

* progress

* more hook cuda

* fix params

* compile + cuMemHostAlloc hook

* work

* revert that
2025-02-20 19:20:01 +08:00
chenyu
3b37cc898b add bert tiny config (#9177)
Set with BERT_SIZE=tiny; makes it easier to study embedding and fusion.
2025-02-19 14:57:03 -05:00
qazal
5662c898f1 correctly step through bottom_up_rewrites in viz [pr] (#9176) 2025-02-19 19:20:57 +01:00
peppingdore
b1ddb2a1a6 fix win32 CPUProgram missing cache flush (#9171)
* win32: fix missing inst cache flush, rename ptr->self.mem for consistency with posix code

* fix types, remove assert

* fix memory leak

* rm whitespace
2025-02-19 21:38:51 +08:00
qazal
1bb9d78c7a hotfix: add output buffer back to kernel parents + comment [pr] (#9174) 2025-02-19 14:22:01 +01:00
chenyu
975c318dbc bert use int32 for input ids (#9173)
The original data was int32 for these fields; float might have caused precision issues.
2025-02-19 08:17:27 -05:00
qazal
e4a8bf28ea scheduler cleanups + better cycle assert [pr] (#9172)
* scheduler cleanups + better cycle assert [pr]

* type_verify after assign fixup

* don't need base

* always realize sink parents
2025-02-19 13:30:58 +01:00
qazal
cf315d544b rename can_pad arg to cache [pr] (#9170) 2025-02-19 12:24:59 +01:00
qazal
2fc8bf115d remove support for VIEW with two sources in ops [pr] (#9168)
* only 1 src views can exist [pr]

* views can still exist without a base, this is a separate project
2025-02-19 11:10:18 +01:00
Ahmed Harmouche
a2afa523a0 Only add enable f16 directive if ShaderF16 is supported (#9163)
* F16 in check in wgsl renderer

* Default in renderer to fix pickle

* Refactor f16 check
2025-02-19 17:20:03 +08:00
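The conditional-directive idea above can be sketched as a tiny renderer helper: prepend WGSL's `enable f16;` directive only when the device reports half-precision shader support. The feature name `"shader-f16"` and the helper itself are illustrative assumptions, not tinygrad's actual renderer code:

```python
# Hypothetical sketch: emit the WGSL f16 directive only when the
# adapter's supported-features set includes shader-f16.
def wgsl_header(supported_features: set) -> str:
    return "enable f16;\n" if "shader-f16" in supported_features else ""

assert wgsl_header({"shader-f16"}) == "enable f16;\n"
assert wgsl_header(set()) == ""
```

Emitting the directive unconditionally would make shader compilation fail on devices without the feature, which is why the check belongs in the renderer.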
Ahmed Harmouche
0f94b98646 Force WebGPU backend type [pr] (#9164)
* Force webgpu backend type

* Mypy fix

* Rename to WEBGPU_BACKEND

* Add it to env_vars docs

* Remove link
2025-02-19 17:19:39 +08:00
qazal
4bc708a9b0 do not create buffers we never realize in scheduler (#9165)
* work

* delete

* fix

* works

* FUSE_CONV_BW

* FUSE_ARANGE

* becomes_map

* fix assign p1

* fix assign (diamond) - 2

* fix test_assign_double_diamond_reduce

* fix subbuffer

* faster rewrite

* fix simple_pads

* start metadata work

* do some diff cleanups

* make things that can't be images not images

* openpilot fix

* fix linter

* diff

* minimal diff

* more work on the diff

* metadata
2025-02-19 10:11:47 +01:00
George Hotz
1c4e9bc363 image fixup tensor map [pr] (#8611)
Co-authored-by: qazal <qazal.software@gmail.com>
2025-02-19 10:11:06 +02:00
qazal
2a5fe3e700 whitespace changes from the map_tensors branch [pr] (#9167) 2025-02-19 09:52:59 +02:00