7774 Commits

Author SHA1 Message Date
Ahmed Harmouche
133cacadde Autogen webgpu dawn, removing wgpu-py dependency (f16 support part 1) (#8646)
* Switch to dawn, all tests passing locally

* Use dawn-python

* Skip failing test

* Skip midcast and fix timestamp on metal ci

* Autogen webgpu

* Try fetch dawn lib again

* /usr/lib

* Without lib prefix

* Test autogen diff

* Delete webgpu support, move everything to ops_webgpu

* mypy fix

* Simplify, refactor

* Line savings

* No ResultContainer

* Type annotation for result

* Some more simplifications

* Why was this explicit sync used at all?

* Refactor: delete functions that are only used once

* Create shader module inline

* Clear unit tests cache, maybe that solves it

* That wasn't it

* Try deleting cache to pass failing weight compare

* weights_only=False for pytorch 2.6

* Simplify ctype array creation

* Remove nanosecond precision timestamps

* Simplify error handling

* Refactor, add back type annotations

* Deleted custom submit function, refactor

* read_buffer simplify

* Fix use after free, refactor

* Simplify supported_features

* Runtime docs

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-07 15:16:59 +08:00
George Hotz
dbda72f91d hotfix: raise line limit to 11200 for new webgpu backend 2025-02-07 14:29:20 +08:00
George Hotz
b1e1319972 ci speed on the enterprise plan [pr] (#8942) 2025-02-07 11:18:12 +08:00
Bhavya Gada
3b67712892 [bounty] Fix LLVM=1 NO_DEVECTORIZE=1 python3 test/test_ops.py TestOps.test_strided_conv2d_simple (#8937)
* fix LLVM=1 NO_DEVECTORIZE=1 python3 test/test_ops.py TestOps.test_strided_conv2d_simple

* remove expectedFailure

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-07 10:07:54 +08:00
George Hotz
f54242849d failing test for the devectorize [pr] (#8940)
* failing test for the devectorize [pr]

* add DEVECTORIZE to method_cache
2025-02-07 09:44:54 +08:00
nimlgen
ee1a0fb8ec am_smi: print device name (#8939) 2025-02-07 03:01:25 +03:00
chenyu
a092b6395d Tuple -> tuple, List -> list [pr] (#8936) 2025-02-06 14:21:19 -05:00
chenyu
d5183e1584 remove unneeded annotation import (#8934) 2025-02-06 13:12:35 -05:00
chenyu
00d72a5144 setitem isinstance cleanup [pr] (#8932) 2025-02-06 11:44:57 -05:00
qazal
81e241150a hotfix: save 1 line (#8931)
* hotfix: save 1 line

* no unwrap
2025-02-06 17:26:05 +02:00
qazal
eb1144be8b hotfix: only check current graph when excluding nodes in viz (#8930) 2025-02-06 16:58:53 +02:00
George Hotz
3cc05081f4 llvm no devectorize, the right way (#8901)
* closer

* env flag + transcendental issue
2025-02-06 22:53:49 +08:00
George Hotz
8b16c65bca add compile3 benchmark [pr] (#8929) 2025-02-06 22:49:31 +08:00
qazal
79fb5c6470 hotfix: test_shard_no_recompile shouldn't rely on schedule order [pr] (#8928) 2025-02-06 16:27:59 +02:00
George Hotz
1249e8dd3b objc fast msg, try 2 [pr] (#8927) 2025-02-06 19:06:21 +08:00
nimlgen
86feb98dcd am: add support for 7600 (#8910)
* am: start to add support for 7600

* test_tiny passes

* mmhub 3 0 2

* cleaner
2025-02-06 14:04:07 +03:00
George Hotz
ae45826758 hotfix: GRAPH_ONE_KERNEL + fix timing 2025-02-06 17:52:20 +08:00
George Hotz
1c53e8bf27 Revert "objc fast msg (#8922)" (#8926)
This reverts commit c3f99a727e.
2025-02-06 17:50:49 +08:00
George Hotz
c3f99a727e objc fast msg (#8922)
* benchmark kernel launch

* don't realize unneeded

* faster

* faster metal

* fix mypy

* new objc message style [pr]

* without sync

* no div 0

* lru cache that

* no sync in the profile

* fix

* update all to new style

* remove comment

* graph one kernel

* fix graph one kernel

* remove that sync
2025-02-06 17:49:06 +08:00
qazal
a2e7e49fe1 prepickle scheduler process replay [pr] (#8924) 2025-02-06 10:16:36 +01:00
qazal
89d7480b0c hotfix: don't sink views [pr] (#8923) 2025-02-06 09:15:12 +01:00
George Hotz
0cbb7d7f1e hotfix: metal has known sync issue 2025-02-06 14:29:41 +08:00
George Hotz
a8e54df363 benchmark single kernel launch (#8921)
* benchmark kernel launch

* don't realize unneeded

* faster

* faster metal

* fix mypy

* without sync

* no div 0

* lru cache that

* no sync in the profile
2025-02-06 13:35:34 +08:00
George Hotz
3e082d4a9d add float4 support to LLVM (#8920)
* add float4 support to LLVM

* is_bool
2025-02-06 12:15:50 +08:00
George Hotz
b05c536f74 cleanup some llvm stuff [pr] (#8919)
* cleanup some llvm stuff [pr]

* debug

* default to newer llvm

* repr
2025-02-06 11:45:03 +08:00
Josh Moore
44e0eab8fd Fix AttributeError occurring after ValueError in _apply_uop (#8905)
* Fix AttributeError occurring after ValueError in _apply_uop

* Update tensor.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-06 10:56:29 +08:00
chenyu
30695da256 remove Tensor._to_const_val (#8917)
* remove Tensor._to_const_val

added a TODO for advanced indexing on const, which was the last place that checks const in Tensor

* that is not folding now

* one more
2025-02-05 21:44:39 -05:00
George Hotz
d09b5f801c don't use Tensor new, add to all_tensors after constructions [pr] (#8918) 2025-02-06 10:21:32 +08:00
FICTURE7
759b3f86bf Pass host CPU features to LLVM target (#8909)
* Pass host CPU features to LLVM target

This gets `test_gemm_fp16` to pass on Windows. It would fail because the
generated machine code would call compiler-rt functions to perform
truncating. This gets the test to pass on some hardware, because LLVM
gets access to more instructions. Essentially this is similar to
`-march=native`.

Unless this was intentionally left as is to be re-implemented fully in
LLVM IR or something.

* Fix linter complaints
2025-02-06 10:19:30 +08:00
uuuvn
09ec33a578 Better errors when relocating against undefined symbol (#8902) 2025-02-06 10:13:44 +08:00
chenyu
488200f16c move more pow const to rewrite (#8916)
* move more pow const to rewrite

one less use of _to_const_val

* fix
2025-02-05 20:30:12 -05:00
chenyu
76671381aa move positive const ** t to a rewrite rule (#8914)
* move positive const ** t to a rewrite rule

* one more test
2025-02-05 19:30:12 -05:00
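One identity that a rewrite of `const ** t` can rely on, when the constant is positive, is c ** t = 2 ** (t * log2(c)). The commit message does not say which form the rule uses, so the sketch below is an assumption for illustration only; `pow_pos_const` is a hypothetical name, not a tinygrad function.

```python
import math

def pow_pos_const(c: float, t: float) -> float:
    """c ** t for c > 0, rewritten as 2 ** (t * log2(c)) (hypothetical helper)."""
    if c <= 0:
        # log2 is undefined for non-positive bases, so the identity does not apply
        raise ValueError("identity only holds for a positive base")
    return 2.0 ** (t * math.log2(c))

# The rewritten form agrees with the direct power for a few exponents.
for t in (-2.0, 0.0, 0.5, 3.0):
    assert math.isclose(pow_pos_const(3.0, t), 3.0 ** t, rel_tol=1e-9)
```

The restriction to a positive base is the reason a rule like this is kept separate from the negative-constant and `0 ** x` cases mentioned in neighbouring commits.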
Ignacio Sica
cad44f5f42 add Half-Precision Accumulation Support for Tensor Cores in NV, CUDA, and PTX (#8680)
* ptx and nv rendering refactor to work with half acc

* ptx fix!

* use same reg for acc and out

* fix comment

* another fix

* minor change in commet


* fix

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-02-05 16:56:37 -05:00
nimlgen
17f9b1cef6 am: load fw based on versions (#8913)
* am: load fw based on versions

* ops

* ops2
2025-02-06 00:02:09 +03:00
chenyu
189bfa164e enable backward test for pow(neg const ** x) (#8912)
backward works now. 0**x still does not work because it's a special case fixed in transcendental
2025-02-05 15:35:21 -05:00
chenyu
9307572fe3 Ops.POW and transcendental (#8911) 2025-02-05 15:15:59 -05:00
nimlgen
bff7c70eef hcq: better var check (#8908) 2025-02-05 22:38:59 +03:00
Ignacio Sica
aec3b8d515 add regression test: test_get_kernel_actions_preserves_actions_state (#8907)
* test_get_kernel_actions_preserves_actions_state

* simplify

* simplify

* refactor assert message
2025-02-05 14:13:01 -05:00
qazal
e71497aabc move assign ShapeTracker check to pattern matcher [pr] (#8906)
* move assign ShapeTracker check to pattern matcher [pr]

* rename the st uop to view
2025-02-05 19:47:20 +01:00
Ignacio Sica
0f6109ec00 hotfix bug in get_kernel_actions after TC_SEARCH_OVER_SHAPE was introduced (#8904)
* hotfix search bug

* copy actions
2025-02-05 13:10:05 -05:00
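The "copy actions" fix and the later `test_get_kernel_actions_preserves_actions_state` regression test suggest a shared list was being mutated in place. The sketch below shows that general class of bug; it assumes nothing about the real `get_kernel_actions` code, and `ACTIONS`, `get_actions_buggy` and `get_actions_fixed` are hypothetical names.

```python
# Hypothetical stand-in for a module-level list of search actions.
ACTIONS = ["upcast", "unroll", "local"]

def get_actions_buggy(extra: list[str]) -> list[str]:
    acts = ACTIONS         # alias to the shared list, not a copy
    acts += extra          # extends the shared list in place
    return acts

def get_actions_fixed(extra: list[str]) -> list[str]:
    acts = ACTIONS.copy()  # private copy, shared state is left alone
    acts += extra
    return acts

before = list(ACTIONS)
get_actions_fixed(["tc"])
assert ACTIONS == before   # state preserved across the call
```

A regression test for this kind of bug only needs to snapshot the shared list, call the function, and compare.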
Ignacio Sica
15f94ac964 TC_SEARCH_OVER_SHAPE to search multiple TC shapes (#8793)
* squash search over search

* refactor assert

* init benchmark

* cleaner get_kernel_actions

* cleaner get_kernel_actions

* add comment
2025-02-05 11:03:46 -05:00
qazal
e7edadda54 construct the sched_sink with graph_rewrite [pr] (#8903)
* construct the sched_sink with graph_rewrite

* diff

* move break_sched
2025-02-05 15:16:48 +01:00
qazal
ef7ad3f077 simpler subbuffer construction + copyin is always base (#8900)
* realize copy

* cleanup buffer_view

* smaller
2025-02-05 09:10:20 +01:00
qazal
6f0cc2e9c5 rename to KernelContext and move the linearize_sched comment [pr] (#8899)
* rename to KernelContext and move that comment [pr]

* 500
2025-02-05 07:49:58 +01:00
geohotstan
6fb0e5751b hotfix test_onnx_imagenet (#8897)
* start

* log severity

* only change this

* change abstraction so it's more usable for huggingface

* WHOOPS

* actually this is more correct
2025-02-05 14:39:55 +08:00
George Hotz
c1c5227acb preserve size in dtype ptr [pr] (#8898) 2025-02-05 14:38:57 +08:00
George Hotz
5844883e59 bump master version v0.10.1 2025-02-05 09:08:28 +08:00
uuuvn
a51c688f39 Cleanup llvm cleanup (and some clang things too) (#8871)
* Cleanup llvm cleanup (and some clang things too)

* Tests

* Tests 2

* forgot mockgpu

* more print some sources
2025-02-05 07:49:05 +08:00
eliotgolding
bb5ded85cc Don't rewrite idiv to rshift when numerator is negative (#8885)
* more conditions for shift rewrite mul/idiv

* make ptx test uint so the new condition is true

* delete idiv test

* rewrite to 0 is wrong for idiv, as denominator is cast to 0 before division

* mul/div by 2**(large count) is unsupported anyway
2025-02-05 07:47:33 +08:00
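The reason the rewrite is unsafe for negative numerators can be checked with plain integers. The sketch below assumes C-style integer division that truncates toward zero, and compares it with an arithmetic right shift, which rounds toward negative infinity. `trunc_idiv` and `shift_matches_idiv` are hypothetical helpers, not tinygrad code.

```python
def trunc_idiv(a: int, b: int) -> int:
    """C-style integer division: truncate toward zero (hypothetical helper)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

def shift_matches_idiv(a: int, k: int) -> bool:
    """True when a >> k equals truncating division of a by 2**k."""
    return (a >> k) == trunc_idiv(a, 1 << k)

# Non-negative numerators: the shift and the division always agree.
assert all(shift_matches_idiv(a, 1) for a in range(16))

# Negative numerator: the shift rounds down, the division rounds toward zero.
print(-7 >> 1, trunc_idiv(-7, 2))  # -4 -3
```

The two only disagree when the numerator is negative and not an exact multiple of the divisor, which is why restricting the rewrite to non-negative numerators is sufficient.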
pedro
666b6149bc Use full soname for libgcc_s in CPUProgram (#8642) (#8896)
Number after .so is abi version, it is always 1 for libgcc_s.
Most linux systems set default library versions via symlinks that are
simply followed to get actual elf, however conda does it via linker
scripts which ctypes doesn't follow (below contents of libgcc_s.so):
```
/* GNU ld script
   Use the shared library, but some functions are only in
   the static library.  */
GROUP ( libgcc_s.so.1 -lgcc )
```
ctypes.util.find_library thinks that this is the actual elf and
ctypes.CDLL just loads this text file as a shared library. The result
is:
```
  File "/home/me/src/tinygrad/tinygrad/device.py", line 223, in CPUProgram
    helper_handle = ctypes.CDLL(ctypes.util.find_library('System' if OSX else 'gcc_s'))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/me/miniforge3/envs/tinygrad/lib/python3.12/ctypes/__init__.py", line 379, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: /home/me/miniforge3/envs/tinygrad/lib/libgcc_s.so: invalid ELF header
```

Co-authored-by: uuuvn <83587632+uuuvn@users.noreply.github.com>
2025-02-05 07:45:48 +08:00
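The failure described in this commit can be reproduced without conda by checking the first four bytes of the file that would be loaded. The sketch below is an illustration under assumptions: `is_elf` and `load_gcc_s` are hypothetical helpers, and the exact call used in `CPUProgram` is not shown in the message, only that the full soname is used.

```python
import ctypes
import os
import tempfile

ELF_MAGIC = b"\x7fELF"  # every ELF file starts with these four bytes

def is_elf(path: str) -> bool:
    """Return True if the file at `path` begins with the ELF magic."""
    with open(path, "rb") as f:
        return f.read(4) == ELF_MAGIC

def load_gcc_s() -> ctypes.CDLL:
    """Load libgcc_s by its full soname so the dynamic loader resolves it.

    Assumes a Linux system where libgcc_s.so.1 is on the loader path.
    """
    return ctypes.CDLL("libgcc_s.so.1")

# A conda-style libgcc_s.so is a text linker script, not a shared object.
script = b"/* GNU ld script */\nGROUP ( libgcc_s.so.1 -lgcc )\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(script)
print(is_elf(tmp.name))  # False
os.unlink(tmp.name)
```

Passing the soname string directly skips `ctypes.util.find_library`, so the linker script is never picked up as if it were the library.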