Commit Graph

7782 Commits

Author SHA1 Message Date
chenyu
cfd28517df move pow folding tests to test_schedule [pr] (#8955)
doesn't really belong in test_const_folding
2025-02-07 12:51:43 -05:00
George Hotz
c2b4c43edb handle stride 0 reduce (#8068)
* handle stride 0 reduce [pr]

* more test fixups

* a few more

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-02-07 15:40:58 +01:00
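For context on the stride-0 reduce above (an illustration, not taken from the PR itself): a stride-0 axis is what broadcasting produces, so every element along that axis aliases the same memory and a sum over it is just the underlying value times the axis length. A minimal numpy sketch of the kind of shape such a reduce has to handle:

    import numpy as np

    # a stride-0 axis comes from broadcasting: all 3 rows alias the same 4 floats
    x = np.broadcast_to(np.arange(4.0), (3, 4))
    print(x.strides)      # (0, 8) -- axis 0 has stride 0
    print(x.sum(axis=0))  # [0. 3. 6. 9.], i.e. np.arange(4.0) * 3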
qazal
cf21e27d78 little better VIEW simplifier pattern [pr] (#8954) 2025-02-07 12:55:54 +01:00
qazal
329013f577 fix UOp.metadata on KERNEL op [pr] (#8953)
* fix UOp.metadata on KERNEL op [pr]

* hotfix: is not None
2025-02-07 12:40:11 +01:00
George Hotz
4de084a835 cleanup ci, split docs/autogen, testing_minimal, LLVM Speed [pr] (#8952)
* cleanup ci [pr]

* testing_minimal

* add hypothesis to minimal

* fail tiktoken import okay

* add LLVM speed test

* llvm speed w/o beam
2025-02-07 19:01:59 +08:00
uuuvn
6090cbe3be Try to open llvm first when opening metal (#8949)
* Try to open llvm first when opening metal

* Use more specific FileNotFoundError
2025-02-07 18:58:37 +08:00
uuuvn
67b70e4f6c Fix incorrect __del__ (#8950)
CPython doesn't make any guarantees about the order in which globals like
`msg` or `libobjc` are destroyed when the interpreter shuts down.

https://github.com/tinygrad/tinygrad/pull/8949 triggered the
unlucky ordering, which led to a bunch of errors at exit.

There are also a bunch of other places where similar problems exist.
2025-02-07 18:21:44 +08:00
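A minimal sketch of the defensive pattern this fix implies. The globals `libobjc` and `msg` are the ones named above, but the class and the `release` call are hypothetical, not tinygrad's actual code:

    class ObjcHandle:
        def __init__(self, ptr):
            self.ptr = ptr

        def __del__(self):
            # at interpreter shutdown, module globals such as `msg` may already
            # have been torn down, so __del__ must not assume they still exist
            if globals().get("msg") is not None and self.ptr is not None:
                msg(self.ptr, "release")  # hypothetical wrapper around objc_msgSend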
George Hotz
9ed2d0dfa2 refactor into subactions (#8946)
* refactor into subactions

* this work?

* add shell

* move install opencl

* valid?

* support mac os x

* refactor other osx

* fix linux/osx

* fixes

* cleanups

* used everywhere

* no quotes

* quotes on true

* bugfixes

* this run?

* hardcode

* that

* process replay action

* fix checkout

* restore to branch

* fix caching

* fix osx python cache

* does replace function exist

* Revert "does replace function exist"

This reverts commit 622177c5a0.

* Revert "fix osx python cache"

This reverts commit e70d55cd93.

* user on osx to fix untar issue

* that
2025-02-07 18:06:44 +08:00
Ahmed Harmouche
133cacadde Autogen webgpu dawn, removing wgpu-py dependency (f16 support part 1) (#8646)
* Switch to dawn, all tests passing locally

* Use dawn-python

* Skip failing test

* Skip midcast and fix timestamp on metal ci

* Autogen webgpu

* Try fetch dawn lib again

* /usr/lib

* Without lib prefix

* Test autogen diff

* Delete webgpu support, move everything to ops_webgpu

* mypy fix

* Simplify, refactor

* Line savings

* No ResultContainer

* Type annotation for result

* Some more simplifications

* Why was this explicit sync used at all?

* Refactor: delete functions that are only used once

* Create shader module inline

* Clear unit tests cache, maybe that solves it

* That wasn't it

* Try deleting cache to pass failing weight compare

* weights_only=False for pytorch 2.6

* Simplify ctype array creation

* Remove nanosecond precision timestamps

* Simplify error handling

* Refactor, add back type annotations

* Deleted custom submit function, refactor

* read_buffer simplify

* Fix use after free, refactor

* Simplify supported_features

* Runtime docs

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-07 15:16:59 +08:00
George Hotz
dbda72f91d hotfix: raise line limit to 11200 for new webgpu backend 2025-02-07 14:29:20 +08:00
George Hotz
b1e1319972 ci speed on the enterprise plan [pr] (#8942) 2025-02-07 11:18:12 +08:00
Bhavya Gada
3b67712892 [bounty] Fix LLVM=1 NO_DEVECTORIZE=1 python3 test/test_ops.py TestOps.test_strided_conv2d_simple (#8937)
* fix LLVM=1 NO_DEVECTORIZE=1 python3 test/test_ops.py TestOps.test_strided_conv2d_simple

* remove expectedFailure

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-07 10:07:54 +08:00
George Hotz
f54242849d failing test for the devectorize [pr] (#8940)
* failing test for the devectorize [pr]

* add DEVECTORIZE to method_cache
2025-02-07 09:44:54 +08:00
nimlgen
ee1a0fb8ec am_smi: print device name (#8939) 2025-02-07 03:01:25 +03:00
chenyu
a092b6395d Tuple -> tuple, List -> list [pr] (#8936) 2025-02-06 14:21:19 -05:00
chenyu
d5183e1584 remove unneeded annotation import (#8934) 2025-02-06 13:12:35 -05:00
chenyu
00d72a5144 setitem isinstance cleanup [pr] (#8932) 2025-02-06 11:44:57 -05:00
qazal
81e241150a hotfix: save 1 line (#8931)
* hotfix: save 1 line

* no unwrap
2025-02-06 17:26:05 +02:00
qazal
eb1144be8b hotfix: only check current graph when excluding nodes in viz (#8930) 2025-02-06 16:58:53 +02:00
George Hotz
3cc05081f4 llvm no devectorize, the right way (#8901)
* closer

* env flag + transcendental issue
2025-02-06 22:53:49 +08:00
George Hotz
8b16c65bca add compile3 benchmark [pr] (#8929) 2025-02-06 22:49:31 +08:00
qazal
79fb5c6470 hotfix: test_shard_no_recompile shouldn't rely on schedule order [pr] (#8928) 2025-02-06 16:27:59 +02:00
George Hotz
1249e8dd3b objc fast msg, try 2 [pr] (#8927) 2025-02-06 19:06:21 +08:00
nimlgen
86feb98dcd am: add support for 7600 (#8910)
* am: start to add support for 7600

* test_tiny passes

* mmhub 3 0 2

* cleaner
2025-02-06 14:04:07 +03:00
George Hotz
ae45826758 hotfix: GRAPH_ONE_KERNEL + fix timing 2025-02-06 17:52:20 +08:00
George Hotz
1c53e8bf27 Revert "objc fast msg (#8922)" (#8926)
This reverts commit c3f99a727e.
2025-02-06 17:50:49 +08:00
George Hotz
c3f99a727e objc fast msg (#8922)
* benchmark kernel launch

* don't realize unneeded

* faster

* faster metal

* fix mypy

* new objc message style [pr]

* without sync

* no div 0

* lru cache that

* no sync in the profile

* fix

* update all to new style

* remove comment

* graph one kernel

* fix graph one kernel

* remove that sync
2025-02-06 17:49:06 +08:00
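The bullets "new objc message style" and "lru cache that" suggest the usual ctypes pattern of caching selector lookups. A macOS-only sketch under that assumption (not tinygrad's actual implementation):

    import ctypes, functools

    libobjc = ctypes.CDLL("/usr/lib/libobjc.dylib")
    libobjc.sel_registerName.restype = ctypes.c_void_p
    libobjc.sel_registerName.argtypes = [ctypes.c_char_p]
    libobjc.objc_msgSend.restype = ctypes.c_void_p
    libobjc.objc_msgSend.argtypes = [ctypes.c_void_p, ctypes.c_void_p]

    @functools.lru_cache(maxsize=None)
    def sel(name: str) -> int:
        # selector lookup is pure, so cache it instead of hitting libobjc every call
        return libobjc.sel_registerName(name.encode())

    def msg(obj: int, selector: str) -> int:
        return libobjc.objc_msgSend(obj, sel(selector))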
qazal
a2e7e49fe1 prepickle scheduler process replay [pr] (#8924) 2025-02-06 10:16:36 +01:00
qazal
89d7480b0c hotfix: don't sink views [pr] (#8923) 2025-02-06 09:15:12 +01:00
George Hotz
0cbb7d7f1e hotfix: metal has known sync issue 2025-02-06 14:29:41 +08:00
George Hotz
a8e54df363 benchmark single kernel launch (#8921)
* benchmark kernel launch

* don't realize unneeded

* faster

* faster metal

* fix mypy

* without sync

* no div 0

* lru cache that

* no sync in the profile
2025-02-06 13:35:34 +08:00
George Hotz
3e082d4a9d add float4 support to LLVM (#8920)
* add float4 support to LLVM

* is_bool
2025-02-06 12:15:50 +08:00
George Hotz
b05c536f74 cleanup some llvm stuff [pr] (#8919)
* cleanup some llvm stuff [pr]

* debug

* default to newer llvm

* repr
2025-02-06 11:45:03 +08:00
Josh Moore
44e0eab8fd Fix AttributeError occurring after ValueError in _apply_uop (#8905)
* Fix AttributeError occurring after ValueError in _apply_uop

* Update tensor.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-06 10:56:29 +08:00
chenyu
30695da256 remove Tensor._to_const_val (#8917)
* remove Tensor._to_const_val

added a TODO for advanced indexing on const, which was the last place that checked const in Tensor

* that is not folding now

* one more
2025-02-05 21:44:39 -05:00
George Hotz
d09b5f801c don't use Tensor new, add to all_tensors after constructions [pr] (#8918) 2025-02-06 10:21:32 +08:00
FICTURE7
759b3f86bf Pass host CPU features to LLVM target (#8909)
* Pass host CPU features to LLVM target

This gets `test_gemm_fp16` to pass on Windows. It used to fail because the
generated machine code would call compiler-rt functions to perform the
truncation. Passing the host features gets the test to pass on some hardware,
because LLVM gets access to more instructions; essentially this is similar to
`-march=native`.

Unless this was intentionally left as-is to be re-implemented fully in
LLVM IR or something.

* Fix linter complaints
2025-02-06 10:19:30 +08:00
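The same idea in standalone form, using llvmlite rather than tinygrad's own LLVM bindings (an analogy, not the actual patch): build the target machine with the host CPU name and feature string, which is roughly what -march=native does.

    import llvmlite.binding as llvm

    llvm.initialize()
    llvm.initialize_native_target()
    llvm.initialize_native_asmprinter()

    target = llvm.Target.from_default_triple()
    tm = target.create_target_machine(
        cpu=llvm.get_host_cpu_name(),                     # e.g. "znver3"
        features=llvm.get_host_cpu_features().flatten(),  # e.g. "+avx2,+f16c,..."
    )
    # code compiled through `tm` may now use host-only instructions (e.g. f16 converts)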
uuuvn
09ec33a578 Better errors when relocating against undefined symbol (#8902) 2025-02-06 10:13:44 +08:00
chenyu
488200f16c move more pow const to rewrite (#8916)
* move more pow const to rewrite

one less use of _to_const_val

* fix
2025-02-05 20:30:12 -05:00
chenyu
76671381aa move positive const ** t to a rewrite rule (#8914)
* move positive const ** t to a rewrite rule

* one more test
2025-02-05 19:30:12 -05:00
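The standard rewrite behind a rule like this (the identity itself, not necessarily tinygrad's exact pattern) is c**t = exp(t * log(c)), which only holds for a positive constant base c:

    import math

    def pow_pos_const(c: float, t: float) -> float:
        # rewrite c**t as exp(t * log(c)); valid only for a positive constant base
        assert c > 0
        return math.exp(t * math.log(c))

    print(pow_pos_const(3.0, 2.5), 3.0 ** 2.5)  # both ~15.588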
Ignacio Sica
cad44f5f42 add Half-Precision Accumulation Support for Tensor Cores in NV, CUDA, and PTX (#8680)
* ptx and nv rendering refactor to work with half acc

* ptx fix!

* use same reg for acc and out

* fix comment

* another fix

* minor change in comment

* fix

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-02-05 16:56:37 -05:00
nimlgen
17f9b1cef6 am: load fw based on versions (#8913)
* am: load fw based on versions

* ops

* ops2
2025-02-06 00:02:09 +03:00
chenyu
189bfa164e enable backward test for pow(neg const ** x) (#8912)
backward works now. 0**x still does not work because it's a special case fixed in transcendental
2025-02-05 15:35:21 -05:00
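Why 0**x needs its own path: the generic exp/log rewrite breaks at a zero base (log(0) is -inf), and the expected result is piecewise. A plain-Python reminder of the target behavior, purely illustrative:

    print(0.0 ** 2.0)  # 0.0 -> zero for any positive exponent
    print(0.0 ** 0.0)  # 1.0 -> by convention
    # 0.0 ** -1.0 diverges (Python raises ZeroDivisionError; IEEE pow returns +inf)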
chenyu
9307572fe3 Ops.POW and transcendental (#8911) 2025-02-05 15:15:59 -05:00
nimlgen
bff7c70eef hcq: better var check (#8908) 2025-02-05 22:38:59 +03:00
Ignacio Sica
aec3b8d515 add regression test: test_get_kernel_actions_preserves_actions_state (#8907)
* test_get_kernel_actions_preserves_actions_state

* simplify

* simplify

* refactor assert message
2025-02-05 14:13:01 -05:00
qazal
e71497aabc move assign ShapeTracker check to pattern matcher [pr] (#8906)
* move assign ShapeTracker check to pattern matcher [pr]

* rename the st uop to view
2025-02-05 19:47:20 +01:00
Ignacio Sica
0f6109ec00 hotfix bug in get_kernel_actions after TC_SEARCH_OVER_SHAPE was introduced (#8904)
* hotfix search bug

* copy actions
2025-02-05 13:10:05 -05:00
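Judging from the "copy actions" bullet, the bug looks like the usual shared-mutable-list pitfall; a generic illustration with a hypothetical action list, not the real get_kernel_actions code:

    BASE_ACTIONS = ["UPCAST", "LOCAL", "UNROLL"]  # hypothetical shared action list

    def get_actions_buggy(extra):
        actions = BASE_ACTIONS        # aliases the module-level list
        actions += extra              # += mutates BASE_ACTIONS in place, leaking state
        return actions

    def get_actions_fixed(extra):
        actions = list(BASE_ACTIONS)  # copy first, so the shared list is preserved
        actions += extra
        return actions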
Ignacio Sica
15f94ac964 TC_SEARCH_OVER_SHAPE to search multiple TC shapes (#8793)
* squash search over search

* refactor assert

* init benchmark

* cleaner get_kernel_actions

* cleaner get_kernel_actions

* add comment
2025-02-05 11:03:46 -05:00
qazal
e7edadda54 construct the sched_sink with graph_rewrite [pr] (#8903)
* construct the sched_sink with graph_rewrite

* diff

* move break_sched
2025-02-05 15:16:48 +01:00