George Hotz
0568720a68
delete revectorize ( #9000 )
...
* delete revectorize
* test vectorized LLVM/CLANG
* idk about that
* was that the segfault?
2025-02-10 18:32:35 +08:00
qazal
fd9f9ec772
realized base tensors become RESHAPE(BUFFER) [pr] ( #8994 )
2025-02-10 10:17:54 +01:00
George Hotz
910ae260cd
dsp float4 fold + revectorize [pr] ( #8995 )
...
* dsp float4 fold [pr]
* revectorize
* fix reg issue
* no bool vectorize
* cleanups
* no need for that
2025-02-10 12:14:32 +08:00
George Hotz
e618efce22
COMMUTATIVE flipping is only for ints ( #8996 )
...
* COMMUTATIVE flipping is only for ints [pr]
* no pr
* comm fixes this
2025-02-10 12:01:28 +08:00
George Hotz
2983285315
use HEX_REG_QEMU_INSN_CNT from qemu as a DSP timer [pr] ( #8993 )
...
* use HEX_REG_QEMU_INSN_CNT from qemu as a DSP timer [pr]
* add quantize test to dsp
* fix tests
* older onnx
* debug, let's see what's happening
2025-02-10 11:07:35 +08:00
chenyu
9119716761
update Tensor.maximum ( #8992 )
...
now it's just broadcast and UOp.maximum
2025-02-09 21:26:27 -05:00
nimlgen
88add71c25
amd: increase sdma copy size ( #8989 )
...
* amd: increase sdma max copy size
* rm this
* fix
* fx
* ops
2025-02-09 20:53:35 +03:00
qazal
7eba5fb413
Tensor.empty is RESHAPE(BUFFER) ( #8987 )
...
* empty is RESHAPE(BUFFER)
* eh
* add test_empty_buf
* can we unsupport this
* linter
* Revert "can we unsupport this"
This reverts commit 0f71e1aadb.
2025-02-09 18:42:51 +01:00
qazal
44479f8ad6
raise ValueError in view reshape for negative dims [pr] ( #8988 )
2025-02-09 17:27:15 +01:00
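For context on the entry above: a reshape target containing a negative dimension can be rejected up front with a ValueError rather than failing later on. The sketch below is a plain-Python illustration of that kind of check; `checked_reshape` and its error messages are hypothetical and are not taken from the tinygrad source.

```python
from math import prod

def checked_reshape(shape: tuple[int, ...], new_shape: tuple[int, ...]) -> tuple[int, ...]:
    """Hypothetical helper: validate a reshape target before using it."""
    # Reject negative dimensions explicitly (the behaviour named in the commit title).
    if any(d < 0 for d in new_shape):
        raise ValueError(f"shape can't contain negative numbers: {new_shape}")
    # A reshape must preserve the total number of elements.
    if prod(shape) != prod(new_shape):
        raise ValueError(f"size mismatch, can't reshape {shape} -> {new_shape}")
    return tuple(new_shape)
```

Raising ValueError keeps the failure at the call site, where the offending shape is still visible in the message.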
nimlgen
c6c2373bc0
replace libpciaccess autogen with just pci regs ( #8983 )
...
* replace libpciaccess autogen with just pci regs
* add pci.py
2025-02-09 18:40:45 +03:00
qazal
55351ebb31
minimal failing test for #8975 [pr] ( #8982 )
2025-02-09 14:10:37 +01:00
nimlgen
e5a3f60fc2
am: remove libpciaccess dep ( #8980 )
...
* am: remove libpciaccess dep
* offset in mockhwiface
* op
* fake regions
2025-02-09 16:06:55 +03:00
nimlgen
52a69dd5e9
Revert "use am in training benchmarks ( #8965 )" ( #8981 )
...
This reverts commit 107e616857.
2025-02-09 15:43:45 +03:00
George Hotz
0b26cee2f1
fix some slow tests [pr] ( #8979 )
2025-02-09 15:57:04 +08:00
George Hotz
208097d488
try reducing testing deps [pr] ( #8976 )
...
* reduce testing deps
* break out test models
* add PR to models, add models to metal
* okay, not that
* mac cleanup
* mac typo
* other typo
2025-02-09 15:22:32 +08:00
George Hotz
6ffee2fca9
reduce speed example [pr] ( #8978 )
...
* reduce speed example
* fast like a nascar
2025-02-09 14:13:59 +08:00
Samuel Ayala
ac3765c043
use getpass instead of os.getlogin() ( #8972 )
2025-02-08 23:29:26 +03:00
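For context on the entry above: `os.getlogin()` asks for the user of the controlling terminal and can raise OSError when there is none (for example under cron, in some containers, or in CI), while `getpass.getuser()` first checks the LOGNAME, USER, LNAME and USERNAME environment variables and then falls back to the password database. A minimal sketch of the pattern follows; the helper name `current_user` and the "unknown" fallback are illustrative assumptions, not code from the commit.

```python
import getpass

def current_user() -> str:
    """Hypothetical helper: best-effort user name that works without a terminal."""
    try:
        # Uses environment variables first, then the password database.
        return getpass.getuser()
    except Exception:
        # getuser() can still fail when no source of the name is available;
        # the fallback value here is an assumption for illustration.
        return "unknown"
```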
qazal
308516e439
fix viz paginate + cleanups [pr] ( #8973 )
...
* fix viz paginate [pr]
* cleanups
* remove the extra font definition
* more work
* none for the first graph
2025-02-08 20:26:57 +01:00
nimlgen
107e616857
use am in training benchmarks ( #8965 )
...
* am in training benchmarks
* fix
* not needed anymore
2025-02-08 20:20:47 +03:00
nimlgen
79de980565
am: do not fork pci bars ( #8969 )
2025-02-08 19:03:17 +03:00
chenyu
0cac941af1
move xpow to sym instead of late_rewrite ( #8968 )
...
does not need to be in late_rewrite and can be simplified further
2025-02-08 10:09:24 -05:00
qazal
e7182bbb2c
fix "fatal bad object" log in process replay [pr] ( #8966 )
2025-02-08 11:57:38 +01:00
uuuvn
9b9c1e14da
Late MTLCompiler load ( #8963 )
...
Moved loading MTLCompiler (and trying to load normal llvm before it)
to MetalCompiler, like in CPUProgram with helper
2025-02-08 17:29:23 +08:00
George Hotz
a3c78d47b3
speed docs + upgrades [pr] ( #8964 )
...
* add some docs about speed [pr]
* better torch gemm
* enable locals on llvm/clang
* disable locals for beam speed on LLVM/CLANG
* 0x20 alignment in llvm allows ymm use
2025-02-08 17:28:52 +08:00
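On the last bullet of the entry above: a ymm register is 256 bits (0x20 bytes) wide, and the aligned AVX load and store forms require addresses that are multiples of 0x20. The rounding arithmetic can be sketched as below; `align_up` is a hypothetical helper for illustration and is not taken from the commit.

```python
def align_up(n: int, alignment: int = 0x20) -> int:
    """Round n up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # Adding alignment-1 and masking off the low bits rounds up.
    return (n + alignment - 1) & ~(alignment - 1)
```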
George Hotz
5bdd6a1cc4
increase CI speed with more runners [pr] ( #8961 )
...
* increase CI speed with more runners [pr]
* splits + cleanups [pr]
* more runners
* need that dep
* split that too
* can't be minimal
* move test readme
* bugfix + naming
* one more split
* bump to 22.04
2025-02-08 09:04:36 +08:00
nimlgen
11d50324d8
am: tiny cleanups ( #8958 )
...
* am: start cleanups
* am
2025-02-07 23:44:43 +03:00
chenyu
cfd28517df
move pow folding tests to test_schedule [pr] ( #8955 )
...
these don't really belong in test_const_folding
2025-02-07 12:51:43 -05:00
George Hotz
c2b4c43edb
handle stride 0 reduce ( #8068 )
...
* handle stride 0 reduce [pr]
* more test fixups
* a few more
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2025-02-07 15:40:58 +01:00
qazal
cf21e27d78
little better VIEW simplifier pattern [pr] ( #8954 )
2025-02-07 12:55:54 +01:00
qazal
329013f577
fix UOp.metadata on KERNEL op [pr] ( #8953 )
...
* fix UOp.metadata on KERNEL op [pr]
* hotfix: is not None
2025-02-07 12:40:11 +01:00
George Hotz
4de084a835
cleanup ci, split docs/autogen, testing_minimal, LLVM Speed [pr] ( #8952 )
...
* cleanup ci [pr]
* testing_minimal
* add hypothesis to minimal
* fail tiktoken import okay
* add LLVM speed test
* llvm speed w/o beam
2025-02-07 19:01:59 +08:00
uuuvn
6090cbe3be
Try to open llvm first when opening metal ( #8949 )
...
* Try to open llvm first when opening metal
* Use more specific FileNotFoundError
2025-02-07 18:58:37 +08:00
uuuvn
67b70e4f6c
Fix incorrect __del__ ( #8950 )
...
CPython doesn't make any guarantees about the order in which globals like
`msg` or `libobjc` are destroyed when the interpreter shuts down.
https://github.com/tinygrad/tinygrad/pull/8949 triggered the
unlucky ordering, which led to a bunch of errors at exit.
There are also a number of other places where similar problems exist.
2025-02-07 18:21:44 +08:00
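The problem described in the entry above is general to CPython: `__del__` may run during interpreter shutdown, when the module globals it relies on have already been deleted or set to None. One common way to avoid this, sketched below, is to bind whatever the finalizer needs onto the instance at construction time. All names here (`Resource`, `release_handle`, `released`) are hypothetical, and this is not necessarily the fix the commit applied.

```python
import gc

released = []  # stands in for the effect of a native release call (hypothetical)

def release_handle(handle):
    released.append(handle)

class Resource:
    def __init__(self, handle, release=release_handle):
        self.handle = handle
        # Keep a private reference to the cleanup function so __del__ does not
        # look it up in module globals, which may already be cleared at exit.
        self._release = release

    def __del__(self):
        self._release(self.handle)

r = Resource(42)
del r          # in CPython the finalizer runs once the last reference is gone
gc.collect()   # covers implementations without immediate refcount cleanup
```

The default argument is evaluated once when the class body runs, so the instance holds its own reference to the cleanup function regardless of what happens to the global name later.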
George Hotz
9ed2d0dfa2
refactor into subactions ( #8946 )
...
* refactor into subactions
* this work?
* add shell
* move install opencl
* valid?
* support mac os x
* refactor other osx
* fix linux/osx
* fixes
* cleanups
* used everywhere
* no quotes
* quotes on true
* bugfixes
* this run?
* hardcode
* that
* process replay action
* fix checkout
* restore to branch
* fix caching
* fix osx python cache
* does replace function exist
* Revert "does replace function exist"
This reverts commit 622177c5a0.
* Revert "fix osx python cache"
This reverts commit e70d55cd93.
* user on osx to fix untar issue
* that
2025-02-07 18:06:44 +08:00
Ahmed Harmouche
133cacadde
Autogen webgpu dawn, removing wgpu-py dependency (f16 support part 1) ( #8646 )
...
* Switch to dawn, all tests passing locally
* Use dawn-python
* Skip failing test
* Skip midcast and fix timestamp on metal ci
* Autogen webgpu
* Try fetch dawn lib again
* /usr/lib
* Without lib prefix
* Test autogen diff
* Delete webgpu support, move everything to ops_webgpu
* mypy fix
* Simplify, refactor
* Line savings
* No ResultContainer
* Type annotation for result
* Some more simplifications
* Why was this explicit sync used at all?
* Refactor: delete functions that are only used once
* Create shader module inline
* Clear unit tests cache, maybe that solves it
* That wasn't it
* Try deleting cache to pass failing weight compare
* weights_only=False for pytorch 2.6
* Simplify ctype array creation
* Remove nanosecond precision timestamps
* Simplify error handling
* Refactor, add back type annotations
* Deleted custom submit function, refactor
* read_buffer simplify
* Fix use after free, refactor
* Simplify supported_features
* Runtime docs
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-07 15:16:59 +08:00
George Hotz
dbda72f91d
hotfix: raise line limit to 11200 for new webgpu backend
2025-02-07 14:29:20 +08:00
George Hotz
b1e1319972
ci speed on the enterprise plan [pr] ( #8942 )
2025-02-07 11:18:12 +08:00
Bhavya Gada
3b67712892
[bounty] Fix LLVM=1 NO_DEVECTORIZE=1 python3 test/test_ops.py TestOps.test_strided_conv2d_simple ( #8937 )
...
* fix LLVM=1 NO_DEVECTORIZE=1 python3 test/test_ops.py TestOps.test_strided_conv2d_simple
* remove expectedFailure
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-07 10:07:54 +08:00
George Hotz
f54242849d
failing test for the devectorize [pr] ( #8940 )
...
* failing test for the devectorize [pr]
* add DEVECTORIZE to method_cache
2025-02-07 09:44:54 +08:00
nimlgen
ee1a0fb8ec
am_smi: print device name ( #8939 )
2025-02-07 03:01:25 +03:00
chenyu
a092b6395d
Tuple -> tuple, List -> list [pr] ( #8936 )
2025-02-06 14:21:19 -05:00
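For context on the entry above: since Python 3.9 the builtin collection types can be subscripted directly in annotations (PEP 585), so `typing.Tuple` and `typing.List` are no longer needed. A small illustration, with a hypothetical function that is not from the tinygrad codebase:

```python
# Before (typing aliases):
#   from typing import List, Tuple
#   def adjacent_pairs(xs: List[int]) -> List[Tuple[int, int]]: ...

# After (builtin generics, Python 3.9+):
def adjacent_pairs(xs: list[int]) -> list[tuple[int, int]]:
    """Return each element paired with its successor."""
    return list(zip(xs, xs[1:]))
```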
chenyu
d5183e1584
remove unneeded annotation import ( #8934 )
2025-02-06 13:12:35 -05:00
chenyu
00d72a5144
setitem isinstance cleanup [pr] ( #8932 )
2025-02-06 11:44:57 -05:00
qazal
81e241150a
hotfix: save 1 line ( #8931 )
...
* hotfix: save 1 line
* no unwrap
2025-02-06 17:26:05 +02:00
qazal
eb1144be8b
hotfix: only check current graph when excluding nodes in viz ( #8930 )
2025-02-06 16:58:53 +02:00
George Hotz
3cc05081f4
llvm no devectorize, the right way ( #8901 )
...
* closer
* env flag + transcendental issue
2025-02-06 22:53:49 +08:00
George Hotz
8b16c65bca
add compile3 benchmark [pr] ( #8929 )
2025-02-06 22:49:31 +08:00
qazal
79fb5c6470
hotfix: test_shard_no_recompile shouldn't rely on schedule order [pr] ( #8928 )
2025-02-06 16:27:59 +02:00
George Hotz
1249e8dd3b
objc fast msg, try 2 [pr] ( #8927 )
2025-02-06 19:06:21 +08:00
nimlgen
86feb98dcd
am: add support for 7600 ( #8910 )
...
* am: start to add support for 7600
* test_tiny passes
* mmhub 3 0 2
* cleaner
2025-02-06 14:04:07 +03:00