Commit Graph

7817 Commits

qazal
c52cd2b437 kernel op cleanups + use ScheduleItem [pr] (#9009) 2025-02-10 17:54:30 +01:00
chenyu
25fa5e4d5f enable backward tests in test_std_one_in_axis [pr] (#9007)
one correction=0 case is still broken

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-02-10 10:44:05 -05:00
qazal
d426f1ad6e don't open devices in lowering (#9008) 2025-02-10 15:28:51 +01:00
qazal
cfd3db7862 construct the schedule sink 2 (#8925)
* work

* delete preload

* fix metadata

* this can keep existing

* assign pruning

* dedup early

* bfs

* cycle asserts

* move assign check

* once
2025-02-10 22:23:02 +08:00
nimlgen
3e005ca0c2 am: resize bar0 to max supported (#9006) 2025-02-10 16:48:44 +03:00
nimlgen
07cb7e701c am: fix gfx usage at 100% (#9003)
* am: fix gfx usage at 100%

* not need

* not needed

* fix power con

* not supported on 7600
2025-02-10 16:48:23 +03:00
nimlgen
f91409f038 am: fix proclogs (#9004) 2025-02-10 16:38:58 +03:00
qazal
cd77e51810 fix tensor realization bug in #8975 (#8984)
* fix tensor realization bug in #8975

* that's a reshape now

* work

* works

* give those tests better names

* test when multiple mops result in the same ShapeTracker

* test_become_existing_buf_complex is enough

* that too
2025-02-10 13:51:30 +01:00
qazal
b17ec42b56 remove const_arg (#9002)
* remove const_arg

* use -m pytest

* remove test_const_arg test, variable arg on CONST does not exist.

* use base in test_const_dtype
2025-02-10 12:45:11 +01:00
George Hotz
0568720a68 delete revectorize (#9000)
* delete revectorize

* test vectorized LLVM/CLANG

* idk about that

* was that the segfault?
2025-02-10 18:32:35 +08:00
qazal
fd9f9ec772 realized base tensors become RESHAPE(BUFFER) [pr] (#8994) 2025-02-10 10:17:54 +01:00
George Hotz
910ae260cd dsp float4 fold + revectorize [pr] (#8995)
* dsp float4 fold [pr]

* revectorize

* fix reg issue

* no bool vectorize

* cleanups

* no need for that
2025-02-10 12:14:32 +08:00
George Hotz
e618efce22 COMMUTATIVE flipping is only for ints (#8996)
* COMMUTATIVE flipping is only for ints [pr]

* no pr

* comm fixes this
2025-02-10 12:01:28 +08:00
George Hotz
2983285315 use HEX_REG_QEMU_INSN_CNT from qemu as a DSP timer [pr] (#8993)
* use HEX_REG_QEMU_INSN_CNT from qemu as a DSP timer [pr]

* add quantize test to dsp

* fix tests

* older onnx

* debug, let's see what's happening
2025-02-10 11:07:35 +08:00
chenyu
9119716761 update Tensor.maximum (#8992)
now it's just broadcast and UOp.maximum
2025-02-09 21:26:27 -05:00
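The broadcasting half of that change follows the standard numpy-style rule: right-align the two shapes, and each pair of dimensions must be equal or contain a 1. A minimal plain-Python sketch of that rule (illustrative only, not tinygrad's actual implementation; `broadcast_shape` is a made-up name):

```python
from itertools import zip_longest

def broadcast_shape(s1, s2):
    # Right-align the shapes; each dim pair must match or contain a 1,
    # and the output dim is the larger of the two.
    out = []
    for a, b in zip_longest(reversed(s1), reversed(s2), fillvalue=1):
        if a != b and 1 not in (a, b):
            raise ValueError(f"cannot broadcast {s1} with {s2}")
        out.append(max(a, b))
    return tuple(reversed(out))

print(broadcast_shape((3, 1), (1, 4)))   # (3, 4)
print(broadcast_shape((5,), (2, 1, 5)))  # (2, 1, 5)
```

Once both operands share a shape, maximum reduces to a pointwise elementwise op, which is why the commit can describe it as "just broadcast and UOp.maximum".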
nimlgen
88add71c25 amd: increase sdma copy size (#8989)
* amd: increase sdma max copy size

* rm this

* fix

* fx

* ops
2025-02-09 20:53:35 +03:00
qazal
7eba5fb413 Tensor.empty is RESHAPE(BUFFER) (#8987)
* empty is RESHAPE(BUFFER)

* eh

* add test_empty_buf

* can we unsupport this

* linter

* Revert "can we unsupport this"

This reverts commit 0f71e1aadb.
2025-02-09 18:42:51 +01:00
qazal
44479f8ad6 raise ValueError in view reshape for negative dims [pr] (#8988) 2025-02-09 17:27:15 +01:00
nimlgen
c6c2373bc0 replace libpciaccess autogen with just pci regs (#8983)
* replace libpciaccess autogen with just pci regs

* add pci.py
2025-02-09 18:40:45 +03:00
qazal
55351ebb31 minimal failing test for #8975 [pr] (#8982) 2025-02-09 14:10:37 +01:00
nimlgen
e5a3f60fc2 am: remove libpciaccess dep (#8980)
* am: remove libpciaccess dep

* offset in mockhwiface

* op

* fake regions
2025-02-09 16:06:55 +03:00
nimlgen
52a69dd5e9 Revert "use am in training benchmarks (#8965)" (#8981)
This reverts commit 107e616857.
2025-02-09 15:43:45 +03:00
George Hotz
0b26cee2f1 fix some slow tests [pr] (#8979) 2025-02-09 15:57:04 +08:00
George Hotz
208097d488 try reducing testing deps [pr] (#8976)
* reduce testing deps

* break out test models

* add PR to models, add models to metal

* okay, not that

* mac cleanup

* mac typo

* other typo
2025-02-09 15:22:32 +08:00
George Hotz
6ffee2fca9 reduce speed example [pr] (#8978)
* reduce speed example

* fast like a nascar
2025-02-09 14:13:59 +08:00
Samuel Ayala
ac3765c043 use getpass instead of os.getlogin() (#8972) 2025-02-08 23:29:26 +03:00
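The motivation for that swap: `os.getlogin()` asks the OS for the controlling terminal's login name and raises `OSError` when there is none (daemons, containers, CI), while `getpass.getuser()` checks environment variables and falls back to the password database. A hedged sketch of the difference (`current_user` is a hypothetical helper, not the code in the PR):

```python
import getpass
import os

def current_user() -> str:
    # os.getlogin() requires a controlling terminal and can raise OSError
    # in CI/daemon contexts; getpass.getuser() checks LOGNAME/USER/etc.
    # and falls back to the pwd database, so it is the safer default.
    try:
        return os.getlogin()
    except OSError:
        return getpass.getuser()

print(current_user())
```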
qazal
308516e439 fix viz paginate + cleanups [pr] (#8973)
* fix viz paginate [pr]

* cleanups

* remove the extra font definition

* more work

* none for the first graph
2025-02-08 20:26:57 +01:00
nimlgen
107e616857 use am in training benchmarks (#8965)
* am in training benchmarks

* fix

* not needed anymore
2025-02-08 20:20:47 +03:00
nimlgen
79de980565 am: do not fork pci bars (#8969) 2025-02-08 19:03:17 +03:00
chenyu
0cac941af1 move xpow to sym instead of late_rewrite (#8968)
does not need to be in late_rewrite and can be simplified further
2025-02-08 10:09:24 -05:00
qazal
e7182bbb2c fix "fatal bad object" log in process replay [pr] (#8966) 2025-02-08 11:57:38 +01:00
uuuvn
9b9c1e14da Late MTLCompiler load (#8963)
Moved loading MTLCompiler (and the attempt to load normal llvm before it)
into MetalCompiler with a helper, like in CPUProgram
2025-02-08 17:29:23 +08:00
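The general pattern here is deferring a dynamic-library load (e.g. a `ctypes.CDLL`) from import time to first use, so importing the module stays cheap and a missing library only errors when actually needed. A generic sketch of that kind of lazy loading (illustrative only; `LazyLib` is a made-up name, not tinygrad's helper):

```python
import types

class LazyLib:
    """Defer an expensive load until the first attribute access."""
    def __init__(self, loader):
        self._loader = loader
        self._lib = None
    def __getattr__(self, name):
        # __getattr__ only fires for attributes not found normally, so
        # _loader/_lib lookups above don't recurse. Load once, then delegate.
        if self._lib is None:
            self._lib = self._loader()
        return getattr(self._lib, name)

# Example: nothing is loaded until lib.add is first touched.
calls = []
def load():
    calls.append("loaded")  # stands in for ctypes.CDLL(...)
    return types.SimpleNamespace(add=lambda a, b: a + b)

lib = LazyLib(load)
print(calls)           # not loaded yet
print(lib.add(2, 3))
print(calls)           # loaded exactly once
```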
George Hotz
a3c78d47b3 speed docs + upgrades [pr] (#8964)
* add some docs about speed [pr]

* better torch gemm

* enable locals on llvm/clang

* disable locals for beam speed on LLVM/CLANG

* 0x20 alignment in llvm allows ymm use
2025-02-08 17:28:52 +08:00
George Hotz
5bdd6a1cc4 increase CI speed with more runners [pr] (#8961)
* increase CI speed with more runners [pr]

* splits + cleanups [pr]

* more runners

* need that dep

* split that too

* can't be minimal

* move test readme

* bugfix + naming

* one more split

* bump to 22.04
2025-02-08 09:04:36 +08:00
nimlgen
11d50324d8 am: tiny cleanups (#8958)
* am: start cleanups

* am
2025-02-07 23:44:43 +03:00
chenyu
cfd28517df move pow folding tests to test_schedule [pr] (#8955)
doesn't really belong in test_const_folding
2025-02-07 12:51:43 -05:00
George Hotz
c2b4c43edb handle stride 0 reduce (#8068)
* handle stride 0 reduce [pr]

* more test fixups

* a few more

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-02-07 15:40:58 +01:00
qazal
cf21e27d78 little better VIEW simplifier pattern [pr] (#8954) 2025-02-07 12:55:54 +01:00
qazal
329013f577 fix UOp.metadata on KERNEL op [pr] (#8953)
* fix UOp.metadata on KERNEL op [pr]

* hotfix: is not None
2025-02-07 12:40:11 +01:00
George Hotz
4de084a835 cleanup ci, split docs/autogen, testing_minimal, LLVM Speed [pr] (#8952)
* cleanup ci [pr]

* testing_minimal

* add hypothesis to minimal

* fail tiktoken import okay

* add LLVM speed test

* llvm speed w/o beam
2025-02-07 19:01:59 +08:00
uuuvn
6090cbe3be Try to open llvm first when opening metal (#8949)
* Try to open llvm first when opening metal

* Use more specific FileNotFoundError
2025-02-07 18:58:37 +08:00
uuuvn
67b70e4f6c Fix incorrect __del__ (#8950)
CPython doesn't make any guarantees about the order in which globals like
`msg` or `libobjc` are destroyed when the interpreter shuts down

https://github.com/tinygrad/tinygrad/pull/8949 triggered an
unlucky ordering, which led to a bunch of errors at exit

There are also a bunch of other places where similar problems exist
2025-02-07 18:21:44 +08:00
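The hazard that commit describes is general: a `__del__` that looks up a module global can find it already torn down at interpreter exit. One common defensive sketch (illustrative, not the actual fix in that PR) is to bind whatever `__del__` needs at definition time:

```python
released = []

class Handle:
    def __init__(self, name):
        self.name = name
    # Binding the global as a default argument captures it now, so __del__
    # never performs a module-global lookup during interpreter shutdown
    # (when CPython may already have replaced the global with None).
    def __del__(self, _released=released):
        _released.append(self.name)

h = Handle("libobjc")
del h                # refcount drops to zero; __del__ runs immediately in CPython
print(released)
```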
George Hotz
9ed2d0dfa2 refactor into subactions (#8946)
* refactor into subactions

* this work?

* add shell

* move install opencl

* valid?

* support mac os x

* refactor other osx

* fix linux/osx

* fixes

* cleanups

* used everywhere

* no quotes

* quotes on true

* bugfixes

* this run?

* hardcode

* that

* process replay action

* fix checkout

* restore to branch

* fix caching

* fix osx python cache

* does replace function exist

* Revert "does replace function exist"

This reverts commit 622177c5a0.

* Revert "fix osx python cache"

This reverts commit e70d55cd93.

* user on osx to fix untar issue

* that
2025-02-07 18:06:44 +08:00
Ahmed Harmouche
133cacadde Autogen webgpu dawn, removing wgpu-py dependency (f16 support part 1) (#8646)
* Switch to dawn, all tests passing locally

* Use dawn-python

* Skip failing test

* Skip midcast and fix timestamp on metal ci

* Autogen webgpu

* Try fetch dawn lib again

* /usr/lib

* Without lib prefix

* Test autogen diff

* Delete webgpu support, move everything to ops_webgpu

* mypy fix

* Simplify, refactor

* Line savings

* No ResultContainer

* Type annotation for result

* Some more simplifications

* Why was this explicit sync used at all?

* Refactor: delete functions that are only used once

* Create shader module inline

* Clear unit tests cache, maybe that solves it

* That wasn't it

* Try deleting cache to pass failing weight compare

* weights_only=False for pytorch 2.6

* Simplify ctype array creation

* Remove nanosecond precision timestamps

* Simplify error handling

* Refactor, add back type annotations

* Deleted custom submit function, refactor

* read_buffer simplify

* Fix use after free, refactor

* Simplify supported_features

* Runtime docs

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-07 15:16:59 +08:00
George Hotz
dbda72f91d hotfix: raise line limit to 11200 for new webgpu backend 2025-02-07 14:29:20 +08:00
George Hotz
b1e1319972 ci speed on the enterprise plan [pr] (#8942) 2025-02-07 11:18:12 +08:00
Bhavya Gada
3b67712892 [bounty] Fix LLVM=1 NO_DEVECTORIZE=1 python3 test/test_ops.py TestOps.test_strided_conv2d_simple (#8937)
* fix LLVM=1 NO_DEVECTORIZE=1 python3 test/test_ops.py TestOps.test_strided_conv2d_simple

* remove expectedFailure

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-07 10:07:54 +08:00
George Hotz
f54242849d failing test for the devectorize [pr] (#8940)
* failing test for the devectorize [pr]

* add DEVECTORIZE to method_cache
2025-02-07 09:44:54 +08:00
nimlgen
ee1a0fb8ec am_smi: print device name (#8939) 2025-02-07 03:01:25 +03:00
chenyu
a092b6395d Tuple -> tuple, List -> list [pr] (#8936) 2025-02-06 14:21:19 -05:00
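That sweep is the PEP 585 modernization: since Python 3.9 the builtin containers are themselves generic, so `typing.Tuple` and `typing.List` imports can be dropped. A tiny example of the post-change style (`pairs` is a made-up function, not code from the PR):

```python
# PEP 585 style: builtin generics, no "from typing import List, Tuple" needed.
def pairs(xs: list[int]) -> list[tuple[int, int]]:
    return [(x, x + 1) for x in xs]

print(pairs([1, 2, 3]))  # [(1, 2), (2, 3), (3, 4)]
```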