10417 Commits

Author SHA1 Message Date
chenyu
d5183e1584 remove unneeded annotation import (#8934) 2025-02-06 13:12:35 -05:00
chenyu
00d72a5144 setitem isinstance cleanup [pr] (#8932) 2025-02-06 11:44:57 -05:00
qazal
81e241150a hotfix: save 1 line (#8931)
* hotfix: save 1 line

* no unwrap
2025-02-06 17:26:05 +02:00
qazal
eb1144be8b hotfix: only check current graph when excluding nodes in viz (#8930) 2025-02-06 16:58:53 +02:00
George Hotz
3cc05081f4 llvm no devectorize, the right way (#8901)
* closer

* env flag + transcendental issue
2025-02-06 22:53:49 +08:00
George Hotz
8b16c65bca add compile3 benchmark [pr] (#8929) 2025-02-06 22:49:31 +08:00
qazal
79fb5c6470 hotfix: test_shard_no_recompile shouldn't rely on schedule order [pr] (#8928) 2025-02-06 16:27:59 +02:00
George Hotz
1249e8dd3b objc fast msg, try 2 [pr] (#8927) 2025-02-06 19:06:21 +08:00
nimlgen
86feb98dcd am: add support for 7600 (#8910)
* am: start to add support for 7600

* test_tiny passes

* mmhub 3 0 2

* cleaner
2025-02-06 14:04:07 +03:00
George Hotz
ae45826758 hotfix: GRAPH_ONE_KERNEL + fix timing 2025-02-06 17:52:20 +08:00
George Hotz
1c53e8bf27 Revert "objc fast msg (#8922)" (#8926)
This reverts commit c3f99a727e.
2025-02-06 17:50:49 +08:00
George Hotz
c3f99a727e objc fast msg (#8922)
* benchmark kernel launch

* don't realize unneeded

* faster

* faster metal

* fix mypy

* new objc message style [pr]

* without sync

* no div 0

* lru cache that

* no sync in the profile

* fix

* update all to new style

* remove comment

* graph one kernel

* fix graph one kernel

* remove that sync
2025-02-06 17:49:06 +08:00
qazal
a2e7e49fe1 prepickle scheduler process replay [pr] (#8924) 2025-02-06 10:16:36 +01:00
qazal
89d7480b0c hotfix: don't sink views [pr] (#8923) 2025-02-06 09:15:12 +01:00
George Hotz
0cbb7d7f1e hotfix: metal has known sync issue 2025-02-06 14:29:41 +08:00
George Hotz
a8e54df363 benchmark single kernel launch (#8921)
* benchmark kernel launch

* don't realize unneeded

* faster

* faster metal

* fix mypy

* without sync

* no div 0

* lru cache that

* no sync in the profile
2025-02-06 13:35:34 +08:00
George Hotz
3e082d4a9d add float4 support to LLVM (#8920)
* add float4 support to LLVM

* is_bool
2025-02-06 12:15:50 +08:00
George Hotz
b05c536f74 cleanup some llvm stuff [pr] (#8919)
* cleanup some llvm stuff [pr]

* debug

* default to newer llvm

* repr
2025-02-06 11:45:03 +08:00
Josh Moore
44e0eab8fd Fix AttributeError occurring after ValueError in _apply_uop (#8905)
* Fix AttributeError occurring after ValueError in _apply_uop

* Update tensor.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-06 10:56:29 +08:00
chenyu
30695da256 remove Tensor._to_const_val (#8917)
* remove Tensor._to_const_val

added a TODO for advance indexing on const, which was the last place that checks const in Tensor

* that is not folding now

* one more
2025-02-05 21:44:39 -05:00
George Hotz
d09b5f801c don't use Tensor new, add to all_tensors after constructions [pr] (#8918) 2025-02-06 10:21:32 +08:00
FICTURE7
759b3f86bf Pass host CPU features to LLVM target (#8909)
* Pass host CPU features to LLVM target

This gets `test_gemm_fp16` to pass on Windows. It would fail because the
generated machine code would call compiler-rt functions to perform
truncating. This gets the test to pass on some hardware, because LLVM
gets access to more instructions. Essentially this is similar to
`-march=native`.

Unless this was intentionally left as is to be re-implemented fully in
LLVM IR or something.

* Fix linter complaints
2025-02-06 10:19:30 +08:00
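As a rough illustration of the commit above (759b3f86bf): LLVM target machines accept a feature string made of comma-separated feature names, each prefixed with `+` (enabled) or `-` (disabled). The sketch below only builds such a string from a feature map. The helper name `llvm_feature_string` and the feature names in the usage example are hypothetical and are not taken from the tinygrad change.

```python
def llvm_feature_string(features: dict[str, bool]) -> str:
    """Format a {feature_name: enabled} map as an LLVM-style feature string.

    Hypothetical helper for illustration; names are sorted so the output is stable.
    """
    return ",".join(("+" if enabled else "-") + name
                    for name, enabled in sorted(features.items()))


if __name__ == "__main__":
    # Illustrative feature names only; a real host reports many more.
    print(llvm_feature_string({"f16c": True, "avx2": True, "sse4a": False}))
```

Passing the host's own feature string to the target is what makes the result comparable to `-march=native`, as the commit message notes.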
uuuvn
09ec33a578 Better errors when relocating against undefined symbol (#8902) 2025-02-06 10:13:44 +08:00
chenyu
488200f16c move more pow const to rewrite (#8916)
* move more pow const to rewrite

one less use of _to_const_val

* fix
2025-02-05 20:30:12 -05:00
chenyu
76671381aa move positive const ** t to a rewrite rule (#8914)
* move positive const ** t to a rewrite rule

* one more test
2025-02-05 19:30:12 -05:00
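The commit above (76671381aa) does not spell out the rewrite rule itself. One standard identity such a rule could rely on, stated here as an assumption, is c ** t = 2 ** (t * log2(c)) for a positive constant c. The function name `pow_via_exp2` is hypothetical.

```python
import math


def pow_via_exp2(c: float, t: float) -> float:
    """Compute c ** t for a positive constant c as 2 ** (t * log2(c)).

    Assumed identity for illustration; not taken from the tinygrad source.
    """
    if c <= 0:
        raise ValueError("the identity only holds for a positive base")
    return 2.0 ** (t * math.log2(c))


if __name__ == "__main__":
    print(pow_via_exp2(3.0, 2.5), 3.0 ** 2.5)
```

The restriction to positive c is why zero and negative bases need separate handling: log2 is undefined there.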
Ignacio Sica
cad44f5f42 add Half-Precision Accumulation Support for Tensor Cores in NV, CUDA, and PTX (#8680)
* ptx and nv rendering refactor to work with half acc

* ptx fix!

* use same reg for acc and out

* fix comment

* another fix

* minor change in comment

* fix

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-02-05 16:56:37 -05:00
nimlgen
17f9b1cef6 am: load fw based on versions (#8913)
* am: load fw based on versions

* ops

* ops2
2025-02-06 00:02:09 +03:00
chenyu
189bfa164e enable backward test for pow(neg const ** x) (#8912)
backward works now. 0**x still does not work because it's a special case fixed in transcendental
2025-02-05 15:35:21 -05:00
chenyu
9307572fe3 Ops.POW and transcendental (#8911) 2025-02-05 15:15:59 -05:00
nimlgen
bff7c70eef hcq: better var check (#8908) 2025-02-05 22:38:59 +03:00
Ignacio Sica
aec3b8d515 add regression test: test_get_kernel_actions_preserves_actions_state (#8907)
* test_get_kernel_actions_preserves_actions_state

* simplify

* simplify

* refactor assert message
2025-02-05 14:13:01 -05:00
qazal
e71497aabc move assign ShapeTracker check to pattern matcher [pr] (#8906)
* move assign ShapeTracker check to pattern matcher [pr]

* rename the st uop to view
2025-02-05 19:47:20 +01:00
Ignacio Sica
0f6109ec00 hotfix bug in get_kernel_actions after TC_SEARCH_OVER_SHAPE was introduced (#8904)
* hotfix search bug

* copy actions
2025-02-05 13:10:05 -05:00
Ignacio Sica
15f94ac964 TC_SEARCH_OVER_SHAPE to search multiple TC shapes (#8793)
* squash search over search

* refactor assert

* init benchmark

* cleaner get_kernel_actions

* cleaner get_kernel_actions

* add comment
2025-02-05 11:03:46 -05:00
qazal
e7edadda54 construct the sched_sink with graph_rewrite [pr] (#8903)
* construct the sched_sink with graph_rewrite

* diff

* move break_sched
2025-02-05 15:16:48 +01:00
qazal
ef7ad3f077 simpler subbuffer construction + copyin is always base (#8900)
* realize copy

* cleanup buffer_view

* smaller
2025-02-05 09:10:20 +01:00
qazal
6f0cc2e9c5 rename to KernelContext and move the linearize_sched comment [pr] (#8899)
* rename to KernelContext and move that comment [pr]

* 500
2025-02-05 07:49:58 +01:00
geohotstan
6fb0e5751b hotfix test_onnx_imagenet (#8897)
* start

* log severity

* only change this

* change abstraction so it's more usable for huggingface

* WHOOPS

* actually this is more correct
2025-02-05 14:39:55 +08:00
George Hotz
c1c5227acb preserve size in dtype ptr [pr] (#8898) 2025-02-05 14:38:57 +08:00
George Hotz
5844883e59 bump master version v0.10.1 2025-02-05 09:08:28 +08:00
uuuvn
a51c688f39 Cleanup llvm cleanup (and some clang things too) (#8871)
* Cleanup llvm cleanup (and some clang things too)

* Tests

* Tests 2

* forgot mockgpu

* more print some sources
2025-02-05 07:49:05 +08:00
eliotgolding
bb5ded85cc Don't rewrite idiv to rshift when numerator is negative (#8885)
* more conditions for shift rewrite mul/idiv

* make ptx test uint so the new condition is true

* delete idiv test

* rewrite to 0 is wrong for idiv, as denominator is cast to 0 before division

* mul/div by 2**(large count) is unsupported anyway
2025-02-05 07:47:33 +08:00
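A small sketch of why the commit above (bb5ded85cc) restricts the rewrite, assuming idiv truncates toward zero as in C: an arithmetic right shift rounds toward negative infinity, so the two disagree for negative numerators that are not exact multiples of the divisor. The helper names `c_idiv` and `shift_matches_idiv` are hypothetical.

```python
def c_idiv(a: int, b: int) -> int:
    """Integer division that truncates toward zero (C-style), unlike Python's //."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def shift_matches_idiv(a: int, k: int) -> bool:
    """True if a >> k gives the same result as truncating division by 2**k."""
    return (a >> k) == c_idiv(a, 1 << k)


if __name__ == "__main__":
    for a in (7, -7, -8):
        print(a, a >> 1, c_idiv(a, 2), shift_matches_idiv(a, 1))
```

For non-negative numerators the shift and the division always agree, which is why the rewrite is safe once the numerator is known to be non-negative (for example, an unsigned type).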
pedro
666b6149bc Use full soname for libgcc_s in CPUProgram (#8642) (#8896)
The number after .so is the ABI version; it is always 1 for libgcc_s.
Most Linux systems set default library versions via symlinks that are
simply followed to get the actual ELF. Conda, however, does it via linker
scripts, which ctypes doesn't follow (contents of libgcc_s.so below):
```
/* GNU ld script
   Use the shared library, but some functions are only in
   the static library.  */
GROUP ( libgcc_s.so.1 -lgcc )
```
ctypes.util.find_library treats this linker script as the actual ELF, and
ctypes.CDLL then tries to load the text file as a shared library. The
result is:
```
  File "/home/me/src/tinygrad/tinygrad/device.py", line 223, in CPUProgram
    helper_handle = ctypes.CDLL(ctypes.util.find_library('System' if OSX else 'gcc_s'))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/me/miniforge3/envs/tinygrad/lib/python3.12/ctypes/__init__.py", line 379, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: /home/me/miniforge3/envs/tinygrad/lib/libgcc_s.so: invalid ELF header
```

Co-authored-by: uuuvn <83587632+uuuvn@users.noreply.github.com>
2025-02-05 07:45:48 +08:00
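A minimal sketch of the failure mode described in the commit above (666b6149bc): a GNU ld linker script is plain text and lacks the 4-byte ELF magic number, so it cannot be loaded with `dlopen`. The helpers `is_elf` and `load_gcc_s` are hypothetical and not part of tinygrad. `load_gcc_s` assumes a Linux system where `libgcc_s.so.1` is on the loader's search path.

```python
import ctypes
import os
import tempfile

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def is_elf(path: str) -> bool:
    """Return True if the file at `path` starts with the ELF magic number."""
    with open(path, "rb") as f:
        return f.read(4) == ELF_MAGIC


def load_gcc_s(soname: str = "libgcc_s.so.1") -> ctypes.CDLL:
    """Load libgcc_s by its full soname so the dynamic loader resolves it directly.

    Assumes Linux; raises OSError if the library cannot be found.
    """
    return ctypes.CDLL(soname)


if __name__ == "__main__":
    # Recreate the conda-style linker script quoted in the commit message.
    with tempfile.NamedTemporaryFile("w", suffix=".so", delete=False) as f:
        f.write("/* GNU ld script */\nGROUP ( libgcc_s.so.1 -lgcc )\n")
        script_path = f.name
    print(is_elf(script_path))  # a text linker script is not an ELF file
    os.remove(script_path)
```

Requesting the versioned soname avoids depending on whatever file `libgcc_s.so` happens to be in a given environment.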
chenyu
48349efdc1 copy is already contiguous (#8886) 2025-02-04 17:53:33 -05:00
nimlgen
4c28235bd1 am: remove hardcodes (#8895)
* am: remove hardcodes for 7900

* h
2025-02-05 00:52:53 +03:00
geohotstan
057c70b05f add onnx_helpers to extra and add ort validate to benchmark_onnx (#8890)
* start

* log severity

* only change this

* change abstraction so it's more usable for huggingface

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-04 16:36:01 -05:00
chenyu
89eebd4bfb pow cleanups (#8894)
more readable
2025-02-04 15:52:57 -05:00
qazal
7a9e3247c2 simple start to the Kernel UOp [pr] (#8893)
* simple start to a kernel [pr]

* add the sched_sink and spec

* rename kernels to sinks

* pylint complains
2025-02-04 21:48:15 +01:00
qazal
b4e8878e01 remove tensor_uops tracking from ScheduleContext [pr] (#8892)
* remove tensor_uops tracking from ScheduleContext [pr]

* cleaner
2025-02-04 20:34:15 +01:00
qazal
6a0da51ed0 truncate process replay logs [pr] (#8891)
* truncate process replay logs [pr]

* work

* max_lines

* bump to 1K
2025-02-04 20:26:48 +01:00