Commit Graph

7748 Commits

Author SHA1 Message Date
chenyu
30695da256 remove Tensor._to_const_val (#8917)
* remove Tensor._to_const_val

added a TODO for advanced indexing on const, which was the last place that checks const in Tensor

* that is not folding now

* one more
2025-02-05 21:44:39 -05:00
George Hotz
d09b5f801c don't use Tensor new, add to all_tensors after constructions [pr] (#8918) 2025-02-06 10:21:32 +08:00
FICTURE7
759b3f86bf Pass host CPU features to LLVM target (#8909)
* Pass host CPU features to LLVM target

This gets `test_gemm_fp16` to pass on Windows. It would fail because the
generated machine code would call compiler-rt functions to perform
truncation. The test now passes on some hardware because LLVM
gets access to more instructions. Essentially this is similar to
`-march=native`.

Unless this was intentionally left as-is to be re-implemented fully in
LLVM IR.

* Fix linter complaints
2025-02-06 10:19:30 +08:00
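For illustration, a sketch of what passing host CPU features to an LLVM target means, written with llvmlite for brevity (an assumption; tinygrad drives LLVM through its own bindings). Supplying the host CPU name and feature string when creating the target machine is the programmatic analogue of `-march=native`:
```python
import llvmlite.binding as llvm

llvm.initialize()
llvm.initialize_native_target()
llvm.initialize_native_asmprinter()

target = llvm.Target.from_triple(llvm.get_process_triple())
# Without cpu/features LLVM assumes a generic baseline and may lower
# fp16 conversions to compiler-rt libcalls (e.g. __truncsfhf2) rather
# than native instructions; with them, codegen matches the host.
machine = target.create_target_machine(
    cpu=llvm.get_host_cpu_name(),
    features=llvm.get_host_cpu_features().flatten(),
)
```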
uuuvn
09ec33a578 Better errors when relocating against undefined symbol (#8902) 2025-02-06 10:13:44 +08:00
chenyu
488200f16c move more pow const to rewrite (#8916)
* move more pow const to rewrite

one less use of _to_const_val

* fix
2025-02-05 20:30:12 -05:00
chenyu
76671381aa move positive const ** t to a rewrite rule (#8914)
* move positive const ** t to a rewrite rule

* one more test
2025-02-05 19:30:12 -05:00
Ignacio Sica
cad44f5f42 add Half-Precision Accumulation Support for Tensor Cores in NV, CUDA, and PTX (#8680)
* ptx and nv rendering refactor to work with half acc

* ptx fix!

* use same reg for acc and out

* fix comment

* another fix

* minor change in comment

* fix

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-02-05 16:56:37 -05:00
nimlgen
17f9b1cef6 am: load fw based on versions (#8913)
* am: load fw based on versions

* ops

* ops2
2025-02-06 00:02:09 +03:00
chenyu
189bfa164e enable backward test for pow(neg const ** x) (#8912)
backward works now. 0**x still does not work because it's a special case handled in transcendental
2025-02-05 15:35:21 -05:00
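A quick sketch of why 0**x needs a special case when pow is built from transcendentals, assuming the usual exp2(x * log2(b)) formulation: IEEE log2(0.0) is -inf, and 0 * -inf is nan, so 0**0 would come out nan instead of 1:
```python
import math

log2_zero = -math.inf    # IEEE log2(0.0); Python's math.log2(0.0) raises instead
print(0.0 * log2_zero)   # nan  -> exp2(nan)  = nan, but 0**0 should be 1
print(2.0 * log2_zero)   # -inf -> exp2(-inf) = 0.0, correct for 0**2
print(-1.0 * log2_zero)  # inf  -> exp2(inf)  = inf, correct for 0**-1
```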
chenyu
9307572fe3 Ops.POW and transcendental (#8911) 2025-02-05 15:15:59 -05:00
nimlgen
bff7c70eef hcq: better var check (#8908) 2025-02-05 22:38:59 +03:00
Ignacio Sica
aec3b8d515 add regression test: test_get_kernel_actions_preserves_actions_state (#8907)
* test_get_kernel_actions_preserves_actions_state

* simplify

* simplify

* refactor assert message
2025-02-05 14:13:01 -05:00
qazal
e71497aabc move assign ShapeTracker check to pattern matcher [pr] (#8906)
* move assign ShapeTracker check to pattern matcher [pr]

* rename the st uop to view
2025-02-05 19:47:20 +01:00
Ignacio Sica
0f6109ec00 hotfix bug in get_kernel_actions after TC_SEARCH_OVER_SHAPE was introduced (#8904)
* hotfix search bug

* copy actions
2025-02-05 13:10:05 -05:00
Ignacio Sica
15f94ac964 TC_SEARCH_OVER_SHAPE to search multiple TC shapes (#8793)
* squash search over shape

* refactor assert

* init benchmark

* cleaner get_kernel_actions

* cleaner get_kernel_actions

* add comment
2025-02-05 11:03:46 -05:00
qazal
e7edadda54 construct the sched_sink with graph_rewrite [pr] (#8903)
* construct the sched_sink with graph_rewrite

* diff

* move break_sched
2025-02-05 15:16:48 +01:00
qazal
ef7ad3f077 simpler subbuffer construction + copyin is always base (#8900)
* realize copy

* cleanup buffer_view

* smaller
2025-02-05 09:10:20 +01:00
qazal
6f0cc2e9c5 rename to KernelContext and move the linearize_sched comment [pr] (#8899)
* rename to KernelContext and move that comment [pr]

* 500
2025-02-05 07:49:58 +01:00
geohotstan
6fb0e5751b hotfix test_onnx_imagenet (#8897)
* start

* log severity

* only change this

* change abstraction so it's more usable for huggingface

* WHOOPS

* actually this is more correct
2025-02-05 14:39:55 +08:00
George Hotz
c1c5227acb preserve size in dtype ptr [pr] (#8898) 2025-02-05 14:38:57 +08:00
George Hotz
5844883e59 bump master version v0.10.1 2025-02-05 09:08:28 +08:00
uuuvn
a51c688f39 Cleanup llvm cleanup (and some clang things too) (#8871)
* Cleanup llvm cleanup (and some clang things too)

* Tests

* Tests 2

* forgot mockgpu

* more print some sources
2025-02-05 07:49:05 +08:00
eliotgolding
bb5ded85cc Don't rewrite idiv to rshift when numerator is negative (#8885)
* more conditions for shift rewrite mul/idiv

* make ptx test uint so the new condition is true

* delete idiv test

* rewrite to 0 is wrong for idiv, as denominator is cast to 0 before division

* mul/div by 2**(large count) is unsupported anyway
2025-02-05 07:47:33 +08:00
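One negative numerator shows why the rewrite is unsound, assuming idiv truncates toward zero (C semantics, as in the rendered kernels) while a right shift floors:
```python
x = -7
trunc_div = int(x / 4)  # -1: truncate-toward-zero division, what idiv means
shifted   = x >> 2      # -2: arithmetic shift == floor division
assert trunc_div != shifted  # rewriting x/4 to x>>2 is only safe for x >= 0
```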
pedro
666b6149bc Use full soname for libgcc_s in CPUProgram (#8642) (#8896)
The number after .so is the ABI version; it is always 1 for libgcc_s.
Most Linux systems set default library versions via symlinks, which are
simply followed to get the actual ELF. Conda, however, does it via linker
scripts, which ctypes doesn't follow (below are the contents of libgcc_s.so):
```
/* GNU ld script
   Use the shared library, but some functions are only in
   the static library.  */
GROUP ( libgcc_s.so.1 -lgcc )
```
ctypes.util.find_library thinks that this is the actual ELF, and
ctypes.CDLL just tries to load this text file as a shared library. The result
is:
```
  File "/home/me/src/tinygrad/tinygrad/device.py", line 223, in CPUProgram
    helper_handle = ctypes.CDLL(ctypes.util.find_library('System' if OSX else 'gcc_s'))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/me/miniforge3/envs/tinygrad/lib/python3.12/ctypes/__init__.py", line 379, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: /home/me/miniforge3/envs/tinygrad/lib/libgcc_s.so: invalid ELF header
```

Co-authored-by: uuuvn <83587632+uuuvn@users.noreply.github.com>
2025-02-05 07:45:48 +08:00
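A minimal sketch of the fix the title describes (the actual code in tinygrad/device.py may differ): pass the full soname straight to ctypes.CDLL, so dlopen resolves the real library and find_library's ld-script result is never involved:
```python
import ctypes, ctypes.util, platform

if platform.system() == "Darwin":
    handle = ctypes.CDLL(ctypes.util.find_library("System"))
else:
    # "libgcc_s.so.1" is the full soname (the ABI version is always 1),
    # so dlopen resolves the real ELF directly and never sees the
    # "libgcc_s.so" GNU ld script that ctypes cannot parse.
    handle = ctypes.CDLL("libgcc_s.so.1")
```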
chenyu
48349efdc1 copy is already contiguous (#8886) 2025-02-04 17:53:33 -05:00
nimlgen
4c28235bd1 am: remove hardcodes (#8895)
* am: remove hardcodes for 7900

* h
2025-02-05 00:52:53 +03:00
geohotstan
057c70b05f add onnx_helpers to extra and add ort validate to benchmark_onnx (#8890)
* start

* log severity

* only change this

* change abstraction so it's more usable for huggingface

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-04 16:36:01 -05:00
chenyu
89eebd4bfb pow cleanups (#8894)
more readable
2025-02-04 15:52:57 -05:00
qazal
7a9e3247c2 simple start to the Kernel UOp [pr] (#8893)
* simple start to a kernel [pr]

* add the sched_sink and spec

* rename kernels to sinks

* pylint complains
2025-02-04 21:48:15 +01:00
qazal
b4e8878e01 remove tensor_uops tracking from ScheduleContext [pr] (#8892)
* remove tensor_uops tracking from ScheduleContext [pr]

* cleaner
2025-02-04 20:34:15 +01:00
qazal
6a0da51ed0 truncate process replay logs [pr] (#8891)
* truncate process replay logs [pr]

* work

* max_lines

* bump to 1K
2025-02-04 20:26:48 +01:00
qazal
c7c279a6bd unbind ShapeTrackers without maintaining a cache [pr] (#8889)
* replace with a try [pr]

* check vars

* ahaa
2025-02-04 19:43:41 +01:00
chenyu
61de654efa minor shard cleanup [pr] (#8888) 2025-02-04 13:22:31 -05:00
qazal
6ec7f1b00f replace UPat(name="x") with UPat.var("x") [pr] (#8887)
* replace UPat(name="x") with UPat.var("x") [pr]

* a few more
2025-02-04 19:12:40 +01:00
qazal
c26b06eaeb delete fold_img_cast [pr] (#8875) 2025-02-04 18:43:45 +01:00
qazal
acf0baefee process replay from tensor uops to kernel ast (#8883)
* process replay from tensor uops to kernel ast

* this dedups

* switch back to string key
2025-02-04 18:09:20 +01:00
Ignacio Sica
dcf104ee68 ptx wmma render refactor (#8873)
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-04 11:01:23 -05:00
qazal
b92f36179d don't use set in schedule + add GroupOp.All [pr] (#8882)
* don't use set in schedule + add GroupOp.All [pr]

* update that
2025-02-04 08:19:27 +01:00
George Hotz
56fa5c1191 dsp simulator (#8869)
* dsp simulator

* progress

* fix

* close on test tiny

* working

* less waste

* line savings

* Device DSP compiler

* mock DSP at the bottom

* DSP tests

* docker caching

* test update

* need load

* skip that test for CI DSP

* last touch

* ugh
2025-02-04 09:45:04 +08:00
chenyu
836cf42c2e fix rand_like for multi (#8880) 2025-02-03 19:00:14 -05:00
chenyu
746d899dbd move multi axis to property (#8879)
also updated tests so that axis is known prior to realize
2025-02-03 16:02:09 -05:00
nimlgen
fa90079370 amd: reallocate scratch (#8872)
* amd: reallocate scratch

* use it

* oops

* allocate default

* mypy

* ops

* address realloc from none better

* types correct

* this better

* ops

* rm
2025-02-03 23:21:37 +03:00
chenyu
ec447a31e7 factor out get_axis in multi [pr] (#8878)
ALU/REDUCE_AXIS/RESHAPE/PERMUTE can change the axis. Prereq to moving this logic to ops.py.
2025-02-03 14:39:08 -05:00
chenyu
cce26009f0 simplify pow to not call cos (#8877)
use %2 instead of cos to detect even numbers
2025-02-03 12:54:18 -05:00
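A sketch of the idea (pow_sign is a hypothetical name): for an integer exponent n, the sign of (negative base)**n is (-1)**n, which the old code recovered as cos(pi*n); checking parity with %2 gives the same answer without a transcendental call:
```python
import math

def pow_sign(n: int) -> int:
    # old: cos(pi * n) evaluates to +1 for even n, -1 for odd n
    # new: n % 2 reads off the parity directly, no transcendental needed
    return 1 if n % 2 == 0 else -1

assert all(pow_sign(n) == round(math.cos(math.pi * n)) for n in range(-8, 9))
```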
geohotstan
d1aa9f30bc copy onnx_ops into onnx (#8876)
* just copy it over

* make OnnxOps a global var

* some small style stuff

* rerun CI but also some small clean up

* some comments
2025-02-03 12:15:07 -05:00
Ali Ladjevardi
73c75d6ee1 DEFINE_LOCAL variable names start from temp0, not temp1 (#8870) 2025-02-03 22:50:38 +08:00
qazal
b6c617272a New schedule.py Order [pr] (#8874) 2025-02-03 14:59:11 +02:00
George Hotz
b075aefc12 hotfix: revert llvm host_arch 2025-02-03 16:46:19 +08:00
George Hotz
a5753095dc llvm cleanups [pr] (#8867) 2025-02-03 15:32:41 +08:00
George Hotz
f484db0e63 dsp cleanups [pr] (#8866) 2025-02-03 15:18:53 +08:00