* benchmark kernel launch
* don't realize unneeded
* faster
* faster metal
* fix mypy
* new objc message style [pr]
* without sync
* no div 0
* lru cache that
* no sync in the profile
* fix
* update all to new style
* remove comment
* graph one kernel
* fix graph one kernel
* remove that sync
* remove Tensor._to_const_val
Added a TODO for advanced indexing on const, which was the last place in Tensor that checked for const.
* that is not folding now
* one more
* Pass host CPU features to LLVM target
This gets `test_gemm_fp16` to pass on Windows. It used to fail because
the generated machine code called compiler-rt functions to perform the
truncation. The test now passes on some hardware, because LLVM gets
access to more instructions. Essentially this is similar to
`-march=native`.
Unless this was intentionally left as is, to be re-implemented fully in
LLVM IR or something.
* Fix linter complaints
* ptx and nv rendering refactor to work with half acc
* ptx fix!
* use same reg for acc and out
* fix comment
* another fix
* minor change in comment
* fix
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
* more conditions for shift rewrite mul/idiv
* make ptx test uint so the new condition is true
* delete idiv test
* rewrite to 0 is wrong for idiv, as denominator is cast to 0 before division
* mul/div by 2**(large count) is unsupported anyway
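
A minimal sketch of why the shift rewrite needs extra conditions for idiv. It
assumes idiv rounds toward zero (C-style) and that the new condition is about
signedness, which the bullets above only imply. `trunc_div`,
`shift_matches_idiv` and `wrap_u32` are hypothetical helpers written for this
illustration, not tinygrad functions:

```python
def trunc_div(a: int, b: int) -> int:
    """Integer division that rounds toward zero, as C-style idiv does."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def shift_matches_idiv(x: int, n: int) -> bool:
    """True when x >> n gives the same result as idiv(x, 2**n)."""
    # Python's >> is an arithmetic shift, so it rounds toward -infinity.
    return (x >> n) == trunc_div(x, 1 << n)


def wrap_u32(v: int) -> int:
    """Reduce v to 32 bits, mimicking a cast to a 32-bit unsigned dtype."""
    return v & 0xFFFFFFFF


if __name__ == "__main__":
    print(shift_matches_idiv(7, 1))   # True: 7 >> 1 == 3 and 7 idiv 2 == 3
    print(shift_matches_idiv(-7, 1))  # False: -7 >> 1 == -4 but -7 idiv 2 == -3
    print(wrap_u32(1 << 32))          # 0: 2**32 wraps to a zero denominator
```

The last line is the reason rewriting to 0 is wrong for idiv: once the shift
count reaches the bit width, the denominator itself becomes 0 after the cast,
so the original expression is a division by zero rather than a 0 result.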
The number after .so is the ABI version; it is always 1 for libgcc_s.
Most Linux systems set default library versions via symlinks, which are
simply followed to reach the actual ELF. Conda, however, does it via
linker scripts, which ctypes doesn't follow (contents of libgcc_s.so
below):
```
/* GNU ld script
Use the shared library, but some functions are only in
the static library. */
GROUP ( libgcc_s.so.1 -lgcc )
```
ctypes.util.find_library treats this file as the actual ELF, and
ctypes.CDLL then tries to load the text file as a shared library. The
result is:
```
  File "/home/me/src/tinygrad/tinygrad/device.py", line 223, in CPUProgram
    helper_handle = ctypes.CDLL(ctypes.util.find_library('System' if OSX else 'gcc_s'))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/me/miniforge3/envs/tinygrad/lib/python3.12/ctypes/__init__.py", line 379, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: /home/me/miniforge3/envs/tinygrad/lib/libgcc_s.so: invalid ELF header
```
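
A rough sketch of how the failure can be told apart from a real shared
library, and of loading by the versioned name instead. `is_elf` and
`load_libgcc_s` are hypothetical helpers for illustration, not the code in
device.py; the soname `libgcc_s.so.1` follows from the ABI version noted above:

```
import ctypes

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def is_elf(path: str) -> bool:
    """Return True if the file at path starts with the ELF magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == ELF_MAGIC


def load_libgcc_s() -> ctypes.CDLL:
    """Load libgcc_s by its versioned soname (Linux only).

    Passing the soname lets the dynamic loader resolve the real library,
    so an unversioned libgcc_s.so linker script is never opened.
    """
    return ctypes.CDLL("libgcc_s.so.1")
```

A conda-style linker script is plain text, so `is_elf` returns False for it,
which matches the `invalid ELF header` error in the traceback.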
Co-authored-by: uuuvn <83587632+uuuvn@users.noreply.github.com>
* start
* log severity
* only change this
* change abstraction so it's more usable for huggingface
---------
Co-authored-by: chenyu <chenyu@fastmail.com>