chenyu
488200f16c
move more pow const to rewrite ( #8916 )
...
* move more pow const to rewrite
one less use of _to_const_val
* fix
2025-02-05 20:30:12 -05:00
chenyu
76671381aa
move positive const ** t to a rewrite rule ( #8914 )
...
* move positive const ** t to a rewrite rule
* one more test
2025-02-05 19:30:12 -05:00
Ignacio Sica
cad44f5f42
add Half-Precision Accumulation Support for Tensor Cores in NV, CUDA, and PTX ( #8680 )
...
* ptx and nv rendering refactor to work with half acc
* ptx fix!
* use same reg for acc and out
* fix comment
* another fix
* minor change in comment
* fix
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-02-05 16:56:37 -05:00
nimlgen
17f9b1cef6
am: load fw based on versions ( #8913 )
...
* am: load fw based on versions
* ops
* ops2
2025-02-06 00:02:09 +03:00
chenyu
189bfa164e
enable backward test for pow(neg const ** x) ( #8912 )
...
backward works now. 0**x still does not work because it's a special case fixed in transcendental
2025-02-05 15:35:21 -05:00
chenyu
9307572fe3
Ops.POW and transcendental ( #8911 )
2025-02-05 15:15:59 -05:00
nimlgen
bff7c70eef
hcq: better var check ( #8908 )
2025-02-05 22:38:59 +03:00
Ignacio Sica
aec3b8d515
add regression test: test_get_kernel_actions_preserves_actions_state ( #8907 )
...
* test_get_kernel_actions_preserves_actions_state
* simplify
* simplify
* refactor assert message
2025-02-05 14:13:01 -05:00
qazal
e71497aabc
move assign ShapeTracker check to pattern matcher [pr] ( #8906 )
...
* move assign ShapeTracker check to pattern matcher [pr]
* rename the st uop to view
2025-02-05 19:47:20 +01:00
Ignacio Sica
0f6109ec00
hotfix bug in get_kernel_actions after TC_SEARCH_OVER_SHAPE was introduced ( #8904 )
...
* hotfix search bug
* copy actions
2025-02-05 13:10:05 -05:00
Ignacio Sica
15f94ac964
TC_SEARCH_OVER_SHAPE to search multiple TC shapes ( #8793 )
...
* squash search over search
* refactor assert
* init benchmark
* cleaner get_kernel_actions
* cleaner get_kernel_actions
* add comment
2025-02-05 11:03:46 -05:00
qazal
e7edadda54
construct the sched_sink with graph_rewrite [pr] ( #8903 )
...
* construct the sched_sink with graph_rewrite
* diff
* move break_sched
2025-02-05 15:16:48 +01:00
qazal
ef7ad3f077
simpler subbuffer construction + copyin is always base ( #8900 )
...
* realize copy
* cleanup buffer_view
* smaller
2025-02-05 09:10:20 +01:00
qazal
6f0cc2e9c5
rename to KernelContext and move the linearize_sched comment [pr] ( #8899 )
...
* rename to KernelContext and move that comment [pr]
* 500
2025-02-05 07:49:58 +01:00
geohotstan
6fb0e5751b
hotfix test_onnx_imagenet ( #8897 )
...
* start
* log severity
* only change this
* change abstraction so it's more usable for huggingface
* WHOOPS
* actually this is more correct
2025-02-05 14:39:55 +08:00
George Hotz
c1c5227acb
preserve size in dtype ptr [pr] ( #8898 )
2025-02-05 14:38:57 +08:00
George Hotz
5844883e59
bump master version
v0.10.1
2025-02-05 09:08:28 +08:00
uuuvn
a51c688f39
Cleanup llvm cleanup (and some clang things too) ( #8871 )
...
* Cleanup llvm cleanup (and some clang things too)
* Tests
* Tests 2
* forgot mockgpu
* more print some sources
2025-02-05 07:49:05 +08:00
eliotgolding
bb5ded85cc
Don't rewrite idiv to rshift when numerator is negative ( #8885 )
...
* more conditions for shift rewrite mul/idiv
* make ptx test uint so the new condition is true
* delete idiv test
* rewrite to 0 is wrong for idiv, as denominator is cast to 0 before division
* mul/div by 2**(large count) is unsupported anyway
2025-02-05 07:47:33 +08:00
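Note on the idiv/rshift commit above: a minimal sketch of why the rewrite is unsafe for negative numerators, assuming idiv truncates toward zero as C integer division does. The helper names `c_idiv` and `shift_matches_idiv` are hypothetical and only illustrate the arithmetic; they are not tinygrad code.

```python
def c_idiv(a: int, b: int) -> int:
    """Integer division that truncates toward zero (C semantics, assumed here)."""
    q = abs(a) // abs(b)
    # The quotient is positive when both operands share a sign, negative otherwise.
    return q if (a < 0) == (b < 0) else -q


def shift_matches_idiv(a: int, k: int) -> bool:
    """Check whether an arithmetic right shift by k equals truncating division by 2**k."""
    return (a >> k) == c_idiv(a, 1 << k)


if __name__ == "__main__":
    for a in (7, 8, -7, -8):
        print(a, a >> 1, c_idiv(a, 2), shift_matches_idiv(a, 1))
```

An arithmetic right shift rounds toward negative infinity, so the two operations agree for non-negative numerators but differ for negative numerators that are not exact multiples of the divisor.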
pedro
666b6149bc
Use full soname for libgcc_s in CPUProgram ( #8642 ) ( #8896 )
...
Number after .so is abi version, it is always 1 for libgcc_s.
Most linux systems set default library versions via symlinks that are
simply followed to get actual elf, however conda does it via linker
scripts which ctypes doesn't follow (below contents of libgcc_s.so):
```
/* GNU ld script
Use the shared library, but some functions are only in
the static library. */
GROUP ( libgcc_s.so.1 -lgcc )
```
ctypes.util.find_library thinks that this is the actual elf and
ctypes.CDLL just loads this text file as a shared library. The result
is:
```
  File "/home/me/src/tinygrad/tinygrad/device.py", line 223, in CPUProgram
    helper_handle = ctypes.CDLL(ctypes.util.find_library('System' if OSX else 'gcc_s'))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/me/miniforge3/envs/tinygrad/lib/python3.12/ctypes/__init__.py", line 379, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: /home/me/miniforge3/envs/tinygrad/lib/libgcc_s.so: invalid ELF header
```
Co-authored-by: uuuvn <83587632+uuuvn@users.noreply.github.com>
2025-02-05 07:45:48 +08:00
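Note on the libgcc_s commit above: a rough sketch of the failure mode and of loading by full soname. The helpers `is_elf_header`, `looks_like_elf`, and `load_libgcc_s` are hypothetical names used for illustration, not the code in `device.py`; the soname `libgcc_s.so.1` is the one given in the commit message.

```python
import ctypes

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def is_elf_header(first_bytes: bytes) -> bool:
    """Return True if the given leading bytes start with the ELF magic number."""
    return first_bytes.startswith(ELF_MAGIC)


def looks_like_elf(path: str) -> bool:
    """Read the first four bytes of a file and test them against the ELF magic."""
    with open(path, "rb") as f:
        return is_elf_header(f.read(4))


def load_libgcc_s() -> ctypes.CDLL:
    """Load libgcc_s by its full soname so the dynamic loader resolves it.

    Passing the versioned name avoids handing ctypes a path to an
    unversioned libgcc_s.so that may be a text linker script.
    """
    return ctypes.CDLL("libgcc_s.so.1")
```

A linker script such as the one quoted in the commit message begins with `/* GNU ld script`, so `is_elf_header` returns False for it, which corresponds to the `invalid ELF header` error in the traceback.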
chenyu
48349efdc1
copy is already contiguous ( #8886 )
2025-02-04 17:53:33 -05:00
nimlgen
4c28235bd1
am: remove hardcodes ( #8895 )
...
* am: remove hardcodes for 7900
* h
2025-02-05 00:52:53 +03:00
geohotstan
057c70b05f
add onnx_helpers to extra and add ort validate to benchmark_onnx ( #8890 )
...
* start
* log severity
* only change this
* change abstraction so it's more usable for huggingface
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-04 16:36:01 -05:00
chenyu
89eebd4bfb
pow cleanups ( #8894 )
...
more readable
2025-02-04 15:52:57 -05:00
qazal
7a9e3247c2
simple start to the Kernel UOp [pr] ( #8893 )
...
* simple start to a kernel [pr]
* add the sched_sink and spec
* rename kernels to sinks
* pylint complains
2025-02-04 21:48:15 +01:00
qazal
b4e8878e01
remove tensor_uops tracking from ScheduleContext [pr] ( #8892 )
...
* remove tensor_uops tracking from ScheduleContext [pr]
* cleaner
2025-02-04 20:34:15 +01:00
qazal
6a0da51ed0
truncate process replay logs [pr] ( #8891 )
...
* truncate process replay logs [pr]
* work
* max_lines
* bump to 1K
2025-02-04 20:26:48 +01:00
qazal
c7c279a6bd
unbind ShapeTrackers without maintaining a cache [pr] ( #8889 )
...
* replace with a try [pr]
* check vars
* ahaa
2025-02-04 19:43:41 +01:00
chenyu
61de654efa
minor shard cleanup [pr] ( #8888 )
2025-02-04 13:22:31 -05:00
qazal
6ec7f1b00f
replace UPat(name="x") with UPat.var("x") [pr] ( #8887 )
...
* replace UPat(name="x") with UPat.var("x") [pr]
* a few more
2025-02-04 19:12:40 +01:00
qazal
c26b06eaeb
delete fold_img_cast [pr] ( #8875 )
2025-02-04 18:43:45 +01:00
qazal
acf0baefee
process replay from tensor uops to kernel ast ( #8883 )
...
* process replay from tensor uops to kernel ast
* this dedups
* switch back to string key
2025-02-04 18:09:20 +01:00
Ignacio Sica
dcf104ee68
ptx wmma render refactor ( #8873 )
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-04 11:01:23 -05:00
qazal
b92f36179d
don't use set in schedule + add GroupOp.All [pr] ( #8882 )
...
* don't use set in schedule + add GroupOp.All [pr]
* update that
2025-02-04 08:19:27 +01:00
George Hotz
56fa5c1191
dsp simulator ( #8869 )
...
* dsp simulator
* progress
* fix
* close on test tiny
* working
* less waste
* line savings
* Device DSP compiler
* mock DSP at the bottom
* DSP tests
* docker caching
* test update
* need load
* skip that test for CI DSP
* last touch
* ugh
2025-02-04 09:45:04 +08:00
chenyu
836cf42c2e
fix rand_like for multi ( #8880 )
2025-02-03 19:00:14 -05:00
chenyu
746d899dbd
move multi axis to property ( #8879 )
...
also updated tests so that axis is known prior to realize
2025-02-03 16:02:09 -05:00
nimlgen
fa90079370
amd: reallocate scratch ( #8872 )
...
* amd: reallocate scratch
* use it
* oops
* allocate default
* mypy
* ops
* address realloc from none better
* types correct
* this better
* ops
* rm
2025-02-03 23:21:37 +03:00
chenyu
ec447a31e7
factor out get_axis in multi [pr] ( #8878 )
...
ALU/REDUCE_AXIS/RESHAPE/PERMUTE can change axis. prereq to move this logic to ops.py
2025-02-03 14:39:08 -05:00
chenyu
cce26009f0
simplify pow to not call cos ( #8877 )
...
use %2 instead of cos to detect even numbers
2025-02-03 12:54:18 -05:00
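Note on the pow commit above: a small sketch of the parity trick, assuming the sign of a negative base raised to an integer exponent is what needs to be determined. The functions below are illustrative and are not the tinygrad implementation; they only show that `n % 2` yields the same sign as `cos(pi * n)` without a transcendental call.

```python
import math


def sign_via_cos(n: int) -> float:
    """cos(pi * n) is approximately +1 for even n and -1 for odd n."""
    return math.cos(math.pi * n)


def sign_via_mod(n: int) -> int:
    """Same sign computed from parity: +1 for even n, -1 for odd n."""
    return 1 - 2 * (n % 2)


def pow_negative_base(base: float, n: int) -> float:
    """Raise a negative base to an integer power using |base| ** n and the parity sign."""
    return sign_via_mod(n) * abs(base) ** n


if __name__ == "__main__":
    for n in range(-3, 4):
        print(n, round(sign_via_cos(n)), sign_via_mod(n))
```

The modulo form is exact for integers, while the cosine form returns a floating-point value that is only approximately plus or minus one.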
geohotstan
d1aa9f30bc
copy onnx_ops into onnx ( #8876 )
...
* just copy it over
* make OnnxOps a global var
* some small style stuff
* rerun CI but also some small clean up
* some comments
2025-02-03 12:15:07 -05:00
Ali Ladjevardi
73c75d6ee1
DEFINE_LOCAL variable names start from temp0, not temp1 ( #8870 )
2025-02-03 22:50:38 +08:00
qazal
b6c617272a
New schedule.py Order [pr] ( #8874 )
2025-02-03 14:59:11 +02:00
George Hotz
b075aefc12
hotfix: revert llvm host_arch
2025-02-03 16:46:19 +08:00
George Hotz
a5753095dc
llvm cleanups [pr] ( #8867 )
2025-02-03 15:32:41 +08:00
George Hotz
f484db0e63
dsp cleanups [pr] ( #8866 )
2025-02-03 15:18:53 +08:00
George Hotz
af2c2837f6
hotfix: skip broken test, add KERNEL Op
2025-02-03 14:02:55 +08:00
qazal
565c37c681
start simplifying the scheduler context [pr] ( #8830 )
2025-02-02 18:11:36 +02:00
qazal
d64af3c884
reorder simplifier and grouper logic in scheduler [pr] ( #8861 )
2025-02-02 17:19:52 +02:00
qazal
83a904aaad
just schedule in test_recursive_pad [pr] ( #8860 )
2025-02-02 15:01:24 +02:00