nimlgen
a830d37881
nv: check wpr2 is inited ( #11226 )
2025-07-14 11:46:14 +03:00
chenyu
0387bb9630
clean up image upcast in hcopt [pr] ( #11220 )
...
GLOBAL+LOCAL for upcast
GROUP_REDUCE+REDUCE for unroll
2025-07-13 18:06:43 -04:00
chenyu
85ddd72038
simpler grouptop in hcopt ( #11219 )
...
* simpler grouptop in hcopt
keep the only perf relevant conditions and the rest is handled by try except
* update openpilot read image count
2025-07-13 16:06:09 -04:00
qazal
40847ca29c
viz: prune out of screen rects ( #11217 )
2025-07-13 21:49:59 +03:00
chenyu
674dc28505
remove Kernel.full_unupcasted_shape [pr] ( #11215 )
...
decomp to shape_len and first_upcast to get the last upcast-able dim
2025-07-13 13:56:23 -04:00
chenyu
9575cf6c6e
shave more hcopt [pr] ( #11213 )
...
start to use AxisType for conditions
2025-07-13 12:43:58 -04:00
Alisher Zhubanyshev
4ef6b46b34
hcq: reduce launch overhead ( #11193 )
...
* nv: improve mmio creation speed
* add memoryview test
* fix indents
* move mv bench to `test_helpers`, remove comparison
2025-07-13 19:25:50 +03:00
nimlgen
1cc2b3f845
nv: use wait_cond ( #11212 )
2025-07-13 19:25:20 +03:00
nimlgen
6cce3a5d58
generic wait_cond ( #11210 )
...
* generic wait_cond
* fix linter
* fix linter
2025-07-13 16:59:21 +03:00
chenyu
e11ccf2342
update float4 condition in hcopt ( #11211 )
...
don't need all upcast candidates to be upcast-able, only check the actual one
2025-07-13 09:51:45 -04:00
nimlgen
55c54d9745
nv: sync after gpfifo setup ( #11209 )
2025-07-13 14:40:11 +03:00
chenyu
d90d837013
clean up hcopt [pr] ( #11205 )
...
removed one condition that's always true
2025-07-12 23:10:27 -04:00
chenyu
2b48b961be
fix a few broken AMX tests ( #11204 )
2025-07-12 21:42:38 -04:00
wozeparrot
667c7a9fa6
clean: keccak cleanups + explicit shapes ( #11202 )
2025-07-12 18:17:14 -07:00
chenyu
a0438012af
remove Kernel.get_program [pr] ( #11203 )
2025-07-12 20:50:29 -04:00
George Hotz
d67c8e7b42
local metal on metal in uop syntax ( #11185 )
...
* local metal on metal in uop syntax
* TODO: just put the axis_info in the kernelinfo
* local
* amd_matmul works @ 28 TFLOPS
* clean up matmul
* kernel8 works
* remove that
* locals
* axistype innovation
* work
* cleanup
* kernel3 regs
* cleanup kernel3
* work
* why is it broken
* no beam
* reenable
* permutes
2025-07-12 16:31:19 -07:00
uuuvn
40da5f0c81
fix silent mypy failure in ci ( #11201 )
...
Example: https://github.com/tinygrad/tinygrad/actions/runs/16215577171/job/45784110543?pr=11177#step:7:20
Caused by a footgun-prone exception in how `set -e` works:
```bash
python -m mypy --strict-equality --lineprecision-report . && cat lineprecision.txt
```
This command will fail (and return a non-zero exit code if run interactively), but
because the failing command sits on the left of `&&`, a script running under
`set -e` treats it as a test rather than as a script-terminating failure
(similar to how a failing command in an `if` condition does not terminate the
script despite its non-zero exit code).
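The behavior described above can be reproduced in isolation. The following is a minimal sketch, not the change made in this commit: `false` stands in for the failing mypy invocation, and the variable names are hypothetical, chosen for illustration only.

```bash
#!/usr/bin/env bash
# Sketch of the `set -e` / `&&` interaction.
# Assumption: `false` is a stand-in for any failing command (e.g. a failing mypy run).

# Case 1: the failing command is on the left of `&&`.
# `set -e` ignores the failure, so the inner script keeps running.
and_list_status=0
and_list_output=$(bash -c 'set -e; false && echo "report"; echo "still running"') || and_list_status=$?

# Case 2: the same failing command runs on its own line.
# `set -e` terminates the inner script before the echo is reached.
standalone_status=0
standalone_output=$(bash -c 'set -e; false; echo "still running"') || standalone_status=$?

echo "and-list:   output='${and_list_output}' status=${and_list_status}"
echo "standalone: output='${standalone_output}' status=${standalone_status}"
```

In the first case the inner script prints `still running` and exits with status 0, so the failure goes unnoticed. In the second case it prints nothing and exits with status 1. Running the two commands on separate lines is one way to let `set -e` catch the failure.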
2025-07-12 15:12:25 -04:00
chenyu
73caa5dd1b
remove Kernel.membufs [pr] ( #11200 )
2025-07-12 14:48:47 -04:00
geohotstan
5ce278b245
OnnxRunner file as input ( #10789 )
...
* file path as input and have parse be in OnnxRunner.__init__
* modelproto_to_onnxrunner -> modelproto_to_runner
* whoops, fix import
* oh flakiness again, is it because it's getting gc-ed?
* small changes
* CI flaky so just move compile4 fix in
* copy typing of onnx_load
* actually can just import onnx_load instead of onnx.load
* fix external_benchmark_openpilot
* fix onnx_runner test to use onnx_helper
* rerun CI
* try run_modelproto
* spam CI a few times
* revert run_modelproto since that's flaky also
* no external onnx_load usage except onnx.py
* cursor tab complete is evil. Snuck a darn sorted in. But does order change result? Why?
* model_benchmark 193s -> 80s, add OnnxRunner.to()...
* minimize diff and clean up
* device can be None, weird but eh
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-12 14:27:46 -04:00
nimlgen
110cff3f2e
fix device arg to Tensor.randn ( #11194 )
...
* fix device arg to Tensor.randn
* simpler test
* self.assertEqual
2025-07-12 13:51:59 -04:00
chenyu
6283d50224
DEPRECATED_linearize -> to_program [pr] ( #11198 )
2025-07-12 13:46:20 -04:00
George Hotz
770a558585
lil cleanups from uop branch [pr] ( #11197 )
2025-07-12 09:46:28 -07:00
George Hotz
5625e1904b
axis types in KernelInfo ( #11196 )
...
* axis types in KernelInfo [pr]
* simpler lowerer
* fix tests
2025-07-12 09:36:20 -07:00
nimlgen
ea7f2f779c
hcq: p2p nv-amd ( #11195 )
...
* hcq: p2p between diff devices
* fix
2025-07-12 18:53:34 +03:00
qazal
6a9f059b21
viz: early convert to cpu time ( #11192 )
2025-07-12 17:19:41 +03:00
chenyu
12b04efd69
remove a TODO prod(k.full_shape[k.first_upcast:]) ( #11191 )
...
IMAGE=2 test/test_ops.py works now
2025-07-12 10:16:56 -04:00
nimlgen
6f5250d158
nv: fix typing in rpc_rm_control ( #11189 )
2025-07-12 16:09:42 +03:00
qazal
c0a5490c72
viz: minor profiler cleanup ( #11190 )
2025-07-12 14:18:24 +03:00
chenyu
fdcc25e392
some noop hand_coded_optimizations cleanup [pr] ( #11188 )
2025-07-12 00:09:23 -04:00
chenyu
1ad852a892
break up Kernel.reshape_and_permute [pr] ( #11187 )
2025-07-11 18:08:08 -04:00
uuuvn
d11b20129d
DMARef infra ( #10753 )
...
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-11 14:09:47 -07:00
chenyu
b072be0e2d
hotfix whisper main script ( #11184 )
2025-07-11 12:34:00 -04:00
qazal
0b7e9b5db7
viz: bugfix for multiple rewrites with the same name ( #11182 )
2025-07-11 18:26:12 +03:00
nimlgen
f9e4c4e57a
nv: nvpci blackwell support ( #11127 )
...
* nv: start 5090
* gsp init 5090
* mmu
* works
* after merge
* cleaner
* rwk
* x
* fx
* finish?
* fix
* unrelated
* fix
* comment
2025-07-11 17:02:09 +03:00
qazal
1d85323572
viz: absolute scaling of memory graph ( #11181 )
2025-07-11 16:39:11 +03:00
nimlgen
c7f6b617b4
nv: do not hardcode lv0 pd size ( #11180 )
2025-07-11 16:26:18 +03:00
nimlgen
27922c986a
nv: generic mmu impl ( #11179 )
2025-07-11 16:26:09 +03:00
qazal
d3ec63a5c3
viz: add base class for unittests ( #11178 )
2025-07-11 13:58:03 +03:00
qazal
b791ea117d
viz: enable scrolling in profiler ( #11169 )
...
* viz: add scrollbar to profiler
* using margin fixes the layout bug
* s/profiler.clientHeight/profiler.scrollHeight, it's important
* closer
* scrolling on the device list also works
2025-07-11 11:30:13 +03:00
chenyu
b219e47bef
remove Kernel.upcasted_axis [pr] ( #11175 )
2025-07-10 23:19:21 -04:00
George Hotz
ccd382bc6f
use axis_types more [pr] ( #11172 )
...
* use axis_types more
* fix local shape
* simpler clause
* fix local shape
2025-07-10 15:05:13 -07:00
nimlgen
fb278c6a02
do not recreate Compiled.profile_events in helper_collect_profile ( #11171 )
2025-07-10 23:55:12 +03:00
George Hotz
5c5eb92ed4
tc unroll after upcast [pr] ( #11170 )
2025-07-10 13:43:50 -07:00
George Hotz
05613c8cac
use shape str for tensor cores upcast/reduce [pr] ( #11168 )
...
* use shape str for tensor cores upcast/reduce [pr]
* reduce axis count isn't fixed
2025-07-10 13:10:58 -07:00
nimlgen
cc6ed30f4f
nv: relative lv addressing in NVPageTableEntry ( #11164 )
2025-07-10 22:35:50 +03:00
chenyu
439d033af9
update the README matmul example ( #11167 )
...
don't call rand and numpy to show that it's indeed one kernel
2025-07-10 14:47:29 -04:00
qazal
bde80c0cdf
record GraphEvents in metal graph ( #11145 )
...
* record GraphEvents in metal graph
* add TestProfiler.test_graph, revert old stuff
* move profile capture to MetalGraph
* comment
* don't double record graph command buffers
* wait_check
* explicit delete
2025-07-10 21:32:06 +03:00
George Hotz
8ce3d5906b
use shape_str for tensor cores ( #11165 )
2025-07-10 09:10:36 -07:00
nimlgen
581397110f
nv: use classes in GSP_IP ( #11163 )
2025-07-10 17:47:12 +03:00
nimlgen
705de6b8a6
nv: parse sizes of ctx buffers ( #11161 )
2025-07-10 17:46:48 +03:00