10633 Commits

Author SHA1 Message Date
chenyu
b6662096cb remove more first_reduce [pr] (#11239) 2025-07-14 19:13:44 -04:00
chenyu
eb8e17ef59 remove most of the first_upcast [pr] (#11238) 2025-07-14 16:54:24 -04:00
qazal
c78b1cbae7 viz profiler cleanups (#11234)
* move all render calls to zoom callback

* cleanup the naming

* require transform arg
2025-07-14 19:06:33 +03:00
chenyu
36ce883c7d update heuristic to use k.upcastable_dims and k.unrollable_dims [pr] (#11233)
idea is to make it behave the same regardless of axis order and with empty 1s in shape.

not quite fully remove all first_upcast yet because some conditions used already upcasted size which need a separate benchmark to remove.
2025-07-14 11:10:30 -04:00
qazal
c0c695dd89 viz: remove extra transform (#11232) 2025-07-14 16:51:47 +03:00
chenyu
da219199f5 minor hcopt cleanup [pr] (#11231) 2025-07-14 09:36:25 -04:00
nimlgen
756ba1a5f9 nv: support ampere in nvpci (#11230) 2025-07-14 15:35:44 +03:00
uuuvn
b2cc6cfa1b JIT_BATCH_SIZE is a ContextVar (#11228) 2025-07-14 14:03:45 +03:00
nimlgen
c4a920d95c nv: use last signature (#11227) 2025-07-14 13:00:39 +03:00
nimlgen
a830d37881 nv: check wpr2 is inited (#11226) 2025-07-14 11:46:14 +03:00
chenyu
0387bb9630 clean up image upcast in hcopt [pr] (#11220)
GLOBAL+LOCAL for upcast
GROUP_REDUCE+REDUCE for unroll
2025-07-13 18:06:43 -04:00
chenyu
85ddd72038 simpler grouptop in hcopt (#11219)
* simpler grouptop in hcopt

keep the only perf relevant conditions and the rest is handled by try except

* update openpilot read image count
2025-07-13 16:06:09 -04:00
qazal
40847ca29c viz: prune out of screen rects (#11217) 2025-07-13 21:49:59 +03:00
chenyu
674dc28505 remove Kernel.full_unupcasted_shape [pr] (#11215)
decomp to shape_len and first_upcast to get the last upcast-able dim
2025-07-13 13:56:23 -04:00
chenyu
9575cf6c6e shave more hcopt [pr] (#11213)
start to use AxisType for conditions
2025-07-13 12:43:58 -04:00
Alisher Zhubanyshev
4ef6b46b34 hcq: reduce launch overhead (#11193)
* nv: improve mmio creation speed

* add memoryview test

* fix indents

* move mv bench to `test_helpers`, remove comparison
2025-07-13 19:25:50 +03:00
nimlgen
1cc2b3f845 nv: use wait_cond (#11212) 2025-07-13 19:25:20 +03:00
nimlgen
6cce3a5d58 generic wait_cond (#11210)
* generic wait_cond

* fix linter

* fix linter
2025-07-13 16:59:21 +03:00
chenyu
e11ccf2342 update float4 condition in hcopt (#11211)
don't need all upcast candidates to be upcast-able, only check the actual one
2025-07-13 09:51:45 -04:00
nimlgen
55c54d9745 nv: sync after gpfifo setup (#11209) 2025-07-13 14:40:11 +03:00
chenyu
d90d837013 clean up hcopt [pr] (#11205)
removed one condition that's always true
2025-07-12 23:10:27 -04:00
chenyu
2b48b961be fix a few broken AMX tests (#11204) 2025-07-12 21:42:38 -04:00
wozeparrot
667c7a9fa6 clean: keccak cleanups + explicit shapes (#11202) 2025-07-12 18:17:14 -07:00
chenyu
a0438012af remove Kernel.get_program [pr] (#11203) 2025-07-12 20:50:29 -04:00
George Hotz
d67c8e7b42 local metal on metal in uop syntax (#11185)
* local metal on metal in uop syntax

* TODO: just put the axis_info in the kernelinfo

* local

* amd_matmul works @ 28 TFLOPS

* clean up matmul

* kernel8 works

* remove that

* locals

* axistype innovation

* work

* cleanup

* kernel3 regs

* cleanup kernel3

* work

* why is it broken

* no beam

* reenable

* permutes
2025-07-12 16:31:19 -07:00
uuuvn
40da5f0c81 fix silent mypy failure in ci (#11201)
Example: https://github.com/tinygrad/tinygrad/actions/runs/16215577171/job/45784110543?pr=11177#step:7:20

Caused by a footgun-y exception in how `set -e` works:

```bash
python -m mypy --strict-equality --lineprecision-report . && cat lineprecision.txt
```

This command fails (and returns a non-zero exit code when run
interactively), but because of the `&&`, `set -e` does not count it as a
script-terminating failure. It is treated as a test instead, similar to
how a command that fails in an `if` condition does not terminate the
script despite its non-zero exit code.
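
The behavior described above can be reproduced with a minimal sketch. It assumes `false` as a stand-in for the failing mypy run and `echo report` as a stand-in for `cat lineprecision.txt`; the variable names are illustrative and not taken from the CI script.

```bash
# Variant 1: the failing command sits on the left of `&&`.
# `set -e` ignores the failure, so the child script keeps running.
and_list_rc=0
and_list_out=$(bash -c 'set -e; false && echo report; echo continued') || and_list_rc=$?

# Variant 2: the same commands on separate lines.
# `set -e` terminates the child script at the failing command.
separate_rc=0
separate_out=$(bash -c 'set -e; false; echo report; echo continued') || separate_rc=$?

echo "and-list:       rc=$and_list_rc out='$and_list_out'"
echo "separate lines: rc=$separate_rc out='$separate_out'"
```

In the first variant the child script prints `continued` and exits with 0, so the failure is silent. In the second it exits with 1 before printing anything.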
2025-07-12 15:12:25 -04:00
chenyu
73caa5dd1b remove Kernel.membufs [pr] (#11200) 2025-07-12 14:48:47 -04:00
geohotstan
5ce278b245 OnnxRunner file as input (#10789)
* file path as input and have parse be in OnnxRunner.__init__

* modelproto_to_onnxrunner -> modelproto_to_runner

* whoops, fix import

* oh flakiness again, is it because it's getting gc-ed?

* small changes

* CI flaky so just move compile4 fix in

* copy typing of onnx_load

* actually can just import onnx_load instead of onnx.load

* fix external_benchmark_openpilot

* fix onnx_runner test to use onnx_helper

* rerun CI

* try run_modelproto

* spam CI a few times

* revert run_modelproto since that's flaky also

* no external onnx_load usage except onnx.py

* cursor tab complete is evil. Snuck a darn sorted in. But does order change result? Why?

* model_benchmark 193s -> 80s, add OnnxRunner.to()...

* minimize diff and clean up

* device can be None, weird but eh

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-12 14:27:46 -04:00
nimlgen
110cff3f2e fix device arg to Tensor.randn (#11194)
* fix device arg to Tensor.randn

* simpler test

* self.assertEqual
2025-07-12 13:51:59 -04:00
chenyu
6283d50224 DEPRECATED_linearize -> to_program [pr] (#11198) 2025-07-12 13:46:20 -04:00
George Hotz
770a558585 lil cleanups from uop branch [pr] (#11197) 2025-07-12 09:46:28 -07:00
George Hotz
5625e1904b axis types in KernelInfo (#11196)
* axis types in KernelInfo [pr]

* simpler lowerer

* fix tests
2025-07-12 09:36:20 -07:00
nimlgen
ea7f2f779c hcq: p2p nv-amd (#11195)
* hcq: p2p between diff devices

* fix
2025-07-12 18:53:34 +03:00
qazal
6a9f059b21 viz: early convert to cpu time (#11192) 2025-07-12 17:19:41 +03:00
chenyu
12b04efd69 remove a TODO prod(k.full_shape[k.first_upcast:]) (#11191)
IMAGE=2 test/test_ops.py works now
2025-07-12 10:16:56 -04:00
nimlgen
6f5250d158 nv: fix typing in rpc_rm_control (#11189) 2025-07-12 16:09:42 +03:00
qazal
c0a5490c72 viz: minor profiler cleanup (#11190) 2025-07-12 14:18:24 +03:00
chenyu
fdcc25e392 some noop hand_coded_optimizations cleanup [pr] (#11188) 2025-07-12 00:09:23 -04:00
chenyu
1ad852a892 break up Kernel.reshape_and_permute [pr] (#11187) 2025-07-11 18:08:08 -04:00
uuuvn
d11b20129d DMARef infra (#10753)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-11 14:09:47 -07:00
chenyu
b072be0e2d hotfix whisper main script (#11184) 2025-07-11 12:34:00 -04:00
qazal
0b7e9b5db7 viz: bugfix for multiple rewrites with the same name (#11182) 2025-07-11 18:26:12 +03:00
nimlgen
f9e4c4e57a nv: nvpci blackwell support (#11127)
* nv: start 5090

* gsp init 5090

* mmu

* works

* after merge

* clenaer

* rwk

* x

* fx

* finish?

* fix

* unrelated

* fix

* commenbt
2025-07-11 17:02:09 +03:00
qazal
1d85323572 viz: absolute scaling of memory graph (#11181) 2025-07-11 16:39:11 +03:00
nimlgen
c7f6b617b4 nv: do not hardcode lv0 pd size (#11180) 2025-07-11 16:26:18 +03:00
nimlgen
27922c986a nv: generic mmu impl (#11179) 2025-07-11 16:26:09 +03:00
qazal
d3ec63a5c3 viz: add base class for unittests (#11178) 2025-07-11 13:58:03 +03:00
qazal
b791ea117d viz: enable scrolling in profiler (#11169)
* viz: add scrollbar to profiler

* using margin fixes the layout bug

* s/profiler.clientHeight/profiler.scrollHeight, it's important

* closer

* scrolling on the device list also works
2025-07-11 11:30:13 +03:00
chenyu
b219e47bef remove Kernel.upcasted_axis [pr] (#11175) 2025-07-10 23:19:21 -04:00
George Hotz
ccd382bc6f use axis_types more [pr] (#11172)
* use axis_types more

* fix local shape

* simpler clause

* fix local shape
2025-07-10 15:05:13 -07:00