Commit Graph

10417 Commits

Author SHA1 Message Date
nimlgen
1cc2b3f845 nv: use wait_cond (#11212) 2025-07-13 19:25:20 +03:00
nimlgen
6cce3a5d58 generic wait_cond (#11210)
* generic wait_cond

* fix linter

* fix linter
2025-07-13 16:59:21 +03:00
chenyu
e11ccf2342 update float4 condition in hcopt (#11211)
don't need all upcast candidates to be upcast-able, only check the actual one
2025-07-13 09:51:45 -04:00
nimlgen
55c54d9745 nv: sync after gpfifo setup (#11209) 2025-07-13 14:40:11 +03:00
chenyu
d90d837013 clean up hcopt [pr] (#11205)
removed one condition that's always true
2025-07-12 23:10:27 -04:00
chenyu
2b48b961be fix a few broken AMX tests (#11204) 2025-07-12 21:42:38 -04:00
wozeparrot
667c7a9fa6 clean: keccak cleanups + explicit shapes (#11202) 2025-07-12 18:17:14 -07:00
chenyu
a0438012af remove Kernel.get_program [pr] (#11203) 2025-07-12 20:50:29 -04:00
George Hotz
d67c8e7b42 local metal on metal in uop syntax (#11185)
* local metal on metal in uop syntax

* TODO: just put the axis_info in the kernelinfo

* local

* amd_matmul works @ 28 TFLOPS

* clean up matmul

* kernel8 works

* remove that

* locals

* axistype innovation

* work

* cleanup

* kernel3 regs

* cleanup kernel3

* work

* why is it broken

* no beam

* reenable

* permutes
2025-07-12 16:31:19 -07:00
uuuvn
40da5f0c81 fix silent mypy failure in ci (#11201)
Example: https://github.com/tinygrad/tinygrad/actions/runs/16215577171/job/45784110543?pr=11177#step:7:20

Caused by a footgun in how `set -e` handles `&&` lists:

```bash
python -m mypy --strict-equality --lineprecision-report . && cat lineprecision.txt
```

When mypy fails, this command returns a non-zero exit code (visible if
run interactively). But because the failing command is on the left of
`&&`, a script running under `set -e` treats it as a test rather than a
script-terminating failure, just as a failing command in an `if`
condition does not terminate the script despite its non-zero exit code.
2025-07-12 15:12:25 -04:00
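The `set -e` behavior described in 40da5f0c81 can be reproduced without mypy. The sketch below is illustrative only and is not taken from the CI script: `false` stands in for the failing mypy invocation, `echo` stands in for `cat lineprecision.txt`, and the variable names are hypothetical. Running the two commands on separate lines is shown only as a contrast; the listing does not say whether that is the fix the commit applied.

```bash
#!/usr/bin/env bash

# Case 1: the failing command sits on the left of `&&`.
# `set -e` ignores the failure, so the inner script keeps running.
and_list_output=$(bash -c '
  set -e
  false && echo "report"
  echo "still running"
')
and_status=$?

# Case 2: the same failing command on its own line.
# `set -e` now terminates the inner script at `false`.
plain_status=0
plain_output=$(bash -c '
  set -e
  false
  echo "never printed"
') || plain_status=$?

echo "&& list:       output='${and_list_output}' status=${and_status}"
echo "separate line: output='${plain_output}' status=${plain_status}"
```

In the first case the inner script exits with status 0 because the last command it ran succeeded, which is how a failing check can go unnoticed in CI.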
chenyu
73caa5dd1b remove Kernel.membufs [pr] (#11200) 2025-07-12 14:48:47 -04:00
geohotstan
5ce278b245 OnnxRunner file as input (#10789)
* file path as input and have parse be in OnnxRunner.__init__

* modelproto_to_onnxrunner -> modelproto_to_runner

* whoops, fix import

* oh flakiness again, is it because it's getting gc-ed?

* small changes

* CI flaky so just move compile4 fix in

* copy typing of onnx_load

* actually can just import onnx_load instead of onnx.load

* fix external_benchmark_openpilot

* fix onnx_runner test to use onnx_helper

* rerun CI

* try run_modelproto

* spam CI a few times

* revert run_modelproto since that's flaky also

* no external onnx_load usage except onnx.py

* cursor tab complete is evil. Snuck a darn sorted in. But does order change result? Why?

* model_benchmark 193s -> 80s, add OnnxRunner.to()...

* minimize diff and clean up

* device can be None, weird but eh

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-12 14:27:46 -04:00
nimlgen
110cff3f2e fix device arg to Tensor.randn (#11194)
* fix device arg to Tensor.randn

* simpler test

* self.assertEqual
2025-07-12 13:51:59 -04:00
chenyu
6283d50224 DEPRECATED_linearize -> to_program [pr] (#11198) 2025-07-12 13:46:20 -04:00
George Hotz
770a558585 lil cleanups from uop branch [pr] (#11197) 2025-07-12 09:46:28 -07:00
George Hotz
5625e1904b axis types in KernelInfo (#11196)
* axis types in KernelInfo [pr]

* simpler lowerer

* fix tests
2025-07-12 09:36:20 -07:00
nimlgen
ea7f2f779c hcq: p2p nv-amd (#11195)
* hcq: p2p between diff devices

* fix
2025-07-12 18:53:34 +03:00
qazal
6a9f059b21 viz: early convert to cpu time (#11192) 2025-07-12 17:19:41 +03:00
chenyu
12b04efd69 remove a TODO prod(k.full_shape[k.first_upcast:]) (#11191)
IMAGE=2 test/test_ops.py works now
2025-07-12 10:16:56 -04:00
nimlgen
6f5250d158 nv: fix typing in rpc_rm_control (#11189) 2025-07-12 16:09:42 +03:00
qazal
c0a5490c72 viz: minor profiler cleanup (#11190) 2025-07-12 14:18:24 +03:00
chenyu
fdcc25e392 some noop hand_coded_optimizations cleanup [pr] (#11188) 2025-07-12 00:09:23 -04:00
chenyu
1ad852a892 break up Kernel.reshape_and_permute [pr] (#11187) 2025-07-11 18:08:08 -04:00
uuuvn
d11b20129d DMARef infra (#10753)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-11 14:09:47 -07:00
chenyu
b072be0e2d hotfix whisper main script (#11184) 2025-07-11 12:34:00 -04:00
qazal
0b7e9b5db7 viz: bugfix for multiple rewrites with the same name (#11182) 2025-07-11 18:26:12 +03:00
nimlgen
f9e4c4e57a nv: nvpci blackwell support (#11127)
* nv: start 5090

* gsp init 5090

* mmu

* works

* after merge

* cleaner

* rwk

* x

* fx

* finish?

* fix

* unrelated

* fix

* comment
2025-07-11 17:02:09 +03:00
qazal
1d85323572 viz: absolute scaling of memory graph (#11181) 2025-07-11 16:39:11 +03:00
nimlgen
c7f6b617b4 nv: do not hardcode lv0 pd size (#11180) 2025-07-11 16:26:18 +03:00
nimlgen
27922c986a nv: generic mmu impl (#11179) 2025-07-11 16:26:09 +03:00
qazal
d3ec63a5c3 viz: add base class for unittests (#11178) 2025-07-11 13:58:03 +03:00
qazal
b791ea117d viz: enable scrolling in profiler (#11169)
* viz: add scrollbar to profiler

* using margin fixes the layout bug

* s/profiler.clientHeight/profiler.scrollHeight, it's important

* closer

* scrolling on the device list also works
2025-07-11 11:30:13 +03:00
chenyu
b219e47bef remove Kernel.upcasted_axis [pr] (#11175) 2025-07-10 23:19:21 -04:00
George Hotz
ccd382bc6f use axis_types more [pr] (#11172)
* use axis_types more

* fix local shape

* simpler clause

* fix local shape
2025-07-10 15:05:13 -07:00
nimlgen
fb278c6a02 do not recreate Compiled.profile_events in helper_collect_profile (#11171) 2025-07-10 23:55:12 +03:00
George Hotz
5c5eb92ed4 tc unroll after upcast [pr] (#11170) 2025-07-10 13:43:50 -07:00
George Hotz
05613c8cac use shape str for tensor cores upcast/reduce [pr] (#11168)
* use shape str for tensor cores upcast/reduce [pr]

* reduce axis count isn't fixed
2025-07-10 13:10:58 -07:00
nimlgen
cc6ed30f4f nv: relative lv addressing in NVPageTableEntry (#11164) 2025-07-10 22:35:50 +03:00
chenyu
439d033af9 update the README matmul example (#11167)
don't call rand and numpy to show that it's indeed one kernel
2025-07-10 14:47:29 -04:00
qazal
bde80c0cdf record GraphEvents in metal graph (#11145)
* record GraphEvents in metal graph

* add TestProfiler.test_graph, revert old stuff

* move profile capture to MetalGraph

* comment

* don't double record graph command buffers

* wait_check

* explicit delete
2025-07-10 21:32:06 +03:00
George Hotz
8ce3d5906b use shape_str for tensor cores (#11165) 2025-07-10 09:10:36 -07:00
nimlgen
581397110f nv: use classes in GSP_IP (#11163) 2025-07-10 17:47:12 +03:00
nimlgen
705de6b8a6 nv: parse sizes of ctx buffers (#11161) 2025-07-10 17:46:48 +03:00
qazal
dcc9704b6b viz: profile RewriteSteps in TINY device (#11125)
* viz: profile RewriteSteps in TINY device

* use TracingKey with category

* split by whitespace

* add tracing.py

* work

* tracing_key

* TRACK_MATCH_STATS=3, can this be in defaults?

* fallback name

* work

* javascript

* measure text is slow

* checkout

* profile graph_rewrite/graph_rewrite_map

* change that

* no as

* finally

* work

* linking works
2025-07-10 17:45:57 +03:00
Pyry Kovanen
32117402dd metal: fix incorrect _free on interpreter exit (#11158) 2025-07-10 14:01:30 +03:00
qazal
3d610f6d2b viz: small ui cleanup (#11157)
* viz: small ui cleanup

* 2
2025-07-10 11:43:36 +03:00
chenyu
7db07e5f2c don't narrow range of CAST on bool/unsigned (#11156) 2025-07-09 22:20:09 -04:00
George Hotz
e154a66f43 unroll axis 0 in tensor core (#11155)
* unroll is 0 in tc [pr]

* flip order of upcast/reduce in tensor core

* Revert "flip order of upcast/reduce in tensor core"

This reverts commit e564e38bcd.
2025-07-09 17:28:23 -07:00
George Hotz
b7742ad9e4 migrate to string swizzle [pr] (#11154) 2025-07-09 16:57:53 -07:00
George Hotz
4156baee93 break swizzle into three chunks [pr] (#11153)
* break swizzle into three chunks [pr]

* test failed
2025-07-09 15:30:34 -07:00