4043 Commits

Author SHA1 Message Date
George Hotz
09431d4ad1 make DEFINE_REG behave like the others (#11273)
* simpler define reg

* cast

* PTRCAT define_acc

* cleanups

* fix uops stats

* fix linearizer tests

* llvm

* define reg sets const

* define reg sets const

* no assign

* collapse that

* fix test_max_pool2d_bigger_stride_dilation

* use index, fix webgpu

* devec

* fix tests

* fix webgpu

* fix llvm

* threads for python

* fix ops_python

* only for reg

* acc_half is real now in the emulator

* fix llvm

* fix webgpu init

* fix wgpu test

* fix some tests

* fix ptx

* fix ptx bool acc

* cleanups

* broken, meh. will fix with ENDRANGE

* line count
2025-07-22 13:53:56 -07:00
chenyu
4535908679 update keccak test_long (#11331)
it should compare with arg "shake_128"
2025-07-22 16:08:01 -04:00
George Hotz
affd83961c small changes from define_reg (#11327)
* small changes from define_reg

* fix webgpu
2025-07-22 11:11:48 -07:00
chenyu
2d7c28de6a clean up dup lambdas in helper_test_exception (#11325) 2025-07-22 12:21:57 -04:00
chenyu
c6aa8e58ca fix TestDropoutProbabilityEdgeCases (#11322) 2025-07-22 11:13:56 -04:00
chenyu
fb42c84365 merge TestRollEdgeCases into test_ops (#11321) 2025-07-22 10:55:57 -04:00
chenyu
1d8b3e9d1c movementop only Tensor.roll (#11317)
* movementop only Tensor.roll

* fixed
2025-07-22 10:34:15 -04:00
chenyu
a41140241b truncate unsigned const in cstyle (#11318)
it can be a warning or a hard error in clang

PTX and PYTHON also need fix, skipping for now
2025-07-22 08:02:12 -04:00
qazal
6668d6d241 fix word_wrap with newlines in input string [pr] (#11319) 2025-07-22 12:03:13 +03:00
George Hotz
3b674df34b generic changes from define_reg_2 (#11315)
* generic changes from define_reg_2

* fix for ptx

* ugh, that one
2025-07-21 15:14:06 -07:00
chenyu
6e9506e6fd Tensor.roll supports dims=None (#11313) 2025-07-21 17:29:23 -04:00
George Hotz
108aac8af4 use AddrSpace instead of local (#11314)
* use AddrSpace instead of local

* addrspace in test
2025-07-21 14:00:06 -07:00
chenyu
d3a93185a6 clean up test_roll (#11312) 2025-07-21 16:00:50 -04:00
George Hotz
532b52fcef store has a dtype, like assign (#11309)
* store has a dtype, like assign

* fix upat

* fix test
2025-07-21 12:50:01 -07:00
geohotstan
445ff8de56 ONNX onnx_parser and buffer_parse clean up (#11000)
* start

* remove onnx.load from compile4 and move np to dropout

* clean up and enable test

* clean up

* move WebGPU ONNX test into MacOS (WebGPU)

* leave test in ONNX (CPU)

* fix raw_data init None, and simplify onnx_runner test a little?

* THESE TESTS ARE SO UGLY UGHH

* need to really think about how to structure the test

* wow LLMs are quite something

* not always on disk now

* also add external data loading test

* cleaner tests

* minimize diff and add const folding tests

* add external data loading too

* whoops add webgpu back.. but why was it not needed in the first place?

* better comment

* move webgpu test to macos(webgpu)?

* llm english so much better than me wow

* trigger CI to check flakiness

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-21 15:10:25 -04:00
George Hotz
842184a1ab rename kernelize to schedule, try 2 (#11305) 2025-07-21 11:18:36 -07:00
wozeparrot
30ce16a424 feat: failing test for long keccak (#11292) 2025-07-21 12:49:23 -04:00
uuuvn
178dbf3f66 Remote scheduler changes (#11177) 2025-07-21 09:29:44 -07:00
nimlgen
cc3c1e4c14 hcq: move cpu to hcq (#11262)
* hcq: move cpu to hcq

* import time

* upd

* fix

* windows support

* hm

* cleaner

* fix timer

* fix timing

* std is ns

* skip profiler

* mypy

* cleaner

* cleanups

* after merge

* default is back
2025-07-21 15:10:38 +03:00
qazal
3002c63b1e process replay: optionally pass tinygrad import error (#11289)
* process replay: optionally pass tinygrad import error

* gate all tinygrad internals

* s/getenv/os.getenv pre import

* diff
2025-07-20 22:57:56 +03:00
chenyu
54924f9969 type remove Union and Optional [pr] (#11283)
use `|` for consistency
2025-07-19 14:05:52 -04:00
nimlgen
188ed38315 replace from_mv with lightweight mv_address (#11280) 2025-07-19 13:50:51 +03:00
chenyu
ec3efd2919 move upcast before reduce (#11250)
* move upcast before reduce

upcast goes to end of global+local+upcast

* r_196_32_4_24_8
2025-07-18 14:42:15 -04:00
nimlgen
9a88bd841c hcq: refactor into peer_groups (#11277)
* hcq: refactor into peer_groups

* fix fors

* fixes

* ooops

* mypy

* tiny fixes
2025-07-18 16:34:18 +03:00
chenyu
c5a5d74642 Revert "image_dot of 2 half inputs returns half (#11007)" (#11274)
This reverts commit fa8e08f922.
2025-07-17 17:34:18 -04:00
Utkarsh Gill
fa8e08f922 image_dot of 2 half inputs returns half (#11007)
* cast after sum

* comment out skipif

* minor fix

* only test IMAGE

* IMAGE is supported now

* simpler

* simplerr

* only cast if dtype is None

* don't need to change base_image_type

* only cast when dtype is half

* add explicit test

* actually no, workflow seems better

* actually, keep both

* move test

* fix indent

---------

Co-authored-by: Utkarsh Gill <engelbart@Utkarshs-MacBook-Pro.local>
2025-07-17 13:47:22 -07:00
geohotstan
536b254df4 Bump onnx to 1.18.0 (#11266)
* bump

* thou hast implement functions

* hacked in domain support

* some clean ups

* hack quantize_onnx_test too

* add helper lol, why onnx tests why

* better dispatcher, but need tests and better naming

* flaky ci

* change some names

* small clean ups

* make it easier to clean up tests once ORT supports 1.18.0

* nits

* fix bug of Softmax_1 being registered in onnx_ops

* need a default value

* resolve_const is better name

* fix OnnxRunner.to

* use proper domain names
2025-07-17 15:35:41 -04:00
qazal
e68af3b336 disable flaky assert in test_cpu_profile (#11270) 2025-07-17 06:50:39 +03:00
chenyu
522dc72f08 remove Kernel.local_dims [pr] (#11268)
* remove Kernel.local_dims [pr]

also not needed

* fix test_matvec
2025-07-16 17:46:19 -04:00
uuuvn
6f0ddcc24c Remote cross-host graph (#11229) 2025-07-16 13:27:54 -07:00
quortus
924bc7c9ae Fix test_uop_spec (#11259) 2025-07-16 11:02:31 +03:00
chenyu
c8e5c4d7c3 insert_before -> insert_at [pr] (#11257)
more precise
2025-07-15 17:44:34 -04:00
leopf
557ca7d757 testing SimpleTokenizer against OASST1 (#11214) 2025-07-14 17:09:31 -07:00
wozeparrot
5878b189b8 don't const fold shape changing bitcast (#11236) 2025-07-14 16:42:16 -07:00
chenyu
b6662096cb remove more first_reduce [pr] (#11239) 2025-07-14 19:13:44 -04:00
chenyu
eb8e17ef59 remove most of the first_upcast [pr] (#11238) 2025-07-14 16:54:24 -04:00
chenyu
674dc28505 remove Kernel.full_unupcasted_shape [pr] (#11215)
decomp to shape_len and first_upcast to get the last upcast-able dim
2025-07-13 13:56:23 -04:00
Alisher Zhubanyshev
4ef6b46b34 hcq: reduce launch overhead (#11193)
* nv: improve mmio creation speed

* add memoryview test

* fix indents

* move mv bench to `test_helpers`, remove comparison
2025-07-13 19:25:50 +03:00
chenyu
2b48b961be fix a few broken AMX tests (#11204) 2025-07-12 21:42:38 -04:00
chenyu
a0438012af remove Kernel.get_program [pr] (#11203) 2025-07-12 20:50:29 -04:00
chenyu
73caa5dd1b remove Kernel.membufs [pr] (#11200) 2025-07-12 14:48:47 -04:00
geohotstan
5ce278b245 OnnxRunner file as input (#10789)
* file path as input and have parse be in OnnxRunner.__init__

* modelproto_to_onnxrunner -> modelproto_to_runner

* whoops, fix import

* oh flakiness again, is it because it's getting gc-ed?

* small changes

* CI flaky so just move compile4 fix in

* copy typing of onnx_load

* actually can just import onnx_load instead of onnx.load

* fix external_benchmark_openpilot

* fix onnx_runner test to use onnx_helper

* rerun CI

* try run_modelproto

* spam CI a few times

* revert run_modelproto since that's flaky also

* no external onnx_load usage except onnx.py

* cursor tab complete is evil. Snuck a darn sorted in. But does order change result? Why?

* model_benchmark 193s -> 80s, add OnnxRunner.to()...

* minimize diff and clean up

* device can be None, weird but eh

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-12 14:27:46 -04:00
nimlgen
110cff3f2e fix device arg to Tensor.randn (#11194)
* fix device arg to Tensor.randn

* simpler test

* self.assertEqual
2025-07-12 13:51:59 -04:00
chenyu
6283d50224 DEPRECATED_linearize -> to_program [pr] (#11198) 2025-07-12 13:46:20 -04:00
nimlgen
ea7f2f779c hcq: p2p nv-amd (#11195)
* hcq: p2p between diff devices

* fix
2025-07-12 18:53:34 +03:00
qazal
d3ec63a5c3 viz: add base class for unittests (#11178) 2025-07-11 13:58:03 +03:00
nimlgen
fb278c6a02 do not recreate Compiled.profile_events in helper_collect_profile (#11171) 2025-07-10 23:55:12 +03:00
qazal
bde80c0cdf record GraphEvents in metal graph (#11145)
* record GraphEvents in metal graph

* add TestProfiler.test_graph, revert old stuff

* move profile capture to MetalGraph

* comment

* don't double record graph command buffers

* wait_check

* explicit delete
2025-07-10 21:32:06 +03:00
chenyu
7db07e5f2c don't narrow range of CAST on bool/unsigned (#11156) 2025-07-09 22:20:09 -04:00
George Hotz
4156baee93 break swizzle into three chunks [pr] (#11153)
* break swizzle into three chunks [pr]

* test failed
2025-07-09 15:30:34 -07:00