4667 Commits

Author SHA1 Message Date
geohotstan
536b254df4 Bump onnx to 1.18.0 (#11266)
* bump

* thou hast implement functions

* hacked in domain support

* some clean ups

* hack quantize_onnx_test too

* add helper lol, why onnx tests why

* better dispatcher, but need tests and better naming

* flaky ci

* change some names

* small clean ups

* make it easier to clean up tests once ORT supports 1.18.0

* nits

* fix bug of Softmax_1 being registered in onnx_ops

* need a default value

* resolve_const is better name

* fix OnnxRunner.to

* use proper domain names
2025-07-17 15:35:41 -04:00
qazal
e68af3b336 disable flaky assert in test_cpu_profile (#11270) 2025-07-17 06:50:39 +03:00
chenyu
522dc72f08 remove Kernel.local_dims [pr] (#11268)
* remove Kernel.local_dims [pr]

also not needed

* fix test_matvec
2025-07-16 17:46:19 -04:00
uuuvn
6f0ddcc24c Remote cross-host graph (#11229) 2025-07-16 13:27:54 -07:00
quortus
924bc7c9ae Fix test_uop_spec (#11259) 2025-07-16 11:02:31 +03:00
chenyu
c8e5c4d7c3 insert_before -> insert_at [pr] (#11257)
more precise
2025-07-15 17:44:34 -04:00
leopf
557ca7d757 testing SimpleTokenizer against OASST1 (#11214) 2025-07-14 17:09:31 -07:00
wozeparrot
5878b189b8 don't const fold shape changing bitcast (#11236) 2025-07-14 16:42:16 -07:00
chenyu
b6662096cb remove more first_reduce [pr] (#11239) 2025-07-14 19:13:44 -04:00
chenyu
eb8e17ef59 remove most of the first_upcast [pr] (#11238) 2025-07-14 16:54:24 -04:00
chenyu
674dc28505 remove Kernel.full_unupcasted_shape [pr] (#11215)
decomp to shape_len and first_upcast to get the last upcast-able dim
2025-07-13 13:56:23 -04:00
Alisher Zhubanyshev
4ef6b46b34 hcq: reduce launch overhead (#11193)
* nv: improve mmio creation speed

* add memoryview test

* fix indents

* move mv bench to `test_helpers`, remove comparison
2025-07-13 19:25:50 +03:00
chenyu
2b48b961be fix a few broken AMX tests (#11204) 2025-07-12 21:42:38 -04:00
chenyu
a0438012af remove Kernel.get_program [pr] (#11203) 2025-07-12 20:50:29 -04:00
chenyu
73caa5dd1b remove Kernel.membufs [pr] (#11200) 2025-07-12 14:48:47 -04:00
geohotstan
5ce278b245 OnnxRunner file as input (#10789)
* file path as input and have parse be in OnnxRunner.__init__

* modelproto_to_onnxrunner -> modelproto_to_runner

* whoops, fix import

* oh flakiness again, is it because it's getting gc-ed?

* small changes

* CI flaky so just move compile4 fix in

* copy typing of onnx_load

* actually can just import onnx_load instead of onnx.load

* fix external_benchmark_openpilot

* fix onnx_runner test to use onnx_helper

* rerun CI

* try run_modelproto

* spam CI a few times

* revert run_modelproto since that's flaky also

* no external onnx_load usage except onnx.py

* cursor tab complete is evil. Snuck a darn sorted in. But does order change result? Why?

* model_benchmark 193s -> 80s, add OnnxRunner.to()...

* minimize diff and clean up

* device can be None, weird but eh

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-12 14:27:46 -04:00
nimlgen
110cff3f2e fix device arg to Tensor.randn (#11194)
* fix device arg to Tensor.randn

* simpler test

* self.assertEqual
2025-07-12 13:51:59 -04:00
chenyu
6283d50224 DEPRECATED_linearize -> to_program [pr] (#11198) 2025-07-12 13:46:20 -04:00
nimlgen
ea7f2f779c hcq: p2p nv-amd (#11195)
* hcq: p2p between diff devices

* fix
2025-07-12 18:53:34 +03:00
qazal
d3ec63a5c3 viz: add base class for unittests (#11178) 2025-07-11 13:58:03 +03:00
nimlgen
fb278c6a02 do not recreate Compiled.profile_events in helper_collect_profile (#11171) 2025-07-10 23:55:12 +03:00
qazal
bde80c0cdf record GraphEvents in metal graph (#11145)
* record GraphEvents in metal graph

* add TestProfiler.test_graph, revert old stuff

* move profile capture to MetalGraph

* comment

* don't double record graph command buffers

* wait_check

* explicit delete
2025-07-10 21:32:06 +03:00
chenyu
7db07e5f2c don't narrow range of CAST on bool/unsigned (#11156) 2025-07-09 22:20:09 -04:00
George Hotz
4156baee93 break swizzle into three chunks [pr] (#11153)
* break swizzle into three chunks [pr]

* test failed
2025-07-09 15:30:34 -07:00
George Hotz
53ae153404 tc should be in opt (#11148)
* tc should be in opt [pr]

* fix import
2025-07-09 14:12:21 -07:00
nimlgen
b6981404ed memory: use page shifts in memory manager (#11149)
* memory: use page shifts in memory manager

* fix
2025-07-09 22:05:00 +03:00
qazal
5c1d215b41 viz: add Graph stream (#11144)
* viz: stack an event for the entire batch

* multi

* whitespace

* work

* multi graph, Graph gets its own row
2025-07-09 20:56:46 +03:00
George Hotz
2893feb9f6 cleanups for kernel.py (#11143)
* cleanups for kernel.py

* fixups
2025-07-08 18:10:25 -07:00
George Hotz
359bed74f8 axis type tracking [pr] (#11137)
* axis type tracking [pr]

* keep update_info

* keep legacy colors

* update tests to apply_opt
2025-07-08 14:16:25 -07:00
chenyu
dada3f5bf3 skip some new onnx tests (#11135)
these fail on master with the latest onnx
2025-07-08 16:12:48 -04:00
qazal
3dfc0ff887 move cpu_profile and shared ProfileEvents from device.py to helpers [pr] (#11126)
* move cpu_profile and shared ProfileEvents to helpers [pr]

* TestProfiler.test_cpu_profile

* update test_viz.py

* TestProfiler.test_profile_multiops ordering, it's different streams now
2025-07-08 12:14:03 +03:00
George Hotz
f7d4638e05 start LLM app, tons of clean up required. target is 200 line ollama (#11068)
* start LLM app, tons of clean up required. target is 200 line ollama

* kind of works

* simpler

* add k/v cache

* with SYM=1, it loops

* no rope cache

* simpler

* more cleanups

* cleanups

* works

* argparse and comments

* from gguf

* generate is a function

* no copy from cpu

* fix max context pass in

* test

* improve test

* ai2_arc

* fix 8B, use less ram

* 136 lines
2025-07-07 17:09:46 -07:00
chenyu
341a686799 Tensor.diagonal (#11122)
only implemented main diagonal for 2-D tensors. with diagonal and qr, we can get determinant
2025-07-07 16:21:26 -04:00
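The note about getting a determinant from a diagonal plus QR can be illustrated with a small pure-Python sketch. This is not tinygrad code: `qr_gram_schmidt`, `diagonal`, and `abs_det` are hypothetical helpers written for this example. It relies on the fact that if A = QR with Q orthogonal, then |det A| equals the product of the main diagonal of R. Recovering the sign would additionally need det(Q), which this sketch does not compute.

```python
import math

def qr_gram_schmidt(a):
    """Classical Gram-Schmidt QR of a square, full-rank matrix (list of rows).

    Returns (q_cols, r): the orthonormal columns of Q and the upper-triangular R.
    Hypothetical helper for illustration only, not a tinygrad API.
    """
    n = len(a)
    cols = [[a[i][j] for i in range(n)] for j in range(n)]  # columns of A
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j, v in enumerate(cols):
        w = list(v)
        # Remove the components of column j along the earlier q vectors.
        for i, q in enumerate(q_cols):
            r[i][j] = sum(qk * vk for qk, vk in zip(q, v))
            w = [wk - r[i][j] * qk for wk, qk in zip(w, q)]
        r[j][j] = math.sqrt(sum(wk * wk for wk in w))
        if r[j][j] == 0.0:
            raise ValueError("matrix is singular; this sketch needs full rank")
        q_cols.append([wk / r[j][j] for wk in w])
    return q_cols, r

def diagonal(m):
    """Main diagonal of a 2-D square matrix, the only case the commit mentions."""
    return [m[i][i] for i in range(len(m))]

def abs_det(a):
    """|det A| as the product of diag(R); the sign is not recovered here."""
    _, r = qr_gram_schmidt(a)
    return math.prod(diagonal(r))

if __name__ == "__main__":
    print(abs_det([[1.0, 2.0], [3.0, 4.0]]))
```

Gram-Schmidt is used here only because it keeps the diagonal of R non-negative and fits in a few lines; a production QR would more likely use Householder reflections.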
Sieds Lykles
584fd6af5a Fix division by zero and mask bug in add views (#11088)
* merge view infinite loop test

* adjust condition in `x//d -> x//(-d)*-1`

* Fix division by zero in add views

* adjust offset end

* fix typo in comment

* add target to test_merge_views_variable

* fix view incorrectly being masked

* ssimplify strides and offset of the new view to canonicalize

* remove print in test

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-07-07 10:05:47 -07:00
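The rewrite `x//d -> x//(-d)*-1` mentioned above can be checked numerically. The sketch below assumes that `//` in the rule means division that truncates toward zero; the commit message does not state this, and `trunc_div` and `rewrite_holds` are hypothetical names used only here. Under Python's own floor division the identity does not hold in general, which the last line of the example demonstrates.

```python
def trunc_div(x: int, d: int) -> int:
    """Integer division rounding toward zero (C-style), unlike Python's floor `//`."""
    q = abs(x) // abs(d)
    return q if (x < 0) == (d < 0) else -q

def rewrite_holds(x: int, d: int) -> bool:
    """True if x//d equals x//(-d)*-1 under truncating division."""
    return trunc_div(x, d) == trunc_div(x, -d) * -1

if __name__ == "__main__":
    # The identity holds for every nonzero divisor in this small range.
    print(all(rewrite_holds(x, d)
              for x in range(-20, 21)
              for d in range(-7, 8) if d != 0))
    # With floor division the two sides differ: -4 versus -3.
    print(7 // -2, (7 // 2) * -1)
```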
Nino Risteski
a1a146a499 adding enable_gqa in SDPA (#11097)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-06 23:25:33 -07:00
chenyu
7468959f4b Tensor.argsort (#11112) 2025-07-06 13:56:35 -04:00
kevvz
b7af9cf849 clean svd tests, set full_matrices false in torch backend (#11113)
* clean tests, set full_matrices false

* add more shape asserts
2025-07-06 13:55:49 -04:00
chenyu
ba88ec3ad0 pipe linalg svd to torch (#11109)
and found a bug in svd
2025-07-06 08:37:25 -04:00
chenyu
845a4d32bc Tensor.diag (#11108)
also updated Tensor.eye to use it
2025-07-05 23:03:02 -04:00
ttomsa
4905af4ae0 remove invalid int div test (#11106)
* rm test

* also rm this
2025-07-05 18:57:55 -04:00
qazal
81781dc12b viz: renames and spacing changes to tracing (#11102) 2025-07-05 18:40:39 +03:00
qazal
7619bf35e7 cleanup: remove disabled TestIndexingOrdering (#11101)
* cleanup: remove disabled TestIndexingOrdering

* don't import kernelize internals
2025-07-05 18:14:37 +03:00
qazal
4fcfaa0ef7 viz: switch to TracingKey (#11100)
* viz: switch to TracingKey

* tuple

* order is name, keys, fmt

* add test_tracing_key
2025-07-05 17:46:18 +03:00
qazal
3d8569f6d8 hotfix: infinite loop in tracking pattern matcher (#11094)
* failing test

* fix that

* given matchers
2025-07-04 19:55:26 +03:00
nimlgen
01f3c4f44d memory: simpler paddr allocation logic (#11090)
* memory: new paddr allocation logic

* am fix

* am refactors

* fix

* mypy

* use it

* am
2025-07-04 17:00:36 +03:00
qazal
988540f401 support capturing cpu_profile on error (#11078)
* support capturing cpu_profile on error

* spacing

* pylint complains
2025-07-04 11:53:12 +03:00
chenyu
a2f5a54458 move sparse_categorical_crossentropy to test_ops (#11083)
also flattened the tests
2025-07-03 21:40:54 -04:00
chenyu
7c8ccb0267 sparse_categorical_crossentropy cleanup [pr] (#11082) 2025-07-03 18:32:52 -04:00
chenyu
678cabc6f2 use argfix in Tensor.stack (#11077)
works for multiple Tensor args or a single tuple/list of Tensors, but not a mix of the two
2025-07-03 12:15:11 -04:00
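The calling convention described above (several positional arguments, or one tuple/list, but not both) can be sketched as a small normalising helper. The `argfix` below is a hypothetical stand-in written for this example and is not claimed to match the library's implementation; `stack_args` is likewise illustrative and uses plain strings in place of Tensors.

```python
def argfix(*args):
    """Normalise `f(a, b, c)` and `f([a, b, c])` to the same tuple.

    Hypothetical sketch: if the first argument is a tuple or list, it must be
    the only argument; otherwise the positional arguments are returned as-is.
    """
    if args and isinstance(args[0], (tuple, list)):
        if len(args) != 1:
            raise ValueError(f"cannot mix a sequence with extra positional args: {args!r}")
        return tuple(args[0])
    return args

def stack_args(first, *rest):
    """Illustrative stand-in for a stack-like method: returns the normalised inputs."""
    return argfix(first, *rest)

if __name__ == "__main__":
    print(stack_args("a", "b", "c"))    # separate positional arguments
    print(stack_args(["a", "b", "c"]))  # a single list
    try:
        stack_args(["a", "b"], "c")     # mixed form is rejected
    except ValueError as e:
        print("rejected:", e)
```

Note that this sketch only detects the mixed form when the sequence comes first; a call such as `stack_args("a", ["b", "c"])` passes through unchanged.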
qazal
b695e8c4d6 viz: remove support for naming with self (#11076) 2025-07-03 17:29:14 +03:00