geohotstan
536b254df4
Bump onnx to 1.18.0 ( #11266 )
* bump
* thou hast implement functions
* hacked in domain support
* some clean ups
* hack quantize_onnx_test too
* add helper lol, why onnx tests why
* better dispatcher, but need tests and better naming
* flaky ci
* change some names
* small clean ups
* make it easier to clean up tests once ORT supports 1.18.0
* nits
* fix bug of Softmax_1 being registered in onnx_ops
* need a default value
* resolve_const is better name
* fix OnnxRunner.to
* use proper domain names
2025-07-17 15:35:41 -04:00
qazal
e68af3b336
disable flaky assert in test_cpu_profile ( #11270 )
2025-07-17 06:50:39 +03:00
chenyu
522dc72f08
remove Kernel.local_dims [pr] ( #11268 )
* remove Kernel.local_dims [pr]
also not needed
* fix test_matvec
2025-07-16 17:46:19 -04:00
uuuvn
6f0ddcc24c
Remote cross-host graph ( #11229 )
2025-07-16 13:27:54 -07:00
quortus
924bc7c9ae
Fix test_uop_spec ( #11259 )
2025-07-16 11:02:31 +03:00
chenyu
c8e5c4d7c3
insert_before -> insert_at [pr] ( #11257 )
more precise
2025-07-15 17:44:34 -04:00
leopf
557ca7d757
testing SimpleTokenizer against OASST1 ( #11214 )
2025-07-14 17:09:31 -07:00
wozeparrot
5878b189b8
don't const fold shape changing bitcast ( #11236 )
2025-07-14 16:42:16 -07:00
chenyu
b6662096cb
remove more first_reduce [pr] ( #11239 )
2025-07-14 19:13:44 -04:00
chenyu
eb8e17ef59
remove most of the first_upcast [pr] ( #11238 )
2025-07-14 16:54:24 -04:00
chenyu
674dc28505
remove Kernel.full_unupcasted_shape [pr] ( #11215 )
decomp to shape_len and first_upcast to get the last upcast-able dim
2025-07-13 13:56:23 -04:00
Alisher Zhubanyshev
4ef6b46b34
hcq: reduce launch overhead ( #11193 )
* nv: improve mmio creation speed
* add memoryview test
* fix indents
* move mv bench to `test_helpers`, remove comparison
2025-07-13 19:25:50 +03:00
chenyu
2b48b961be
fix a few broken AMX tests ( #11204 )
2025-07-12 21:42:38 -04:00
chenyu
a0438012af
remove Kernel.get_program [pr] ( #11203 )
2025-07-12 20:50:29 -04:00
chenyu
73caa5dd1b
remove Kernel.membufs [pr] ( #11200 )
2025-07-12 14:48:47 -04:00
geohotstan
5ce278b245
OnnxRunner file as input ( #10789 )
* file path as input and have parse be in OnnxRunner.__init__
* modelproto_to_onnxrunner -> modelproto_to_runner
* whoops, fix import
* oh flakiness again, is it because it's getting gc-ed?
* small changes
* CI flaky so just move compile4 fix in
* copy typing of onnx_load
* actually can just import onnx_load instead of onnx.load
* fix external_benchmark_openpilot
* fix onnx_runner test to use onnx_helper
* rerun CI
* try run_modelproto
* spam CI a few times
* revert run_modelproto since that's flaky also
* no external onnx_load usage except onnx.py
* cursor tab complete is evil. Snuck a darn sorted in. But does order change result? Why?
* model_benchmark 193s -> 80s, add OnnxRunner.to()...
* minimize diff and clean up
* device can be None, weird but eh
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-12 14:27:46 -04:00
nimlgen
110cff3f2e
fix device arg to Tensor.randn ( #11194 )
* fix device arg to Tensor.randn
* simpler test
* self.assertEqual
2025-07-12 13:51:59 -04:00
chenyu
6283d50224
DEPRECATED_linearize -> to_program [pr] ( #11198 )
2025-07-12 13:46:20 -04:00
nimlgen
ea7f2f779c
hcq: p2p nv-amd ( #11195 )
* hcq: p2p between diff devices
* fix
2025-07-12 18:53:34 +03:00
qazal
d3ec63a5c3
viz: add base class for unittests ( #11178 )
2025-07-11 13:58:03 +03:00
nimlgen
fb278c6a02
do not recreate Compiled.profile_events in helper_collect_profile ( #11171 )
2025-07-10 23:55:12 +03:00
qazal
bde80c0cdf
record GraphEvents in metal graph ( #11145 )
* record GraphEvents in metal graph
* add TestProfiler.test_graph, revert old stuff
* move profile capture to MetalGraph
* comment
* don't double record graph command buffers
* wait_check
* explicit delete
2025-07-10 21:32:06 +03:00
chenyu
7db07e5f2c
don't narrow range of CAST on bool/unsigned ( #11156 )
2025-07-09 22:20:09 -04:00
George Hotz
4156baee93
break swizzle into three chunks [pr] ( #11153 )
* break swizzle into three chunks [pr]
* test failed
2025-07-09 15:30:34 -07:00
George Hotz
53ae153404
tc should be in opt ( #11148 )
* tc should be in opt [pr]
* fix import
2025-07-09 14:12:21 -07:00
nimlgen
b6981404ed
memory: use page shifts in memory manager ( #11149 )
* memory: use page shifts in memory manager
* fix
2025-07-09 22:05:00 +03:00
qazal
5c1d215b41
viz: add Graph stream ( #11144 )
* viz: stack an event for the entire batch
* multi
* whitespace
* work
* multi graph, Graph gets its own row
2025-07-09 20:56:46 +03:00
George Hotz
2893feb9f6
cleanups for kernel.py ( #11143 )
* cleanups for kernel.py
* fixups
2025-07-08 18:10:25 -07:00
George Hotz
359bed74f8
axis type tracking [pr] ( #11137 )
* axis type tracking [pr]
* keep update_info
* keep legacy colors
* update tests to apply_opt
2025-07-08 14:16:25 -07:00
chenyu
dada3f5bf3
skip some new onnx tests ( #11135 )
these fail on master with latest onnx
2025-07-08 16:12:48 -04:00
qazal
3dfc0ff887
move cpu_profile and shared ProfileEvents from device.py to helpers [pr] ( #11126 )
* move cpu_profile and shared ProfileEvents to helpers [pr]
* TestProfiler.test_cpu_profile
* update test_viz.py
* TestProfiler.test_profile_multiops ordering, it's different streams now
2025-07-08 12:14:03 +03:00
George Hotz
f7d4638e05
start LLM app, tons of clean up required. target is 200 line ollama ( #11068 )
* start LLM app, tons of clean up required. target is 200 line ollama
* kind of works
* simpler
* add k/v cache
* with SYM=1, it loops
* no rope cache
* simpler
* more cleanups
* cleanups
* works
* argparse and comments
* from gguf
* generate is a function
* no copy from cpu
* fix max context pass in
* test
* improve test
* ai2_arc
* fix 8B, use less ram
* 136 lines
2025-07-07 17:09:46 -07:00
chenyu
341a686799
Tensor.diagonal ( #11122 )
only implemented main diagonal for 2-D tensors. with diagonal and qr, we can get determinant
2025-07-07 16:21:26 -04:00
Sieds Lykles
584fd6af5a
Fix division by zero and mask bug in add views ( #11088 )
* merge view infinite loop test
* adjust condition in `x//d -> x//(-d)*-1`
* Fix division by zero in add views
* adjust offset end
* fix typo in comment
* add target to test_merge_views_variable
* fix view incorrectly being masked
* ssimplify strides and offset of the new view to canonicalize
* remove print in test
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2025-07-07 10:05:47 -07:00
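Note on the `x//d -> x//(-d)*-1` rewrite mentioned in the commit above: whether moving the sign out of the divisor is safe depends on the rounding mode of the integer division. The sketch below assumes the rewrite targets truncating (C-style) division, which the commit message does not state; `trunc_div` is a hypothetical helper, not a tinygrad function. Under truncation the identity holds for every nonzero divisor, while under Python's floor division it does not.

```python
def trunc_div(x: int, d: int) -> int:
    """Integer division rounding toward zero (C-style); d must be nonzero."""
    q = abs(x) // abs(d)
    return q if (x < 0) == (d < 0) else -q

# Under truncating division, x/d == -(x/(-d)) for all x and nonzero d.
holds_for_trunc = all(
    trunc_div(x, d) == -trunc_div(x, -d)
    for x in range(-20, 21)
    for d in range(1, 6)
)

# Under Python's floor division the same rewrite can change the result.
floor_counterexample = (-7 // 2, -(-7 // -2))

print(holds_for_trunc, floor_counterexample)
```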
Nino Risteski
a1a146a499
adding enable_gqa in SDPA ( #11097 )
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-06 23:25:33 -07:00
chenyu
7468959f4b
Tensor.argsort ( #11112 )
2025-07-06 13:56:35 -04:00
kevvz
b7af9cf849
clean svd tests, set full_matrices false in torch backend ( #11113 )
* clean tests, set full_matrices false
* add more shape asserts
2025-07-06 13:55:49 -04:00
chenyu
ba88ec3ad0
pipe linalg svd to torch ( #11109 )
and found a bug in svd
2025-07-06 08:37:25 -04:00
chenyu
845a4d32bc
Tensor.diag ( #11108 )
also updated Tensor.eye to use it
2025-07-05 23:03:02 -04:00
ttomsa
4905af4ae0
remove invalid int div test ( #11106 )
* rm test
* also rm this
2025-07-05 18:57:55 -04:00
qazal
81781dc12b
viz: renames and spacing changes to tracing ( #11102 )
2025-07-05 18:40:39 +03:00
qazal
7619bf35e7
cleanup: remove disabled TestIndexingOrdering ( #11101 )
* cleanup: remove disabled TestIndexingOrdering
* don't import kernelize internals
2025-07-05 18:14:37 +03:00
qazal
4fcfaa0ef7
viz: switch to TracingKey ( #11100 )
* viz: switch to TracingKey
* tuple
* order is name, keys, fmt
* add test_tracing_key
2025-07-05 17:46:18 +03:00
qazal
3d8569f6d8
hotfix: infinite loop in tracking pattern matcher ( #11094 )
* failing test
* fix that
* given matchers
2025-07-04 19:55:26 +03:00
nimlgen
01f3c4f44d
memory: simpler paddr allocation logic ( #11090 )
* memory: new paddr allocation logic
* am fix
* am refactors
* fix
* mypy
* use it
* am
2025-07-04 17:00:36 +03:00
qazal
988540f401
support capturing cpu_profile on error ( #11078 )
* support capturing cpu_profile on error
* spacing
* pylint complains
2025-07-04 11:53:12 +03:00
chenyu
a2f5a54458
move sparse_categorical_crossentropy to test_ops ( #11083 )
also flattened the tests
2025-07-03 21:40:54 -04:00
chenyu
7c8ccb0267
sparse_categorical_crossentropy cleanup [pr] ( #11082 )
2025-07-03 18:32:52 -04:00
chenyu
678cabc6f2
use argfix in Tensor.stack ( #11077 )
works for multiple Tensor args or single tuple/list of Tensors, but not the mixed
2025-07-03 12:15:11 -04:00
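Note on the argfix behaviour described in the Tensor.stack commit above: the message says the call works with several Tensor arguments or with one tuple/list of Tensors, but not with a mix. A minimal sketch of that kind of argument normalisation follows. `argfix_sketch` and `stack_sketch` are hypothetical stand-ins written for illustration, not tinygrad's actual helpers, and plain strings stand in for tensors.

```python
def argfix_sketch(*args):
    """Normalise variadic arguments (illustrative stand-in only).

    f(a, b)      -> (a, b)
    f([a, b])    -> (a, b)
    f([a, b], c) -> ValueError, the mixed form is rejected
    """
    if args and isinstance(args[0], (tuple, list)):
        if len(args) != 1:
            raise ValueError(f"cannot mix a sequence with extra args: {args!r}")
        return tuple(args[0])
    return args

def stack_sketch(*tensors):
    """Collect the normalised inputs; a real stack would join them on a new axis."""
    return list(argfix_sketch(*tensors))

print(stack_sketch("a", "b"))    # separate arguments
print(stack_sketch(["a", "b"]))  # single list

try:
    stack_sketch(["a", "b"], "c")  # mixed form
    mixed_rejected = False
except ValueError:
    mixed_rejected = True
```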
qazal
b695e8c4d6
viz: remove support for naming with self ( #11076 )
2025-07-03 17:29:14 +03:00