Commit Graph

3999 Commits

Author SHA1 Message Date
nimlgen
ea7f2f779c hcq: p2p nv-amd (#11195)
* hcq: p2p between diff devices

* fix
2025-07-12 18:53:34 +03:00
qazal
d3ec63a5c3 viz: add base class for unittests (#11178) 2025-07-11 13:58:03 +03:00
nimlgen
fb278c6a02 do not recreate Compiled.profile_events in helper_collect_profile (#11171) 2025-07-10 23:55:12 +03:00
qazal
bde80c0cdf record GraphEvents in metal graph (#11145)
* record GraphEvents in metal graph

* add TestProfiler.test_graph, revert old stuff

* move profile capture to MetalGraph

* comment

* don't double record graph command buffers

* wait_check

* explicit delete
2025-07-10 21:32:06 +03:00
chenyu
7db07e5f2c don't narrow range of CAST on bool/unsigned (#11156) 2025-07-09 22:20:09 -04:00
George Hotz
4156baee93 break swizzle into three chunks [pr] (#11153)
* break swizzle into three chunks [pr]

* test failed
2025-07-09 15:30:34 -07:00
George Hotz
53ae153404 tc should be in opt (#11148)
* tc should be in opt [pr]

* fix import
2025-07-09 14:12:21 -07:00
nimlgen
b6981404ed memory: use page shifts in memory manager (#11149)
* memory: use page shifts in memory manager

* fix
2025-07-09 22:05:00 +03:00
qazal
5c1d215b41 viz: add Graph stream (#11144)
* viz: stack an event for the entire batch

* multi

* whitespace

* work

* multi graph, Graph gets its own row
2025-07-09 20:56:46 +03:00
George Hotz
2893feb9f6 cleanups for kernel.py (#11143)
* cleanups for kernel.py

* fixups
2025-07-08 18:10:25 -07:00
George Hotz
359bed74f8 axis type tracking [pr] (#11137)
* axis type tracking [pr]

* keep update_info

* keep legacy colors

* update tests to apply_opt
2025-07-08 14:16:25 -07:00
chenyu
dada3f5bf3 skip some new onnx tests (#11135)
these fails on master with latest onnx
2025-07-08 16:12:48 -04:00
qazal
3dfc0ff887 move cpu_profile and shared ProfileEvents from device.py to helpers [pr] (#11126)
* move cpu_profile and shared ProfileEvents to helpers [pr]

* TestProfiler.test_cpu_profile

* update test_viz.py

* TestProfiler.test_profile_multiops ordering, it's different streams now
2025-07-08 12:14:03 +03:00
George Hotz
f7d4638e05 start LLM app, tons of clean up required. target is 200 line ollama (#11068)
* start LLM app, tons of clean up required. target is 200 line ollama

* kind of works

* simpler

* add k/v cache

* with SYM=1, it loops

* no rope cache

* simpler

* more cleanups

* cleanups

* works

* argparse and comments

* from gguf

* generate is a function

* no copy from cpu

* fix max context pass in

* test

* improve test

* ai2_arc

* fix 8B, use less ram

* 136 lines
2025-07-07 17:09:46 -07:00
chenyu
341a686799 Tensor.diagonal (#11122)
only implemented main diagonal for 2-D tensors. with diagonal and qr, we can get determinant
2025-07-07 16:21:26 -04:00
Sieds Lykles
584fd6af5a Fix division by zero and mask bug in add views (#11088)
* merge view infinite loop test

* adjust condition in `x//d -> x//(-d)*-1`

* Fix division by zero in add views

* adjust offset end

* fix typo in comment

* add target to test_merge_views_variable

* fix view incorrectly being masked

* ssimplify strides and offset of the new view to canonicalize

* remove print in test

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-07-07 10:05:47 -07:00
Nino Risteski
a1a146a499 adding enable_gqa in SDPA (#11097)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-06 23:25:33 -07:00
chenyu
7468959f4b Tensor.argsort (#11112) 2025-07-06 13:56:35 -04:00
kevvz
b7af9cf849 clean svd tests, set full_matrices false in torch backend (#11113)
* clean tests, set full_matrices false

* add more shape asserts
2025-07-06 13:55:49 -04:00
chenyu
ba88ec3ad0 pipe linalg svd to torch (#11109)
and found a bug in svd
2025-07-06 08:37:25 -04:00
chenyu
845a4d32bc Tensor.diag (#11108)
also updated Tensor.eye to use it
2025-07-05 23:03:02 -04:00
ttomsa
4905af4ae0 remove invalid int div test (#11106)
* rm test

* also rm this
2025-07-05 18:57:55 -04:00
qazal
81781dc12b viz: renames and spacing changes to tracing (#11102) 2025-07-05 18:40:39 +03:00
qazal
7619bf35e7 cleanup: remove disabled TestIndexingOrdering (#11101)
* cleanup: remove disabled TestIndexingOrdering

* don't import kernelize internals
2025-07-05 18:14:37 +03:00
qazal
4fcfaa0ef7 viz: switch to TracingKey (#11100)
* viz: switch to TracingKey

* tuple

* order is name, keys, fmt

* add test_tracing_key
2025-07-05 17:46:18 +03:00
qazal
3d8569f6d8 hotfix: infinite loop in tracking pattern matcher (#11094)
* failing test

* fix that

* given matchers
2025-07-04 19:55:26 +03:00
nimlgen
01f3c4f44d memory: simpler paddr allocation logic (#11090)
* memory: new paddr allocation logic

* am fix

* am refactrros

* fix

* mypy

* use it

* am
2025-07-04 17:00:36 +03:00
qazal
988540f401 support capturing cpu_profile on error (#11078)
* support capturing cpu_profile on error

* spacing

* pylint complains
2025-07-04 11:53:12 +03:00
chenyu
a2f5a54458 move sparse_categorical_crossentropy to test_ops (#11083)
also flattened the tests
2025-07-03 21:40:54 -04:00
chenyu
7c8ccb0267 sparse_categorical_crossentropy cleanup [pr] (#11082) 2025-07-03 18:32:52 -04:00
chenyu
678cabc6f2 use argfix in Tensor.stack (#11077)
works for multiple Tensor args or single tuple/list of Tensors, but not the mixed
2025-07-03 12:15:11 -04:00
qazal
b695e8c4d6 viz: remove support for naming with self (#11076) 2025-07-03 17:29:14 +03:00
Sieds Lykles
53985297bd add test, fix rewrite rule and raise error on division by zero (#11073) 2025-07-03 08:25:06 -04:00
George Hotz
d049639221 little setitem test (#11064)
* setitem has one less realize, why broken

* put realize back
2025-07-02 15:10:24 -07:00
George Hotz
3b85534df0 outerworld range test [pr] (#11059)
* outerworld range test [pr]

* bound range

* grad acc test

* more tests

* 5 steps is fine
2025-07-02 14:28:44 -07:00
qazal
ad155f5454 print inputs to get_program in process replay [pr] (#11051)
* print inputs to get_program in process replay [pr]

* colors

* keep dataclass default escapes

* Revert "keep dataclass default escapes"

This reverts commit c6db7e8a7a.

* note for ast_repr

* add that back
2025-07-02 20:20:01 +03:00
qazal
a919b8325b add test_kernel_info (#11054)
* add test_kernel_info

* reorder
2025-07-02 19:48:12 +03:00
kevvz
3b041d188f [bounty] Singular Value Decomposition (#10875)
* inital commit

* add qr + expand svd to full matrix

* add odd number support

* add linalg tests

* qr supports dims of arbitrary size

* add qr tests

* svd supports dims of arbitrary size

* small cleanip

* improvements over svd batch handling

* improve linalg tests

* make u_pad match q shape

* add nonfull matrix tests

* little less verbose nonfull svd test

* added dtypes on svd + return vt instead of vt

* lint

* more lint

* lint + set seed

* small fix

* small lint

* lint

* add int casting to indices and shapes

* remove int from shape tuple in svd

* small cleanup

* add return types

* reuse inverse_permute

* refactoring

* whitespace

* remove regularization term to prevent bad outputs on ill conditioned matrices

* remove seed

* refactor

* lint

* refactor

* spacing

* remove clone

* line reduction

* smarter heuristic for iterations_per_round

* add big test

* lint

* turns out no constant needed?

* wrap tests

* some small matrices need the constant

* remove realize

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-07-02 09:06:03 -07:00
Ahmed Harmouche
e992ed10dc WebGPU on Windows (#10890)
* WebGPU on Windows

* Fix dawn-python install

* New test

* pydeps

* Minor fix

* Only install dawn-python on windows webgpu

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-07-02 08:38:45 -07:00
chenyu
4626e9c172 is_numpy_ndarray helper [pr] (#11050) 2025-07-02 09:12:53 -04:00
qazal
452b22c9b6 fix process replay diff in PYTHON device [pr] (#11052)
* fix process replay diff in PYTHON device [pr]

The PYTHON backend pickles and encodes UOps, the encoded binary can't be
directly diffed in process replay.

* note
2025-07-02 11:06:46 +03:00
geohotstan
8ebf0abaae ONNX external_test_onnx_backend use PYTHON device for model (#10915)
* try

* ruff check --fix

* no skip test

* hmmmmmmm I don't get this D:

* run CI again

* why is PYTHON device faster than CPU?

* run ci again and fix lint

* actually doesn't PYTHON device make sense here?

* see cpu speed again

* Revert "see cpu speed again"

This reverts commit 1e366f2256.

* trigger CI

* pretty good

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-01 12:11:17 -04:00
qazal
8b0871ac31 viz: test for no lockup on infinite loop (#11041)
* viz: add test infinite loop fallback

* assert

* continue til the end

* work

* bring that back

* fallback to nop
2025-07-01 17:44:20 +03:00
b1tg
fcbefde8f5 fix DiskDevice reuse (#11039)
* fix DiskDevice reuse

* fix mypy and DiskDevice.count

* mypy

* add test

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-07-01 10:29:21 -04:00
George Hotz
0597735f28 remove TC=3 not porting this (#11045) 2025-06-30 15:12:49 -07:00
George Hotz
cccfe6b422 hotfix: test_no_inf_loop_bottom_up 2025-06-30 14:21:45 -07:00
George Hotz
b829331219 infinite loop detect in fixed_point_rewrite [pr] (#11038) 2025-06-30 08:57:29 -07:00
George Hotz
cb531dba42 detect infinite loop in graph rewrite [pr] (#11036) 2025-06-30 08:15:13 -07:00
qazal
2ea4737930 viz: fix newlines breaking label colors (#11030)
* viz: fix newlines breaking label colors

* TestViz.test_colored_label

* TestWordWrap
2025-06-30 13:39:44 +03:00
George Hotz
5911b71404 early support for bidirectional pattern matcher (#11027)
* early support for bidirectional pattern matcher

* expose it and add a test

* no bottom up arg there

* disable flaky test
2025-06-29 16:54:07 -07:00