10490 Commits

Author SHA1 Message Date
qazal
9aff00a6ea switch viz command line args to pathlib (#11922) 2025-08-30 18:13:47 +03:00
qazal
c86ee5bfaf viz: canonicalize device name colors (#11921) 2025-08-30 18:12:30 +03:00
nimlgen
a4f05ebd1a ci: rebuild gpuocelot with boost libs (#11920) 2025-08-30 17:24:19 +03:00
qazal
bf0d055b39 viz: color by name (#11919) 2025-08-30 16:04:58 +03:00
Sieds Lykles
0bc34c000f simplify range mod its own upper bound (#11917)
* add rules

* add tests
2025-08-30 08:37:35 +02:00
chenyu
561318fea7 Tensor.cos in test_stype_alu (#11916)
* Tensor.cos in test_stype_alu

* need this fix anyway
2025-08-29 20:26:36 -04:00
NoahKusaba
0838021753 remove np from beautiful_cifar (#10988)
* remove np from beautiful_cifar

* remove np from cifar

* rename variable and rename tensor.arrange to just tensor.randperm

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-29 19:34:16 -04:00
nimlgen
cf9d8c8142 ci: pin boost for macos runners (#11910) 2025-08-30 01:38:06 +03:00
nimlgen
c6e342cdac mockgpu: no hang if gpuocelot failed (#11915) 2025-08-30 00:44:49 +03:00
chenyu
26d03a86a1 test_symbolic_ops.py cleanup (#11895) 2025-08-29 17:11:59 -04:00
b1tg
b2cc06218a python bfloat16 (#11912)
* python bf16

* _to_torch_storage_type

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-29 15:18:02 -04:00
George Hotz
afad7d0cd1 remove dtype from range, it will be dtypes.index soon [pr] (#11914)
* remove dtype from range, it will be dtypes.index soon [pr]

* a few more
2025-08-29 09:52:07 -07:00
qazal
30e72d5820 multi device and copy tracing for NULL device (#11913)
* add device name to NULL programs

* trace transfers
2025-08-29 15:31:00 +03:00
qazal
d8e1e4dc61 tracing: show NULL programs (#11911) 2025-08-29 14:09:33 +03:00
nimlgen
75678b2cbe amd: retire pm4 xcc sync (#11835)
* amd: aql default when several xccs

* amd: retire pm4 xcc sync

* remove more

* more

* more
2025-08-29 09:56:27 +03:00
George Hotz
394c2d1db1 update Kernel API in tests + move optimize_local_size (#11907) 2025-08-28 15:12:47 -07:00
nimlgen
fa695ac1ce ci: mac gpuocelot (#11906)
* gm

* fix?

* ops

* imp

* xx

* add file
2025-08-28 23:29:43 +03:00
George Hotz
b9b438c516 small updates from postopt (#11903)
* tests from postopt

* modernize

* skip lin tests

* that's fixed?

* skip, not failure
2025-08-28 12:34:52 -07:00
nimlgen
bb55a3001f nv: flush reset message (#11897) 2025-08-28 22:17:20 +03:00
nimlgen
e8289c75b1 ci: do not reinstall existing pkgs in macos (#11900) 2025-08-28 21:20:15 +03:00
chenyu
134cf56904 update cache name for gpuocelot (#11896) 2025-08-28 13:11:10 -04:00
Ben Waldron
ea1be2e4cd [bounty] Remove using reshape to register symbolic shape (#11771)
* Modify tests and start work towards removing symbolic reshape

* Refactor symbolic reshape

* fix small error

* much cleaner + fix more tests

* Can remove this now

* Update test_symbolic_ops and test_tiny

* Couple more tests

* Unused import

* More tests and add EXPAND to Tensor.empty

* Fix test beam search

* all int

* Fix rangeify by adding shrink

* Remove OOB check and so fix test_symbolic_jit

* test_symbolic_jit doesn't need OOB Context anymore either

* Should remove that test now

* Cleanups part 1

* fix linters

* Final cleanups

* Don't reassign inside for loop

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-28 12:30:49 -04:00
qazal
53853ae49b viz: switch to Path2D (#11892) 2025-08-28 18:58:16 +03:00
nimlgen
874c1db4af am: init support for aql (#11888) 2025-08-28 18:41:46 +03:00
Ben Waldron
17ecaf4682 Add test_variable_empty (#11889)
* Add test_variable_empty

* Move test and add TODO

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-28 11:38:27 -04:00
Nino Risteski
54be477152 rope cache optim for jit prune in llm.py (#11678)
* rope cache optim for jit prune

* rope test

* tests in test attention

* Revert "rope test"

This reverts commit 69ede543d0.

* lint
2025-08-28 08:31:29 -07:00
quortus
5f8fe9a331 Replace ASSIGN with STORE in test_linearizer (#11821) 2025-08-28 07:33:20 -07:00
geohotstan
4e8370309c Support onnx If OP (#11648)
* start

* tiny clean up

* whoops, didn't mean to accidentally fix this

* fix .to(device), kinda hacky and this fix makes it slower?

* merge properly

* FINALLY figured out slowness, also hack pylint for now

* add DEBUGONNX print for subgraph

* oops

* WOOOOOOOO SHAPE CACHE 50% SPEED INCREASE

* small fix, but maybe all deterministic Tensor creation in fp should be cached

* cache condition

* sliiiightly cleaner

* better abstraction?

* remove sam from model_benchmark

* remove shape cache speed up for now

* less lines

* isinstance fix

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-28 10:17:35 -04:00
George Hotz
6d6f0dada7 support for tuple ranges (#11890)
* support for tuple ranges

* breaks it
2025-08-28 07:02:31 -07:00
nimlgen
60dd9a162c memory: tiny tlsf cleanup (#11887) 2025-08-28 14:07:18 +03:00
chenyu
beb5982165 FUSE_ATTENTION (#11884) 2025-08-27 19:59:17 -04:00
George Hotz
cb5295168d postrange boilerplate work (#11881) 2025-08-27 15:22:59 -07:00
George Hotz
fd579433bc pre expander shouldn't go in gpudims (#11880) 2025-08-27 14:52:24 -07:00
nimlgen
44816218b5 memplan: fix large buffers planning (#11878)
* memplan: fix large buffers planning

* fix

* fix dsp
2025-08-27 23:54:27 +03:00
nimlgen
4006366752 Revert "memplan: fix large buffers planning (#11876)" (#11877)
This reverts commit 7f90497efc.
2025-08-27 22:36:14 +03:00
nimlgen
7f90497efc memplan: fix large buffers planning (#11876)
* memplan: fix large buffers planning

* fix
2025-08-27 22:04:15 +03:00
George Hotz
e4afdf9ea1 improve DEBUG=2 string with TB/s and TFLOPS [pr] (#11875) 2025-08-27 11:42:41 -07:00
Jordan Chalupka
e9789d8a70 Add mxfp4 support (#11873)
* bump ggml url

* map mxfp4 to tensor

* tests
2025-08-27 10:56:56 -07:00
qazal
884eb53e89 tracing: fix types (#11871)
* tracing: fix types

* /profiler isn't a thing

* return list
2025-08-27 15:50:43 +03:00
Sieds Lykles
d39365809a add ctx to z3_renderer arg (#11867)
* add ctx to z3_renderer arg

* update symbolic fuzzer

* rewrite u1,u2,u3

* update fuzz_fast_idiv

* remove imports
2025-08-27 03:38:15 +02:00
George Hotz
24c00a4061 darken hex on viz (#11865)
* darken hex on viz

* more readable
2025-08-26 15:57:50 -07:00
qazal
f38e4af226 viz: add custom zoom filter (#11861) 2025-08-27 01:30:29 +03:00
nimlgen
62df6c39af amd: correct handling of relocations (#11863)
* amd: correct handling of relocations

* ops

* add
2025-08-27 01:26:45 +03:00
George Hotz
d261458ecd add colors to range (#11860) 2025-08-26 14:32:12 -07:00
Sieds Lykles
7dfc7e4abc uops_to_z3 helper (#11859) 2025-08-26 22:58:05 +02:00
chenyu
1bbb578afd named expression for POW and MAX gradient (#11858) 2025-08-26 16:03:03 -04:00
chenyu
7028cb4167 clean up TestBitcastConstFolding (#11856) 2025-08-26 15:26:47 -04:00
George Hotz
d4154e0349 split devectorizing of buf/index (#11855) 2025-08-26 12:05:48 -07:00
George Hotz
b268755d51 small changes from postopt (#11854) 2025-08-26 11:56:16 -07:00
Sieds Lykles
a3aeef45cc associative variation of where branch-merging (#11851)
* add rule and test

* change comment
2025-08-26 19:27:05 +02:00