
9954 Commits

Author SHA1 Message Date
George Hotz
1e7b29ff22 improve DEBUG=2 string with TB/s and TFLOPS [pr] 2025-08-27 11:24:09 -07:00
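The DEBUG=2 change above reports throughput in TB/s and TFLOPS. The usual way to get these figures is to divide a kernel's floating-point operation count and its bytes moved by its runtime. The helper `throughput_str` below is hypothetical and only illustrates that arithmetic; it is not tinygrad's actual formatting code.

```python
def throughput_str(flops: int, bytes_moved: int, seconds: float) -> str:
    """Hypothetical helper: format compute and memory throughput of one kernel run."""
    tflops = flops / seconds / 1e12       # 10^12 floating-point operations per second
    tbps = bytes_moved / seconds / 1e12   # 10^12 bytes per second
    return f"{tflops:8.2f} TFLOPS {tbps:8.2f} TB/s"

# Example: 4096x4096x4096 float32 matmul (one multiply and one add per inner-product term)
n = 4096
flops = 2 * n**3
bytes_moved = 3 * n * n * 4   # read A and B, write C, 4 bytes per float32 element
print(throughput_str(flops, bytes_moved, 1e-3))
```

With a 1 ms runtime this matmul works out to roughly 137 TFLOPS but only about 0.2 TB/s, which is why printing both numbers helps tell compute-bound kernels from memory-bound ones.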
Jordan Chalupka
e9789d8a70 Add mxfp4 support (#11873)
* bump ggml url

* map mxfp4 to tensor

* tests
2025-08-27 10:56:56 -07:00
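The OCP Microscaling (MX) specification defines MXFP4 as blocks of 32 FP4 (E2M1) values that share one E8M0 power-of-two scale. The sketch below dequantizes a single block. The function name and the nibble order are assumptions for illustration: element 2i is taken from the low nibble of byte i. That order may not match the ggml block layout this commit works with.

```python
import math

# E2M1 (FP4) code -> value per the OCP MX spec; codes 8..15 are the negatives of codes 0..7.
FP4_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
            -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0]

def decode_mxfp4_block(scale_e8m0: int, packed: bytes) -> list[float]:
    """Dequantize one MXFP4 block: 32 FP4 codes sharing one E8M0 scale.
    Assumed (hypothetical) packing: element 2*i is the low nibble of packed[i],
    element 2*i+1 the high nibble."""
    assert len(packed) == 16, "32 FP4 elements, two per byte"
    # E8M0 is a bare biased exponent: value 2**(e - 127), with 0xFF reserved for NaN.
    scale = math.nan if scale_e8m0 == 0xFF else 2.0 ** (scale_e8m0 - 127)
    out = []
    for byte in packed:
        out.append(FP4_E2M1[byte & 0x0F] * scale)  # low nibble
        out.append(FP4_E2M1[byte >> 4] * scale)    # high nibble
    return out

vals = decode_mxfp4_block(128, bytes([0x72]) + bytes(15))  # scale 2**1, codes 2 and 7
print(vals[:4])
```

Each 32-element block costs 16 bytes of codes plus 1 byte of scale, or about 4.25 bits per value.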
qazal
884eb53e89 tracing: fix types (#11871)
* tracing: fix types

* /profiler isn't a thing

* return list
2025-08-27 15:50:43 +03:00
Sieds Lykles
d39365809a add ctx to z3_renderer arg (#11867)
* add ctx to z3_renderer arg

* update symbolic fuzzer

* rewrite u1,u2,u3

* update fuzz_fast_idiv

* remove imports
2025-08-27 03:38:15 +02:00
George Hotz
24c00a4061 darken hex on viz (#11865)
* darken hex on viz

* more readable
2025-08-26 15:57:50 -07:00
qazal
f38e4af226 viz: add custom zoom filter (#11861) 2025-08-27 01:30:29 +03:00
nimlgen
62df6c39af amd: correct handling of relocations (#11863)
* amd: correct handling of relocations

* ops

* add
2025-08-27 01:26:45 +03:00
George Hotz
d261458ecd add colors to range (#11860) 2025-08-26 14:32:12 -07:00
Sieds Lykles
7dfc7e4abc uops_to_z3 helper (#11859) 2025-08-26 22:58:05 +02:00
chenyu
1bbb578afd named expression for POW and MAX gradient (#11858) 2025-08-26 16:03:03 -04:00
chenyu
7028cb4167 clean up TestBitcastConstFolding (#11856) 2025-08-26 15:26:47 -04:00
George Hotz
d4154e0349 split devectorizing of buf/index (#11855) 2025-08-26 12:05:48 -07:00
George Hotz
b268755d51 small changes from postopt (#11854) 2025-08-26 11:56:16 -07:00
Sieds Lykles
a3aeef45cc associative variation of where branch-merging (#11851)
* add rule and test

* change comment
2025-08-26 19:27:05 +02:00
chenyu
aabe7756be fix type in fold_bitcast [pr] (#11853) 2025-08-26 13:22:30 -04:00
Jordan Chalupka
4785cd959a [TYPED=1] cvar should allow dtype as a tuple (#11770)
* cvar dtype:DType|tuple[DType, ...]|None=None

* fmt

* add a test

* list typeguard as a dep for CI

* extra step to install mypy

* fix venv

* ci fixes

* mv typeguard to testing install group

* simpler TYPED=1 test

* add typeguard to lint group
2025-08-26 12:49:51 -04:00
qazal
b111076301 viz: fixup click on overlay rect (#11850) 2025-08-26 19:25:42 +03:00
b1tg
1dd613cb89 test float_to_bf16 round-to-even behavior (#11849)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-26 12:16:10 -04:00
b1tg
409399c609 fix nan in float_to_bf16 (#11843)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-26 11:42:25 -04:00
qazal
43d5d66d34 viz: add UOp ports to edges (#11847)
* viz: add UOp ports to edges

* one edge label

* g.tag styling

* replace with NodeList
2025-08-26 18:31:52 +03:00
chenyu
f28f613f85 improved float_to_bf16 (#11848)
round instead of truncate
2025-08-26 11:14:06 -04:00
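The float_to_bf16 commits above cover three changes: rounding instead of truncating, a NaN fix, and a round-to-even test. They all deal with a standard conversion. Below is a sketch of the common bit-level round-to-nearest-even approach from float32 to bfloat16. It is a generic illustration, not tinygrad's float_to_bf16, and the function names are hypothetical. The NaN check is needed because adding the rounding bias to a NaN whose payload sits only in the low 16 bits can carry into the exponent or sign bit, turning the NaN into an infinity or a zero.

```python
import math
import struct

def f32_bits(x: float) -> int:
    """Raw IEEE-754 float32 bit pattern of x (x must lie within float32 range)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def float_to_bf16_bits(x: float) -> int:
    """Round float32 x to bfloat16 (round to nearest, ties to even); returns the 16-bit pattern."""
    if math.isnan(x):
        return 0x7FC0                      # quiet NaN; skip rounding so NaN cannot become inf
    bits = f32_bits(x)
    lsb = (bits >> 16) & 1                 # lowest bit that survives in bfloat16
    # Adding 0x7FFF rounds up anything above halfway; the extra lsb breaks ties toward even.
    return ((bits + 0x7FFF + lsb) >> 16) & 0xFFFF

def bf16_bits_to_float(b: int) -> float:
    """Widen a bfloat16 bit pattern back to a Python float (exact)."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]

# 1 + 2**-8 + 2**-10 lies above the halfway point between 0x3F80 and 0x3F81:
# truncation keeps 0x3F80, rounding gives 0x3F81.
print(hex(float_to_bf16_bits(1.0 + 2**-8 + 2**-10)))
```

At an exact tie such as 1 + 2**-8 the result rounds down to the even pattern 0x3F80 (1.0), while 1 + 3 * 2**-8 rounds up to the even pattern 0x3F82 (1.015625). Truncation would give 0x3F81 for the latter.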
nimlgen
afe14ccbfa amd: aql default when several xccs (#11832) 2025-08-26 15:16:36 +03:00
qazal
3674c0754e viz: small uop click changes (#11846)
* also highlight self

* can always unselect by clicking outside

* less layout
2025-08-26 14:56:13 +03:00
qazal
f2a3c27372 viz: g.edges() once (#11845) 2025-08-26 13:29:59 +03:00
qazal
b0df3e62a8 viz: light up srcs and paths on UOp click (#11844)
* viz: light up srcs and paths on UOp click

* safari doesn't have context-stroke

* safari also has a bug

* safari acceptance
2025-08-26 09:03:09 +03:00
qazal
6236749867 viz: move rect styles to classes (#11842)
* viz: move rect styles to classes

* add rect
2025-08-26 07:55:34 +03:00
qazal
81ffa07439 viz: pass through nodes without a link (#11841) 2025-08-26 07:00:43 +03:00
Sieds Lykles
265d287615 add decomp for !x&!y -> !(x|y) (#11836) 2025-08-26 05:21:06 +02:00
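The decomposition in #11836 is De Morgan's law. It replaces two negations and an AND with one OR and one negation, and it holds both for booleans and bitwise. Below is a minimal sketch of such a rewrite on a hypothetical tuple-based expression tree, with an exhaustive truth-table check. It does not use tinygrad's UOp or PatternMatcher API.

```python
from itertools import product

# Hypothetical IR: a leaf is a variable name; internal nodes are ("not", a), ("and", a, b), ("or", a, b).
def evaluate(expr, env: dict[str, bool]) -> bool:
    if isinstance(expr, str):
        return env[expr]
    op, *args = expr
    vals = [evaluate(a, env) for a in args]
    if op == "not":
        return not vals[0]
    if op == "and":
        return vals[0] and vals[1]
    if op == "or":
        return vals[0] or vals[1]
    raise ValueError(f"unknown op {op!r}")

def is_not(expr) -> bool:
    return not isinstance(expr, str) and expr[0] == "not"

def rewrite_and_of_nots(expr):
    """!x & !y -> !(x | y); any other expression is returned unchanged."""
    if not isinstance(expr, str) and expr[0] == "and" and is_not(expr[1]) and is_not(expr[2]):
        return ("not", ("or", expr[1][1], expr[2][1]))
    return expr

before = ("and", ("not", "x"), ("not", "y"))
after = rewrite_and_of_nots(before)
# The rewritten tree agrees with the original for every assignment of x and y.
assert all(evaluate(before, {"x": x, "y": y}) == evaluate(after, {"x": x, "y": y})
           for x, y in product([False, True], repeat=2))
print(after)
```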
chenyu
337e979a59 call dtypes.as_const in Tensor(list) (#11840) 2025-08-25 22:08:26 -04:00
George Hotz
215818379b new (post) group for reduce (#11837)
* new (post) group for reduce

* fixes

* leave if

* fix locals

* size

* no vectorized buf

* image fixes

* don't track that

* fix ptx

* name buffer with reduce range

* remove unused in lowerer

* yay DEFINE_REG refactor
2025-08-25 18:03:00 -07:00
chenyu
ac3449b0c8 truncate_fp16 cleanup (#11838)
native `@` is default
2025-08-25 19:03:41 -04:00
qazal
e146418f65 hotfix: profiler content-type is application/octet-stream (#11831) 2025-08-25 15:56:42 +03:00
qazal
a1f6823060 viz: memory layout in client side (#11830)
* viz: memory layout in client side

* update test_viz
2025-08-25 14:49:33 +03:00
George Hotz
a6dbb09058 changes for postrange (#11828) 2025-08-24 17:37:07 -07:00
George Hotz
27701ef823 add locals support to rangeify (#11826) 2025-08-24 14:03:12 -07:00
Sieds Lykles
a286a1a6f7 Fast idiv try removing factors of two before cast (#11824)
* try removing factors of two

* dont return if None

* add test
2025-08-24 20:04:25 +02:00
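Fast integer division replaces x // d with a multiply and a shift, ((x * m) >> s). The product x * m can overflow the value's dtype, and then a wider dtype is needed. One reading of the title of #11824 (an interpretation, not confirmed by the log) is: if d = 2**k * d', shift x right by k first and divide by the smaller d'. The sketch below shows why that can help. Because x >> k has a smaller upper bound, the magic product can stay within int32 range. The helpers magic_for and plan_fast_idiv are hypothetical. They use a simple sufficient condition for exactness rather than tinygrad's actual search, and they only handle nonnegative x.

```python
from typing import Optional

def magic_for(d: int, max_x: int) -> tuple[int, int]:
    """Smallest shift s, with m = ceil(2**s / d), such that (x * m) >> s == x // d
    for every 0 <= x <= max_x. Uses the sufficient condition max_x * (m*d - 2**s) < 2**s."""
    assert d >= 1 and max_x >= 0
    s = 0
    while True:
        m = -(-(1 << s) // d)              # ceil(2**s / d)
        if max_x * (m * d - (1 << s)) < (1 << s):
            return m, s
        s += 1

def plan_fast_idiv(d: int, max_x: int, max_product: int = 2**31 - 1) -> Optional[tuple[int, int, int]]:
    """Hypothetical planner returning (k, m, s) with ((x >> k) * m) >> s == x // d,
    such that the intermediate product never exceeds max_product (int32 by default).
    Returns None when neither plan fits, i.e. the caller would have to widen (cast) instead."""
    m, s = magic_for(d, max_x)
    if max_x * m <= max_product:
        return 0, m, s                     # direct multiply-shift already fits
    k = (d & -d).bit_length() - 1          # number of trailing zero bits (factors of two) in d
    if k > 0:
        m2, s2 = magic_for(d >> k, max_x >> k)
        if (max_x >> k) * m2 <= max_product:
            return k, m2, s2               # shift out 2**k first, then divide by the odd part
    return None

print(magic_for(24, 65535), plan_fast_idiv(24, 65535))
```

For d = 24 and 0 <= x <= 65535, the direct multiplier 43691 (shift 20) produces products up to about 2.86e9, which is beyond int32. Shifting out 2**3 first leaves x >> 3 <= 8191, with multiplier 2731 (shift 13) and products below 2.3e7.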
George Hotz
a03b930339 hotfix: green v2 in docs 2025-08-24 10:25:14 -07:00
George Hotz
6540bb32a6 move into codegen late [pr] (#11823) 2025-08-24 10:23:25 -07:00
nimlgen
bba088ef11 amd aql queue (#11708)
* amd aql queue

* xcc

* fix

* aql better

* llvm

* no for aql

* wrap

* is_sql

* am support

* complete

* fix

* mypy

* minor
2025-08-24 19:53:00 +03:00
George Hotz
1fa09d9ede BLOCK_REORDER is context var, heuristic cleanups [pr] (#11819)
* BLOCK_REORDER is context var, heuristic cleanups [pr]

* split get opt and do opt

* oops, should be on
2025-08-24 09:41:34 -07:00
qazal
8b18cc2a94 viz memory layout cleanup (#11820)
* rename to dtype_size

* cleaner memory shape creator
2025-08-24 19:37:31 +03:00
Sieds Lykles
dd69114573 Revert "Better div nesting (#11811)" (#11818)
This reverts commit 952f729b07.
2025-08-24 18:11:24 +02:00
nimlgen
e19f901330 amd: rptr/wptr in create_queue (#11817) 2025-08-24 18:03:45 +03:00
nimlgen
d71444857e amd: apply relocs for kernel_code_entry_byte_offset for AMD_LLVM (#11816)
* amd: apply relocs for kernel_code_entry_byte_offset for AMD_LLVM

* fix
2025-08-24 17:48:40 +03:00
George Hotz
44bc7dc73d remove KernelInfo from GROUP_REDUCE (#11814) 2025-08-23 19:55:41 -07:00
George Hotz
229adfb7c3 Revert "remove KernelInfo from gpudims (#11809)" (#11813)
This reverts commit 846753f343.
2025-08-23 19:37:10 -07:00
Sieds Lykles
952f729b07 Better div nesting (#11811)
* remove check

* use fold_divmod_congruence instead of simplify

* adjust tests

* shorten line
2025-08-24 04:17:40 +02:00
Sieds Lykles
e652062f92 tweak divmod_folding condition (#11810) 2025-08-24 02:59:02 +02:00
George Hotz
846753f343 remove KernelInfo from gpudims (#11809)
* remove KernelInfo from gpudims

* that's good in there
2025-08-23 16:32:45 -07:00
Sieds Lykles
07d4ed7e4c one more symbolic add variation (#11807) 2025-08-24 01:15:04 +02:00