George Hotz
1e7b29ff22
improve DEBUG=2 string with TB/s and TFLOPS [pr]
2025-08-27 11:24:09 -07:00
Jordan Chalupka
e9789d8a70
Add mxfp4 support ( #11873 )
...
* bump ggml url
* map mxfp4 to tensor
* tests
2025-08-27 10:56:56 -07:00
qazal
884eb53e89
tracing: fix types ( #11871 )
...
* tracing: fix types
* /profiler isn't a thing
* return list
2025-08-27 15:50:43 +03:00
Sieds Lykles
d39365809a
add ctx to z3_renderer arg ( #11867 )
...
* add ctx to z3_renderer arg
* update symbolic fuzzer
* rewrite u1,u2,u3
* update fuzz_fast_idiv
* remove imports
2025-08-27 03:38:15 +02:00
George Hotz
24c00a4061
darken hex on viz ( #11865 )
...
* darken hex on viz
* more readable
2025-08-26 15:57:50 -07:00
qazal
f38e4af226
viz: add custom zoom filter ( #11861 )
2025-08-27 01:30:29 +03:00
nimlgen
62df6c39af
amd: correct handling of relocations ( #11863 )
...
* amd: correct handling of relocations
* ops
* add
2025-08-27 01:26:45 +03:00
George Hotz
d261458ecd
add colors to range ( #11860 )
2025-08-26 14:32:12 -07:00
Sieds Lykles
7dfc7e4abc
uops_to_z3 helper ( #11859 )
2025-08-26 22:58:05 +02:00
chenyu
1bbb578afd
named expression for POW and MAX gradient ( #11858 )
2025-08-26 16:03:03 -04:00
chenyu
7028cb4167
clean up TestBitcastConstFolding ( #11856 )
2025-08-26 15:26:47 -04:00
George Hotz
d4154e0349
split devectorizing of buf/index ( #11855 )
2025-08-26 12:05:48 -07:00
George Hotz
b268755d51
small changes from postopt ( #11854 )
2025-08-26 11:56:16 -07:00
Sieds Lykles
a3aeef45cc
associative variation of where branch-merging ( #11851 )
...
* add rule and test
* change comment
2025-08-26 19:27:05 +02:00
chenyu
aabe7756be
fix type in fold_bitcast [pr] ( #11853 )
2025-08-26 13:22:30 -04:00
Jordan Chalupka
4785cd959a
[TYPED=1] cvar should allow dtype as a tuple ( #11770 )
...
* cvar dtype:DType|tuple[DType, ...]|None=None
* fmt
* add a test
* list typeguard as a dep for CI
* extra step to install mypy
* fix venv
* ci fixes
* mv typeguard to testing install group
* simpler TYPED=1 test
* add typeguard to lint group
2025-08-26 12:49:51 -04:00
qazal
b111076301
viz: fixup click on overlay rect ( #11850 )
2025-08-26 19:25:42 +03:00
b1tg
1dd613cb89
test float_to_bf16 round-to-even behavior ( #11849 )
...
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-26 12:16:10 -04:00
b1tg
409399c609
fix nan in float_to_bf16 ( #11843 )
...
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-26 11:42:25 -04:00
qazal
43d5d66d34
viz: add UOp ports to edges ( #11847 )
...
* viz: add UOp ports to edges
* one edge label
* g.tag styling
* replace with NodeList
2025-08-26 18:31:52 +03:00
chenyu
f28f613f85
improved float_to_bf16 ( #11848 )
...
round instead of truncate
2025-08-26 11:14:06 -04:00
nimlgen
afe14ccbfa
amd: aql default when several xccs ( #11832 )
2025-08-26 15:16:36 +03:00
qazal
3674c0754e
viz: small uop click changes ( #11846 )
...
* also highlight self
* can always unselect by clicking outside
* less layout
2025-08-26 14:56:13 +03:00
qazal
f2a3c27372
viz: g.edges() once ( #11845 )
2025-08-26 13:29:59 +03:00
qazal
b0df3e62a8
viz: light up srcs and paths on UOp click ( #11844 )
...
* viz: light up srcs and paths on UOp click
* safari doesn't have context-stroke
* safari also has a bug
* safari acceptance
2025-08-26 09:03:09 +03:00
qazal
6236749867
viz: move rect styles to classes ( #11842 )
...
* viz: move rect styles to classes
* add rect
2025-08-26 07:55:34 +03:00
qazal
81ffa07439
viz: pass through nodes without a link ( #11841 )
2025-08-26 07:00:43 +03:00
Sieds Lykles
265d287615
add decomp for !x&!y -> !(x|y) ( #11836 )
2025-08-26 05:21:06 +02:00
chenyu
337e979a59
call dtypes.as_const in Tensor(list) ( #11840 )
2025-08-25 22:08:26 -04:00
George Hotz
215818379b
new (post) group for reduce ( #11837 )
...
* new (post) group for reduce
* fixes
* leave if
* fix locals
* size
* no vectorized buf
* image fixes
* don't track that
* fix ptx
* name buffer with reduce range
* remove unused in lowerer
* yay DEFINE_REG refactor
2025-08-25 18:03:00 -07:00
chenyu
ac3449b0c8
truncate_fp16 cleanup ( #11838 )
...
native `@` is default
2025-08-25 19:03:41 -04:00
qazal
e146418f65
hotfix: profiler content-type is application/octet-stream ( #11831 )
2025-08-25 15:56:42 +03:00
qazal
a1f6823060
viz: memory layout in client side ( #11830 )
...
* viz: memory layout in client side
* update test_viz
2025-08-25 14:49:33 +03:00
George Hotz
a6dbb09058
changes for postrange ( #11828 )
2025-08-24 17:37:07 -07:00
George Hotz
27701ef823
add locals support to rangeify ( #11826 )
2025-08-24 14:03:12 -07:00
Sieds Lykles
a286a1a6f7
Fast idiv try removing factors of two before cast ( #11824 )
...
* try removing factors of two
* don't return if None
* add test
2025-08-24 20:04:25 +02:00
George Hotz
a03b930339
hotfix: green v2 in docs
2025-08-24 10:25:14 -07:00
George Hotz
6540bb32a6
move into codegen late [pr] ( #11823 )
2025-08-24 10:23:25 -07:00
nimlgen
bba088ef11
amd aql queue ( #11708 )
...
* amd aql queue
* xcc
* fix
* aql better
* llvm
* no for aql
* wrap
* is_aql
* am support
* complete
* fix
* mypy
* minor
2025-08-24 19:53:00 +03:00
George Hotz
1fa09d9ede
BLOCK_REORDER is context var, heuristic cleanups [pr] ( #11819 )
...
* BLOCK_REORDER is context var, heuristic cleanups [pr]
* split get opt and do opt
* oops, should be on
2025-08-24 09:41:34 -07:00
qazal
8b18cc2a94
viz memory layout cleanup ( #11820 )
...
* rename to dtype_size
* cleaner memory shape creator
2025-08-24 19:37:31 +03:00
Sieds Lykles
dd69114573
Revert "Better div nesting ( #11811 )" ( #11818 )
...
This reverts commit 952f729b07.
2025-08-24 18:11:24 +02:00
nimlgen
e19f901330
amd: rptr/wptr in create_queue ( #11817 )
2025-08-24 18:03:45 +03:00
nimlgen
d71444857e
amd: apply relocs for kernel_code_entry_byte_offset for AMD_LLVM ( #11816 )
...
* amd: apply relocs for kernel_code_entry_byte_offset for AMD_LLVM
* fix
2025-08-24 17:48:40 +03:00
George Hotz
44bc7dc73d
remove KernelInfo from GROUP_REDUCE ( #11814 )
2025-08-23 19:55:41 -07:00
George Hotz
229adfb7c3
Revert "remove KernelInfo from gpudims ( #11809 )" ( #11813 )
...
This reverts commit 846753f343.
2025-08-23 19:37:10 -07:00
Sieds Lykles
952f729b07
Better div nesting ( #11811 )
...
* remove check
* use fold_divmod_congruence instead of simplify
* adjust tests
* shorten line
2025-08-24 04:17:40 +02:00
Sieds Lykles
e652062f92
tweak divmod_folding condition ( #11810 )
2025-08-24 02:59:02 +02:00
George Hotz
846753f343
remove KernelInfo from gpudims ( #11809 )
...
* remove KernelInfo from gpudims
* that's good in there
2025-08-23 16:32:45 -07:00
Sieds Lykles
07d4ed7e4c
one more symbolic add variation ( #11807 )
2025-08-24 01:15:04 +02:00