George Hotz
82aa943cd4
fix that test
2025-11-19 08:48:49 -08:00
George Hotz
e16782cf9e
Merge branch 'master' into python_speed
2025-11-19 08:41:40 -08:00
George Hotz
1c47ee729e
fix names of rewrite rules
2025-11-19 08:41:34 -08:00
George Hotz
a8f9e69bd9
work on python speed
2025-11-19 08:34:15 -08:00
George Hotz
385618d45b
skip process replay by default (#13353)
2025-11-19 08:25:34 -08:00
George Hotz
ffff194e93
skip process replay by default
2025-11-19 08:14:44 -08:00
chenyu
fba4535289
remove hacks for threefry long removal when padded [pr] (#13352)
2025-11-19 11:11:39 -05:00
George Hotz
225eb1500f
generic range changes that work for str + int (#13350)
* generic range changes that work for str + int
* opt range counts up
2025-11-19 08:07:49 -08:00
chenyu
1a72ac16a6
move where same false branch rule to symbolic_simple [pr] (#13349)
2025-11-19 10:15:38 -05:00
chenyu
79055ddb8b
clean propagate_invalid more [pr] (#13347)
2025-11-19 09:47:50 -05:00
nimlgen
0c9fbf87e1
nvioctl: classes (#13346)
2025-11-19 16:14:15 +03:00
qazal
f2221130bb
viz: pick shape by event type (#13279)
2025-11-19 20:15:52 +08:00
wozeparrot
be72b78dcb
tk: small fixes (#13345)
* fix: handle case where final uop isn't a tk wrapped one
* clean: remove after from mma
2025-11-19 00:58:50 -08:00
wozeparrot
e4fbde5b3b
fix: extra options need to go on second step too (#13344)
2025-11-19 00:58:09 -08:00
George Hotz
1a332afa76
spec test on 3.14 (#12957)
2025-11-19 00:43:04 -08:00
Christopher Milan
a438c277de
autogen tests for 3.14 (#13343)
2025-11-18 22:16:59 -05:00
chenyu
722e7a16ed
remove rule in propagate_invalid [pr] (#13342)
2025-11-18 21:38:33 -05:00
George Hotz
1afa3c0877
vmap on full model (#13340)
* vmap on full model
* vmap gemm
* reduce sums on end
* outer reduce
* only if there's ranges
* put those rules in symbolic
* ranges
* do opt later
* add zero range
2025-11-18 16:06:06 -08:00
chenyu
46cb65e692
delete rules from sym [pr] (#13339)
2025-11-18 14:57:35 -05:00
George Hotz
9c59b3d19e
vmap grad needs reduce_backward (#13336)
* vmap grad needs reduce_backward
* fuse and outer
2025-11-18 10:08:30 -08:00
qazal
a647c9eca6
sqtt ui minor fixes (#13335)
* roc.py cleanups
* direct append
* viz index cleanup
* simd row details
2025-11-19 01:27:56 +08:00
George Hotz
06e39a88a9
outer vmap works (#13334)
* outer vmap works
* fuse works
* vmap outer works
* outer ranges work
* grad work
* should be good to merge
2025-11-18 09:27:48 -08:00
chenyu
805de27e07
no load substitute in uop_given_valid [pr] (#13333)
2025-11-18 11:47:58 -05:00
chenyu
05294bc648
fix some mypy cast [pr] (#13331)
2025-11-18 09:23:42 -05:00
qazal
5623e765c8
VIZ=2 enables SQTT (#13330)
2025-11-18 22:20:31 +08:00
nimlgen
331f70aa75
roc: ctrlc (#13255)
* roc: ctrl-c works
* rm
2025-11-18 19:29:28 +08:00
George Hotz
583560ab72
this is the right way to write vmap (#13328)
2025-11-17 20:20:52 -08:00
Christopher Milan
8e8e53c886
int8_t is c_byte (#13326)
2025-11-17 21:25:40 -05:00
George Hotz
e4fead8a86
write scan in uops (#13321)
* write scan in uops
* ops range
* no need for variable
* meh, later
* shorter
2025-11-17 16:58:08 -08:00
wozeparrot
8894a5409d
feat: hipcc compiler (#13319)
2025-11-17 15:13:32 -08:00
George Hotz
6d3385c284
print special ops in postrange (#13318)
* print special ops in postrange
* fix on OSX
2025-11-17 14:43:23 -08:00
chenyu
b637093be9
remove a few rules in pm_lower_index_dtype [pr] (#13317)
2025-11-17 17:04:56 -05:00
George Hotz
98e9e73286
hotfix: amd_uop_matmul getenvs
2025-11-17 13:26:01 -08:00
qazal
e7e1935225
cleanup sqtt/test_timing (#13315)
2025-11-18 04:28:05 +08:00
wozeparrot
33773fda87
tk initial mi350 (#13289)
2025-11-17 11:46:32 -08:00
nimlgen
e2cee64050
Revert "hcq: add tag to exec events (#13311)" (#13314)
This reverts commit f63ded5817.
2025-11-17 22:15:31 +03:00
chenyu
646372490c
move tiktoken import in llama3 (#13316)
only Tokenizer requires that
2025-11-17 14:09:37 -05:00
qazal
a37f221e44
viz: visualize waves in the timeline (#13292)
* viz: visualize waves in the timeline
* timeline in format
* per step
* rm that
2025-11-17 22:04:21 +08:00
nimlgen
f63ded5817
hcq: add tag to exec events (#13311)
* hcq: add tag to exec events
* f
* fix
* fix
2025-11-17 16:59:30 +03:00
qazal
50a443f558
viz: add shader engine to wave exec payload (#13310)
* viz: show sqtt shader engine
* order it from smallest unit
* easier to config
2025-11-17 19:11:34 +08:00
nimlgen
9bb17c53ea
amd: timer fix (#13267)
2025-11-17 13:59:03 +03:00
George Hotz
55be95da15
cleanup sqtt raw parser (#13309)
* cleanup sqtt raw parser
* better names (don't merge yet)
* clean up amd
* a few more names
* one more filter
2025-11-16 13:11:51 -08:00
George Hotz
cabd4add48
more work parsing SQTT, separate VIZ/PROFILE (#13308)
* more work parsing SQTT
* more minimal runner
* sep VIZ/PROFILE
* parse print new
* improve parser
* more filter
* that
* split them
* lil cleanup
* skip flaky test
* AQL in mmapeak
2025-11-16 10:40:39 -08:00
qazal
13efdf8c31
test s_nop stall (#13307)
2025-11-17 00:59:39 +08:00
George Hotz
295600dc5a
saturday coffee shop work parsing the att format (#13295)
* saturday coffee shop work parsing the att format
* add examples
* parser
* classes of packets
* fully vibe coded parser
* vibing
* empty
* some vibe names
* vibes
* most of these are wrong
* more vibes
* better names
* parsing
* parse
* cleanup parser
* touchups
2025-11-16 08:25:51 -08:00
Christopher Milan
a9ed241172
properly suppress NIRRenderer.__del__ error (#13299)
2025-11-16 18:58:04 +03:00
qazal
c70b06ec19
sqtt test_timing work (#13304)
* sqtt test_timing cleanups
* only the instruction
* v_mfma_f32_16x16x32_f16 16 cycles, only after second one though
2025-11-16 23:49:24 +08:00
chenyu
8f0e747b3a
Tensor._tri with arange (#13297)
2025-11-16 10:21:16 -05:00
chenyu
6372c95094
disable benchmark MobileNetV2 on DSP (#13305)
failed on tinyc2
2025-11-16 09:42:52 -05:00
Christopher Milan
61625a3898
fix objc finalizing bug (#13296)
2025-11-16 12:43:04 +03:00