George Hotz
d32f5e9f3a
improve rendering of shapes in viz + investigate symbolic [pr] (#10091)
2025-04-28 16:44:09 -04:00
Sieds Lykles
dbb7aee02e
Split constant in div with negative x (#10088)
* add rule
* change test
* lower complexity limit
* remove offset in fold_unrolled_divs
* remove import
* add one more condition
2025-04-28 16:24:14 -04:00
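Context for the rule above: splitting a constant out of an integer floor division is exact even when the variable part is negative, because the residual constant c % d always lands in [0, d). A minimal Python sketch of the identity (illustrative, not the repo's rewrite rule):

```python
# Identity: (x + c)//d == c//d + (x + c % d)//d under floor division.
# c % d is in [0, d), so the split stays valid even for negative x --
# the case the rule above handles.
def split_const_div(x: int, c: int, d: int) -> int:
    return c // d + (x + c % d) // d

for x in range(-50, 50):
    for c in range(-20, 20):
        for d in (2, 3, 7):
            assert split_const_div(x, c, d) == (x + c) // d
```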
chenyu
610ee79b22
cherry pick mlperf5.0 branch to master (#10089)
2025-04-28 15:36:56 -04:00
chenyu
459a223202
simpler Literal annotation in code_for_workitem [pr] (#10087)
2025-04-28 14:59:25 -04:00
nimlgen
dcd9a633c3
am: load minimum fw (#10083)
* am: load minimum psp parts
* try those
* remove me & pfp
2025-04-28 21:28:05 +03:00
George Hotz
ecff82a698
fixing single kernel softmax: resolve (#10086)
* fixing single kernel softmax: resolve
* add failing lin test
2025-04-28 13:46:20 -04:00
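For context, SINGLE_KERNEL_SOFTMAX fuses the whole softmax chain (max-reduce, exp, sum-reduce, divide) into one kernel. A plain NumPy sketch of the op chain being fused, for reference only:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    # subtract the row max for numerical stability, then exp, sum, divide:
    # the reduce/elementwise chain a single-kernel softmax fuses
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

print(softmax(np.array([1.0, 2.0, 3.0])))
```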
George Hotz
4c242b0483
hotfix: tests all pass on metal local
2025-04-28 12:09:00 -04:00
George Hotz
690dac79b5
don't modify the ranges on reduce rewrite (#10062)
* bug in div range folding
* simpler
* oh, this is right for indexing, but the div mod folding needs to be fixed
* reenable
* Passing test_complexity_w_unroll2 (#10068)
* Passing
* remove non_folded_divs
* Add check for negative term in div folding
* Add test
* bump that limit
* fix casted
---------
Co-authored-by: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
2025-04-28 12:01:19 -04:00
quortus
5130759605
Make sure clang always inlines batched functions (#10037)
2025-04-28 10:48:24 -04:00
George Hotz
c4a50f9d89
fix full shape in kernel.py [pr] (#10085)
* fix full shape in kernel.py
* fix that heuristic
* full shape in shapetracker is fast
* fix process replay [pr]
* simpler
* this
* i'm just going to ignore that one
2025-04-28 09:32:58 -04:00
qazal
ac37510f60
remu: only write v_cmp result if exec is set (#10084)
2025-04-28 20:31:52 +08:00
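The semantics being fixed: a vector compare writes its per-lane result bit only for lanes whose exec mask bit is set; inactive lanes must keep their previous value. A hedged Python sketch of that masking (remu itself is Rust; all names here are illustrative):

```python
def v_cmp_write(old: int, exec_mask: int, lane_cmp: list) -> int:
    # update each lane's result bit only if that lane is active in exec
    result = old
    for lane, hit in enumerate(lane_cmp):
        if exec_mask >> lane & 1:
            result = (result & ~(1 << lane)) | (int(hit) << lane)
    return result

# lanes 0 and 2 compare true, but only lane 0 is active: bit 2 stays unset
assert v_cmp_write(0b000, exec_mask=0b001, lane_cmp=[True, False, True]) == 0b001
```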
qazal
d6b436a815
remu bugfix with -0.0 negation (#10082)
2025-04-28 15:46:42 +08:00
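The bug class here: implementing float negation as 0.0 - x mishandles signed zero, since 0.0 - 0.0 is +0.0 while -(+0.0) must be -0.0; correct negation flips the IEEE-754 sign bit. A Python illustration (the actual fix is in remu's Rust):

```python
import math, struct

def f32_neg(x: float) -> float:
    # negate a float32 by flipping its sign bit
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits ^ 0x80000000))[0]

print(math.copysign(1.0, 0.0 - 0.0))    #  1.0: subtraction loses the sign
print(math.copysign(1.0, f32_neg(0.0))) # -1.0: sign-bit flip yields -0.0
```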
nimlgen
15e4302784
am: optimize zeroing out boot structs (#10081)
2025-04-28 10:15:32 +03:00
nimlgen
68e5ab8552
am: fix typo in fw loading (#10080)
2025-04-28 09:45:00 +03:00
chenyu
e996584685
olmoe in mac benchmark (#10077)
2025-04-27 21:07:02 -04:00
George Hotz
732e172961
don't require contiguous after fuse (#10074)
2025-04-27 13:17:22 -04:00
qazal
1aed04ec12
cpu is ground truth in VALIDATE_WITH_CPU=1 [pr] (#10067)
2025-04-28 01:14:21 +08:00
George Hotz
129bddde74
lin failure from SINGLE_KERNEL_SOFTMAX (#10073)
* lin failure from SINGLE_KERNEL_SOFTMAX
* fix lin issue
* more pure diff
2025-04-27 13:02:10 -04:00
George Hotz
b341296304
hotfix: save sdxl ram
2025-04-27 12:09:45 -04:00
George Hotz
68c5f7ba80
load fast in sdxl (#10072)
* load fast in sdxl
* back to that with the ret
* no context
2025-04-27 11:58:51 -04:00
George Hotz
768eb94c3e
disable debug for load_state_dict [pr] (#10070)
2025-04-27 11:11:56 -04:00
George Hotz
4b8ef6ce78
hotfix: sdxl corealize
2025-04-27 10:41:46 -04:00
George Hotz
b6d2effaf5
assign is contiguous (#10066)
* assign is contiguous
* disable process replay for SDXL
2025-04-27 08:40:33 -04:00
George Hotz
1253819151
make beautiful indexing use a Variable (#10063)
* make beautiful indexing use a Variable
* stunning test
* better color
* training is broken
* fix tests
* fix variable indexing
* fix test
* no contiguous
* revert that
* revert that too
* indexing two bind
* skip for webgpu
* make not slow
2025-04-27 08:22:38 -04:00
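The change above makes the generated indexing kernel take the index as a symbolic Variable instead of baking it in, so one compiled kernel can serve every index value. A hedged sketch of tinygrad's user-facing Variable API (usage illustrative; exact details vary by version):

```python
from tinygrad import Tensor, Variable

t = Tensor.arange(10)
i = Variable("i", 0, 9).bind(3)  # symbolic index with bounds, bound to 3 here
# indexing with a bound Variable can reuse one kernel for any value of i
print(t[i].item())
```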
Rory Clear
a13a43c4fe
yolo 416 to 640 res (#10047)
2025-04-26 20:45:58 -04:00
chenyu
4c1ce1a299
don't simplify if div folding resulted in negative numerator (#10064)
* don't simplify if div folding resulted in negative numerator
* test
2025-04-26 17:01:18 -04:00
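The hazard being guarded against: several div-folding rules are only sound when the numerator is known non-negative, because floor and truncating division diverge below zero. A small Python illustration (not the repo's code):

```python
import math

trunc_div = lambda a, d: math.trunc(a / d)  # C-style truncating division

for a in range(16):
    assert a // 4 == trunc_div(a, 4)  # non-negative numerators agree

# a negative numerator breaks the agreement, so a fold that produces one
# must be rejected:
print(-1 // 4, trunc_div(-1, 4))  # -1 vs 0
```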
George Hotz
1805403821
fix rand arange folding (#10060)
* test rand range
* --amend
* fix rand arange folding
* reduce_rangeless fix
2025-04-26 12:24:05 -04:00
qazal
d13c100981
don't sort dims in verify_sink_dims [pr] (#10059)
* don't sort dims in verify_sink_dims [pr]
* 1 can exist with n
* put process_replay warn last
* assert shape is the same
* bring that back
2025-04-26 23:24:30 +08:00
George Hotz
c80fe6d5fc
handle some fancier reduces (#10057)
* reduce_unparented
* handle fancier reduces
* fold more
* bugfix
2025-04-26 11:20:15 -04:00
nimlgen
e08270c1ba
nv: fix program init for no-args kernels (#10058)
2025-04-26 18:08:53 +03:00
George Hotz
11113c9d07
reduce_unparented (#10056)
2025-04-26 09:48:16 -04:00
George Hotz
ea5dddc537
reduce collapse generic (#10045)
* reduce collapse generic
* new arange folder
* new range folding
* correct with sym
* all tests pass
* indexing ops passes
* failing tests
* fix tests, remove unused
* revert that
* torch indexing is fast
* skip on webgpu
* touchups
* comments
2025-04-26 09:13:24 -04:00
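For context, arange/range folding collapses a reduce over a loop index into a closed form so no kernel loop is emitted. The flavor of identity involved, in plain Python:

```python
n = 100
# a sum over a range collapses to a closed form...
assert sum(range(n)) == n * (n - 1) // 2

# ...and an arange-based gather, sum_i [i == k] * v[i], collapses to v[k]
v, k = [10, 20, 30, 40], 2
assert sum((i == k) * v[i] for i in range(len(v))) == v[k]
```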
quortus
5cdc96409e
Update outdated renderer.render calls (#10044)
2025-04-26 07:35:19 -04:00
nimlgen
e055b9422f
am: fix mmap failures (#10054)
2025-04-26 14:21:28 +03:00
qazal
e1d2b64e92
remu new instructions (#10050)
* remu new instructions
* test_ds_store_half
* test_v_mul_f16
2025-04-26 02:04:12 +03:00
qazal
bba5d0a3e4
remu refactors (#10028)
* remu refactors
* scc is sgpr 253
* remove that
* rename to vcc_lo
* run cargo test in CI
* llvm-mc
* meh
* work
* work_group work 1
* seeded_lanes is dumb
* better than seeded_lanes
* does not need to be address
* 128 sgpr per wave
* scc is sgpr, we don't know which one
* null_src once more
* derive clone, wave init is cleaner
* init comes first
2025-04-26 04:31:10 +08:00
nimlgen
0fc85a2b0a
hcqfuzz: init (#10049)
* hcqfuzz: init
* fix fuzz
* linter
* graph
* that test
* update readme
2025-04-25 23:19:21 +03:00
qazal
b30050e287
fix amdgpu_disassemble on mac [pr] (#10042)
2025-04-25 15:23:11 +08:00
George Hotz
a197aa4ef3
upat reduce syntax [pr] (#10040)
* upat reduce syntax [pr]
* switch z3 to graph_rewrite
2025-04-24 22:05:28 -04:00
Ignacio Sica
76a86735c0
hotfix amd bf16 is supported case (#10039)
* hotfix amd and amd_llvm
* bf16 not supported in ci
* hotfix amd_llvm is not a device
* remove default
* dont gate on ci and amd_llvm
* minor cleanup
* skip bf16 tc test for amd_llvm
2025-04-24 21:29:27 -03:00
Ignacio Sica
b4f823acbe
fix helper_tc_allclose (#9606)
* fix helper_tc_allclose
* cleanup
* hotfix
* cleanup
* cleanup
* check real buffer and add cast for bf16
* cleanup
* fix padded for ops_python
* avoid assert on amd emulated tc
* swap dimensions
* revert, should have nothing to do with padded
* revert fix, should not go in this pr
* remove skip
2025-04-24 18:36:40 -03:00
Rory Clear
3a189fa561
More yolo processing in tinygrad (#9928)
* more tg less np
* update webgpu html for new compile
* resize boxes
* remove text
* add back note
* fix indentation
* fix indentation
* remove magic num
* remove now unused funcs
* back to numpy nms
* no loop
* fix iou suppression
* update test
* dont suppress other classes
* add working scale
* fix expected value, rounded up 0.24 was being counted
* add postprocess bool for onnx test
* fix indents
* clean
* clean
* fix indent
* remove print
* fix indent
* remove unused import
* remove hardcoded 0.25
* space
* spacing
* clean label_predictions func
* remove single item lists
* space
* use postprocess output in test
* space
* clean
* clean
* remove redundant threshold
* remove redundant threshold
* clean
* rename var
* move loop into func
* unhardcode iou_threshold
* remove unused values
* clean
* add note
* clean
* keep const
* move back funcs
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 16:21:46 -04:00
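Several bullets above ("back to numpy nms", "fix iou suppression", "dont suppress other classes") concern non-maximum suppression. A generic per-class NumPy NMS sketch for reference, not the repo's implementation:

```python
import numpy as np

def iou(box: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    # boxes are (x1, y1, x2, y2); IoU of one box against many
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def nms(boxes, scores, classes, iou_threshold=0.5):
    keep = []
    for c in np.unique(classes):  # suppress within each class only
        idx = np.where(classes == c)[0][np.argsort(-scores[classes == c])]
        while len(idx):
            keep.append(idx[0])  # highest-scoring survivor
            idx = idx[1:][iou(boxes[idx[0]], boxes[idx[1:]]) < iou_threshold]
    return np.array(keep)
```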
chenyu
74c6cf8be3
lint mlperf model_train (#10038)
2025-04-24 16:19:44 -04:00
Ignacio Sica
51ca19d061
set test_tensor_cores_padded_amd to expectedFailure (#10036)
* init
* add expected failure to correctly track progress
* hotfix
* skip for amd_llvm as well
* add skip
* add pr number
* move comment to amd test
* change reason
2025-04-24 17:11:40 -03:00
b1tg
914d89fa0b
fix tensor cores for gfx1201 (#9838)
* fix tensor cores for gfx1201
* fix typo
* fix python wmma
* AMDLLVMRenderer with arch + AMDLLVM tensor_cores
* fix ci
* clean up
* more tensor cores for RDNA4
* fix half/half, bfloat16/float, bfloat16/bfloat16 for amd_llvm
---------
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 14:57:41 -04:00
uuuvn
779aa1e2e9
Enable image tests on cloud if clouddev supports image (#9903)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 14:30:12 -04:00
uuuvn
29a12b19ea
Add macos CLOUD tests (#10033)
A lot more work is required to enable all of them and move them into the osxtests
matrix; for now I created a separate runner for them (copied from WebGPU).
Will add test/test_graph.py to those tests in #9876
2025-04-24 14:14:13 -04:00
Nishant Rajadhyaksha
55942a8d8e
[Bounty] moved index_tensor off cpu in torch_backend (#9916)
* moved index tensor off cpu in torch_backend
* added support for None based indexing
* fix_to_pass_tests
* fix segfault tests
2025-04-24 14:12:37 -04:00
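For reference, the None-based indexing the backend now handles: a None in an index inserts a new axis of size 1, the same as unsqueeze. Standard PyTorch behavior:

```python
import torch

t = torch.arange(6).reshape(2, 3)
print(t[None].shape)        # torch.Size([1, 2, 3]): new leading axis
print(t[:, None, :].shape)  # torch.Size([2, 1, 3]): new middle axis
```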
Ignacio Sica
373ca59b7f
use is_dtype_supported to check dtype support in tc tests (#10035)
2025-04-24 14:59:14 -03:00
Ignacio Sica
93a1e9eeb9
improve bf16 case for is_dtype_supported [pr] (#10034)
* fix is_dtype_supported for bf16
* hotfix
* add llvm and amd_llvm
* gate on machine
* separate gpu vs cpu cases
* add arm case
2025-04-24 14:03:57 -03:00