8961 Commits

Author SHA1 Message Date
chenyu
51dc7eedb0 correct use AM for resnet run_and_time (#10524) 2025-05-26 15:33:11 -04:00
chenyu
c1919ad55f use AM for resnet run_and_time (#10523) 2025-05-26 14:50:49 -04:00
George Hotz
e9bb2052cf hotfix: update readme 2025-05-26 10:28:16 -07:00
qazal
6d07087fe1 remove contiguous from MSELECT 2 (#10522)
* remove contiguous from MSELECT

* test_shrink_on_shard_axis

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2025-05-26 19:19:01 +03:00
geohotstan
602a145f8f Add Tensor.unfold (#10518)
* yoinked 10272

* eitanturok's fixes

* hmmm should size be sint?

* add test
2025-05-26 11:15:44 -04:00
qazal
9169dcfb49 do not create kernels with more inputs than the backend allows (#10510)
* work

* no itertools + top down pass

* clean viz

* python can do that

* webgpu

* gbarrier of gbarrier is gbarrier

* device can be tuple

* bug in toposort

* failing test for gated toposort

* contiguous of gbarrier is gbarrier

* check for binops

* Revert "check for binops"

This reverts commit 53e3cdf720.

* viz + match on gbarrier, self exists by default

* alt

* green now

* cleanup
2025-05-26 18:02:03 +03:00
nimlgen
deb369417c am_smi: print device usage (#10520)
* am_smi: print device usage

* tiny comments
2025-05-26 17:17:56 +03:00
chenyu
2d50efb92b set -e on mlperf run_and_time scripts (#10519) 2025-05-26 09:22:30 -04:00
Sieds Lykles
478c76f4b7 More div conditions (#10432)
* add condition

* add test

* use Variable
2025-05-26 07:36:05 -04:00
Sieds Lykles
c6c7882bdf bugfix: seperate rule for x//d<-c (#10148)
* Add rule

* Add test

* Add test for edge case 0

* Merge patterns

* update comment

* consistent whitespace

* whitespace

* update comment
2025-05-26 07:35:41 -04:00
chenyu
2eeea373af add BENCHMARK_LOG for mlperf resnet cron (#10516) 2025-05-25 22:00:29 -04:00
b1tg
a1f64af92d ci: setup llvm for amdremote (#10507)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-05-25 21:52:27 -04:00
geohotstan
fd9f236a82 move test over (#10508) 2025-05-25 21:51:51 -04:00
wozeparrot
7c81f9f95e fix: gate mlperf workflow (#10515) 2025-05-25 17:06:21 -07:00
Panagiotis Kourouklidis
4941486cb0 Add method to return field masks for AMDReg (#10511) 2025-05-25 14:47:20 -07:00
nimlgen
88c5864bf3 nv: do not hardcode sass version (#10513) 2025-05-25 22:41:15 +03:00
George Hotz
941cbd3471 hotfix: amd works on arch linux w/o rocm 2025-05-24 16:47:13 -07:00
nimlgen
d90ddcc365 nv: blackwell support (#10487)
* nv: blackwell support

* fixes

* hm

* h

* fixes

* mypy

* xx

* yy

* arr

* revert

* oops

* unrelated
2025-05-24 18:23:53 +03:00
chenyu
dc6309242d WallTimeEvent for mlperf ci (#10506) 2025-05-24 10:56:03 -04:00
qazal
dd5601af68 readable COPY(VIEW) reordering [pr] (#10505)
* readable COPY(VIEW) reordering [pr]

* assert that

* spec

* resolve

* Revert "resolve"

This reverts commit f5629fbef8.

* arg
2025-05-24 17:08:58 +03:00
Ahmed Harmouche
bbb6deff53 Increase op limit in test_index_mnist to pass on webgpu (#10504)
* Increase op limit to enable mnist indexing on webgpu

* Only relax op_limit on WebGPU
2025-05-24 09:37:31 -04:00
nimlgen
c472ab636c nv: use regcount from meta (#10503) 2025-05-24 14:14:33 +03:00
qazal
82b444796d fix display of kernel args in viz [pr] (#10502) 2025-05-24 14:09:52 +03:00
qazal
a9d0bf5c4c proper error for device mismatch (#10500)
* failing test

* use bufs

* buf_uop

* not on cpu
2025-05-24 12:17:41 +03:00
qazal
fc1300f5e3 top down create_kernels + delete "replace assign sources" (#10478)
* rebase from #10468

* fixup metadata 2

* that too

* comments for metadata

* remove_gbarrier is not needed anymore

* skip that

* break metadata more

* delete more metadata fixups

* err, fix kernelize diamond

* unskip metadata

* new_map

* roots

* replace metadata of roots

* check empty

* replace globals is better
2025-05-24 09:50:06 +03:00
George Hotz
9eee5ae276 its copying the dataset every time (#10498)
* its copying the dataset every time

* add comment

* expect failure

* todo
2025-05-23 21:25:53 -07:00
George Hotz
4467b52721 remove all self copies after Tensor.clone fix (#10494) 2025-05-23 19:04:20 -07:00
George Hotz
b58f2d4544 fix tests (#10493) 2025-05-23 18:38:07 -07:00
George Hotz
6b8eb5fec2 split mlperf to its own red benchmark run (#10492)
* Add mmapeak implementation for 7900 XTX

* Change identation

* Use a template instead of multiple assebly files

* Fix output formatting

* Reduce register file bank conflicts

* More accurate measurement for quick instructions

* Add support for gfx1201

* RDNA4 wmma requires less VGRPs

* RDNA4 does not have s_cmpk instructions

* Add v_wmma_i32_16x16x32_iu4 for gfx1201

* Add sparse wmma instructions

* split to tinybox red MLPerf Benchmark

---------

Co-authored-by: Panagiotis Kourouklidis <panagiotis.kourouklidis@gmail.com>
2025-05-23 17:12:41 -07:00
Panagiotis Kourouklidis
e21836952d mmapeak implementation for 7900 XTX (#10417)
* Add mmapeak implementation for 7900 XTX

* Change identation

* Use a template instead of multiple assebly files

* Fix output formatting

* Reduce register file bank conflicts

* More accurate measurement for quick instructions

* Add support for gfx1201

* RDNA4 wmma requires less VGRPs

* RDNA4 does not have s_cmpk instructions

* Add v_wmma_i32_16x16x32_iu4 for gfx1201

* Add sparse wmma instructions

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-23 16:26:12 -07:00
George Hotz
0a313d98a0 add rocm 6.4 support (#10491)
* add rocm 6.4 support

* update to newer amdcomgr, assert lang is right

* fix aux-triple
2025-05-23 16:20:54 -07:00
wozeparrot
a18963d9e7 feat: use tinygrad useragent (#10488) 2025-05-23 15:44:40 -07:00
George Hotz
0ebd440872 add mselect op (#10453)
* add mselect op

* more work

* that shouldn't be contiguous

* remove junk

* it segfaults...

* more correct

* test fail

* inserting a contiguous fixes it

* fix children in mselect

* complain

* error

* push RESHAPE through MSELECT

* no copy arg, use mselect
2025-05-23 14:11:37 -07:00
George Hotz
bf2a0907be gate the mockdsp behind MOCKDSP=1 [pr] (#10486) 2025-05-23 11:44:02 -07:00
uuuvn
3ca5680920 Test remote in benchmark (#10304)
hlb cifar is fast so added it, can add bert too if you think it's ok

6 real gpus to test multigraph and transfers + accuracy validation

should probably be added to tinystats too, i don't know how though

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-23 12:12:57 -04:00
qazal
7a762f01ab s/shape_spec/ast_spec [pr] (#10485) 2025-05-23 15:43:54 +03:00
qazal
127a7c8aee assert AST views only exist in the edges (#10484)
* assert AST views only exist in the edges

* valid without device
2025-05-23 15:27:09 +03:00
qazal
e491168685 add metadata note + whitespace fixup [pr] (#10483)
* add metadata note + whitespace fixup [pr]

* TestSchedule.test_kernelize_diamond
2025-05-23 14:37:45 +03:00
chenyu
c5acb4e06e run mlperf resnet daily (#10482)
Runs at 08:05 UTC (12:05 AM Pacific Time)
2025-05-23 07:16:20 -04:00
Sieds Lykles
ce6ebfb8ee verify rewrites in test_uop_symbolic (#10430)
* verify rewrites in test_uop_symbolic

* use global context
2025-05-23 06:57:29 -04:00
qazal
52e8b69d98 create_kernels only matches on GBARRIER and ASSIGN [pr] (#10480) 2025-05-23 11:11:57 +03:00
George Hotz
1e4d63e06e uops can have multiple metadata (#10479)
* uops can have multiple metadata

* fixups
2025-05-22 21:35:02 -07:00
George Hotz
283586bb96 insert GBARRIER into graph (#10468)
* insert contiguous into graph

* exclude contiguous from kernels

* and copy

* not needed on copy

* gbarrier

* gbarrier closer

* gb

* gb

* fix double realize logic bug

* remove gbarrier

* del that

* uop tags

* tag

* fix setitem, flaky

* no ctx there

* flip rewrite

* revert order until metadata is fixed
2025-05-22 20:53:36 -07:00
George Hotz
d2bb50d75b graph_rewrite_map in the other order [pr] (#10476)
* graph_rewrite_map in the other order [pr]

* reversed to preserve behavior
2025-05-22 20:22:07 -07:00
George Hotz
9fc01c1e03 support for uop tags (#10477)
* support for uop tags [pr]

* test uop tags
2025-05-22 19:53:48 -07:00
chenyu
8cc2dff4d8 only float Tensors have gradient [pr] (#10475) 2025-05-22 21:02:11 -04:00
George Hotz
147f7747f2 remove the map from create_schedule_with_vars [pr] (#10472) 2025-05-22 15:58:25 -07:00
George Hotz
6d5f87a18a lshift/rshift reverse is broken [pr] (#10467) 2025-05-22 13:01:48 -07:00
Mike Ashcroft
209d4401f8 Merge SimpleMathTrait and MathTrait (#10463) 2025-05-22 11:47:22 -07:00
George Hotz
0d39bb5de1 rename to get_kernelize_map (#10465) 2025-05-22 11:44:44 -07:00