Commit Graph

9095 Commits

Author SHA1 Message Date
George Hotz
941cbd3471 hotfix: amd works on arch linux w/o rocm 2025-05-24 16:47:13 -07:00
nimlgen
d90ddcc365 nv: blackwell support (#10487)
* nv: blackwell support

* fixes

* hm

* h

* fixes

* mypy

* xx

* yy

* arr

* revert

* oops

* unrelated
2025-05-24 18:23:53 +03:00
chenyu
dc6309242d WallTimeEvent for mlperf ci (#10506) 2025-05-24 10:56:03 -04:00
qazal
dd5601af68 readable COPY(VIEW) reordering [pr] (#10505)
* readable COPY(VIEW) reordering [pr]

* assert that

* spec

* resolve

* Revert "resolve"

This reverts commit f5629fbef8.

* arg
2025-05-24 17:08:58 +03:00
Ahmed Harmouche
bbb6deff53 Increase op limit in test_index_mnist to pass on webgpu (#10504)
* Increase op limit to enable mnist indexing on webgpu

* Only relax op_limit on WebGPU
2025-05-24 09:37:31 -04:00
nimlgen
c472ab636c nv: use regcount from meta (#10503) 2025-05-24 14:14:33 +03:00
qazal
82b444796d fix display of kernel args in viz [pr] (#10502) 2025-05-24 14:09:52 +03:00
qazal
a9d0bf5c4c proper error for device mismatch (#10500)
* failing test

* use bufs

* buf_uop

* not on cpu
2025-05-24 12:17:41 +03:00
qazal
fc1300f5e3 top down create_kernels + delete "replace assign sources" (#10478)
* rebase from #10468

* fixup metadata 2

* that too

* comments for metadata

* remove_gbarrier is not needed anymore

* skip that

* break metadata more

* delete more metadata fixups

* err, fix kernelize diamond

* unskip metadata

* new_map

* roots

* replace metadata of roots

* check empty

* replace globals is better
2025-05-24 09:50:06 +03:00
George Hotz
9eee5ae276 its copying the dataset every time (#10498)
* its copying the dataset every time

* add comment

* expect failure

* todo
2025-05-23 21:25:53 -07:00
George Hotz
4467b52721 remove all self copies after Tensor.clone fix (#10494) 2025-05-23 19:04:20 -07:00
George Hotz
b58f2d4544 fix tests (#10493) 2025-05-23 18:38:07 -07:00
George Hotz
6b8eb5fec2 split mlperf to its own red benchmark run (#10492)
* Add mmapeak implementation for 7900 XTX

* Change identation

* Use a template instead of multiple assebly files

* Fix output formatting

* Reduce register file bank conflicts

* More accurate measurement for quick instructions

* Add support for gfx1201

* RDNA4 wmma requires less VGRPs

* RDNA4 does not have s_cmpk instructions

* Add v_wmma_i32_16x16x32_iu4 for gfx1201

* Add sparse wmma instructions

* split to tinybox red MLPerf Benchmark

---------

Co-authored-by: Panagiotis Kourouklidis <panagiotis.kourouklidis@gmail.com>
2025-05-23 17:12:41 -07:00
Panagiotis Kourouklidis
e21836952d mmapeak implementation for 7900 XTX (#10417)
* Add mmapeak implementation for 7900 XTX

* Change identation

* Use a template instead of multiple assebly files

* Fix output formatting

* Reduce register file bank conflicts

* More accurate measurement for quick instructions

* Add support for gfx1201

* RDNA4 wmma requires less VGRPs

* RDNA4 does not have s_cmpk instructions

* Add v_wmma_i32_16x16x32_iu4 for gfx1201

* Add sparse wmma instructions

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-23 16:26:12 -07:00
George Hotz
0a313d98a0 add rocm 6.4 support (#10491)
* add rocm 6.4 support

* update to newer amdcomgr, assert lang is right

* fix aux-triple
2025-05-23 16:20:54 -07:00
wozeparrot
a18963d9e7 feat: use tinygrad useragent (#10488) 2025-05-23 15:44:40 -07:00
George Hotz
0ebd440872 add mselect op (#10453)
* add mselect op

* more work

* that shouldn't be contiguous

* remove junk

* it segfaults...

* more correct

* test fail

* inserting a contiguous fixes it

* fix children in mselect

* complain

* error

* push RESHAPE through MSELECT

* no copy arg, use mselect
2025-05-23 14:11:37 -07:00
George Hotz
bf2a0907be gate the mockdsp behind MOCKDSP=1 [pr] (#10486) 2025-05-23 11:44:02 -07:00
uuuvn
3ca5680920 Test remote in benchmark (#10304)
hlb cifar is fast so added it, can add bert too if you think it's ok

6 real gpus to test multigraph and transfers + accuracy validation

should probably be added to tinystats too, i don't know how though

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-23 12:12:57 -04:00
qazal
7a762f01ab s/shape_spec/ast_spec [pr] (#10485) 2025-05-23 15:43:54 +03:00
qazal
127a7c8aee assert AST views only exist in the edges (#10484)
* assert AST views only exist in the edges

* valid without device
2025-05-23 15:27:09 +03:00
qazal
e491168685 add metadata note + whitespace fixup [pr] (#10483)
* add metadata note + whitespace fixup [pr]

* TestSchedule.test_kernelize_diamond
2025-05-23 14:37:45 +03:00
chenyu
c5acb4e06e run mlperf resnet daily (#10482)
Runs at 08:05 UTC (12:05 AM Pacific Time)
2025-05-23 07:16:20 -04:00
Sieds Lykles
ce6ebfb8ee verify rewrites in test_uop_symbolic (#10430)
* verify rewrites in test_uop_symbolic

* use global context
2025-05-23 06:57:29 -04:00
qazal
52e8b69d98 create_kernels only matches on GBARRIER and ASSIGN [pr] (#10480) 2025-05-23 11:11:57 +03:00
George Hotz
1e4d63e06e uops can have multiple metadata (#10479)
* uops can have multiple metadata

* fixups
2025-05-22 21:35:02 -07:00
George Hotz
283586bb96 insert GBARRIER into graph (#10468)
* insert contiguous into graph

* exclude contiguous from kernels

* and copy

* not needed on copy

* gbarrier

* gbarrier closer

* gb

* gb

* fix double realize logic bug

* remove gbarrier

* del that

* uop tags

* tag

* fix setitem, flaky

* no ctx there

* flip rewrite

* revert order until metadata is fixed
2025-05-22 20:53:36 -07:00
George Hotz
d2bb50d75b graph_rewrite_map in the other order [pr] (#10476)
* graph_rewrite_map in the other order [pr]

* reversed to preserve behavior
2025-05-22 20:22:07 -07:00
George Hotz
9fc01c1e03 support for uop tags (#10477)
* support for uop tags [pr]

* test uop tags
2025-05-22 19:53:48 -07:00
chenyu
8cc2dff4d8 only float Tensors have gradient [pr] (#10475) 2025-05-22 21:02:11 -04:00
George Hotz
147f7747f2 remove the map from create_schedule_with_vars [pr] (#10472) 2025-05-22 15:58:25 -07:00
George Hotz
6d5f87a18a lshift/rshift reverse is broken [pr] (#10467) 2025-05-22 13:01:48 -07:00
Mike Ashcroft
209d4401f8 Merge SimpleMathTrait and MathTrait (#10463) 2025-05-22 11:47:22 -07:00
George Hotz
0d39bb5de1 rename to get_kernelize_map (#10465) 2025-05-22 11:44:44 -07:00
Xingyu
1e0a59aca4 fix: handle buffer size calculation in to_movement_ops and add scalar assignment test in torch_backend (#10464) 2025-05-22 10:54:13 -07:00
George Hotz
577a0b4cfa openpilot compile4 (wip) (#10407)
* openpilot compile4

* add copies

* remove junk
2025-05-22 10:47:34 -07:00
George Hotz
ab591fa4dd make schedule explicit about kernels [pr] (#10462) 2025-05-22 09:32:16 -07:00
George Hotz
c46edbf262 hotfix: add note to relu 2025-05-22 09:13:38 -07:00
George Hotz
c6cbf0145a check that arg on copy is only used on multi [pr] (#10461) 2025-05-22 09:08:43 -07:00
Ignacio Sica
f69722dc2a refactor cuda disassemble (#10449) 2025-05-22 08:58:24 -07:00
qazal
5c4cfbc22c remove merge_views from kernel grouping rewrite [pr] (#10457) 2025-05-22 18:36:54 +03:00
nimlgen
035dffb00c nv: refactor qmd from ctypes (#10459)
* nv: refactor qmd from ctypes

* shorter

* imports

* x

* fix prefetch
2025-05-22 17:20:11 +03:00
wozeparrot
12285e926a fix: apply ip version fixes during AMDIP creation (#10454) 2025-05-22 10:14:48 +03:00
Ignacio Sica
5e6b96a1be align 16 in ptx, metal, cuda and amd (#10450) 2025-05-21 14:38:54 -07:00
nimlgen
570cb89652 amd: handle all exceptions (#10448)
* amd: handle all exceptions

* linter
2025-05-21 16:51:44 +03:00
nimlgen
475a7583b3 usbgpu: tiny changes (#10445) 2025-05-21 16:20:35 +03:00
qazal
7720c1aef1 hotfix: remove viz_sz.py [pr] (#10446) 2025-05-21 14:17:42 +03:00
chenyu
7bfb20757c fix tensor int floor div (#10327)
* fix tensor int floor div

* test_float_floordiv_scalar
2025-05-21 06:46:54 -04:00
Sieds Lykles
2b4375f36d Correct divmod folding behind flag (#10433)
* add flag

* add test

* remove import
2025-05-21 06:46:13 -04:00
qazal
df4cbb69e9 move fuzz_schedule.py to extra [pr] (#10444) 2025-05-21 10:07:24 +03:00