Commit Graph

8985 Commits

Author SHA1 Message Date
George Hotz
b3b43a82c4 remove Tensor.no_grad, it's meaningless now [pr] (#10556) 2025-05-28 22:20:02 -07:00
George Hotz
e4e7b5d7e1 continue work on beautiful cifar (#10555) 2025-05-28 21:42:01 -07:00
George Hotz
e140f8f0d8 linearizer test_failure_61 (#10552)
* enumerate cases of Tensors in the JIT

* optional fused optimizers

* add fused optimizer test

* move that there

* ugh

* work on beautiful_cifar

* speed close to hlb_cifar

* test_failure_61

* just the failure
2025-05-28 21:30:50 -07:00
George Hotz
871df1436a more beautiful cifar (#10551)
* enumerate cases of Tensors in the JIT

* optional fused optimizers

* add fused optimizer test

* move that there

* ugh

* work on beautiful_cifar

* speed close to hlb_cifar

* schedule to corealize all

* one line sched step

* less lines
2025-05-28 20:48:20 -07:00
George Hotz
ee12e801a3 optional fused optimizers (#10549)
* enumerate cases of Tensors in the JIT

* optional fused optimizers

* add fused optimizer test

* move that there

* ugh
2025-05-28 13:50:30 -07:00
Sieds Lykles
ae02a1e232 [bounty] Z3 symbolic fuzzer [pr] (#10514)
* First version, caught a bug?

* Nicely print failure to reproduce

* Remove that

* Put the assert back

* Change fuzzing to use testing_unit so it has z3

* Test key to match

* Add rule

* Add test

* Add test for edge case 0

* Merge patterns

* update comment

* consistent whitespace

* whitespace

* add condition

* add test

* update comment

* use Variable

* fuzzer using z3_renderer

* Cleaned up printing and debugging

* working new fuzzer

* change some comments and printing

* more formatting

* fuzz failures in seperate file

* fix fstring

* more tests

* naming

* remove added line

* remove comment

* print number of skipped expressions

* use self.assertEqual

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-28 16:28:37 -04:00
chenyu
74cf5dbd9e mlperf system updates (#10550)
standardized processor and accelerator names
2025-05-28 16:15:46 -04:00
George Hotz
98f3d1c26d enumerate cases of Tensors in the JIT (#10548) 2025-05-28 11:51:27 -07:00
nimlgen
d1d9e729fd am_smi: mem usage (#10547) 2025-05-28 16:53:31 +03:00
chenyu
23e41f523a sdxl also run with cached search (#10546) 2025-05-28 06:51:56 -04:00
chenyu
fffdc4d31c workflow to run sdxl with search (#10543) 2025-05-27 17:25:41 -04:00
qazal
d1f0043331 use store_val helper in test_schedule asserts [pr] (#10540) 2025-05-27 21:48:06 +03:00
George Hotz
5b268121d4 remove becomes map (#10533)
* remove becomes map

* add comment and delete dead code

* multi is a view
2025-05-27 11:47:11 -07:00
qazal
271110bb5a s/src[0]/buf in lowerer.py [pr] (#10539) 2025-05-27 21:08:54 +03:00
qazal
f0042629d1 replace arg in merge_view [pr] (#10537) 2025-05-27 21:00:24 +03:00
qazal
617ecc1a7b fixup grouper process replay [pr] (#10538) 2025-05-27 20:46:55 +03:00
George Hotz
a07caaca0d handle stride 0 variable reshape (#10536) 2025-05-27 10:00:24 -07:00
George Hotz
0515622d95 the schedule graph is the tensor graph (#10534)
* the schedule graph is the tensor graph

* gate type_verify on debug

* relax that spec

* unmasked check is okay
2025-05-27 09:23:57 -07:00
qazal
142f6ba873 move merge_views to grouper swizzler [pr] (#10531) 2025-05-27 16:33:26 +03:00
qazal
c03e9c8995 fixup typing for @diskcache [pr] (#10529) 2025-05-27 15:30:49 +03:00
George Hotz
41e3d07d7f view gradient is tricky (#10528)
* view gradient is tricky

* explicit
2025-05-26 22:28:30 -07:00
George Hotz
ab4ca5da29 remove gradient nonsense [pr] (#10527)
* remove gradient nonsense [pr]

* grads to base
2025-05-26 19:09:59 -07:00
chenyu
76eb130d8c hotfix: BenchEvent MLPERF_RUN is mlperf_run (#10526) 2025-05-26 20:19:37 -04:00
uuuvn
c29c46853f Very basic mock sqtt (#10512)
This mockgpu sqtt emulation will just ignore basically everything and end
up with a 0x1000 size trace full of zeroes, but just testing for things
like register rename is better than nothing i guess
2025-05-26 14:38:28 -07:00
chenyu
51dc7eedb0 correct use AM for resnet run_and_time (#10524) 2025-05-26 15:33:11 -04:00
chenyu
c1919ad55f use AM for resnet run_and_time (#10523) 2025-05-26 14:50:49 -04:00
George Hotz
e9bb2052cf hotfix: update readme 2025-05-26 10:28:16 -07:00
qazal
6d07087fe1 remove contiguous from MSELECT 2 (#10522)
* remove contiguous from MSELECT

* test_shrink_on_shard_axis

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2025-05-26 19:19:01 +03:00
geohotstan
602a145f8f Add Tensor.unfold (#10518)
* yoinked 10272

* eitanturok's fixes

* hmmm should size be sint?

* add test
2025-05-26 11:15:44 -04:00
qazal
9169dcfb49 do not create kernels with more inputs than the backend allows (#10510)
* work

* no itertools + top down pass

* clean viz

* python can do that

* webgpu

* gbarrier of gbarrier is gbarrier

* device can be tuple

* bug in toposort

* failing test for gated toposort

* contiguous of gbarrier is gbarrier

* check for binops

* Revert "check for binops"

This reverts commit 53e3cdf720.

* viz + match on gbarrier, self exists by default

* alt

* green now

* cleanup
2025-05-26 18:02:03 +03:00
nimlgen
deb369417c am_smi: print device usage (#10520)
* am_smi: print device usage

* tiny comments
2025-05-26 17:17:56 +03:00
chenyu
2d50efb92b set -e on mlperf run_and_time scripts (#10519) 2025-05-26 09:22:30 -04:00
Sieds Lykles
478c76f4b7 More div conditions (#10432)
* add condition

* add test

* use Variable
2025-05-26 07:36:05 -04:00
Sieds Lykles
c6c7882bdf bugfix: seperate rule for x//d<-c (#10148)
* Add rule

* Add test

* Add test for edge case 0

* Merge patterns

* update comment

* consistent whitespace

* whitespace

* update comment
2025-05-26 07:35:41 -04:00
chenyu
2eeea373af add BENCHMARK_LOG for mlperf resnet cron (#10516) 2025-05-25 22:00:29 -04:00
b1tg
a1f64af92d ci: setup llvm for amdremote (#10507)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-05-25 21:52:27 -04:00
geohotstan
fd9f236a82 move test over (#10508) 2025-05-25 21:51:51 -04:00
wozeparrot
7c81f9f95e fix: gate mlperf workflow (#10515) 2025-05-25 17:06:21 -07:00
Panagiotis Kourouklidis
4941486cb0 Add method to return field masks for AMDReg (#10511) 2025-05-25 14:47:20 -07:00
nimlgen
88c5864bf3 nv: do not hardcode sass version (#10513) 2025-05-25 22:41:15 +03:00
George Hotz
941cbd3471 hotfix: amd works on arch linux w/o rocm 2025-05-24 16:47:13 -07:00
nimlgen
d90ddcc365 nv: blackwell support (#10487)
* nv: blackwell support

* fixes

* hm

* h

* fixes

* mypy

* xx

* yy

* arr

* revert

* oops

* unrelated
2025-05-24 18:23:53 +03:00
chenyu
dc6309242d WallTimeEvent for mlperf ci (#10506) 2025-05-24 10:56:03 -04:00
qazal
dd5601af68 readable COPY(VIEW) reordering [pr] (#10505)
* readable COPY(VIEW) reordering [pr]

* assert that

* spec

* resolve

* Revert "resolve"

This reverts commit f5629fbef8.

* arg
2025-05-24 17:08:58 +03:00
Ahmed Harmouche
bbb6deff53 Increase op limit in test_index_mnist to pass on webgpu (#10504)
* Increase op limit to enable mnist indexing on webgpu

* Only relax op_limit on WebGPU
2025-05-24 09:37:31 -04:00
nimlgen
c472ab636c nv: use regcount from meta (#10503) 2025-05-24 14:14:33 +03:00
qazal
82b444796d fix display of kernel args in viz [pr] (#10502) 2025-05-24 14:09:52 +03:00
qazal
a9d0bf5c4c proper error for device mismatch (#10500)
* failing test

* use bufs

* buf_uop

* not on cpu
2025-05-24 12:17:41 +03:00
qazal
fc1300f5e3 top down create_kernels + delete "replace assign sources" (#10478)
* rebase from #10468

* fixup metadata 2

* that too

* comments for metadata

* remove_gbarrier is not needed anymore

* skip that

* break metadata more

* delete more metadata fixups

* err, fix kernelize diamond

* unskip metadata

* new_map

* roots

* replace metadata of roots

* check empty

* replace globals is better
2025-05-24 09:50:06 +03:00
George Hotz
9eee5ae276 its copying the dataset every time (#10498)
* its copying the dataset every time

* add comment

* expect failure

* todo
2025-05-23 21:25:53 -07:00