35 Commits

Author SHA1 Message Date
chenyu
18d4ecc1f3 lower nv test_gemm_4096 target (#13107) 2025-11-05 11:05:16 -05:00
chenyu
54141e9cb9 DISABLE_COMPILER_CACHE=1 in speed_v_theoretical (#13096) 2025-11-04 11:28:18 -05:00
George Hotz
d59d4cdbe4 lil less is okay 2025-10-21 17:09:44 +08:00
chenyu
a3dae51085 lower test_gemm_8192 on red (#10883) 2025-06-19 10:01:25 -04:00
wozeparrot
eb739bb96a hotfix: lower threshold (#10786) 2025-06-11 19:36:20 -04:00
George Hotz
b06291077c no amdgpu kernel driver (#10408)
* no amdgpu kernel driver

* don't test hip

* lower req
2025-05-18 20:52:39 -07:00
George Hotz
427471550a hotfix: amd tflops to 74 and some external_benchmark_sdxl_softmax stuff 2025-04-29 09:02:27 -04:00
George Hotz
d1f6701eb7 hotfix: lower amd threshold + improve block reorder test 2025-04-22 20:44:29 +01:00
nimlgen
9bd13de44c lower test_gemv_4096_16384 to 750 for red (#9367) 2025-03-05 22:44:48 +03:00
chenyu
2cb2fce8d9 lower test_gemm_8192 amd_tflops to 65 (#9364) 2025-03-05 14:06:11 -05:00
chenyu
4342300eff lower test_gemm_8192 amd to 70 (#9277)
flaky
2025-02-26 16:32:08 -05:00
chenyu
0513b0c17d lower green test_gemm_8192 tflops to 125 [pr] (#8820)
flaky
2025-01-30 17:30:08 -05:00
George Hotz
d19c1c7f03 bump 75 -> 73 for test failure 2025-01-13 09:18:38 -08:00
chenyu
6a7f971fa0 hotfix max(DEBUG, 2) -> max(DEBUG.value, 2) [pr] (#8553) 2025-01-10 12:57:44 -05:00
chenyu
9789a83064 hotfix DEBUG in speed_v_theoretical.py conv (#8266)
infinite loop with manual DEBUG set `DEBUG=2 python test/external/speed_v_theoretical.py -k conv`

```
  File "/Users/chenyu/code/tinygrad/tinygrad/helpers.py", line 95, in __ge__
    def __ge__(self, x): return self.value >= x
                                ^^^^^^^^^^^^^^^
  [Previous line repeated 4984 more times]
RecursionError: maximum recursion depth exceeded in comparison
```
2024-12-15 19:44:45 -05:00
chenyu
62e19649c0 lower test_conv_3x3_256_32_32_256_256 (#8226)
tiny7 is slow
2024-12-13 17:15:53 -05:00
chenyu
155f7df599 lower test_gemm_4096 expectation on green (#8152)
getting 119 sometimes, so lowered to 115
2024-12-10 18:05:12 -05:00
chenyu
5c6ed5dba6 lower test_conv_3x3_256_32_32_256_256 expectation (#8060)
failed https://github.com/tinygrad/tinygrad/actions/runs/12182799887/job/33982676812#step:9:210
2024-12-05 10:30:56 -05:00
George Hotz
20878be2af lower test_gemv_4096_16384 expectations 2024-12-05 12:08:26 +08:00
chenyu
0693158d28 lower v_theoretical gemv on red (#8042)
tiny7 is still slower https://github.com/tinygrad/tinygrad/actions/runs/12166149038/job/33931736130#step:8:209
2024-12-04 13:59:40 -05:00
George Hotz
08657cb7b0 hotfix: bump expectations in speed_v_theoretical 2024-12-04 19:00:33 +08:00
George Hotz
ea65c79ba2 hotfix: don't spam BEAM debug in speed_v_theoretical 2024-12-04 18:47:16 +08:00
George Hotz
09b00b1b04 hotfix: use kernel timings instead of python timings in speed_v_theoretical 2024-12-04 18:36:17 +08:00
qazal
b797aee720 uop global buf number tracking try 2 [pr] (#7912)
* uop buffer init small refactor [pr]

* add early

* this way it doesn't need late

* buffer_num

* itertools.count

* count from 0

* down to 380
2024-12-02 14:45:17 +08:00
George Hotz
cbcc1c20eb second try at block linearize (#7892)
* second try at block linearize

* weeee, works for lil matmul

* it's so beautiful

* test tiny passes

* fix bugs

* combine matching BLOCKENDS

* wrapping

* test lin failures passes

* those failures were fake

* flip sort order

* fix ptx tests

* deal with store better

* dumb ptx fix

* expect less

* reduce lines

* reduce lines

* less lines and cleaner

* no defaultdict

* tighter

* simpler block_parent_count
2024-12-02 13:43:09 +08:00
George Hotz
6c1efb9a72 hotfix: amd gemv was flaky 2024-12-02 11:08:24 +08:00
chenyu
bb23469f93 lower conv threshold on red (#7948) 2024-11-28 13:31:06 -05:00
chenyu
f54508549f don't search conv weight init in speed_v_theoretical (#7943) 2024-11-28 10:03:18 -05:00
chenyu
5c5b1b994c less flaky benchmarks (#7855)
JIT=2 for metal cifar with HALF, and lower tflops for nv test_gemm_4096. failures in https://github.com/tinygrad/tinygrad/actions/runs/11980239535/job/33404098428?pr=7830
2024-11-22 16:39:39 -05:00
chenyu
11cea00090 lower vs_theoretical conv tflops threshold for nv (#7811)
less flaky
2024-11-20 20:03:49 -05:00
chenyu
1884f021e3 add conv3x3 to speed_v_theoretical (#7658)
* add conv3x3 to speed_v_theoretical

* show test duration
2024-11-12 16:41:56 -05:00
chenyu
962dafb467 use randn in speed_v_theoretical instead of rand (#7656)
* use randn in speed_v_theoretical instead of rand

this made green gemv 20% faster... but why?

* update threshold
2024-11-12 15:00:32 -05:00
chenyu
6159790ab8 add gemv to speed_v_theoretical (#7654)
* add gemv to speed_v_theoretical

getting ~300GB/s if we just count the memory of inputs and output

* better green numbers

* flip
2024-11-12 11:19:35 -05:00
chenyu
99f29e50b2 update speed_v_theoretical numbers (#7647)
better amd after set compute profile
2024-11-11 20:05:13 -05:00
chenyu
773d5b60bf beam benchmark tests (#7638)
* beam benchmark tests

* lower AMD number somehow

* less flaky
2024-11-11 18:11:18 -05:00