chenyu
a3dae51085
lower test_gemm_8192 on red ( #10883 )
2025-06-19 10:01:25 -04:00
wozeparrot
eb739bb96a
hotfix: lower threshold ( #10786 )
2025-06-11 19:36:20 -04:00
George Hotz
b06291077c
no amdgpu kernel driver ( #10408 )
...
* no amdgpu kernel driver
* don't test hip
* lower req
2025-05-18 20:52:39 -07:00
George Hotz
427471550a
hotfix: amd tflops to 74 and some external_benchmark_sdxl_softmax stuff
2025-04-29 09:02:27 -04:00
George Hotz
d1f6701eb7
hotfix: lower amd threshold + improve block reorder test
2025-04-22 20:44:29 +01:00
nimlgen
9bd13de44c
lower test_gemv_4096_16384 to 750 for red ( #9367 )
2025-03-05 22:44:48 +03:00
chenyu
2cb2fce8d9
lower test_gemm_8192 amd_tflops to 65 ( #9364 )
2025-03-05 14:06:11 -05:00
chenyu
4342300eff
lower test_gemm_8192 amd to 70 ( #9277 )
...
flaky
2025-02-26 16:32:08 -05:00
chenyu
0513b0c17d
lower green test_gemm_8192 tflops to 125 [pr] ( #8820 )
...
flaky
2025-01-30 17:30:08 -05:00
George Hotz
d19c1c7f03
bump 75 -> 73 for test failure
2025-01-13 09:18:38 -08:00
chenyu
6a7f971fa0
hotfix max(DEBUG, 2) -> max(DEBUG.value, 2) [pr] ( #8553 )
2025-01-10 12:57:44 -05:00
chenyu
9789a83064
hotfix DEBUG in speed_v_theoretical.py conv ( #8266 )
...
infinite loop with manual DEBUG set `DEBUG=2 python test/external/speed_v_theoretical.py -k conv`
```
  File "/Users/chenyu/code/tinygrad/tinygrad/helpers.py", line 95, in __ge__
    def __ge__(self, x): return self.value >= x
                                ^^^^^^^^^^^^^^^
  [Previous line repeated 4984 more times]
RecursionError: maximum recursion depth exceeded in comparison
```
2024-12-15 19:44:45 -05:00
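Note on the RecursionError in 9789a83064: one plausible mechanism, consistent with the later change from max(DEBUG, 2) to max(DEBUG.value, 2) in 6a7f971fa0, is that max() handed back the wrapper object itself, which was then stored as its own value. The sketch below reproduces that in miniature. The ContextVar class and the recurses helper are hypothetical stand-ins written for illustration, not tinygrad's actual code, and the log does not confirm that this is exactly what happened.

```python
class ContextVar:
    """Hypothetical stand-in for a debug-level wrapper; not tinygrad's actual class."""
    def __init__(self, value): self.value = value
    def __ge__(self, x): return self.value >= x
    def __gt__(self, x): return self.value > x
    def __lt__(self, x): return self.value < x

def recurses(var):
    """Return True if comparing var against an int raises RecursionError."""
    try:
        var >= 1
    except RecursionError:
        return True
    return False

# Buggy pattern: with the level already at 2, 2 is not strictly greater than
# the wrapper, so max() keeps its first argument, the wrapper object itself.
DEBUG = ContextVar(2)
picked = max(DEBUG, 2)
DEBUG.value = picked            # the wrapper now holds itself as its value
buggy = recurses(DEBUG)         # self.value >= x re-enters __ge__ until the recursion limit

# Fixed pattern: compare plain ints, so an int is stored.
DEBUG = ContextVar(2)
DEBUG.value = max(DEBUG.value, 2)
fixed = recurses(DEBUG)

print(buggy, fixed)
```

In this sketch the failure only appears when the level is already 2 or higher, which matches the report that it showed up with DEBUG=2 set manually.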
chenyu
62e19649c0
lower test_conv_3x3_256_32_32_256_256 ( #8226 )
...
tiny7 is slow
2024-12-13 17:15:53 -05:00
chenyu
155f7df599
lower test_gemm_4096 expectation on green ( #8152 )
...
getting 119 sometimes, so lowered to 115
2024-12-10 18:05:12 -05:00
chenyu
5c6ed5dba6
lower test_conv_3x3_256_32_32_256_256 expectation ( #8060 )
...
failed https://github.com/tinygrad/tinygrad/actions/runs/12182799887/job/33982676812#step:9:210
2024-12-05 10:30:56 -05:00
George Hotz
20878be2af
lower test_gemv_4096_16384 expectations
2024-12-05 12:08:26 +08:00
chenyu
0693158d28
lower v_theoretical gemv on red ( #8042 )
...
tiny7 is still slower https://github.com/tinygrad/tinygrad/actions/runs/12166149038/job/33931736130#step:8:209
2024-12-04 13:59:40 -05:00
George Hotz
08657cb7b0
hotfix: bump expectations in speed_v_theoretical
2024-12-04 19:00:33 +08:00
George Hotz
ea65c79ba2
hotfix: don't spam BEAM debug in speed_v_theoretical
2024-12-04 18:47:16 +08:00
George Hotz
09b00b1b04
hotfix: use kernel timings instead of python timings in speed_v_theoretical
2024-12-04 18:36:17 +08:00
qazal
b797aee720
uop global buf number tracking try 2 [pr] ( #7912 )
...
* uop buffer init small refactor [pr]
* add early
* this way it doesn't need late
* buffer_num
* itertools.count
* count from 0
* down to 380
2024-12-02 14:45:17 +08:00
George Hotz
cbcc1c20eb
second try at block linearize ( #7892 )
...
* second try at block linearize
* weeee, works for lil matmul
* it's so beautiful
* test tiny passes
* fix bugs
* combine matching BLOCKENDS
* wrapping
* test lin failures passes
* those failures were fake
* flip sort order
* fix ptx tests
* deal with store better
* dumb ptx fix
* expect less
* reduce lines
* reduce lines
* less lines and cleaner
* no defaultdict
* tighter
* simpler block_parent_count
2024-12-02 13:43:09 +08:00
George Hotz
6c1efb9a72
hotfix: amd gemv was flaky
2024-12-02 11:08:24 +08:00
chenyu
bb23469f93
lower conv threshold on red ( #7948 )
2024-11-28 13:31:06 -05:00
chenyu
f54508549f
don't search conv weight init in speed_v_theoretical ( #7943 )
2024-11-28 10:03:18 -05:00
chenyu
5c5b1b994c
less flaky benchmarks ( #7855 )
...
JIT=2 for metal cifar with HALF, and lower tflops for nv test_gemm_4096. failures in https://github.com/tinygrad/tinygrad/actions/runs/11980239535/job/33404098428?pr=7830
2024-11-22 16:39:39 -05:00
chenyu
11cea00090
lower vs_theoretical conv tflops threshold for nv ( #7811 )
...
less flaky
2024-11-20 20:03:49 -05:00
chenyu
1884f021e3
add conv3x3 to speed_v_theoretical ( #7658 )
...
* add conv3x3 to speed_v_theoretical
* show test duration
2024-11-12 16:41:56 -05:00
chenyu
962dafb467
use randn in speed_v_theoretical instead of rand ( #7656 )
...
* use randn in speed_v_theoretical instead of rand
this made green gemv 20% faster... but why?
* update threshold
2024-11-12 15:00:32 -05:00
chenyu
6159790ab8
add gemv to speed_v_theoretical ( #7654 )
...
* add gemv to speed_v_theoretical
getting ~300GB/s if we just count the memory of inputs and output
* better green numbers
* flip
2024-11-12 11:19:35 -05:00
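Note on the bandwidth figure in 6159790ab8: "count the memory of inputs and output" can be read as summing the bytes of the matrix, the input vector, and the output vector, then dividing by the kernel time. The sketch below works through that arithmetic. The function names are hypothetical, the 4096 x 16384 shape is taken from the test name test_gemv_4096_16384, and the 2-byte element size and the 1 ms timing are assumptions chosen only to make the example concrete; none of them are measurements from the benchmark.

```python
def gemv_bytes(rows, cols, itemsize):
    """Bytes touched by one matrix-vector product, counting each input and the output once."""
    matrix = rows * cols
    vector_in = cols
    vector_out = rows
    return (matrix + vector_in + vector_out) * itemsize

def achieved_gbps(nbytes, seconds):
    """Effective bandwidth in GB/s (10^9 bytes per second)."""
    return nbytes / seconds / 1e9

# Assumed values: 2-byte elements and a made-up 1 ms kernel time.
nbytes = gemv_bytes(4096, 16384, itemsize=2)
print(nbytes, achieved_gbps(nbytes, 1e-3))
```

The matrix term dominates the total, so the result is nearly the same whichever of the two dimensions is treated as the row count.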
chenyu
99f29e50b2
update speed_v_theoretical numbers ( #7647 )
...
better amd after set compute profile
2024-11-11 20:05:13 -05:00
chenyu
773d5b60bf
beam benchmark tests ( #7638 )
...
* beam benchmark tests
* lower AMD number somehow
* less flaky
2024-11-11 18:11:18 -05:00