Commit Graph

  • d62c733b3f more correct and smaller George Hotz 2025-11-07 15:44:19 -08:00
  • a6e1cc3c65 just syntax George Hotz 2025-11-07 15:40:38 -08:00
  • 6f2dd96df9 make _pool simpler George Hotz 2025-11-07 15:33:55 -08:00
  • 70054cdb14 move backward cast to broadcasted, expand to mixins (#13156) George Hotz 2025-11-07 15:07:47 -08:00
  • f2519ea0ba shrink_to mixin (#13155) George Hotz 2025-11-07 11:46:24 -08:00
  • 0f9d7f650d whisper: fix oob, explicit dtype (#13144) C T 2025-11-07 19:55:01 +02:00
  • 3ecff3a8da Fix dim splitting bug for len(dim) == len(limited) case (#13142) Ahmed Harmouche 2025-11-07 18:31:06 +01:00
  • b8e48effcb device: no compilers message with reasons (#13146) nimlgen 2025-11-07 23:01:45 +08:00
  • 35e461ef69 hcq: use exception group (#12616) nimlgen 2025-11-07 21:23:12 +08:00
  • 10dc8335d2 tinygpu: fix teardown crash (#13143) nimlgen 2025-11-07 19:52:54 +08:00
  • d4a216d7d9 viz: display compiler errors (#13141) qazal 2025-11-07 18:09:50 +08:00
  • 7e94369464 add helper for test_timing custom ops (#13140) qazal 2025-11-07 17:13:55 +08:00
  • 95620426d5 tinygpu: unmap dma when client closed (#13129) nimlgen 2025-11-07 16:08:43 +08:00
  • 500d7661fa feat: show range len on index in viz (#13139) wozeparrot 2025-11-06 23:21:27 -08:00
  • bb6364d7c7 tuplize from linearizer behind flag (#13136) George Hotz 2025-11-06 20:15:03 -08:00
  • bb8cf948f2 variation of (x%c)+(x//c)*c = x (#13135) chenyu 2025-11-06 18:53:28 -05:00
  • 42b34cf83d bottom up linearizer (#13133) George Hotz 2025-11-06 15:30:32 -08:00
  • e0d828dba8 little cleanups George Hotz 2025-11-06 13:58:19 -08:00
  • bfb0c0391f test custom eye function (#13134) chenyu 2025-11-06 14:51:55 -05:00
  • f215f84241 use end range count in priority (branch: er_prioity) George Hotz 2025-11-06 10:17:35 -08:00
  • 290441dd44 do loads early (#13131) George Hotz 2025-11-06 09:57:09 -08:00
  • 097264853d very simple priority (#13130) George Hotz 2025-11-06 09:25:28 -08:00
  • 07b415e831 fixup op order (#13128) George Hotz 2025-11-06 08:50:04 -08:00
  • 6809ff8fe1 simplify priority (branch: simple_priority) George Hotz 2025-11-06 07:57:59 -08:00
  • b9b68bf437 amd: add kern to sqtt event (#13126) nimlgen 2025-11-06 22:02:02 +08:00
  • 88245d6579 qol improvements to sqtt decoder and timing tests (#13125) qazal 2025-11-06 20:51:30 +08:00
  • dafdb4bfb1 test hcq open with pytest (#13124) nimlgen 2025-11-06 20:09:51 +08:00
  • 05e2ff4d87 system: fix flock on pcidevs (#13123) nimlgen 2025-11-06 19:02:13 +08:00
  • 3126c89b84 viz: visible horizontal scrollbar in long texts (#13122) qazal 2025-11-06 17:23:02 +08:00
  • 91cc773397 add run count to toposort (#13119) George Hotz 2025-11-05 22:29:34 -08:00
  • dca7fb0a49 qcom: make priority configurable (#13120) Adeeb Shihadeh 2025-11-05 22:27:54 -08:00
  • b2bb3af12a make range_color work in VIZ (#13121) qazal 2025-11-06 14:26:48 +08:00
  • f33c182393 test custom qkv kernel (#13118) chenyu 2025-11-05 23:32:13 -05:00
  • c65e6d8887 add ranges to print_uops (#13116) George Hotz 2025-11-05 20:26:56 -08:00
  • 9b2b535fa4 fix issue with multi flip (#13115) George Hotz 2025-11-05 15:28:50 -08:00
  • 4027eef264 fix test warnings (#13114) George Hotz 2025-11-05 15:06:29 -08:00
  • bcfe42937f move permute/flip/shrink to mixins (#13113) George Hotz 2025-11-05 14:14:15 -08:00
  • 2d4f01fda0 move mixins to mixin dir (#13105) George Hotz 2025-11-05 10:18:33 -08:00
  • 52f0081e77 use where instead of mul in Embedding (#13112) chenyu 2025-11-05 12:49:01 -05:00
  • edc4e1aede ignore trailing nops in llvm-objdump output (#13110) b1tg 2025-11-06 01:10:51 +08:00
  • 03ee0cfe45 minor fast_idiv cleanup [pr] (#13109) chenyu 2025-11-05 11:44:36 -05:00
  • 18d4ecc1f3 lower nv test_gemm_4096 target (#13107) chenyu 2025-11-05 11:05:16 -05:00
  • eff80beeed amd: props in device not sqtt (#13106) nimlgen 2025-11-05 23:43:20 +08:00
  • 757ceab2a2 system: allow using vidmem for uc mem (#13104) nimlgen 2025-11-05 19:12:59 +08:00
  • 8119d9f082 sqtt: decode each instruction exec (#13093) qazal 2025-11-05 17:30:27 +08:00
  • 54141e9cb9 DISABLE_COMPILER_CACHE=1 in speed_v_theoretical (#13096) chenyu 2025-11-04 11:28:18 -05:00
  • 1c9f720654 remove unused type ignore [pr] (#13095) chenyu 2025-11-04 10:08:07 -05:00
  • c857dc5af0 autogen: try/except in try_dlopen (#13094) nimlgen 2025-11-04 22:51:53 +08:00
  • eaf7cbc178 amd: flush sqtt after each kernel (#13092) nimlgen 2025-11-04 22:12:48 +08:00
  • 96417665e8 show sqtt decoder errs in viz (#13088) qazal 2025-11-04 22:05:06 +08:00
  • 49191ada77 roc: install sqtt decoder (#13091) nimlgen 2025-11-04 18:56:01 +08:00
  • 16f1f644ba amd: remove sqtt=2 (#13090) nimlgen 2025-11-04 18:29:24 +08:00
  • 2e97eaa866 roc: no nullptr when no wave instructions (#13087) nimlgen 2025-11-04 17:32:14 +08:00
  • 9c00c0688a tk fa: use 16x64 tiles (#13086) wozeparrot 2025-11-03 18:25:38 -08:00
  • 4ed0f216b5 fix: make max_matmul run again (#13085) wozeparrot 2025-11-03 18:09:09 -08:00
  • 4cde0d87d9 this works (branch: no_sip_usbgpu) George Hotz 2025-11-03 16:55:43 -08:00
  • 56825543e9 Merge branch 'master' into no_sip_usbgpu George Hotz 2025-11-04 08:52:03 +08:00
  • ca17718b6d remove symbolic_flat (#13083) chenyu 2025-11-03 17:25:21 -05:00
  • fda720e013 simpler _is_balanced [pr] (#13082) chenyu 2025-11-03 16:47:14 -05:00
  • ddf01fdb15 revert mlperf.yml setting (#13080) chenyu 2025-11-03 15:24:13 -05:00
  • 6df34a5887 lint sqtt parser with mypy (#13079) qazal 2025-11-04 00:53:59 +08:00
  • 2d2040bc92 viz: tabulate sqtt (#13078) qazal 2025-11-04 00:03:15 +08:00
  • dfde3f54d9 rocprof: use llvm disasm (#13077) nimlgen 2025-11-03 23:58:58 +08:00
  • 27d42fd575 sqtt decoder print behind DEBUG>=5 (#13076) qazal 2025-11-03 23:20:03 +08:00
  • 416b15cc59 improve uop matmul syntax (#13074) George Hotz 2025-11-03 21:34:26 +08:00
  • 08855c162b amd: correct sqtt_read for several xccs (#13075) nimlgen 2025-11-03 19:59:56 +08:00
  • 1c0d4f1cd2 viz: counters loader (#12987) qazal 2025-11-03 19:42:36 +08:00
  • 1e3d6e49a6 index slicing + allclose (#13071) George Hotz 2025-11-03 13:01:48 +08:00
  • 6c7a12f21c Revert "slicing + allclose" George Hotz 2025-11-03 12:05:44 +08:00
  • c9a1e35b1e slicing + allclose George Hotz 2025-11-03 12:00:45 +08:00
  • a317d6e625 extra/amdpci/setup_python_cap.sh (#13070) chenyu 2025-11-02 19:19:36 -05:00
  • ad501ce50a mlperf cron install tqdm (#13069) chenyu 2025-11-02 18:09:27 -05:00
  • 2c8d619147 mlperf cron install influxdb3-python (#13068) chenyu 2025-11-02 17:55:40 -05:00
  • 4c22f089fc mlperf cron install tensorflow try 2 (#13067) chenyu 2025-11-02 17:11:01 -05:00
  • c58cf91850 mlperf cron install tensorflow (#13066) chenyu 2025-11-02 16:48:05 -05:00
  • 74db65cf72 update mlperf bert LOGMLPERF (#13065) chenyu 2025-11-02 15:26:37 -05:00
  • b18293de96 train bert in mlperf cron (#13064) chenyu 2025-11-02 15:04:02 -05:00
  • be0028d3ce amd: universal set_grbm (#13062) nimlgen 2025-11-03 03:35:55 +08:00
  • 37a730abce amd: fix pmc sq gfx11+ (#13058) nimlgen 2025-11-02 21:56:47 +08:00
  • 24054bb655 viz: check overlay width after layout (#13060) qazal 2025-11-02 21:47:58 +08:00
  • 962d980919 fuse hasn't worked since rangeify, remove it (#13057) George Hotz 2025-11-02 14:01:52 +08:00
  • 036ee9f84c Self type + mixins (#13056) George Hotz 2025-11-02 13:30:01 +08:00
  • 6ffd33e1e5 fix later (branch: self_type) George Hotz 2025-11-02 13:18:41 +08:00
  • 4198efb8bc mixin George Hotz 2025-11-02 13:05:26 +08:00
  • 13e8914deb use Self type George Hotz 2025-11-02 13:00:35 +08:00
  • 8cbef912d2 move reshape to MathTraits (#13054) George Hotz 2025-11-02 12:56:15 +08:00
  • 1ff341bae5 python 3.11 is now required (#13055) George Hotz 2025-11-02 12:55:40 +08:00
  • 267be7fc5e fp16 acc George Hotz 2025-11-02 12:53:04 +08:00
  • 1255eeec6d confirm it works in amd_uop_matmul (branch: reshape_trait) George Hotz 2025-11-02 12:48:39 +08:00
  • a1f88fea37 move reshape to MathTraits George Hotz 2025-11-02 12:39:56 +08:00
  • 8206eab4fc fix: tk fa 4 workers (#13052) wozeparrot 2025-11-01 16:41:29 -07:00
  • 885b6dea9e multiple reduce range arange folding (#13047) Sieds Lykles 2025-11-01 22:11:26 +01:00
  • f97fb703c8 catch group error in matvec heuristic (#13051) Sieds Lykles 2025-11-01 22:09:35 +01:00
  • ecb8565f67 Revert "Better cleanup of arange bufferize (#13046)" (#13048) Sieds Lykles 2025-11-01 18:09:37 +01:00
  • 69ae7c2b3c Revert "Better cleanup of arange bufferize (#13046)" (branch: revert-13046-better_cleanup_arange_buffers) Sieds Lykles 2025-11-01 18:07:40 +01:00
  • c99b7dfd4a Better cleanup of arange bufferize (#13046) Sieds Lykles 2025-11-01 16:16:31 +01:00
  • 051aab5481 open viz with sqtt flags (#13001) nimlgen 2025-11-01 22:48:17 +08:00
  • 2db57f3a97 amd: better msg when out of perf regs (#13042) nimlgen 2025-11-01 22:47:50 +08:00
  • bebec73471 write custom_sum with set and after (#13045) chenyu 2025-11-01 10:45:30 -04:00
  • e98506735b add CONTRACT support to UOp programs (#13043) George Hotz 2025-11-01 19:11:32 +08:00