Commit Graph

  • 55845f7de7 schedule: cache unbinds for consistent cache keys (#13664) George Hotz 2025-12-12 17:27:42 -05:00
  • 27845353a0 add CLAUDE.md George Hotz 2025-12-12 16:50:11 -05:00
  • 8c87a0bf8d Revert "schedule: cache unbinds for consistent cache keys (#13662)" George Hotz 2025-12-12 16:49:50 -05:00
  • 443b7fea80 Revert "add notes about jit to claude.md" George Hotz 2025-12-12 16:49:48 -05:00
  • 429f82e6a9 add notes about jit to claude.md George Hotz 2025-12-12 16:48:23 -05:00
  • af86cae10c schedule: cache unbinds for consistent cache keys (#13662) George Hotz 2025-12-12 16:40:10 -05:00
  • fcaed1e1dd don't use empty in bert fake data (#13661) chenyu 2025-12-12 15:59:50 -05:00
  • 49f70088a8 qwen work [llm_qwen] George Hotz 2025-12-12 15:20:15 -05:00
  • 316da9f7ff llm: add created/model fields, non-streaming support, and tests (#13660) George Hotz 2025-12-12 14:50:36 -05:00
  • 9604773e45 add model choosing support to llm (#13656) George Hotz 2025-12-12 11:22:11 -05:00
  • e36385e570 am: support xgmi systems (#13659) nimlgen 2025-12-12 18:55:45 +03:00
  • b4796e2d32 amd: set queue prio to normal (#13658) nimlgen 2025-12-12 18:25:41 +03:00
  • a1de7787bf am: xcc/inst support (#13657) nimlgen 2025-12-12 17:40:42 +03:00
  • f0fa9bcd98 openai api for llm (#13648) George Hotz 2025-12-12 08:25:33 -05:00
  • 93ad1f7732 viz: readable pmc print, share unpacker with tests (#13655) qazal 2025-12-12 06:29:59 -05:00
  • 760e508c3a autogen: no deep walk (#13654) Christopher Milan 2025-12-11 22:04:35 -08:00
  • 8f60b8dd1e fix: cast on transpose (#13653) wozeparrot 2025-12-11 21:03:49 -08:00
  • 950d8de00e automatically inline anonymous (#13652) Christopher Milan 2025-12-11 21:02:44 -08:00
  • 01e9ad0d52 clean up bert next_data (#13650) chenyu 2025-12-11 22:56:28 -05:00
  • ab2220b834 Handle missing bfloat16 natives on CPU architectures (#13553) Jakob Sachs 2025-12-11 21:38:43 +01:00
  • cbae33003d ci: add usb4 (#13643) nimlgen 2025-12-11 19:41:41 +03:00
  • 03600aef1e failed test case when init jit with empty inputs (#13641) chenyu 2025-12-10 22:03:06 -05:00
  • 51f3c9f615 am: use va_base as base (#13640) nimlgen 2025-12-10 21:09:35 +03:00
  • 5034c6fb37 reenable FREE_INTERMEDIATE for bert (#13639) chenyu 2025-12-10 12:08:09 -05:00
  • be6d538351 viz: add kernel walltime to pmc scoreboard (#13638) qazal 2025-12-10 07:16:42 -05:00
  • 1666c4aaab viz: fix counter names ordering (#13637) qazal 2025-12-10 04:05:27 -05:00
  • c801bb7054 viz: show all kernel pmcs (#13635) qazal 2025-12-09 18:16:02 -05:00
  • 4854a0c02c fix: getattr returns AttributeError not ImportError when missing (#13633) wozeparrot 2025-12-09 14:26:54 -08:00
  • 016a59cafa remove contiguous and use where in EmbeddingBert (#13632) chenyu 2025-12-09 15:49:21 -05:00
  • ddecba300f amd: use getattr for autogen (#13630) nimlgen 2025-12-09 20:36:26 +03:00
  • 76d465dbc3 optim empty shard #13513 (#13598) Nino Risteski 2025-12-09 18:28:36 +01:00
  • 47a170be2e test: enable cummax scalar IndexError test (#13625) ayanhan 2025-12-10 02:25:56 +09:00
  • 9eae9dc3be regen smu_v13 with stdint (#13631) Christopher Milan 2025-12-09 09:20:01 -08:00
  • 7cd8852f60 autogen: do no return tuples (#13629) nimlgen 2025-12-09 20:08:13 +03:00
  • 9e484b5b1c hcq: check size is None, do not read the whole size for 0s (#13628) nimlgen 2025-12-09 19:37:44 +03:00
  • 1329033b8c am: fix hot-queue restarts, only dequeue (#13627) nimlgen 2025-12-09 19:37:21 +03:00
  • b07839493d proclogs with xccs (#13626) nimlgen 2025-12-09 16:46:08 +03:00
  • 2c333818f4 simplify UOp stringifier [pr] (#13618) qazal 2025-12-09 05:06:16 +08:00
  • 2471b49e45 minor bert / llama change from grad acc branch (#13622) chenyu 2025-12-08 16:04:14 -05:00
  • cb3d756547 NAK compile-only test (#13621) Christopher Milan 2025-12-08 12:53:46 -08:00
  • a4c3d48aa9 compile-only test for IR3 actually works (#13619) Christopher Milan 2025-12-08 12:07:49 -08:00
  • a17077d1d9 skip test_double_assign in CI LVP (#13620) Christopher Milan 2025-12-08 11:54:02 -08:00
  • 1c16b6e082 Mesa: freedreno (#12746) Christopher Milan 2025-12-08 11:02:08 -08:00
  • 947c6eefc3 add Swish op (#13541) Douglas Nyberg 2025-12-08 12:41:18 -05:00
  • dd8a1a10d4 amd: tiny cleanups (#13616) nimlgen 2025-12-08 13:15:56 +03:00
  • 2b07336c82 viz server cleanups (#13615) qazal 2025-12-08 17:44:43 +08:00
  • 89c4206e22 fix: typing (#13614) wozeparrot 2025-12-07 20:10:30 -08:00
  • 572dfd5506 add static amd program info to viz (#13594) qazal 2025-12-08 04:08:14 +08:00
  • 73093314bd viz: support list of sidebar info (#13612) qazal 2025-12-08 03:09:43 +08:00
  • b981b6f89e remove old llama grad_acc (#13611) chenyu 2025-12-07 13:03:47 -05:00
  • 94d7646bdc fix anonymous struct fields (#13610) Christopher Milan 2025-12-07 09:56:38 -08:00
  • dcd50baca4 amd/nv: cleanup (#13608) nimlgen 2025-12-07 17:05:26 +03:00
  • ac5f1e115d autogen: repro for the bug (#13607) nimlgen 2025-12-07 15:51:03 +03:00
  • 4eae4b0ce6 unify adreno autogen with mesa (#13604) Christopher Milan 2025-12-06 12:17:36 -08:00
  • e20bc0b9b5 remove unused function parameter in beam search (#13602) kamilisjon 2025-12-06 18:40:47 +02:00
  • abafb96441 hcq: check all subbufs are free (#13599) nimlgen 2025-12-06 17:43:18 +03:00
  • f2b549d921 amd: refactor scratch calc (#13595) nimlgen 2025-12-06 16:41:35 +03:00
  • 4562f217e1 more bert updates (#13597) chenyu 2025-12-06 08:32:43 -05:00
  • 93f1baca77 feat: tk fa in tensor (#13580) wozeparrot 2025-12-05 14:36:29 -08:00
  • cb4c6324ef revert bert grad accumulation (#13596) chenyu 2025-12-05 17:30:08 -05:00
  • f20212e1ec refactor viz error handler (#13593) qazal 2025-12-06 02:37:39 +08:00
  • dec2f50aee reenable process replay for lvp (#13592) Christopher Milan 2025-12-05 09:36:35 -08:00
  • 0977206b1c Revert am (#13591) chenyu 2025-12-05 11:03:12 -05:00
  • ac1227575f IMAGE=1 driving_vision in benchmark (#13587) chenyu 2025-12-05 10:20:54 -05:00
  • 4d8b283b36 hotfix: amd: tmpring (#13589) nimlgen 2025-12-05 18:19:05 +03:00
  • dec2ea8a28 Revert "amd: use correct structs (#13583)" [revert-13583-rocr_desc_2] chenyu 2025-12-05 09:24:02 -05:00
  • 8c332219f9 viz: remove x86asm highlighter (#13586) qazal 2025-12-05 21:05:50 +08:00
  • 5d8726d8d2 viz: refactor to generic sidebar (#13584) qazal 2025-12-05 20:09:41 +08:00
  • d8b09eda57 amd: use correct structs (#13583) nimlgen 2025-12-05 14:46:38 +03:00
  • 6d92e9ffbf hotfix: skip process replay on lvp (#13585) qazal 2025-12-05 19:25:23 +08:00
  • 8011b953c9 mesa: remove glsl type hack (#13578) Christopher Milan 2025-12-04 21:18:56 -05:00
  • c5bd28e21d start work on schedule cache (#13529) George Hotz 2025-12-04 17:24:49 -08:00
  • 62e2fc5108 tk: global load/store rv (#13577) wozeparrot 2025-12-04 17:23:48 -08:00
  • 5cfe1698e8 autogen: strip function parameter qualifiers (#13576) Christopher Milan 2025-12-04 19:54:34 -05:00
  • d1223922b1 fixed and test is real sched_cache George Hotz 2025-12-04 16:52:11 -08:00
  • 05c4b18f91 Merge branch 'master' into sched_cache George Hotz 2025-12-04 16:46:23 -08:00
  • f21c9dbf4b enable PMC with VIZ=2 (#13575) qazal 2025-12-05 03:09:53 +08:00
  • d7caae5f61 viz: tabulate pmc (#13574) qazal 2025-12-05 03:08:39 +08:00
  • 42f6cf3a90 tighter test_real_world mem and kernel count bounds (#13573) chenyu 2025-12-04 13:35:39 -05:00
  • 89f9e1dcd5 add SGD to beautiful_mnist (#13571) chenyu 2025-12-04 12:17:29 -05:00
  • 512a8f3dd4 viz: start global memory PMC tests (#13569) qazal 2025-12-05 00:40:27 +08:00
  • 7df56d3b99 Optimizer.device is a property (#13568) chenyu 2025-12-04 09:25:15 -05:00
  • db99a61fad qcom: support cpu mappings (#13565) nimlgen 2025-12-04 14:50:46 +03:00
  • bd6a068ef7 move track_rewrites to outer schedule cache (#13556) George Hotz 2025-12-04 03:13:45 -08:00
  • 3eae146139 faster process replay [pr] (#13564) qazal 2025-12-04 18:52:07 +08:00
  • 6eab756578 fix and test loading num_batches_tracked (#13538) Rory Clear 2025-12-04 09:22:49 +00:00
  • 877a7fdd61 jit: support encdec (#13563) nimlgen 2025-12-04 11:58:34 +03:00
  • a8a62bc08e add max/min reduction support to ScatterND (#13562) Douglas Nyberg 2025-12-04 03:53:47 -05:00
  • edf929ec9d fix: add __delitem__ to Tensor with proper TypeError (#13561) ayanhan 2025-12-04 17:53:08 +09:00
  • d379cd6d92 test: remove CAPTURE_PROCESS_REPLAY=1 from tests [pr_overhead] qazal 2025-12-04 09:48:07 +02:00
  • 9411ecedc4 fix CUDA half-precision trunc() type mismatch (#13559) Douglas Nyberg 2025-12-03 21:53:16 -05:00
  • 92b40290c7 fix: add test_sum_int and remove outdated TODO in test_custom_kernel (#13560) ayanhan 2025-12-04 11:51:58 +09:00
  • 0a54434b15 mitigate ctypes c_bool bitfield bug (#13558) Christopher Milan 2025-12-03 20:46:04 -05:00
  • f58b3afeb2 Merge branch 'master' into sched_cache George Hotz 2025-12-03 16:12:44 -08:00
  • 96d16675fe update examples/gradaccum_mnist.py to use the JIT George Hotz 2025-12-03 16:11:42 -08:00
  • e0a805765e full jit George Hotz 2025-12-03 16:08:34 -08:00
  • 7c66e44454 fix JIT in examples/gradaccum_mnist.py George Hotz 2025-12-03 16:00:28 -08:00
  • e75e391ad4 Merge branch 'master' into sched_cache George Hotz 2025-12-03 15:41:31 -08:00
  • 24ca8eeaa7 small fixups from schedule_cache (#13557) George Hotz 2025-12-03 15:41:16 -08:00
  • 8c69e26d22 metadata is best effort George Hotz 2025-12-03 15:22:58 -08:00