Commit Graph

  • 94701d4838 clean up divide_exact order [pr] (#12919) chenyu 2025-10-25 18:47:57 -04:00
  • e18922f111 limit AND const min max to ints [pr] (#12918) chenyu 2025-10-25 16:07:52 -04:00
  • 92324172be amd: refactor usb into usbdevice (#12916) nimlgen 2025-10-26 01:00:19 +08:00
  • 3b192f5eac split viz graph rendering from dag layout (#12914) qazal 2025-10-25 15:36:44 +08:00
  • fbc7f4c12a only floats relu_pattern George Hotz 2025-10-25 13:17:33 +08:00
  • 2fec7ed6df lil symbolic pattern for relu George Hotz 2025-10-25 13:12:21 +08:00
  • 6415e3e8a7 use Ops.GROUP instead of Ops.NOOP for merging stores (#12912) George Hotz 2025-10-25 12:26:12 +08:00
  • b4f6a2c7a3 add kernel spec (#12911) George Hotz 2025-10-25 11:49:20 +08:00
  • 8a941d95a4 SPEC=2 is full spec, SPEC=1 is default (#12910) George Hotz 2025-10-25 11:10:43 +08:00
  • 5c20955c8e just use SPEC, not __debug__ full_spec George Hotz 2025-10-25 10:34:01 +08:00
  • 9cdd284008 SPEC=1 passes all tests George Hotz 2025-10-25 10:31:38 +08:00
  • 456560c1ff stateless tinyfs copyin (#12908) wozeparrot 2025-10-24 19:18:38 -07:00
  • df181f3301 kitten matmul is running kitten_matmul George Hotz 2025-10-25 01:40:09 +00:00
  • a5b0f57067 clean: cleanup tinyfs copyout (#12907) wozeparrot 2025-10-24 18:32:55 -07:00
  • 4b7329001d clean up test_avg_pool3d (#12905) chenyu 2025-10-24 14:31:36 -04:00
  • 6b35467f53 stores don't end ranges (#12902) George Hotz 2025-10-24 23:05:03 +08:00
  • 5b5ba31a86 amd: make sqtt bufs uc (#12898) nimlgen 2025-10-24 18:55:14 +08:00
  • e1f8c82938 Onnx Layer/Group/RMS/Batch-Norm ReduceL2 fp32 intermediates for fp16 (#12109) Sieds Lykles 2025-10-24 12:26:11 +02:00
  • 0bde87d8d7 cleanups from flash attention branch (#12897) George Hotz 2025-10-24 14:14:56 +08:00
  • cdef359305 cleanups from flash attention branch fa_cleanups George Hotz 2025-10-24 13:15:31 +08:00
  • 9dac505565 variable bs keccak (#10731) wozeparrot 2025-10-23 14:10:21 -07:00
  • 154b4f9f40 test FUSE_OPTIM=1 test/test_optim.py (#12895) chenyu 2025-10-23 15:54:27 -04:00
  • 6e4ee8deea small heuristic cleanup [pr] (#12892) chenyu 2025-10-23 10:50:15 -04:00
  • f835566e27 sqtt: correct header (#12891) nimlgen 2025-10-23 22:37:17 +08:00
  • c1db62ff7c move reduce collapse to rangeify (#12845) Sieds Lykles 2025-10-23 15:44:17 +02:00
  • 04b3e51f1b remove old reduce collapse rule (#12889) Sieds Lykles 2025-10-23 13:51:49 +02:00
  • cdfb8e31ae hotfix: correct viz rewrite step counter reset (#12890) qazal 2025-10-23 19:47:16 +08:00
  • 6df19a4ac6 lil qol improvements to viz (#12887) George Hotz 2025-10-23 18:41:07 +08:00
  • ff68a6263b move locals into codegen (dedup works) (#12885) George Hotz 2025-10-23 17:07:39 +08:00
  • 6c66cbe5a1 move in optimize late_Locals George Hotz 2025-10-23 16:47:45 +08:00
  • ad6dd64fee move locals into codegen (dedup works) George Hotz 2025-10-23 16:45:59 +08:00
  • ddb53d1d48 PCONTIG=3 both saves ram and flops (#12884) George Hotz 2025-10-23 16:37:26 +08:00
  • 2a5c22436e remove outdated docs (#12881) qazal 2025-10-23 12:52:36 +08:00
  • bcc30e5e10 viz: add linearized UOp list view (#12883) qazal 2025-10-23 12:52:14 +08:00
  • e85cee0aad flip Ops.END srcs (#12882) George Hotz 2025-10-23 12:47:50 +08:00
  • 74b4cfe44b Ops.GROUP + range check (#12880) George Hotz 2025-10-23 12:05:21 +08:00
  • 914defd55d give endrange priority (#12870) Sieds Lykles 2025-10-23 05:19:13 +02:00
  • 2f95c10702 remu new instructions / use volatile in emulator tests (#12862) qazal 2025-10-23 11:13:43 +08:00
  • e718254004 simpler end (#12879) George Hotz 2025-10-23 10:35:58 +08:00
  • 076bfa50e3 fix that simpler_end George Hotz 2025-10-23 10:24:25 +08:00
  • f4cea6a403 simpler George Hotz 2025-10-23 10:21:27 +08:00
  • 6e00dec95d feat: pin openpilot 0.10.1 models (#12878) wozeparrot 2025-10-22 14:57:54 -07:00
  • 3a9aa05359 feat: extra nvcc options (#12876) wozeparrot 2025-10-22 13:21:11 -07:00
  • f0831c8c30 add 0.10.0 to comma benchmark (#12875) chenyu 2025-10-22 15:18:21 -04:00
  • e7e535cd53 amd: sqtt for gfx9 (#12844) nimlgen 2025-10-23 02:31:07 +08:00
  • 81108f91ee amd tc: 16x16x32 (#12874) b1tg 2025-10-23 01:48:01 +08:00
  • bf173c0a37 we don't support multi end yet (#12869) George Hotz 2025-10-22 23:43:32 +08:00
  • a7bc0104c2 amd: clean up sqtt_stop (#12872) nimlgen 2025-10-22 22:17:03 +08:00
  • b6eb9172ea amd: fix ip offsets (#12867) nimlgen 2025-10-22 20:50:18 +08:00
  • 93aa420f3b we don't support multi end yet no_merge_ends George Hotz 2025-10-22 19:59:53 +08:00
  • 174811fc0f hotfix: slightly looser load spec for AMD bfloat16 George Hotz 2025-10-22 19:54:41 +08:00
  • abb4d30476 hotfix: slightly looser load spec for AMD bfloat16 clean_spec George Hotz 2025-10-22 19:54:41 +08:00
  • 7762b3558b clean up the spec (#12868) George Hotz 2025-10-22 19:50:42 +08:00
  • eeae2f5768 after(barr) George Hotz 2025-10-22 19:40:13 +08:00
  • 6c8124bf80 that moved to validate George Hotz 2025-10-22 19:32:32 +08:00
  • 7c968898ae move validate into a different file George Hotz 2025-10-22 19:28:35 +08:00
  • 7490b2553e tighten up the spec George Hotz 2025-10-22 19:24:23 +08:00
  • 726988fa4b late ifs try 2 (#12865) George Hotz 2025-10-22 18:49:27 +08:00
  • 6abe90fb7c fix linearizer non-determinism (#12866) George Hotz 2025-10-22 17:51:35 +08:00
  • cebc2b5721 cleanup viz profiler metadata ui (#12860) qazal 2025-10-22 17:31:12 +08:00
  • 8d0256c46b Move gate to load for loaded index (#12861) Sieds Lykles 2025-10-22 09:53:07 +02:00
  • 84dde23f57 tiny gpu driver building without SIP George Hotz 2025-10-22 11:10:43 +08:00
  • 6d86e962c7 update ASSERT_MIN_STEP_TIME (#12857) chenyu 2025-10-21 22:46:07 -04:00
  • 92778c7a8b rename opts to ren, add store ranges back (#12856) George Hotz 2025-10-22 09:15:38 +08:00
  • c5cee74706 remove BLOCK_REORDER (#12854) chenyu 2025-10-21 19:10:14 -04:00
  • 0b673eddec simpler newton_schulz transpose (#12853) chenyu 2025-10-21 17:21:45 -04:00
  • 60d7e232f2 cuda fp8 (#12782) b1tg 2025-10-22 03:05:25 +08:00
  • 587ccc0e5c compile3: make selftests opt-in (#12851) Harald Schäfer 2025-10-21 11:32:27 -07:00
  • c3149c618a feat: nvcc compiler (#12852) wozeparrot 2025-10-21 11:31:23 -07:00
  • 8baa61bd67 use torch 2.9 and its Muon in test (#12773) chenyu 2025-10-21 13:35:17 -04:00
  • f51f9aaa16 muon ns_params -> ns_coefficients (#12850) chenyu 2025-10-21 12:35:52 -04:00
  • 62e7b8b870 feat: just use compile3 (#12849) wozeparrot 2025-10-21 07:56:50 -07:00
  • c7336c3e31 amd: sqtt for aql (#12846) nimlgen 2025-10-21 22:35:01 +08:00
  • 8960ac54f3 remove RewriteStep premature optimization (#12840) George Hotz 2025-10-21 21:45:20 +08:00
  • 7f798a9630 Cleanup const buffers (#12829) Sieds Lykles 2025-10-21 14:53:49 +02:00
  • 1ad6598963 amd: trace all instructions (#12831) nimlgen 2025-10-21 20:52:24 +08:00
  • cdc72556a1 no more brew (#12839) Christopher Milan 2025-10-21 08:12:46 -04:00
  • 20a232f1c5 bugfixes from multioutput + PCONTIG=3 for fa bw memory fix (#12837) George Hotz 2025-10-21 19:21:02 +08:00
  • 0435d31f1c viz: generic back button functionality (#12838) qazal 2025-10-21 18:52:00 +08:00
  • c9f1ed10c3 gate MULTIOUTPUT multioutput George Hotz 2025-10-21 18:25:38 +08:00
  • 483cd44cbf Merge branch 'master' into multioutput George Hotz 2025-10-21 18:16:34 +08:00
  • 7d9551ce2e move to late/control_flow.py (#12835) George Hotz 2025-10-21 18:15:06 +08:00
  • d711a4b933 delete old linearizer (#12834) George Hotz 2025-10-21 17:52:18 +08:00
  • 40633ab34d list buffer args to kernel in profiler (#12826) qazal 2025-10-21 17:51:36 +08:00
  • c780cd9abb new linearizer with early endrange (#12823) George Hotz 2025-10-21 17:37:48 +08:00
  • d59d4cdbe4 lil less is okay George Hotz 2025-10-21 17:09:44 +08:00
  • 32af1ff84b viz graph drawing small cleanups (#12830) qazal 2025-10-21 15:51:32 +08:00
  • 367fbabc30 remove Ops.SUBSTITUTE (#12827) Sieds Lykles 2025-10-21 08:19:42 +02:00
  • 57f6b6f229 style view codegen like a link in profiler (#12825) qazal 2025-10-21 12:15:13 +08:00
  • 154cdfe46d viz state cleanups (#12821) qazal 2025-10-21 11:44:51 +08:00
  • a71a41f6d1 rename Ops.ENDRANGE -> Ops.END (#12824) George Hotz 2025-10-21 11:32:18 +08:00
  • 8521fd5263 viz: hierarchical rewrites (#12805) qazal 2025-10-21 10:55:41 +08:00
  • df2f8b9295 use after on locals (#12815) George Hotz 2025-10-21 10:29:12 +08:00
  • 68c045bf0a NIR: Check for brew packages tinymesa and tinymesa_cpu (#12739) Christopher Milan 2025-10-20 21:38:43 -04:00
  • 990e8b97ee feat: log openpilot 0.10.1 times (#12816) wozeparrot 2025-10-20 18:30:34 -07:00
  • 565a7a6218 num_batches_tracked has shape () (#12820) George Hotz 2025-10-21 09:22:39 +08:00
  • 87affa8661 num_batches_tracked has shape () num_batches_tracked_shape George Hotz 2025-10-21 09:10:35 +08:00
  • 25beea5769 hotfix: suppress_finalizing on device __del__ George Hotz 2025-10-21 09:04:36 +08:00
  • c7c59e6dd7 unused UPat.or_broadcasted and GroupOp.Block [pr] (#12819) chenyu 2025-10-20 12:24:58 -04:00
  • e284f6325a llvm: fix compile key for different processors (#12812) nimlgen 2025-10-20 19:46:48 +08:00