chenyu
e18922f111
limit AND const min max to ints [pr] (#12918)
2025-10-25 16:07:52 -04:00
nimlgen
92324172be
amd: refactor usb into usbdevice (#12916)
* amd: refactor usb into usbdevice
* nu
* my bad
* ops
* my bad
2025-10-26 01:00:19 +08:00
qazal
3b192f5eac
split viz graph rendering from dag layout (#12914)
2025-10-25 15:36:44 +08:00
George Hotz
6415e3e8a7
use Ops.GROUP instead of Ops.NOOP for merging stores (#12912)
* use Ops.GROUP instead of Ops.NOOP for merging stores
* fs noop
2025-10-25 12:26:12 +08:00
George Hotz
b4f6a2c7a3
add kernel spec (#12911)
* add kernel spec
* fix kernel spec
2025-10-25 11:49:20 +08:00
George Hotz
8a941d95a4
SPEC=2 is full spec, SPEC=1 is default (#12910)
* SPEC=1 passes all tests
* just use SPEC, not __debug__
2025-10-25 11:10:43 +08:00
wozeparrot
456560c1ff
stateless tinyfs copyin (#12908)
2025-10-24 19:18:38 -07:00
wozeparrot
a5b0f57067
clean: cleanup tinyfs copyout (#12907)
2025-10-24 18:32:55 -07:00
chenyu
4b7329001d
clean up test_avg_pool3d (#12905)
2025-10-24 14:31:36 -04:00
George Hotz
6b35467f53
stores don't end ranges (#12902)
* early endrange
* bugfixes
2025-10-24 23:05:03 +08:00
nimlgen
5b5ba31a86
amd: make sqtt bufs uc (#12898)
2025-10-24 18:55:14 +08:00
Sieds Lykles
e1f8c82938
Onnx Layer/Group/RMS/Batch-Norm ReduceL2 fp32 intermediates for fp16 (#12109)
* match onnx spec
* use least_upper_dtype
* promote the square
* just cast before the square
2025-10-24 12:26:11 +02:00
George Hotz
0bde87d8d7
cleanups from flash attention branch (#12897)
2025-10-24 14:14:56 +08:00
wozeparrot
9dac505565
variable bs keccak (#10731)
2025-10-23 14:10:21 -07:00
chenyu
154b4f9f40
test FUSE_OPTIM=1 test/test_optim.py (#12895)
2025-10-23 15:54:27 -04:00
chenyu
6e4ee8deea
small heuristic cleanup [pr] (#12892)
2025-10-23 10:50:15 -04:00
nimlgen
f835566e27
sqtt: correct header (#12891)
* sqtt: correct header
* f
2025-10-23 22:37:17 +08:00
Sieds Lykles
c1db62ff7c
move reduce collapse to rangeify (#12845)
2025-10-23 15:44:17 +02:00
Sieds Lykles
04b3e51f1b
remove old reduce collapse rule (#12889)
* comment this out
* remove
2025-10-23 13:51:49 +02:00
qazal
cdfb8e31ae
hotfix: correct viz rewrite step counter reset (#12890)
2025-10-23 19:47:16 +08:00
George Hotz
6df19a4ac6
lil qol improvements to viz (#12887)
2025-10-23 18:41:07 +08:00
George Hotz
ff68a6263b
move locals into codegen (dedup works) (#12885)
* move locals into codegen (dedup works)
* move in optimize
2025-10-23 17:07:39 +08:00
George Hotz
ddb53d1d48
PCONTIG=3 both saves ram and flops (#12884)
* PCONTIG=3 both saves ram and flops
* group
* gate locals
* should be correct
2025-10-23 16:37:26 +08:00
qazal
2a5c22436e
remove outdated docs (#12881)
2025-10-23 12:52:36 +08:00
qazal
bcc30e5e10
viz: add linearized UOp list view (#12883)
* viz: add linearized UOp list view
* lang
2025-10-23 12:52:14 +08:00
George Hotz
e85cee0aad
flip Ops.END srcs (#12882)
* flip Ops.END srcs
* backward
* late end split
2025-10-23 12:47:50 +08:00
George Hotz
74b4cfe44b
Ops.GROUP + range check (#12880)
* simpler
* fix that
* Ops.GROUP + range check
* fix bugs
* fix linter
* fix test
2025-10-23 12:05:21 +08:00
Sieds Lykles
914defd55d
give endrange priority (#12870)
* uncomment line
* try giving endrange priority
2025-10-23 05:19:13 +02:00
qazal
2f95c10702
remu new instructions / use volatile in emulator tests (#12862)
* remu new instructions
* start moving to volatile
* test_simple works
* test_exec_mov works and lid is still here
* test_exec_cmp_vopc
* clang did s_mov_b32 exec_lo, 1
* don't hardcode v1
* support volatile in tests
* hw_test passes
* only the volatile version
* subrev saturating behavior
2025-10-23 11:13:43 +08:00
George Hotz
e718254004
simpler end (#12879)
* simpler
* fix that
2025-10-23 10:35:58 +08:00
wozeparrot
6e00dec95d
feat: pin openpilot 0.10.1 models (#12878)
2025-10-22 14:57:54 -07:00
wozeparrot
3a9aa05359
feat: extra nvcc options (#12876)
2025-10-22 13:21:11 -07:00
chenyu
f0831c8c30
add 0.10.0 to comma benchmark (#12875)
* add 0.10.0 to comma benchmark
disabled the 0.10.1 ones which are pinned to master. it does not work because benchmark uses the cached old version
* that's pinned
2025-10-22 15:18:21 -04:00
nimlgen
e7e535cd53
amd: sqtt for gfx9 (#12844)
* amd: start sqtt for gfx9
* writes something, but sometimes zeroes
* HEADER!
* w
* tiny
* mypy
2025-10-23 02:31:07 +08:00
b1tg
81108f91ee
amd tc: 16x16x32 (#12874)
* amd tc: 16x16x32
* test
* clean, test amd_cdna4
2025-10-22 13:48:01 -04:00
George Hotz
bf173c0a37
we don't support multi end yet (#12869)
2025-10-22 23:43:32 +08:00
nimlgen
a7bc0104c2
amd: clean up sqtt_stop (#12872)
2025-10-22 22:17:03 +08:00
nimlgen
b6eb9172ea
amd: fix ip offsets (#12867)
2025-10-22 20:50:18 +08:00
George Hotz
174811fc0f
hotfix: slightly looser load spec for AMD bfloat16
2025-10-22 19:55:59 +08:00
George Hotz
7762b3558b
clean up the spec (#12868)
* tighten up the spec
* move validate into a different file
* that moved to validate
* after(barr)
2025-10-22 19:50:42 +08:00
George Hotz
726988fa4b
late ifs try 2 (#12865)
* late ifs try 2
* fix image
* fix that test
* panic
* ptx fixups
* preserve toposort
* those pass locally
* Revert "those pass locally"
This reverts commit 063409f828.
* no ls
* make that explicit
2025-10-22 18:49:27 +08:00
George Hotz
6abe90fb7c
fix linearizer non-determinism (#12866)
2025-10-22 17:51:35 +08:00
qazal
cebc2b5721
cleanup viz profiler metadata ui (#12860)
* cleanup viz profiler metadata ui
* text
* select over .args
* space
2025-10-22 17:31:12 +08:00
Sieds Lykles
8d0256c46b
Move gate to load for loaded index (#12861)
* change condition
* change test to better represent how the uop looks irl
2025-10-22 09:53:07 +02:00
chenyu
6d86e962c7
update ASSERT_MIN_STEP_TIME (#12857)
0.10.1 driving_policy is good now, still need driving_vision and dmonitoring to be fast
2025-10-21 22:46:07 -04:00
George Hotz
92778c7a8b
rename opts to ren, add store ranges back (#12856)
* rename opts to ren
* fix docs and bring store back
2025-10-22 09:15:38 +08:00
chenyu
c5cee74706
remove BLOCK_REORDER (#12854)
not used
2025-10-21 19:10:14 -04:00
chenyu
0b673eddec
simpler newton_schulz transpose (#12853)
2025-10-21 17:21:45 -04:00
b1tg
60d7e232f2
cuda fp8 (#12782)
* cuda fp8
* tensor core
* tc test
* clean
* clean pm
2025-10-21 15:05:25 -04:00
Harald Schäfer
587ccc0e5c
compile3: make selftests opt-in (#12851)
2025-10-21 11:32:27 -07:00