423 Commits

Author SHA1 Message Date
George Hotz
744af193f0 remove ScheduleItem and merge it with ExecItem (#13759)
* remove ExecItem and merge it with ScheduleItem

* less diff

* fix issues

* min diff

* don't change bufs in _lower

* min diff

* update

* revert

* fixes

* diff
2025-12-19 17:04:24 -04:00
Roelof van Dijk
e329baffa7 fix cifar while keeping openpilot fused (#13528)
* this works

* test now passes
2025-12-02 12:05:56 -08:00
qazal
366badaa68 require renderer argument in get_program, removes device opening in process replay [pr] (#13524) 2025-12-03 02:05:31 +08:00
George Hotz
567066f51f tests for cast there and back (#13195)
* fix cast folding in llama

* dtypes that work everywhere

* Skip test_cast_there_and_back for backend casts

Skip test due to backend casting issues.
2025-11-14 16:56:09 -08:00
chenyu
3f939f3d3c update pm_simplify_valid (#13241)
* update pm_simplify_valid

fixed openpilot conv regression

* IMAGE training is broken
2025-11-12 19:40:02 -05:00
Ahmed Harmouche
3ecff3a8da Fix dim splitting bug for len(dim) == len(limited) case (#13142)
* Fix gpudims bug on webgpu

* Fix split dim bug

* Remove webgpu_bug from examples

* Add test for shape correctness

* Fix 3D indexing

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-11-07 12:31:06 -05:00
George Hotz
42b34cf83d bottom up linearizer (#13133)
* bottom up linearizer

* late stores

* more complete

* remove broken heuristic

* upcast size

* opt

* more conservative

* it needs that

* disable opencl half on QCOM

* fix

* make that a real test

* cpu test okay

* ptx skip

* end is after the range
2025-11-06 15:30:32 -08:00
Sieds Lykles
e110f4632a split cat (on cpu) (#12864)
* split ranges but only on cpu

* except KernelOptError for threads

* use GROUP and END

* no more flatten_range needed

* remove noop end

* always process replay for openpilot

* update test

* skip test

* fix in outs calculation

With the new linearizer the toposort is a problem; this matches the spec now

* undo that
2025-10-28 07:55:19 +01:00
George Hotz
726988fa4b late ifs try 2 (#12865)
* late ifs try 2

* fix image

* fix that test

* panic

* ptx fixups

* preserve toposort

* those pass locally

* Revert "those pass locally"

This reverts commit 063409f828.

* no ls

* make that explicit
2025-10-22 18:49:27 +08:00
George Hotz
c780cd9abb new linearizer with early endrange (#12823)
* new linearizer with early endrange

* cleanups

* second stage removal

* not store

* do that later

* end cleanup

* fix globals

* end

* multi end

* fix ends earlier

* work

* do_merge_ends

* mini change

* range_gate

* fix cpu

* test fixups

* ranges on index

* not for ptx
2025-10-21 17:37:48 +08:00
George Hotz
a71a41f6d1 rename Ops.ENDRANGE -> Ops.END (#12824) 2025-10-21 11:32:18 +08:00
Sieds Lykles
e537e895b1 drop unused invalid conditions (#12635)
* drop where conditions if the ranges are not used inside the index

* remove allow_any_len
2025-10-13 10:52:21 +02:00
Sieds Lykles
772a8dfe31 reshape uses valid when simplifying (#12597)
* reshape uses valid when simplifying

* try with IGNORE_OOB=0

* is it this test?

* skipif gpuocelot
2025-10-11 17:02:54 +02:00
chenyu
cf8232ec6a clean up more RANGEIFY flag (#12556) 2025-10-09 03:06:48 -04:00
qazal
bb5671a837 some more ops.py cleanups (#12525)
* remove GroupOp.Meta and st_arg

* inline axis_arg

* only allow .buffer on reshapes (or the buffer)

* gate is the other way

* still want can_pad?

* use op_in_backward_slice_with_self

* .buffer is recursive

* lint

* pathlib there
2025-10-09 06:06:44 +03:00
qazal
b6835f4134 remove Ops.VIEW and related UOp methods (#12522)
* remove Ops.VIEW and related UOp methods

* update abstractions2.py

* no ShapeTrackers in abstractions2.py

* it's a size 1
2025-10-08 14:47:02 +03:00
qazal
6f26603f06 delete swizzler.py (#12518)
* delete swizzler

* remove merge_views tests

* don't need rewrites_for_views

* apply_rewrites
2025-10-08 13:02:34 +03:00
chenyu
bf99de7b1e update a few more tests for RANGEIFY (#12434) 2025-10-03 00:16:58 -04:00
George Hotz
44558a37f7 fix some rangeify tests (#12370)
* fix bad range merges

* fix rng

* fix uop gc

* fix some rangeify tests

* now that needs rangeify 2 also
2025-09-30 20:12:08 +08:00
qazal
1591e4f66b update outbufs selection in test_linearizer [pr] (#12166) 2025-09-14 13:46:49 +03:00
nimlgen
551560b87c do not use getenv('PTX') in tests (#12095)
* test without ptx

* fix tests

* fix test

* linters
2025-09-10 14:04:07 +03:00
George Hotz
12c7b1bb01 cleanup lin tests without Kernel (#12041)
* cleanup lin tests without Kernel

* no kernel.py there

* remove that test
2025-09-05 15:13:14 -07:00
George Hotz
2b5a73ac65 improve test_linearizer (#12016)
* improve test_linearizer

* tweaks

* simpler

* get_prg

* that one doesn't have to return

* fix postopt bugs

* fix rng
2025-09-04 20:44:05 -07:00
George Hotz
70ce29b630 test pyrender (#12005)
* test pyrender

* make them print

* switch to pyrendered
2025-09-04 11:48:40 -07:00
George Hotz
560df206cc split tc test (#12003)
* split tc test

* split hand coded opts

* remove some skipped tests

* skips on emulated
2025-09-04 11:47:56 -07:00
George Hotz
9dee724fc4 make EMULATE a context var (#12002)
* make EMULATE a context var

* fix test amx
2025-09-04 11:15:43 -07:00
George Hotz
09106e4aae refactor and split test_linearizer (#12001)
* refactor and split test_linearizer

* forget that file

* imports

* remove from docs

* test gen float4
2025-09-04 10:53:07 -07:00
Sieds Lykles
572a3c15c6 Move Ops.SPECIAL arg to src (#11918)
* initial moving bound to src

* arg to src

* remove import

* fixup linearizer

* arg to src

* fix test_uop_graph

* fix more tests

* fix python renderer

* get const value from const uop

* ssimplify uop estimates

* fix webgpu locals

* fix old test

* gate Ops.SPECIAL in linearizer

* use ssimplify() for local/global_size

* remove toposort gate_parents_instead_of_self

* fix rendering in comment

* cleanup

* rename and add comments

* add BottomUpGate with test
2025-09-04 09:31:44 +02:00
George Hotz
a5f2b4872a use_tensor_cores is a heuristic (#11989)
* use_tensor_cores is a heuristic

* context
2025-09-03 17:05:10 -07:00
George Hotz
63e930fec3 apply_tensor_cores is a heuristic (#11988)
* apply_tensor_cores is a heuristic

* delete extra_opts
2025-09-03 16:39:33 -07:00
George Hotz
394c2d1db1 update Kernel API in tests + move optimize_local_size (#11907) 2025-08-28 15:12:47 -07:00
George Hotz
b9b438c516 small updates from postopt (#11903)
* tests from postopt

* modernize

* skip lin tests

* that's fixed?

* skip, not failure
2025-08-28 12:34:52 -07:00
quortus
5f8fe9a331 Replace ASSIGN with STORE in test_linearizer (#11821) 2025-08-28 07:33:20 -07:00
chenyu
91a4de4ca7 fix getitem with inf in tensor (#11781) 2025-08-21 21:55:32 -04:00
George Hotz
4b3fcb4064 Revert "REDUCE_AXIS keepdim=False (#11311)" (#11718)
This reverts commit b518a7378a.
2025-08-18 13:28:53 -07:00
b1tg
b518a7378a REDUCE_AXIS keepdim=False (#11311)
* progress

* fix tests

* fix tests

* remove hack for test_symfold

* fix test_conv.py on llvm

* hack test_cache_speed

* lint

* remove hack for helper_linearizer_opt

* tests

* fix DSP

* clean up

* remove hack for kernelize.py

* hack for test/test_multitensor.py TestMultiTensor.test_matmul_shard_none

* clean

* uop.r need reshape?

* lower_store cause fail

* fix lower?

* avoid contiguous hack

* 2134

* conv2d count

* remove unused

* hack lower

* reduced and clean up

* fix TestMultiTensor.test_matmul_shard_none

* src sync + fix TestMultiTensor.test_matmul_shard_none

* remove excluded in mop

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2025-08-18 10:09:17 -07:00
George Hotz
82be8abfd2 move opt under codegen (#11569) 2025-08-07 14:19:17 -07:00
George Hotz
21570545d3 move view pushing to codegen, try 2 (#11534)
* move view pushing to codegen, try 2

* fix up some linearizer tests

* fix test search

* fix test schedule

* delete that test

* fix test arange

* fix a few tests

* update tests

* push views

* ebs cleanup

* fix local/reg

* test and lint

* fix more tests

* test cleanups

* skipped that one
2025-08-06 15:58:38 -07:00
George Hotz
6fd1332763 update some tests for less Kernel (#11543)
* update some tests for less Kernel

* get_program update
2025-08-06 14:19:59 -07:00
George Hotz
4fe11725c6 pass through sink arg, update linearizer test (#11536)
* pass through sink arg, update linearizer test

* get_program help

* bump line count

* use new api
2025-08-06 09:48:48 -07:00
chenyu
0e5d8d5c3c remove tests that used .to_uop() (#11425)
* remove tests that used .to_uop()

* import
2025-07-29 15:52:16 -04:00
George Hotz
466ab5a3f2 store/load not pass through index (#11381)
* noop

* fix noop

* store cat is NOOP

* store dtype is void

* stores aren't passed through anymore

* meh, skip those for ptx

* correct ptx skip

* hl runs
2025-07-25 21:01:47 -07:00
George Hotz
e14b4fefa5 ranges on store (#11334)
* ranges on store

* fix store spec

* fix that

* fix gates

* fix tests

* fix ptx
2025-07-22 21:00:50 -07:00
George Hotz
affd83961c small changes from define_reg (#11327)
* small changes from define_reg

* fix webgpu
2025-07-22 11:11:48 -07:00
George Hotz
3b674df34b generic changes from define_reg_2 (#11315)
* generic changes from define_reg_2

* fix for ptx

* ugh, that one
2025-07-21 15:14:06 -07:00
chenyu
54924f9969 type remove Union and Optional [pr] (#11283)
use `|` for consistency
2025-07-19 14:05:52 -04:00
chenyu
ec3efd2919 move upcast before reduce (#11250)
* move upcast before reduce

upcast goes to end of global+local+upcast

* r_196_32_4_24_8
2025-07-18 14:42:15 -04:00
chenyu
522dc72f08 remove Kernel.local_dims [pr] (#11268)
* remove Kernel.local_dims [pr]

also not needed

* fix test_matvec
2025-07-16 17:46:19 -04:00
chenyu
c8e5c4d7c3 insert_before -> insert_at [pr] (#11257)
more precise
2025-07-15 17:44:34 -04:00
chenyu
b6662096cb remove more first_reduce [pr] (#11239) 2025-07-14 19:13:44 -04:00