170 Commits

Author SHA1 Message Date
chenyu
52acadc160 consolidate IGNORE_OOB=0 tests (#13937)
add a new unit test file and add more cases
2025-12-31 15:24:20 -05:00
wozeparrot
0d55aec605 fix after end (#13542)
2025-12-02 18:42:58 -08:00
George Hotz
2da02f1ae1 add loads at the end (#12988)
* add loads at the end

* simpler

* late load

* tests passing

* fix matvec

* spec test passes

* fix where on load

* fix abs2

* fix more tests
2025-10-30 10:42:19 +08:00
Sieds Lykles
9f39f6391c shared_codegen_spec and fix index spec (#12967)
* split shared_codegen_spec and fix index

* add VCONST to program_spec and move index to shared_codegen_spec

* working ignore_oob=0

* cleanup

* fix spec

* undo that

* move barrier and special earlier

* fix more spec issues

* more updates

* remove special from program_spec

* cleanup and fixes

* move more to shared

* special is not in shared_spec

* some comments

* dont do bounds check there
2025-10-29 09:14:11 +01:00
George Hotz
907499b02c clean up GROUP/SINK (#12969)
* clean up GROUP/SINK

* fix end

* range_str color
2025-10-28 16:08:10 +08:00
George Hotz
4d817a289e simplify spec (#12958)
* simplify spec

* more
2025-10-28 09:52:32 +08:00
George Hotz
25c2da1579 check SPEC=2 in CI (#12945)
* check SPEC=2 in CI

* split SPEC=2

* fast enough
2025-10-27 21:53:57 +08:00
George Hotz
b4f6a2c7a3 add kernel spec (#12911)
* add kernel spec

* fix kernel spec
2025-10-25 11:49:20 +08:00
George Hotz
6b35467f53 stores don't end ranges (#12902)
* early endrange

* bugfixes
2025-10-24 23:05:03 +08:00
Sieds Lykles
c1db62ff7c move reduce collapse to rangeify (#12845)
2025-10-23 15:44:17 +02:00
George Hotz
74b4cfe44b Ops.GROUP + range check (#12880)
* simpler

* fix that

* Ops.GROUP + range check

* fix bugs

* fix linter

* fix test
2025-10-23 12:05:21 +08:00
George Hotz
7762b3558b clean up the spec (#12868)
* tighten up the spec

* move validate into a different file

* that moved to validate

* after(barr)
2025-10-22 19:50:42 +08:00
George Hotz
726988fa4b late ifs try 2 (#12865)
* late ifs try 2

* fix image

* fix that test

* panic

* ptx fixups

* preserve toposort

* those pass locally

* Revert "those pass locally"

This reverts commit 063409f828.

* no ls

* make that explicit
2025-10-22 18:49:27 +08:00
Sieds Lykles
8d0256c46b Move gate to load for loaded index (#12861)
* change condition

* change test to better represent how the uop looks irl
2025-10-22 09:53:07 +02:00
George Hotz
d711a4b933 delete old linearizer (#12834)
* new linearizer with early endrange

* cleanups

* second stage removal

* not store

* do that later

* end cleanup

* fix globals

* end

* multi end

* fix ends earlier

* work

* do_merge_ends

* mini change

* range_gate

* fix cpu

* test fixups

* ranges on index

* not for ptx

* delete linearizer

* remove more junk

* delete that test

* we insert endif

* all ends
2025-10-21 17:52:18 +08:00
George Hotz
c780cd9abb new linearizer with early endrange (#12823)
* new linearizer with early endrange

* cleanups

* second stage removal

* not store

* do that later

* end cleanup

* fix globals

* end

* multi end

* fix ends earlier

* work

* do_merge_ends

* mini change

* range_gate

* fix cpu

* test fixups

* ranges on index

* not for ptx
2025-10-21 17:37:48 +08:00
George Hotz
a71a41f6d1 rename Ops.ENDRANGE -> Ops.END (#12824)
2025-10-21 11:32:18 +08:00
Sieds Lykles
394dc24110 post index symbolic (#12446)
* post index symbolic

* add test
2025-10-03 23:23:03 +02:00
chenyu
940a8d5ba9 default IGNORE_OOB=1 (#12441)
* default IGNORE_OOB=1

z3 can get very slow with RANGEIFY, also update some kernel numbers to what it is

* add to test
2025-10-03 04:16:19 -04:00
Sieds Lykles
0047bcc535 undo loaded comparison swap (#12436)
* add rule

* add a test
2025-10-03 06:57:29 +02:00
Sieds Lykles
9a64fc0d28 Load alt value with cast try 2 (#12407)
* add or_casted

* add tests and fix old tests

* cast load

* move that to pm_render

* add allow_any_len to gated load patterns in renderers

* slice [:2]
2025-10-02 00:55:29 +02:00
chenyu
adc8c3b28f Revert "load alt value with cast (#12384)" (#12392)
This reverts commit 05e91a248d.
2025-10-01 03:20:04 -04:00
Sieds Lykles
05e91a248d load alt value with cast (#12384)
* add or_casted

* add tests and fix old tests

* cast load

* move that to pm_render
2025-10-01 07:14:26 +02:00
Sieds Lykles
73b25bf47d z3 fix loaded mask (#12353)
* z3 fix loaded mask

* indentation
2025-09-30 06:55:50 +02:00
Sieds Lykles
d55d829635 Lower index dtype spec fix (#12337)
* new pm_lower_index_dtype

* load_store_indexing after index lowering

* shorten line

* seperate rule for long removal

* fix test

* fix index_to_concrete_int

* minor fixes

* add sink there

* update types in linearizer test
2025-09-30 04:26:50 +02:00
Sieds Lykles
6146c64d81 lower the invalid gate last (#12164)
* lowering invalid gate is part of lower_index_dtype

* update test

* remove import

* put that back

* reduce_collapse uses invalid

* fix that pattern to use invalid_pat

* valid creates the right dtype count

* seperate rule for lowering invalid gate

* dont unvectorize Invalid gate

* image_fixup uses Invalid

* update tests

* cleanup

* update split_load_store

* add .scalar() there
2025-09-24 04:27:35 +02:00
Sieds Lykles
62376c8b2b update store load noop pattern to use Invalid (#12141)
* update pattern

* add test
2025-09-12 22:25:53 +02:00
Sieds Lykles
b5a3b8de20 remove where on gated load if gates are the same (#12129)
* add rules

* add tests
2025-09-12 06:52:35 +02:00
Sieds Lykles
1f3950a484 Invalid idx (#12067)
* merge index_dtype_3

* new lowering with Invalid idx

* remove that dtype from range

* finish merge

* annotate better

* indentation

* dont need that anymore

* always process replay for openpilot

* more uop_given_valid for idx

* valid past index_child

* fix bug preventing load getting an alt value

* add track_match_stats back in in shapetracker and remove cache

* get_valid_idx -> get_valid and get_idx

* fix heuristics with new idx

* split line

* fix typo

* fix signature

* dont skip idx if stride is 0

the idx may still be invalid

* lower const with new valid

* delete to_indexed_uops

* update shapetracker test

* delete axis_is_masked

* add cache back

* move around comment

* fix get_valid bug

* move invalid fold to symbolic so its earlier

* cleanup

* update applying padto to new idx

* add unit tests

* cleanup

* fold line

* improve spec

* dont try to render Invalid as a float

* more consistent invalid index

* update some tests

* Fold index with true cond

* skip test

* vconst min max if Invalid in arg

* fix signature of UOp.const

* add test for min/max of Invalid CONST/VCONST

* add InvalidType to as_const signature

* is Invalid to isinstance

* Add InvalidType to ConstLike

* index gate is a where gate

* make that a metaclass

* fix heurisics for new idx

* mypy happy
2025-09-12 01:42:02 +02:00
Sieds Lykles
581b2388c2 add dtypes.index (#12015)
* add dtypes.index

* cast shape, stride and mask to dtypes.index in view.create

* move pm_lower_index_dtype to ops

* DEFINE_VAR is dtype.index by default

* merge var_val_using_str

* remove int from commutative

* fix test_rewrite_map

* change that to dtypes.index

* change some int to index

* shorten those

* remove old cast in renderer

* cleanup

* change that back

* add comment

* delete comment

* just delete those

* view doesnt have to cast anymore

* adjust comment
2025-09-06 06:03:44 +02:00
George Hotz
433581f8ed make POSTOPT=2 the default (#12034)
* make POSTOPT=2 the default

* more matching tc

* fix winograd

* fix that test

* add matvec to Scheduler

* flip tc sort order

* similar speed

* fix beam on image

* disable slow tests

* slow
2025-09-05 14:34:05 -07:00
Sieds Lykles
572a3c15c6 Move Ops.SPECIAL arg to src (#11918)
* initial moving bound to src

* arg to src

* remove import

* fixup linearizer

* arg to src

* fix test_uop_graph

* fix more tests

* fix python renderer

* get const value from const uop

* ssimplify uop estimates

* fix webgpu locals

* fix old test

* gate Ops.SPECIAL in linearizer

* use ssimplify() for local/global_size

* remove toposort gate_parents_instead_of_self

* fix rendering in comment

* cleanup

* rename and add comments

* add BottomUpGate with test
2025-09-04 09:31:44 +02:00
George Hotz
afad7d0cd1 remove dtype from range, it will be dtypes.index soon [pr] (#11914)
* remove dtype from range, it will be dtypes.index soon [pr]

* a few more
2025-08-29 09:52:07 -07:00
George Hotz
6540bb32a6 move into codegen late [pr] (#11823)
2025-08-24 10:23:25 -07:00
George Hotz
aefabaf774 add AxisType to range (#11798)
* add AxisType to range

* missed them

* fix that test

* fix that test
2025-08-23 11:15:00 -07:00
Sieds Lykles
5a6817d5f8 Fix z3 rendering of floats in indexing (#11740)
* Fix floating point comparison in indexing

* wrap in noop

* update tests

* improve rules for loading and comparing floats

* add test cast to bool
2025-08-23 05:56:19 +02:00
George Hotz
49a2583584 real new lowerer (#11419)
* real new lowerer

* fix group for reduce

* skip missing ranges

* fix wmma and unroll/contract

* real fix for wmma

* disable that test

* fix if gate

* simpler

* flash attention fusion works

* no end barriers

* still broken

* flash attention finally works
2025-07-29 15:35:51 -07:00
George Hotz
53339e62f7 no gate store anymore (#11338)
* no gate store anymore

* fix up spec
2025-07-22 18:41:15 -07:00
George Hotz
108aac8af4 use AddrSpace instead of local (#11314)
* use AddrSpace instead of local

* addrspace in test
2025-07-21 14:00:06 -07:00
George Hotz
532b52fcef store has a dtype, like assign (#11309)
* store has a dtype, like assign

* fix upat

* fix test
2025-07-21 12:50:01 -07:00
wozeparrot
5878b189b8 don't const fold shape changing bitcast (#11236)
2025-07-14 16:42:16 -07:00
Sieds Lykles
772cd02ad2 Perform index validation on load/store, not on the index (#10849)
* move index validation to load/stores

* add name

* add linearizer_failure

* add validate_store with implicit gates

* linearizer_failure_58 is fixed!

* add test_uop_graph test

* rename cond to gate

* test gated load/stores

* use or_casted()
2025-06-23 16:25:05 -07:00
George Hotz
a38947b4bb move symbolic and transcendental to uop [pr] (#10771)
2025-06-10 20:51:22 -07:00
George Hotz
9fc01c1e03 support for uop tags (#10477)
* support for uop tags [pr]

* test uop tags
2025-05-22 19:53:48 -07:00
George Hotz
411392dfb7 move files into uop dir (#10399)
* move files into uop dir [pr]

* tinygrad.uop is a thing

* fix uop docs, no pr

* fix viz
2025-05-18 11:38:28 -07:00
George Hotz
603c03bef2 fix tests for rewrite [pr] (#10167)
* fix tests for rewrite [pr]

* cleaner

* delete linearize_uop

* clean up the rest
2025-05-05 19:19:49 -07:00
George Hotz
c3ff308abb range has only one src now [pr] (#10100)
* range has only one op now

* fix z3 checker

* ci fix

* needs shell

* try pip ensure update

* that ensurepip is useless

* upgrade pip before cache

* windows happy?
2025-04-29 10:31:05 -04:00
quortus
5cdc96409e Update outdated renderer.render calls (#10044)
2025-04-26 07:35:19 -04:00
Sieds Lykles
e75be6eafc [bounty] [pr] index validation with z3 (#9981)
* index validation with z3

* Change comment

* toposort -> toposort()

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 08:06:08 -04:00
George Hotz
2ed3acd767 toposort is a function [pr] (#10004)
2025-04-23 16:25:03 +01:00