Commit Graph

42 Commits

Author SHA1 Message Date
George Hotz
6fd1332763 update some tests for less Kernel (#11543)
* update some tests for less Kernel

* get_program update
2025-08-06 14:19:59 -07:00
Ignacio Sica
21f1c4cc09 remove some linearize calls from tests [pr] (#10978)
* remove some linearize calls from tests

speed_compare_cuda_ptx
test_uop_spec
test_linearizer
test_uops
test_winograd

* more clear assert message
2025-06-25 12:37:17 -07:00
George Hotz
92678e59ee move kernel to opt (#10899) 2025-06-20 15:22:28 -07:00
George Hotz
b68b7dbc2a test winograd is close to normal conv [pr] (#10557)
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-06 19:11:49 -04:00
George Hotz
411392dfb7 move files into uop dir (#10399)
* move files into uop dir [pr]

* tinygrad.uop is a thing

* fix uop docs, no pr

* fix viz
2025-05-18 11:38:28 -07:00
George Hotz
2ed3acd767 toposort is a function [pr] (#10004) 2025-04-23 16:25:03 +01:00
chenyu
6c30948df6 hand_coded_optimizations returns list[Opt] [pr] (#9938)
new api looks like `k.apply_opts(hand_coded_optimizations(k))`
2025-04-19 20:26:59 -04:00
chenyu
8c6299bced move hand_coded_optimizations to heuristic.py [pr] (#9844)
* move hand_coded_optimizations to heuristic.py [pr]

also folded all long lines

* make a copy and rename self -> k

* fix test
2025-04-10 23:40:16 -04:00
qazal
866dfa1f23 create_schedule([x.lazydata]) -> x.schedule() in tests (#8449) 2024-12-31 03:15:52 +08:00
George Hotz
275951b730 clean up a few parents -> toposort [pr] (#7984)
* clean up a few parents -> toposort [pr]

* rename to old_parents + sched tests

* a few more

* that one

* second to last

* final
2024-12-02 15:59:31 +08:00
chenyu
69e382216d fix wino conv output dtype for half inputs (#7829) 2024-11-21 12:13:54 -05:00
George Hotz
c8bf09b7d4 s/UOps/Ops (#7500)
* s/UOps/Ops [pr]

* fix
2024-11-03 11:26:10 +08:00
qazal
28c75bf2a6 merge uops with ops (#6111)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-08-16 18:17:57 -04:00
qazal
c23d44c779 AST is UOp (#6030)
* most of the work from the uops2 branch

* schedule

* realize

* kernel

* lowerer

* search

* green

* merge uops with ops

* Revert "merge uops with ops"

This reverts commit 1408a59f12.

* fix benchmark

* remove extra dedup
2024-08-16 22:09:00 +03:00
qazal
4d38fec8c1 rename lazyops to parents [run_process_replay] (#6091) 2024-08-15 17:27:32 +03:00
George Hotz
fa7e734b49 MetaOps.KERNEL (#5543) 2024-07-17 19:41:23 -07:00
George Hotz
03c2dc8bd7 lowerer is kernel [run_process_replay] (#5437) 2024-07-12 18:50:55 -07:00
George Hotz
870dc8c350 s/Linearizer/Lowerer [run_process_replay] (#5428) 2024-07-12 15:54:07 -07:00
George Hotz
6707c778d0 scheduleitem is not Tuple [run_process_replay] (#5425)
* scheduleitem is not Tuple [run_process_replay]

* fix tests

* fix op + fuzzers

* fix mop test
2024-07-12 15:13:19 -07:00
George Hotz
f6ef283e6a s/loadops/metaops [run_process_replay] (#5421) 2024-07-12 13:26:50 -07:00
George Hotz
6972a2569f Linearizer -> Lowerer (#4957)
* st to uops function

* lowerer

* uops reduce

* uops reduce

* acc_number correct

* reduce unroll

* complete unroll

* do upcasts

* handle multioutput

* define_accs

* fix valid

* get grouped dims

* revert lin

* minor

* fixup_ast

* group for reduce

* group works now

* all forwards pass

* all ops tests pass

* fix clang

* mypy

* lil cleanups, no image yet

* ugh, variables everywhere

* bugfix

* counters and name fix

* use symbolic, not uops

* cleanups

* Fix tests

* linearizer tests

* expands

* float4 expand load

* tests pass

* woooo, float4 test

* test ops works again

* one more lin test

* more lin tests

* bypass

* fix tests

* something like this

* const in defineacc

* uops get_reduce_acc

* move around

* allow consts in the LOAD/STORE

* each axis should only appear once, 21 failures

* 16 failures

* fix some image

* optional float4

* onnx tests

* gate the stores

* add reorder

* fix terrible skip function

* tc work

* opt add/mul merge

* fix float4 tests

* tiny tweak, 9 failing

* 7 test failures

* start tc, but i don't think this will work

* progress on tensorcores

* note

* fix ops tests

* closer on tc

* weeee...one tensor core works

* still works, more generic

* large WMMA works

* tc test passes

* use WMMA as accumulator

* basic tc tests passing

* small gemm padded works

* 4 failures

* 3 tests failing

* super barrier

* now two tests failing

* one test failing

* cleanpus, add reduce to UopGraph

* remove the linearizer

* remove unused

* lil cleanups

* Lowerer everywhere

* remove test that doesn't exist now

* image indexing

* llvm fix

* fix metal

* fix image

* fix images

* might fix ptx

* fix image type mismatch

* more tests pass

* CAST -> VECTORIZE

* forgot that one

* fix TestOps.test_flip_eye_crash

* locals shouldn't be image dtype

* change less files

* test fix

* fix recursive expands

* touches

* MULACC support in python

* delete unneeded

* alu before contract

* bug fixes

* tests

* no var multireduce

* simpler tc

* metal works in new style

* working on AMD and METAL

* fix amd

* shot in the dark, fix amd

* something for CUDA

* CUDA WORKS from the docs

* comment

* correct merge

* cleanups + ptx fix + get_reduce_acc

* local alias isn't used anymore

* add store sanity check

* fix for AMD

* cleanups and single expand pass

* more correct with acc_cache

* tests should pass

* block on WMMA

* tests pass

* merge contract and reduce

* contractor fixes issue

* multicontract

* pre expand wmma (same as a reduce)

* expand wmma and only take one

* all expands

* comments and whitespace
2024-07-10 15:07:42 -07:00
George Hotz
d094a6828f single pass rewrite (#5159)
* single pass rewrite

* claude cleanups

* claude cleanups

* skip those tests

* restrict that to ints

* comment

* asserts i don't expect to fail do fail

* simplest...rewrite...ever

* simplest...rewrite...ever

* add that rule back

* tests pass?

* only collapse reduce loops

* second SHL/SHR arg must be 4 bytes

* fix verify

* no SHL/SHR in ptx

* put that back

* skip them in PTX...bad tests
2024-06-27 11:36:05 -07:00
George Hotz
07b350a8f4 new uops is an actual graph (#4560)
* new uops is an actual graph

* it's way slower

* simpler

* fix define acc

* render_loop unique

* ops test pass

* add pattern matcher back, there's bugs

* rewrite

* use priority queue

* recursive children

* fix tests

* fix tests with SINK

* fix abstractions

* fix assembly

* simpler

* link define_acc

* fix DEFINE_ACC placement

* type verify

* full cmp

* fix cmp

* ACCESS_ACC

* insert DEFINE_ACC

* fix PHI

* recursive rewrite

* fix many tests

* sum collapse

* more patterns

* correct change

* fold arange

* fix that lin test

* space

* big folding rule works

* close

* has more maxes, meh

* cached node replace

* set changed

* simplest folding yet

* works

* works

* DIV

* all tests pass

* del

* fuzz linearizer fails

* sum_collapse

* test depth 2 cf

* fix lin test 14

* fix clang depth

* disable that

* failure 14 is fixed

* fix ptx

* failure 27 is fixed

* fix llama

* run_cnt

* Revert "Optimize PTX gated loads index calculation (#4304)"

This reverts commit d97d5a7689.

* fix uops loop

* fix ptx bugs

* add barrier

* print

* mem_type in ptx direct

* bypass tests that fail in CI but pass locally

* ptx remove ptr_ar

* more ptx passing

* fix ptx tests

* assert compile support

* remove  model inference benchmark from red
2024-05-17 18:00:18 -07:00
George Hotz
68ca4d4276 split to schedule.py (#3949)
* split to schedule.py

* split
2024-03-26 21:02:46 -07:00
George Hotz
150ea2eb76 create engine folder and move code (#3948)
* retry

* older tf

* that
2024-03-26 20:38:03 -07:00
qazal
337cd53444 multioutput ScheduleItem (#3699)
* refactor realize.py

* update docs

* update test_sched

* update runners and devices

* update openpilot and unit tests

* cleanup runner lowering

* update more tests
2024-03-13 08:59:38 -07:00
George Hotz
1b6e890ef2 uops flop counter (#3373)
* factor out winograd functions

* test counter

* uops flop counter

* more correct

* ish

* correct

* cleanup

* tests for uops flop counter

* tests still fail

* fix symbolic uops flop cnt

* fix symbolic uops flop cnt

* hmm, it's an alu

* uops alu resolve

* relax that
2024-02-20 09:36:30 +01:00
George Hotz
2e60012bcf move create schedule and delete old API (#3377)
* move create schedule and delete old API

* fix test multitensor
2024-02-12 18:10:45 +01:00
George Hotz
0f6cde243d import from wino_cleanup (#3374) 2024-02-12 16:26:50 +01:00
David Hou
aebaab011f faster wino compile by catting consts across data expand dim (#3293)
* PoC faster wino compile by catting consts across data expand dim

* fix fusions

* faster + golf it

* noqa 501

* implicit broadcast

* Revert "implicit broadcast"

This reverts commit 5915a9083d045ec1e6be84dcb492333325d48666.

* shorter

* shorter

* oops

* 216 upcasts is probably fine

* wino kernel count test

* test winograd number of sts

* specify device for apply_matrix mat elements
2024-02-02 03:47:45 -05:00
George Hotz
09f2952dc3 reintroduce merge views in update benchmark (#3279)
* Reapply "take merge views from corsix branch" (#3278)

This reverts commit d298916232.

* reintroduce merge views
2024-01-30 09:47:20 -08:00
George Hotz
d298916232 Revert "take merge views from corsix branch" (#3278) 2024-01-30 09:34:28 -08:00
George Hotz
b57a16aa89 take merge views from corsix branch (#3273)
* take merge views from corsix branch

* better DEBUG

* max views

* remove view.py change

* Revert "remove view.py change"

This reverts commit f3025f4f39.

* only allow filter on non symbolic

* oops, correct fix

* comment to explain
2024-01-30 09:25:16 -08:00
George Hotz
085dc87bed winograd should be 4 kernels (#3268) 2024-01-28 09:21:26 -08:00
chenyu
e52a609240 make WINO a context var, and LATEWINO in hlb_cifar (#3161) 2024-01-17 20:21:26 -05:00
George Hotz
7da2325dc7 get_lazyops() -> lazyops (#2884)
* get_lazyops() -> lazyops

* don't compare empty mem
2023-12-20 18:04:49 -08:00
Friedrich Carl Eichenroth
75676ab8e1 Profiling-helper (#2321)
* change profiler

* remove unused imports

* remove unused imports

* change lazybuffer references

* remove unused line

* remove unused import

* remove unused stuff

* add types

* typing

* typing

* typing

* trigger actions

* -1 loc

* fixup

* trigger actions

* revert lazy typing changes

* WIP profiler helper

* replace old start & stop profiler

* fixup

* linting

* Update llama.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-16 14:15:56 -08:00
George Hotz
15da96f393 print test durations and add speed (#2107)
* print test durations

* decrease sizes to increase speed

* faster

* GPU/CLANG onnx in seperate runner

* test split, move ONNX CPU CI

* simpler tests

* simpler uops test

* faster

* less cuda apt

* running ninja install

* apt install

* split fancy indexing
2023-10-18 13:46:42 -07:00
George Hotz
121f7aa8c5 Schedule item (#2012)
* ScheduleItem

* put var_vals in the schedule

* fix tests, wow that proliferated quickly

* not ready to be in the schedule
2023-10-07 08:59:25 -07:00
George Hotz
f54959e5cd move print tree into graph (#2003)
* move print tree into graph

* add winograd profiling test

* change pre-commit to run ruff first
2023-10-07 04:39:21 -07:00
George Hotz
a677a1e2cd winograd test prints op count 2023-09-29 05:41:29 -07:00
George Hotz
81cb120b0f winograd speed test (#1942) 2023-09-29 04:40:35 -07:00