Commit Graph

896 Commits

Author SHA1 Message Date
Umut Zengin
776605f2fc O(1) VALIDHACKS (#2072)
* first refactoring

* O(1) validhacks

* O(1) validhacks

* Some cleaning

* mypy

* flake8

* Trim trim

* flake8

* clean

* less chaotic

* less chaotic

* flake8

* Symbolic, SumNode include mulnode for gcd

* fix tests

* small optim

* revert

* clean

* clean

* flake8

* small fix

* Add symbolic test
2023-10-15 11:26:41 -07:00
mmmkkaaayy
91168a28c4 whisper: make file transcription work, add basic CI test (#2042) 2023-10-13 17:13:35 -07:00
George Hotz
924ecc4d6a Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)
This reverts commit 63869c62fc.
2023-10-13 12:01:55 -07:00
Amrit Sahu
63869c62fc openpilot kernel fix from 209 to 207 (#2006)
* Fix openpilot kernel from 209 to 206

1. Use push_movement_ops conditions in _movement_op. Don't push
PAD or check if the ops are safe to be pushed with PAD

2. Don't push if all the op.buffers are realized

* change ALLOWED_KERNEL_COUNT to 206 for openpilot

* don't push through sourceless buffers

* change the tests to adjust kernel counts for new behaviour

* restore pushing of movement ops through childless buffer

* don't push EXPAND, causes OOM

* allow push of intermediate movement ops

* adding new test behaviour

* modifying external_test_opt for new behaviour

* restore old tests

* Reenable push of EXPAND and introduce new tests

I was wrong initially in thinking EXPAND could cause OOM, and hence I had
disabled it. Since it is 0-stride and doesn't allocate memory, it's cool

* Don't push EXPAND above LoadOps LB. This is causing OOM

* Push should be decided on movement root of bufs

To check whether ast.op.buffers is sourceless/realized, go to the movement
root and then decide whether pushing should be done

* refactor for readability

* use .base instead

* don't push expand, bad memory/compute consumption

* restrict push of reshape, seeing improvement

* push reshape if unary without further check

* disable PAD solves convnext kernel count increase

* reenable test_cache_binaryop_transpose

* small nit
2023-10-13 11:59:15 -07:00
George Hotz
90c777d815 remove apply_auto_opt (#2063) 2023-10-13 07:44:14 -07:00
nimlgen
bd42fa0b73 kernel cache (#2035)
* init compiled cache

* clang not compile to stdout

* use kwargs in compile

* remove some useless lines

* slimmer

* fix

* tabs

* retry

* remove decorator

* no race in hip

* smaller hip

* unused import

* unused pathlib

* path to str

* add test

* fix linter

* less lines?

* decorator is back

* update tests

* no hip version

* better comments

* a bit better test

* linter

* work without decorator

* linter happy

* simpler return type

* more tests

* better comment

* readable

* readable

* readable

* compile returns bytes

* no unused imports

* readable
2023-10-13 06:32:01 -07:00
Umut Zengin
6b7ac5c431 ModNode __mod__ rule (#2039)
* Implement mod rule

* mypy

* feat: New test added
2023-10-12 11:30:10 -07:00
George Hotz
c5edb3c374 train value net, improve API, add BCE (#2047)
* api cleanups, BCE losses

* valuenet

* fixup examples

* learning okay

* add valuenet runner

* net improvements

* net improvements

* 40% win rate
2023-10-12 07:56:38 -07:00
geohotstan
8d6cecb25c Torch eq fix (#1562)
* init

* Revert "init"

This reverts commit 682bf2073a.

* kids dont do drugs

* one way to fix

* resolve merge conflict

* no more or

* clean up
2023-10-11 12:57:11 -07:00
George Hotz
41bfeb2c1e start work on auto opt (#2034)
* start work on auto opt

* lin failure

* not beating hcopt

* greedy

* timing is fast

* codegen.search

* greedy search in handcode_opt

* track running gflops

* clean up those files

* no failure
2023-10-11 12:54:53 -07:00
Francis Lam
81c7d750db test: fix test_linearizer.test_tensor_core test (#2036)
must use apply_tensor_core instead of hand_coded_optimizations
2023-10-10 14:48:28 -07:00
chenyu
e2b83f1b42 Variable.bind newer (#2017)
* Variable.bind attempt 2

* ShapeTracker.unbind

* fix llama

* fix types

* test case

* View.vars cleanup

* include mask in symbolic source

* mask can be sint

* st.unbind in bufferops

* assert ast contain free Variable only

* cleanup

* conservative unbinding reduce op arg

* move reduceop unbind

* fix llama JIT arg behavior
2023-10-10 10:03:01 -07:00
qazal
e40f141203 Refactor and add more unit tests for disktensors (#2022)
* testing with the test_ops pattern

* add assign test

* flake8 complaining about single line fn

* slice 2d and minor cleanup

* make assign_slice a one-liner

* we don't need to repeat the same lambda twice, default tinygrad_fxn to be np_fxn

* back assign fn for np array

* implement __setitem__ in tensor.py

* don't re-slice the ret tensor

* one liner assign

* drop the permute test
2023-10-09 18:46:29 -07:00
Luca Sciarpa
e93e240a6c adapting test/external/external_osx_profiling.py to the new code base (#2002)
* adapting external osx profiling

* fixing dtype

* fixing buffer size
2023-10-08 05:55:00 -07:00
George Hotz
cea4cbfc7a move image+kopt to features (#2015)
* move image+kopt to features

* fix tests

* debug prints (unrelated)
2023-10-07 15:41:08 -07:00
nimlgen
d07ac379f9 add var_vals to kopt with symbolic (#2008)
* add var_vals to kopt with symbolic again

* no copies
2023-10-07 09:34:21 -07:00
George Hotz
121f7aa8c5 Schedule item (#2012)
* ScheduleItem

* put var_vals in the schedule

* fix tests, wow that proliferated quickly

* not ready to be in the schedule
2023-10-07 08:59:25 -07:00
George Hotz
f54959e5cd move print tree into graph (#2003)
* move print tree into graph

* add winograd profiling test

* change pre-commit to run ruff first
2023-10-07 04:39:21 -07:00
Ahmed Harmouche
2114dc13d1 Allow multi-input model export (#1995)
* Allow multi-input model export

* Add model export unit test

* Fix efficientnet compilation

* Only run model export test on JIT supported devices

* Skip export model test if not EXPORT_SUPPORTED_DEVICE
2023-10-07 04:13:34 -07:00
George Hotz
ffa33d743a good changes from openpilot_compile2 (#2000)
* good changes from openpilot_compile2

* float32 image type was wrong

* cleaner way to write that + a test
2023-10-06 13:33:24 -07:00
chenyu
05be57f57f Fix llama with empty prompt (#1997)
* fix llama with one token prompt

* llama is all_jitted
2023-10-06 06:48:07 -07:00
George Hotz
fa9945dac0 remove stale tests 2023-10-06 02:14:56 -07:00
George Hotz
21a2c5df73 fix up contiguous (#1978) 2023-10-05 07:22:05 -07:00
chenyu
c99fa58dd2 simplify gpt2 example (#1973)
* simplify gpt2 example

* kernel_jitted_count and jit tests

* Revert "kernel_jitted_count and jit tests"

This reverts commit 31a3c26dd0.

* all_jitted test in test_real_world
2023-10-05 07:09:29 -07:00
George Hotz
2d0c1037b1 Fix up latest openpilot model (#1976)
* fix gemv triggering for gemm

* fixup_openpilot

* external test issues
2023-10-05 05:24:28 -07:00
George Hotz
3d5127038c don't create linearizer if we are in the method cache (#1969)
* don't create linearizer if we are in the method cache

* remove unchecked properties

* that key isn't used

* fix default type is sticky
2023-10-04 12:42:58 -07:00
George Hotz
de5d603ec1 corealize + remove realize from lazybuffer (#1968)
* corealize + remove realize from lazybuffer

* fix multigpu

* fix graph
2023-10-04 10:59:31 -07:00
George Hotz
d449b3bef1 think about removing realize from lazybuffer (#1965)
* remove realize from lazybuffer

* okay fine, back that off

* fix tests maybe

* fix test
2023-10-04 07:18:58 -07:00
nimlgen
2ea1dd3e87 no process() in Linearizer (#1966)
* no process() in Linearizer

* more process() clean up
2023-10-04 07:18:42 -07:00
Ahmed Harmouche
fb4d830a2a Fix cast error in render_load in wgsl (#1956)
* Fix cast error in wgsl

* Use render_cast instead of introducing a new method

* Make it shorter

* Add back webgpu tests: efficientnet and dtypes
2023-10-04 02:29:14 -07:00
George Hotz
6a79d4044a unrealized consts everywhere (#1963)
* unrealized consts everywhere

* don't import device from lazy

* Device isn't in Lazy

* same issue

* disable jit random
2023-10-04 01:48:10 -07:00
nimlgen
f04c1a63ae Rand works in jit (#1960)
* rand works in jit

* better jitted rand creation

* Update realize.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-10-03 12:55:25 -07:00
George Hotz
f64d5b3ba8 move to realize.py (#1961)
* move to realize.py

* run_schedule moved
2023-10-03 07:25:40 -07:00
nimlgen
e1f2c2cc19 fix jitted dist (#1955) 2023-10-02 11:45:13 -04:00
George Hotz
d48a90859c use the opts from the default device (#1954) 2023-10-02 03:13:46 -07:00
David Hou
d4671cd8e3 use schedule in more places in linearizer tests (#1946)
* pass current linearizer opts to Linearizer in TestFloat4

* use schedule instead of exec_ast hook
2023-10-02 02:22:56 -07:00
David Hou
8e9db88474 expand after expr_idxs in Linearizer.global_load (#1818)
* small changes

* expand in terms of substitute, directly expand g_idxs g_valid

* delete expand_ops

* don't compare using hash

* any instead of in

thanks gijskoning

Co-authored-by: Gijs Koning <gijs-koning@live.nl>

* support tc

* testing code

* no more create_rednode

* maxsize none in view/node

* oops

* undo

* typing

* oops

* oops

* lmao

* lmao

* add expand multi test

* Node.iter_idxs

* type

* type

* delete checks!

* clean up a little?

* expand_idx in symbolic

* un-golf

* play around with types >.>

* test_substitute and also remove an incorrect test?

* get rid of range

* Update symbolic.py

* split out view cache change

* split out flat components change

* reduce diff

* reduce diff

* add some float4 tests

* fix

---------

Co-authored-by: Gijs Koning <gijs-koning@live.nl>
2023-09-29 10:33:34 -07:00
nimlgen
692bec7b6f simplify CacheCollector (#1944)
* rewrite cc

* fix

* fix tests

* fix all tests

* is it better

* better with shape

* cleaner

* linter fix

* no ;

* better comment

* better comments

* no thneed changes
2023-09-29 10:13:04 -07:00
George Hotz
a677a1e2cd winograd test prints op count 2023-09-29 05:41:29 -07:00
George Hotz
81cb120b0f winograd speed test (#1942) 2023-09-29 04:40:35 -07:00
George Hotz
d52df788d3 remove RawConst and add test (#1939) 2023-09-29 01:21:51 -07:00
George Hotz
22b8576887 more lazy cleanup (#1938)
* small lazy cleanups

* a few more

* cleanups

* no more realizing in the scheduler test

* a few more minor things

* that was just wrong

* fix graph. the graph test was completely useless

* make graph usable

* fix op graph
2023-09-29 00:53:29 -07:00
nimlgen
2a49f7e456 fix transfer to mapped buffers (#1923) 2023-09-29 00:50:24 -07:00
Francis Lam
f445e056ed wmma: add test and tensor core shape (#1925) 2023-09-28 18:04:28 -07:00
Yixiang Gao
094d3d71be with Tensor.train() (#1935)
* add with.train

* remove the remaining TODOs

* fix pyflake

* fix pyflake error

* fix mypy
2023-09-28 18:02:31 -07:00
wozeparrot
70671d9625 fix test_collectives (#1934)
* fix: fix test_collectives.py

* feat: reenable test_collectives
2023-09-28 11:02:22 -07:00
George Hotz
adab724caa schedule2, keep the tests working with small changes (#1932)
* lazy cleanups

* ast functions take in LazyOps

* op instead of self.op

* _base for mops

* fix contiguous

* start schedule

* test_schedule

* fix openpilot

* more tests

* bugfix and test skip

* work

* make sure things get freed

* fix zerosized tensors

* fix failing test

* fix ceil and friends

* fix openpilot

* disable training

* disable test collectives
2023-09-28 09:14:43 -07:00
George Hotz
c907efbf4a reorder a few things (#1915)
* reorder a few things

* huh, that has to be there

* move apply shapetracker

* BufferOps

* only for type checking
2023-09-25 10:17:21 +08:00
George Hotz
6d9065ed1c Minor cleanups (#1911)
* cleanups

* remove that simplify
2023-09-24 21:32:50 +08:00
George Hotz
20059dc55b Make ShapeTracker Immutable (#1909)
* ugh

* ops test pass

* fix shapetracker tests

* sym shapetracker

* shapetracker is a tuple of views now

* from_shape

* fix has variable shape

* key isn't needed

* post init assert
2023-09-24 21:09:03 +08:00