Commit Graph

2576 Commits

Author SHA1 Message Date
Roelof van Dijk
26fcc8dff6 fix: remove runtime imports (#1982)
fix: import what is used

probably monkeypatched

fix: import

revert selective import
2023-10-07 05:23:08 -07:00
George Hotz
f54959e5cd move print tree into graph (#2003)
* move print tree into graph

* add winograd profiling test

* change pre-commit to run ruff first
2023-10-07 04:39:21 -07:00
Ahmed Harmouche
2114dc13d1 Allow multi-input model export (#1995)
* Allow multi-input model export

* Add model export unit test

* Fix efficientnet compilation

* Only run model export test on JIT supported devices

* Skip export model test if not EXPORT_SUPPORTED_DEVICE
2023-10-07 04:13:34 -07:00
George Hotz
ffa33d743a good changes from openpilot_compile2 (#2000)
* good changes from openpilot_compile2

* float32 image type was wrong

* cleaner way to write that + a test
2023-10-06 13:33:24 -07:00
chenyu
05be57f57f Fix llama with empty prompt (#1997)
* fix llama with one token prompt

* llama is all_jitted
2023-10-06 06:48:07 -07:00
George Hotz
7a68060422 Revert "allow local + grouped reduce in hand_coded (#1996)" (#1998)
This reverts commit 219a1f7063.
2023-10-06 06:43:28 -07:00
nimlgen
219a1f7063 allow local + grouped reduce in hand_coded (#1996)
* allow local + grouped reduce in hand_coded

* allowed loop size based on global_dims

* fix const

* fix const one more time

* better divisor

* a bit fix

* can take 2, why not

* fix linter

* better comments

* start with 2

* not always pick group reduce

* fix images

* better images

* better
2023-10-06 06:11:28 -07:00
George Hotz
fa9945dac0 remove stale tests 2023-10-06 02:14:56 -07:00
Vidhan Bhatt
94b21c41a7 ci: use mypy.ini (#1993) 2023-10-06 01:45:28 -07:00
George Hotz
e43d8977f8 Revert "chore: add py.typed marker. (#1991)" (#1994)
This reverts commit 6d581e8911.
2023-10-06 01:44:34 -07:00
Vidhan Bhatt
6d581e8911 chore: add py.typed marker. (#1991)
* chore: add `py.typed` marker.

* fix: add comma
2023-10-05 16:27:33 -07:00
chenyu
da2b3e55f4 simpler llama - don't shrink twice (#1981) 2023-10-05 14:31:46 -07:00
Roelof van Dijk
972d9ea215 fix: PRUNEGRAPH is unused (#1985) 2023-10-05 14:28:43 -07:00
George Hotz
21a2c5df73 fix up contiguous (#1978) 2023-10-05 07:22:05 -07:00
chenyu
c99fa58dd2 simplify gpt2 example (#1973)
* simplify gpt2 example

* kernel_jitted_count and jit tests

* Revert "kernel_jitted_count and jit tests"

This reverts commit 31a3c26dd0.

* all_jitted test in test_real_world
2023-10-05 07:09:29 -07:00
George Hotz
2d0c1037b1 Fix up latest openpilot model (#1976)
* fix gemv triggering for gemm

* fixup_openpilot

* external test issues
2023-10-05 05:24:28 -07:00
George Hotz
1862e14a4f fix gemv triggering for gemm (#1975) 2023-10-05 05:23:00 -07:00
Francis Lam
0ba75c4370 optimizer: add matvec optimizations (#1972)
* optimizer: add matvec optimizations

* renderer: fix alignment of shared memory in opencl
2023-10-04 14:16:27 -07:00
George Hotz
3d5127038c don't create linearizer if we are in the method cache (#1969)
* don't create linearizer if we are in the method cache

* remove unchecked properties

* that key isn't used

* fix default type is sticky
2023-10-04 12:42:58 -07:00
George Hotz
de5d603ec1 corealize + remove realize from lazybuffer (#1968)
* corealize + remove realize from lazybuffer

* fix multigpu

* fix graph
2023-10-04 10:59:31 -07:00
George Hotz
88b6ed6945 disable broken optim_conv2d 2023-10-04 07:33:50 -07:00
George Hotz
d449b3bef1 think about removing realize from lazybuffer (#1965)
* remove realize from lazybuffer

* okay fine, back that off

* fix tests maybe

* fix test
2023-10-04 07:18:58 -07:00
nimlgen
2ea1dd3e87 no process() in Linearizer (#1966)
* no process() in Linearizer

* more process() clean up
2023-10-04 07:18:42 -07:00
George Hotz
0945848b5f schedule the loadops like everything else (#1964)
* schedule the loadops like everything else

* unify loadops with other things we schedule

* delete all the ops

* fix symbolic jit
2023-10-04 02:36:04 -07:00
Ahmed Harmouche
fb4d830a2a Fix cast error in render_load in wgsl (#1956)
* Fix cast error in wgsl

* Use render_cast instead of introducing new method

* Make it shorter

* Add back webgpu tests: efficientnet and dtypes
2023-10-04 02:29:14 -07:00
George Hotz
6a79d4044a unrealized consts everywhere (#1963)
* unrealized consts everywhere

* don't import device from lazy

* Device isn't in Lazy

* same issue

* disable jit random
2023-10-04 01:48:10 -07:00
nimlgen
f04c1a63ae Rand works in jit (#1960)
* rand works in jit

* better jitted rand creation

* Update realize.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-10-03 12:55:25 -07:00
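
For readers following along, a minimal sketch of the pattern this commit enables: generating random tensors inside a jitted function. It assumes the tinygrad API of this era (Tensor, and TinyJit from tinygrad.jit); the function name and shapes are illustrative, not taken from the commit.

```python
from tinygrad.tensor import Tensor
from tinygrad.jit import TinyJit  # import path as of late 2023; illustrative sketch only

@TinyJit
def noisy_step(x: Tensor) -> Tensor:
  # random values created inside the jitted function (what "rand works in jit" refers to)
  return (x + Tensor.rand(*x.shape)).realize()

for _ in range(4):
  # early calls trace and capture kernels; later calls replay the captured batch
  out = noisy_step(Tensor.rand(4, 4).realize())
  print(out.numpy())
```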
George Hotz
f64d5b3ba8 move to realize.py (#1961)
* move to realize.py

* run_schedule moved
2023-10-03 07:25:40 -07:00
George Hotz
717451a244 Revert "optimizer: add matvec optimizations (#1753)" (#1959)
This reverts commit f520323054.
2023-10-03 00:28:42 -07:00
Francis Lam
f520323054 optimizer: add matvec optimizations (#1753)
* optimizer: add matvec optimizations

* Update optimizer.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-10-03 00:01:59 -07:00
nimlgen
e1f2c2cc19 fix jitted dist (#1955) 2023-10-02 11:45:13 -04:00
Roelof van Dijk
35ac60775b simplify line (#1950)
* no need to index here, zip automatically truncates

* enumerate is faster

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-10-02 03:19:15 -07:00
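
A small, self-contained illustration of the two points in that commit message (plain Python, not the tinygrad code that was changed): zip stops at the shortest input, and enumerate avoids manual index bookkeeping.

```python
a = [10, 20, 30, 40]
b = ["x", "y"]

# zip truncates to the shortest iterable, so no explicit length/index bound is needed
pairs = list(zip(a, b))  # [(10, 'x'), (20, 'y')]

# enumerate yields (index, item) directly, avoiding repeated b[i] lookups in a range() loop
for i, item in enumerate(b):
  print(i, item)
```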
nimlgen
08e884217c metal batch executor (#1920)
* metal batch executor

* no sym_infer in backends

* calc_stat in BasicBatchExecutor

* run in batches of size 8

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-10-02 03:18:31 -07:00
George Hotz
d48a90859c use the opts from the default device (#1954) 2023-10-02 03:13:46 -07:00
nimlgen
c27971d51f fix llvm nan/inf const (#1951)
* allow llvm

* llvm works with inf/nan

* enable some fast math back

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-10-02 03:08:57 -07:00
George Hotz
6a4ec4776e fix CI (#1953)
* this work

* unauth

* update in all places
2023-10-02 02:58:58 -07:00
Daniel Riege
579cabf668 Fix examples/train_efficientnet (#1947)
* added missing colon

* bug fixes for cifar10 dataset loading
needed a reshape to work with conv layers, and resolve the fetched tensor to numpy since further code expects a numpy array
2023-10-02 02:23:38 -07:00
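
As a rough illustration of the reshape described above (the numbers come from the CIFAR-10 format, not from the PR itself): each CIFAR-10 image is stored as a flat 3072-value row (1024 R, then G, then B values), so a fetched batch must be reshaped to NCHW before it can feed conv layers.

```python
import numpy as np

# Stand-in for a fetched CIFAR-10 batch; the real loader returns data in this flat layout.
flat_batch = np.zeros((64, 3072), dtype=np.uint8)

# Reshape to N x C x H x W so conv layers accept it; CIFAR-10 rows are channel-major.
images = flat_batch.reshape(-1, 3, 32, 32).astype(np.float32)
assert images.shape == (64, 3, 32, 32)
```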
David Hou
d4671cd8e3 use schedule in more places in linearizer tests (#1946)
* pass current linearizer opts to Linearizer in TestFloat4

* use schedule instead of exec_ast hook
2023-10-02 02:22:56 -07:00
Roelof van Dijk
e7a49e84c8 perf: assert behind if is not optimized (#1847)
* perf: assert behind if is not optimized

* Update helpers.py

---------

Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-09-29 11:07:24 -07:00
David Hou
8e9db88474 expand after expr_idxs in Linearizer.global_load (#1818)
* small changes

* expand in terms of substitute, directly expand g_idxs g_valid

* delete expand_ops

* don't compare using hash

* any instead of in

thanks gijskoning

Co-authored-by: Gijs Koning <gijs-koning@live.nl>

* support tc

* testing code

* no more create_rednode

* maxsize none in view/node

* oops

* undo

* typing

* oops

* oops

* lmao

* lmao

* add expand multi test

* Node.iter_idxs

* type

* type

* delete checks!

* clean up a little?

* expand_idx in symbolic

* un-golf

* play around with types >.>

* test_substitute and also remove an incorrect test?

* get rid of range

* Update symbolic.py

* split out view cache change

* split out flat components change

* reduce diff

* reduce diff

* add some float4 tests

* fix

---------

Co-authored-by: Gijs Koning <gijs-koning@live.nl>
2023-09-29 10:33:34 -07:00
nimlgen
692bec7b6f simplify CacheCollector (#1944)
* rewrite cc

* fix

* fix tests

* fix all tests

* is it better

* better with shape

* cleaner

* linter fix

* no ;

* better comment

* better comments

* no thneed changes
2023-09-29 10:13:04 -07:00
George Hotz
90326dbdc3 resnet50 hand coded optimization (#1945)
* resnet50 hand coded opt

* hand optimize one kernel

* opt in both places to fix test
2023-09-29 09:34:51 -07:00
George Hotz
a677a1e2cd winograd test prints op count 2023-09-29 05:41:29 -07:00
George Hotz
4ff35e2b97 better resnet eval (#1943) 2023-09-29 05:40:25 -07:00
George Hotz
48c8d130ae simpler GPT2 (#1941)
* don't realize in gpt2

* simpler gpt2
2023-09-29 04:41:09 -07:00
George Hotz
81cb120b0f winograd speed test (#1942) 2023-09-29 04:40:35 -07:00
George Hotz
d52df788d3 remove RawConst and add test (#1939) 2023-09-29 01:21:51 -07:00
George Hotz
22b8576887 more lazy cleanup (#1938)
* small lazy cleanups

* a few more

* cleanups

* no more realizing in the scheduler test

* a few more minor things

* that was just wrong

* fix graph. the graph test was completely useless

* make graph usable

* fix op graph
2023-09-29 00:53:29 -07:00
nimlgen
2a49f7e456 fix transfer to mapped buffers (#1923) 2023-09-29 00:50:24 -07:00
Francis Lam
f445e056ed wmma: add test and tensor core shape (#1925) 2023-09-28 18:04:28 -07:00