Roelof van Dijk
26fcc8dff6
fix: remove runtime imports ( #1982 )
...
fix: import what is used
probably monkeypatched
fix: import
revert selective import
2023-10-07 05:23:08 -07:00
George Hotz
f54959e5cd
move print tree into graph ( #2003 )
...
* move print tree into graph
* add winograd profiling test
* change pre-commit to run ruff first
2023-10-07 04:39:21 -07:00
Ahmed Harmouche
2114dc13d1
Allow multi-input model export ( #1995 )
...
* Allow multi-input model export
* Add model export unit test
* Fix efficientnet compilation
* Only run model export test on JIT supported devices
* Skip export model test if not EXPORT_SUPPORTED_DEVICE
2023-10-07 04:13:34 -07:00
George Hotz
ffa33d743a
good changes from openpilot_compile2 ( #2000 )
...
* good changes from openpilot_compile2
* float32 image type was wrong
* cleaner way to write that + a test
2023-10-06 13:33:24 -07:00
chenyu
05be57f57f
Fix llama with empty prompt ( #1997 )
...
* fix llama with one token prompt
* llama is all_jitted
2023-10-06 06:48:07 -07:00
George Hotz
7a68060422
Revert "allow local + grouped reduce in hand_coded ( #1996 )" ( #1998 )
...
This reverts commit 219a1f7063.
2023-10-06 06:43:28 -07:00
nimlgen
219a1f7063
allow local + grouped reduce in hand_coded ( #1996 )
...
* allow local + grouped reduce in hand_coded
* allowed loop size based on global_dims
* fix const
* fix const one more time
* better divisor
* a bit fix
* can take 2, why not
* fix linter
* better comments
* start with 2
* not always pick group reduce
* fix images
* better images
* better
2023-10-06 06:11:28 -07:00
George Hotz
fa9945dac0
remove stale tests
2023-10-06 02:14:56 -07:00
Vidhan Bhatt
94b21c41a7
ci: use mypy.ini ( #1993 )
2023-10-06 01:45:28 -07:00
George Hotz
e43d8977f8
Revert "chore: add py.typed marker. ( #1991 )" ( #1994 )
...
This reverts commit 6d581e8911.
2023-10-06 01:44:34 -07:00
Vidhan Bhatt
6d581e8911
chore: add py.typed marker. ( #1991 )
...
* chore: add `py.typed` marker.
* fix: add comma
2023-10-05 16:27:33 -07:00
chenyu
da2b3e55f4
simpler llama - don't shrink twice ( #1981 )
2023-10-05 14:31:46 -07:00
Roelof van Dijk
972d9ea215
fix: PRUNEGRAPH is unused ( #1985 )
2023-10-05 14:28:43 -07:00
George Hotz
21a2c5df73
fix up contiguous ( #1978 )
2023-10-05 07:22:05 -07:00
chenyu
c99fa58dd2
simplify gpt2 example ( #1973 )
...
* simplify gpt2 example
* kernel_jitted_count and jit tests
* Revert "kernel_jitted_count and jit tests"
This reverts commit 31a3c26dd0.
* all_jitted test in test_real_world
2023-10-05 07:09:29 -07:00
George Hotz
2d0c1037b1
Fix up latest openpilot model ( #1976 )
...
* fix gemv triggering for gemm
* fixup_openpilot
* external test issues
2023-10-05 05:24:28 -07:00
George Hotz
1862e14a4f
fix gemv triggering for gemm ( #1975 )
2023-10-05 05:23:00 -07:00
Francis Lam
0ba75c4370
optimizer: add matvec optimizations ( #1972 )
...
* optimizer: add matvec optimizations
* renderer: fix alignment of shared memory in opencl
2023-10-04 14:16:27 -07:00
George Hotz
3d5127038c
don't create linearizer if we are in the method cache ( #1969 )
...
* don't create linearizer if we are in the method cache
* remove unchecked properties
* that key isn't used
* fix default type is sticky
2023-10-04 12:42:58 -07:00
George Hotz
de5d603ec1
corealize + remove realize from lazybuffer ( #1968 )
...
* corealize + remove realize from lazybuffer
* fix multigpu
* fix graph
2023-10-04 10:59:31 -07:00
George Hotz
88b6ed6945
disable broken optim_conv2d
2023-10-04 07:33:50 -07:00
George Hotz
d449b3bef1
think about removing realize from lazybuffer ( #1965 )
...
* remove realize from lazybuffer
* okay fine, back that off
* fix tests maybe
* fix test
2023-10-04 07:18:58 -07:00
nimlgen
2ea1dd3e87
no process() in Linearizer ( #1966 )
...
* no process() in Linearizer
* more process() clean up
2023-10-04 07:18:42 -07:00
George Hotz
0945848b5f
schedule the loadops like everything else ( #1964 )
...
* schedule the loadops like everything else
* unify loadops with other things we schedule
* delete all the ops
* fix symbolic jit
2023-10-04 02:36:04 -07:00
Ahmed Harmouche
fb4d830a2a
Fix cast error in render_load in wgsl ( #1956 )
...
* Fix cast error in wgsl
* Use render_cast instead of introducing new method
* Make it shorter
* Add back webgpu tests: efficientnet and dtypes
2023-10-04 02:29:14 -07:00
George Hotz
6a79d4044a
unrealized consts everywhere ( #1963 )
...
* unrealized consts everywhere
* don't import device from lazy
* Device isn't in Lazy
* same issue
* disable jit random
2023-10-04 01:48:10 -07:00
nimlgen
f04c1a63ae
Rand works in jit ( #1960 )
...
* rand works in jit
* better jitted rand creation
* Update realize.py
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-10-03 12:55:25 -07:00
George Hotz
f64d5b3ba8
move to realize.py ( #1961 )
...
* move to realize.py
* run_schedule moved
2023-10-03 07:25:40 -07:00
George Hotz
717451a244
Revert "optimizer: add matvec optimizations ( #1753 )" ( #1959 )
...
This reverts commit f520323054.
2023-10-03 00:28:42 -07:00
Francis Lam
f520323054
optimizer: add matvec optimizations ( #1753 )
...
* optimizer: add matvec optimizations
* Update optimizer.py
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-10-03 00:01:59 -07:00
nimlgen
e1f2c2cc19
fix jitted dist ( #1955 )
2023-10-02 11:45:13 -04:00
Roelof van Dijk
35ac60775b
simplify line ( #1950 )
...
* no need to index here, zip automatically truncates
* enumerate is faster
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-10-02 03:19:15 -07:00
nimlgen
08e884217c
metal batch executor ( #1920 )
...
* metal batch executor
* no sym_infer in backends
* calc_stat in BasicBatchExecutor
* run in batches of size 8
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-10-02 03:18:31 -07:00
George Hotz
d48a90859c
use the opts from the default device ( #1954 )
2023-10-02 03:13:46 -07:00
nimlgen
c27971d51f
fix llvm nan/inf const ( #1951 )
...
* allow llvm
* llvm works with inf/nan
* enable some fast math back
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-10-02 03:08:57 -07:00
George Hotz
6a4ec4776e
fix CI ( #1953 )
...
* this work
* unauth
* update in all places
2023-10-02 02:58:58 -07:00
Daniel Riege
579cabf668
Fix examples/train_efficientnet ( #1947 )
...
* added missing colon
* bug fixes for cifar10 dataset loading
needed a reshape to work with conv layers, and resolved the fetched tensor to numpy since later code expects a numpy array
2023-10-02 02:23:38 -07:00
David Hou
d4671cd8e3
use schedule in more places in linearizer tests ( #1946 )
...
* pass current linearizer opts to Linearizer in TestFloat4
* use schedule instead of exec_ast hook
2023-10-02 02:22:56 -07:00
Roelof van Dijk
e7a49e84c8
perf: assert behind if is not optimized ( #1847 )
...
* perf: assert behind if is not optimized
* Update helpers.py
---------
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-09-29 11:07:24 -07:00
David Hou
8e9db88474
expand after expr_idxs in Linearizer.global_load ( #1818 )
...
* small changes
* expand in terms of substitute, directly expand g_idxs g_valid
* delete expand_ops
* don't compare using hash
* any instead of in
thanks gijskoning
Co-authored-by: Gijs Koning <gijs-koning@live.nl>
* support tc
* testing code
* no more create_rednode
* maxsize none in view/node
* oops
* undo
* typing
* oops
* oops
* lmao
* lmao
* add expand multi test
* Node.iter_idxs
* type
* type
* delete checks!
* clean up a little?
* expand_idx in symbolic
* un-golf
* play around with types >.>
* test_substitute and also remove an incorrect test?
* get rid of range
* Update symbolic.py
* split out view cache change
* split out flat components change
* reduce diff
* reduce diff
* add some float4 tests
* fix
---------
Co-authored-by: Gijs Koning <gijs-koning@live.nl>
2023-09-29 10:33:34 -07:00
nimlgen
692bec7b6f
simplify CacheCollector ( #1944 )
...
* rewrite cc
* fix
* fix tests
* fix all tests
* is it better
* better with shape
* cleaner
* linter fix
* no ;
* better comment
* better comments
* no thneed changes
2023-09-29 10:13:04 -07:00
George Hotz
90326dbdc3
resnet50 hand coded optimization ( #1945 )
...
* resnet50 hand coded opt
* hand optimize one kernel
* opt in both places to fix test
2023-09-29 09:34:51 -07:00
George Hotz
a677a1e2cd
winograd test prints op count
2023-09-29 05:41:29 -07:00
George Hotz
4ff35e2b97
better resnet eval ( #1943 )
2023-09-29 05:40:25 -07:00
George Hotz
48c8d130ae
simpler GPT2 ( #1941 )
...
* don't realize in gpt2
* simpler gpt2
2023-09-29 04:41:09 -07:00
George Hotz
81cb120b0f
winograd speed test ( #1942 )
2023-09-29 04:40:35 -07:00
George Hotz
d52df788d3
remove RawConst and add test ( #1939 )
2023-09-29 01:21:51 -07:00
George Hotz
22b8576887
more lazy cleanup ( #1938 )
...
* small lazy cleanups
* a few more
* cleanups
* no more realizing in the scheduler test
* a few more minor things
* that was just wrong
* fix graph. the graph test was completely useless
* make graph usable
* fix op graph
2023-09-29 00:53:29 -07:00
nimlgen
2a49f7e456
fix transfer to mapped buffers ( #1923 )
2023-09-29 00:50:24 -07:00
Francis Lam
f445e056ed
wmma: add test and tensor core shape ( #1925 )
2023-09-28 18:04:28 -07:00