George Hotz
67e34b356a
good stuff from tensor cores branch ( #1199 )
2023-07-08 16:58:26 -07:00
George Hotz
7151382364
Refactor load/store before tensor cores ( #1193 )
...
* minor cleanups
* render_const
* now that's a nice refactor
* clean up vload/vstore
* clean up render_load
* debugs there
* dumb
* err, this?
* const float4
* what's failing
* bugfix
* statement includes semicolon
* bugfix
2023-07-08 15:54:58 -07:00
fluffy χατγιρλ
ef1909500e
remove superfluous parentheses ( #1197 )
2023-07-08 15:11:02 -07:00
fluffy χατγιρλ
628ee46627
Fix bug where Tensor.randn returns inf ( #1192 )
...
* fix randn inf bug
* add test
* more compact test
* clarify test purpose
2023-07-08 12:03:46 -07:00
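The inf in #1192 is the classic failure mode of drawing normals from uniforms; a Box-Muller-style sketch (an assumption for illustration, not tinygrad's actual randn code) shows where it comes from and one guard:

```python
import math
import random

def randn_boxmuller(rng: random.Random) -> float:
    # Box-Muller: z = sqrt(-2 ln u1) * cos(2 pi u2).
    # If u1 == 0, log(u1) is -inf and the result blows up to inf/nan,
    # so map u1 into the half-open interval (0, 1].
    u1 = 1.0 - rng.random()  # rng.random() is in [0, 1), so u1 is in (0, 1]
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)

rng = random.Random(0)
samples = [randn_boxmuller(rng) for _ in range(10_000)]
assert all(math.isfinite(z) for z in samples)
```

The `1.0 - rng.random()` flip is one conventional guard; any transform that keeps the log argument strictly positive works.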
George Hotz
d9c1d81e99
Revert "feat: cancel previous workflow runs on new commits ( #1184 )" ( #1194 )
...
This reverts commit d66a0c285d.
2023-07-08 11:26:13 -07:00
George Hotz
52600d532e
add 20 minute timeout
2023-07-07 23:02:28 -07:00
wozeparrot
d66a0c285d
feat: cancel previous workflow runs on new commits ( #1184 )
2023-07-07 22:55:35 -07:00
Jacky Lee
e0c2ae8984
Update file paths ( #1179 )
2023-07-07 18:41:58 -07:00
George Hotz
0ad99038ef
Revert "Revert "Fix ShapeTracker mismatch in LazyBuffer.fromCPU ( #1156 )" ( #1181 )" + add test
...
This reverts commit a374b62bfe.
2023-07-07 18:37:04 -07:00
George Hotz
2952b8e7a8
Fix up abstractions.py to include the Linearizer ( #1177 )
...
* fix up docs
* remove pow, add sqrt
2023-07-07 18:33:51 -07:00
George Hotz
a374b62bfe
Revert "Fix ShapeTracker mismatch in LazyBuffer.fromCPU ( #1156 )" ( #1181 )
...
This reverts commit 8ff7184b1b.
2023-07-07 18:29:05 -07:00
fluffy χατγιρλ
8ff7184b1b
Fix ShapeTracker mismatch in LazyBuffer.fromCPU ( #1156 )
...
* init shape tracker with strides to fix mismatch
Author: sekstini <sekstinilol@gmail.com>
* fix whitespace
* add tests
2023-07-07 18:28:21 -07:00
George Hotz
b8dfbba703
hip_matmul: f16 gemm 2048x2048 gets 36 TFLOPS
2023-07-08 00:35:45 +00:00
Stan
69d33cab0d
Fix: auto create parent dir when downloading file ( #1173 )
...
* Fix: auto create parent dir when downloading file
also removed duplicate import `os`
* Added test for auto parent dir creation when downloading file
2023-07-07 13:40:29 -07:00
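The fix pattern in #1173 is small; a hedged stdlib sketch (`save_file` is a hypothetical helper for illustration, not tinygrad's download code):

```python
import os
import tempfile

def save_file(data: bytes, dest: str) -> None:
    # open() will not create missing parent directories, so make them first;
    # exist_ok=True makes the guard a no-op when the directory is already there
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    with open(dest, "wb") as f:
        f.write(data)

with tempfile.TemporaryDirectory() as tmp:
    # parent dirs "datasets/coco" do not exist yet
    dest = os.path.join(tmp, "datasets", "coco", "annotations.json")
    save_file(b"{}", dest)
    ok = os.path.exists(dest)
assert ok
```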
Stan
f40f8cd055
Initialise numpy arrays as float32 in DDPG ( #1171 )
...
float64 is not supported by tinygrad
2023-07-07 12:05:31 -07:00
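The point of the fix in two lines of numpy: array constructors default to float64, so a float32-only backend needs the dtype spelled out:

```python
import numpy as np

state = np.zeros((4, 3))                      # dtype is float64 by default
state32 = np.zeros((4, 3), dtype=np.float32)  # what a float32-only backend expects

assert state.dtype == np.float64
assert state32.dtype == np.float32
```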
cloud11665
884b5965de
ops_cuda fix race condition on cubin file read when testing with multiple cores ( #1172 )
2023-07-07 12:05:16 -07:00
terafo
aa60feda48
Fix naming conflict with huggingface datasets ( #1161 )
...
* Rename in files
* Move files
* Moved to extra/datasets as suggested
* Changes to files
* Fixed stupid mistake
---------
Co-authored-by: terafo <terafo@protonmail.com>
2023-07-07 10:43:44 -07:00
Yahya Lmallas
fd66d1ca00
fix Tensor.manual_seed() default to wrong type ( #1168 )
...
* fix Tensor.manual_seed() defaulting to the wrong type None when it should be int
* remove those tests
2023-07-07 10:42:48 -07:00
Stan
9b6e57eccd
helpers.py: improved test coverage + exception handling ( #1165 )
...
* Fixes + improved test coverage for helpers.py
- added exception handling in `proc`, if an exception was thrown, the thread would hang
- made `_early_exec_process` catch any Exception; before, if an exception was thrown before the process was started, it would hang the thread
* Made `_early_exec_process` catch any Exception
Otherwise, if an exception was thrown before the process was started, it would hang the thread, for example a type error in an argument passed to `subprocess.check_output`.
* Fixed `from tinygrad.helpers import Timing` import
oops, for some reason my IDE cleaned that import from extra/helpers.
* Fixed import in llama.py
Another one that I skipped by accident, my bad
* Extracted a class for tests of early exec
* Normalize line endings, windows uses \r\n
* Made `cross_process` not a daemon
2023-07-07 10:26:05 -07:00
Kunwar Raj Singh
8391648822
Over 90% on CIFAR with examples/hlb_cifar10.py ( #1073 )
...
* fix eval, lr decay, best eval
* 82.27
* 82.64
* 82.79, reproducible
* add lr sched, 85.26
* 87.42
* 87.94
* 87.42
* tta with flip
* training flip aug
* refactor
* using Tensor for LR is faster
* 89.5
* refactor, flip only train set
* 90.01
* 90.64
* eval jit
* refactor
* only JIT model
* fix eval JIT
* fix eval JIT
* 90.82
* STEPS=900 reaches 90.22
* TTA envvar
* TTA default 0
* fully jit training
* refactor optim
* fix sched
* add label smoothing
* param changes
* partial gelu
* OneCycle with pause
* gelu maybe works
* 90.12
* remove pause lr
* maybe fix lr schedulers
* scheduler test passing
* comments
* try mixup
* shuffle!
* add back the missing last eval
* fix shuffle bugs
* add mixup prob
* fix mixup prob
* 90.19
* correct mixup
* correct mixup
* correct mixup
* 90.24
* 90.33
* refactor, add type hints
* add gradient clipping
* maybe fix test
* full JIT
* back to relu for now
* pass mixup prob as param
* add typehints
* maybe CI works
* try erf gelu
* CI, types
* remove useless import
* refactor optim
* refactor optim
* try leakyrelu
* try celu
* gelu
* 90.67
* remove grad clip
* remove grad clip tests
* revert params
* add test for OneCycleLR
* 90.62
* fix eval timing
* fix eval timing again
* so where i calculate mixup_prob matters
---------
Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
2023-07-06 20:46:22 -07:00
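Of the tricks in the log above (label smoothing, flip TTA, LR scheduling), mixup is the least obvious: blend pairs of batch examples and their labels by a factor lambda. A hedged numpy sketch of the idea (illustrative, not the hlb_cifar10 code):

```python
import numpy as np

def mixup(x: np.ndarray, y: np.ndarray, lam: float, rng: np.random.Generator):
    # blend each example with a randomly chosen partner from the same batch
    perm = rng.permutation(len(x))
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y + (1.0 - lam) * y[perm]
    return x_mixed, y_mixed

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 3, 32, 32)).astype(np.float32)
y = np.eye(10, dtype=np.float32)[rng.integers(0, 10, size=8)]  # one-hot labels
xm, ym = mixup(x, y, lam=0.7, rng=rng)

assert xm.shape == x.shape and ym.shape == y.shape
assert np.allclose(ym.sum(axis=1), 1.0)  # mixed labels stay a convex combination
```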
Barath
c5aea13a65
Fix evaluation stage in examples/transformer.py when using CUDA ( #1150 )
...
* make test data as contiguous array
* standardise contiguous array for all input data in cuda ops
* swap to x.ravel
2023-07-06 18:07:10 -07:00
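Why contiguity matters for device uploads: a flat memcpy of a strided view reads the wrong elements. A numpy sketch of the coercion the commit applies (illustrative, not the actual ops_cuda code):

```python
import numpy as np

x = np.arange(12, dtype=np.float32).reshape(3, 4)
xt = x.T  # a transposed view: same buffer, different strides, not C-contiguous
assert not xt.flags["C_CONTIGUOUS"]

xc = np.ascontiguousarray(xt)  # copies into a fresh C-ordered buffer
assert xc.flags["C_CONTIGUOUS"]
assert (xc == xt).all()  # same values, now safe to memcpy flat
```

`x.ravel()`, which the last bullet swaps to, likewise returns a contiguous copy whenever the view is not already contiguous.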
Rayan Hatout
9975f24452
Fold expand preceding reduce if the reduction is on the same axis as the expansion ( #1134 )
...
* fold expands that precede a reduce if the reduction is on the same axis as the expansion
* add deterministic test for SIMPLIFY_SUM_RESHAPE_EXPAND_SUM optimization
* add a test case to make sure we don't fold reduce-expand-reduce on different axes
2023-07-06 13:41:05 -07:00
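The identity behind the fold: expanding along an axis and then reducing over that same axis just multiplies by the axis length, so the expand/reduce pair can be replaced by a scalar multiply. A numpy check of the identity:

```python
import numpy as np

x = np.arange(6, dtype=np.float32).reshape(2, 3)

# expand x to (2, 3, 4) along a new last axis, then sum-reduce that axis
expanded = np.broadcast_to(x[:, :, None], (2, 3, 4))
reduced = expanded.sum(axis=2)

# summing n identical copies is just multiplication by n
assert np.array_equal(reduced, x * 4)
```

The third bullet's test matters because the fold is only valid when the reduce axis is exactly the expanded axis.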
cheeetoo
f109af3cbb
Don't save parents unless needed ( #1142 )
...
* don't save parents unless requires grad
* keep del ctx since idk
2023-07-05 18:11:57 -07:00
Eli Frigo
801564f31b
Remove POW llop and add SQRT llop ( #1104 )
...
* fixed division by zero for fast operations
* made et closer to 0
* replace POW llop with SQRT
* updated mlops to swap SQRT and POW llops
* updated hlops to swap POW and SQRT
* added sqrt llop to cpu runtime
* added sqrt llop to cstyle codegen
* added POW llop to llvm ir codegen
* added SQRT llop to torch runtime
* moved pow from mlops to hlops
* found a better way to do reverse pow
* fixed indentation
* added SQRT llop to triton
* update docs to match new llops
* removed POW operator from assembly codegen
* added sqrt and rsqrt to pow hlop
* rewrote pow function in tensor.py
* Adjust tolerance
* Adjust for adamw
* Reduce for Adam too
* removed accidental leftover code
* removed all of accidental code
* added rsqrt test
* removed pow from mlops again
it was added back when resolving merge conflicts
---------
Co-authored-by: Jacky Lee <jla524@sfu.ca>
2023-07-05 18:07:58 -07:00
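With POW removed from the llops, a pow hlop can be assembled from the surviving primitives: sqrt/rsqrt fast paths for y = ±0.5 and exp(y * log(x)) for the general positive-x case. A hedged stdlib sketch (`hlop_pow` is illustrative, not tinygrad's tensor.py code):

```python
import math

def hlop_pow(x: float, y: float) -> float:
    # fast paths the commit mentions: pow(x, 0.5) == sqrt(x), pow(x, -0.5) == rsqrt(x)
    if y == 0.5:
        return math.sqrt(x)
    if y == -0.5:
        return 1.0 / math.sqrt(x)
    # general case for x > 0: x**y == exp(y * log(x))
    return math.exp(y * math.log(x))

assert abs(hlop_pow(2.0, 0.5) - math.sqrt(2.0)) < 1e-12
assert abs(hlop_pow(9.0, -0.5) - (1.0 / 3.0)) < 1e-12
assert abs(hlop_pow(2.0, 3.0) - 8.0) < 1e-9
```

The exp/log route accumulates a little rounding error, which is presumably why the tolerances in the tests above had to be adjusted.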
cloud11665
b7369ffcff
add ptx formatter + syntax highlighter ( #1128 )
2023-07-05 17:56:09 -07:00
Reza Rezvan
d1356cac27
Fix: Jacobian tests [WIP] ( #1126 )
...
* Fix: Jacobian tests; num_jacobian either bugged or not accurate enough;
* Fix: Jacobian tests;
* Fix: Gradcheck;
2023-07-05 15:36:22 -07:00
nimlgen
d363d25ee2
fix imports for examples/transformer.py ( #1136 )
2023-07-05 08:15:13 -07:00
Mehmet Kuzucu
c3173ff281
Add return statement to the train function ( #1135 )
...
add a return statement to the train function so callers can access the losses and accuracies lists
2023-07-05 08:13:38 -07:00
wozeparrot
981d4980c4
feat: reword contributing ( #1131 )
2023-07-04 22:17:47 -07:00
George Hotz
793a670187
from tensor cores + lb touchup ( #1127 )
2023-07-04 15:45:20 -07:00
George Hotz
2f968f8547
ignore cloudpickle type for local mypy
2023-07-04 13:51:20 -07:00
George Hotz
87d21ea979
examples: simple conv bn
2023-07-04 13:50:26 -07:00
Reza Rezvan
535224ac20
Remove float64 ( #1101 )
...
* Refactor: Remove float64
* Refactor: Remove unused imports
* Refactor: Remove float64
* Refactor: Remove float64
* Refactor: Exclude float64 onnx backend
* Add: Skip jacobian and gradcheck tests;
2023-07-04 08:40:51 -07:00
Daniel Hipke
b4ce23e4b8
Make cross_process use cloudpickle ( #1118 )
...
* fix syntax issues in imagenet_download.py
* use cloudpickle in cross_process to make it work in Python 3.9+
* add cross_process test
* prevent unpickling on every function call
* add cloudpickle to setup.py
* add support for args/kwargs
2023-07-04 00:47:34 -07:00
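The motivation for cloudpickle: stdlib pickle serializes functions by reference (module plus qualified name), so the lambdas and closures a cross_process helper gets handed cannot be pickled at all, while cloudpickle serializes their code by value. A stdlib-only demonstration of the gap:

```python
import pickle

# a lambda has no importable name, so by-reference pickling fails
try:
    pickle.dumps(lambda x: x + 1)
    raised = False
except (pickle.PicklingError, AttributeError):
    raised = True

assert raised  # this is the gap cloudpickle fills by pickling code by value
```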
George Hotz
c709dec8b5
gelu: weird test was broken for metal
2023-07-04 00:43:54 -07:00
George Hotz
daf8e1942f
sigmoid: test large positive also and add note
2023-07-04 00:18:31 -07:00
Kunwar Raj Singh
9e6067378f
Broken Sigmoid backward: Add test and mlop for Sigmoid ( #1113 )
...
* Add failing sigmoid test
* update more tests
* add mlop for sigmoid
* add back test
* math.log(math.e) = 1
* remove divides
---------
Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
2023-07-04 00:14:22 -07:00
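Making sigmoid its own mlop means backward can use the closed form sigmoid'(x) = s * (1 - s) rather than differentiating a composed graph (where the stray `math.log(math.e)` factor and extra divides crept in). A hedged scalar sketch (illustrative, not the mlop itself):

```python
import math

def sigmoid(x: float) -> float:
    # branch on sign for numerical stability at large |x|
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def sigmoid_backward(x: float, grad_out: float) -> float:
    s = sigmoid(x)
    return grad_out * s * (1.0 - s)  # d/dx sigmoid(x) = s * (1 - s)

# finite-difference check of the backward pass
x, h = 0.7, 1e-6
num = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
assert abs(sigmoid_backward(x, 1.0) - num) < 1e-6

# stays finite at the large inputs the follow-up commit tests
assert sigmoid(1000.0) == 1.0 and sigmoid(-1000.0) >= 0.0
```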
Daniel Hipke
d58a9603ab
Create COCO data directory if it doesn't exist. ( #1114 )
...
* Create COCO data directory if it doesn't exist.
* update paths to support windows
2023-07-03 18:15:53 -07:00
Anselm Coogan
a22aad7d32
Use generators instead of lists in anys and alls ( #1111 )
...
* Use generators in any(..) instead of lists for better best-case
* Use generators in all(...) instead of lists
* enable R1729 in .pylintrc
* revert import sorting
---------
Co-authored-by: Anselm Coogan <anselm@scandit.com>
2023-07-03 16:06:06 -07:00
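The reason for the change: `any()`/`all()` over a generator short-circuits at the first deciding element, while a list comprehension materializes and evaluates everything first. A small demonstration counting predicate calls:

```python
calls = []

def is_big(x: int) -> bool:
    calls.append(x)
    return x > 1

xs = [0, 1, 2, 3, 4]

calls.clear()
any(is_big(x) for x in xs)    # generator: stops at the first True (x == 2)
gen_calls = len(calls)

calls.clear()
any([is_big(x) for x in xs])  # list comprehension: builds the whole list first
list_calls = len(calls)

assert gen_calls == 3 and list_calls == 5
```

Pylint's R1729 (`use-a-generator`), enabled in the third bullet, flags exactly this pattern.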
tricky-labyrinth
fd98f6cffa
Small fix to abstractions.py so it runs on Windows without throwing an AttributeError ( #1109 )
...
Co-authored-by: Tricky Labyrinth <trickylabyrinth@gmail.com>
2023-07-03 13:44:49 -07:00
Mike Ovyan
651d080594
[perf] Replace more list comprehension with * ( #1106 )
...
* [perf] Replace more list comprehension with *
* comeback
* final fix?
* blind me
* kill me
* ?
* rev
* [none]
2023-07-03 10:49:23 -07:00
Frank Pinnola
2071e53da8
Handle broadcast flag on gemm ( #1103 )
2023-07-02 22:15:07 -07:00
Taras Tsugrii
cbb5c655e5
[tensor][perf] Replace list comprehension with *. ( #1102 )
...
It's more concise, idiomatic and faster:
```
In [8]: %timeit [1 for _ in range(100)]
2.12 µs ± 26.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [9]: %timeit [1] * 100
515 ns ± 5.23 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```
2023-07-02 18:34:23 -07:00
David Hou
363fbfc2e4
do not emit loop end code for global+local loops in assembly kernel ( #1100 )
2023-07-02 18:33:57 -07:00
Reza Rezvan
8ae9a054ae
Refactor nn.optim ( #1091 )
...
* Refactor: nn.optim.py
* Refactor: nn.optim.py; Fix all tests
* Refactor: Replace all optim.get_parameters()
* Refactor: Revert list comp.
* Refactor: Replace optim.get_state_dict
* Refactor: Change quickstart.md
2023-07-02 15:07:30 -07:00
Eli Frigo
10f1aeb144
fixed broken link ( #1097 )
2023-07-02 15:06:59 -07:00
Rob Grossman
c8ddc34368
include missing queue in thneed load ( #1095 )
2023-07-02 12:33:59 -07:00
nmarwell26
12ce68c1ee
Renamed examples/yolo to examples/vgg7_helpers: the directory contains no yolo-related code, only helper code for vgg7, which was confusing to a new user exploring the examples. ( #1086 )
2023-07-01 12:04:28 -07:00
Rob Grossman
2533a992e7
remove unused imports in models ( #1088 )
2023-07-01 12:04:19 -07:00
geohotstan
575f75f613
hello ( #1084 )
2023-07-01 01:29:35 -07:00