Commit Graph

35 Commits

Author SHA1 Message Date
George Hotz
bfcec234a2 Refactor ASTs (#622)
* ugh worst branch name

* compiler refactor continues

* scc -> cloc

* buf -> _buf

* finish _buf, and program -> runtime

* gpu is still working, clang isn't

* clang in new style

* ops_metal

* something broke it

* improve metal

* clean up tons of cl crap

* hack fix sync

* cleaner gpu

* gpu metal clang

* cleanups

* minor refactor

* GPUCodegen

* fix up LLVM

* blind CUDA refactor

* codegen / runtime

* keep ops naming

* linter passes

* woah, llvm was allocing 4x what it needed to

* bugfixes

* fix openpilot compiler

* fix compile_efficientnet

* method cache should fix tests

* deal with duped functions
2023-03-01 18:57:29 -08:00
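
The "method cache" and "deal with duped functions" bullets above point at memoizing compiled kernels on a key derived from the AST, so identical functions are generated and compiled only once. A minimal sketch of that idea (names are illustrative, not tinygrad's actual API):

```python
from typing import Callable, Dict

# hypothetical cache: AST key -> compiled runtime function
method_cache: Dict[str, Callable] = {}

def compile_ast(ast_key: str, codegen: Callable[[str], Callable]) -> Callable:
  # expensive codegen + compile happens only the first time a key is seen;
  # duplicated functions hit the cache and reuse the same program
  if ast_key not in method_cache:
    method_cache[ast_key] = codegen(ast_key)
  return method_cache[ast_key]
```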
George Hotz
ea3fa07c2a bump tinygrad to 0.5, move reshape logic from mlops 2023-02-28 18:07:03 -08:00
Jacky Lee
0f58c4c648 Cleanup yolo and remove stateless classes (#604)
* Add AvgPool2d as a layer

* Clean up a bit

* Remove stateless layers in yolo_nn

* More cleanup

* Save label for test

* Add test for YOLO

* Test without cv2

* Don't fail if cv2 not installed

* Better import

* Fix image read

* Use opencv :)

* Don't download the file

* Fix errors

* Use same version

* Set higher confidence

* Why is the confidence so low?

* Start over

* Remove stateless layers

* Remove extra lines

* Revert changes

* Save a few more lines
2023-02-26 16:55:21 -08:00
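
The "remove stateless layers" bullets describe dropping parameterless wrapper classes in favor of calling the tensor op directly. A rough sketch of the pattern, assuming a tinygrad-style Tensor.avg_pool2d method (names are illustrative):

```python
class AvgPool2d:
  # stateless: stores configuration only, no trainable parameters
  def __init__(self, kernel_size=(2, 2)):
    self.kernel_size = kernel_size
  def __call__(self, x):
    return x.avg_pool2d(kernel_size=self.kernel_size)

# after the cleanup the model can just call the op directly and the
# wrapper class above becomes unnecessary:
#   out = x.avg_pool2d(kernel_size=(2, 2))
```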
Sohaib
8835df7a5c upgrade onnx to 1.13.0 (#588)
- remove protobuf from direct dependencies
- replace deprecated mapping.TENSOR_TYPE_TO_NP_TYPE

Co-authored-by: Sohaib Errabii <sohaib.errabii@ipops.io>
2023-02-23 13:59:23 -08:00
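
For context, the deprecated table lookup is typically replaced by onnx.helper.tensor_dtype_to_np_dtype, available from onnx 1.13; a minimal sketch:

```python
from onnx import TensorProto, helper

# old, deprecated in onnx 1.13:
#   onnx.mapping.TENSOR_TYPE_TO_NP_TYPE[TensorProto.FLOAT]
np_dtype = helper.tensor_dtype_to_np_dtype(TensorProto.FLOAT)
print(np_dtype)  # float32
```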
George Hotz
4126bf2982 remove six (hopefully not needed) 2023-02-20 20:44:23 -08:00
George Hotz
efcb3f0cdd fix metal dep 2023-02-20 20:43:32 -08:00
Diogo
506970414a added metal packages to setup and release metal buffers after del (#571) 2023-02-20 20:01:07 -08:00
Kirill
a4f5f2ff8b Add missing packages to setup.py (#554) 2023-02-11 14:41:56 -08:00
James Roberts
0d405fd5bc Parallelize CI tests (#535) 2023-02-06 15:27:44 -06:00
George Hotz
90529d3750 tests are 20% faster (#529)
* pytorch CPU

* no cache, it's slower

* pytorch cpu for real

* remove double onnx
2023-02-06 09:56:14 -06:00
George Hotz
039de1b332 oops, pytest is for testing 2023-02-06 09:30:12 -06:00
George Hotz
6eb0e6a650 shuffle deps: always tqdm, make linting category 2023-02-06 09:27:01 -06:00
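
A sketch of what "always tqdm, make linting category" could look like in setup.py terms: tqdm sits in the core install_requires, and the lint tools become an extras_require group. Package lists here are illustrative, not the repo's exact dependency sets.

```python
from setuptools import setup

setup(
  name="tinygrad",
  install_requires=["numpy", "tqdm"],          # always installed
  extras_require={
    "linting": ["flake8", "pylint", "mypy"],   # pip install tinygrad[linting]
    "testing": ["pytest", "torch", "onnx"],    # pip install tinygrad[testing]
  },
)
```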
Jacky Lee
ad4f6aa2cf Add test for quick_gelu (#526)
* Add test for quick_gelu

* Bump PyTorch version for approximate
2023-02-03 20:01:39 -08:00
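
quick_gelu is the sigmoid approximation of GELU, x * sigmoid(1.702 * x). A standalone numpy check of that formula (not the repo's actual test, which compares against PyTorch):

```python
import math
import numpy as np

def quick_gelu(x: np.ndarray) -> np.ndarray:
  # GELU approximated with a sigmoid
  return x * (1.0 / (1.0 + np.exp(-1.702 * x)))

x = np.linspace(-3, 3, 13)
exact = np.array([0.5 * v * (1.0 + math.erf(v / math.sqrt(2))) for v in x])
print(np.max(np.abs(quick_gelu(x) - exact)))  # small but nonzero: it's an approximation
```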
George Hotz
cd97b036cc A Triton backend for tinygrad (#470)
* triton can add

* print stuff from triton

* write out file

* ops triton working

* reduce ops

* sort of works

* Triton bugfixes & implementation of remaining ops (#490)

* padding

* support pow, max, relu, gt0

* allocate return buffer

* Fix reduce

* Add tests for power op

* Fix triton illegal memory accesses and memory leak (#512)

* Fix mypy issue

* Add triton to setup.py

* Replace torch with pycuda

* Use one cuda stream for data transfer and kernels

* Remove triton submodule

* Fix memory leak by using weakrefs for caching

* Fix memory access by adding valid as mask for load

* Fix invalid kernel launches by flattening the grid (#515)

---------

Co-authored-by: Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
2023-02-01 11:53:57 -08:00
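
The "valid as mask for load" fix refers to masking out-of-range lanes so tail blocks never read past the end of a buffer. A minimal Triton kernel showing the pattern (illustrative, not tinygrad's generated code):

```python
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
  offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
  valid = offs < n                                   # mask for the tail block
  x = tl.load(x_ptr + offs, mask=valid, other=0.0)   # no illegal memory access
  y = tl.load(y_ptr + offs, mask=valid, other=0.0)
  tl.store(out_ptr + offs, x + y, mask=valid)

# launched over a flattened 1D grid, e.g.:
#   add_kernel[(triton.cdiv(n, 1024),)](x, y, out, n, BLOCK=1024)
```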
George Hotz
bd8a5c2ced Simple CUDA Runtime (#480)
* factor out opencl runtime

* don't use CL outside the runtime

* cuda runtime adds

* final_dimension

* tests pass with CUDA backend

* more cuda

* cuda simpler

* retain old functionality

* linter and typing

* move globalcounters out of runtimes

* oops, GlobalCounters in cuda

* MAX_OUTPUT_SHAPE=3 is fine for CUDA
2023-01-27 16:26:24 -08:00
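
For a sense of scale, a "simple CUDA runtime" in the pycuda style is roughly: compile a kernel from source, copy buffers, launch with a block/grid. A sketch under that assumption, not the actual ops_cuda code:

```python
import numpy as np
import pycuda.autoinit                      # creates a context on the default GPU
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void add(float *out, const float *a, const float *b, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = a[i] + b[i];
}
""")
add = mod.get_function("add")

n = 1024
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.empty_like(a)
add(cuda.Out(out), cuda.In(a), cuda.In(b), np.int32(n),
    block=(256, 1, 1), grid=(n // 256, 1))
assert np.allclose(out, a + b)
```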
Jacky Lee
026ba78526 Add commit hooks (#478)
* Add pre-commit hook

* We need ret

* Fix some type definitions
2023-01-26 22:24:31 -08:00
George Hotz
bfd4f4e35c testdocker 2023-01-09 12:41:52 -08:00
George Hotz
b8c94a67c9 Simple chonker (#431)
* chonker will make llvm fast

* work

* better speed tests, we will make them fast

* with the cache add is the same speed

* relu and neg are fast

* fix sum speed

* maximum maxnum?

* hack for gemm opt

* gemm very slow

* zeros like

* test_permute

* shapetracker returns self

* fix shapetracker factorization

* err, int strides

* permutes are faster now in tinygrad than pytorch

* support -1 in expand

* gemm unrolled

* improve final test case

* WIP GEMM

* why isn't GEMM fast?

* revert cache dim

* ffp contract works on clang, not llvm?

* ignore llvm ir

* this makes fma work at least, but no faster

* USE_4x4

* 63 GFLOPS

* 87 GFLOPS

* that wasn't matmul, 44 GFLOPS now

* 82 GFLOPS permuted

* this permute too

* a little speed for the convs

* 45 GFLOPS

* speed tests pass again

* clean up prints

* fix FMA WHAT A WASTE OF TIME

* colors

* moar fair

* GPU

* useless on chonker

* cleanups

* improve factorized shapetracker

* better threshold

* label conv

* work

* ops test pass again

* hot load the index

* run the last view, no need to create

* ZeroView needs a repr for the key to work

* fix segfault on out of bounds

* one more test

* start amx, and llvm.initialize_native_asmparser

* amx works

* nice AMX class

* nicer AMX class

* refactor get_idxs

* amx working

* is slower...

* useless flip

* cache

* SZ_X

* AMX_SZ_X/Y work alone

* Contiguous mlop

* test gemm packed

* PREPARE in packed

* use_amx factor

* prefetch isn't faster

* loop

* same 3ms

* 2.24 ms

* allow double on store in TG

* amx reduce is the same speed as non amx reduce

* include memory bandwidth

* clean up shapetracker

* flip returns stride

* prepare for upstream

* Update ops_llvm.py (#426)

* permutes are yellow and green now

* faster conv

* llvm cleanups

* Show optimised IR under debug 4 (#428)

* ASTKernel class

* Make tinygrad work with older python version (#427)

* Make tinygrad work with older python version

* Use partialmethod instead of partial

* simple chonker is chonking

* remove junk from test speed vs torch

* fix linker and types

* AMX is only here now

* add LLVM tests, it's a valid backend now

* oops, run llvm test

* contiguous_op

* fix loadops compare

* dedup reduceops

Co-authored-by: calledit <1573053+calledit@users.noreply.github.com>
2022-11-10 23:17:09 -08:00
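
On the "Use partialmethod instead of partial" bullet: binding preset arguments to a method in a class body needs functools.partialmethod so that `self` is still passed through; a plain functools.partial is not a descriptor and would not receive it. A small self-contained illustration (not tinygrad's actual ops table):

```python
from functools import partialmethod

class Ops:
  def elementwise(self, op, x):
    return [op(v) for v in x]
  # partialmethod(elementwise, <op>) still binds `self` when called as a method
  relu = partialmethod(elementwise, lambda v: max(v, 0.0))

print(Ops().relu([-1.0, 2.0]))  # -> [0.0, 2.0]
```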
George Hotz
92ed87b0a5 bump version to 0.4.0 2022-11-08 08:44:42 -08:00
George Hotz
b132de677d tinygrad.nn (#367)
* tinygrad.nn

* flake8

* working on pylint

* more pylint

* more pylint

* pylint passes

* networkx

* mypy can't infer that type

* junk
2022-08-18 07:41:00 -07:00
Nicklas Boman
64d986bc8b add mypy to ci testing (#353) 2022-07-03 15:11:35 -07:00
George Hotz
0d82cfd587 huh, torch 1.12 broke it. remove unused requirements.txt and pin torch 1.11 2022-07-02 23:07:59 -07:00
George Hotz
a710b3a210 it's a real test now 2022-06-11 11:33:33 -07:00
George Hotz
8440dbfa5d support inputs 2022-06-11 11:21:45 -07:00
George Hotz
082089d1c7 install requires pillow 2021-10-30 16:00:33 -07:00
Liam
bcf1518309 All devices are equal! (#196)
* Update all devices to be tested

ANE, CPU and OCL all now support all tests.

However, tests are not currently passing on GPU, and I cannot test on CPU.

Failing GPU tests are not an issue caused by this update. Tests have not
been passing due to a missing "six" required installation.

OpenCL Tests have not been run since commit: 1a1c63a08b

Devices have 3 types and are handled by a new DeviceTypes enum. (The goal
is to revert to Tensor.<type>, but this current setup allows for keyword
argument defaults: `device=DeviceType.CPU`.)

All references to Tensor.GPU/CPU/ANE have been converted to the
corresponding `DeviceTypes` enum.

Refactor of the conversion code to allow for any device to any device
conversion.

* Add six dependency in requirements.txt

* Resolve failure to run tests

Move six into gpu required installs. Remove six from standard
installation.

* Remove repeated data conversion

* Refactor method names

Also reduce code with .to and .to_

* Dynamic device handlers

* Refactor DeviceTypes -> Device

* Add mem copy profiling back

* test_backward_pass_diamond_model passing

* Resolve Sum issue on GPU

* Revert batchnorm2d tests

* Update README with updated API

* ANE testing with

* Last minute line gains
2020-12-15 23:44:08 -08:00
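
A condensed sketch of the DeviceTypes design described above: one enum covering every backend, usable as a keyword-argument default, with generic to()/to_() conversion between any pair of devices. Names and structure are illustrative only.

```python
from enum import Enum

class DeviceTypes(Enum):
  CPU = 0
  GPU = 1
  ANE = 2

def convert(data, src, dst):
  # placeholder for the any-device-to-any-device conversion logic
  return data

class Tensor:
  def __init__(self, data, device=DeviceTypes.CPU):   # enum works as a default
    self.data, self.device = data, device
  def to_(self, device):                               # in-place move
    self.data, self.device = convert(self.data, self.device, device), device
  def to(self, device):                                # copy onto another device
    return Tensor(convert(self.data, self.device, device), device)
```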
Liam
34b38dd4d0 Extra install requirements. (#164)
* Testing install requirements

* GPU install requirements
2020-12-09 02:22:47 -08:00
George Hotz
06504a5824 bump version 2020-11-08 09:34:07 -08:00
Marcel Bischoff
d24363f421 Update setup.py (#49)
I think `:=` in tinygrad/test/test_mnist.py actually needs Python 3.8
2020-11-02 18:09:31 -08:00
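
The `:=` (walrus) operator is a SyntaxError before Python 3.8, which is what the version floor in setup.py encodes. A tiny illustration, with the setup.py line shown only as an assumption of how that floor is declared:

```python
# requires Python >= 3.8; older interpreters reject this at parse time
if (n := 10) > 5:
  print(n)

# in setup.py:
#   setup(..., python_requires=">=3.8")
```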
George Hotz
0b68c08de0 literally just bump version for picture on pypi 2020-10-27 08:14:22 -07:00
George Hotz
6b5982b6b3 push pypi 2020-10-27 08:13:15 -07:00
George Hotz
43591a1e71 make the example simpler 2020-10-26 09:19:20 -07:00
George Hotz
64bd4f7936 lol, it's not 1.0 2020-10-26 09:11:32 -07:00
Göktuğ Karakaşlı
8d80726207 two spaces 2020-10-26 18:54:55 +03:00
Göktuğ Karakaşlı
cc9bd45b44 add setup.py and change imports to relative 2020-10-26 18:19:50 +03:00