Commit Graph

1248 Commits

Author SHA1 Message Date
George Hotz
b8c94a67c9 Simple chonker (#431)
* chonker will make llvm fast

* work

* better speed tests, we will make them fast

* with the cache add is the same speed

* relu and neg are fast

* fix sum speed

* maximum maxnum?

* hack for gemm opt

* gemm very slow

* zeros like

* test_permute

* shapetracker returns self

* fix shapetracker factorization

* err, int strides

* permutes are faster now in tinygrad than pytorch

* support -1 in expand

* gemm unrolled

* improve final test case

* WIP GEMM

* why isn't GEMM fast?

* revert cache dim

* ffp contract works on clang, not llvm?

* ignore llvm ir

* this makes fma work at least, but no faster

* USE_4x4

* 63 GFLOPS

* 87 GFLOPS

* that wasn't matmul, 44 GFLOPS now

* 82 GFLOPS permuted

* this permute too

* a little speed for the convs

* 45 GFLOPS

* speed tests pass again

* clean up prints

* fix FMA WHAT A WASTE OF TIME

* colors

* moar fair

* GPU

* useless on chonker

* cleanups

* improve factorized shapetracker

* better threshold

* label conv

* work

* ops test pass again

* hot load the index

* run the last view, no need to create

* ZeroView needs a repr for the key to work

* fix segfault on out of bounds

* one more test

* start amx, and llvm.initialize_native_asmparser

* amx works

* nice AMX class

* nicer AMX class

* refactor get_idxs

* amx working

* is slower...

* useless flip

* cache

* SZ_X

* AMX_SZ_X/Y work alone

* Contiguous mlop

* test gemm packed

* PREPARE in packed

* use_amx factor

* prefetch isn't faster

* loop

* same 3ms

* 2.24 ms

* allow double on store in TG

* amx reduce is the same speed as non amx reduce

* include memory bandwidth

* clean up shapetracker

* flip returns stride

* prepare for upstream

* Update ops_llvm.py (#426)

* permutes are yellow and green now

* faster conv

* llvm cleanups

* Show optimised IR under debug 4 (#428)

* ASTKernel class

* Make tinygrad work with older python version (#427)

* Make tinygrad work with older python version

* Use partialmethod instead of partial

* simple chonker is chonking

* remove junk from test speed vs torch

* fix linker and types

* AMX is only here now

* add LLVM tests, it's a valid backend now

* oops, run llvm test

* contiguous_op

* fix loadops compare

* dedup reduceops

Co-authored-by: calledit <1573053+calledit@users.noreply.github.com>
2022-11-10 23:17:09 -08:00
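
A note on the shapetracker bullets above (factorization, int strides, fast permutes): a permute only reorders (shape, stride) pairs and never touches the backing buffer, which is why it can beat a copying implementation. A minimal sketch of that strided-view idea, with invented names rather than tinygrad's actual ShapeTracker API:

    class View:
        def __init__(self, shape, strides):
            self.shape, self.strides = shape, strides
        def permute(self, order):
            # A permute reorders shape and strides together; no data moves.
            return View(tuple(self.shape[i] for i in order),
                        tuple(self.strides[i] for i in order))
        def offset(self, idxs):
            # Flat offset into the backing buffer for a logical index.
            return sum(i * st for i, st in zip(idxs, self.strides))

    v = View((2, 3), (3, 1))        # row-major 2x3
    t = v.permute((1, 0))           # logical 3x2, same buffer
    assert v.offset((1, 2)) == t.offset((2, 1))
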
George Hotz
2cc1d970c6 updates from the chonker branch 2022-11-07 21:12:08 -08:00
George Hotz
d878065ece Gemm (#416)
* gemm

* off by factor of 5

* 50 GFLOPS

* works

* 91 gflops

* working at 50G

* works

* iy

* 150 GFLOPS

* 150 GFLOPS

* N=2048 is still fast

* threading soon

* multithread

* pinning

* throttling is sad

* Align matrices to cacheline width (#361)

Co-authored-by: cloud <Cloud11665@gmail.com>
2022-11-06 10:07:28 -08:00
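
The cacheline alignment change (#361) uses a standard trick: over-allocate, then offset the start so the matrix begins on a 64-byte boundary (and, with power-of-two row sizes, every row does too). A hedged numpy sketch of the idea; the helper below is illustrative, not the PR's code:

    import numpy as np

    def aligned(shape, dtype=np.float32, align=64):
        # Over-allocate by one cache line, then slice to an aligned offset.
        nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
        buf = np.empty(nbytes + align, dtype=np.uint8)
        off = (-buf.ctypes.data) % align
        return buf[off:off + nbytes].view(dtype).reshape(shape)

    A = aligned((2048, 2048))
    assert A.ctypes.data % 64 == 0
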
George Hotz
6a8fb53304 move ops.py into lazy.py (#402)
* move ops.py into lazy.py

* fix graph and linter

* ugh, didn't add
2022-10-25 13:58:03 -07:00
George Hotz
8e22d5ee67 replace networkx with defaultdict 2022-10-20 19:36:43 -07:00
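
Replacing networkx with a defaultdict works because the graph debug output only needs an adjacency list, not a full graph library. A minimal sketch of the swap (names are illustrative):

    from collections import defaultdict

    # Before: G = nx.DiGraph(); G.add_edge(src, dst, label=op)
    # After: a plain adjacency list, no external dependency.
    G = defaultdict(list)

    def add_edge(src, dst, label):
        G[src].append((dst, label))

    def children(node):
        return [dst for dst, _ in G[node]]
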
George Hotz
63f9c55156 really dumb bug 2022-10-20 17:07:47 -07:00
George Hotz
1bec4651b3 fix nonstatic weights 2022-10-20 17:04:14 -07:00
George Hotz
bb288e6938 safe_numpy and warning for broken matmul 2022-10-20 15:40:22 -07:00
George Hotz
50c95c7d9a add assert to catch issue in attention 2022-10-20 15:13:00 -07:00
George Hotz
26c78ccf7d remove useless buffer 2022-10-20 14:07:28 -07:00
George Hotz
a18c1f3178 zero out the inputs 2022-10-20 13:46:52 -07:00
George Hotz
ace8db29f8 ReduceSum 2022-10-20 12:48:14 -07:00
George Hotz
c400ee0beb refactoring thneed (#400)
* refactoring thneed

* continue

* minor update

* looks like it's working

* big refactor

* confirm thneed got the right output

* code is there but it's broken

* works now

* always OPTWG, input -> dat

* fix type issue
2022-10-20 12:35:59 -07:00
YassineYousfi
ae0f9b17df openpilot: new models and onnx ops (#401)
* ngrl stuff

* fngrl

* fix typo in compile script

* workflow dispatch

* new models in tests

* dont need to up this threshold

Co-authored-by: HaraldSchafer <harald.the.engineer@gmail.com>
2022-10-20 11:49:19 -07:00
George Hotz
ff11c4316b move get_parameters to optim.py 2022-09-25 13:16:58 -04:00
Jacky Lee
2c01a66265 Reshape dataset from fetch_mnist (#390) 2022-09-24 21:16:29 -04:00
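
MNIST downloads as flat 784-value rows; #390 moves the reshape into the fetch so callers get image-shaped arrays. The essence, assuming the usual flat layout (the exact return shape in the repo may differ):

    import numpy as np

    def reshape_mnist(X: np.ndarray) -> np.ndarray:
        # 60000 x 784 -> 60000 x 28 x 28, ready for conv layers.
        return X.reshape(-1, 28, 28)
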
George Hotz
271446e3eb set requires_grad to None (#387)
* set requires_grad to None

* some things need gradients

* hmm, why was get_parameters filtering
2022-09-21 11:16:02 -04:00
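
Defaulting requires_grad to None instead of False gives a three-state flag: True (always build the graph), False (never), and None (undecided until something, e.g. an optimizer, claims the tensor as a parameter). A sketch of those semantics, not tinygrad's actual implementation:

    class Tensor:
        def __init__(self, data, requires_grad=None):
            # None means "unknown": behaves like False until an optimizer
            # marks the tensor as a parameter, which flips it to True.
            self.data, self.requires_grad = data, requires_grad

    def mark_parameter(t: Tensor):
        if t.requires_grad is None:
            t.requires_grad = True
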
YassineYousfi
2f0f91ba3d support float16 onnx weights (#384) 2022-09-15 09:12:18 -04:00
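
Float16 ONNX weights can be handled by upcasting once at load time so everything downstream stays float32; a one-function sketch of that approach (assumed, not the PR's exact code):

    import numpy as np

    def load_weight(w: np.ndarray) -> np.ndarray:
        # Upcast fp16 initializers once; compute stays in float32.
        return w.astype(np.float32) if w.dtype == np.float16 else w
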
YassineYousfi
1a7bdc51f8 support more onnx ops (#376)
* broadcast from right to left

* add another broadcasted add test

* more onnx ops

* use float32 range in clip
2022-09-07 15:15:24 -07:00
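
Two bullets above deserve spelling out: ONNX follows numpy broadcasting, which aligns shapes from the right, and Clip's default bounds should be the float32 extremes rather than float64's. A sketch under those assumptions:

    import numpy as np
    from itertools import zip_longest

    def broadcast_shape(a, b):
        # Align from the right; a dimension of 1 stretches to match.
        out = []
        for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
            if x != 1 and y != 1 and x != y:
                raise ValueError(f"cannot broadcast {a} and {b}")
            out.append(max(x, y))
        return tuple(reversed(out))

    assert broadcast_shape((8, 1, 6), (7, 6)) == (8, 7, 6)

    # Clip with no explicit bounds: use the float32 range, not float64's.
    f32 = np.finfo(np.float32)
    clip = lambda x, lo=f32.min, hi=f32.max: np.clip(x, lo, hi)
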
George Hotz
0516359af8 fix stupid OPENCL=1 OOM 2022-09-06 14:29:23 -07:00
George Hotz
4dadd95e3c fix tests hopefully, more stable diffusion 2022-09-03 10:38:31 -07:00
George Hotz
c01a8c5c2d stable diffusion start 2022-09-03 10:08:42 -07:00
George Hotz
a3fc64a585 fix batchnorm folding in openpilot compile 2022-08-31 13:04:49 -07:00
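
Batchnorm folding merges an inference-time BatchNorm into the preceding conv, so one op runs instead of two. The algebra is standard; a numpy sketch with OIHW weights and an assumed eps:

    import numpy as np

    def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
        # y = gamma * (conv(x, w) + b - mean) / sqrt(var + eps) + beta
        # collapses into a rescaled conv: w' = w * s, b' = (b - mean) * s + beta
        s = gamma / np.sqrt(var + eps)
        return w * s.reshape(-1, 1, 1, 1), (b - mean) * s + beta
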
George Hotz
dc7af8c3ac thneed run float32 2022-08-28 11:03:35 -07:00
George Hotz
b132de677d tinygrad.nn (#367)
* tinygrad.nn

* flake8

* working on pylint

* more pylint

* more pylint

* pylint passes

* networkx

* mypy can't infer that type

* junk
2022-08-18 07:41:00 -07:00
George Hotz
f76d41812b prune graph 2022-07-17 15:38:43 -07:00
George Hotz
eda6f071b2 default opt level 2 2022-07-17 14:54:40 -07:00
George Hotz
73b0471b25 join expands 2022-07-17 13:42:05 -07:00
George Hotz
d04b274cd2 noop removal can replace with reshape 2022-07-16 08:32:42 -07:00
George Hotz
2720ef49ca extra and test and tuple 2022-07-07 10:01:33 -07:00
George Hotz
81b73f97a3 Optimization (#355)
* constant folding into kernels

* that opt worth it?

* fix mypy

* ast one kernel

* save 2 lines in conv kernel

* debug print kernel count

* cl debugging

* early realize inputs

* refactor Device
2022-07-04 08:58:57 -07:00
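
"Constant folding into kernels" means an op whose inputs are all known at graph-build time is evaluated immediately instead of being compiled into a kernel. A hedged sketch of the dispatch point; Const and Add here are invented stand-ins, not tinygrad's ops:

    from dataclasses import dataclass

    @dataclass
    class Const:
        value: float

    @dataclass
    class Add:
        a: object
        b: object

    def add(a, b):
        # Fold now if both operands are compile-time constants,
        # so no kernel launch is ever emitted for them.
        if isinstance(a, Const) and isinstance(b, Const):
            return Const(a.value + b.value)
        return Add(a, b)

    assert add(Const(2.0), Const(3.0)) == Const(5.0)
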
George Hotz
7276f8d6bf improve constant folding, detach before moving tensor 2022-07-02 15:29:40 -07:00
George Hotz
8cf1aed0f4 don't track_running_stats, parameters must require_grad 2022-07-02 14:38:45 -07:00
George Hotz
49c954b389 comments 2022-06-26 17:20:25 -07:00
George Hotz
83d50e2687 move to extra.onnx 2022-06-21 19:43:44 -07:00
George Hotz
9b27ba650b load new torch files 2022-06-07 10:06:48 -07:00
George Hotz
233c71a7ba support requires_grad 2022-06-06 07:47:31 -07:00
George Hotz
d8d19ed468 wikimedia wasn't returning 200 2022-01-15 19:09:29 -08:00
George Hotz
e28cdfb0cf clean up resnet 2021-11-30 16:14:54 -05:00
George Hotz
58ed46963e fix broadcastdot 2021-11-29 18:54:57 -05:00
George Hotz
dca076dbf1 remove dumb nn ops 2021-11-29 18:05:31 -05:00
George Hotz
30eb3afbe1 add bias term to transformer 2021-11-29 12:45:27 -05:00
George Hotz
e2a8961a18 less lines, fix bug 2021-11-17 12:52:17 -08:00
George Hotz
ba28761894 move yolo into examples/yolo 2021-10-30 19:46:00 -07:00
George Hotz
63f50cff45 move back again 2021-10-30 16:13:29 -07:00
Evan Mays
285621aeda Cherry backprop for conv2d (#281)
* quick math: 0 + x = x.

* gradient w.r.t. x using cherry for conv

* gradient w.r.t. w for conv on cherry but doing vector dot products

* small optimization

* [cherry] optimize conv backpass for large channel count

* get rid of numpy einsum
2021-10-30 16:12:19 -07:00
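
The conv2d backward pass in #281 rests on the textbook identities: the weight gradient is a correlation of the input with the output gradient, and the input gradient is a full convolution of the output gradient with the kernel. A 1-D numpy sketch of those identities (not the cherry code itself):

    import numpy as np

    def conv1d(x, w):
        # Forward: valid cross-correlation.
        K = len(w)
        return np.array([x[i:i + K] @ w for i in range(len(x) - K + 1)])

    def conv1d_grads(x, w, dout):
        # dL/dw[k] = sum_j dout[j] * x[j + k]  (correlate input with dout)
        dw = np.array([x[k:k + len(dout)] @ dout for k in range(len(w))])
        # dL/dx = full convolution of dout with the (flipped) kernel.
        dx = np.convolve(dout, w)
        return dw, dx

    x, w = np.array([1., 2., 3.]), np.array([1., 1.])
    dw, dx = conv1d_grads(x, w, np.ones(2))
    assert (dw == [3., 5.]).all() and (dx == [1., 2., 1.]).all()
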
George Hotz
3d646272d6 move back 2021-10-30 16:12:12 -07:00
George Hotz
ac8afd24fa refactor accel 2021-10-30 16:10:59 -07:00
Guglielmo Camporese
2b7589db64 Added ResNet-{18, 34, 50, 101, 152} (#271)
* added resnets

* fix minor

* fix minor

* resnet in models

* added resnet test

* added resnet train test

* added linear, conv2d nn tests

* fix minor in extra/training

* resnet in models

* fix minor

* fix tolerance for linear in nn test

* fix eval, this causes cpu and gpu UT failing

* revert transformer test

* fix minor for CPU test

* improved model get_params for sequential layer

* fix minor for params counting

* commented broken ops tests

* improved train for resnet
2021-06-21 09:37:24 -07:00
George Hotz
89798d2f43 some flags 2021-06-19 11:46:31 -07:00