Commit Graph

1242 Commits

George Hotz
fff1f046b0 Simple version of the new GPU backend (#458)
* newgpu

* more to delete

* hmm, tests pass with constant folding

* fix lint/type

* fix constant folding

* comment and rerun tests

* lazy touchups

* fix graph_batchnorm test

* smaller transformer to fix OOM

* Revert "smaller transformer to fix OOM"

This reverts commit a44ef8edc2.

* no func cache

* introspect

* touchups

* CLASTKernel

* ugh, it was lru_cache

* codegen

* spacing

* old gpu still in opencl

* typing fix
2023-01-10 19:16:02 -08:00
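
A minimal sketch of the constant-folding idea exercised in this PR ("hmm, tests pass with constant folding" / "fix constant folding"), in illustrative plain Python with a toy lazy-graph Node class, not the actual tinygrad code: when every input of a binary op is already a known constant, the result is computed at graph-construction time instead of emitting a kernel.

  import operator

  OPS = {"ADD": operator.add, "MUL": operator.mul}

  class Node:
    def __init__(self, op=None, srcs=(), value=None):
      self.op, self.srcs, self.value = op, srcs, value
    @property
    def is_const(self): return self.op is None

  def binary(op, a, b):
    if a.is_const and b.is_const:      # fold: both inputs known, no kernel needed
      return Node(value=OPS[op](a.value, b.value))
    return Node(op=op, srcs=(a, b))    # otherwise stay lazy

  x = binary("MUL", Node(value=3.0), Node(value=4.0))
  print(x.is_const, x.value)           # True 12.0
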
George Hotz
66123c99b9 gpubuffer repr match llvmbuffer 2023-01-09 20:02:22 -08:00
George Hotz
0a7d2b1a2e fix kernel_cnt type 2023-01-09 19:34:57 -08:00
George Hotz
4356683081 gpu: rename kernels 2023-01-09 19:32:22 -08:00
George Hotz
1e1abb450e fromcpu 2023-01-09 19:18:57 -08:00
George Hotz
90121482fa oops, don't assign self 2023-01-09 18:02:12 -08:00
George Hotz
fad7cba590 move batchnorm to Tensor 2023-01-09 18:00:16 -08:00
George Hotz
27211103ae docker: no -it 2023-01-09 12:49:59 -08:00
George Hotz
d6e86a29a8 docker: forgot to checkout code 2023-01-09 12:48:03 -08:00
George Hotz
73ce9a771e that fix it 2023-01-09 12:46:33 -08:00
George Hotz
bfd4f4e35c testdocker 2023-01-09 12:41:52 -08:00
George Hotz
4885fce56e shapetracker from newgpu (#456)
* shapetracker from newgpu

* touchup ops

* test

* testst

* thneed deletes unused inputs

* test

* bugfix
2023-01-09 12:40:01 -08:00
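
A rough sketch of the ShapeTracker idea this PR factors out (illustrative only; the View class and method names below are hypothetical, not the tinygrad API): movement ops such as permute only change a (shape, strides, offset) view, so logical indices are remapped onto the same flat buffer without copying any data.

  class View:
    def __init__(self, shape, strides, offset=0):
      self.shape, self.strides, self.offset = shape, strides, offset
    def index(self, *idx):
      # map a logical index to a position in the flat backing buffer
      return self.offset + sum(i * s for i, s in zip(idx, self.strides))
    def permute(self, order):
      return View(tuple(self.shape[o] for o in order),
                  tuple(self.strides[o] for o in order), self.offset)

  v = View((2, 3), (3, 1))                  # row-major 2x3
  print(v.index(1, 2))                      # 5
  print(v.permute((1, 0)).index(2, 1))      # also 5: transpose without moving data
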
Faisal Memon
538b1d7f5b Print out the tensor using numpy(). (#454)
This commit resolves issue https://github.com/geohot/tinygrad/issues/453

When the example code in the README.md is run, tinygrad prints the tensors as:
<Tensor <LB (3, 3) op:MovementOps.RESHAPE> with grad None>
<Tensor <LB (1, 3) op:MovementOps.RESHAPE> with grad None>

But to be equivalent to the output of the Torch example, we need to use numpy() so it shows:
[[ 2.  2.  2.]
 [ 0.  0.  0.]
 [-2. -2. -2.]]
[[1. 1. 1.]]
2023-01-09 10:08:05 -08:00
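
The fix is to print the gradients through numpy(); roughly, the README example becomes the following (sketched from the commit description, the exact README text of that date may differ slightly):

  from tinygrad.tensor import Tensor

  x = Tensor.eye(3, requires_grad=True)
  y = Tensor([[2.0, 0, -2.0]], requires_grad=True)
  z = y.matmul(x).sum()
  z.backward()

  print(x.grad.numpy())  # dz/dx -> [[ 2. 2. 2.] [ 0. 0. 0.] [-2. -2. -2.]]
  print(y.grad.numpy())  # dz/dy -> [[1. 1. 1.]]
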
nogira
2e744ef2f2 confirmed (#449)
w/ a bunch of print statements in the official model here: ce05de2819/ldm/modules/diffusionmodules/openaimodel.py (L413)
2023-01-07 08:41:06 -08:00
Nicolai Stoianov
8dbf76268d Add step for setting up Stable Diffusion (#452) 2023-01-07 08:40:12 -08:00
cloud11665
4fb97b8de0 don't fail when termcolor is not installed (#436) 2022-11-14 16:45:06 -08:00
George Hotz
5e07d4669d the speedy chonker is going to replace the old chonker (#432)
* bringing back reshape and permute

* done with E701

* 4x4 works in generic way

* max and sum not vectorizing...

* special case single float

* support comparing to MPS

* improve matmul speed, consider generic principles

* GlobalCounter

* fix op tracking

* faster

* comment that out for now

* err, it needs that

* fix minor issues

* fix global_mem
2022-11-11 18:34:24 -08:00
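
The "GlobalCounter" item in this list is essentially a process-wide tally of ops executed and memory moved, so the speed tests can report GFLOPS and bandwidth; a minimal sketch of that idea (class and field names here are hypothetical, not the exact tinygrad API):

  class GlobalCounter:
    global_ops, global_mem = 0, 0            # accumulated across all kernels
    @classmethod
    def log(cls, ops, mem):
      cls.global_ops += ops
      cls.global_mem += mem

  # e.g. one 1024x1024x1024 matmul: 2*N^3 ops, three NxN float32 buffers touched
  GlobalCounter.log(ops=2 * 1024**3, mem=3 * 1024 * 1024 * 4)
  print(GlobalCounter.global_ops, GlobalCounter.global_mem)
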
George Hotz
d2273d2cc4 s/contiguous_op/contiguous 2022-11-11 00:07:05 -08:00
George Hotz
b8c94a67c9 Simple chonker (#431)
* chonker will make llvm fast

* work

* better speed tests, we will make them fast

* with the cache add is the same speed

* relu and neg are fast

* fix sum speed

* maximum maxnum?

* hack for gemm opt

* gemm very slow

* zeros like

* test_permute

* shapetracker returns self

* fix shapetracker factorization

* err, int strides

* permutes are faster now in tinygrad than pytorch

* support -1 in expand

* gemm unrolled

* improve final test case

* WIP GEMM

* why isn't GEMM fast?

* revert cache dim

* ffp contract works on clang, not llvm?

* ignore llvm ir

* this makes fma work at least, but no faster

* USE_4x4

* 63 GFLOPS

* 87 GFLOPS

* that wasn't matmul, 44 GFLOPS now

* 82 GFLOPS permuted

* this permute too

* a little speed for the convs

* 45 GFLOPS

* speed tests pass again

* clean up prints

* fix FMA WHAT A WASTE OF TIME

* colors

* moar fair

* GPU

* useless on chonker

* cleanups

* improve factorized shapetracker

* better threshold

* label conv

* work

* ops test pass again

* hot load the index

* run the last view, no need to create

* ZeroView needs a repr for the key to work

* fix segfault on out of bounds

* one more test

* start amx, and llvm.initialize_native_asmparser

* amx works

* nice AMX class

* nicer AMX class

* refactor get_idxs

* amx working

* is slower...

* useless flip

* cache

* SZ_X

* AMX_SZ_X/Y work alone

* Contiguous mlop

* test gemm packed

* PREPARE in packed

* use_amx factor

* prefetch isn't faster

* loop

* same 3ms

* 2.24 ms

* allow double on store in TG

* amx reduce is the same speed as non amx reduce

* include memory bandwidth

* clean up shapetracker

* flip returns stride

* prepare for upstream

* Update ops_llvm.py (#426)

* permutes are yellow and green now

* faster conv

* llvm cleanups

* Show optimised IR under debug 4 (#428)

* ASTKernel class

* Make tinygrad work with older python version (#427)

* Make tinygrad work with older python version

* Use partialmethod instead of partial

* smiple chonker is chonking

* remove junk from test speed vs torch

* fix linker and types

* AMX is only here now

* add LLVM tests, it's a valid backend now

* oops, run llvm test

* contiguous_op

* fix loadops compare

* dedup reduceops

Co-authored-by: calledit <1573053+calledit@users.noreply.github.com>
2022-11-10 23:17:09 -08:00
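
One item in the #431 list above, "support -1 in expand", refers to the convention that -1 in the requested shape keeps that dimension's existing size; a sketch of the semantics in plain numpy (illustrative, not the tinygrad implementation):

  import numpy as np

  def expand(x, shape):
    # -1 means "keep this dimension as-is"; other sizes broadcast from 1
    new_shape = [cur if req == -1 else req for cur, req in zip(x.shape, shape)]
    return np.broadcast_to(x, new_shape)

  a = np.ones((1, 3))
  print(expand(a, (4, -1)).shape)  # (4, 3)
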
George Hotz
bff47e9dc1 contiguous, and no strided for matmul 2022-11-09 16:56:26 -08:00
George Hotz
1271f19a2b factorizing shapetracker from chonker 2022-11-09 16:36:38 -08:00
Daniel Davis
64ff1ddc10 Reduce line count (#424)
* save a line, save a life

* save a line, save a life

* change order of tern
2022-11-09 10:07:22 -08:00
George Hotz
0994705166 contrib more 2022-11-08 19:14:37 -08:00
George Hotz
c0bba9649a more that 2022-11-08 19:13:11 -08:00
George Hotz
5143da6a9f contributing 2022-11-08 19:12:12 -08:00
Daniel Davis
4998bf49b3 Basic editorconfig support (#422)
Almost every IDE or texteditor supports
[editorconfig](https://editorconfig.org/).
I've set it up to just enforce the 2 space python indents for now.
2022-11-08 10:34:25 -08:00
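
For reference, an .editorconfig that enforces 2-space Python indents looks roughly like this (a sketch, not necessarily the exact file added in #422):

  root = true

  [*.py]
  indent_style = space
  indent_size = 2
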
marcojob
c3d9c9b24c Fix issue where batch_invstd not being set (#421)
batch_invstd can be falsely assumed to be set even though it is None,
since hasattr will not return False in this case.
In BatchNorm2D a reshape is then attempted, which causes an exception.
2022-11-08 09:24:53 -08:00
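
A minimal reproduction of the pitfall described above (illustrative plain Python with a hypothetical class name): hasattr only checks that the attribute exists, not that it is non-None, so a None buffer passes the check and the later reshape fails.

  class BN:
    def __init__(self):
      self.batch_invstd = None          # attribute exists but holds no tensor

  bn = BN()
  print(hasattr(bn, "batch_invstd"))                      # True, even though it is None
  print(getattr(bn, "batch_invstd", None) is not None)    # False: the safer check
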
Liam
8dc28dd733 Create python-publish.yml (#163) v0.4.0 2022-11-08 08:45:01 -08:00
George Hotz
92ed87b0a5 bump version to 0.4.0 2022-11-08 08:44:42 -08:00
George Hotz
9781b4c3af rename test functions to helper_ 2022-11-07 21:27:56 -08:00
George Hotz
9884be2ad5 ugh, that too 2022-11-07 21:21:35 -08:00
George Hotz
537a9eb414 fix termcolor import 2022-11-07 21:19:08 -08:00
George Hotz
2cc1d970c6 updates from the chonker branch 2022-11-07 21:12:08 -08:00
George Hotz
d878065ece Gemm (#416)
* gemm

* off by factor of 5

* 50 GFLOPS

* works

* 91 gflops

* working at 50G

* works

* iy

* 150 GFLOPS

* 150 GFLOPS

* N=2048 is still fast

* threading soon

* multithread

* pinning

* throttling is sad

* Align matrices to cacheline width (#361)

Co-authored-by: cloud <Cloud11665@gmail.com>
2022-11-06 10:07:28 -08:00
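
The "Align matrices to cacheline width (#361)" step in this list refers to the standard trick of starting each matrix on a 64-byte boundary; a sketch of one way to do it with numpy (illustrative, not the repository's actual allocation code):

  import numpy as np

  def aligned_empty(shape, dtype=np.float32, align=64):
    # over-allocate raw bytes, then slice to the first 64-byte-aligned offset
    itemsize = np.dtype(dtype).itemsize
    n = int(np.prod(shape))
    buf = np.empty(n * itemsize + align, dtype=np.uint8)
    offset = (-buf.ctypes.data) % align
    return buf[offset:offset + n * itemsize].view(dtype).reshape(shape)

  A = aligned_empty((2048, 2048))
  print(A.ctypes.data % 64)  # 0: the matrix starts on a cacheline boundary
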
George Hotz
caea34c529 1s are always mergable 2022-11-03 10:50:48 -07:00
George Hotz
c48fc47d01 fix type error 2022-10-31 09:56:56 -07:00
George Hotz
9585b6c0cf comments and readability in lazy.py 2022-10-30 19:50:48 -07:00
George Hotz
db2da22a04 stop blowing up floats 2022-10-30 16:47:16 -07:00
George Hotz
8afc643bb1 fix bug in ops test, it was cheating somehow 2022-10-30 16:43:24 -07:00
George Hotz
b7a115e5e5 rewrite some strideds into reshapes 2022-10-30 16:31:27 -07:00
George Hotz
8c849e637c that was in there twice, DEBUG>=4 to see loop opt 2022-10-30 15:31:39 -07:00
George Hotz
cfdf803b52 fix llvm vectorization by adding analysis passes from the target machine 2022-10-30 15:28:36 -07:00
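
For context, adding analysis passes from the target machine is how llvmlite's legacy pass manager learns about the CPU's vector units; a sketch of the usual setup (illustrative, assuming llvmlite is installed, and not necessarily the exact pass configuration in ops_llvm.py):

  import llvmlite.binding as llvm

  llvm.initialize()
  llvm.initialize_native_target()
  llvm.initialize_native_asmprinter()

  tm = llvm.Target.from_default_triple().create_target_machine()
  pmb = llvm.PassManagerBuilder()
  pmb.opt_level, pmb.loop_vectorize, pmb.slp_vectorize = 3, True, True

  pm = llvm.ModulePassManager()
  tm.add_analysis_passes(pm)   # without this, the vectorizer lacks target info
  pmb.populate(pm)

  mod = llvm.parse_assembly("define void @f() { ret void }")
  pm.run(mod)
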
George Hotz
2f602a92ff separate STRIDED and EXPAND 2022-10-30 13:23:58 -07:00
George Hotz
544cb0a069 oops, remove while(1) 2022-10-29 14:05:13 -07:00
George Hotz
4b6097f81d more amx notes 2022-10-29 14:04:10 -07:00
George Hotz
fdb43fe553 gemm is 1.7 TFLOPS on a single M1 core 2022-10-29 13:42:33 -07:00
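
As a back-of-the-envelope check on a figure like this (a sketch with a hypothetical timing, not the actual benchmark): an N x N x N matmul performs about 2*N^3 floating-point operations, so dividing by the measured time gives the FLOPS rate.

  N = 1024
  flops = 2 * N**3               # one multiply + one add per inner-loop element
  seconds = 1.26e-3              # hypothetical measured time for a single matmul
  print(flops / seconds / 1e12)  # ~1.70 TFLOPS
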
George Hotz
52bfbc31be vectorization 2022-10-29 12:47:52 -07:00
George Hotz
e473d35f90 llvm doesn't vectorize 2022-10-29 11:59:48 -07:00
George Hotz
86eb06eb76 accurate flop estimation 2022-10-28 19:13:20 -07:00
George Hotz
7909786dbf one more opt test 2022-10-28 18:37:53 -07:00