Commit Graph

10417 Commits

Author SHA1 Message Date
chenyu
25a767cd5d Remove LtNode.__mul__ and AndNode.__mul__ (#1913) 2023-09-25 07:03:59 +08:00
chenyu
eaa8d343d8 Remove str type from map_buffers (#1912) 2023-09-25 07:03:22 +08:00
Dat D. Nguyen
ae9529e678 chore: remove redundant noise in stable diffusion example (#1910) 2023-09-24 21:33:45 +08:00
George Hotz
6d9065ed1c Minor cleanups (#1911)
* cleanups

* remove that simplify
2023-09-24 21:32:50 +08:00
George Hotz
20059dc55b Make ShapeTracker Immutable (#1909)
* ugh

* ops test pass

* fix shapetracker tests

* sym shapetracker

* shapetracker is a tuple of views now

* from_shape

* fix has variable shape

* key isn't needed

* post init assert
2023-09-24 21:09:03 +08:00
nimlgen
45f02393f0 HipGraph support (#1880)
* init hip graph

* optimize args update

* cache symbolic in jit

* remove NOSTAT

* init BasicBatchExecutor

* symbolic infer cache per jit instance

* basicbatchexec is default for compiled

* batch_exec is taken from ASTRunner

* no infer cache

* batched execution of hip graph

* add comment about hip graph batches

* readable hip graph
2023-09-24 20:14:36 +08:00
George Hotz
7ff7aacdb4 LazyOp out of Linearizer (#1908)
* loadop buffer on cpu

* works for GPU

* sort of working

* has bugs

* gpu tests pass

* fix some tests

* fix tensor cores

* fix test linearizer

* fix symbolic

* fix has_variable_shape

* non symbolic size

* disable weird test

* simple cache fix

* fix custom function

* fix kopt

* cleanups

* a bit broken on the assign

* contig check

* only buffer

* need that order

* idx

* dedup buffers

* hmm, bugfix

* fix tensor cores

* opts device
2023-09-24 14:30:53 +08:00
qazal
2201b46bce Refactor Conv2d/ConvTranspose2d into a single parent class (#1906)
* refactor Conv2d/ConvTranspose2d

* raise in __call__ for the parent class

* use ABC

* drop ABC it's just syntactic sugar

* use conv2d as base for the transposed version
2023-09-24 14:23:41 +08:00
George Hotz
97dc813329 Revert "All LazyOps in the Linearizer (#1905)" (#1907)
This reverts commit a5820390db.
2023-09-24 11:51:22 +08:00
George Hotz
a5820390db All LazyOps in the Linearizer (#1905)
* loadop buffer on cpu

* works for GPU

* sort of working

* has bugs

* gpu tests pass

* fix some tests

* fix tensor cores

* fix test linearizer

* fix symbolic

* fix has_variable_shape

* non symbolic size

* disable weird test

* simple cache fix

* fix custom function

* fix kopt

* cleanups

* a bit broken on the assign

* contig check

* only buffer

* need that order

* idx
2023-09-24 11:50:00 +08:00
George Hotz
0f373b8b47 cache more uops (#1904)
* cache more uops

* fix cacheable
2023-09-23 16:50:13 +08:00
George Hotz
1e15fdaee7 disable flaky triton test 2023-09-23 14:59:36 +08:00
George Hotz
0571dd7627 move all int (#1903) 2023-09-23 14:43:45 +08:00
nimlgen
41aea3ad36 require C-contiguous array for hip._copyin (#1902) 2023-09-23 14:36:59 +08:00
Szymon Ożóg
58296c079d Make Triton work again (#1547)
* Move ops_triton to runtime and remove errors from deprecated code

* Remove deprecated AST Kernel

* Remove deprecated buffer

* Add TritonProgram

* Triton Buffer

* Use RawCUDABuffer

* triton_compile

* Added new parameter

* pass _buf to program

* remove deprecated include

* Added triton tests

* Deprecated includes removed

* remove double print

* Disable float4 support

* Disable float4 support

* variable load fix

* Track local size

* Add pycuda to triton dependencies

* Merge test.yml

* install cuda packages for testing

* merge double package install

* remove emulated from triton tests

* upscale local index to power of 2 and add masking

* cuda envs

* Add TernaryOps

* ConstOp loading

* proper function name

* remove deprecated variables

* get global program from name

* const ops match local shape

* Enable test_nn

* remove deprecated import

* fix linter error

* Add wait logic

* Add local size override

* accumulate local shapes instead of using max shape

* Merge triton tests into global tests

* fix envs in testing

* Old testing routine

* split file into renderer and program

* remove print and starting whitespace

* pretty ptx print on debug 5

* linter errors

* ignore triton saturation tests

* ignore test example

* remove pytorch cpu extra index

* Add triton to existing testing routine

* use triton tests

* disable cuda backend in triton tests

* use cudacpu in tests

* print used device

* Print device default

* Remove print

* ensure we are running triton backend

* update variable signatures

* update dtypes for load

* infinity render fixed

* limit global size

* negative infinity now properly rendered

* split chain with parentheses for and node

* Add option to disable shared memory, disable for triton

* missing import

* Properly index and mask conditional load

* use mask only if not loading a block pointer

* nan support

* fix symbolic tests to include chain split

* proper masking for stores

* Implemented bool dtype

* Add mod

* fix loads for variables with valid range

* merge triton with cuda runtime

* merge from master

* run triton tests with cuda

* Correct target when running from triton

* conftest with triton compiler config

* use triton nightly

* verbose tests for triton

* capture stdout

* fix function depth when exiting multiple loops

* add render valid function for readability

* fix mask for local loops

* add _arg_int32 datatype

* fix dims for conditional loads

* enable non float stores

* correct variable dtypes

* fix type for arg_int32

* remove junk

* Added get max function for range based var.max

* remove deprecated code

* Fix triton ptxas path

* Fix testing for CI

* clamp local size by max local size instead of always running max

* Disable matmul test in triton cpu

* rerun tests

* Disable broken test in triton cpu

* whitespace removed

* rerun tests again

* Disable TestSymbolicOps for triton

* update to new uops

* linter fix

* ignore test/extra

* linting fix

* Update tinygrad/renderer/triton.py

Co-authored-by: Gijs Koning <gijs-koning@live.nl>

* remove deprecated line

* quotes type fix

* linter

* Remove unnecessary lines

* UnaryOps.NEG

* dont define constants

* Linting fix

* Disable tests that are broken in ocelot

* remove trailing whitespace

* reduce line count

* linting fix

* update to new uast

* New looping style

* Update to new uast

* make AST runner work with triton

* linting fix

* set renderer var for testing

* disable local for ocelot

* reenable all tests for ocelot

* Pass shared to cuda

* Don't group if the backend doesn't support shared mem

* use working gpuocelot branch

* enable all tests

* enable local for ocelot

* cleanup

* Update test.yml

* update cache key

* reenable test symbolic and extra

* Update test.yml

* Revert "Update test.yml" (rerun tests)

This reverts commit 98c0630ee5.

* Revert "fix symbolic tests to include chain split"

This reverts commit 22a9a4c9cd.

* Revert "split chain with parentheses for and node"

This reverts commit 7499a7004e.

* use global size from linearizer

* rename newvar to dtype to match other renderers

* join program start lines

* simplify code that adds axis to local dims

* assign r[u] in ssa

* We no longer need to replace target in src

* we no longer need to cast indices to int by hand

* Update triton.py(rerun tests)

* Update triton.py(rerun tests)

* Update triton.py(rerun tests)

---------

Co-authored-by: Gijs Koning <gijs-koning@live.nl>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-09-23 14:17:12 +08:00
George Hotz
6fb8b3bb60 move symbolic functions to shapetracker (#1901) 2023-09-23 11:45:08 +08:00
George Hotz
9cf13bd055 rename reduce_op (#1900)
* rename reduce_op

* more design v2
2023-09-23 11:27:36 +08:00
George Hotz
73a6ed7862 Apply ShapeTracker in interpreted backends (#1846)
* applying st

* tests pass

* minor cleanups

* torch too

* hack

* contiguous

* move mops

* contig in BN

* tests should pass

* make torch fast

* make zeros and ones contig by default

* no contig there

* fix padding with expanding

* might fix tests

* still doesn't fix bug, but should be there

* Revert "still doesn't fix bug, but should be there"

This reverts commit 8ea92f3e07.

* minor cleanups
2023-09-23 10:05:13 +08:00
Umut Zengin
3987280daf Fix VALIDHACKS for Images and make it default (#1832)
* valid hacks

* valid hacks

* valid hacks

* new method

* new method

* handtune

* is gate load breaking?

* lint

ruff

less junk

new approach?

maybe this?

* Make it more clear

* Make it more clear

* Will deal with the linter later

* hack for linter

* subs the idx but dont touch the valid

* Updated the mod rules

* lint hack

* I believe bug fix lets see

* Mod Node left

* revert

* Maybe this wont break?

* revert

* implemented "handtuned garbage"

* revert and use VALIDHACKS

* Lets see the CI

* still broken?

* currently its jungle

* maybe this jungle ?

* This works for everything somehow

* Added test for symbolic

* lint

* final touch

* This still works

* lint

* midway clean

* less garbage

* lint

* final form

* Slow but working way

* lint and other stuff

* lint

* mypy

* Make sure CI test Openpilot valid checks

* test if CI break

* Convert back

* refactor

* refactor

* Managed to reduce openpilot time from 30 secs to 5 secs

* Refactor

* Substitute a node with variable

* flake8

* Comment and refactor

* More comprehensive mod

* refactor

* bug fix

* More shave off

* remove not sure part
2023-09-23 07:34:43 +08:00
Gijs Koning
767bb35903 Enable symbolic ops tests for LLVM (#1898)
* Enable symbolic tests for HIP and LLVM

* Only llvm
2023-09-23 07:30:26 +08:00
Gijs Koning
b8ff20ffe4 Gpt2 (#1896)
* small helps

* got something working

* faster?

* faster yes

* cleanup

* cleanup

* cleanup

* Fix non jit

* Fix fp16 and some cleanup

* Fix fp16 and some cleanup

* cleanup

* similar to master

* cleanup
2023-09-22 20:14:47 +08:00
chenyu
b89ee1ac83 lazy type annotation and cleanups (#1897) 2023-09-22 14:20:23 +08:00
George Hotz
78576915de Add needed contiguous to DiskBuffer. SHM support on OSX (#1891)
* add some contiguous

* remove second contig

* Revert "remove second contig"

This reverts commit fc164f7dca1ad75b1e466e4e45a05eca58b7e0e0.

* shm on osx

* can repro bug

* don't contig zeros and ones
2023-09-22 09:16:42 +08:00
qazal
d0e752003d fixes (#1893) 2023-09-22 07:20:27 +08:00
wozeparrot
009a99a0b1 feat: way cleaner hip wrapper (#1895) 2023-09-22 07:20:03 +08:00
Yixiang Gao
cb5d6576cb cifar step time 65ms while stay above 94% (#1888)
* change reduceop heuristics

* add model ema and jit hack

* add ema eval

* have to create a duplicate eval function for jit

* remove manual seed

* 94% achievable with normal eval

* ema is outputting the same results as normal

* fix ema bug

* ema achieves 94% with fix seed

* multigpu tested

* constant fold decay, fix jit, adjust message for multigpu

* pull SpeedyResNet out of train_cifar()
2023-09-21 11:19:32 +08:00
kormann
864746d6aa polish print_tree (#1868)
* fix

* isinstance
2023-09-21 11:13:10 +08:00
chenyu
a5090f0ee9 remove NumNode.int() (#1876) 2023-09-21 10:29:16 +08:00
Gijs Koning
9eb6310686 Fix gpt optimization (#1885)
* fix for gpt

* the actual fix

* Remove change in symbolic

* small comment
2023-09-21 10:28:18 +08:00
Szymon Ożóg
bd3444797b make ssa assign r[u] (#1887) 2023-09-21 10:20:20 +08:00
nimlgen
9450e41f70 no import when Python is shutting down (#1875) 2023-09-20 12:47:02 -04:00
Yixiang Gao
84ab47a90a add branch up-to-date check (#1879) 2023-09-20 12:41:51 -04:00
nimlgen
504bb6d0ea support symbolic jit in HIP (#1877) 2023-09-20 01:44:26 -04:00
chenyu
cd66c9e249 no numnode in shape (#1871) 2023-09-17 07:49:45 +08:00
Yixiang Gao
18ec5a9e09 add comment bot to CI (#1873) 2023-09-16 12:22:06 -04:00
Yixiang Gao
a27f6c7d62 add diff mode to sz.py (#1872) 2023-09-16 00:43:47 -04:00
nimlgen
4c31dfafb3 add seed to gpt-2 (#1869) 2023-09-15 17:34:14 -04:00
wozeparrot
c870764940 Revert "add line changes diff bot to CI (#1863)" (#1870) 2023-09-15 16:56:42 -04:00
Yixiang Gao
789c84a7a3 add line changes diff bot to CI (#1863) 2023-09-15 16:29:58 -04:00
chenyu
29ac8293d7 run gpt2 in CI (#1866) 2023-09-15 04:37:02 +08:00
chenyu
1b46de1a3e fix type of helpers.prod, add test cases (#1859) 2023-09-14 05:16:55 +08:00
chenyu
e67306ba04 symbolic shape type with TypeGuard (#1852) 2023-09-13 05:27:22 +08:00
Roelof van Dijk
c91b44f7bf refactor: move size to view (#1848)
* refactor: move size to view

* fix: pylint

---------

Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-09-11 07:16:04 -07:00
chenyu
9e9ea20784 Fix view, CI cpu test with python 3.8 (#1845) 2023-09-10 22:37:58 -04:00
chenyu
3ec301c2d7 apply view.py patch (#1844) 2023-09-10 17:32:15 -07:00
Yixiang Gao
a32951a001 add test_tensor_copy (#1840)
* add test_tensor_copy

* fix whitespace

* add value check
2023-09-10 16:01:58 -07:00
Roelof van Dijk
1bc52c60df fix: minor tweaks to view (#1842)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-09-10 15:55:57 -07:00
George Hotz
47e602f717 view: do not trade complexity for speed (#1839)
* view: do not trade complexity for speed

* staticmethods

* view create
2023-09-10 11:29:53 -07:00
chenyu
c0bc4cfbaf DivNode.b is int (#1833) 2023-09-10 09:04:29 -07:00
nimlgen
13790b1e20 cast types in render_load (#1837) 2023-09-10 07:58:13 -07:00