Commit Graph

1242 Commits

Author SHA1 Message Date
JaSpa99
491e85597a Run onnx commavq model (#1537)
* try to run commavq

* fix 0 dim, start implementing new ops

- Implement EmbedLayerNormalization
- Implement Attention

* SkipLayerNormalization and FastGelu

* use original torch model, cast inputs

* fix some ops:

- properly do Cast
- Attention: bi- and unidirectional
- FastGelu: add bias before gelu

* cleanup onnx_ops.py

* add validation option to benchmark

* cleanup imports

* add checks incase onnx2torch implements ops in future

* run onnx instead of original torch

* just skip gpu on m1

* reactivate the other models

* check for strange params & squash whitespace

* cleanup

* fix causal mask Attention

* Range doesn't need int cast

* embedding vocab_counter same dtype as input

* no need to cast

* always validate, fix PosixPath ort

---------

Co-authored-by: George Hotz <george@comma.ai>
2023-08-16 12:24:40 -07:00
George Hotz
f8109b830c promote assembly to the main codebase (#1544)
* promote assembly to the main codebase

* not namedtuple
2023-08-14 22:47:45 -07:00
Steven Anderson
93a36c3659 Arm (#1421)
* testing new memops

* better debugging

* testing padded conv

* branching with load

* refactoring a bit

* first try

* fixing bugs

* fixing some

* eq

* eq2

* do not use x's

* working

* fixing imm

* getting things working

* refactor

* pow not working

* working except one

* refactor: one store mem

* refactor: global load

* refactor: imm

* refactor: cleaning

* fixing big offsets

* refactor with ci

* try ci

* typo

* another typo

* ubuntu default

* forgot git

* do i need git?

* missing packages

* adding python-dev

* with cache?

* buildx action

* buildx name issue?

* maybe now?

* python3

* newline warning

* maybe now

* i actually need this

* ci should work now

* improved caching

* fixing cache

* maybe now it will cache

* this

* testing cache

* trying again

* load

* missing platform

* caching gha

* testing cache

* full testing

* typo

* now?

* why

* adding checkout back

* bad formatting

* fixing convention issues

* supporting python

* adding CI flag

* testing all

* better comments

* adding debugging

* takes 12x longer

* does it output progress now?

* ignore models for speed

* fixing merge

* excluding conv_transpose2d

* only 2 test cuz is to slow

* another approach

* let's see

* faster duh

* my bad

* T_T

* typo

* sup

* with output?

* comment test

* comment test

* comment test

* :?

* no comment

* with cache

* back to normal

* testing that ci works

* back to passing

* trying again

* does it create another entry

* does it create another entry?

* build local

* hey

* Revert "excluding conv_transpose2d"

This reverts commit cc7348de03.

* does it cache if done before?

* does it cache?

* done

* adding test ops

* bad formatting

* no need for this

* working static mem

* sum 1d

* add ndim

* better reg import

* fix stack

* back to np

* working except for softmax

* 5 failing

* no pogress

* remove keystone

* remove keystone

* testops passing

* cleanups

* more cleanup

* typo

* ci

* ci2

* cond import

* ci3

* ci4

* ci4

* ci5

* ci5

* ci6

* aligment

* test all

* correct test

* err read_unmapped

* passing test

* ignore for speed

* ignore for speed

* ci7

* cleanup

* remove docker

* fixing merge

* fixing bugs

* add skipload for const ops

* comments

* First merge to master: Renderer

* fix emulation

* passing all tests arm64

* cleaning

* fix handcoded binary

* cleaning

* fix errs

* fix runtime arg binary

* clean git diff

* fix and clean

* fixing metal test

* cleaning

* fix metal test

* ci ~8 min

* fix pylint and clang

* cache the files in ops_clang

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-08-14 19:29:30 -07:00
Szymon Ożóg
330fb7b1a3 Print more meaningfull hip error messages (#1530) 2023-08-12 07:16:20 -07:00
wozeparrot
29d5801387 distributed collectives (#1519)
* feat: world

* feat: tests

* feat: no more backwards

* feat: recv into

* feat: whoops

* feat: test in ci

* feat: some debug logging

* feat: workflow naming

* feat: need to set pythonpath

* feat: just send to same device

* feat: allreduce

* feat: test

* feat: need contiguous

* feat: test in ci

* feat: exit with correct code

* feat: don't need that

* feat: opencl wait_for just doesn't work

* feat: synchronize on out

* feat: try?

* feat: try again?

* feat: add extra realizes

* feat: print

* feat: seed

* feat: tol

* feat: test ones and zeros

* feat: remove print

* feat: are you just flaky

* feat: seperate scatter and gather?

* feat: just try synchronizing

* feat: remove print again

* feat: bring back difference

* feat: no sync

* feat: revert that

* feat: back to wait_for

* fix: typo
2023-08-11 10:22:07 -07:00
wozeparrot
7e7c9001e9 distributed world (#1481)
* feat: world

* feat: tests

* feat: no more backwards

* feat: recv into

* feat: whoops

* feat: test in ci

* feat: some debug logging

* feat: workflow naming

* feat: need to set pythonpath

* feat: just send to same device
2023-08-10 10:00:51 -07:00
George Hotz
c417cd3c97 fast HIP gemm -> 100 TFLOPS (#1476)
* fast HIP gemm

* wmma

* correct b

* fix spilling

* 60 TFLOPS

* 64 TFLOPS

* 65 TFLOPS
2023-08-09 06:54:15 -07:00
Yixiang Gao
6480a1a180 CIFAR 94.03% (#1340)
* add disk_tensor

* fix jit

* new baseline before whitening

* whitening through torch

* whiting done currently at 91.65%

* 91.99%

* clean up mixup and 92.3%

* clean up 92.30%

* 92.49% before searching for new hyper-parameters

* fix CI

* fix white space

* add whitening init in test

* refactor, update hyperpara, 92.72%

* converting whiting to tinygrad operation

* update CI kernels count for CIFAR

* add pad reflect

* add random crop 92.53%

* update hyperpara 93%

* 93.15% on docker container, need to refactor the assignment for hyper param

* print out weights and bias to be separated

* bias/non-bias params separated

* fix whitespace

* clean up

* refactor hyper-param with dict

* refactor lr schedular params

* fix whitespace

* fix cross entropy loss

* fix whitespace

* move opt hyp to hyp dict

* minor fixup

* adjust model, loss scaling

* 92.74% while using half of compute as before

* update hyp for cutmix

* random shuffle during batches

* clean up

* updating the model

* update ConvGroup

* disable gradients for batchnorm layer weights

* whitespace

* 93.92%

* clean up

* finally 94%git add .!

* rewrite whitening to remove dependency on torch

* whitespace

* remove dependency on torch, 93.91%

* back to 94.03%

* clean up

* update test_real_world
2023-08-08 15:13:24 -07:00
George Hotz
d24f936501 just cmplt (#1493)
* just cmplt

* fix maximum

* don't save, there's no backward

* ugh, no slot either

* eq is a scam
2023-08-08 13:58:10 -07:00
Roelof van Dijk
0ce7511110 fix: is not use with a literal (#1487)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-08 07:35:30 -07:00
Diogo
4dc8595069 simple exporting models (#1344)
* unified exporting

* json exporting

* ignore more

* simplified buffer export

* added dtypes

* added assert

* swift example

* fix tests

* linter

* remove whitespace

* fixed tests

* remove swift example

* remove unintended changes

* allow callable models to be used

* whitespace

* more readable json export

* name change

* whitespace

* whitespace
2023-08-01 09:35:48 -07:00
David Hou
3300d0aeaf syncthreads before wmma (#1389)
(venv) chaos@tiny3:~/tinygrad$ KX=2 KY=2 N=2048 python extra/gemm/hip_matmul.py
   4194304    289.60 us, would be  59322.55 GFLOPS matmul, 173.80 GB/s
2023-07-31 17:05:49 -07:00
George Hotz
37fa7e96fb Revert "update editorconfig, enforce via CI (#1343)" (#1380)
This reverts commit da2efecbe2.
2023-07-31 10:35:50 -07:00
Pavol Rusnak
da2efecbe2 update editorconfig, enforce via CI (#1343)
* update editorconfig to set unix-style newlines and trim whitespace

* add editorconfig github action to the CI

* fix whitespace
2023-07-30 18:44:30 -07:00
Cole Sutyak
2d4e182294 change fetch to allow for local file selection (#1309) 2023-07-23 15:00:16 -04:00
Jacob Pradels
b112edd2c3 Add pylint trailing whitespace rule (#1314) 2023-07-21 13:37:55 -04:00
madt2709
d2c1e8409a Update arange to be (start, stop, step) (#1308) 2023-07-21 00:27:23 -04:00
wozeparrot
37cc33269a cl fixes for multigpu (#1276)
* feat: opencl fixes for multigpu usage

* clean: who needs this import anyways
2023-07-18 19:59:30 -07:00
George Hotz
ab3d281a6e Refactor MemOps (#1256)
* metal tests pass locally

* define global

* refactor DEFINE_GLOBAL

* move assembly out. it isn't tested

* fix llvm
2023-07-17 16:36:33 -07:00
Stan
91f797cd52 Moved mkdir in utils.download_file to diff line (#1249)
* Moved mkdir to diff line

.mkdir does not return the actual directory being created.

* use walrus operator to simplify
2023-07-16 00:30:46 -07:00
Yixiang Gao
a8f2c16f8e add contiguous (#1246) 2023-07-15 08:36:34 -07:00
George Hotz
67e34b356a good stuff from tensor cores branch (#1199) 2023-07-08 16:58:26 -07:00
Jacky Lee
e0c2ae8984 Update file paths (#1179) 2023-07-07 18:41:58 -07:00
George Hotz
b8dfbba703 hip_matmul: f16 gemm 2048x2048 gets 36 TFLOPS 2023-07-08 00:35:45 +00:00
Stan
69d33cab0d Fix: auto create parent dir when downloading file (#1173)
* Fix: auto create parent dir when downloading file

also removed duplicate import `os`

* Added test for auto parent dir creation when downloading file
2023-07-07 13:40:29 -07:00
terafo
aa60feda48 Fix naming conflict with huggingface datasets (#1161)
* Rename in files

* Move files

* Moved to extra/datasets as suggested

* Changes to files

* Fixed stupid mistake

---------

Co-authored-by: terafo <terafo@protonmail.com>
2023-07-07 10:43:44 -07:00
Stan
9b6e57eccd helpers.py: improved test coverage + exception handling (#1165)
* Fixes + improved test coverage for helpers.py

- added exception handling in `proc`, if an exception was thrown, the thread would hang
- made `_early_exec_process` catch any Exception, before if an exception was thrown before the process was started, it would hand the thread

* Made `_early_exec_process` catch any Exception

 Otherwise, if an exception was thrown before the process was started, it would hang the thread. For example a type error for an argument passed to `subprocess.check_output`

* Fixed `from tinygrad.helpers import Timing` import

oops, for some reason my IDE cleaned that import from extra/helpers.

* Fixed import in llama.py

Another one that I skipped by accident, mybad

* Extracted a class for tests of early exec

* Normalize line endings, windows uses /r/n

* Made `cross_process` not a daemon
2023-07-07 10:26:05 -07:00
Kunwar Raj Singh
8391648822 Over 90% on CIFAR with examples/hlb_cifar10.py (#1073)
* fix eval, lr decay, best eval

* 82.27

* 82.64

* 82.79, reproducable

* add lr sched, 85.26

* 87.42

* 87.94

* 87.42

* tta with flip

* training flip aug

* refactor

* using Tensor for LR is faster

* 89.5

* refactor, flip only train set

* 90.01

* 90.64

* eval jit

* refactor

* only JIT model

* fix eval JIT

* fix eval JIT

* 90.82

* STEPS=900 reaches 90.22

* TTA envvar

* TTA default 0

* fully jit training

* refactor optim

* fix sched

* add label smoothing

* param changes

* patial gelu

* OneCycle with pause

* gelu maybe works

* 90.12

* remove pause lr

* maybe fix lr schedulers

* scheduler test passing

* comments

* try mixup

* shuffle!

* add back the missing last eval

* fix shuffle bugs

* add mixup prob

* fix mixup prob

* 90.19

* correct mixup

* correct mixup

* correct mixup

* 90.24

* 90.33

* refactor, add type hints

* add gradient clipping

* maybe fix test

* full JIT

* back to relu for now

* pass mixup prob as param

* add typehints

* maybe CI works

* try erf gelu

* CI, types

* remove useless import/

* refactor optim

* refactor optim

* try leakyrelu

* try celu

* gelu

* 90.67

* remove grad clip

* remove grad clip tests

* revert params

* add test for OneCycleLR

* 90.62

* fix eval timing

* fix eval timing again

* so where i calculate mixup_prob matters

---------

Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
2023-07-06 20:46:22 -07:00
Eli Frigo
801564f31b Remove POW llop and add SQRT llop (#1104)
* fixed division by zero for fast operations

* made et closer to 0

* replace POW llop with SQRT

* updated mlops to swap SQRT and POW llops

* updated hlops to swap POW and SQRT

* added sqrt llop to cpu runtime

* added sqrt llop to cstyle codegen

* added POW llop to llvm ir codegen

* added SQRT llop to torch runtime

* moved pow from mlops to hlops

* found a better way to do reverse pow

* fixed indentation

* added SQRT llop to triton

* update docs to match new llops

* removed POW operator from assembly codegen

* added sqrt and rsqrt to pow hlop

* rewrote pow function in tensor.py

* Adjust tolerance

* Adjust for adamw

* Reduce for Adam too

* removed accidental leftover code

* removed all of accidental code

* added rsqrt test

* removed pow from mlops again

it was added back when resolving merge conflicts

---------

Co-authored-by: Jacky Lee <jla524@sfu.ca>
2023-07-05 18:07:58 -07:00
Reza Rezvan
d1356cac27 Fix: Jacobian tests [WIP] (#1126)
* Fix: Jacobian tests; num_jacobian either bugged or not accurate enough;

* Fix: Jacobian tests;

* Fix: Gradcheck;
2023-07-05 15:36:22 -07:00
Mehmet Kuzucu
c3173ff281 Add return statement to the train function (#1135)
add a return statement to the train function in order to provide access to the losses and accuracies lists
2023-07-05 08:13:38 -07:00
George Hotz
2f968f8547 ignore cloudpickle type for local mypy 2023-07-04 13:51:20 -07:00
Daniel Hipke
b4ce23e4b8 Make cross_process use cloudpickle (#1118)
* fix syntax issues in imagenet_download.py

* use cloudpickle in cross_process to make it work in Python 3.9+

* add cross_process test

* prevent unpickling on every function call

* add cloudpickle to setup.py

* add support for args/kwargs
2023-07-04 00:47:34 -07:00
Anselm Coogan
a22aad7d32 Use generators instead of lists in anys and alls (#1111)
* Use generators in any(..) instead of lists for better best-case

* Use generators in all(...) instead of lists

* enable R1729 in .pylintrc

* revert import sorting

---------

Co-authored-by: Anselm Coogan <anselm@scandit.com>
2023-07-03 16:06:06 -07:00
Frank Pinnola
2071e53da8 Handle broadcast flag on gemm (#1103) 2023-07-02 22:15:07 -07:00
Rob Grossman
c8ddc34368 include missing queue in thneed load (#1095) 2023-07-02 12:33:59 -07:00
George Hotz
e234bf2298 hip matmul : add K support 2023-06-28 19:54:33 +00:00
George Hotz
0e93b9642a hip matmul 2023-06-28 19:21:01 +00:00
George Hotz
6ec0a24706 imagenet eval in 1 min 28 sec 2023-06-28 04:23:26 +00:00
George Hotz
9c6e507518 move accel into extra 2023-06-23 16:38:15 -07:00
Diogo
57d3aa76a5 Windows & Ubuntu CLANG CI support (#1011)
* matrix strategy

* push env to GITHUB_ENV

* use printf instead of echo

* use temp helper function for cross os paths

* use path join

* switched to using temp helper function

* skip test on windows due to memory limit

* small fix

* removed semi

* touchups

* clean up

* seperate tests

* test changes to test_utils on windows

* small refactor

* more cleanups

* undo helpers change

* only skip if in CI and WINDOWS
2023-06-19 09:33:24 -07:00
Alex Wang
3d63c71e27 HIP backend (#750)
* llama works for HIP backend

* Use hipMemcpyAsync; Less lines of code

* Remove unused code

* Refactor

* Add comments; hipDeviceSynchronize

* HIP over GPU; Remove PyHIP dependency

* Cleanups

* Fix mypy check

* Merge master; Dump assembly code
2023-06-18 11:35:57 -07:00
Casey Primozic
805eef10dd Add tensorflow GEMM benchmark script (#1000)
* Modelled closely after the existing torch benchmark script but just adapted slightly for tensorflow
2023-06-18 10:57:45 -07:00
Diogo
d2b837c1d9 Adds floor/ceil (#989)
* floor ceil impl

* control casting in numpy
2023-06-17 10:56:21 -07:00
George Hotz
fe71282ba1 faster RDNA assembly backend (#990)
* fast asm

* torch gemm
2023-06-16 12:06:38 -07:00
George Hotz
ba56ee6020 RDNA assembly backend ($1000 bounty) (#787)
* Revert "Revert "ops rdna""

This reverts commit 0400315078.

* Revert "Revert "writing 2""

This reverts commit 325a3bf2cf.

* no dump

* 2x 2

* simple asm

* local size

* sub

* lil work

* support args != 3

* assembler work

* generate that

* ptx assembler

* begin index renderer

* max

* ptx loops

* gemms work

* valid works

* asm working a bit more

* close

* passing all ops tests

* ptx is a codegen only, not a backend

* ptx

* float16 support

* rdna goes here

* install types

* make amd disassemble

* ansilen for pretty print

* fix ptx log2/exp2

* assemblyinstruction

* new asm

* working gemm

* fix cmp

* more passing

* mod

* ptx works again

* rdan3 add works

* log exp

* sin is sin 2pi

* fix types

* progress

* loops work

* rdna xyz

* better addressing

* cleanups

* handle exception in early process

* div support

* rdna float4

* locals work

* fix neg index

* cast

* smaller diff

* yaml

* import only if selected

* fromimport

* types

* this all needs rewriting

* a few more
2023-06-16 09:33:18 -07:00
Yahya Lmallas
804c45b5fc FIX: Can't pickle local object (#979)
_early_exec_process is a local function that is defined whiting the scope of another function, should be global
2023-06-14 12:32:17 -07:00
Steven Anderson
e54b6c5e7f One hot (#972)
* passing with 1d indices

* passing all test

* cleanup

* using safe_numpy for scalar
2023-06-12 10:13:29 -07:00
Diogo
2d4370b487 Adds tril & triu support (#936)
* triu & tril support

* lint and kernel count error

* switched shape indicies

* larger shape tests

* reverted numpy removal until #942 is resolved
2023-06-09 22:13:20 -07:00
Steven Anderson
c0e558b77c Test nllloss (#958)
* works but slow

* work with NC and NCd1 it still slow

* refactor

* support for k dimensions

* without numpy
2023-06-09 09:00:29 -07:00