Commit Graph

2368 Commits

Author | SHA1 | Message | Date
Szymon Ożóg
5c6edfb064 use triton tests 2023-08-22 10:54:07 +02:00
Szymon Ożóg
fbb3793b7a Add triton to existing testing routine 2023-08-22 10:32:58 +02:00
Szymon Ożóg
39d238734b remove pytorch cpu extra index 2023-08-22 09:47:45 +02:00
Szymon Ożóg
991c9eaf6f Merge remote-tracking branch 'upstream/master' into triton 2023-08-22 09:03:26 +02:00
Szymon Ożóg
3a87a32b23 ignore test example 2023-08-22 09:03:07 +02:00
Szymon Ożóg
0b4d1022e8 ignore triton saturation tests 2023-08-22 09:02:52 +02:00
Szymon Ożóg
1be36efb9d linter errors 2023-08-22 08:47:14 +02:00
Szymon Ożóg
d3f370d69d pretty ptx print on debug 5 2023-08-22 08:32:36 +02:00
Szymon Ożóg
4e18f4e7ae remove print and starting whitespace 2023-08-22 08:32:22 +02:00
Szymon Ożóg
1d5cd82ed5 split file into renderer and program 2023-08-22 08:26:14 +02:00
Szymon Ożóg
4220908646 Old testing routine 2023-08-22 08:21:10 +02:00
Szymon Ożóg
bf092c55ac fix envs in testing 2023-08-22 08:10:14 +02:00
Roelof van Dijk
8e8724d3a8 perf: if argument order (mops) (#1599)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-21 11:20:39 -07:00
George Hotz
2e60920317 Revert "sparse cat cross entropy (#1591)" (#1596)
This reverts commit f0ee850e98.
2023-08-21 10:04:26 -07:00
Yixiang Gao
f0ee850e98 sparse cat cross entropy (#1591)
* add sparse cat cross entropy

* minor fix

* add log_softmax into loss function

* add test

* update docs
2023-08-21 09:56:41 -07:00
Yixiang Gao
8d6662a741 .cpu().numpy() -> .numpy() (#1594)
* .cpu().numpy() -> .numpy()

* restore ops_torch

* restore test_speed_v_torch
2023-08-21 09:53:29 -07:00
Szymon Ożóg
8fcc25d0aa Merge triton tests into global tests 2023-08-21 18:08:01 +02:00
Szymon Ożóg
f7ab9051f2 accumulate local shapes instead of using max shape 2023-08-21 18:06:59 +02:00
Szymon Ożóg
2bdd60565c Merge remote-tracking branch 'upstream/master' into triton 2023-08-21 17:53:17 +02:00
Szymon Ożóg
4bd31550b5 Add local size override 2023-08-21 09:56:29 +02:00
Szymon Ożóg
13eed8f9cc Add wait logic 2023-08-21 09:55:54 +02:00
Umut Zengin
35bf21276f Argmax/Argmin Feature (#1576)
* implemented argmax and argmin

* lint

* lint

* match torch behaviour

* format

* removed flip
2023-08-20 18:46:46 -07:00
Roelof van Dijk
1900acda09 [READY] ci: setup venv cache (#1475)
* ci: cache installed packages

* ci: trigger jobs

* ci: fix hashfiles argument

---------

Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-20 18:43:16 -07:00
Umut Zengin
3fc7e984f0 __getitem__ refactoring (#1586)
* try

* try

* form

* form

* form

* form

* lint

* small change

* preserve old

* revert to explicit reshape
2023-08-20 18:42:30 -07:00
George Hotz
d627349af0 teeny changes (#1589)
* teeny changes

* import order
2023-08-20 13:38:38 -07:00
George Hotz
012ee7d162 not worth the speed (#1584)
* not worth the speed

* no slots

* uops comments

* bump to python 3.11 for speed

* add critical slots back
2023-08-20 10:24:58 -07:00
George Hotz
739f327d2d Shorter (#1582)
* deleting lines

* remove insert dims

* if statement is never hit

* bug fixes
2023-08-20 08:12:16 -07:00
Szymon Ożóg
6f38842ef0 fix linter error 2023-08-20 10:35:28 +02:00
Szymon Ożóg
4b4280bcfb remove deprecated import 2023-08-20 10:34:59 +02:00
Szymon Ożóg
59dc1ad772 Enable test_nn 2023-08-20 10:31:50 +02:00
David Hou
4fbce972d7 CSE at uop level (#1483)
* uop-level cse

* add test

* don't cache reduce alu ops

* types

* rename variable

* fix

* delete lines
2023-08-19 23:40:40 -07:00
George Hotz
b9feb1b743 fp16 support in stable diffusion 2023-08-20 05:37:21 +00:00
George Hotz
ad7d26c393 fix __launch_bounds__ and benchmark TC MATMUL (#1575)
* fix

* benchmark matmul
2023-08-19 10:54:39 -07:00
David Hou
92754e177c cache buffer loads across multiple bufs (#1482)
* cache loads across buffers (since they may share rawbufs)

* typing

* add test

* fix test

* small changes to test

* fix test

* one big cache

* whitespace

* golf a line?

* invalid is RawBuffer(0)[0], valid 1.
2023-08-19 09:09:58 -07:00
George Hotz
e464442adf WMMA for 7900XTX (#1563)
* go

* hip no LRU

* work

* works

* 16 TFLOPS

* 29 TFLOPS

* 30 TFLOPS

* never mind, it's 60 TFLOPS

* fix metal WMMA

* put hip alloc back
2023-08-19 09:07:23 -07:00
Szymon Ożóg
5533db9a6a const ops match local shape 2023-08-19 14:31:36 +02:00
Szymon Ożóg
320a012772 get global program from name 2023-08-19 14:29:37 +02:00
Szymon Ożóg
4123920bcc remove deprecated variables 2023-08-19 13:56:37 +02:00
Szymon Ożóg
fecc58cc2b proper function name 2023-08-19 13:54:21 +02:00
nimlgen
faa521bcab fix usage of arm64 regs according to CC (#1570) 2023-08-18 21:40:32 -07:00
corranr
68ebbd2954 for issue #1555, int64 and int8 in CI=1 ARM64=1 CLANG=1 (#1572)
* fixed for int8,int64, added dtype broadcasting test, passing all CI,ARM64,CLANG tests

* remove shifts
2023-08-18 21:40:13 -07:00
chenyu
ae39cf84ab Symbolic Shape JIT main PR (#1353)
* Symbolic Shape JIT

update tests

2 variables symbolic ops, adding more tests

test passing

cleanup

* more test cases

* single flag

* review update

* jit attention one piece

* realize

* symbolic_jit test for cuda

* old artifact

* works with cuda gpu but failed ci

* CUDACPU
2023-08-18 14:39:55 -07:00
Roelof van Dijk
84e6693915 fix: apt-get to apt, no recommends, clean up (#1571)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-18 13:48:59 -07:00
wozeparrot
50decf0d45 train cifar using multigpu (#1529)
* feat: train cifar using multigpu

* feat: split eval batch across 5

* feat: cleaner allreduce

* feat: 93.88%

* feat: cleaner batch chunking from bert

* feat: cleaner grad sync

* feat: tinygrad argmax

* feat: make it work with different gpu counts

* feat: move some stuff into the normal __init__

* feat: autodetect gpu count

* feat: move import inside
2023-08-18 09:35:44 -07:00
Szymon Ożóg
b624a374b9 ConstOp loading 2023-08-18 18:30:57 +02:00
Szymon Ożóg
ef757aa5c3 Add TernaryOps 2023-08-18 18:30:18 +02:00
chenyu
be50b2fe8f more symbolic symbolic ops (#1564)
* more symbolic symbolic ops

* handle NumNode in __mul__
2023-08-18 09:21:41 -07:00
chenyu
dfec16cc83 Support arg int for CUDA kernel (#1565) 2023-08-18 09:19:40 -07:00
wozeparrot
15150d60c4 fix: small fix for lru on hip (#1567) 2023-08-18 09:18:38 -07:00
Szymon Ożóg
5430fcb4e9 cuda envs 2023-08-18 17:49:48 +02:00