Szymon Ożóg
5c6edfb064
use triton tests
2023-08-22 10:54:07 +02:00
Szymon Ożóg
fbb3793b7a
Add triton to existing testing routine
2023-08-22 10:32:58 +02:00
Szymon Ożóg
39d238734b
remove pytorch cpu extra index
2023-08-22 09:47:45 +02:00
Szymon Ożóg
991c9eaf6f
Merge remote-tracking branch 'upstream/master' into triton
2023-08-22 09:03:26 +02:00
Szymon Ożóg
3a87a32b23
ignore test example
2023-08-22 09:03:07 +02:00
Szymon Ożóg
0b4d1022e8
ignore triton saturation tests
2023-08-22 09:02:52 +02:00
Szymon Ożóg
1be36efb9d
linter errors
2023-08-22 08:47:14 +02:00
Szymon Ożóg
d3f370d69d
pretty ptx print on debug 5
2023-08-22 08:32:36 +02:00
Szymon Ożóg
4e18f4e7ae
remove print and starting whitespace
2023-08-22 08:32:22 +02:00
Szymon Ożóg
1d5cd82ed5
split file into renderer and program
2023-08-22 08:26:14 +02:00
Szymon Ożóg
4220908646
Old testing routine
2023-08-22 08:21:10 +02:00
Szymon Ożóg
bf092c55ac
fix envs in testing
2023-08-22 08:10:14 +02:00
Roelof van Dijk
8e8724d3a8
perf: if argument order (mops) (#1599)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-21 11:20:39 -07:00
George Hotz
2e60920317
Revert "sparse cat cross entropy (#1591)" (#1596)
This reverts commit f0ee850e98.
2023-08-21 10:04:26 -07:00
Yixiang Gao
f0ee850e98
sparse cat cross entropy (#1591)
* add sparse cat cross entropy
* minor fix
* add log_softmax into loss function
* add test
* update docs
2023-08-21 09:56:41 -07:00
Yixiang Gao
8d6662a741
.cpu().numpy() -> .numpy() (#1594)
* .cpu().numpy() -> .numpy()
* restore ops_torch
* restore test_speed_v_torch
2023-08-21 09:53:29 -07:00
Szymon Ożóg
8fcc25d0aa
Merge triton tests into global tests
2023-08-21 18:08:01 +02:00
Szymon Ożóg
f7ab9051f2
accumulate local shapes instead of using max shape
2023-08-21 18:06:59 +02:00
Szymon Ożóg
2bdd60565c
Merge remote-tracking branch 'upstream/master' into triton
2023-08-21 17:53:17 +02:00
Szymon Ożóg
4bd31550b5
Add local size override
2023-08-21 09:56:29 +02:00
Szymon Ożóg
13eed8f9cc
Add wait logic
2023-08-21 09:55:54 +02:00
Umut Zengin
35bf21276f
Argmax/Argmin Feature (#1576)
* implemented argmax and argmin
* lint
* lint
* match torch behaviour
* format
* removed flip
2023-08-20 18:46:46 -07:00
Roelof van Dijk
1900acda09
[READY] ci: setup venv cache (#1475)
* ci: cache installed packages
* ci: trigger jobs
* ci: fix hashfiles argument
---------
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-20 18:43:16 -07:00
Umut Zengin
3fc7e984f0
__getitem__ refactoring (#1586)
* dene
* dene
* form
* form
* form
* form
* lint
* small change
* preserve old
* revert to explicit reshape
2023-08-20 18:42:30 -07:00
George Hotz
d627349af0
teeny changes (#1589)
* teeny changes
* import order
2023-08-20 13:38:38 -07:00
George Hotz
012ee7d162
not worth the speed (#1584)
* not worth the speed
* no slots
* uops comments
* bump to python 3.11 for speed
* add critical slots back
2023-08-20 10:24:58 -07:00
George Hotz
739f327d2d
Shorter (#1582)
* deleting lines
* remove insert dims
* if statement is never hit
* bug fixes
2023-08-20 08:12:16 -07:00
Szymon Ożóg
6f38842ef0
fix linter error
2023-08-20 10:35:28 +02:00
Szymon Ożóg
4b4280bcfb
remove deprecated import
2023-08-20 10:34:59 +02:00
Szymon Ożóg
59dc1ad772
Enable test_nn
2023-08-20 10:31:50 +02:00
David Hou
4fbce972d7
CSE at uop level (#1483)
* uop-level cse
* add test
* don't cache reduce alu ops
* types
* rename variable
* fix
* delete lines
2023-08-19 23:40:40 -07:00
George Hotz
b9feb1b743
fp16 support in stable diffusion
2023-08-20 05:37:21 +00:00
George Hotz
ad7d26c393
fix __launch_bounds__ and benchmark TC MATMUL (#1575)
* fix
* benchmark matmul
2023-08-19 10:54:39 -07:00
David Hou
92754e177c
cache buffer loads across multiple bufs (#1482)
* cache loads across buffers (since they may share rawbufs)
* typing
* add test
* fix test
* small changes to test
* fix test
* one big cache
* whitespace
* golf a line?
* invalid is RawBuffer(0)[0], valid 1.
2023-08-19 09:09:58 -07:00
George Hotz
e464442adf
WMMA for 7900XTX (#1563)
* go
* hip no LRU
* work
* works
* 16 TFLOPS
* 29 TFLOPS
* 30 TFLOPS
* never mind, it's 60 TFLOPS
* fix metal WMMA
* put hip alloc back
2023-08-19 09:07:23 -07:00
Szymon Ożóg
5533db9a6a
const ops match local shape
2023-08-19 14:31:36 +02:00
Szymon Ożóg
320a012772
get global program from name
2023-08-19 14:29:37 +02:00
Szymon Ożóg
4123920bcc
remove deprecated variables
2023-08-19 13:56:37 +02:00
Szymon Ożóg
fecc58cc2b
proper function name
2023-08-19 13:54:21 +02:00
nimlgen
faa521bcab
fix usage of arm64 regs according to CC (#1570)
2023-08-18 21:40:32 -07:00
corranr
68ebbd2954
for issue #1555, int64 and int8 in CI=1 ARM64=1 CLANG=1 (#1572)
* fixed for int8,int64, added dtype broadcasting test, passing all CI,ARM64,CLANG tests
* remove shifts
2023-08-18 21:40:13 -07:00
chenyu
ae39cf84ab
Symbolic Shape JIT main PR (#1353)
* Symbolic Shape JIT
update tests
2 variables symbolic ops, adding more tests
test passing
cleanup
* more test cases
* single flag
* review update
* jit attention one piece
* realize
* symbolic_jit test for cuda
* old artifact
* works with cuda gpu but failed ci
* CUDACPU
2023-08-18 14:39:55 -07:00
Roelof van Dijk
84e6693915
fix: apt-get to apt, no recommends, clean up (#1571)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-18 13:48:59 -07:00
wozeparrot
50decf0d45
train cifar using multigpu (#1529)
* feat: train cifar using multigpu
* feat: split eval batch across 5
* feat: cleaner allreduce
* feat: 93.88%
* feat: cleaner batch chunking from bert
* feat: cleaner grad sync
* feat: tinygrad argmax
* feat: make it work with different gpu counts
* feat: move some stuff into the normal __init__
* feat: autodetect gpu count
* feat: move import inside
2023-08-18 09:35:44 -07:00
Szymon Ożóg
b624a374b9
ConstOp loading
2023-08-18 18:30:57 +02:00
Szymon Ożóg
ef757aa5c3
Add TernaryOps
2023-08-18 18:30:18 +02:00
chenyu
be50b2fe8f
more symbolic symbolic ops (#1564)
* more symbolic symbolic ops
* handle NumNode in __mul__
2023-08-18 09:21:41 -07:00
chenyu
dfec16cc83
Support arg int for CUDA kernel (#1565)
2023-08-18 09:19:40 -07:00
wozeparrot
15150d60c4
fix: small fix for lru on hip (#1567)
2023-08-18 09:18:38 -07:00
Szymon Ożóg
5430fcb4e9
cuda envs
2023-08-18 17:49:48 +02:00