Commit Graph

2238 Commits

Roelof van Dijk
e2cf0f322e [READY] ci: missing n=auto (#1486)
* ci: missing n=auto

* fix: add to commented test

---------

Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-08 07:37:24 -07:00
Roelof van Dijk
0ce7511110 fix: is not used with a literal (#1487)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-08 07:35:30 -07:00
nimlgen
932dad1a2b fix cast bool->float in llvmir (#1480)
Closes #1479
2023-08-07 21:30:51 -07:00
nimlgen
046fd7437a use fake buffer for external_test_speed_llama.py (#1478) 2023-08-07 22:05:44 -04:00
George Hotz
5fdd248617 don't download cifar (#1472) 2023-08-06 21:38:59 -07:00
George Hotz
d78fb8f4ed add stable diffusion and llama (#1471)
* add stable diffusion and llama

* pretty in CI

* was CI not true

* that

* CI=true, wtf

* pythonpath

* debug=1

* oops, wrong place

* uops test broken for wgpu

* wgpu tests flaky
2023-08-06 21:31:51 -07:00
terafo
24933ab551 Actually flip local_max in CUDA (#1462)
* Actually do the flip

* Fixed typo

---------

Co-authored-by: terafo <terafo@protonmail.com>
2023-08-06 10:35:25 -07:00
Diogo
d7d1011f1e Add WEBGPU tests to CI (#1463)
* webgpu tests

* assert device is webgpu

* missed env set

* exclude failing ci tests

* ignore test file

* changed acc for adam test
2023-08-06 10:32:01 -07:00
George Hotz
486a9dbfd9 speed v torch (#1464)
* speed v torch

* always print

* change print

* torch speed tee

* all exposed
2023-08-06 09:32:33 -07:00
George Hotz
2ab282bfec run on update_benchmark too (#1460)
* run on update_benchmark too

* amd inference test

* name it better

* add 10 CIFAR training steps
2023-08-06 08:58:37 -07:00
terafo
3d41674b42 Fixed regression (#1447)
Co-authored-by: terafo <terafo@protonmail.com>
2023-08-06 07:55:58 -07:00
George Hotz
d67e248d9b simple bitcast 2 (#1445)
* simple bitcast 2

* bc 2

* empty

* Revert "empty"

This reverts commit d8ee083655.
2023-08-06 00:30:50 -07:00
George Hotz
943b227cb1 only on push to master 2023-08-06 00:10:07 -07:00
George Hotz
2274e3e757 Fix benchmark (#1454)
* do benchmarking

* system

* artifact

* go

* name artifact

* only on push
2023-08-05 23:44:36 -07:00
George Hotz
bf21aec81f do benchmarking (#1451)
* do benchmarking

* system

* artifact

* go

* name artifact
2023-08-05 23:35:01 -07:00
nimlgen
1ba8ae62a1 Match Torch speed for sum reduction (#1387)
Co-authored-by: Alexander Edwards <alex@alexedw.com>
2023-08-05 22:27:33 -07:00
chenyu
09ede08b23 simplify Node.sum aggregating (#1449) 2023-08-05 22:19:36 -07:00
George Hotz
7fa730b506 external model benchmark test 2023-08-05 22:10:48 -07:00
chenyu
cb5dcc7b57 remove view_from_shape (#1448) 2023-08-05 20:39:13 -07:00
Diogo
e2af95c2f8 moved global_max and local_max to LinearizerOptions; also added assert for max bufs (#1446) 2023-08-05 18:23:18 -07:00
George Hotz
7b8d06c9f1 test uops (#1444)
* test uops

* tests should pass

* improve uops

* precision
2023-08-05 12:35:56 -07:00
George Hotz
84c430355e fix backends for new style (#1443)
* fix backends for new style

* fix method cache

* fix fakeless

* llvm blacklist

* fix kernel optimizer
2023-08-05 11:07:04 -07:00
George Hotz
67781fcf5d fix fail fast in CI 2023-08-05 10:24:24 -07:00
George Hotz
bd7f4b1249 move renamer to linearizer (#1442)
* move renamer to linearizer

* uops converter

* Delete test_uops.py
2023-08-05 08:53:25 -07:00
nimlgen
669b406ec6 correct children count with lazycache (#1429) 2023-08-05 00:30:16 -07:00
Felix
97a6029cf7 Corrected a few misspelled words (#1435) 2023-08-04 16:51:08 -07:00
Adrian Kretz
043d5f2cb5 Fix NOUNROLL (#1439) 2023-08-04 16:50:19 -07:00
Francesco Castelli
579f4615a0 Add assert for wrong matmul/dot shapes (#1438) 2023-08-04 18:16:56 -04:00
Umut Zengin
52db7d7435 inf, -inf support for pad (#1436) 2023-08-04 15:05:25 -04:00
Alex Telon
7325bc914f fix: Context (#1430)
* Fixed issue in Context

* Cleaned up fix

Now that DEBUG.value = 3 always works, we can do so in __new__ as well.
2023-08-04 10:53:48 -04:00
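A minimal sketch of the Context behavior this fix concerns (assumes the tinygrad.helpers Context/ContextVar API of this era; exact names may differ):

from tinygrad.helpers import Context, DEBUG

# Context temporarily overrides ContextVars such as DEBUG,
# restoring the previous value when the block exits
with Context(DEBUG=3):
  print(DEBUG.value)  # -> 3 inside the block
DEBUG.value = 3  # the direct assignment the commit message says "always works"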
ian
c08ed1949f Fix plt output comment (#1428) 2023-08-03 23:35:52 -07:00
wozeparrot
801bed4f66 Add ops_shm (#1413)
* feat: add ops_shm

* clean: extra newline

* feat: add test

* feat: ci doesn't like that

* feat: ci still doesn't like that

* feat: skip big test on ci

* feat: testing

* feat: big

* feat: testing again

* feat: reskip test
2023-08-03 17:40:52 -07:00
chenyu
34f348643b Support constant expand to symbolic shape (#1411) 2023-08-02 21:21:22 -07:00
chenyu
6572ca6835 support symbolic expand (#1407) 2023-08-02 20:03:46 -04:00
wozeparrot
a367f71fea fix: don't put kernels into cache when optimizing (#1409) 2023-08-02 18:17:16 -04:00
Paolo Gavazzi
9ffa1eb7e2 Removed dep of torch, torchaudio, kept librosa only (#1264) 2023-08-02 13:52:04 -04:00
George Hotz
fc2303e520 gitignore in weights 2023-08-02 16:26:41 +00:00
chenyu
18d0a93f09 LazyBuffer.get_variable_buffers() (#1391)
* LazyBuffer.get_variable_buffers()

* remove left_only, add ProdNode

* no vars for OpNode.b

* do not change symbolic vars, remove ProdNode
2023-08-02 09:01:35 -07:00
Umut Zengin
8889821547 Const pad support to pad2d and slice (#1392)
* slice to pad2d migrate

* Gain line

* Mypy happy

* Mypy happy

* Revert

* whitespace
2023-08-02 08:58:52 -07:00
wozeparrot
ab9e4a2e93 Make cuda CI a bit more consistent (#1403)
* feat: use fast-apt-mirror

* feat: use in more places
2023-08-02 07:38:22 -07:00
wozeparrot
7aff8c4ded cl fixes (#1402)
* feat: non-blocking

* feat: store event on buffer
2023-08-01 22:13:51 -07:00
Alex Telon
b66361843a Timing and Context can now be used as decorators (#1385)
* Context and Timing can now be used as decorators

* Using Timing decorator in quickstart.md

The time formatting is better and it is a useful tool to learn.

Old: Time: 3.5260659999912605
New: Time: 3526.14 ms

* Updated env_vars documentation for Context

* Added test for Context decorator

* Put new import on same line as others
2023-08-01 17:16:10 -07:00
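A minimal sketch of the decorator usage this PR enables (assumes tinygrad.helpers exposes Timing as a ContextDecorator, per the change above; the decorated function is hypothetical):

from tinygrad.helpers import Timing

@Timing("Time: ")  # previously Timing was usable only as a context manager
def step():
  # hypothetical workload, just to have something to time
  return sum(range(10**6))

step()  # prints elapsed time in the new format, e.g. "Time: 3526.14 ms"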
chenyu
d9d1372dd0 Update pytest.ini format (#1398) 2023-08-01 18:00:51 -04:00
George Hotz
f4218b709f Revert "Improve Metal runtime command buffer handling (#1335)" (#1397)
This reverts commit bd54105b6b.
2023-08-01 12:10:20 -07:00
Diogo
4dc8595069 simple exporting models (#1344)
* unified exporting

* json exporting

* ignore more

* simplified buffer export

* added dtypes

* added assert

* swift example

* fix tests

* linter

* remove whitespace

* fixed tests

* remove swift example

* remove unintended changes

* allow callable models to be used

* whitespace

* more readable json export

* name change

* whitespace

* whitespace
2023-08-01 09:35:48 -07:00
wozeparrot
7c7cf16ef2 use host ptr for speed on copyouts (#1393)
* feat: use mapped buffer for speed

* fix: whoops don't need that

* feat: don't need explicit call to memoryview
2023-08-01 09:34:12 -07:00
Diogo
ba5e3818a0 Limit dims based on max size (#1390)
* working

* whitespace

* changed defaults to None

* linter

* last linter error
2023-07-31 19:18:19 -07:00
chenyu
b2fde9ec36 reshape to register variable value (#1386)
* reshape to register variable value

* better error message
2023-07-31 17:10:02 -07:00
Umut Zengin
0de5f20970 Re-open constant pad support to Tensor.pad (#1388)
* Added const padding support to .pad

* Linter
2023-07-31 17:08:57 -07:00
David Hou
3300d0aeaf syncthreads before wmma (#1389)
(venv) chaos@tiny3:~/tinygrad$ KX=2 KY=2 N=2048 python extra/gemm/hip_matmul.py
   4194304    289.60 us, would be  59322.55 GFLOPS matmul, 173.80 GB/s
2023-07-31 17:05:49 -07:00