Roelof van Dijk
e2cf0f322e
[READY] ci: missing n=auto (#1486)
* ci: missing n=auto
* fix: add to commented test
---------
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-08 07:37:24 -07:00
Roelof van Dijk
0ce7511110
fix: is not used with a literal (#1487)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-08 07:35:30 -07:00
nimlgen
932dad1a2b
fix cast bool->float in llvmir (#1480)
Closes #1479
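A minimal user-level repro sketch of the contract this fixes (import paths assumed from this era of the repo; the actual fix lives in the LLVM IR backend):

```python
# Casting bools to float should give exactly 0.0/1.0 on every backend,
# including LLVM; this sketches the user-visible behavior, not the fix.
from tinygrad.tensor import Tensor
from tinygrad.helpers import dtypes

t = Tensor([True, False, True])
print(t.cast(dtypes.float32).numpy())  # expect [1. 0. 1.]
```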
2023-08-07 21:30:51 -07:00
nimlgen
046fd7437a
use fake buffer for external_test_speed_llama.py (#1478)
2023-08-07 22:05:44 -04:00
George Hotz
5fdd248617
don't download cifar (#1472)
2023-08-06 21:38:59 -07:00
George Hotz
d78fb8f4ed
add stable diffusion and llama (#1471)
* add stable diffusion and llama
* pretty in CI
* was CI not true
* that
* CI=true, wtf
* pythonpath
* debug=1
* oops, wrong place
* uops test broken for wgpu
* wgpu tests flaky
2023-08-06 21:31:51 -07:00
terafo
24933ab551
Actually flip local_max in CUDA (#1462)
* Actually do the flip
* Fixed typo
---------
Co-authored-by: terafo <terafo@protonmail.com>
2023-08-06 10:35:25 -07:00
Diogo
d7d1011f1e
Add WEBGPU tests to CI (#1463)
* webgpu tests
* assert device is webgpu
* missed env set
* exclude failing ci tests
* ignore test file
* changed acc for adam test
2023-08-06 10:32:01 -07:00
George Hotz
486a9dbfd9
speed v torch (#1464)
* speed v torch
* always print
* change print
* torch speed tee
* all exposed
2023-08-06 09:32:33 -07:00
George Hotz
2ab282bfec
run on update_benchmark too (#1460)
* run on update_benchmark too
* amd inference test
* name it better
* add 10 CIFAR training steps
2023-08-06 08:58:37 -07:00
terafo
3d41674b42
Fixed regression (#1447)
Co-authored-by: terafo <terafo@protonmail.com>
2023-08-06 07:55:58 -07:00
George Hotz
d67e248d9b
simple bitcast 2 (#1445)
* simple bitcast 2
* bc 2
* empty
* Revert "empty"
This reverts commit d8ee083655.
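A hedged sketch of the bit-reinterpreting cast this enables; Tensor.bitcast and the dtypes import path are assumptions based on this era of the repo:

```python
from tinygrad.tensor import Tensor
from tinygrad.helpers import dtypes

t = Tensor([1.0], dtype=dtypes.float32)
# same bits, new dtype: 1.0f is 0x3f800000, i.e. 1065353216 as int32
print(t.bitcast(dtypes.int32).numpy())
```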
2023-08-06 00:30:50 -07:00
George Hotz
943b227cb1
only on push to master
2023-08-06 00:10:07 -07:00
George Hotz
2274e3e757
Fix benchmark (#1454)
* do benchmarking
* system
* artifact
* go
* name artifact
* only on push
2023-08-05 23:44:36 -07:00
George Hotz
bf21aec81f
do benchmarking (#1451)
* do benchmarking
* system
* artifact
* go
* name artifact
2023-08-05 23:35:01 -07:00
nimlgen
1ba8ae62a1
Match Torch speed for sum reduction (#1387)
Co-authored-by: Alexander Edwards <alex@alexedw.com>
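A hedged micro-benchmark sketch for the reduction path this PR tunes (the shape and timing style are my own choices, not from the PR):

```python
import time
from tinygrad.tensor import Tensor

t = Tensor.rand(4096, 4096).realize()  # realize first so only the sum is timed
st = time.perf_counter()
t.sum().realize()
print(f"sum took {(time.perf_counter() - st) * 1e3:.2f} ms")
```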
2023-08-05 22:27:33 -07:00
chenyu
09ede08b23
simplify Node.sum aggregating (#1449)
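A hedged sketch of what the aggregation does: constant terms in a symbolic sum collapse into a single NumNode (import path assumed from this era):

```python
from tinygrad.shape.symbolic import Variable, NumNode

# the two constants should fold into one +5 term
s = Variable.sum([Variable("x", 0, 10), NumNode(2), NumNode(3)])
print(s)
```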
2023-08-05 22:19:36 -07:00
George Hotz
7fa730b506
external model benchmark test
2023-08-05 22:10:48 -07:00
chenyu
cb5dcc7b57
remove view_from_shape (#1448)
2023-08-05 20:39:13 -07:00
Diogo
e2af95c2f8
moved global_max and local_max to LinearizerOptions; also added assert for max bufs (#1446)
2023-08-05 18:23:18 -07:00
George Hotz
7b8d06c9f1
test uops (#1444)
* test uops
* tests should pass
* improve uops
* precision
2023-08-05 12:35:56 -07:00
George Hotz
84c430355e
fix backends for new style (#1443)
* fix backends for new style
* fix method cache
* fix fakeless
* llvm blacklist
* fix kernel optimizer
2023-08-05 11:07:04 -07:00
George Hotz
67781fcf5d
fix fail fast in CI
2023-08-05 10:24:24 -07:00
George Hotz
bd7f4b1249
move renamer to linearizer (#1442)
* move renamer to linearizer
* uops converter
* Delete test_uops.py
2023-08-05 08:53:25 -07:00
nimlgen
669b406ec6
correct children count with lazycache (#1429)
2023-08-05 00:30:16 -07:00
Felix
97a6029cf7
Corrected a few misspelled words (#1435)
2023-08-04 16:51:08 -07:00
Adrian Kretz
043d5f2cb5
Fix NOUNROLL (#1439)
2023-08-04 16:50:19 -07:00
Francesco Castelli
579f4615a0
Add assert for wrong matmul/dot shapes (#1438)
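A hedged sketch of the failure mode this guards: mismatched inner dimensions now trip an assert at dot() time instead of failing deep in codegen:

```python
from tinygrad.tensor import Tensor

a, b = Tensor.rand(3, 4), Tensor.rand(5, 6)
try:
    a.dot(b)  # inner dims 4 vs 5 disagree
except AssertionError as e:
    print("caught bad shapes:", e)
```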
2023-08-04 18:16:56 -04:00
Umut Zengin
52db7d7435
inf, -inf support for pad (#1436)
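A hedged sketch of why -inf padding is useful: padded cells can never win a max-reduction (the ((before, after), ...) pad signature is assumed from this era):

```python
from tinygrad.tensor import Tensor

t = Tensor([[1.0, 2.0], [3.0, 4.0]])
p = t.pad(((1, 1), (1, 1)), value=float("-inf"))
print(p.max(axis=1).numpy())  # padded rows reduce to -inf, data rows survive
```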
2023-08-04 15:05:25 -04:00
Alex Telon
7325bc914f
fix: Context (#1430)
* Fixed issue in Context
* Cleaned up fix
Now that DEBUG.value = 3 always works, we can do so in __new__ as well.
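A hedged sketch of the ContextVar behavior the fix relies on (Context and DEBUG as exported by tinygrad.helpers at the time):

```python
from tinygrad.helpers import Context, DEBUG

print(DEBUG.value)      # process default
with Context(DEBUG=3):
    print(DEBUG.value)  # 3 while inside the context
print(DEBUG.value)      # restored on exit
```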
2023-08-04 10:53:48 -04:00
ian
c08ed1949f
Fix plt output comment (#1428)
2023-08-03 23:35:52 -07:00
wozeparrot
801bed4f66
Add ops_shm (#1413)
* feat: add ops_shm
* clean: extra newline
* feat: add test
* feat: ci doesn't like that
* feat: ci still doesn't like that
* feat: skip big test on ci
* feat: testing
* feat: big
* feat: testing again
* feat: reskip test
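Not the ops_shm API itself, just a hedged stdlib sketch of the mechanism it builds on: a named shared-memory segment that two processes can map as one zero-copy buffer:

```python
from multiprocessing import shared_memory
import numpy as np

shm = shared_memory.SharedMemory(create=True, size=16, name="tg_shm_demo")
a = np.ndarray((4,), dtype=np.float32, buffer=shm.buf)
a[:] = [1, 2, 3, 4]  # any process opening "tg_shm_demo" sees these bytes
shm.close(); shm.unlink()
```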
2023-08-03 17:40:52 -07:00
chenyu
34f348643b
Support constant expand to symbolic shape (#1411)
2023-08-02 21:21:22 -07:00
chenyu
6572ca6835
support symbolic expand (#1407)
2023-08-02 20:03:46 -04:00
wozeparrot
a367f71fea
fix: don't put kernels into cache when optimizing (#1409)
2023-08-02 18:17:16 -04:00
Paolo Gavazzi
9ffa1eb7e2
Removed torch and torchaudio deps, kept only librosa (#1264)
2023-08-02 13:52:04 -04:00
George Hotz
fc2303e520
gitignore in weights
2023-08-02 16:26:41 +00:00
chenyu
18d0a93f09
LazyBuffer.get_variable_buffers() (#1391)
* LazyBuffer.get_variable_buffers()
* remove left_only, add ProdNode
* no vars for OpNode.b
* do not change symbolic vars, remove ProdNode
2023-08-02 09:01:35 -07:00
Umut Zengin
8889821547
Const pad support to pad2d and slice (#1392)
* slice to pad2d migrate
* Gain line
* Mypy happy
* Mypy happy
* Revert
* whitespace
2023-08-02 08:58:52 -07:00
wozeparrot
ab9e4a2e93
Make cuda CI a bit more consistent (#1403)
* feat: use fast-apt-mirror
* feat: use in more places
2023-08-02 07:38:22 -07:00
wozeparrot
7aff8c4ded
cl fixes (#1402)
* feat: non-blocking
* feat: store event on buffer
2023-08-01 22:13:51 -07:00
Alex Telon
b66361843a
Timing and Context can now be used as decorators (#1385)
* Context and Timing can now be used as decorators
* Using Timing decorator in quickstart.md
The time formatting is better and it's a useful tool to learn.
Old: Time: 3.5260659999912605
New: Time: 3526.14 ms
* Updated env_vars documentation for Context
* Added test for Context decorator
* Put new import on same line as others
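A hedged sketch of the decorator form this PR adds on top of the existing context-manager form (Timing and Context from tinygrad.helpers; the prefix string is my own):

```python
from tinygrad.helpers import Timing, Context
from tinygrad.tensor import Tensor

@Timing("matmul took ")
def work():
    return (Tensor.rand(256, 256) @ Tensor.rand(256, 256)).realize()

with Context(DEBUG=2):  # the context-manager form still works too
    work()
```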
2023-08-01 17:16:10 -07:00
chenyu
d9d1372dd0
Update pytest.ini format (#1398)
2023-08-01 18:00:51 -04:00
George Hotz
f4218b709f
Revert "Improve Metal runtime command buffer handling ( #1335 )" ( #1397 )
...
This reverts commit bd54105b6b.
2023-08-01 12:10:20 -07:00
Diogo
4dc8595069
simple exporting models (#1344)
* unified exporting
* json exporting
* ignore more
* simplified buffer export
* added dtypes
* added assert
* swift example
* fix tests
* linter
* remove whitespace
* fixed tests
* remove swift example
* remove unintended changes
* allow callable models to be used
* whitespace
* more readable json export
* name change
* whitespace
* whitespace
2023-08-01 09:35:48 -07:00
wozeparrot
7c7cf16ef2
use host ptr for speed on copyouts (#1393)
* feat: use mapped buffer for speed
* fix: whoops don't need that
* feat: don't need explicit call to memoryview
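This is not tinygrad's exact code, just a hedged pyopencl sketch of the host-pointer idea: ALLOC_HOST_PTR buffers can be mapped for near-zero-copy readback:

```python
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
q = cl.CommandQueue(ctx)
buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE | cl.mem_flags.ALLOC_HOST_PTR, size=4096)
host_view, ev = cl.enqueue_map_buffer(q, buf, cl.map_flags.READ, 0, (1024,), np.float32)
ev.wait()  # host_view now aliases pinned memory; no extra staging copy
```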
2023-08-01 09:34:12 -07:00
Diogo
ba5e3818a0
Limit dims based on max size (#1390)
* working
* whitespace
* changed defaults to None
* linter
* last linter error
2023-07-31 19:18:19 -07:00
chenyu
b2fde9ec36
reshape to register variable value (#1386)
* reshape to register variable value
* better error message
2023-07-31 17:10:02 -07:00
Umut Zengin
0de5f20970
Re-open constant pad support to Tensor.pad (#1388)
* Added const padding support to .pad
* Linter
2023-07-31 17:08:57 -07:00
David Hou
3300d0aeaf
syncthreads before wmma (#1389)
(venv) chaos@tiny3:~/tinygrad$ KX=2 KY=2 N=2048 python extra/gemm/hip_matmul.py
4194304 289.60 us, would be 59322.55 GFLOPS matmul, 173.80 GB/s
2023-07-31 17:05:49 -07:00