Commit Graph

1107 Commits

qazal
345457f518 webgpu cache packages (#7911)
* webgpu -n=auto

* fix webgpu ci cache
2024-11-27 00:17:36 +08:00
qazal
6102e3159c webgpu -n=auto (#7910) 2024-11-26 21:13:12 +08:00
George Hotz
4e5bf9dc7a test assignment in jit (#7906)
* test assignment in jit

* don't waste lines

* skip broken test in webgpu
2024-11-26 17:37:00 +08:00
Ahmed Harmouche
10618aba98 Bring back WebGPU (#7063)
* Start from andredaprato:webgpu-clean

* Fix infs

* inf wgsl function is not needed

* Emulated ulong for threefry, more tests passing

* Randomness tests passing

* Update model export to support new changes in webgpu, efficientnet export works again

* Simplify shift emulation in wgsl

* Delete test file

* Fix bigger-than-u32 u32 literal

* Why was skip copies added here?

* Python3.12 for webgpu tests

* Fix model export syntax error

* Get test ops passing with some skips

* Fix lint

* Much simpler shift

* Run more tests

* Timestamp queries are not supported in CI, so skip search tests

* All fancy indexing passing

* r is ctx

* Run more dtype tests by using is_dtype_supported

* Cleanup ulong shift rendering

* UPat -> Pat, UOps -> Ops

* Pat -> UPat

* Refactor render_ushift if-else

* Pattern to avoid ulong mul

* Remove vals_dtype

* is_nan trick + rewrite, test_isnan passing

* Rewrite a * select(1, nan, gate) -> select(a, nan, gate)

* No arg, just op

* Support char, uchar, short, ushort

* Run test_index_mnist now that we have uint8

* Fix pylint

* Save 3 lines by using base Compiler

* No more long emulation

* Remove fixup_binops

* No more external_local_bufx WGSL-specific cstyle modification, use base extra_pm

* Simpler, faster copyin/out

* Skip some new tests that use long

* Fix typo

* copyout touchup

* Save lines by using render_cast

* WebGL is not supported in core, delete it from is_dtype_supported

* More narrow test skips for some unary tests

* TernaryOps, UnaryOps -> Ops

* TinyGrad supports WebGPU

* StableDiffusion demo: f16tof32 gpu is a lib, update UI

* Packed load/store, no more scale_size, no core tinygrad changes (see the sketch after this entry)

* Rename copyin, copyout

* Device -> dev

* Fix lint

* Pattern matcher rule for packed load/store

* Refactor

* Shorter packed load/store

* this should fix lint

* Fix mypy

* SD compile script working

* New SD webgpu UI

* New default prompt

* New SD weights

* Fix title when webgpu not available

* Run symbolic tests, simplify is_nan, use round_up

* Show step time on UI

* Bump minimum wgpu version to v0.19

* Fix latent

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-11-26 12:26:40 +08:00
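One way to read the packed load/store item above: WGSL has no 8-bit storage types, so four u8 values have to share one 32-bit word, and every load or store becomes a shift-and-mask on that word. A minimal Python sketch of the idea, with made-up helper names and not the actual WGSL renderer output:
```
def load_u8(words: list[int], idx: int) -> int:
  # read byte `idx` out of a buffer of packed 32-bit words
  return (words[idx // 4] >> ((idx % 4) * 8)) & 0xFF

def store_u8(words: list[int], idx: int, val: int) -> None:
  # clear the target byte lane, then OR in the new value (a read-modify-write)
  shift = (idx % 4) * 8
  words[idx // 4] = (words[idx // 4] & ~(0xFF << shift)) | ((val & 0xFF) << shift)

buf = [0, 0]           # two 32-bit words = eight u8 slots
store_u8(buf, 5, 200)
assert load_u8(buf, 5) == 200
```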
chenyu
ac57d82a13 test_tiny on real NV/CUDA/AMD/HIP (#7886)
simple tests that run on real CUDA and HIP
2024-11-24 16:34:54 -05:00
chenyu
5c5b1b994c less flaky benchmarks (#7855)
JIT=2 for metal cifar with HALF, and a lower tflops threshold for nv test_gemm_4096. Failures in https://github.com/tinygrad/tinygrad/actions/runs/11980239535/job/33404098428?pr=7830
2024-11-22 16:39:39 -05:00
chenyu
d5c9fafff5 default run stable diffusion benchmark with fp16 (#7831)
and keep the non-fp16 one on Mac
2024-11-21 15:58:17 -05:00
chenyu
46aa23539f generate and print mypy lineprecision report (#7809) 2024-11-20 16:53:17 -05:00
chenyu
c815d7b56e run bfloat16 tensor core in metal benchmark (#7808)
* run bfloat16 tensor core in metal benchmark

* separate task
2024-11-20 15:34:07 -05:00
chenyu
d5f76462c8 fix CI beautiful_mnist dir (#7790)
Fixed `fatal: not a git repository (or any of the parent directories): .git` because `$HOME` is not `$GITHUB_WORKSPACE`
2024-11-19 09:59:02 -05:00
George Hotz
fbb4099b3c add test for compile3 [pr] (#7783)
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-19 19:26:51 +08:00
chenyu
9fb396f660 test_ops maxpool2d -> max_pool2d (#7696)
and avgpool2d -> avg_pool2d, for easier grepping of the tests
2024-11-14 10:39:12 -05:00
chenyu
e6cfaaa496 metal benchmark JIT=2 -> JIT=1 (#7661) 2024-11-12 22:55:27 -05:00
chenyu
1884f021e3 add conv3x3 to speed_v_theoretical (#7658)
* add conv3x3 to speed_v_theoretical

* show test duration
2024-11-12 16:41:56 -05:00
chenyu
a88a15c7e8 setup perflevel in red CI (#7645)
Runs the v4.1 BERT setup:
```
rocm-smi --setprofile compute
rocm-smi --setmclk 3
rocm-smi --setperflevel high
```
2024-11-11 18:44:55 -05:00
chenyu
773d5b60bf beam benchmark tests (#7638)
* beam benchmark tests

* lower AMD number somehow

* less flaky
2024-11-11 18:11:18 -05:00
chenyu
bfab03288d fix HALF=1 in test_speed_v_torch (#7642)
* fix HALF=1 in test_speed_v_torch

"operation cache defeats" adds 1 to all arg, which were centered around 0. adding 1 makes big matmul and matvec go inf.

fixed by subtract 1 after and bumpped tolerance for half input

* bigger tol for BIG=2, update CI too

* bigger tol
2024-11-11 14:29:37 -05:00
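A small numpy illustration of the overflow described in this entry; the vector length is illustrative, not the actual benchmark shape:
```
import numpy as np

N = 1 << 17  # a long dot product, in the spirit of the big matmul/matvec shapes

# zero-centered inputs: the products mostly cancel and the fp16 result stays finite
a = np.random.randn(N).astype(np.float16)
b = np.random.randn(N).astype(np.float16)
print(a @ b)              # roughly +-sqrt(N), well inside fp16 range

# shift every element by +1: the dot product is now ~N, above fp16's max of 65504
print((a + 1) @ (b + 1))  # overflows to inf in half precision
```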
George Hotz
b4cb6b89f9 hotfix: CI mac uses python 3.11 2024-11-11 23:42:35 +08:00
George Hotz
9648372ee6 hotfix: mac uses python 3.12 2024-11-11 23:23:48 +08:00
George Hotz
d40673505f new cloud is cloudy [pr] (#7631)
* new cloud is cloudy [pr]

* waste lines to add security

* safety, with speed and less lines

* timing and del

* lines

* cleanups

* restore CloudSession

* bump to 3.10

* quotes

* renderer security
2024-11-11 20:18:04 +08:00
chenyu
e7b18cf5c0 fix load_worlds filter_novariable (#7564)
Filter based on "DEFINE_VAR" instead of "Variable". Also added a unit test to make sure the dataset includes image and variable kernels (see the sketch after this entry)
2024-11-05 16:06:39 -05:00
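A hypothetical sketch of the substring filter this entry describes; the helper and row format here are simplified stand-ins, not the actual load_worlds code:
```
def filter_novariable(asts: list[str]) -> list[str]:
  # drop kernels that use symbolic variables: their serialized AST contains the
  # DEFINE_VAR op, so match on that rather than on the string "Variable"
  return [ast for ast in asts if "DEFINE_VAR" not in ast]

kernels = ['DEFINE_GLOBAL ... DEFINE_VAR "start_pos"', 'DEFINE_GLOBAL ... CONST 1.0']
assert filter_novariable(kernels) == ['DEFINE_GLOBAL ... CONST 1.0']
```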
chenyu
207bca6cea set PAGE_SIZE=1 and generate new dataset (#7559)
13080 rows in total. Both generating and loading this are pretty broken now; the filters are wrong, for example
2024-11-05 11:25:01 -05:00
George Hotz
6f93e91deb hotfix: lower mnist threshold for non determinism 2024-11-03 11:05:12 +08:00
George Hotz
72a9ac27e9 support image dtype in cloud [pr] (#7482)
* support image dtype in cloud [pr]

* remove outdated osx hack

* unused imports
2024-11-02 23:54:27 +08:00
George Hotz
133fe81cc5 Revert "Revert "move up migrate + new gated fold (#7403)" (#7406)" (#7407)
* Revert "Revert "move up migrate + new gated fold (#7403)" (#7406)"

This reverts commit ea5654a9bc.

* test padded in emulation too

* bring back early folding
2024-10-30 23:25:45 +08:00
George Hotz
d9d4dd6756 faster ci [pr] (#7348) 2024-10-29 14:01:44 +08:00
George Hotz
a5e0f59e41 move autogen to different CI runner [pr] (#7346)
* move autogen to different CI runner [pr]

* balance a bit

* readme back there

* compile enet in autogen
2024-10-29 13:35:22 +08:00
George Hotz
f55c3dcff8 hotfix: bump ocelot 2024-10-29 12:46:24 +08:00
George Hotz
4fed358511 hotfix: timeouts to 20 minutes. better no stats update than a red x 2024-10-25 16:31:52 +08:00
chenyu
d4c94d0d32 disable llama 1 4gpu and 6gpu benchmark (#7276)
having llama3 4gpu and 6gpu should be good enough
2024-10-24 14:19:22 -04:00
chenyu
e6929f2402 RUN_PROCESS_REPLAY=0 on llama 70B and resnet training (#7272)
* RUN_PROCESS_REPLAY=0 on llama 70B and resnet training

also added a 15-minute total timeout; this cannot grow indefinitely

* add a few more

* a few more just for NV
2024-10-24 12:09:54 -04:00
qazal
4cf7cca91a delete fuzz_schedule [pr] (#7144) 2024-10-18 15:09:39 +03:00
George Hotz
9f4ca88218 hotfix: relax target pct for beautiful_mnist 2024-10-17 12:36:07 +08:00
chenyu
d12c87dc8e use ubuntu-22.04 in CI (#7068)
ubuntu-latest points to 24.04 now, maybe it's this?
2024-10-15 09:44:59 -04:00
chenyu
fbaab30fe3 add timing to fuzz_linearizer (#7056)
and applied a smaller FUZZ_MAX_SIZE; this is getting quite slow in CI
2024-10-14 11:57:41 -04:00
nimlgen
feb0bcb58b qcom bench bind to perf cluster (#6996) 2024-10-11 12:21:52 +03:00
George Hotz
f50d0e0ee0 cloud device [pr] (#6964)
* first try at cloud device [pr]

* real separation

* we're free

* clang works

* unhappy with timeout

* better timeouts and free

* unrelated

* use http verbs + add test

* lines + better test

* fix DELETE

* shorter cloud

* split key

* fix sending renderer

* PTXRenderer serialization

* add sessions

* http.client

* minor timeout bump

* fix keep-alive

* inc server timeout

* real fix timeout

* that one too
2024-10-11 12:24:06 +08:00
nimlgen
f9d454aed5 correct kernargs alignment (#6984) 2024-10-11 00:06:28 +03:00
qazal
3724a66716 move test_viz to test/, prereq for tinygrad/viz [pr] (#6972) 2024-10-10 11:40:46 +03:00
qazal
b82023c97e process replay cleanup to generic _pmap [pr] (#6929)
* process replay cleanup to generic _pmap [pr]

* delete `COMPARE_SCHEDULE`
2024-10-07 13:57:05 +08:00
George Hotz
0d6216aba1 bump the download cache (#6896) 2024-10-05 10:23:18 +08:00
George Hotz
0f28e93224 add pickle support for pattern matchers [run_process_replay] (#6816)
* add pickle support for pattern matchers [run_process_replay]

* cleaner and all

* no closures

* fix tests

* revert that

* final

* cleaner

* python 3.8 fix

* add round trip back

* this

* waste lines on this. that's the final line count

* max print better

* more targeted fix

* regrettably add 3.8 support
2024-09-30 21:54:46 +08:00
wozeparrot
2b899164c6 no numpy (#6751) 2024-09-26 16:40:18 +08:00
wozeparrot
c100f3d406 default threefry (#6116) 2024-09-25 17:45:13 +08:00
George Hotz
dd575da7ee real minimum cstyle change (#6709)
* real minimum cstyle change

* make it match

* bring back DEFINE_GLOBAL store marking writable

* bump line count to 9800

* closer

* precompute don't render

* cast/bitcast too

* smem_align

* vectorize

* more pr match

* remove that test

* less PR diff
2024-09-25 12:40:46 +08:00
George Hotz
f45d178a55 hotfix: support JIT_BATCH_SIZE=0, make that the default 2024-09-25 10:36:04 +08:00
George Hotz
52e7f1c108 add new model CI 2024-09-25 10:23:06 +08:00
George Hotz
b0ffe2452b bump line count to 9800 2024-09-25 09:15:30 +08:00
George Hotz
de259e3f09 hotfix: add compile3 to comma CI 2024-09-23 18:25:49 +08:00
qazal
e2d6e10ddf hotfix: reset benchmarks cache for process replay (#6671) 2024-09-23 15:13:02 +08:00