1088 Commits

Author SHA1 Message Date
chenyu
0061dc7447 fix benchmark allreduce and add to ci [pr] (#8521) 2025-01-07 00:37:59 -05:00
nimlgen
9bc317d5d2 mockcuda (#8503)
* init mockcuda

* run gpu ocelot

* fix

* fixes

* disable broken tests

* linter

* these fails as well

* pylint

* mypy

* this fails on real platforms as well

* mypy please
2025-01-05 01:23:57 +03:00
uuuvn
5ffc50d58c Clang JIT (#8481)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-03 11:12:55 -05:00
George Hotz
803a47494e Revert "Clang JIT (#8312)" (#8452)
This reverts commit b6266c8e41.
2024-12-30 17:49:35 -05:00
uuuvn
b6266c8e41 Clang JIT (#8312)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-30 17:37:53 -05:00
George Hotz
0addbad36d Happy New Year! Let's get AM merged 2024-12-30 13:15:10 -05:00
qazal
9defbc7d54 add symbolic_simple to the scheduler [pr] (#8419) 2024-12-26 20:05:08 +08:00
George Hotz
9f62c80f68 hotfix: this is a loan 2024-12-20 14:47:04 -08:00
qazal
d78e75f710 hotfix: use ubuntu-22.04 ci from 8249 (#8251) 2024-12-15 02:23:00 +02:00
George Hotz
8a50868264 touchup function.py [pr] (#8220)
* touchup function.py [pr]

* remove ALLOWED_READ_IMAGE

* eh, keep it, just change it
2024-12-13 13:07:00 -08:00
ignaciosica
0a00187dce add real AMX tests to benchmark (#8216)
* add real amx to benchmark

* add debug=2 to check tc are triggered
2024-12-13 14:03:41 -05:00
George Hotz
d9a0880d33 delete fuzz uops (not tested) [pr] (#8181) 2024-12-12 01:41:27 -08:00
chenyu
26e049ab40 add ALLOWED_READ_IMAGE=2131 to openpilot (#8166)
added as exact number check now as it's not clear if more/less than allowed is any better
2024-12-11 12:14:17 -08:00
Ahmed Harmouche
a73e3677d0 Test linearizer on webgpu (#8159)
* Test linearizer on wgpu

* Skip tests due to exceeded dims
2024-12-11 17:03:26 +01:00
chenyu
d462f8ace0 use HALF in cifar wino benchmarks (#8153)
more representative as it hits tensor cores on tinyboxes
2024-12-10 20:21:00 -05:00
Ahmed Harmouche
a8cfdc70ed Run more webgpu tests (#8142) 2024-12-10 23:20:04 +01:00
Ahmed Harmouche
ed7318a3f5 Fix puppeteer install (#8148)
Clean npm cache before puppeteer install
2024-12-10 23:06:33 +01:00
Ahmed Harmouche
71dd222f66 Fix setitem on wgpu (#8144) 2024-12-10 19:34:25 +01:00
George Hotz
f83d715f41 move checks into compile3, delete compile2 [pr] (#8127)
* move checks into compile3 [pr]

* test_vs_onnx

* test v torch works

* float16 won't compile on compile3

* actually delete compile2
2024-12-09 14:21:42 -08:00
George Hotz
87c360c4b5 hotfix: add --size 8B to llama3 2024-12-09 07:53:20 -08:00
chenyu
e9692de42b don't FUZZ_ALL_ACTIONS in fuzz_linearizer.py (#8096)
mostly for speed, this is just making sure the script runs
2024-12-06 17:22:17 -05:00
Ahmed Harmouche
ce72fe1411 u32 to f16 in tinygrad (#8074)
* f16 decompression in tinygrad

* Typing and cleanup
2024-12-06 12:00:13 +01:00
Ahmed Harmouche
ff9a89f714 Proper dtypes for input/output of exported WebGPU model (#8053)
* Respect input/output dtypes in exported WebGPU model

* Add some comments about skipped dtypes
2024-12-05 10:38:05 +01:00
George Hotz
83aecbdc70 do gpuocelot copy manually [pr] (#8050) 2024-12-05 11:51:20 +08:00
George Hotz
4a208bfb28 bump download cache version 2024-12-05 11:42:34 +08:00
Ahmed Harmouche
13eedd373b Run WebGPU tests on ubuntu (#8033) 2024-12-04 12:42:04 +01:00
Ahmed Harmouche
db330a3110 Remove WebGL (#8012) 2024-12-03 16:02:53 +01:00
George Hotz
dddfb494d7 don't mutate the uop/lazybuffer, just the Buffer [pr] (#8000)
* don't mutate the uop/lazybuffer, just the Buffer [pr]

* fix red test

* try different fix

* that

* that's the right fix

* test for fixed behavior

* bump to 3.12
2024-12-03 19:03:51 +08:00
chenyu
17d5719a38 add process replay to webgpu tests (#7998) 2024-12-02 20:27:29 -05:00
chenyu
3c8c98253a BEAM_DEBUG=1 in speed_v_theoretical (#7942)
* DEBUG=3 in speed_v_theoretical

* BEAM_DEBUG=1
2024-11-28 08:30:55 -05:00
chenyu
a6171cbe71 add stable diffusion v2 to mac benchmark (#7917)
this caught #7902
2024-11-26 22:09:43 -05:00
qazal
345457f518 webgpu cache packages (#7911)
* webgpu -n=auto

* fix webgpu ci cache
2024-11-27 00:17:36 +08:00
qazal
6102e3159c webgpu -n=auto (#7910) 2024-11-26 21:13:12 +08:00
George Hotz
4e5bf9dc7a test assignment in jit (#7906)
* test assignment in jit

* don't waste lines

* skip broken test in webgpu
2024-11-26 17:37:00 +08:00
Ahmed Harmouche
10618aba98 Bring back WebGPU (#7063)
* Start from andredaprato:webgpu-clean

* Fix infs

* inf wgsl function is not needed

* Emulated ulong for threefry, more tests passing

* Randomness tests passing

* Update model export to support new changes in webgpu, efficientnet export works again

* Simplify shift emulation in wgsl

* Delete test file

* Fix bigger than u32 u32 literal

* Why was skip copies added here?

* Python3.12 for webgpu tests

* Fix model export syntax error

* Get test ops passing with some skips

* Fix lint

* Much simpler shift

* Run more tests

* Timestamp queries are not supported in CI, so skip search tests

* All fancy indexing passing

* r is ctx

* Run more dtype tests by using is_dtype_supported

* Cleanup ulong shift rendering

* UPat -> Pat, UOps -> Ops

* Pat -> UPat

* Refactor render_ushift if-else

* Pattern to avoid ulong mul

* Remove vals_dtype

* is_nan trick + rewrite, test_isnan passing

* Rewrite a * select(1, nan, gate) -> select(a, nan, gate)

* No arg, just op

* Support char, uchar, short, ushort

* Run test_index_mnis now that we have uint8

* Fix pylint

* Save 3 lines by using base Compiler

* No more long emulation

* Remove fixup_binops

* No more external_local_bufx wgsl specific cstyle modif, use base extra_pm

* Simpler, faster copyin/out

* Skip some new tests that use long

* Fix typo

* copyout touchup

* Save lines by using render_cast

* WebGL is not supported in core, delete it from is_dtype_supported

* More narrow test skips for some unary tests

* TernaryOps, UnaryOps -> Ops

* TinyGrad supports WebGPU

* StableDiffusion demo: f16tof32 gpu is a lib, update UI

* Packed load/store, no more scale_size, no core tinygrad changes

* Rename copyin, copyout

* Device -> dev

* Fix lint

* Pattern matcher rule for packed load/store

* Refactor

* Shorter packed load/store

* this should fix lint

* Fix mypy

* SD compile script working

* New SD webgpu UI

* New default prompt

* New SD weights

* Fix title when webgpu not available

* Run symbolic tests, simplify is_nan, use round_up

* Show step time on UI

* Bump minimum wgpu version to v0.19

* Fix latent

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-11-26 12:26:40 +08:00
chenyu
ac57d82a13 test_tiny on real NV/CUDA/AMD/HIP (#7886)
simple tests that run on real CUDA and HIP
2024-11-24 16:34:54 -05:00
chenyu
5c5b1b994c less flaky benchmarks (#7855)
JIT=2 for metal cifar with HALF, and lower tflops for nv test_gemm_4096. failures in https://github.com/tinygrad/tinygrad/actions/runs/11980239535/job/33404098428?pr=7830
2024-11-22 16:39:39 -05:00
chenyu
d5c9fafff5 default run stable diffusion benchmark with fp16 (#7831)
and keep the non-fp16 one in mac
2024-11-21 15:58:17 -05:00
chenyu
46aa23539f generate and print mypy lineprecision report (#7809) 2024-11-20 16:53:17 -05:00
chenyu
c815d7b56e run bfloat16 tensor core in metal benchmark (#7808)
* run bfloat16 tensor core in metal benchmark

* separate task
2024-11-20 15:34:07 -05:00
chenyu
d5f76462c8 fix CI beautiful_mnist dir (#7790)
fixed `fatal: not a git repository (or any of the parent directories): .git` because $HOME is not $GITHUB_WORKSPACE
2024-11-19 09:59:02 -05:00
George Hotz
fbb4099b3c add test for compile3 [pr] (#7783)
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-19 19:26:51 +08:00
chenyu
9fb396f660 test_ops maxpool2d -> max_pool2d (#7696)
and avgpool2d -> avg_pool2d for better grepping the tests
2024-11-14 10:39:12 -05:00
chenyu
e6cfaaa496 metal benchmark JIT=2 -> JIT=1 (#7661) 2024-11-12 22:55:27 -05:00
chenyu
1884f021e3 add conv3x3 to speed_v_theoretical (#7658)
* add conv3x3 to speed_v_theoretical

* show test duration
2024-11-12 16:41:56 -05:00
chenyu
a88a15c7e8 setup perflevel in red CI (#7645)
runs v4.1 bert setup.
```
rocm-smi --setprofile compute
rocm-smi --setmclk 3
rocm-smi --setperflevel high
```
2024-11-11 18:44:55 -05:00
chenyu
773d5b60bf beam benchmark tests (#7638)
* beam benchmark tests

* lower AMD number somehow

* less flaky
2024-11-11 18:11:18 -05:00
chenyu
bfab03288d fix HALF=1 in test_speed_v_torch (#7642)
* fix HALF=1 in test_speed_v_torch

"operation cache defeats" adds 1 to all args, which were centered around 0. adding 1 makes big matmul and matvec go inf.

fixed by subtracting 1 afterwards and bumped tolerance for half input

* bigger tol for BIG=2, update CI too

* bigger tol
2024-11-11 14:29:37 -05:00
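A minimal NumPy sketch of the overflow described in the commit above. This is not the code from test_speed_v_torch: the shapes, the seed, and the exaggerated offset of 32 (standing in for the "+1") are assumptions chosen so the effect shows up on a small matrix. It illustrates that a half-precision matmul stays finite while the inputs are centered around 0, but goes inf once every element carries a positive offset, because float16 tops out at 65504.

```python
import numpy as np

rng = np.random.default_rng(0)

# Inputs centered around 0, as the commit message describes (shapes are illustrative).
a = (rng.random((32, 256)) - 0.5).astype(np.float16)
b = (rng.random((256, 32)) - 0.5).astype(np.float16)

# Hypothetical, exaggerated stand-in for the offset added to the args.
offset = np.float16(32)

with np.errstate(over="ignore"):
    # Products have mixed signs and are at most 0.25 in magnitude, so sums stay small.
    centered = a @ b
    # Every product is now roughly 1000; 256 of them exceed the float16 maximum (65504).
    shifted = (a + offset) @ (b + offset)

centered_ok = bool(np.isfinite(centered).all())
shifted_overflows = bool(np.isinf(shifted).all())

# Subtracting the offset again brings the inputs back near their original values,
# although the round trip is not exact in half precision.
restored = (a + offset) - offset
max_roundtrip_error = float(np.abs(restored.astype(np.float32) - a.astype(np.float32)).max())

print(centered_ok, shifted_overflows, max_roundtrip_error)
```

The round-trip error printed at the end is small but nonzero, which is one reason comparisons on half inputs generally need a looser tolerance than float32 ones.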
George Hotz
b4cb6b89f9 hotfix: CI mac uses python 3.11 2024-11-11 23:42:35 +08:00
George Hotz
9648372ee6 hotfix: mac uses python 3.12 2024-11-11 23:23:48 +08:00