Commit Graph

1003 Commits

nimlgen
9d3c40601f am: fast memory manager (#8654)
* start

* progress

* fixes

* smth

* mini fixes

* fix2

* ugh, need this for now

* faster

* cleanups

* tiny linters

* make mypy happier

* test & free pts

* ops

* linter

* cleanup vm

* fix

* remove map_from

* tiny fixes

* add test to ci
2025-01-20 16:58:22 +03:00
ignaciosica
d2234e308a tf32 tc for nv and ptx (#8635)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-17 17:43:57 -08:00
nimlgen
f671da6755 ci: add AM start time to benchmark (#8637)
* ci: add AM start time to benchmark

* am: unlock it

* add AMD

* revert this
2025-01-16 14:47:36 +03:00
chenyu
4ee3243c93 JITBEAM=2 for LLaMA-3 8B on 4 GPUs [pr] (#8623)
is it fast?
2025-01-14 19:52:38 -05:00
George Hotz
bfbe81df71 remove cast before view (#8613)
* remove cast before view

* greener

* indexing

* that passes too

* openpilot too

* ack

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-01-14 15:04:58 -05:00
chenyu
393eec3201 raise RuntimeError for uneven shard [pr] (#8593)
no 7B llama on 6 GPUs

skip 70B
2025-01-14 14:51:48 -05:00
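The commit above (#8593) turns an uneven multi-GPU shard into a hard error ("no 7B llama on 6 GPUs"). A minimal sketch of that kind of guard, assuming only the divisibility rule stated in the commit — the function name and signature are illustrative, not tinygrad's actual API:

```python
def check_even_shard(dim_size: int, n_devices: int) -> int:
    """Raise RuntimeError unless dim_size splits evenly across n_devices."""
    if dim_size % n_devices != 0:
        raise RuntimeError(f"uneven shard: {dim_size} across {n_devices} devices")
    return dim_size // n_devices
```

For example, a 7B model whose sharded dimension is not divisible by 6 would raise, while the same dimension on 4 GPUs splits cleanly.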
ignaciosica
d5a646d492 CUDA Turing TC (#8597)
* init turing tc

* reorder tc

* hotfix: remove some spaces

* revert var name to x

* consistent order of factors

* revert order of terms to match old stuff

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-14 10:35:14 -08:00
nimlgen
1ff6862a3d ci: sleep a bit to let the driver unload the prev pid (#8605) 2025-01-14 15:55:23 +03:00
nimlgen
74b83c4c41 am in ci (#8532)
* try am in ci

* no sudo

* temp

* run more am test

* run half on am

* insert amdgpu

* other machine as well
2025-01-13 19:55:17 +03:00
qazal
2f71a00236 remove PYTHONPATH=. from mypy ci [pr] (#8578) 2025-01-12 09:52:03 -08:00
qazal
98c9e23560 remove global PYTHONPATH setting in CI (test.yml) [pr] (#8568)
* remove global PYTHONPATH setting in CI [pr]

* only run mypy in tinygrad/

* still needed for benchmarks
2025-01-11 12:47:50 -05:00
qazal
60503c8621 use CAPTURE_PROCESS_REPLAY=1 in CI [pr] (#8564) 2025-01-11 06:03:48 -05:00
nimlgen
aa3d612df2 add script to install amd mockgpu on macOS (#8536)
* upload artifact every time

* hm

* sh script

* hm

* hm2

* hm2

* hm2

* no sudo

* def paths

* small comments

* text

* try auth for bigger limits
2025-01-09 01:29:25 +03:00
patrini32
21c7d7c71a MOCKGPU amd test on OSX (#8505)
* add tests

* Refactor

* cache only amd/comgr/build (saves a lot of space)

* fix

* silence warning and add check for cache hit before installing cmake

* run only pytest

* use actions/cache

* lower timeout-minutes and add Device.DEFAULT step

* add nvidia to Device.DEFAULT check

* typo

* fix

* Check only for amd and run only 2 test
2025-01-08 14:27:56 +03:00
chenyu
85a4397f27 fix create_schedule_with_vars usage in allreduce benchmark [pr] (#8522)
* fix create_schedule_with_vars usage in allreduce benchmark [pr]

because i didn't know how to use it...

* increase time limit because tiny17 is slow
2025-01-07 01:30:01 -05:00
chenyu
0061dc7447 fix benchmark allreduce and add to ci [pr] (#8521) 2025-01-07 00:37:59 -05:00
nimlgen
9bc317d5d2 mockcuda (#8503)
* init mockcuda

* run gpu ocelot

* fix

* sfixes

* disable broken tests

* linter

* these fails as well

* pylint

* mypy

* this fails on real platforms as well

* mypy please
2025-01-05 01:23:57 +03:00
uuuvn
5ffc50d58c Clang JIT (#8481)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-03 11:12:55 -05:00
George Hotz
803a47494e Revert "Clang JIT (#8312)" (#8452)
This reverts commit b6266c8e41.
2024-12-30 17:49:35 -05:00
uuuvn
b6266c8e41 Clang JIT (#8312)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-30 17:37:53 -05:00
George Hotz
0addbad36d Happy New Year! Let's get AM merged 2024-12-30 13:15:10 -05:00
qazal
9defbc7d54 add symbolic_simple to the scheduler [pr] (#8419) 2024-12-26 20:05:08 +08:00
George Hotz
9f62c80f68 hotfix: this is a loan 2024-12-20 14:47:04 -08:00
qazal
d78e75f710 hotfix: use ubuntu-22.04 ci from 8249 (#8251) 2024-12-15 02:23:00 +02:00
George Hotz
8a50868264 touchup function.py [pr] (#8220)
* touchup function.py [pr]

* remove ALLOWED_READ_IMAGE

* eh, keep it, just change it
2024-12-13 13:07:00 -08:00
ignaciosica
0a00187dce add real AMX tests to benchmark (#8216)
* add real amx to benchmark

* add debug=2 to check tc are triggered
2024-12-13 14:03:41 -05:00
George Hotz
d9a0880d33 delete fuzz uops (not tested) [pr] (#8181) 2024-12-12 01:41:27 -08:00
chenyu
26e049ab40 add ALLOWED_READ_IMAGE=2131 to openpilot (#8166)
added as exact number check now as it's not clear if more/less than allowed is any better
2024-12-11 12:14:17 -08:00
Ahmed Harmouche
a73e3677d0 Test linearizer on webgpu (#8159)
* Test linearizer on wgpu

* Skip tests due to exceeded dims
2024-12-11 17:03:26 +01:00
chenyu
d462f8ace0 use HALF in cifar wino benchmarks (#8153)
more representative as it hits tensor cores on tinyboxes
2024-12-10 20:21:00 -05:00
Ahmed Harmouche
a8cfdc70ed Run more webgpu tests (#8142) 2024-12-10 23:20:04 +01:00
Ahmed Harmouche
ed7318a3f5 Fix puppeteer install (#8148)
Clean npm cache before puppeteer install
2024-12-10 23:06:33 +01:00
Ahmed Harmouche
71dd222f66 Fix setitem on wgpu (#8144) 2024-12-10 19:34:25 +01:00
George Hotz
f83d715f41 move checks into compile3, delete compile2 [pr] (#8127)
* move checks into compile3 [pr]

* test_vs_onnx

* test v torch works

* float16 won't compile on compile3

* actually delete compile2
2024-12-09 14:21:42 -08:00
George Hotz
87c360c4b5 hotfix: add --size 8B to llama3 2024-12-09 07:53:20 -08:00
chenyu
e9692de42b don't FUZZ_ALL_ACTIONS in fuzz_linearizer.py (#8096)
mostly for speed, this is just making sure the script runs
2024-12-06 17:22:17 -05:00
Ahmed Harmouche
ce72fe1411 u32 to f16 in tinygrad (#8074)
* f16 decompression in tinygrad

* Typing and cleanup
2024-12-06 12:00:13 +01:00
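The commit above (#8074) moves f16 decompression from a u32 buffer into tinygrad itself. A hedged sketch of the underlying idea — one u32 word carries two packed f16 halves (low and high 16 bits); this decoder is illustrative and is not tinygrad's code:

```python
import struct

def u32_to_two_f16(word: int) -> tuple[float, float]:
    """Decode the low and high 16-bit halves of a u32 as IEEE 754 half floats."""
    lo = struct.unpack('<e', struct.pack('<H', word & 0xFFFF))[0]
    hi = struct.unpack('<e', struct.pack('<H', (word >> 16) & 0xFFFF))[0]
    return lo, hi
```

Here `'<e'` is Python's little-endian half-precision format code, so e.g. the word `0x40003C00` decodes to (1.0, 2.0).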
Ahmed Harmouche
ff9a89f714 Proper dtypes for input/output of exported WebGPU model (#8053)
* Respect input/output dtypes in exported WebGPU model

* Add some comments about skipped dtypes
2024-12-05 10:38:05 +01:00
George Hotz
83aecbdc70 do gpuocelot copy manually [pr] (#8050) 2024-12-05 11:51:20 +08:00
George Hotz
4a208bfb28 bump download cache version 2024-12-05 11:42:34 +08:00
Ahmed Harmouche
13eedd373b Run WebGPU tests on ubuntu (#8033) 2024-12-04 12:42:04 +01:00
Ahmed Harmouche
db330a3110 Remove WebGL (#8012) 2024-12-03 16:02:53 +01:00
George Hotz
dddfb494d7 don't mutate the uop/lazybuffer, just the Buffer [pr] (#8000)
* don't mutate the uop/lazybuffer, just the Buffer [pr]

* fix red test

* try different fix

* that

* that's the right fix

* test for fixed behavior

* bump to 3.12
2024-12-03 19:03:51 +08:00
chenyu
17d5719a38 add process replay to webgpu tests (#7998) 2024-12-02 20:27:29 -05:00
chenyu
3c8c98253a BEAM_DEBUG=1 in speed_v_theoretical (#7942)
* DEBUG=3 in speed_v_theoretical

* BEAM_DEBUG=1
2024-11-28 08:30:55 -05:00
chenyu
a6171cbe71 add stable diffusion v2 to mac benchmark (#7917)
this caught #7902
2024-11-26 22:09:43 -05:00
qazal
345457f518 webgpu cache packages (#7911)
* webgpu -n=auto

* fix webgpu ci cache
2024-11-27 00:17:36 +08:00
qazal
6102e3159c webgpu -n=auto (#7910) 2024-11-26 21:13:12 +08:00
George Hotz
4e5bf9dc7a test assignment in jit (#7906)
* test assignment in jit

* don't waste lines

* skip broken test in webgpu
2024-11-26 17:37:00 +08:00
Ahmed Harmouche
10618aba98 Bring back WebGPU (#7063)
* Start from andredaprato:webgpu-clean

* Fix infs

* inf wgsl function is not needed

* Emulated ulong for threefry, more tests passing

* Randomness tests passing

* Update model export to support new changes in webgpu, efficientnet export works again

* Simplify shift emulation in wgsl

* Delete test file

* Fix bigger than u32 u32 literal

* Why was skip copies added here?

* Python3.12 for webgpu tests

* Fix model export syntax error

* Get test ops passing with some skips

* Fix lint

* Much simpler shift

* Run more tests

* Timestamp queries are not supported in CI, so skip search tests

* All fancy indexing passing

* r is ctx

* Run more dtype tests by using is_dtype_supported

* Cleanup ulong shift rendering

* UPat -> Pat, UOps -> Ops

* Pat -> UPat

* Refactor render_ushift if-else

* Pattern to avoid ulong mul

* Remove vals_dtype

* is_nan trick + rewrite, test_isnan passing

* Rewrite a * select(1, nan, gate) -> select(a, nan, gate)

* No arg, just op

* Support char, uchar, short, ushort

* Run test_index_mnist now that we have uint8

* Fix pylint

* Save 3 lines by using base Compiler

* No more long emulation

* Remove fixup_binops

* No more external_local_bufx wgsl specific cstyle modif, use base extra_pm

* Simpler, faster copyin/out

* Skip some new tests that use long

* Fix typo

* copyout touchup

* Save lines by using render_cast

* WebGL is not supported in core, delete it from is_dtype_supported

* More narrow test skips for some unary tests

* TernaryOps, UnaryOps -> Ops

* TinyGrad supports WebGPU

* StableDiffusion demo: f16tof32 gpu is a lib, update UI

* Packed load/store, no more scale_size, no core tinygrad changes

* Rename copyin, copyout

* Device -> dev

* Fix lint

* Pattern matcher rule for packed load/store

* Refactor

* Shorter packed load/store

* this should fix lint

* Fix mypy

* SD compile script working

* New SD webgpu UI

* New default prompt

* New SD weights

* Fix title when webgpu not available

* Run symbolic tests, simplify is_nan, use round_up

* Show step time on UI

* Bump minimum wgpu version to v0.19

* Fix latent

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-11-26 12:26:40 +08:00