Commit Graph

1207 Commits

leopf
65b6696f3b refactor safe_load (#8035)
* refactor safe_load

* cleanup
2024-12-06 12:08:21 +08:00
Ahmed Harmouche
c6f5bb03fa YoloV8 WebGPU fixes (#8057)
* Bump up input size to 416, show a message if WebGPU is not supported

* Minor fix in export_model
2024-12-05 16:23:45 +01:00
Francis Lata
c3187087f7 QwQ-32B-Preview support (#7962)
* load weights with some debugging

* start running a prompt

* cleanup

* optionally permute layers and cleanup

* add validation for simple prompt

* small cleanup

* minor cleanup with formatting download links

* add a longer prompt

* add timing option

* some typings

* remove unused arg

* reset GlobalCounters

* minor cleanups
2024-12-04 21:46:37 -05:00
Ahmed Harmouche
c9e7701417 Fast YoloV8 on WebGPU (#8036)
* Fast yolov8 with downscaled input

* Faster + FPS meter

* Add loader while model is downloading/compiling

* Title touchup
2024-12-04 15:23:09 +01:00
Ahmed Harmouche
db330a3110 Remove WebGL (#8012) 2024-12-03 16:02:53 +01:00
Ahmed Harmouche
8818046940 YoloV8 on WebGPU (#8007)
Port YoloV8 to WebGPU
2024-12-03 15:10:41 +01:00
Ahmed Harmouche
8909dbd82c Remove wgpu specific checks from stable diffusion example (#7991) 2024-12-02 11:31:14 +01:00
chenyu
3e2430f822 use tqdm.tqdm in mlperf training (#7929)
there is an issue in benchmark dashboard logging; revert back to tqdm.tqdm for now
2024-11-27 21:57:05 -05:00
Ahmed Harmouche
10618aba98 Bring back WebGPU (#7063)
* Start from andredaprato:webgpu-clean

* Fix infs

* inf wgsl function is not needed

* Emulated ulong for threefry, more tests passing

* Randomness tests passing

* Update model export to support new changes in webgpu, efficientnet export works again

* Simplify shift emulation in wgsl

* Delete test file

* Fix u32 literals bigger than u32 max

* Why was skip copies added here?

* Python3.12 for webgpu tests

* Fix model export syntax error

* Get test ops passing with some skips

* Fix lint

* Much simpler shift

* Run more tests

* Timestamp queries are not supported in CI, so skip search tests

* All fancy indexing passing

* r is ctx

* Run more dtype tests by using is_dtype_supported

* Cleanup ulong shift rendering

* UPat -> Pat, UOps -> Ops

* Pat -> UPat

* Refactor render_ushift if-else

* Pattern to avoid ulong mul

* Remove vals_dtype

* is_nan trick + rewrite, test_isnan passing

* Rewrite a * select(1, nan, gate) -> select(a, nan, gate) (this and the is_nan trick are sketched after this entry)

* No arg, just op

* Support char, uchar, short, ushort

* Run test_index_mnist now that we have uint8

* Fix pylint

* Save 3 lines by using base Compiler

* No more long emulation

* Remove fixup_binops

* No more external_local_bufx WGSL-specific cstyle modification, use base extra_pm

* Simpler, faster copyin/out

* Skip some new tests that use long

* Fix typo

* copyout touchup

* Save lines by using render_cast

* WebGL is not supported in core, delete it from is_dtype_supported

* More narrow test skips for some unary tests

* TernaryOps, UnaryOps -> Ops

* TinyGrad supports WebGPU

* StableDiffusion demo: f16tof32 gpu is a lib, update UI

* Packed load/store, no more scale_size, no core tinygrad changes

* Rename copyin, copyout

* Device -> dev

* Fix lint

* Pattern matcher rule for packed load/store

* Refactor

* Shorter packed load/store

* this should fix lint

* Fix mypy

* SD compile script working

* New SD webgpu UI

* New default prompt

* New SD weights

* Fix title when webgpu not available

* Run symbolic tests, simplify is_nan, use round_up

* Show step time on UI

* Bump minimum wgpu version to v0.19

* Fix latent

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-11-26 12:26:40 +08:00
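
Two of the bullets above compress a lot (see the parenthetical on the select rewrite). Below is a minimal Python sketch of both ideas, assuming they work the way the messages describe; the PR's actual implementation is WGSL emitted through pattern-matcher rules, not this code:

```
# Hedged sketch of the "is_nan trick" and the select rewrite named in the
# commit above; illustrative Python, not the WGSL the PR actually emits.

def is_nan(x: float) -> bool:
  # NaN is the only value that compares unequal to itself, so no isnan
  # builtin is needed in the target language.
  return x != x

def select(f, t, gate):
  # WGSL-style select: returns t when gate is true, else f.
  return t if gate else f

a, nan = 3.0, float("nan")
for gate in (False, True):
  lhs = a * select(1.0, nan, gate)  # before the rewrite: a * select(1, nan, gate)
  rhs = select(a, nan, gate)        # after the rewrite:  select(a, nan, gate)
  assert is_nan(lhs) == is_nan(rhs) and (gate or lhs == rhs)
```
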
chenyu
631dc98b52 validate llama quantize output (#7901)
the Mac benchmark already runs quantize; this adds output validation
2024-11-25 16:46:23 -05:00
George Hotz
5d28a202b5 make tinychat local (#7871) 2024-11-24 14:45:48 +08:00
chenyu
22d5def113 download llama3 70B (#7868)
use "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF".
```
PYTHONPATH=. JITBEAM=2 python3 examples/llama3.py --download_model --size 70B --quantize int8 --benchmark
```

on an M4 Max, it takes 40 sec to load the model, and:
```
enqueue in 165.15 ms
total 328.54 ms, 3.04 tok/s, 247.46 GB/s, param 221.20 GB/s

enqueue in   5.31 ms
total 168.48 ms, 5.94 tok/s, 482.54 GB/s, param 431.34 GB/s

enqueue in   5.32 ms
total 168.77 ms, 5.93 tok/s, 481.71 GB/s, param 430.60 GB/s

enqueue in   5.69 ms
total 169.51 ms, 5.90 tok/s, 479.61 GB/s, param 428.72 GB/s

enqueue in   5.41 ms
total 168.60 ms, 5.93 tok/s, 482.20 GB/s, param 431.04 GB/s

enqueue in   5.18 ms
total 168.98 ms, 5.92 tok/s, 481.12 GB/s, param 430.08 GB/s

enqueue in   5.43 ms
total 168.82 ms, 5.92 tok/s, 481.59 GB/s, param 430.49 GB/s

enqueue in   5.27 ms
total 168.94 ms, 5.92 tok/s, 481.23 GB/s, param 430.17 GB/s
```
2024-11-23 12:18:31 -05:00
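
The steady-state numbers above are internally consistent; a rough back-of-the-envelope check (my arithmetic from the log, not part of the benchmark output):

```
# Assumed relationship: one token is generated per step.
total_ms = 168.48
print(1000 / total_ms)           # ~5.94 tok/s, matching the log
print(431.34 * total_ms / 1000)  # ~72.7 GB of weights read per token,
                                 # plausible for an int8-quantized ~70B model
```
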
George Hotz
144e9f00df viz is local, new test, and new quantize [pr] (#7859)
* viz is local, new test, and new quantize [pr]

* fix mime types

* remove font

* after index
2024-11-23 14:27:10 +08:00
qazal
9828277c03 view doesn't have buffer, fix the tests [pr] (#7841)
* view doesn't have buffer, fix the tests [pr]

* need assigns
2024-11-22 20:41:55 +08:00
chenyu
69e382216d fix wino conv output dtype for half inputs (#7829) 2024-11-21 12:13:54 -05:00
George Hotz
fbb4099b3c add test for compile3 [pr] (#7783)
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-19 19:26:51 +08:00
chenyu
73ea913050 really not using numpy in gpt2 example (#7779) 2024-11-18 23:21:16 -05:00
chenyu
e6debda5c4 remove numpy from gpt2 and llama examples (#7778) 2024-11-18 22:48:17 -05:00
ignaciosica
597a239e28 Remove UnaryOps, BinaryOps, TernaryOps, MetaOps [pr] (#7725)
* remove unaryops

* remove ternaryops

* remove metaops

* hotfix

* remove binaryops

* hotfix: test_pattern_matcher

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-16 20:56:56 +08:00
geohotstan
f8056a74d6 combine pad2d with pad (#7677)
* I have pad2d, I have pad, uuh~, pad2dpad~

* fix some small things

* strategically placed cast hack

* fix more

* fix more more

* tests

* periods
2024-11-14 17:56:02 +08:00
chenyu
4c5f7ddf1f flux set model path in args (#7660)
in addition to the default download through fetch, add an arg to pass the model path directly
2024-11-12 22:11:40 -05:00
Harald Schäfer
e7cbc29f48 openpilot benchmark: add cast from numpy to benchmark (#7593)
* openpilot benchmark: add cast from numpy to benchmark

* whitespace

* comment
2024-11-08 19:31:00 +08:00
Anthony DeMattos
953ef1b57e tinychat ui +/- 20 lines (#7471)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-11-06 14:23:55 +08:00
George Hotz
c8bf09b7d4 s/UOps/Ops (#7500)
* s/UOps/Ops [pr]

* fix
2024-11-03 11:26:10 +08:00
George Hotz
72a9ac27e9 support image dtype in cloud [pr] (#7482)
* support image dtype in cloud [pr]

* remove outdated osx hack

* unused imports
2024-11-02 23:54:27 +08:00
Tobias Fischer
7c9a1d69f9 sdxl gen fix (#7459) 2024-11-01 13:57:01 -04:00
gonutz
e7cbc6dc23 Fix ValueError in Yolo 8 example (#7387)
Calling

    python3 examples/yolov8.py ./test/models/efficientnet/Chicken.jpg

used to result in this error

    ValueError: Calling nonzero on 0d arrays is not allowed.

Using np.atleast_1d makes sure we avoid a zero-dimension array.

Co-authored-by: gonutz <gonutz@fake.mail>
2024-10-30 10:18:39 +08:00
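
The failure is easy to reproduce in isolation. A minimal standalone NumPy sketch of the bug and the np.atleast_1d workaround (not the yolov8.py code itself):

```
import numpy as np

x = np.array(0.7)  # a 0-d array, as an indexing/reduction step can produce
try:
  x.nonzero()      # NumPy >= 1.17: ValueError: Calling nonzero on 0d arrays is not allowed
except ValueError as e:
  print(e)
print(np.atleast_1d(x).nonzero())  # OK: (array([0]),) -- the array now has one dimension
```
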
George Hotz
3989bd2682 idiv + reciprocal [pr] (#7354)
* idiv + reciprocal

* remove upcast from div

* fix docs
2024-10-29 15:54:19 +08:00
chenyu
4a03e00aa1 fix llama3 download_model assert (#7320)
false positive if download_model and model are not provided
2024-10-27 11:20:24 -04:00
eliotgolding
e920f1d663 Llama 3.2 1B load from GGUF (#7295)
* gguf 1b-instruct

* not needed
2024-10-27 09:29:02 +08:00
George Hotz
dc3148c677 hotfix: minor speed increase + stable diffusion relax 2024-10-25 16:27:21 +08:00
leopf
87877d7a91 GGUF cleanup (#7192)
* cleanup

* remove vocab size hard code
2024-10-21 10:44:54 -04:00
leopf
b6d9b276bb GGUF support (#7046)
* basic loader, untested

* testing

* remove utils import in test

* q8_0 (this format's dequant is sketched after this entry)

* q4_1

* end to end testing

* minor cleanup

* fix casting

* moved to state

* move tests

* move dequant to fn

* fix lint elif

* remove gguf from extra

* fix dict union

* q6_k simpler

* naming and spacing

* gpt2-gguf example

* cleanup

* move gguf example

* minor cleanup

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-10-21 16:15:34 +08:00
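
For context on the format names in the bullets: in GGUF, q8_0 stores weights in blocks of 32 int8 values sharing one float16 scale. A minimal NumPy sketch of q8_0 dequantization, based on the general GGUF block layout rather than this PR's actual loader:

```
import numpy as np

def dequant_q8_0(raw: bytes) -> np.ndarray:
  # q8_0 block: a 2-byte float16 scale, then 32 int8 quants; value = scale * quant
  blocks = np.frombuffer(raw, dtype=np.uint8).reshape(-1, 2 + 32)
  scales = blocks[:, :2].copy().view(np.float16).astype(np.float32)  # (n_blocks, 1)
  quants = blocks[:, 2:].copy().view(np.int8).astype(np.float32)     # (n_blocks, 32)
  return (scales * quants).reshape(-1)

# one block: scale 0.5, quants 0..31
block = np.float16(0.5).tobytes() + np.arange(32, dtype=np.int8).tobytes()
print(dequant_q8_0(block)[:4])  # [0.  0.5 1.  1.5]
```
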
qazal
30989fb459 changes from the big graph branch [pr] (#7160)
* metaops srcs

* delete multioutput ctx var

* always has metadata

* shorter path for realized

* this still needs inputs

This reverts commit a59cbb2886.
2024-10-19 16:22:37 +03:00
Francis Lata
90eff347e2 tinytqdm write support (#6359)
* add write support (a generic sketch of the technique follows this entry)

* add test

* update test case to compare write outputs

* assert final write output

* flush when using write

* update write logic

* Revert "update write logic"

This reverts commit 5e0e611b46.

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-10-16 14:51:41 -04:00
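
A bar-aware write prints a log line without corrupting the in-place progress bar. A generic sketch of the usual technique, not tinytqdm's actual implementation: erase the bar's line, print the message, flush, then redraw the bar.

```
import sys, time

class MiniBar:
  # hypothetical stand-in for a tqdm-style bar, for illustration only
  def __init__(self): self.line = ""
  def update(self, line: str):
    self.line = line
    sys.stderr.write(f"\r{line}")
    sys.stderr.flush()
  def write(self, msg: str):
    sys.stderr.write("\r" + " " * len(self.line) + "\r")  # erase the bar
    sys.stderr.write(msg + "\n")                          # message on its own line
    sys.stderr.flush()                                    # land it before the redraw
    sys.stderr.write(f"\r{self.line}")                    # redraw the bar
    sys.stderr.flush()

bar = MiniBar()
for i in range(3):
  bar.update(f"step {i + 1}/3")
  bar.write(f"finished step {i + 1}")
  time.sleep(0.1)
```
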
George Hotz
3169cb386d remove graph [pr] (#7085) 2024-10-16 11:40:07 +08:00
George Hotz
26df50cf43 move memory_planner to memory.py [pr] (#7079) 2024-10-16 10:04:35 +08:00
chenyu
ed1ed9e4ff bert use BS=72 (#7015)
memory 131 -> 138
green TFLOPS 201 -> 209
red TFLOPS 160 -> 169
2024-10-12 09:41:56 -04:00
George Hotz
a71bb09ec3 remove symbolic file [pr] (#7012) 2024-10-12 18:44:44 +08:00
George Hotz
5c9f76e274 hotfix: openpilot compile3 compare to i==1 2024-10-12 09:44:24 +08:00
chenyu
36056e0760 update mlperf systems and copy 4.1 to 5.0 (#7004) 2024-10-11 16:20:34 -04:00
chenyu
0e42662f2a log seed at the right place for bert (#7000) 2024-10-11 10:39:40 -04:00
nimlgen
5496a36536 update red mlperf bert readme (#6969) 2024-10-11 13:08:06 +03:00
Friedrich Carl Eichenroth
859d6d0407 Fix mypy examples/beautiful_*.py (#6978)
* fix mypy examples/beautiful_*.py

* backwards

* add test

* Revert "add test"

This reverts commit 4d88845ba3.

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-10-10 11:34:29 -04:00
Kinvert
960c495755 added beautiful fashion mnist and example (#6961)
* added beautiful fashion mnist and example

* fixing whitespace

* refactor Fashion MNIST to fewer lines

* fix newline to reduce diff

* Update beautiful_mnist.py

* Update beautiful_mnist.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-10-10 12:01:07 +08:00
chenyu
b5546912e2 10% more TRAIN_STEPS for bert (#6971)
got two very close runs; adding more steps as a buffer
2024-10-09 19:21:43 -04:00
chenyu
35cf48659b limit beam param for bert on green (#6966)
seems to mitigate the crash
2024-10-09 11:48:18 -04:00
chenyu
1ff2c98f8a fix logfile name for bert red (#6952) 2024-10-08 05:37:52 -04:00
chenyu
a78c96273a update bert epoch logging (#6940)
* update bert epoch logging

epoch for bert is simply the number of examples seen (which is used for the RCP check)

* update total steps too

* more changes
2024-10-08 00:34:06 -04:00
chenyu
102dfe5510 back to 2**10 for bert loss scaler (#6934)
got 2 NaNs with this; reverting back to 2**10
2024-10-07 10:17:21 -04:00
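
For context, a static loss scaler trades underflow against overflow in half-precision gradients: too small and grads round to zero, too large and they overflow to inf/NaN, which is what the larger scale produced here. A minimal, hypothetical sketch of the mechanics, not the bert training script's code:

```
from tinygrad import Tensor

LOSS_SCALER = 2**10  # the value this commit reverts to after NaNs at a larger scale

w = Tensor([1.0, 2.0], requires_grad=True)
loss = (w * w).sum()
(loss * LOSS_SCALER).backward()  # scale up so small half-precision grads don't underflow
grad = w.grad / LOSS_SCALER      # unscale before the optimizer step
print(grad.numpy())              # [2. 4.], same as an unscaled backward
```
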