Commit Graph

10633 Commits

Author SHA1 Message Date
Ahmed Harmouche
ff9a89f714 Proper dtypes for input/output of exported WebGPU model (#8053)
* Respect input/output dtypes in exported WebGPU model

* Add some comments about skipped dtypes
2024-12-05 10:38:05 +01:00
qazal
435a51e10c reduce folding simple tests [pr] (#8040)
* reduce folding simple tests [pr]

* test for view and realized src pattern

* realize / buffer behavior
2024-12-05 12:22:45 +08:00
George Hotz
20878be2af lower test_gemv_4096_16384 expectations 2024-12-05 12:08:26 +08:00
George Hotz
83aecbdc70 do gpuocelot copy manually [pr] (#8050) 2024-12-05 11:51:20 +08:00
George Hotz
4a208bfb28 bump download cache version 2024-12-05 11:42:34 +08:00
George Hotz
df18e7cc37 accept filename decorator [pr] (#8049)
* accept filename decorator [pr]

* add test for safe_load

* bring old tar tests back
2024-12-05 11:40:59 +08:00
Francis Lata
c3187087f7 QwQ-32B-Preview support (#7962)
* load weights with some debugging

* start running a prompt

* cleanup

* optionally permute layers and cleanup

* add validation for simple prompt

* small cleanup

* minor cleanup with formatting download links

* add a longer prompt

* add timing option

* some typings

* remove unused arg

* reset GlobalCounters

* minor cleanups
2024-12-04 21:46:37 -05:00
chenyu
b3220ca7b1 test cases of always True/False lt (#8048)
* test cases of always True/False lt

* one more
2024-12-04 20:38:40 -05:00
chenyu
8bb806888b hook_overflow -> safe_exp2 [pr] (#8047)
that's the only use case, so no need for indirection
2024-12-04 19:05:38 -05:00
chenyu
99abdc6d39 minor push_swizzle_down_through_elementwise cleanup [pr] (#8046)
walrus, and if x are the same, prod(x) must be the same
2024-12-04 17:22:37 -05:00
chenyu
5933ec8dc3 use argfix in smax/smin and remove if [pr] (#8045) 2024-12-04 17:06:13 -05:00
chenyu
4e518334b8 minor get_grouped_dims cleanup [pr] (#8044) 2024-12-04 16:22:51 -05:00
geohotstan
5ce8090d42 simple onnx_ops cleanups (#8003)
* simple clean ups first

* more work

* kinda have adam

* ooo momentum worked nicely

* almost there

* wow.. is the onnx test wrong

* nicer optim stuff

* just skip that test

* small comment changes

* use naming convention from other parts of codebase

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-04 15:33:03 -05:00
Sieds Lykles
70db1bab5c Fold nested div with const (#8010)
* Rebase nested div and with const

* Update the ordering

* return None on vectors

Fixes cpu test

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-04 14:59:09 -05:00
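Note on the commit above: the idea behind folding a nested division by constants can be sketched with plain Python integers. This is only an illustration of the arithmetic identity (x // a) // b == x // (a * b) for positive constants a and b under floor division. It is not the rewrite rule from the commit, and the function name `fold_nested_div` is hypothetical.

```python
def fold_nested_div(x: int, a: int, b: int) -> int:
    """Compute (x // a) // b with a single division.

    Assumes a > 0 and b > 0 (hypothetical helper, not tinygrad code).
    """
    if a <= 0 or b <= 0:
        raise ValueError("this sketch only covers positive constant divisors")
    return x // (a * b)


# Brute-force check of the identity over a small range, including negative x.
for x in range(-50, 51):
    for a in range(1, 7):
        for b in range(1, 7):
            assert (x // a) // b == fold_nested_div(x, a, b)
```

The check covers negative numerators because floor division rounds toward negative infinity, which is the case where such identities are easiest to get wrong.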
chenyu
0693158d28 lower v_theoretical gemv on red (#8042)
tiny7 is still slower https://github.com/tinygrad/tinygrad/actions/runs/12166149038/job/33931736130#step:8:209
2024-12-04 13:59:40 -05:00
chenyu
5c2b1089b2 vectorized input in div_and_mod_folding returns None [pr] (#8041) 2024-12-04 13:36:41 -05:00
qazal
ff6def9ffb simple contiguous_while_contiguous prereqs [pr] (#8038)
* simple contiguous_while_contiguous prereqs [pr]

* early realize

* fine if it's folding a non-contig buffer
2024-12-04 23:00:28 +08:00
Ahmed Harmouche
c9e7701417 Fast YoloV8 on WebGPU (#8036)
* Fast yolov8 with downscaled input

* Faster + FPS meter

* Add loader while model is downloading/compiling

* Title touchup
2024-12-04 15:23:09 +01:00
qazal
b116e1511d make device on uop optional [pr] (#8034) 2024-12-04 20:18:00 +08:00
Ahmed Harmouche
13eedd373b Run WebGPU tests on ubuntu (#8033) 2024-12-04 12:42:04 +01:00
leopf
fb89971e73 use BufferedReader (#8032) 2024-12-04 19:08:54 +08:00
George Hotz
08657cb7b0 hotfix: bump expectations in speed_v_theoretical 2024-12-04 19:00:33 +08:00
George Hotz
ea65c79ba2 hotfix: don't spam BEAM debug in speed_v_theoretical 2024-12-04 18:47:16 +08:00
George Hotz
09b00b1b04 hotfix: use kernel timings instead of python timings in speed_v_theoretical 2024-12-04 18:36:17 +08:00
George Hotz
8f65c1fafb simpler block reorder function [pr] (#8031)
* simpler block reorder function [pr]

* simpler

* block_reorder in substitute, so wasteful otherwise

* extend and count

* leave push logic for same order

* sort new ctx

* less loop

* Revert "less loop"

This reverts commit 30249d097a.
2024-12-04 17:57:35 +08:00
leopf
f0401e14e8 tar_extract with Tensors (#7853)
* initial

* USTAR, PAX and GNU support + testing

* from_bytes byteorder

* use TarInfo.frombuf

* tensor only usage

* remove contextlib.suppress

* shorter ow,pax

* more tests

* testing length + move tests

* cleanup

* new approach: RawTensorIO

* fix fetch

* enable read test

* cleanup and ignore fix

* fix for python < 3.12

* make it RawIO

* functions

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-04 17:03:19 +08:00
George Hotz
1e06aefde7 bunch up ops for lines [pr] (#8030) 2024-12-04 17:03:01 +08:00
uuuvn
e9c5b23ba1 Use MTLCompiler directly (v2) (#7920)
* Use MTLCompiler directly (v2)

* to_block_literal and REQUEST_TYPE_COMPILE

* Rewrite command encoding

* Revert to_block_literal

* Maybe that's more readable to some people?

* Typo and comment about stdlib caching

* Update ops_metal.py

* Update ops_metal.py

* Update ops_metal.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-04 16:36:48 +08:00
George Hotz
bb98bae751 local reordering in block (#8029)
* local reordering in block

* load (and parents) is highest priority

* minor loads in order

* comments

* explicit depth

* simpler

* matters less, but store early too
2024-12-04 15:11:29 +08:00
George Hotz
4cb630ac1c hotfix: early INDEX 2024-12-04 14:47:47 +08:00
George Hotz
fdd1e56827 clean up rewrite logic + merge siblings (#8026)
* clean up rewrite logic [pr]

* simpler

* merge sibling blocks

* no PR
2024-12-04 13:26:16 +08:00
chenyu
004b2ecff5 remove lt/gt/le/ge from SimpleMathTrait [pr] (#8027)
just use the dunder methods
2024-12-04 00:24:33 -05:00
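Note on the commit above: "just use the dunder methods" can be illustrated with a minimal expression-node sketch in which comparisons are built through Python's comparison operators rather than through named helpers such as `.lt()`. The `Node` class below is hypothetical and is not the `SimpleMathTrait` from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical symbolic node: an op name plus its arguments."""
    op: str
    args: tuple = ()

    # Comparisons go through the dunder methods, so callers write `x < 3`
    # instead of calling a named method like `x.lt(3)`.
    def __lt__(self, other): return Node("lt", (self, other))
    def __gt__(self, other): return Node("gt", (self, other))
    def __le__(self, other): return Node("le", (self, other))
    def __ge__(self, other): return Node("ge", (self, other))


x = Node("var", ("x",))
cond = x < 3  # builds a symbolic comparison instead of returning a bool
assert cond == Node("lt", (x, 3))
```

Because Python falls back to the reflected operator, `3 > x` also reaches `Node.__lt__` in this sketch, so no separate named methods are needed.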
chenyu
39e0fc05f5 update function to not use gt/lt [pr] (#8025)
pr does not test this, but it's the same
2024-12-03 22:39:06 -05:00
chenyu
cfd4d19250 replace .lt in rewrite rules with < [pr] (#8024) 2024-12-03 21:34:47 -05:00
chenyu
0c060fa040 update uop and tests to not use lt/gt/le/ge [pr] (#8023)
just use dunder methods, eventually remove those from ops
2024-12-03 21:02:52 -05:00
chenyu
03bf9c2985 unused mul add lt rule [pr] (#8022) 2024-12-03 19:38:34 -05:00
nimlgen
7fda464b08 hcq c-like args state (#8020)
* hcq c-like args state

* ugh

* Dfix

* rename

* i
2024-12-03 23:53:35 +03:00
qazal
099364ed32 lazy srcs shape mismatch assert + fix ASSIGN [pr] (#8014)
* lazy srcs shape mismatch assert [pr]

* duplicate assert

* base it later

* keep the assert
2024-12-03 15:40:37 -05:00
ignaciosica
f14dd1488e reduce on wmma (#8016) 2024-12-03 12:46:28 -05:00
chenyu
dacb1ff38a minor nn cleanups (#8018)
use more .numel and .ndim
2024-12-03 12:34:52 -05:00
chenyu
35c30f76f2 minor tweak in ptx asm_for_op [pr] (#8017)
always compare with dtypes instead of name string
2024-12-03 12:34:22 -05:00
chenyu
a5af4e5596 clean up wgsl_matcher [pr] (#8015)
use more UPat syntactic sugar and remove unneeded rules
2024-12-03 11:55:03 -05:00
Ahmed Harmouche
db330a3110 Remove WebGL (#8012) 2024-12-03 16:02:53 +01:00
chenyu
ef3752625b add test case of realize_size with 0 in shape (#8011) 2024-12-03 09:19:50 -05:00
Ahmed Harmouche
8818046940 YoloV8 on WebGPU (#8007)
Port YoloV8 to WebGPU
2024-12-03 15:10:41 +01:00
George Hotz
09eac42fd6 cache indexed uops in st [pr] (#8008)
* cache indexed uops in st [pr]

* remove arg from range
2024-12-03 21:27:07 +08:00
Sieds Lykles
e44183647f Improved div folding (#7996)
* First version of div_mod folding together

* Working version with old div folding behaviour

* Test is fixed

* Fix linting

* Happy mypy

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-03 08:11:25 -05:00
George Hotz
32675a8a77 sacrifice ClangGraph on the altar of lines [pr] (#8009) 2024-12-03 21:11:15 +08:00
qazal
5441127417 assert const folding return shape matches [pr] (#8006) 2024-12-03 19:31:06 +08:00
George Hotz
dddfb494d7 don't mutate the uop/lazybuffer, just the Buffer [pr] (#8000)
* don't mutate the uop/lazybuffer, just the Buffer [pr]

* fix red test

* try different fix

* that

* that's the right fix

* test for fixed behavior

* bump to 3.12
2024-12-03 19:03:51 +08:00