Ahmed Harmouche
ff9a89f714
Proper dtypes for input/output of exported WebGPU model ( #8053 )
...
* Respect input/output dtypes in exported WebGPU model
* Add some comments about skipped dtypes
2024-12-05 10:38:05 +01:00
qazal
435a51e10c
reduce folding simple tests [pr] ( #8040 )
...
* reduce folding simple tests [pr]
* test for view and realized src pattern
* realize / buffer behavior
2024-12-05 12:22:45 +08:00
George Hotz
20878be2af
lower test_gemv_4096_16384 expectations
2024-12-05 12:08:26 +08:00
George Hotz
83aecbdc70
do gpuocelot copy manually [pr] ( #8050 )
2024-12-05 11:51:20 +08:00
George Hotz
4a208bfb28
bump download cache version
2024-12-05 11:42:34 +08:00
George Hotz
df18e7cc37
accept filename decorator [pr] ( #8049 )
...
* accept filename decorator [pr]
* add test for safe_load
* bring old tar tests back
2024-12-05 11:40:59 +08:00
Francis Lata
c3187087f7
QwQ-32B-Preview support ( #7962 )
...
* load weights with some debugging
* start running a prompt
* cleanup
* optionally permute layers and cleanup
* add validation for simple prompt
* small cleanup
* minor cleanup with formatting download links
* add a longer prompt
* add timing option
* some typings
* remove unused arg
* reset GlobalCounters
* minor cleanups
2024-12-04 21:46:37 -05:00
chenyu
b3220ca7b1
test cases of always True/False lt ( #8048 )
...
* test cases of always True/False lt
* one more
2024-12-04 20:38:40 -05:00
chenyu
8bb806888b
hook_overflow -> safe_exp2 [pr] ( #8047 )
...
that's the only use case, so no need for indirection
2024-12-04 19:05:38 -05:00
chenyu
99abdc6d39
minor push_swizzle_down_through_elementwise cleanup [pr] ( #8046 )
...
walrus, and if x are the same, prod(x) must be the same
2024-12-04 17:22:37 -05:00
chenyu
5933ec8dc3
use argfix in smax/smin and remove if [pr] ( #8045 )
2024-12-04 17:06:13 -05:00
chenyu
4e518334b8
minor get_grouped_dims cleanup [pr] ( #8044 )
2024-12-04 16:22:51 -05:00
geohotstan
5ce8090d42
simple onnx_ops cleanups ( #8003 )
...
* simple clean ups first
* more work
* kinda have adam
* ooo momentum worked nicely
* almost there
* wow.. is the onnx test wrong
* nicer optim stuff
* just skip that test
* small comment changes
* use naming convention from other parts of codebase
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-04 15:33:03 -05:00
Sieds Lykles
70db1bab5c
Fold nested div with const ( #8010 )
...
* Rebase nested div and with const
* Update the ordering
* return None on vectors
Fixes cpu test
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-04 14:59:09 -05:00
chenyu
0693158d28
lower v_theoretical gemv on red ( #8042 )
...
tiny7 is still slower https://github.com/tinygrad/tinygrad/actions/runs/12166149038/job/33931736130#step:8:209
2024-12-04 13:59:40 -05:00
chenyu
5c2b1089b2
vectorized input in div_and_mod_folding returns None [pr] ( #8041 )
2024-12-04 13:36:41 -05:00
qazal
ff6def9ffb
simple contiguous_while_contiguous prereqs [pr] ( #8038 )
...
* simple contiguous_while_contiguous prereqs [pr]
* early realize
* fine if it's folding a non-contig buffer
2024-12-04 23:00:28 +08:00
Ahmed Harmouche
c9e7701417
Fast YoloV8 on WebGPU ( #8036 )
...
* Fast yolov8 with downscaled input
* Faster + FPS meter
* Add loader while model is downloading/compiling
* Title touchup
2024-12-04 15:23:09 +01:00
qazal
b116e1511d
make device on uop optional [pr] ( #8034 )
2024-12-04 20:18:00 +08:00
Ahmed Harmouche
13eedd373b
Run WebGPU tests on ubuntu ( #8033 )
2024-12-04 12:42:04 +01:00
leopf
fb89971e73
use BufferedReader ( #8032 )
2024-12-04 19:08:54 +08:00
George Hotz
08657cb7b0
hotfix: bump expectations in speed_v_theoretical
2024-12-04 19:00:33 +08:00
George Hotz
ea65c79ba2
hotfix: don't spam BEAM debug in speed_v_theoretical
2024-12-04 18:47:16 +08:00
George Hotz
09b00b1b04
hotfix: use kernel timings instead of python timings in speed_v_theoretical
2024-12-04 18:36:17 +08:00
George Hotz
8f65c1fafb
simpler block reorder function [pr] ( #8031 )
...
* simpler block reorder function [pr]
* simpler
* block_reorder in substitute, so wasteful otherwise
* extend and count
* leave push logic for same order
* sort new ctx
* less loop
* Revert "less loop"
This reverts commit 30249d097a.
2024-12-04 17:57:35 +08:00
leopf
f0401e14e8
tar_extract with Tensors ( #7853 )
...
* initial
* USTAR, PAX and GNU support + testing
* from_bytes byteorder
* use TarInfo.frombuf
* tensor only usage
* remove contextlib.suppress
* shorter ow,pax
* more tests
* testing length + move tests
* cleanup
* new approach: RawTensorIO
* fix fetch
* enable read test
* cleanup and ignore fix
* fix for python < 3.12
* make it RawIO
* functions
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-04 17:03:19 +08:00
George Hotz
1e06aefde7
bunch up ops for lines [pr] ( #8030 )
2024-12-04 17:03:01 +08:00
uuuvn
e9c5b23ba1
Use MTLCompiler directly (v2) ( #7920 )
...
* Use MTLCompiler directly (v2)
* to_block_literal and REQUEST_TYPE_COMPILE
* Rewrite command encoding
* Revert to_block_literal
* Maybe that's more readable to some people?
* Typo and comment about stdlib caching
* Update ops_metal.py
* Update ops_metal.py
* Update ops_metal.py
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-04 16:36:48 +08:00
George Hotz
bb98bae751
local reordering in block ( #8029 )
...
* local reordering in block
* load (and parents) is highest priority
* minor loads in order
* comments
* explicit depth
* simpler
* matters less, but store early too
2024-12-04 15:11:29 +08:00
George Hotz
4cb630ac1c
hotfix: early INDEX
2024-12-04 14:47:47 +08:00
George Hotz
fdd1e56827
clean up rewrite logic + merge siblings ( #8026 )
...
* clean up rewrite logic [pr]
* simpler
* merge sibling blocks
* no PR
2024-12-04 13:26:16 +08:00
chenyu
004b2ecff5
remove lt/gt/le/ge from SimpleMathTrait [pr] ( #8027 )
...
just use the dunder methods
2024-12-04 00:24:33 -05:00
chenyu
39e0fc05f5
update function to not use gt/lt [pr] ( #8025 )
...
pr does not test this, but it's the same
2024-12-03 22:39:06 -05:00
chenyu
cfd4d19250
replace .lt in rewrite rules with < [pr] ( #8024 )
2024-12-03 21:34:47 -05:00
chenyu
0c060fa040
update uop and tests to not use lt/gt/le/ge [pr] ( #8023 )
...
just use dunder methods, eventually remove those from ops
2024-12-03 21:02:52 -05:00
chenyu
03bf9c2985
unused mul add lt rule [pr] ( #8022 )
2024-12-03 19:38:34 -05:00
nimlgen
7fda464b08
hcq c-like args state ( #8020 )
...
* hcq c-like args state
* ugh
* Dfix
* rename
* i
2024-12-03 23:53:35 +03:00
qazal
099364ed32
lazy srcs shape mismatch assert + fix ASSIGN [pr] ( #8014 )
...
* lazy srcs shape mismatch assert [pr]
* duplicate assert
* base it later
* keep the assert
2024-12-03 15:40:37 -05:00
ignaciosica
f14dd1488e
reduce on wmma ( #8016 )
2024-12-03 12:46:28 -05:00
chenyu
dacb1ff38a
minor nn cleanups ( #8018 )
...
use more .numel and .ndim
2024-12-03 12:34:52 -05:00
chenyu
35c30f76f2
minor tweak in ptx asm_for_op [pr] ( #8017 )
...
always compare with dtypes instead of name string
2024-12-03 12:34:22 -05:00
chenyu
a5af4e5596
clean up wgsl_matcher [pr] ( #8015 )
...
use more UPat syntactic sugar and remove unneeded rules
2024-12-03 11:55:03 -05:00
Ahmed Harmouche
db330a3110
Remove WebGL ( #8012 )
2024-12-03 16:02:53 +01:00
chenyu
ef3752625b
add test case of realize_size with 0 in shape ( #8011 )
2024-12-03 09:19:50 -05:00
Ahmed Harmouche
8818046940
YoloV8 on WebGPU ( #8007 )
...
Port YoloV8 to WebGPU
2024-12-03 15:10:41 +01:00
George Hotz
09eac42fd6
cache indexed uops in st [pr] ( #8008 )
...
* cache indexed uops in st [pr]
* remove arg from range
2024-12-03 21:27:07 +08:00
Sieds Lykles
e44183647f
Improved div folding ( #7996 )
...
* First version of div_mod folding together
* Working version with old div folding behaviour
* Test is fixed
* Fix linting
* Happy mypy
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-03 08:11:25 -05:00
George Hotz
32675a8a77
sacrifice ClangGraph on the altar of lines [pr] ( #8009 )
2024-12-03 21:11:15 +08:00
qazal
5441127417
assert const folding return shape matches [pr] ( #8006 )
2024-12-03 19:31:06 +08:00
George Hotz
dddfb494d7
don't mutate the uop/lazybuffer, just the Buffer [pr] ( #8000 )
...
* don't mutate the uop/lazybuffer, just the Buffer [pr]
* fix red test
* try different fix
* that
* that's the right fix
* test for fixed behavior
* bump to 3.12
2024-12-03 19:03:51 +08:00