George Hotz
e37bff6c19
fix bug in jit prune with copy [pr] ( #8073 )
2024-12-06 18:38:23 +08:00
George Hotz
aae8557ada
test copy inside jit [pr] ( #8072 )
2024-12-06 17:51:50 +08:00
George Hotz
e2fe7f0d2f
hotfix: actually fix pylint, it's a python 3.10 issue
2024-12-06 13:53:46 +08:00
George Hotz
b28d660172
update self_tokenize, fix pylint maybe
2024-12-06 13:49:41 +08:00
George Hotz
344fd4845c
example: self_tokenize. someday tinygrad will be recursively self improving
2024-12-06 13:35:02 +08:00
JaSpa99
3c5d5f9414
mypy==1.13.0 ( #7990 )
...
* explicit instantiation and narrowing asserts
* explicit cast
* bump
* one line assert
* handle case for no copy_queue_t
* Revert "handle case for no copy_queue_t"
This reverts commit 38347806ca.
* more readable control flow
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-06 12:09:14 +08:00
leopf
65b6696f3b
refactor safe_load ( #8035 )
...
* refactor safe_load
* cleanup
2024-12-06 12:08:21 +08:00
chenyu
e7d5fe4a32
improve idiv _min_max ( #8066 )
...
for the cases where we don't know the exact bounds, we might still know the sign. with this, we can remove some resolve for symbolic shapetracker
2024-12-05 23:02:16 -05:00
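The sign argument in the commit above can be illustrated with a minimal sketch. This is not the tinygrad implementation: `trunc_div`, `idiv_sign_bounds`, and the 32-bit `int_min`/`int_max` defaults are hypothetical names and values chosen for illustration, and truncating division is assumed (the sign conclusions below also hold for floor division).

```python
def trunc_div(a: int, b: int) -> int:
    """Integer division that truncates toward zero (assumed semantics)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

def idiv_sign_bounds(x_min, x_max, y_min, y_max, int_min=-2**31, int_max=2**31 - 1):
    """Sign-only bounds for x // y given interval bounds on x and y.

    Hypothetical helper: when exact bounds are not computed, the signs of
    the operands still pin down the sign of the quotient.
    """
    # A divisor range that contains zero tells us nothing.
    if y_min <= 0 <= y_max:
        return int_min, int_max
    x_nonneg, x_nonpos = x_min >= 0, x_max <= 0
    y_pos = y_min > 0  # otherwise y_max < 0, so y is strictly negative
    # Same signs: the quotient is non-negative.
    if (x_nonneg and y_pos) or (x_nonpos and not y_pos):
        return 0, int_max
    # Opposite signs: the quotient is non-positive.
    if (x_nonneg and not y_pos) or (x_nonpos and y_pos):
        return int_min, 0
    # x straddles zero: the sign is unknown.
    return int_min, int_max
```

A caller that only needs to decide whether a quotient is non-negative can use these bounds without knowing the exact range.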
chenyu
13b954f22c
unify expand conditions [pr] ( #8065 )
...
same condition (check if old == new or old == 1) in tensor and view. also renamed _pad_left to _align_left because it's not really a pad
2024-12-05 21:40:14 -05:00
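The expand condition named in the commit above (old == new or old == 1) can be sketched as follows. `can_expand` and `align_left` are hypothetical stand-ins written for illustration, not the functions in tensor or view; `align_left` assumes that aligning means prepending size-1 dimensions so all shapes have the same rank.

```python
def align_left(*shapes):
    """Prepend 1s so every shape has the same number of dimensions (assumed behavior)."""
    max_dim = max(len(shape) for shape in shapes)
    return tuple((1,) * (max_dim - len(shape)) + tuple(shape) for shape in shapes)

def can_expand(old_shape, new_shape):
    """True if each old dimension equals the new one or is 1."""
    return len(old_shape) == len(new_shape) and all(
        old == new or old == 1 for old, new in zip(old_shape, new_shape)
    )
```

For example, `can_expand((1, 3), (4, 3))` is true because the first dimension is 1, while `can_expand((2, 3), (4, 3))` is false because 2 is neither 4 nor 1.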
chenyu
aefdff4ef5
reshape mask cleanups [pr] ( #8064 )
...
don't need canonicalize_st because we always merge 1 in `_merge_dims`
2024-12-05 20:20:43 -05:00
chenyu
05dba6e4ee
minor to_indexed_uops cleanup [pr] ( #8063 )
2024-12-05 17:15:03 -05:00
chenyu
b2dd703592
fix typing of UOp.range [pr] ( #8062 )
...
start/end should not be float or bool
2024-12-05 14:56:34 -05:00
Sieds Lykles
49c6dab74b
Add pattern for div mod recombine with gcd ( #8061 )
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-05 13:16:58 -05:00
geohotstan
707e9a9c8e
add _one_hot_along_dim helper for Tensor.arange masking ( #8039 )
...
* feelsbadman
* feelsextrabadman
* make sure indices is on same device as self Tensor
* renamed to _one_hot_along_dim
* revert onnx change will do them in onnx only PRs
* address feedback
* add onnx changes here too
* make pad arg better
* revert pad arg
* maybe still keep dim
* simplify onehot onnx ops more
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-05 12:43:00 -05:00
chenyu
3c5983473a
combine parentless reduce rule [pr] ( #8059 )
2024-12-05 11:28:35 -05:00
chenyu
87594a8153
simpler dtypes.max for int [pr] ( #8058 )
2024-12-05 10:31:41 -05:00
geohotstan
66b8242375
Simple onnx.py clean ups ( #8054 )
...
* start
* simplify ops
* why did this not work before
* will split buffer parse to separate pr
* flip the error order
* only this much for now
* to_python_const clean up
* minimize diff
* move tensor_methods into onnx.py
* improve some type signatures
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-05 10:31:26 -05:00
chenyu
5c6ed5dba6
lower test_conv_3x3_256_32_32_256_256 expectation ( #8060 )
...
failed https://github.com/tinygrad/tinygrad/actions/runs/12182799887/job/33982676812#step:9:210
2024-12-05 10:30:56 -05:00
Ahmed Harmouche
c6f5bb03fa
YoloV8 WebGPU fixes ( #8057 )
...
* Bump up input size to 416, show if webgpu is not supported
* Minor fix in export_model
2024-12-05 16:23:45 +01:00
nimlgen
78c01a5c2b
amd general _gpu_alloc ( #8056 )
...
* amd general _gpu_alloc
* hmm
* ops
2024-12-05 15:50:23 +03:00
nimlgen
8071600897
nv one _gpu_alloc ( #8055 )
2024-12-05 15:22:03 +03:00
Ahmed Harmouche
ff9a89f714
Proper dtypes for input/output of exported WebGPU model ( #8053 )
...
* Respect input/output dtypes in exported WebGPU model
* Add some comments about skipped dtypes
2024-12-05 10:38:05 +01:00
qazal
435a51e10c
reduce folding simple tests [pr] ( #8040 )
...
* reduce folding simple tests [pr]
* test for view and realized src pattern
* realize / buffer behavior
2024-12-05 12:22:45 +08:00
George Hotz
20878be2af
lower test_gemv_4096_16384 expectations
2024-12-05 12:08:26 +08:00
George Hotz
83aecbdc70
do gpuocelot copy manually [pr] ( #8050 )
2024-12-05 11:51:20 +08:00
George Hotz
4a208bfb28
bump download cache version
2024-12-05 11:42:34 +08:00
George Hotz
df18e7cc37
accept filename decorator [pr] ( #8049 )
...
* accept filename decorator [pr]
* add test for safe_load
* bring old tar tests back
2024-12-05 11:40:59 +08:00
Francis Lata
c3187087f7
QwQ-32B-Preview support ( #7962 )
...
* load weights with some debugging
* start running a prompt
* cleanup
* optionally permute layers and cleanup
* add validation for simple prompt
* small cleanup
* minor cleanup with formatting download links
* add a longer prompt
* add timing option
* some typings
* remove unused arg
* reset GlobalCounters
* minor cleanups
2024-12-04 21:46:37 -05:00
chenyu
b3220ca7b1
test cases of always True/False lt ( #8048 )
...
* test cases of always True/False lt
* one more
2024-12-04 20:38:40 -05:00
chenyu
8bb806888b
hook_overflow -> safe_exp2 [pr] ( #8047 )
...
that's the only use case, so no need for indirection
2024-12-04 19:05:38 -05:00
chenyu
99abdc6d39
minor push_swizzle_down_through_elementwise cleanup [pr] ( #8046 )
...
walrus, and if x are the same, prod(x) must be the same
2024-12-04 17:22:37 -05:00
chenyu
5933ec8dc3
use argfix in smax/smin and remove if [pr] ( #8045 )
2024-12-04 17:06:13 -05:00
chenyu
4e518334b8
minor get_grouped_dims cleanup [pr] ( #8044 )
2024-12-04 16:22:51 -05:00
geohotstan
5ce8090d42
simple onnx_ops cleanups ( #8003 )
...
* simple clean ups first
* more work
* kinda have adam
* ooo momentum worked nicely
* almost there
* wow.. is the onnx test wrong
* nicer optim stuff
* just skip that test
* small comment changes
* use naming convention from other parts of codebase
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-04 15:33:03 -05:00
Sieds Lykles
70db1bab5c
Fold nested div with const ( #8010 )
...
* Rebase nested div and with const
* Update the ordering
* return None on vectors
Fixes cpu test
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-04 14:59:09 -05:00
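The log does not spell out the rewrite rule behind the nested div commit above. One identity such a fold could rely on is that, for positive constants, `(x // c1) // c2` equals `x // (c1 * c2)`. The sketch below assumes that reading and uses Python floor division; `fold_nested_div` is a hypothetical name.

```python
def fold_nested_div(c1: int, c2: int) -> int:
    """Return c such that (x // c1) // c2 == x // c for every integer x.

    Only positive constants are handled here; this is an illustrative
    assumption, not the condition used by the actual pattern.
    """
    if c1 <= 0 or c2 <= 0:
        raise ValueError("both divisors must be positive constants")
    return c1 * c2

# Brute-force check of the identity over a small range, negatives included.
c = fold_nested_div(3, 4)
identity_holds = all((x // 3) // 4 == x // c for x in range(-100, 100))
```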
chenyu
0693158d28
lower v_theoretical gemv on red ( #8042 )
...
tiny7 is still slower https://github.com/tinygrad/tinygrad/actions/runs/12166149038/job/33931736130#step:8:209
2024-12-04 13:59:40 -05:00
chenyu
5c2b1089b2
vectorized input in div_and_mod_folding returns None [pr] ( #8041 )
2024-12-04 13:36:41 -05:00
qazal
ff6def9ffb
simple contiguous_while_contiguous prereqs [pr] ( #8038 )
...
* simple contiguous_while_contiguous prereqs [pr]
* early realize
* fine if it's folding a non-contig buffer
2024-12-04 23:00:28 +08:00
Ahmed Harmouche
c9e7701417
Fast YoloV8 on WebGPU ( #8036 )
...
* Fast yolov8 with downscaled input
* Faster + FPS meter
* Add loader while model is downloading/compiling
* Title touchup
2024-12-04 15:23:09 +01:00
qazal
b116e1511d
make device on uop optional [pr] ( #8034 )
2024-12-04 20:18:00 +08:00
Ahmed Harmouche
13eedd373b
Run WebGPU tests on ubuntu ( #8033 )
2024-12-04 12:42:04 +01:00
leopf
fb89971e73
use BufferedReader ( #8032 )
2024-12-04 19:08:54 +08:00
George Hotz
08657cb7b0
hotfix: bump expectations in speed_v_theoretical
2024-12-04 19:00:33 +08:00
George Hotz
ea65c79ba2
hotfix: don't spam BEAM debug in speed_v_theoretical
2024-12-04 18:47:16 +08:00
George Hotz
09b00b1b04
hotfix: use kernel timings instead of python timings in speed_v_theoretical
2024-12-04 18:36:17 +08:00
George Hotz
8f65c1fafb
simpler block reorder function [pr] ( #8031 )
...
* simpler block reorder function [pr]
* simpler
* block_reorder in substitute, so wasteful otherwise
* extend and count
* leave push logic for same order
* sort new ctx
* less loop
* Revert "less loop"
This reverts commit 30249d097a.
2024-12-04 17:57:35 +08:00
leopf
f0401e14e8
tar_extract with Tensors ( #7853 )
...
* initial
* USTAR, PAX and GNU support + testing
* from_bytes byteorder
* use TarInfo.frombuf
* tensor only usage
* remove contextlib.suppress
* shorter ow,pax
* more tests
* testing length + move tests
* cleanup
* new approach: RawTensorIO
* fix fetch
* enable read test
* cleanup and ignore fix
* fix for python < 3.12
* make it RawIO
* functions
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-04 17:03:19 +08:00
George Hotz
1e06aefde7
bunch up ops for lines [pr] ( #8030 )
2024-12-04 17:03:01 +08:00
uuuvn
e9c5b23ba1
Use MTLCompiler directly (v2) ( #7920 )
...
* Use MTLCompiler directly (v2)
* to_block_literal and REQUEST_TYPE_COMPILE
* Rewrite command encoding
* Revert to_block_literal
* Maybe that's more readable to some people?
* Typo and comment about stdlib caching
* Update ops_metal.py
* Update ops_metal.py
* Update ops_metal.py
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-04 16:36:48 +08:00
George Hotz
bb98bae751
local reordering in block ( #8029 )
...
* local reordering in block
* load (and parents) is highest priority
* minor loads in order
* comments
* explicit depth
* simpler
* matters less, but store early too
2024-12-04 15:11:29 +08:00