geohotstan
d52e91db7b
ONNX ops clean ups (#9622)
* combine work from remove numpy and onnx ops tests
* clippy
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 21:39:22 -04:00
geohotstan
a08b07b4da
Bump onnx==1.17.0 (#9618)
* bump
* remove resize tf_crop_and_resize
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 03:21:51 -04:00
nimlgen
54e1e59b44
am: rdna 4 support (#9621)
* hm
* fix
* return this
* fine
* g
* ruff
* fix
2025-03-29 23:16:27 +07:00
nimlgen
118bd1cbed
hotfix: amd imports (#9620)
2025-03-29 20:19:53 +07:00
uuuvn
dd9aae02c3
Refactor ops_amd.py (MI300X prereq) (#9428)
2025-03-29 00:17:20 +07:00
nimlgen
fa0ebbd237
jit: optimize before pickle (#9611)
* jit: optimize before pickle
* optimize weights
* fix
* mypy
* mypy2
2025-03-28 19:06:09 +07:00
Andrew Furey
50dee4a7b3
add test for checking const gradients (#9598)
2025-03-27 15:17:37 -04:00
chenyu
5358b0904b
update uop_given_valid if a node becomes const (#9604)
* update uop_given_valid if a node becomes const
* cleanup
2025-03-27 14:57:46 -04:00
qazal
bf94924d5a
fix viz with nested graph_rewrite (#9595)
2025-03-27 13:14:28 +08:00
qazal
e5ff7b23d7
refactor to @track_matches + add failing test_nested_rewrite (#9592)
* test_nested_rewrite
* refactor to track_matches
* positional arg
2025-03-27 11:11:56 +08:00
nimlgen
dc9da1d917
memplan into one buffer (#9526)
* new memplanner
* new should works
* fix
* VALIDATE_MEMORY_PLANNER
* hm?
* ugh
* fix alignment
* fix2
* rm
* tiny fixes
* test
* comments and fixes
* fix2
* liiiinetr
* t
* fix
2025-03-27 01:46:50 +07:00
nimlgen
e88a640ca5
fix _access_resources for offset buffers (#9580)
* fix _access_resources for offset buffers
* test
2025-03-26 18:42:43 +07:00
George Hotz
9115ce8860
linearizer fixups from DSP branch (#9581)
2025-03-26 18:28:15 +08:00
nimlgen
ccbcdca473
add memplanner tests (#9577)
2025-03-26 10:59:39 +07:00
chenyu
cddd750d68
add a failed test case for jit/nojit rand [pr] (#9574)
currently adding jit produces different rand values
2025-03-25 13:32:44 -04:00
qazal
52301fe68e
move Buffer refcount increment out of schedule.py (#9564)
* move Buffer refcount increment out of schedule.py
* add TestGC.test_assign_refcount
* refcount refers to Ops.BUFFER UOps
2025-03-25 12:08:27 +08:00
chenyu
6427272bf6
minor update to rand [pr] (#9566)
2025-03-24 18:49:50 -04:00
qazal
d7c754ce49
failing test for UOp buffer ref count (#9563)
* failing test for UOp buffer ref count
* lint
2025-03-25 00:10:48 +08:00
b1tg
f90001e1a6
amd llvm render (no_comgr prereq) (#9543)
* amd llvm render
* skip test_div_rounding_mode
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-03-24 22:50:51 +08:00
George Hotz
74d98eafb8
add onnx frontend stub [pr] (#9558)
2025-03-24 12:24:34 +08:00
chenyu
ba41076e94
update embedding test to not use dtypes.long [pr] (#9556)
2025-03-23 21:33:38 -04:00
nimlgen
d5667419af
am: move out pte creation logic (#9548)
* am: move out pte creation logic
* emu
* ops
2025-03-23 18:29:10 +07:00
geohotstan
309afa20b7
add Tensor.max_unpool2d (#9518)
* why does max_unpool2d feel slower than out.gradient ...
* slightly cleaner
* what happened to ruff
* need to think about this some more
* slightly faster now?
* clean up, 1 more failing edge case
* ok good
* working TINY_BACKEND
* nit doc wording
* retry CI
2025-03-22 12:11:33 -04:00
quortus
bdd44d4255
Fix DSP transcendentals (#9542)
2025-03-22 11:08:18 +08:00
chenyu
c33679c47b
increase size in test_multinomial_counterexample (#9540)
should be less flaky
2025-03-21 17:46:52 -04:00
Francis Lata
1a1087e3a0
cleanups on losses and dataset tests (#9538)
2025-03-21 17:03:18 -04:00
Francis Lata
8cbe4009fc
RetinaNet losses (#9536)
* add sigmoid_focal_loss and l1_loss
* update ref implementation comment
2025-03-21 15:52:54 -04:00
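For reference, sigmoid focal loss is the standard RetinaNet formulation from Lin et al.; the NumPy sketch below illustrates that formulation under the assumption the commit follows it, and is not tinygrad's implementation (argument names are hypothetical):

```python
import numpy as np

def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    # standard focal loss: down-weights easy examples via the (1 - p_t)^gamma factor
    p = 1.0 / (1.0 + np.exp(-logits))                            # sigmoid probabilities
    ce = -(targets * np.log(p) + (1 - targets) * np.log(1 - p))  # per-element BCE
    p_t = p * targets + (1 - p) * (1 - targets)                  # prob of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return alpha_t * (1 - p_t) ** gamma * ce
```

With `gamma=0` and `alpha=0.5` it reduces to half the plain binary cross-entropy.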
Francis Lata
e6389184c5
update comment for retinanet dataloader implementations (#9534)
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-21 15:07:45 -04:00
Francis Lata
eb95825eea
RetinaNet dataloader (#9442)
* retinanet dataloader
* remove batch_size from generate_anchors
* refactor kits19 dataset tests
* add tests for dataloader
* fix testing setup and cleanups
* remove unused import
2025-03-21 13:36:41 -04:00
b1tg
58206fa8a9
add amd llvm compiler (#9519)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-21 23:13:27 +08:00
George Hotz
8e555c586c
switch quantization to unsigned/unsigned + add Ops.REDUCE (#9527)
* switch quantization to unsigned/unsigned + add Ops.REDUCE
* tests
* nhwc + replay pkl
2025-03-21 17:02:37 +08:00
George Hotz
3c5161b4cb
add validation of the bounds of Ops.INDEX (#9503)
* add validation of the bounds of Ops.INDEX
* do mask properly
* more validation
* correct
* fix gated
* add CAST support to vmin/vmax
* fix ptx and image
* ptx no diff
* upat.index also stays
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2025-03-20 12:15:55 +08:00
qazal
0b20f91ce7
remove move_mask from the devectorizer (#9511)
* remove move_mask from the devectorizer
* add (wrong) ptx
* reason
* enable index addition in PTX, we won't have the INDEX anyways
* space
2025-03-20 11:53:12 +08:00
qazal
1839e8c9b3
place masks in INDEX for TestGatedStoreRewrite [pr] (#9512)
2025-03-20 09:46:53 +08:00
geohotstan
8c0d0a122c
Add return_indices to max_pool (#9506)
* wow argmax is so good
* 1 less line
* clean up and better variable names
* is this torch thing right...?
* add more tests
* slap a TODO on it
* clean ups
* prettier looking code and fix ceil mode test
* add return types and some docs
* ok that was a bad example since indices == value, just no example
2025-03-19 15:25:37 -04:00
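The commit body hints the indices come from an argmax over each pooling window; a hypothetical 1-D NumPy sketch of that idea (function name and shapes are illustrative, not tinygrad's API):

```python
import numpy as np

def max_pool1d_with_indices(x: np.ndarray, k: int):
    # split x into non-overlapping windows of size k, take the max of each,
    # and recover each max's flat index into x via a per-window argmax
    n = x.shape[0] // k
    windows = x[: n * k].reshape(n, k)
    idx = windows.argmax(axis=1) + np.arange(n) * k
    return windows.max(axis=1), idx
```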
chenyu
189f62d44f
add rounding to tqdm unit scale (#9507)
fixed `AssertionError: ' 1.00/10.0 1000it/s]' != ' 1.00/10.0 1.00kit/s]'`
2025-03-19 12:08:46 -04:00
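The failure mode above is a rate that rounds up to the next SI-prefix boundary at display time; a minimal sketch of prefix selection that rounds before comparing (illustrative only, not tqdm's or tinygrad's code):

```python
def si_scale(value: float) -> str:
    # round to display precision *before* picking the unit prefix, so a rate
    # like 999.999 (or exactly 1000) renders as "1.00k" rather than "1000"
    for prefix in ["", "k", "M", "G"]:
        if round(value, 2) < 1000:
            return f"{value:.2f}{prefix}"
        value /= 1000
    return f"{value:.2f}T"
```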
b1tg
2c87a22cf2
fix prg size calculation when there are adjacent mapped ranges (#9498)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-03-19 11:55:03 +08:00
chenyu
f8976dd2eb
enable more webgpu tests (#9502)
OSX has a higher limit on the number of buffers, and it supports fp16 now
2025-03-18 23:03:54 -04:00
qazal
ae688e4103
simple failing test for scheduling parallel reduce [pr] (#9501)
* simple failing test for scheduling parallel reduce [pr]
* atol
2025-03-19 10:52:13 +08:00
George Hotz
117b7a16ef
VALIDATE_WITH_CPU [pr] (#9488)
* VALIDATE_WITH_CPU [pr]
* fix test
2025-03-18 15:15:04 +08:00
qazal
935cd01f56
simple failing test for graph_rewrite children [pr] (#9489)
* simple failing test for graph_rewrite children [pr]
* lint
* update too
2025-03-18 13:07:21 +08:00
Anish Umale
5e58f4b65b
Tiny backend test_ops fix part 3 (#9483)
* extract straightforward things from https://github.com/tinygrad/tinygrad/pull/9302
* pass dtype and device for ones_like
2025-03-17 18:01:51 -04:00
TJ
9fcef4d009
add masked_select to tensor.py (#9468)
* add masked_select to tensor.py
* fix tests
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-03-17 16:05:36 -04:00
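Semantically, `masked_select` follows the torch contract: broadcast a boolean mask against the tensor and return the selected elements as a flat 1-D result. A NumPy reference sketch of that contract (not tinygrad's implementation):

```python
import numpy as np

def masked_select(t: np.ndarray, mask: np.ndarray) -> np.ndarray:
    # broadcast t and mask to a common shape, then gather where mask is True;
    # the result is always 1-D, in row-major order
    t_b, m_b = np.broadcast_arrays(t, mask.astype(bool))
    return t_b[m_b]
```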
chenyu
4f8eac59ea
failed test case for threefry (#9469)
* failed test case for threefry
not sure if it's always like this, but incrementing before _threefry_random_bits is incorrect: the counts should start from the number of random values generated so far.
using jax to generate 20 + 20 + 10 random numbers, the first 20 + 20 match and the last 10 differ. just moving the increment after _threefry_random_bits matches the numbers, but the jit test fails
* workaround
* why is this different?
* revert those
* and that
2025-03-17 14:52:10 -04:00
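The invariant this test checks is a property of any counter-based generator: the counter must equal the number of values already produced, so drawing 20 + 20 + 10 values from one stream matches drawing 50 at once. A toy generator illustrating that property (the mixing function is made up; nothing here is tinygrad's threefry):

```python
class CounterRNG:
    # toy counter-based generator: each output depends only on its counter slot
    def __init__(self):
        self.count = 0
    def draw(self, n: int) -> list[int]:
        out = [((self.count + i) * 2654435761) & 0xFFFFFFFF for i in range(n)]
        self.count += n  # advance AFTER generating: count = values produced so far
        return out
```

Advancing the counter before generating would shift every window, and the split stream would no longer match the single draw.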
geohotstan
53d6f1e1bb
Add bitonic cat sort (#9422)
* poc
* repeated values fail, sigh
* is this being timed out?
* fix up down names
* bitonic v2, does this run?
* bitonic v3, faster
* bitonic v3.1, faster
* bitonic v3.1.1, same speed unlucky
* support dim and indices
* bitonic v3.2, simpler code, TODO repeated indices
* bruv gimme green for once cmon
* cat (stack) implementation, slow but maybe one day when cat is fast meow
* revert to v3.2
* bitonic v4, who let the cats out edition
* clean up variable names
* figured out repeated indices :D
* ruff check --fix
* use sort for topk
* add Tensor.sort everywhere
* fix docs and add some types
* slightly better variable names
* am I doing torch inplace correctly?
* delegate sort to values_stable
* add a contig, faster first sort
* maybe don't test_inplace
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-17 12:01:23 -04:00
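For context, a bitonic sorting network sorts power-of-two-length sequences with a data-independent compare-exchange pattern, which is why it maps well onto tensor ops. A plain-Python sketch of the classic network (illustrative only, not the tensorized version this commit adds):

```python
def bitonic_sort(xs: list) -> list:
    # classic recursive bitonic sort; len(xs) must be a power of two
    def merge(a, up):
        if len(a) == 1: return a
        mid = len(a) // 2
        for i in range(mid):  # compare-exchange at a fixed stride
            if (a[i] > a[i + mid]) == up:
                a[i], a[i + mid] = a[i + mid], a[i]
        return merge(a[:mid], up) + merge(a[mid:], up)
    def sort(a, up):
        if len(a) == 1: return a
        mid = len(a) // 2
        # build a bitonic sequence from an ascending and a descending half, then merge
        return merge(sort(a[:mid], True) + sort(a[mid:], False), up)
    return sort(list(xs), True)
```

Note the network handles repeated values correctly, which the commit log shows took a few iterations to get right for the index-returning variant.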
qazal
e03c0aacf2
more explicit DONT_PUSH_VIEWS [pr] (#9479)
* more explicit DONT_PUSH_VIEWS [pr]
* update tests to not handcode ast
* lint
* test_recursive_swizzle and test_simple_store_reshape
2025-03-17 20:43:21 +08:00
qazal
3b00a778ba
fix view_left for unsafe pad ops [pr] (#9478)
2025-03-17 19:02:02 +08:00
qazal
813f713edc
merge_views for buffer ops + create valids last (#9472)
* merge_views for buffer ops + create valids last
* view.arg
* pass
2025-03-17 17:15:44 +08:00
qazal
bd1f71c1e2
simple failing test for extra ops in VALID [pr] (#9474)
* simple failing test for extra valids [pr]
* this has DEBUG=4
2025-03-17 17:02:40 +08:00
qazal
e26caf4c3a
hotfix: skip test_mean_half_precision_underflow on amd ci (#9476)
The global size is very large (781250 gidx) and the emulated version takes more than 1 minute to execute the kernel.
2025-03-17 16:47:48 +08:00