Commit Graph

4433 Commits

qazal
b515d796fb inline viz get_name [pr] (#10682)
* inline viz get_name [pr]

* changing name_fxn makes this simpler

* waitUntil dom
2025-06-07 11:16:16 +03:00
wozeparrot
e3805171e2 feat: variable bs bitcast (#10674) 2025-06-06 17:21:53 -07:00
George Hotz
54db1f8ee8 prevent huge waste of multi ram (#10669)
* prevent huge waste of multi ram

* fix ram usage

* only define var

* add resolve

* fix tests

* fix cifar training

* remove that logic

* fix test without long
2025-06-06 17:17:21 -07:00
George Hotz
b68b7dbc2a test winograd is close to normal conv [pr] (#10557)
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-06 19:11:49 -04:00
leopf
eb7305e6a4 Tensor.keccak("sha3_256") (#7186)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-06-06 15:24:05 -07:00
chenyu
bdede4924e fix odd number in get_test_global_size (#10671)
factor might not be an integer if the input global_size has an odd number in it
2025-06-06 17:31:35 -04:00
George Hotz
7f0f97aa76 new test_multitensor tests (#10667)
* new test_multitensor tests

* cleanup scheduler
2025-06-06 10:26:28 -07:00
chenyu
4a6d84c4c3 hotfix llama start_pos vmax is max_context-1 (#10659)
* hotfix llama start_pos vmax is max_context-1

fixed `IGNORE_OOB=0 python3 examples/llama3.py --size 1B --benchmark --temperature 0`

* hotfix: multitensor transformer test tests kv cache

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2025-06-06 00:41:25 -04:00
George Hotz
5eb6e1e65a Revert "hotfix: multitensor transformer test tests kv cache"
This reverts commit ad9f88419a.
2025-06-05 21:15:34 -07:00
George Hotz
ad9f88419a hotfix: multitensor transformer test tests kv cache 2025-06-05 21:08:57 -07:00
George Hotz
8325c4f192 tests for multi assign (#10658)
* tests for multi assign

* transformer tests

* add that assert
2025-06-05 20:56:40 -07:00
wozeparrot
0d86f8d375 fix failed threefry (#10646) 2025-06-05 17:17:42 -07:00
chenyu
ff1aad7b69 fix const float pow to int tensor (#10655)
was incorrectly cast to int
2025-06-05 19:15:12 -04:00
George Hotz
baba274a76 minimal mstack pr to fix allreduce (#10649)
* minimal mstack pr to fix allreduce

* fix webgpu
2025-06-05 15:14:53 -07:00
George Hotz
4c315f8e17 MSTACK little non-functional changes (#10648) 2025-06-05 13:20:22 -07:00
chenyu
46811d0d3c minor external_model_benchmark cleanup (#10644) 2025-06-05 14:13:28 -04:00
qazal
26afbc954f delete redundant tests from test_schedule [pr] (#10643) 2025-06-05 20:08:39 +03:00
chenyu
80ebce421d remove metal buffer limit in external_model_benchmark [pr] (#10642)
not needed anymore
2025-06-05 13:00:51 -04:00
qazal
28c4997236 check for matching shape order in fused reduce (#10641)
* failing test

* shapes match with ones removed
2025-06-05 19:37:22 +03:00
qazal
1190062812 prevent grouper can_chase while fusing arange [pr] (#10623) 2025-06-05 18:50:21 +03:00
qazal
8c5ea00522 push permutes through fused reduces (#10628)
* fix pushing reshapes through reduceops

* reduceop_view_right should assert on ndims mismatch

* update that, view.reshape asserts it
2025-06-05 16:14:04 +03:00
chenyu
d0969f5a1f cleanup multi tests (#10635) 2025-06-05 00:28:44 -04:00
qazal
571c0296a9 linearizer failure from FUSE_ARANGE default diff (#10629)
* start with test_arange_sum

* test_arange_avgpool2d

* device.renderer.supports_float4
2025-06-04 19:11:52 +03:00
qazal
5056d21b29 add failing TestSchedule.test_arange_sum [pr] (#10627) 2025-06-04 17:23:59 +03:00
qazal
7114b6ab31 viz browser tests (#10626)
* viz browser tests

* expect failure if js/ isn't included

* back green
2025-06-04 14:58:24 +03:00
wozeparrot
4d1686f767 clean: becnhmark -> benchmark (#10620) 2025-06-03 19:28:18 -07:00
qazal
ce9f12dc13 reorder cast before masking constants (#10609)
* failing test from fuzzer

* .numpy() handles bfloat16 better

* const->view->cast becomes const->cast->view

* update TestMovedConstFolding.test_cast_padded
2025-06-03 15:44:03 +03:00
qazal
910cabb081 add kernel count to grouper process replay differ [pr] (#10611) 2025-06-03 15:21:27 +03:00
Ahmed Harmouche
650404a143 [webgpu] Proper shared mem size for packed types (#10585)
* Proper shared mem size in webgpu

* Add test

* Refactor test
2025-06-01 20:18:33 -04:00
qazal
3cc73a0172 simpler process replay main loop [pr] (#10588)
* simpler process replay main loop [pr]

* use logging

* default to 1
2025-06-01 15:03:21 +03:00
qazal
dc882d3d7d merge process replay and viz captures [pr] (#10581)
* refactoring

* test script

* work

* more work

* diff

* repr splits lines correctly

* that

* add location

* add location

* also don't need name_override

* k.copy

* [pr]

* name_override 2

* err
2025-06-01 12:30:10 +03:00
qazal
1f8a8721e9 remove test_unaligns_idxs, UOps don't have order like this [pr] (#10587) 2025-06-01 12:16:14 +03:00
Ahmed Harmouche
35eb4d357a [webgpu] Fix atomic shared mem load inside loop (#10530)
* Disable shared mem atomics on webgpu

* allow_any_len in load pattern matcher to fix temp load inside loop
2025-05-31 09:29:02 -04:00
qazal
5b59728c75 refactor LOAD(DEFINE_GLOBAL, VIEW) in kernels to LOAD(VIEW(DEFINE_GLOBAL)) (#10541)
* changes to core tinygrad

* fixups pt1

TC=3
docs/abstractions2.py
IMAGE=2
test_quantize_dsp
test_schedule

* more tests

* green now

* images stay images
2025-05-30 14:27:58 +03:00
chenyu
116ffc4e92 cstyle strips paren for AND and OR (#10560) 2025-05-30 07:09:05 -04:00
qazal
bbf05110a2 use kernelize in TestLinearizer.test_indexing_multireduce [pr] (#10571) 2025-05-30 11:27:09 +03:00
qazal
7051bf3fd5 fixup hardcoded asts ptr dtype and constants [pr] (#10570)
* fixup hardcoded asts ptr dtype and constants [pr]

* use kernelize for test_kernel_count
2025-05-30 09:38:32 +03:00
qazal
066196415f UOp.valid and const_like work with just shapes [pr] (#10569)
* UOp.valid and const_like work with just shapes [pr]

* pm_quant left

* pm_quant
2025-05-30 08:55:06 +03:00
George Hotz
b3b43a82c4 remove Tensor.no_grad, it's meaningless now [pr] (#10556) 2025-05-28 22:20:02 -07:00
George Hotz
e140f8f0d8 linearizer test_failure_61 (#10552)
* enumerate cases of Tensors in the JIT

* optional fused optimizers

* add fused optimizer test

* move that there

* ugh

* work on beautiful_cifar

* speed close to hlb_cifar

* test_failure_61

* just the failure
2025-05-28 21:30:50 -07:00
Sieds Lykles
ae02a1e232 [bounty] Z3 symbolic fuzzer [pr] (#10514)
* First version, caught a bug?

* Nicely print failure to reproduce

* Remove that

* Put the assert back

* Change fuzzing to use testing_unit so it has z3

* Test key to match

* Add rule

* Add test

* Add test for edge case 0

* Merge patterns

* update comment

* consistent whitespace

* whitespace

* add condition

* add test

* update comment

* use Variable

* fuzzer using z3_renderer

* Cleaned up printing and debugging

* working new fuzzer

* change some comments and printing

* more formatting

* fuzz failures in separate file

* fix fstring

* more tests

* naming

* remove added line

* remove comment

* print number of skipped expressions

* use self.assertEqual

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-28 16:28:37 -04:00
George Hotz
98f3d1c26d enumerate cases of Tensors in the JIT (#10548) 2025-05-28 11:51:27 -07:00
qazal
d1f0043331 use store_val helper in test_schedule asserts [pr] (#10540) 2025-05-27 21:48:06 +03:00
George Hotz
5b268121d4 remove becomes map (#10533)
* remove becomes map

* add comment and delete dead code

* multi is a view
2025-05-27 11:47:11 -07:00
George Hotz
a07caaca0d handle stride 0 variable reshape (#10536) 2025-05-27 10:00:24 -07:00
George Hotz
41e3d07d7f view gradient is tricky (#10528)
* view gradient is tricky

* explicit
2025-05-26 22:28:30 -07:00
uuuvn
c29c46853f Very basic mock sqtt (#10512)
This mockgpu SQTT emulation just ignores basically everything and ends up with a 0x1000-size trace full of zeroes, but testing for things
like register rename is better than nothing, I guess
2025-05-26 14:38:28 -07:00
qazal
6d07087fe1 remove contiguous from MSELECT 2 (#10522)
* remove contiguous from MSELECT

* test_shrink_on_shard_axis

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2025-05-26 19:19:01 +03:00
geohotstan
602a145f8f Add Tensor.unfold (#10518)
* yoinked 10272

* eitanturok's fixes

* hmmm should size be sint?

* add test
2025-05-26 11:15:44 -04:00
qazal
9169dcfb49 do not create kernels with more inputs than the backend allows (#10510)
* work

* no itertools + top down pass

* clean viz

* python can do that

* webgpu

* gbarrier of gbarrier is gbarrier

* device can be tuple

* bug in toposort

* failing test for gated toposort

* contiguous of gbarrier is gbarrier

* check for binops

* Revert "check for binops"

This reverts commit 53e3cdf720.

* viz + match on gbarrier, self exists by default

* alt

* green now

* cleanup
2025-05-26 18:02:03 +03:00