Commit Graph

194 Commits

Author SHA1 Message Date
chenyu
35c9701df0 update outdated tests and comments (#14090) 2026-01-10 01:00:48 -05:00
chenyu
2833c5a54b few more jit tests with multi tensor inputs (#14047) 2026-01-06 22:05:22 -05:00
nimlgen
25440f0f72 all2all (#13902)
* all2all

* um

* fix

* x

* um

* simpler

* mypy

* fix

* t

* comments
2025-12-31 16:38:32 +03:00
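
The bullet trail above says little about the op itself, so here is a conceptual sketch of all-to-all in plain Python (illustrative semantics only, not tinygrad's implementation): each of n ranks splits its buffer into n chunks, and rank i ends up with chunk i from every rank, in source-rank order.

```python
# conceptual all-to-all (plain Python, not the tinygrad API):
# rank i receives chunk i from every rank, concatenated in source-rank order.
def all2all(buffers: list[list[int]]) -> list[list[int]]:
  n = len(buffers)
  m = len(buffers[0]) // n  # chunk size; assumes evenly divisible buffers
  chunks = [[buf[r*m:(r+1)*m] for r in range(n)] for buf in buffers]
  return [[x for src in range(n) for x in chunks[src][dst]] for dst in range(n)]

# two ranks, four elements each: rank 0 keeps the first halves, rank 1 the second
print(all2all([[0, 1, 2, 3], [4, 5, 6, 7]]))  # [[0, 1, 4, 5], [2, 3, 6, 7]]
```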
chenyu
784b919f7f Revert "optim empty shard #13513 (#13598)" (#13855)
* Revert "optim empty shard #13513 (#13598)"

This reverts commit 76d465dbc3.

* test_arange_shrink

* update test
2025-12-27 21:10:23 -05:00
George Hotz
744af193f0 remove ScheduleItem and merge it with ExecItem (#13759)
* remove ExecItem and merge it with ScheduleItem

* less diff

* fix issues

* min diff

* don't change bufs in _lower

* min diff

* update

* revert

* fixes

* diff
2025-12-19 17:04:24 -04:00
George Hotz
3dbde178c1 mark slow tests as slow instead of as CI (#13736)
* mark slow tests as slow instead of as CI

* CI shouldn't have different behavior

* more skips / CI

* slow
2025-12-17 10:29:57 -04:00
chenyu
fda73c8180 support LAMB param offload (#13730)
also added Tensor.shard_like
2025-12-16 19:56:30 -05:00
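
A hedged usage sketch of the `Tensor.shard_like` this commit mentions, assuming from the name that it shards a tensor across the same devices and axis as an existing multi tensor (handy for placing optimizer state next to offloaded params; device names here are illustrative):

```python
from tinygrad import Tensor

w = Tensor.ones(8, 8).shard(("CPU:0", "CPU:1"), axis=0)
m = Tensor.zeros(8, 8).shard_like(w)  # assumed: mirrors w's device tuple and shard axis
print(m.device)                       # expected ("CPU:0", "CPU:1")
```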
Nino Risteski
76d465dbc3 optim empty shard #13513 (#13598)
* optim empty shard

* remove tuple

* simplify

* lint

* lint2

* test

* remove original buffer unique id

* new rule

* reset shard

* update

* reset shard
2025-12-09 12:28:36 -05:00
George Hotz
6bd355fa26 add needs_second_gpu decorator (#13543)
* add needs_second_gpu decorator

* more skips

* two more fixes
2025-12-02 19:08:23 -08:00
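
The decorator itself isn't shown in the log; a minimal sketch of what a `needs_second_gpu`-style skip could look like (the probe below is an assumption, not tinygrad's actual test helper):

```python
import unittest
from tinygrad import Device

def needs_second_gpu(fn):
  # assumed probe: try opening a second instance of the default device
  try:
    Device[f"{Device.DEFAULT}:1"]
    return fn
  except Exception:
    return unittest.skip("needs a second device")(fn)
```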
George Hotz
e1051d00d7 multi like on full_like as well as rand_like (#13402)
* multi like on full_like as well as rand_like

* add test and fix bug

* mismatch, optim match

* one line
2025-11-20 20:46:48 -08:00
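
What the commit title describes, sketched as usage: the `*_like` constructors follow a multi tensor's sharding instead of falling back to the default device (CPU devices used so the sketch runs anywhere):

```python
from tinygrad import Tensor

t = Tensor.ones(4, 4).shard(("CPU:0", "CPU:1"), axis=0)
r = t.rand_like()      # random values, same device tuple and shard axis as t
f = t.full_like(3.0)   # constant fill, same sharding as t
print(r.device, f.device)  # both ("CPU:0", "CPU:1")
```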
George Hotz
9b2b535fa4 fix issue with multi flip (#13115) 2025-11-05 15:28:50 -08:00
George Hotz
a59439d013 use UOp.shape property instead of UOp.st (#12664)
* work on shape property

* reshape causing issues

* more mops

* all mops

* need to cache it

* _shape is like _device

* mostly works

* shape is good

* const uses _shape

* fix tests

* size doesn't use st

* close

* test is broken

* one less st

* hack for 3 op assign

* oops, i didn't mean to change that

* support emulate in the NullDevice

* reproed failure in emulation

* fix wmma
2025-10-15 10:01:34 +08:00
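
The bullets track a refactor from reading shape off a stored ShapeTracker (`st`) to a derived, cached property. In miniature, with illustrative names rather than tinygrad internals:

```python
from functools import cached_property

class Node:
  def __init__(self, op, srcs=(), shape=None):
    self.op, self.srcs, self._shape = op, srcs, shape
  @cached_property
  def shape(self):
    # leaves carry an explicit shape; other ops derive it from their sources,
    # and the result is cached ("need to cache it" in the log above)
    return self._shape if self._shape is not None else self.srcs[0].shape

leaf = Node("BUFFER", shape=(4, 4))
print(Node("EXP2", (leaf,)).shape)  # (4, 4)
```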
Sieds Lykles
dccdd190aa uop_given_valid uses less simplify (#12612)
* uop_given_valid uses less simplify

* enable test
2025-10-11 10:57:39 +02:00
chenyu
f2c3a72b0c remove RANGEIFY flag [pr] (#12577) 2025-10-09 21:52:54 -04:00
George Hotz
945cc46475 delete children tracking from uop (#12491)
* delete children tracking from uop

* uop children no longer exists

* no tracked children

* that test is flaky too
2025-10-08 09:04:14 +08:00
qazal
76e8a3250c rangeify: late zero folding (#12464)
* rangeify: late zero folding

* early

* not kernels

* none

* multi

* linter

* mstack is sink comment

* more comment
2025-10-06 12:52:33 +03:00
chenyu
c1e85f699c multi test case for sharded ring allreduce (#12462)
* multi test case for sharded ring allreduce

triggers `children not making progress` with RANGEIFY

* expect_rangeify_fails
2025-10-05 23:18:24 -04:00
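
For context on what this test exercises, a plain-Python simulation of ring allreduce (conceptual only, not tinygrad's scheduled kernels): a reduce-scatter pass followed by an all-gather pass, each rank exchanging a single chunk with its ring neighbor per step.

```python
def ring_allreduce(bufs: list[list[float]]) -> list[list[float]]:
  n, m = len(bufs), len(bufs[0]) // len(bufs)
  chunks = [[buf[i*m:(i+1)*m] for i in range(n)] for buf in bufs]
  # reduce-scatter: in step s, rank r sends chunk (r - s) % n to rank (r + 1) % n
  for s in range(n - 1):
    sends = [(r, (r - s) % n) for r in range(n)]
    data = [chunks[r][c][:] for r, c in sends]  # snapshot: exchanges are simultaneous
    for (r, c), d in zip(sends, data):
      dst = (r + 1) % n
      chunks[dst][c] = [a + b for a, b in zip(chunks[dst][c], d)]
  # all-gather: each rank forwards the chunk it completed or just received
  for s in range(n - 1):
    sends = [(r, (r + 1 - s) % n) for r in range(n)]
    data = [chunks[r][c][:] for r, c in sends]
    for (r, c), d in zip(sends, data):
      chunks[(r + 1) % n][c] = d
  return [[x for c in ch for x in c] for ch in chunks]

print(ring_allreduce([[1.0, 2.0], [3.0, 4.0]]))  # [[4.0, 6.0], [4.0, 6.0]]
```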
qazal
6a56d3c859 rangeify: only test correctness in multi (#12339)
* work

* more work

* back here

* skip tests

* work
2025-09-30 09:55:59 +03:00
qazal
9513f025c5 apply multi before rangeify (#12298)
* it doesn't realize it when i reshape

* cleaner graph

* map out

* REDUCE_AXIS also gives the wrong answer

* maybe

* work

* back here

* try

* more

* refactor tests

* check MultiBuffer

* or copy

* fine with this

* don't need graph_rewrite_map in rangeify
2025-09-29 14:16:31 +03:00
qazal
57c7e0a8f8 RANGEIFY=1 test_jit (#12254)
* RANGEIFY=1 test_jit

* don't do any of that

* disk

* simple disk tensor

* more work

* run more tests

* it also doesn't copy every time

* skip tests that hang everything
2025-09-20 17:34:32 +03:00
chenyu
9ad6a56d17 smaller test_simple_reduce (#12124) 2025-09-11 15:45:38 -04:00
nimlgen
1c6c42715f unify cpu and llvm (#11982)
* try unify cpu and llvm

* fixes

* fix

* ops

* no llvm

* fix

* rm

* llvm is out

* oops

* override

* no llvm

* ignore

* skip llvm

* ooops
2025-09-09 13:54:44 +03:00
George Hotz
a75da49951 use AxisType for UPCAST/UNROLL (#11800)
* use AxisType for UPCAST/UNROLL

* fixes

* fix the bug

* fix hack

* bad test

* flaky test
2025-08-23 14:44:48 -07:00
George Hotz
1d307f568c move device tests to test/device + test cleanups (#11735)
* move device tests to test/device

* test speedups

* test device

* linalg to unit

* upd

* so pytest just works

* more divide and skip

* speed

* test devectorize

* add pillow
2025-08-19 16:02:20 -07:00
chenyu
dbc7807c61 enable WEBGPU tests with buffer limit (#11489)
TestSample still fails?
2025-08-03 13:02:44 -07:00
George Hotz
3923e78061 no_vectorized_acc keeps single DEFINE_REG (#11387)
* no_vectorized_acc keeps single DEFINE_REG

* fix ptx, skip flaky test
2025-07-26 11:44:09 -07:00
qazal
ac39f27ae6 viz: non blocking UOp tracing (#10913)
* viz: non blocking UOp tracing

* u.arg

* no if Ops.KERNEL

* drop replace

* switch to weakref.WeakKeyDictionary

* back

* remove ram usage skips, viz works here

* cache on reconstruct
2025-06-23 19:59:28 +03:00
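
One bullet above ("switch to weakref.WeakKeyDictionary") names a stdlib pattern worth spelling out: keying trace metadata on objects through weak references means the tracer never keeps dead graphs alive, which is what allows the RAM-usage skips to be removed. A minimal illustration (stand-in class, not the actual UOp):

```python
import weakref

class Traced: pass  # stand-in for a traced object; UOp specifics omitted

traces: "weakref.WeakKeyDictionary[Traced, str]" = weakref.WeakKeyDictionary()
t = Traced()
traces[t] = "some trace metadata"
print(len(traces))  # 1
del t               # the entry vanishes with its key; the tracer holds no RAM
print(len(traces))  # 0
```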
George Hotz
531d143780 bring back old sharded rand behavior (#10842) 2025-06-16 17:23:47 -07:00
George Hotz
81b9c04574 move high level stuff to unit tests [pr] (#10708)
* move high level stuff to unit tests [pr]

* process replay on unit tests

* fix pr, less compute

* set omp num threads

* set 200MB buffer size limit

* delete junk

* fix tests

* faster

* move test_indexing to unit

* faster
2025-06-08 14:05:56 -07:00
George Hotz
32e9949052 rename lazydata to uop (#10698) 2025-06-08 08:42:22 -07:00
uuuvn
8e3f337075 Skip flaky test in ci (#10696)
`test_data_parallel_resnet_train_step` is already skipped on LLVM/CPU:

```python
@unittest.skipIf(CI and REAL_DEV in ("CUDA", "NV", "LLVM", "CPU"), "slow, and flaky on LLVM/CPU")
@unittest.skipIf(REAL_DEV == "WEBGPU" and not OSX, "WEBGPU Vulkan can only run kernels with up to 10 buffers")
def test_data_parallel_resnet_train_step(self):
```

It looks like `test_data_parallel_resnet` (no `_train_step`) is flaky in a similar way:
https://github.com/tinygrad/tinygrad/actions/runs/15472667248/job/43560773882?pr=10642#step:9:64
2025-06-08 08:24:09 -07:00
George Hotz
54db1f8ee8 prevent huge waste of multi ram (#10669)
* prevent huge waste of multi ram

* fix ram usage

* only define var

* add resolve

* fix tests

* fix cifar training

* remove that logic

* fix test without long
2025-06-06 17:17:21 -07:00
George Hotz
7f0f97aa76 new test_multitensor tests (#10667)
* new test_multitensor tests

* cleanup scheduler
2025-06-06 10:26:28 -07:00
chenyu
4a6d84c4c3 hotfix llama start_pos vmax is max_context-1 (#10659)
* hotfix llama start_pos vmax is max_context-1

fixed `IGNORE_OOB=0 python3 examples/llama3.py --size 1B --benchmark --temperature 0`

* hotfix: multitensor transformer test tests kv cache

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2025-06-06 00:41:25 -04:00
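
The fix concerns a symbolic variable's upper bound: `start_pos` indexes into the kv cache, so it can reach at most `max_context - 1`, and declaring `max_context` as the vmax is what `IGNORE_OOB=0` caught. A sketch with tinygrad's `Variable` (values illustrative):

```python
from tinygrad import Variable

max_context = 1024
# start_pos can index at most the last cache slot, so its vmax is max_context - 1
start_pos = Variable("start_pos", 0, max_context - 1).bind(42)
```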
George Hotz
5eb6e1e65a Revert "hotfix: multitensor transformer test tests kv cache"
This reverts commit ad9f88419a.
2025-06-05 21:15:34 -07:00
George Hotz
ad9f88419a hotfix: multitensor transformer test tests kv cache 2025-06-05 21:08:57 -07:00
George Hotz
8325c4f192 tests for multi assign (#10658)
* tests for multi assign

* transformer tests

* add that assert
2025-06-05 20:56:40 -07:00
George Hotz
4c315f8e17 MSTACK little non-functional changes (#10648) 2025-06-05 13:20:22 -07:00
chenyu
d0969f5a1f cleanup multi tests (#10635) 2025-06-05 00:28:44 -04:00
qazal
6d07087fe1 remove contiguous from MSELECT 2 (#10522)
* remove contiguous from MSELECT

* test_shrink_on_shard_axis

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2025-05-26 19:19:01 +03:00
uuuvn
ec9955c956 Use REAL_DEV for test skips (#10420)
This should fix remote CPU test flakiness (the segfaults were in
`test_data_parallel_resnet_train_step`, which is skipped on CPU but wasn't
skipped on remote CPU).
2025-05-19 17:32:14 -07:00
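
The point of `REAL_DEV` is that with the REMOTE backend, `Device.DEFAULT` reports "REMOTE" while kernels actually run on whatever backs the remote side, so device-based skips must consult the resolved device. A sketch of the intended pattern (the helper's import path is an assumption):

```python
import unittest
from test.helpers import REAL_DEV  # hypothetical import path for the helper

class TestMulti(unittest.TestCase):
  @unittest.skipIf(REAL_DEV in ("CPU", "LLVM"), "flaky on CPU/LLVM, even via REMOTE")
  def test_data_parallel_resnet_train_step(self): ...
```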
qazal
cc8dda1d75 move multi_map to grouper rewrite pass (#10409)
* move multi_map to grouper rewrite pass

* delete that
2025-05-19 10:44:06 +03:00
George Hotz
411392dfb7 move files into uop dir (#10399)
* move files into uop dir [pr]

* tinygrad.uop is a thing

* fix uop docs, no pr

* fix viz
2025-05-18 11:38:28 -07:00
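
After this move, UOp-level imports come from the new package, e.g. (module path as of this restructuring):

```python
from tinygrad.uop.ops import UOp, Ops  # the commit makes tinygrad.uop "a thing"
```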
George Hotz
6ec88d94df add tests for multi ram usage [pr] (#10376) 2025-05-17 15:33:40 -07:00
George Hotz
e13f2a3092 multi is O(1) (#10183)
* multi is O(1)

* allreduce

* no new uops needed

* junk

* something

* simple

* that's really what i want

* closer

* inject _device_num

* pretty print

* cleanups

* this

* early dnum

* ops allreduce is good

* ish

* device is the tuple and this is fine

* simpler

* progress

* copy_multi

* work

* more tests

* more tests pass

* work

* no None axis

* tests

* no none multi

* type fixes

* pre commit passes

* lil

* remove this

* mlperf dataloader on mac

* that test was wrong

* unbind

* support DEBUG=2

* realize

* only unbind bound vars

* don't include fixedvars

* graph test

* one test

* fixedvars in hcq

* new ring reduce

* ring reduce

* simpler ring

* mselect

* mselect doesn't work

* Revert "mselect doesn't work"

This reverts commit c78b77bd7d.

* Revert "mselect"

This reverts commit bb2e430ac3.

* simpler

* fixups

* no optional

* fix jit

* move things around

* cleanup multi

* simpler multi

* simpler reshape
2025-05-16 23:14:23 -07:00
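
The end state of this series is visible from user code: a sharded tensor's `device` is simply the tuple of devices ("device is the tuple and this is fine" above), and ops, including the ring allreduce added near the end of the bullet list, work across it. A small runnable sketch on CPU:

```python
from tinygrad import Tensor

t = Tensor.ones(8, 8).shard(("CPU:0", "CPU:1"), axis=0)
print(t.device)        # ("CPU:0", "CPU:1"): the device *is* the tuple
print(t.sum().item())  # 64.0, reduced across both shards
```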
George Hotz
e1a40e8040 add hcq fixedvars support [pr] (#10356)
* add hcq fixedvars support [pr]

* different test

* fixedvars are only for comp_queues

* fix hcq varvals
2025-05-16 22:05:53 -07:00
George Hotz
a4a25720b2 add test_multitensor_jit_input [pr] (#10347) 2025-05-15 20:47:57 -07:00
George Hotz
568d6d96e7 small changes from new multi [pr] (#10318) 2025-05-14 20:50:59 -07:00
George Hotz
42e70193c9 multi: instead of real, just copy (#10289)
* multi: instead of real, just copy

* fix test

* remove real
2025-05-14 10:36:55 -07:00
George Hotz
5f64bbc63d improve multi tests + add support for fixedvars [pr] (#10281)
* improve multi tests + add support for fixedvars [pr]

* add support for fixedvars
2025-05-13 09:27:00 -07:00