* work on shape property
* reshape causing issues
* more mops
* all mops
* need to cache it
* _shape is like _device
* mostly works
* shape is good
* const uses _shape
* fix tests
* size doesn't use st
* close
* test is broken
* one less st
* hack for 3 op assign
* oops, i didn't mean to change that
* support emulate in the NullDevice
* reproed failure in emulation
* fix wmma
* it doesn't realize it when i reshape
* cleaner graph
* map out
* REDUCE_AXIS also gives the wrong answer
* maybe
* work
* back here
* try
* more
* refactor tests
* check MultiBuffer
* or copy
* fine with this
* don't need graph_rewrite_map in rangeify
* RANGEIFY=1 test_jit
* don't do any of that
* disk
* simple disk tensor
* more work
* run more tests
* it also doesn't copy every time
* skip tests that hang everything
* move device tests to test/device
* test speedups
* test device
* linalg to unit
* upd
* so pytest just works
* more divide and skip
* speed
* test devectorize
* add pillow
* viz: non blocking UOp tracing
* u.arg
* no if Ops.KERNEL
* drop replace
* switch to weakref.WeakKeyDictionary
* back
* remove ram usage skips, viz works here
* cache on reconstruct
* move high level stuff to unit tests [pr]
* process replay on unit tests
* fix pr, less compute
* set omp num threads
* set 200MB buffer size limit
* delete junk
* fix tests
* faster
* move test_indexing to unit
* faster
`test_data_parallel_resnet_train_step` is already skipped on LLVM/CPU:
```python
@unittest.skipIf(CI and REAL_DEV in ("CUDA", "NV", "LLVM", "CPU"), "slow, and flaky on LLVM/CPU")
@unittest.skipIf(REAL_DEV == "WEBGPU" and not OSX, "WEBGPU Vulkan can only run kernels with up to 10 buffers")
def test_data_parallel_resnet_train_step(self):
```
It looks like `test_data_parallel_resnet` (no `_train_step`) is flaky in a similar way:
https://github.com/tinygrad/tinygrad/actions/runs/15472667248/job/43560773882?pr=10642#step:9:64
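If it needs the same guard, a minimal sketch would be to reuse the decorators above on the non-`_train_step` variant (assuming `CI`, `REAL_DEV`, and `OSX` are the same test helpers as in the snippet above; this is an illustration, not necessarily the change that was made):
```python
# sketch: apply the existing skip condition to test_data_parallel_resnet as well
@unittest.skipIf(CI and REAL_DEV in ("CUDA", "NV", "LLVM", "CPU"), "slow, and flaky on LLVM/CPU")
@unittest.skipIf(REAL_DEV == "WEBGPU" and not OSX, "WEBGPU Vulkan can only run kernels with up to 10 buffers")
def test_data_parallel_resnet(self):
  ...
```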
* prevent huge waste of multi ram
* fix ram usage
* only define var
* add resolve
* fix tests
* fix cifar training
* remove that logic
* fix test without long
This should fix the flakiness of the remote CPU tests (the segfaults were in
`test_data_parallel_resnet_train_step`, which is skipped on CPU but wasn't
skipped on remote CPU).
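A hypothetical sketch of making the skip also fire on remote CPU: resolve the backend that actually runs behind `REMOTE` before comparing it to `"CPU"`. The `REMOTE_BACKEND` env var used here is an assumption for illustration, not tinygrad's actual mechanism.
```python
import os
from tinygrad import Device

def real_device() -> str:
  # when tests run against the REMOTE backend, the compute happens on whatever
  # device the remote server uses; resolve that so CPU-only skips also apply there
  dev = Device.DEFAULT
  if dev == "REMOTE":
    # assumption: CI exposes the remote server's backend via an env var
    dev = os.getenv("REMOTE_BACKEND", "CPU")
  return dev

REAL_DEV = real_device()  # "CPU" when the remote server is CPU-backed
```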
* multi is O(1)
* allreduce
* no new uops needed
* junk
* something
* simple
* that's really what i want
* closer
* inject _device_num
* pretty print
* cleanups
* this
* early dnum
* ops allreduce is good
* ish
* device is the tuple and this is fine
* simpler
* progress
* copy_multi
* work
* more tests
* more tests pass
* work
* no None axis
* tests
* no none multi
* type fixes
* pre commit passes
* lil
* remove this
* mlperf dataloader on mac
* that test was wrong
* unbind
* support DEBUG=2
* realize
* only unbind bound vars
* don't include fixedvars
* graph test
* one test
* fixedvars in hcq
* new ring reduce
* ring reduce
* simpler ring
* mselect
* mselect doesn't work
* Revert "mselect doesn't work"
This reverts commit c78b77bd7d.
* Revert "mselect"
This reverts commit bb2e430ac3.
* simpler
* fixups
* no optional
* fix jit
* move things around
* cleanup multi
* simpler multi
* simpler reshape
* Less messy workaround for broken graph on paravirtualized Metal
GitHub CI macOS runners use paravirtualized Metal, which is broken with
graph (some comments say that ICB in particular is broken, but in my
testing it was sometimes fine and other times hit an assert inside
Metal's code related to resources, so I'm not sure).
> Assertion failed: (resource != nil), function -[IOGPUMetalResource initWithResource:], file IOGPUMetalResource.m, line 458.
This can be reproduced locally with any virtualization software (like UTM)
that can create macOS VMs with Apple's own virtualization framework.
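A hedged sketch of detecting the paravirtualized device so the graph/ICB path can be avoided on such runners; the PyObjC `Metal` bindings and the reported device name are assumptions here, not necessarily how the workaround is implemented:
```python
# sketch: detect a paravirtualized Metal device and fall back to per-kernel
# command buffers instead of the graph (ICB) path there
import Metal  # PyObjC Metal bindings, assumed available

device = Metal.MTLCreateSystemDefaultDevice()
# GitHub's macOS runners report a name like "Apple Paravirtual device"
use_graph = "paravirtual" not in device.name().lower()
```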
* unused import