`test_data_parallel_resnet_train_step` is already skipped on LLVM/CPU:
```python
@unittest.skipIf(CI and REAL_DEV in ("CUDA", "NV", "LLVM", "CPU"), "slow, and flaky on LLVM/CPU")
@unittest.skipIf(REAL_DEV == "WEBGPU" and not OSX, "WEBGPU Vulkan can only run kernels with up to 10 buffers")
def test_data_parallel_resnet_train_step(self):
  ...
```
It looks like `test_data_parallel_resnet` (no `_train_step`) is flaky in a similar way:
https://github.com/tinygrad/tinygrad/actions/runs/15472667248/job/43560773882?pr=10642#step:9:64
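If it needs the same treatment, the simplest fix is to mirror the decorator onto the sibling test. A sketch, assuming `CI` and `REAL_DEV` are the same helpers used in the snippet above (stand-in definitions shown here so the sketch is self-contained; the class name is illustrative):
```python
import os, unittest

CI = os.getenv("CI", "") != ""            # stand-in; the real tests import this from tinygrad's helpers
REAL_DEV = os.getenv("REAL_DEV", "CPU")   # stand-in for the test suite's real-device helper

class TestMultiTensor(unittest.TestCase):
  @unittest.skipIf(CI and REAL_DEV in ("CUDA", "NV", "LLVM", "CPU"), "slow, and flaky on LLVM/CPU")
  def test_data_parallel_resnet(self):
    ...
```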
* prevent huge waste of multi ram
* fix ram usage
* only define var
* add resolve
* fix tests
* fix cifar training
* remove that logic
* fix test without long
This should fix the remote CPU test flakiness: the segfaults were in
`test_data_parallel_resnet_train_step`, which is skipped on CPU but wasn't
skipped on remote CPU.
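One way to read that: the skip should key off the device that actually executes the kernels, not the transport. A hypothetical sketch of such a helper (the `real_device` attribute is an assumption for illustration, not a confirmed tinygrad API):
```python
from tinygrad import Device

def real_dev() -> str:
  # when running through REMOTE, base skip decisions on the backend it
  # forwards to; `real_device` is a hypothetical attribute, not confirmed API
  if Device.DEFAULT == "REMOTE":
    return getattr(Device["REMOTE"], "real_device", "REMOTE")
  return Device.DEFAULT
```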
* multi is O(1)
* allreduce
* no new uops needed
* junk
* something
* simple
* that's really what i want
* closer
* inject _device_num
* pretty print
* cleanups
* this
* early dnum
* ops allreduce is good
* ish
* device is the tuple and this is fine
* simpler
* progress
* copy_multi
* work
* more tests
* more tests pass
* work
* no None axis
* tests
* no none multi
* type fixes
* pre commit passes
* lil
* remove this
* mlperf dataloader on mac
* that test was wrong
* unbind
* support DEBUG=2
* realize
* only unbind bound vars
* don't include fixedvars
* graph test
* one test
* fixedvars in hcq
* new ring reduce
* ring reduce
* simpler ring
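For context on the ring-reduce commits above: in a ring allreduce, each device exchanges one chunk per step with its ring neighbor, so per-device traffic stays near-optimal regardless of device count. A minimal NumPy simulation of the idea (a sketch of the algorithm, not tinygrad's implementation):
```python
import numpy as np

def ring_allreduce(bufs: list[np.ndarray]) -> list[np.ndarray]:
  # bufs[r] is device r's copy of the data; split each into n chunks
  n = len(bufs)
  chunks = [list(np.array_split(b, n)) for b in bufs]
  # reduce-scatter: after n-1 steps, device r holds the fully reduced chunk (r+1) % n
  for s in range(n - 1):
    for r in range(n):
      i = (r - 1 - s) % n
      chunks[r][i] = chunks[r][i] + chunks[(r - 1) % n][i]
  # allgather: circulate each fully reduced chunk once more around the ring
  for s in range(n - 1):
    for r in range(n):
      i = (r - s) % n
      chunks[r][i] = chunks[(r - 1) % n][i]
  return [np.concatenate(c) for c in chunks]

# tiny check: 4 devices, all should end up with the elementwise sum
outs = ring_allreduce([np.full(8, r, dtype=np.float32) for r in range(4)])
assert all((o == 0 + 1 + 2 + 3).all() for o in outs)
```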
* mselect
* mselect doesn't work
* Revert "mselect doesn't work"
This reverts commit c78b77bd7d.
* Revert "mselect"
This reverts commit bb2e430ac3.
* simpler
* fixups
* no optional
* fix jit
* move things around
* cleanup multi
* simpler multi
* simpler reshape
* Less messy workaround for broken graph on paravirtualized Metal
GitHub CI macOS runners use paravirtualized Metal, which is broken with
graph (some comments say that ICB in particular is broken, but in my
testing it was sometimes fine and other times hit an assert inside Metal's
code related to resources, so I'm not sure).
> Assertion failed: (resource != nil), function -[IOGPUMetalResource initWithResource:], file IOGPUMetalResource.m, line 458.
This can be reproduced locally with any virtualization software (like UTM)
that can create macOS VMs with Apple's own Virtualization framework.
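A workaround presumably needs to detect that environment; one hedged sketch (the device-name check is an assumption: virtualized guests report the GPU as an "Apple Paravirtual device", and `device.name()` stands in for however the runtime exposes that name):
```python
def metal_graph_supported(device) -> bool:
  # virtualized macOS guests (UTM, GitHub CI runners) expose the GPU as an
  # "Apple Paravirtual device"; fall back to the non-graph path there
  return "Paravirtual" not in device.name()
```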
* unused import
* Switch to dawn, all tests passing locally
* Use dawn-python
* Skip failing test
* Skip midcast and fix timestamp on metal ci
* Autogen webgpu
* Try fetch dawn lib again
* /usr/lib
* Without lib prefix
* Test autogen diff
* Delete webgpu support, move everything to ops_webgpu
* mypy fix
* Simplify, refactor
* Line savings
* No ResultContainer
* Type annotation for result
* Some more simplifications
* Why was this explicit sync used at all?
* Refactor: delete functions that are only used once
* Create shader module inline
* Clear unit tests cache, maybe that solves it
* That wasn't it
* Try deleting cache to pass failing weight compare
* weights_only=False for pytorch 2.6
* Simplify ctype array creation
* Remove nanosecond precision timestamps
* Simplify error handling
* Refactor, add back type annotations
* Deleted custom submit function, refactor
* read_buffer simplify
* Fix use after free, refactor
* Simplify supported_features
* Runtime docs
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* MultiLazyBuffer is UOp [pr]
* this is new mlb
* this is the idea
* progress
* multitensor works
* more movement ops
* this
* MultiLazyBuffer is UOp
* cleanups
* multi axis
* fix more tests
* work
* not that
* add multi grad and move shard to ops
* mops not views
* no double contig
* sweet, all mt tests passing
* port old logic
* remove lbs
* fix realized
* whitespace
* assign tweak
* test_assign_kv_cache_multi passes
* fix is_realized
* fix JIT for multi
* just a few more lines i'll pay them back soon i swear please bro just a few more
* no split reduceop for multi
* remove cast before view
* greener
* indexing
* delete view instant rule
* that passes too
* openpilot too
* ack
* base on cast_before_view
* add it as a rewrite rule
* VIEW(DEVICE) is also fine
* test_shard_memory depends on forced_realize removal
* put that back, will go soon
* UOp representations change once we don't instantly fold things
* do not duplicate tests
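For context, the user-facing entry point to all of this is `Tensor.shard`; a small usage sketch (device names and shapes are illustrative):
```python
from tinygrad import Tensor

# shard across two devices along axis 0; t.device becomes the tuple of devices
t = Tensor.arange(16).reshape(4, 4).shard(("CPU:0", "CPU:1"), axis=0)
print(t.device)              # ("CPU:0", "CPU:1")
print((t + 1).sum().item())  # ops on the sharded tensor run per-device
```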
---------
Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>