George Hotz
ceb9d94eab
Update AGENTS.md
2025-05-19 17:59:59 -07:00
George Hotz
9389edf7ac
hotfix: add AGENTS.md
2025-05-19 17:48:42 -07:00
uuuvn
ec9955c956
Use REAL_DEV for test skips ( #10420 )
This should fix remote CPU test flakiness (the segfaults were in
`test_data_parallel_resnet_train_step`, which is skipped on CPU but wasn't
skipped on remote CPU)
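The idea behind the fix can be sketched as follows; `resolve_real_dev` and the `REMOTE_BACKEND` variable here are hypothetical stand-ins for however tinygrad looks through the REMOTE proxy, not the actual helper:

```python
import os, unittest

def resolve_real_dev(default_dev: str) -> str:
  # hypothetical: a REMOTE device proxies some underlying backend, so
  # test-skip decisions should be based on that backend, not on "REMOTE"
  if default_dev == "REMOTE":
    return os.environ.get("REMOTE_BACKEND", "CPU")
  return default_dev

class TestDataParallel(unittest.TestCase):
  # skipping on the *real* device catches both CPU and remote-CPU runs
  @unittest.skipIf(resolve_real_dev("REMOTE") == "CPU", "segfaults on CPU")
  def test_data_parallel_resnet_train_step(self):
    pass
```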
2025-05-19 17:32:14 -07:00
nimlgen
9a199ccd81
am: try to modprobe vfio ( #10418 )
* am: try to modprobe vfio
* fix
2025-05-19 23:46:50 +03:00
chenyu
67d1364106
update LOGMLPERF in red resnet run_and_time ( #10416 )
2025-05-19 13:23:33 -04:00
Sieds Lykles
db09676250
Don't simplify gate in gate, fix `FUSE_ARANGE=1 python test/test_ops.py TestOps.test_scatter_add` ( #10411 )
* substitute out index
* Add test
* change comment
2025-05-19 13:16:21 -04:00
chenyu
116d9e6306
run mlperf resnet on red box ( #10413 )
also made pushes to the `update_mlperf` branch trigger the run
2025-05-19 12:48:36 -04:00
George Hotz
f1fe1f93c1
hotfix: 14000 lines
2025-05-19 09:40:53 -07:00
qazal
90eb3c0e5d
add MobileNetV2 benchmark to comma CI ( #10250 )
* add MobileNetV2 to comma CI
* symlink imagenet
* also the signature
* comment that out
* need imagenetmock
* same train and test set
* quantize on CPU=1
* verbose
* need __hexagon_divsf3
* 0x858d6c15
* quant cpu + CC=clang-19
2025-05-19 18:22:50 +03:00
qazal
f9a5ad24c5
faster viz to_program [pr] ( #10410 )
* faster viz to_program [pr]
* Callable
2025-05-19 12:27:49 +03:00
qazal
cc8dda1d75
move multi_map to grouper rewrite pass ( #10409 )
* move multi_map to grouper rewrite pass
* delete that
2025-05-19 10:44:06 +03:00
George Hotz
b06291077c
no amdgpu kernel driver ( #10408 )
* no amdgpu kernel driver
* don't test hip
* lower req
2025-05-18 20:52:39 -07:00
George Hotz
4b1f1a47bb
hotfix: allow ModuleNotFoundError in metal llvm import
2025-05-18 20:46:31 -07:00
chenyu
485e80da69
run_and_time for resnet ci ( #10405 )
2025-05-18 23:39:57 -04:00
qazal
d1eeb19437
count viz javascript in lines ( #10403 )
* count viz javascript in lines
* don't count }
* it's javascript
* share with autogen
2025-05-18 19:34:00 -07:00
qazal
260d194523
merge insert_fuse and do_fuse [pr] ( #10406 )
2025-05-19 04:44:36 +03:00
uuuvn
33cf33902a
Slightly less slow remote copyin ( #10404 )
bytes concat is slow; don't do it if the data is already present in `self._h`.
Also don't cast the memoryview into bytes (a copy, +100ms) before it's needed.
This mitigates shard copying before shrink.
master:
```
*** REMOTE 6 copy 1073.74M, REMOTE <- METAL arg 2 mem 2.15 GB tm 806.84ms/ 829.61ms ( 0.00 GFLOPS 1.3|1.3 GB/s)
*** REMOTE: 7 copy 1073.74M, REMOTE: <- METAL arg 2 mem 3.22 GB tm 797.41ms/ 1627.02ms ( 0.00 GFLOPS 1.3|1.3 GB/s)
*** REMOTE: 8 copy 1073.74M, REMOTE: <- METAL arg 2 mem 4.29 GB tm 677.89ms/ 2304.91ms ( 0.00 GFLOPS 1.6|1.6 GB/s)
*** REMOTE: 9 copy 1073.74M, REMOTE: <- METAL arg 2 mem 5.37 GB tm 659.81ms/ 2964.72ms ( 0.00 GFLOPS 1.6|1.6 GB/s)
*** REMOTE: 10 copy 1073.74M, REMOTE: <- METAL arg 2 mem 6.44 GB tm 679.21ms/ 3643.93ms ( 0.00 GFLOPS 1.6|1.6 GB/s)
*** REMOTE: 11 copy 1073.74M, REMOTE: <- METAL arg 2 mem 7.52 GB tm 673.90ms/ 4317.83ms
```
this:
```
*** REMOTE 6 copy 1073.74M, REMOTE <- METAL arg 2 mem 2.15 GB tm 867.06ms/ 895.58ms ( 0.00 GFLOPS 1.2|1.2 GB/s)
*** REMOTE: 7 copy 1073.74M, REMOTE: <- METAL arg 2 mem 3.22 GB tm 433.35ms/ 1328.93ms ( 0.00 GFLOPS 2.5|2.5 GB/s)
*** REMOTE: 8 copy 1073.74M, REMOTE: <- METAL arg 2 mem 4.29 GB tm 433.19ms/ 1762.12ms ( 0.00 GFLOPS 2.5|2.5 GB/s)
*** REMOTE: 9 copy 1073.74M, REMOTE: <- METAL arg 2 mem 5.37 GB tm 432.71ms/ 2194.83ms ( 0.00 GFLOPS 2.5|2.5 GB/s)
*** REMOTE: 10 copy 1073.74M, REMOTE: <- METAL arg 2 mem 6.44 GB tm 433.68ms/ 2628.51ms ( 0.00 GFLOPS 2.5|2.5 GB/s)
*** REMOTE: 11 copy 1073.74M, REMOTE: <- METAL arg 2 mem 7.52 GB tm 432.91ms/ 3061.42ms
```
The 430ms is basically all sha256 time.
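The copy-avoidance being described can be illustrated in isolation; this is a generic sketch of memoryview-vs-bytes behavior, not the remote code itself:

```python
data = bytearray(1 << 20)  # stand-in for a received 1 MiB payload
mv = memoryview(data)

# slicing a memoryview is O(1): it makes a view, not a copy of the payload
view = mv[16:]
assert view.nbytes == len(data) - 16

# bytes(...) materializes a full copy; on a ~1 GiB shard that copy is the
# ~100ms the message mentions, so it should be deferred until actually needed
copied = bytes(view)
assert copied == bytes(data[16:])
```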
2025-05-18 16:20:43 -07:00
qazal
e55ee28b29
little smaller viz/worker.js [pr] ( #10402 )
2025-05-18 23:44:46 +03:00
qazal
8a6fb37560
move viz /prof to extra [pr] ( #10401 )
2025-05-18 23:25:59 +03:00
George Hotz
411392dfb7
move files into uop dir ( #10399 )
* move files into uop dir [pr]
* tinygrad.uop is a thing
* fix uop docs, no pr
* fix viz
2025-05-18 11:38:28 -07:00
uuuvn
0f825e12f2
Remote fixedvars ( #10371 )
* amd mockgpu graph support
For testing remote graph stuff (prompted by #10371) in CI
* Remote fixedvars
Somehow none of the existing tests failed when fixedvars were added; looking
into what to add as a regression test for this
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-18 09:57:13 -07:00
uuuvn
27c12be471
amd mockgpu graph support ( #10385 )
For testing remote graph stuff (prompted by #10371) in CI
2025-05-18 09:43:16 -07:00
George Hotz
a3308e145d
hotfix: remote print -> DEBUG=3
2025-05-18 09:09:04 -07:00
qazal
04b23087d8
grouper tests from fuse_arange_default [pr] ( #10394 )
2025-05-18 18:42:43 +03:00
qazal
17f0f5e764
add v_rcp_f32_e64 to remu ( #10393 )
* tests from the box
* add v_rcp_f32_e64 to remu
* f32::from_bits utils
* v_cndmask_b32 tests
2025-05-18 17:08:21 +03:00
qazal
9e2089dcd4
don't raise Exception in process replay [pr] ( #10392 )
* don't raise Exception in process replay [pr]
* continue generating diffs unless [pr] is set, exit(1) otherwise
* change
* works
2025-05-18 11:23:23 +03:00
chenyu
9b4e2a75cd
symlink datasets in mlperf workflow ( #10391 )
2025-05-18 03:26:05 -04:00
uuuvn
f20c5aac1f
Use itertools.count instead of manual increment in remote ( #10389 )
Similar to how it's done with `UOp.unique_num`; looks a bit nicer
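The before/after pattern looks roughly like this (a generic sketch, not the actual remote session code):

```python
import itertools

# before: manual increment, two statements of bookkeeping per id
class SessionManual:
  def __init__(self): self._num = 0
  def next_id(self) -> int:
    ret = self._num
    self._num += 1
    return ret

# after: itertools.count hands out the same 0, 1, 2, ... sequence
class SessionCount:
  def __init__(self): self._nums = itertools.count()
  def next_id(self) -> int: return next(self._nums)

a, b = SessionManual(), SessionCount()
assert [a.next_id() for _ in range(3)] == [b.next_id() for _ in range(3)] == [0, 1, 2]
```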
2025-05-18 00:15:37 -07:00
qazal
0294bfe507
simpler can_pad ( #10364 )
* simpler can_pad [pr]
* 3 kernels
* tests
* less kernels
2025-05-18 10:00:07 +03:00
George Hotz
c91f2c4580
use float32 for sgd momentum ( #10387 )
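The motivation is the usual mixed-precision one: a float16 momentum buffer loses small gradient contributions, so the buffer is kept in float32. A minimal numpy sketch of that idea (not tinygrad's optimizer code; the function name is made up):

```python
import numpy as np

def sgd_momentum_step(param, grad, buf, lr=0.01, momentum=0.9):
  # accumulate momentum in float32 even when params/grads are float16
  buf[:] = momentum * buf + grad.astype(np.float32)
  new_param = param.astype(np.float32) - lr * buf
  return new_param.astype(param.dtype)

p = np.ones(4, dtype=np.float16)
g = np.full(4, 1e-4, dtype=np.float16)
buf = np.zeros(4, dtype=np.float32)
for _ in range(10):
  p = sgd_momentum_step(p, g, buf)
# the buffer stays float32; params keep their original dtype
assert buf.dtype == np.float32 and p.dtype == np.float16
```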
2025-05-17 21:56:44 -07:00
George Hotz
305a3231c4
fix beam none if buf is optimized out ( #10388 )
2025-05-17 21:50:33 -07:00
George Hotz
6f77b938d7
Move getbits tests into test_helpers ( #10382 )
2025-05-17 17:04:00 -07:00
George Hotz
6ebfb505e9
docs: fix crossentropy name ( #10377 )
2025-05-17 16:39:14 -07:00
George Hotz
0b733ba75e
multi device training with GPT2 [pr] ( #10375 )
* multi device training with GPT2 [pr]
* Update grouper.py
2025-05-17 15:33:56 -07:00
George Hotz
6ec88d94df
add tests for multi ram usage [pr] ( #10376 )
2025-05-17 15:33:40 -07:00
uuuvn
5a18eab908
Fix __del__ in remote program ( #10372 )
Similar to #10341, broke after the hypothesis unpin
2025-05-17 21:29:44 +03:00
वेदांत
2453d99050
rms matching pytorch implementation ( #10319 )
* rms matching pytorch implementation
* pre commit fix
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
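PyTorch's RMSNorm computes `x / sqrt(mean(x**2) + eps) * weight`, with the mean taken over the normalized dimensions; a plain-Python sketch of the semantics being matched (illustrative only, not the tinygrad code):

```python
import math

def rms_norm(x, weight, eps=1e-6):
  # mean of squares over the last dimension, then scale, then elementwise weight
  ms = sum(v * v for v in x) / len(x)
  inv = 1.0 / math.sqrt(ms + eps)
  return [v * inv * w for v, w in zip(x, weight)]

out = rms_norm([3.0, 4.0], [1.0, 1.0])
# mean of squares of [3, 4] is 12.5, so out[0] == 3 / sqrt(12.5 + eps)
assert abs(out[0] - 3.0 / math.sqrt(12.5 + 1e-6)) < 1e-12
```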
2025-05-17 08:23:11 -07:00
nimlgen
da2b1834b4
hotfix: metal graph var vals ( #10370 )
2025-05-17 17:22:55 +03:00
qazal
e054b53a75
kernel count tests for pad [pr] ( #10369 )
* kernel count tests for pads
* handcoded rand one kernel
* comment
* prerealize device rng counter
* test_rand_handcoded generates /0
* remove track_rewrites
2025-05-17 17:20:46 +03:00
nimlgen
90c4bb10c0
fixedvars in all graphs ( #10365 )
* cuda fixedvars
* metal: fixedvars
* f
* ups
* count fixedvars
2025-05-17 16:18:52 +03:00
chenyu
efa8dfe7fb
test cron job to run resnet ( #10368 )
2025-05-17 08:57:02 -04:00
uuuvn
2c706d363e
Remote higher timeout and overridable via REMOTE_TIMEOUT ( #10367 )
Sometimes a minute is not enough; five minutes should be, but if it isn't for
some huge workload, it can be overridden.
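An env-overridable timeout like this usually reduces to a one-liner (a sketch; tinygrad has its own `getenv` helper, so the real code likely differs):

```python
import os

def remote_timeout(env=None) -> float:
  # default is 5 minutes; REMOTE_TIMEOUT (in seconds) overrides it
  env = os.environ if env is None else env
  return float(env.get("REMOTE_TIMEOUT", 300))

assert remote_timeout({}) == 300.0
assert remote_timeout({"REMOTE_TIMEOUT": "900"}) == 900.0
```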
2025-05-17 15:30:49 +03:00
nimlgen
4fa1837916
metal: do not require icb fix on m3+ ( #10366 )
2025-05-17 15:30:40 +03:00
Xingyu
286b0f4051
Add equal function implementation and corresponding test ( #10351 )
- Implemented a new function `equal` in the torch backend to compare two tensors for equality.
- Added unit tests for the `equal` function to verify its correctness with different tensor inputs.
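`torch.equal` returns a single bool: identical shape, then elementwise equality. A numpy sketch of those semantics (not the actual backend code, which would operate on tinygrad tensors):

```python
import numpy as np

def equal(a: np.ndarray, b: np.ndarray) -> bool:
  # shapes must match exactly before comparing elements, like torch.equal;
  # the short-circuit also avoids broadcasting mismatched shapes
  return a.shape == b.shape and bool((a == b).all())

assert equal(np.arange(4), np.arange(4))
assert not equal(np.arange(4), np.arange(4).reshape(2, 2))
assert not equal(np.array([1, 2]), np.array([1, 3]))
```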
2025-05-16 23:39:49 -07:00
George Hotz
e13f2a3092
multi is O(1) ( #10183 )
* multi is O(1)
* allreduce
* no new uops needed
* junk
* something
* simple
* that's really what i want
* closer
* inject _device_num
* pretty print
* cleanups
* this
* early dnum
* ops allreduce is good
* ish
* device is the tuple and this is fine
* simpler
* progress
* copy_multi
* work
* more tests
* more tests pass
* work
* no None axis
* tests
* no none multi
* type fixes
* pre commit passes
* lil
* remove this
* mlperf dataloader on mac
* that test was wrong
* unbind
* support DEBUG=2
* realize
* only unbind bound vars
* don't include fixedvars
* graph test
* one test
* fixedvars in hcq
* new ring reduce
* ring reduce
* simpler ring
* mselect
* mselect doesn't work
* Revert "mselect doesn't work"
This reverts commit c78b77bd7d.
* Revert "mselect"
This reverts commit bb2e430ac3.
* simpler
* fixups
* no optional
* fix jit
* move things around
* cleanup multi
* simpler multi
* simpler reshape
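The "ring reduce" bullets refer to the standard ring all-reduce, where each of N devices exchanges one chunk per step with its ring neighbor, keeping per-device traffic independent of N. A toy single-process simulation of the algorithm (not tinygrad's implementation):

```python
def ring_allreduce(bufs):
  # bufs: one flat list of floats per simulated device, all the same length,
  # split into one chunk per device
  n, cs = len(bufs), len(bufs[0]) // len(bufs)
  # reduce-scatter: after n-1 steps, device d holds the full sum of chunk (d+1)%n
  for s in range(n - 1):
    for d in range(n):
      c = (d - s) % n
      for i in range(cs):
        bufs[(d + 1) % n][c * cs + i] += bufs[d][c * cs + i]
  # allgather: circulate the fully reduced chunks around the ring
  for s in range(n - 1):
    for d in range(n):
      c = (d + 1 - s) % n
      bufs[(d + 1) % n][c * cs:(c + 1) * cs] = bufs[d][c * cs:(c + 1) * cs]

bufs = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
ring_allreduce(bufs)
assert bufs == [[12.0, 15.0, 18.0]] * 3  # every device ends with the full sum
```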
2025-05-16 23:14:23 -07:00
George Hotz
e1a40e8040
add hcq fixedvars support [pr] ( #10356 )
* add hcq fixedvars support [pr]
* different test
* fixedvars are only for comp_queues
* fix hcq varvals
2025-05-16 22:05:53 -07:00
George Hotz
11b5895c85
hotfix: schedule timing in tensor.py
2025-05-16 20:10:32 -07:00
uuuvn
64409a8bda
Remote beam ( #10357 )
* Use renderer properties instead of `.device`
* Remote beam
2025-05-16 18:59:22 -07:00
George Hotz
7cc35a031b
don't use UOp.multi in Tensor.rand ( #10362 )
2025-05-16 16:09:36 -07:00
George Hotz
7703dbef99
view substitute [pr] ( #10360 )
2025-05-16 15:08:24 -07:00