Commit Graph

8863 Commits

qazal
9e2089dcd4 don't raise Exception in process replay [pr] (#10392)
* don't raise Exception in process replay [pr]

* continue generating diffs unless [pr] is set, exit(1) otherwise

* change

* works
2025-05-18 11:23:23 +03:00
chenyu
9b4e2a75cd symlink datasets in mlperf workflow (#10391) 2025-05-18 03:26:05 -04:00
uuuvn
f20c5aac1f Use itertools.count instead of manual increment in remote (#10389)
Similar to how it's done with `UOp.unique_num`; looks a bit nicer
2025-05-18 00:15:37 -07:00
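
A minimal sketch of the swap this commit describes (the class and method names here are illustrative, not tinygrad's actual remote code): `itertools.count` moves the counter state into an iterator, so ID generation becomes a single `next()` call.

```
import itertools

# manual increment: counter state lives on the object, +1 at every use site
class SessionManual:
  def __init__(self): self._next_num = 0
  def new_id(self) -> int:
    ret = self._next_num
    self._next_num += 1
    return ret

# itertools.count: the iterator owns the state; each next() yields 0, 1, 2, ...
class SessionCounted:
  def __init__(self): self._ids = itertools.count()
  def new_id(self) -> int: return next(self._ids)

s = SessionCounted()
print(s.new_id(), s.new_id())  # 0 1
```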
qazal
0294bfe507 simpler can_pad (#10364)
* simpler can_pad [pr]

* 3 kernels

* tests

* less kernels
2025-05-18 10:00:07 +03:00
George Hotz
c91f2c4580 use float32 for sgd momentum (#10387) 2025-05-17 21:56:44 -07:00
George Hotz
305a3231c4 fix beam none if buf is optimized out (#10388) 2025-05-17 21:50:33 -07:00
George Hotz
6f77b938d7 Move getbits tests into test_helpers (#10382) 2025-05-17 17:04:00 -07:00
George Hotz
6ebfb505e9 docs: fix crossentropy name (#10377) 2025-05-17 16:39:14 -07:00
George Hotz
0b733ba75e multi device training with GPT2 [pr] (#10375)
* multi device training with GPT2 [pr]

* Update grouper.py
2025-05-17 15:33:56 -07:00
George Hotz
6ec88d94df add tests for multi ram usage [pr] (#10376) 2025-05-17 15:33:40 -07:00
uuuvn
5a18eab908 Fix __del__ in remote program (#10372)
Similar to #10341, broke after the hypothesis unpin
2025-05-17 21:29:44 +03:00
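
The shutdown hazard behind this fix, sketched with hypothetical names (`RemoteProgram` and `session.send` stand in for the real remote API): during interpreter finalization, attributes and module globals may already be gone, so a finalizer has to guard everything it touches.

```
import sys

class RemoteProgram:  # hypothetical stand-in for the real remote program class
  def __init__(self, session, name:str):
    self.session, self.name = session, name
  def __del__(self):
    # __del__ can run during interpreter shutdown, when module globals and other
    # objects may already be torn down, so guard every lookup it makes
    session = getattr(self, "session", None)
    if session is None or sys.is_finalizing(): return
    try: session.send("ProgramFree", self.name)  # hypothetical RPC call
    except Exception: pass  # best effort: never raise out of __del__
```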
वेदांत
2453d99050 rms matching pytorch implementation (#10319)
* rms matching pytorch implementation

* pre commit fix

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-17 08:23:11 -07:00
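
For reference, PyTorch's `nn.RMSNorm` normalizes by the root mean square with eps added inside the square root; a numpy sketch of that formula:

```
import numpy as np

def rms_norm(x:np.ndarray, weight:np.ndarray, eps:float=1e-6) -> np.ndarray:
  # PyTorch nn.RMSNorm: x * rsqrt(mean(x^2, last dim) + eps) * weight.
  # note eps goes *inside* the sqrt; putting it outside is a common mismatch
  return x / np.sqrt(np.mean(np.square(x), axis=-1, keepdims=True) + eps) * weight

x = np.random.randn(2, 8).astype(np.float32)
print(rms_norm(x, np.ones(8, dtype=np.float32)).shape)  # (2, 8)
```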
nimlgen
da2b1834b4 hotfix: metal graph var vals (#10370) 2025-05-17 17:22:55 +03:00
qazal
e054b53a75 kernel count tests for pad [pr] (#10369)
* kernel count tests for pads

* handcoded rand one kernel

* comment

* prerealize device rng counter

* test_rand_handcoded generates /0

* remove track_rewrites
2025-05-17 17:20:46 +03:00
nimlgen
90c4bb10c0 fixedvars in all graphs (#10365)
* cuda fixedvars

* metal: fixedvars

* f

* ups

* count fixedvars
2025-05-17 16:18:52 +03:00
chenyu
efa8dfe7fb test cron job to run resnet (#10368) 2025-05-17 08:57:02 -04:00
uuuvn
2c706d363e Remote higher timeout and overridable via REMOTE_TIMEOUT (#10367)
Sometimes a minute is not enough; five minutes should be, but if it isn't for
some huge workload, the timeout can be overridden.
2025-05-17 15:30:49 +03:00
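
A minimal sketch of the override pattern (plain `os.environ` here; tinygrad has its own env helpers): default to five minutes, and let the environment raise it for huge workloads.

```
import os

# default to 300s (five minutes); REMOTE_TIMEOUT=900 in the environment overrides
REMOTE_TIMEOUT = float(os.environ.get("REMOTE_TIMEOUT", 300))

# e.g. passed to whatever client talks to the remote device:
# resp = http_session.get(url, timeout=REMOTE_TIMEOUT)
```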
nimlgen
4fa1837916 metal: do not require icb fix on m3+ (#10366) 2025-05-17 15:30:40 +03:00
Xingyu
286b0f4051 Add equal function implementation and corresponding test (#10351)
- Implemented a new function `equal` in the torch backend to compare two tensors for equality.
- Added unit tests for the `equal` function to verify its correctness with different tensor inputs.
2025-05-16 23:39:49 -07:00
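
One plausible shape for such an `equal`, sketched with tinygrad `Tensor`s and `torch.equal` semantics (True iff same shape and all elements equal); this is not necessarily the PR's exact implementation:

```
from tinygrad import Tensor

def equal(a:Tensor, b:Tensor) -> bool:
  # torch.equal semantics: True iff same shape and all elements equal
  if a.shape != b.shape: return False
  return bool((a == b).all().item())

print(equal(Tensor([1, 2, 3]), Tensor([1, 2, 3])))  # True
print(equal(Tensor([1, 2, 3]), Tensor([1, 2, 4])))  # False
```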
George Hotz
e13f2a3092 multi is O(1) (#10183)
* multi is O(1)

* allreduce

* no new uops needed

* junk

* something

* simple

* that's really what i want

* closer

* inject _device_num

* pretty print

* cleanups

* this

* early dnum

* ops allreduce is good

* ish

* device is the tuple and this is fine

* simpler

* progress

* copy_multi

* work

* more tests

* more tests pass

* work

* no None axis

* tests

* no none multi

* type fixes

* pre commit passes

* lil

* remove this

* mlperf dataloader on mac

* that test was wrong

* unbind

* support DEBUG=2

* realize

* only unbind bound vars

* don't include fixedvars

* graph test

* one test

* fixedvars in hcq

* new ring reduce

* ring reduce

* simpler ring

* mselect

* mselect doesn't work

* Revert "mselect doesn't work"

This reverts commit c78b77bd7d.

* Revert "mselect"

This reverts commit bb2e430ac3.

* simpler

* fixups

* no optional

* fix jit

* move things around

* cleanup multi

* simpler multi

* simpler reshape
2025-05-16 23:14:23 -07:00
George Hotz
e1a40e8040 add hcq fixedvars support [pr] (#10356)
* add hcq fixedvars support [pr]

* different test

* fixedvars are only for comp_queues

* fix hcq varvals
2025-05-16 22:05:53 -07:00
George Hotz
11b5895c85 hotfix: schedule timing in tensor.py 2025-05-16 20:10:32 -07:00
uuuvn
64409a8bda Remote beam (#10357)
* Use renderer properties instead of `.device`

* Remote beam
2025-05-16 18:59:22 -07:00
George Hotz
7cc35a031b don't use UOp.multi in Tensor.rand (#10362) 2025-05-16 16:09:36 -07:00
George Hotz
7703dbef99 view substitute [pr] (#10360) 2025-05-16 15:08:24 -07:00
Elnur Rakhmatullin
de2b323d97 Fixed a typo in "simplify" (#10358) 2025-05-16 14:45:14 -07:00
Harald Schäfer
ee5258328a You never want multiple backends (#10354) 2025-05-16 13:10:39 -07:00
George Hotz
876d2275a1 changes from new multi (#10353)
* changes from new multi

* revert hcq change
2025-05-16 13:07:29 -07:00
wozeparrot
66e00c04dd fix: skip kernel timing tests on ci cuda (#10348) 2025-05-16 11:48:06 -07:00
Ignacio Sica
a54fd745c3 simpler barrier match in remu (#10339)
* s_barrier

* remove s_barrier from syncs
2025-05-16 14:40:58 +03:00
qazal
e9e5b54e43 grouper cleanups and merge with insert_kernels [pr] (#10349)
* grouper cleanups and merge with insert_kernels [pr]

* remove that
2025-05-16 14:39:56 +03:00
b1tg
caded2f413 llvm diagnostic error (#10267)
* llvm diagnostic info

* use decorator

* better error reporting

* fix mypy

* collect all diag msgs

* test diag error

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-16 02:03:20 -04:00
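
A hedged sketch of the decorator-plus-collection pattern the bullets describe, with hypothetical names (`DiagnosticError`, `diag_msgs`, `Compiler`); a real backend would register an LLVM diagnostic callback that appends to the list:

```
import functools

class DiagnosticError(RuntimeError): pass  # hypothetical error type

def collect_diagnostics(fn):
  # run the wrapped compile, then raise one error carrying *all* collected
  # diagnostic messages instead of only the first (or a silent failure)
  @functools.wraps(fn)
  def wrapper(self, *args, **kwargs):
    self.diag_msgs: list[str] = []  # a diagnostic callback would append here
    ret = fn(self, *args, **kwargs)
    if self.diag_msgs: raise DiagnosticError("\n".join(self.diag_msgs))
    return ret
  return wrapper

class Compiler:  # hypothetical backend
  @collect_diagnostics
  def compile(self, src:str) -> bytes:
    if "bad" in src: self.diag_msgs.append(f"error compiling: {src}")
    return b""
```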
George Hotz
a4a25720b2 add test_multitensor_jit_input [pr] (#10347) 2025-05-15 20:47:57 -07:00
chenyu
c798f2f427 brew --quiet to suppress already installed warnings (#10346)
example https://github.com/tinygrad/tinygrad/actions/runs/15057000247
2025-05-15 23:31:18 -04:00
wozeparrot
12a1ccc680 clean: double import (#10345) 2025-05-15 20:15:09 -07:00
wozeparrot
1ed04f993b move benchmark stat tracking to influxdb (#10185) 2025-05-15 16:14:56 -07:00
wozeparrot
f59ecf2116 fix: mockgpu cuda timing (#10343) 2025-05-15 14:14:14 -07:00
nimlgen
a825608dc2 hcq: fix progs' __del__ when shutdown (#10341)
* debug ci

* better?

* and mute this?

* revert that
2025-05-15 23:26:48 +03:00
Ignacio Sica
47b3055fe2 set fail-fast behavior (#10336) 2025-05-15 11:24:45 -07:00
uuuvn
c2bf2c6bb0 Remote offset (#10311)
For memory savings from the memory planner. Also, for some reason it makes hlb
cifar on mac noticeably faster.

master:
```
  3  210.12 ms run,    4.34 ms python,  205.78 ms REMOTE, 2075.90 loss, 0.002698 LR, 2.07 GB used,   1558.41 GFLOPS,    327.45 GOPS
  4  210.40 ms run,    4.33 ms python,  206.07 ms REMOTE, 2481.94 loss, 0.002262 LR, 2.07 GB used,   1556.34 GFLOPS,    327.45 GOPS
  5  188.08 ms run,    4.41 ms python,  183.67 ms REMOTE, 1967.49 loss, 0.001827 LR, 2.07 GB used,   1741.00 GFLOPS,    327.45 GOPS
  6  211.19 ms run,    4.26 ms python,  206.93 ms REMOTE, 1511.62 loss, 0.001392 LR, 2.07 GB used,   1550.51 GFLOPS,    327.45 GOPS
```

this:
```
  3  189.05 ms run,    4.50 ms python,  184.55 ms REMOTE, 2075.90 loss, 0.002698 LR, 1.60 GB used,   1732.08 GFLOPS,    327.45 GOPS
  4  187.81 ms run,    4.11 ms python,  183.71 ms REMOTE, 2481.94 loss, 0.002262 LR, 1.60 GB used,   1743.49 GFLOPS,    327.45 GOPS
  5  186.70 ms run,    4.09 ms python,  182.62 ms REMOTE, 1967.49 loss, 0.001827 LR, 1.60 GB used,   1753.89 GFLOPS,    327.45 GOPS
  6  187.18 ms run,    4.06 ms python,  183.12 ms REMOTE, 1511.62 loss, 0.001392 LR, 1.60 GB used,   1749.36 GFLOPS,    327.45 GOPS
```

(`PYTHONPATH=. REMOTE=1 REMOTEDEV=METAL BS=256 STEPS=10 python examples/hlb_cifar10.py`)

Couldn't reliably reproduce the speedup on tinybox, though.
2025-05-15 11:20:01 -07:00
Ignacio Sica
3c453e96a9 add ds_load_b96 and ds_store_b96 instructions (#10338) 2025-05-15 18:11:08 +03:00
qazal
be8202b293 add s_abs_i32 instruction to remu (#10334) 2025-05-15 16:47:58 +03:00
nimlgen
5efbe1c947 print offset only for subbuf (#10332) 2025-05-15 15:35:19 +03:00
qazal
7cfe367c07 failing test for slow embedding kernel with FUSE_ARANGE=1 [pr] (#10330) 2025-05-15 14:58:11 +03:00
nimlgen
5f03688280 usbgpu: remove max_read_len (#10328) 2025-05-15 14:49:58 +03:00
qazal
27b3dbe67e remove FUSE_ARANGE_UINT [pr] (#10324) 2025-05-15 14:39:54 +03:00
qazal
0a45cd0cbe grouper: merge views in fuse elementwise (#10325)
* grouper: merge views in fuse elementwise

* with gradient api
2025-05-15 13:17:09 +03:00
qazal
89d8d5b25e add dims check in FUSE_ARANGE (#10323) 2025-05-15 11:33:21 +03:00
qazal
8fad0f0124 grouper: check for unsafe PAD in FUSE (#10322) 2025-05-15 10:53:44 +03:00
chenyu
f008e5f233 test_dtype_alu should cast bf16 input (#10320)
when testing ALU ops for bfloat16, inputs should be cast to bfloat16 first; otherwise the numpy reference carries both input-quantization error and ALU error, making it less accurate
2025-05-15 01:11:39 -04:00
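
A sketch of the fix's idea (the `to_bf16` helper here truncates rather than rounds, so it is illustrative only): quantize the inputs to bfloat16 precision before computing the numpy reference, so the reference measures ALU error alone.

```
import numpy as np

def to_bf16(x:np.ndarray) -> np.ndarray:
  # keep only the top 16 bits of the float32 pattern (truncation, not the
  # round-to-nearest-even a real cast does; sketch only)
  return (x.astype(np.float32).view(np.uint32) & np.uint32(0xFFFF0000)).view(np.float32)

a = np.array([1.1, 2.3], dtype=np.float32)
b = np.array([3.7, 0.9], dtype=np.float32)
ref_bad  = a * b                    # reference from exact inputs: two error sources
ref_good = to_bf16(a) * to_bf16(b)  # cast inputs first, as the fix does
```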