tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-27 07:48:07 -05:00

Author	SHA1	Message	Date
chenyu	5358b0904b	update uop_given_valid if a node becomes const (#9604) * update uop_given_valid if a node becomes const * cleanup	2025-03-27 14:57:46 -04:00
chenyu	a187dfd3df	bert BEAM_UOPS_MAX 3000->4000 (#9603) more stable for the final step time green 410ms (master) -> 397ms (BEAM=4) -> 392ms (this) red 561ms (master) -> 550ms (this)	2025-03-27 11:58:47 -04:00
qazal	088a677e25	rescale to fit viz graph [pr] (#9599) * zoom to fit the graph in viz [pr] * always on screen fit graph * space key recenters	2025-03-27 23:33:51 +08:00
nimlgen	3737821b9e	prepare for clang graph (#9600) * prepare for clang graph * emu * ops * ops2 * better type * fix	2025-03-27 20:09:37 +07:00
qazal	bf94924d5a	fix viz with nested graph_rewrite (#9595)	2025-03-27 13:14:28 +08:00
qazal	c011751b41	statically define viz arrow heads (#9594)	2025-03-27 12:22:04 +08:00
qazal	0877497bad	hotfix: use captured uops in viz render [pr] (#9593) * hotfix: use captured uops in viz render [pr] * better error	2025-03-27 11:52:12 +08:00
qazal	e5ff7b23d7	refactor to @track_matches + add failing test_nested_rewrite (#9592) * test_nested_rewrite * refactor to track_matches * positional arg	2025-03-27 11:11:56 +08:00
chenyu	62888614f6	lower bert eval bs to 24 (#9590) oom during eval	2025-03-26 21:25:23 -04:00
nimlgen	dc9da1d917	memplan into one buffer (#9526) * new memplanner * new should works * fix * VALIDATE_MEMORY_PLANNER * hm? * ugh * fix alignment * fix2 * rm * tiny fixes * test * comments and fixes * fix2 * liiiinetr * t * fix	2025-03-27 01:46:50 +07:00
qazal	8b717c345c	cache viz worker at launch (#9589)	2025-03-27 01:10:02 +08:00
George Hotz	d62ced8981	symbolic -> symbolic_flat (#9588)	2025-03-26 23:34:43 +08:00
George Hotz	8aaa5e1ec5	generate the individual indexes (#9587)	2025-03-26 22:32:06 +08:00
George Hotz	5c6cd884e3	multiple simplifies is faster [pr] (#9586) * multiple simplifies is faster [pr] * cleanup * cleanup	2025-03-26 21:42:52 +08:00
George Hotz	1e6e75e39a	little changes from dsp branch (#9582) * little changes from dsp branch * not that one * need the where * Revert "need the where" This reverts commit `140f89c878`.	2025-03-26 20:01:21 +08:00
nimlgen	e88a640ca5	fix _access_resources for offset buffers (#9580) * fix _access_resources for offset buffers * test	2025-03-26 18:42:43 +07:00
Andrey	7b865ed03d	use tuple in isinstance for type checking (#9583)	2025-03-26 19:36:48 +08:00
George Hotz	9115ce8860	linearizer fixups from DSP branch (#9581)	2025-03-26 18:28:15 +08:00
qazal	e799df537e	prep viz UI cleanup for grid scales (#9579) * less ways to make a button * move collapse out * work * do not create extra resizers * better * ul * safari	2025-03-26 17:48:15 +08:00
nimlgen	ccbcdca473	add memplanner tests (#9577)	2025-03-26 10:59:39 +07:00
qazal	c03dadfcb9	add TORCHVIZ=1 to beautiful_mnist_torch (#9576)	2025-03-26 11:17:08 +08:00
qazal	93bcb974c5	select torch device in examples/beautiful_mnist_torch.py (#9575)	2025-03-26 11:01:25 +08:00
uuuvn	2c32126fc8	am: AMRegister refactor (#9572)	2025-03-26 00:52:40 +07:00
chenyu	cddd750d68	add a failed test case for jit/nojit rand [pr] (#9574) currently adding jit produced different rand values	2025-03-25 13:32:44 -04:00
nimlgen	4cf2b68ca8	am_smi: fix init for newer versions (#9559)	2025-03-25 23:48:05 +07:00
qazal	a6a5c0aec5	add NULL=1 backend (#9573) * add NULL=1 backend * NullAllocator * line * metadata should still work * it shouldn't have memory usage * Revert "it shouldn't have memory usage" This reverts commit `a9080fdd43`. * back * null flops	2025-03-25 22:20:52 +08:00
qazal	b60d9976b4	better yaxis formatting in viz memory graph (#9570) * better bytes format * pluralize * 1 less line	2025-03-25 16:50:22 +08:00
qazal	faf3b5b245	display kernel metadata in memory viz (#9569) * display kernel metadata in memory viz * fix that	2025-03-25 13:14:54 +08:00
qazal	52301fe68e	move Buffer refcount increment out of schedule.py (#9564) * move Buffer refcount increment out of schedule.py * add TestGC.test_assign_refcount * refcount refers to Ops.BUFFER UOps	2025-03-25 12:08:27 +08:00
qazal	262f5a2bd3	hotfix: replace link in viz/readme (#9568)	2025-03-25 10:24:49 +08:00
chenyu	6427272bf6	minor update to rand [pr] (#9566)	2025-03-24 18:49:50 -04:00
chenyu	b0e070e737	remove MOCKGPU workaround in rand (#9565) also `requires_grad_` to save a line	2025-03-24 17:49:45 -04:00
qazal	d7c754ce49	failing test for UOp buffer ref count (#9563) * failing test for UOp buffer ref count * lint	2025-03-25 00:10:48 +08:00
b1tg	f90001e1a6	amd llvm render (no_comgr prereq) (#9543) * amd llvm render * skip test_div_rounding_mode --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-03-24 22:50:51 +08:00
Priyank Patel	4f5e03bd60	better fix inplace detach (#9557)	2025-03-24 22:50:28 +08:00
qazal	1c40873962	show buffer info in memory viz (#9562)	2025-03-24 22:12:30 +08:00
qazal	efaee75656	start viz of memory usage (#9561) * start viz of memory usage * polygons/bars + use d3	2025-03-24 19:05:35 +08:00
qazal	1cfe6d02fe	refactor uop_to_json to return a dict [pr] (#9560)	2025-03-24 16:38:17 +08:00
nimlgen	edf9e1bf8d	am: move out soc21 to a sep module (#9551) * am: soc module is not part of am * am: soc module is not part of am	2025-03-24 14:17:42 +07:00
George Hotz	74d98eafb8	add onnx frontend stub [pr] (#9558)	2025-03-24 12:24:34 +08:00
George Hotz	de7d6cec3a	hotfix: DEBUG 5 prints the ast	2025-03-24 11:43:11 +08:00
chenyu	ba41076e94	update embedding test to not use dtypes.long [pr] (#9556)	2025-03-23 21:33:38 -04:00
chenyu	c965f4c20b	update bert config (#9555) BEAM 4->5 for green, 2% faster use AMD driver instead of AM for red, 5% faster	2025-03-23 16:14:41 -04:00
chenyu	d734e24c01	minor WEBGPU_PATH cleanup [pr] (#9552) also mypy recognizes `sys.platform == 'win32'` but does not recognizes it if wrapped inside a helper...	2025-03-23 09:10:02 -04:00
Ahmed Harmouche	7ce7fe0574	Refactor webgpu_dawn lib finding (#9547) * Refactor webgpu_dawn lib finding * Fix ruff	2025-03-23 08:23:29 -04:00
uuuvn	c631c72f22	HCQ: Increment timeline signal before submitting (#9550) `AMDComputeQueue.__del__` frees `hw_page` which is safe because `AMDAllocator._free` does `self.dev.synchronize()` which is supposed to wait for execution of IB to finish, however that doesn't happen if AMDComputeQueue is dropped right after submit before timeline signal is incremented, which it is in most places leading to a race if .bind() is also used (required for multi-xcc because bug in mec fw treats all PACKET3_PRED_EXECs outside IBs as if they had EXEC_COUNT of zero).	2025-03-23 18:30:38 +07:00
nimlgen	d5667419af	am: move out pte creation logic (#9548) * am: move out pte creation logic * emu * ops	2025-03-23 18:29:10 +07:00
geohotstan	309afa20b7	add Tensor.max_unpool2d (#9518) * why does max_unpool2d feel slower than out.gradient ... * slightly cleaner * what happened to ruff * need to think about this some more * slightly faster now? * clean up, 1 more failing edge case * ok good * working TINY_BACKEND * nit doc wording * retry CI	2025-03-22 12:11:33 -04:00
quortus	bdd44d4255	Fix DSP transcendentals (#9542)	2025-03-22 11:08:18 +08:00
Ignacio Sica	eddafb84e5	Bugfix for `TC=3` (#9464) * wrong but uses less shared * for size 8 tc1 with devectorize in 0 loads into local before wmma and works * improvements over tc1 devectorize * fix tc=3 * works for handcoded tc opts * clean bugfix tc=3 * fix * revert changes	2025-03-21 16:43:42 -07:00

8255 Commits