nimlgen
d224d0ed7f
nv: fix fault info (#8587)
* nv: fix fault info
* and emu for amd
* skip if not mock
2025-01-13 14:38:43 +03:00
qazal
586e730d32
use UOp.st for kernel reduce axes (#8499)
* use UOp.st for kernel reduce axes [pr]
* do not return dict
2025-01-13 06:24:11 -05:00
qazal
7562cc0399
better test for reduce swizzle + don't use double dtype [pr] (#8586)
* better test_permute_rewrite
* use float32
2025-01-13 05:02:21 -05:00
George Hotz
df59b072db
rename to top_down_rewrite [pr] (#8583)
2025-01-12 18:36:38 -08:00
chenyu
994944920b
simpler batch_load_train_bert [pr] (#8582)
don't think that buffer is really beneficial. 5% faster data_time and 1ms faster per step.
https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/69c9lx8y/overview
2025-01-12 20:25:05 -05:00
George Hotz
05e5de6a91
ugh, remove that binary blob
2025-01-12 17:02:28 -08:00
George Hotz
4ac4c1415a
free intermediate buffers in the jit [pr] (#8581)
* free intermediate buffers in the jit [pr]
* intermediates_freed
* deallocate if not allocated
* self._first_run is simpler
2025-01-12 15:41:41 -08:00
George Hotz
d817dc10db
start on test rewrite map [pr] (#8432)
* start on test rewrite map [pr]
* chatgpt writes dumb tests
* comment out failing
* fix that test
* fix gc issue
* oh, frame 2
* remove uop mutability
* map is only the map
* simpler + more tests
* test tiny passes
* tests that need to pass
* parent test passes
* child test passes
* remove uop mutability [pr]
* test fixups
* most tests pass
* more tests pass
* lil test fixups
* them too
* fix test
* unneeded
* err, that
* fix test_hcq
* fix test failures
* fix that test
* tensor universe
* does this pass test
* Revert "does this pass test"
This reverts commit ed516b3169.
* Revert "tensor universe"
This reverts commit c21301852a.
* test_mutate_add passes
* this can pass
* Revert "Merge remote-tracking branch 'origin/no_uop_mutability' into test_rewrite_map"
This reverts commit 657822dcdc, reversing
changes made to 2a126c145b.
* Revert "test_mutate_add passes"
This reverts commit ab4fc4c78e.
* correct enough
* remove test_rewrite_map_schedule.py
* viz
* uops are immutable
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2025-01-12 13:13:51 -05:00
qazal
2f71a00236
remove PYTHONPATH=. from mypy ci [pr] (#8578)
2025-01-12 09:52:03 -08:00
qazal
cde18fddce
fix DEBUG=2 output for copy runners [pr] (#8579)
* fix DEBUG=2 output for copy runners [pr]
* itemsize is constant
2025-01-12 12:03:01 -05:00
eliotgolding
867004fbeb
use unravel in views_to_indexed_uops [pr] (#8560)
* use unravel in shape
* make process replay work
* earlier View.minify()
* fix
* fix tests
* mypy
* get rid of early minify
* fix
* linter
* clean and add test
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-12 10:25:55 -05:00
nimlgen
38b5ac4d4a
mypy for mockgpu/cuda & dsp/run (#8575)
2025-01-12 18:25:39 +03:00
chenyu
def90b22f6
EVAL_BS=36 for bert [pr] (#8576)
3X faster eval compared to BS=6.
green https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/ka5p5sm9/overview
red https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/a7maxsxd/overview
2025-01-12 09:43:56 -05:00
qazal
ae241e96db
fix half4 on qcom and gpu (#8573)
* add test_setitem_half
* this fixes comma benchmark
2025-01-12 06:23:05 -05:00
qazal
cff1ee9038
add SINK folding from the tensor_map branch [pr] (#8562)
* delete is_constant from the scheduler
* add sink folding
* always give BUFFER uops Buffers [pr]
* spec for view, var (bind) and const
* add test_buffer_only_after_realize
* work
* 3 lines
* more work
2025-01-12 03:39:34 -05:00
qazal
87cbff3ac0
always give BUFFER uops Buffers [pr] (#8572)
* always give BUFFER uops Buffers [pr]
* add test_buffer_only_after_realize
2025-01-11 23:17:09 +02:00
qazal
98c9e23560
remove global PYTHONPATH setting in CI (test.yml) [pr] (#8568)
* remove global PYTHONPATH setting in CI [pr]
* only run mypy in tinygrad/
* still needed for benchmarks
2025-01-11 12:47:50 -05:00
geohotstan
815c505e1d
fixes from adapting tvm tests (#8570)
2025-01-11 11:38:36 -05:00
qazal
79738d768c
do not require PYTHONPATH=. for process replay [pr] (#8567)
2025-01-11 09:45:34 -05:00
qazal
a70d1bf439
move print_diff to process replay [pr] (#8566)
* move print_diff to process replay [pr]
* ruff rightfully complains
2025-01-11 09:28:45 -05:00
nimlgen
2f0856c1e2
qcom: use hwinterface (#8565)
* qcom: use hwinterface
* ops
* not needed anymore
2025-01-11 17:11:23 +03:00
qazal
60503c8621
use CAPTURE_PROCESS_REPLAY=1 in CI [pr] (#8564)
2025-01-11 06:03:48 -05:00
nimlgen
61665a63c9
am logs to debug2 (#8563)
2025-01-11 13:33:18 +03:00
George Hotz
c7acd40574
more aggressive onnx const creation [pr] (#8561)
2025-01-10 17:38:32 -08:00
ignaciosica
8891495996
minor arg spec check on wmma (#8525)
2025-01-10 15:42:56 -08:00
chenyu
d09897c2aa
allow double copy [pr] (#8559)
fixed ring allreduce pattern and recovered most of the bert step time regression (10% faster), will double check all benchmarks
2025-01-10 18:21:01 -05:00
George Hotz
70fa65cd95
viz fixups + scheduler option [pr] (#8557)
2025-01-10 15:09:31 -08:00
nimlgen
f457cb64d6
am: do not reload fw each run (#8466)
* am do not reload fw each run
* works
* comment this
* clean + comment
* warn message
* linter
* move out pci en master
* useless
* more correct
* oops
* oops
2025-01-10 23:33:38 +03:00
nimlgen
337328e409
am: fini gpu after use (#8556)
* am: fini gpu after use
* mypy
2025-01-10 21:02:34 +03:00
chenyu
6a7f971fa0
hotfix max(DEBUG, 2) -> max(DEBUG.value, 2) [pr] (#8553)
2025-01-10 12:57:44 -05:00
George Hotz
cd4edc5206
hotfix: pylint ignores runtime for speed
2025-01-10 09:07:18 -08:00
nimlgen
92b59c9b7a
test_hcq limits for mockgpu not (only) ci (#8555)
* test_hcq limits for mockgpu not (only) ci
* rm CI
2025-01-10 17:37:28 +03:00
George Hotz
9833fe83d8
more work on onnx imagenet [pr] (#8552)
* more work on onnx imagenet [pr]
* working quantization
* static quant
* benchmark onnx 0 dim
2025-01-09 20:28:18 -08:00
George Hotz
e172b759f0
more working (#8550)
2025-01-09 18:40:08 -08:00
chenyu
2cbb34535c
simpler allreduce script [pr] (#8551)
time everything on tensor level and get time from GlobalCounters.time_sum_s
2025-01-09 21:38:13 -05:00
chenyu
23c56817d8
update and clean up allreduce script [pr] (#8549)
make `run` able to run with ring only
2025-01-09 19:35:28 -05:00
George Hotz
5720871903
onnx consts are const [pr] (#8548)
2025-01-09 16:09:22 -08:00
chenyu
88661cd96f
fix checking DiskBuffer is opened [pr] (#8547)
`assert self.device.mem is not None` did not assert because `.mem` triggers AttributeError first
2025-01-09 18:58:36 -05:00
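The gotcha behind that fix is a general Python one: `assert x.attr is not None` never actually asserts if looking up `attr` itself raises. A minimal sketch of the failure mode (the `DiskDevice` class here is illustrative, not tinygrad's actual code):

```python
class DiskDevice:
    def __init__(self):
        # .mem is only assigned later, when the device is actually opened,
        # so an unopened device has no .mem attribute at all
        self.opened = False

dev = DiskDevice()
try:
    # intended as an "is the device open?" check, but the attribute
    # lookup raises before the assert condition can ever evaluate
    assert dev.mem is not None
    raised = None
except AttributeError:
    raised = "AttributeError"
except AssertionError:
    raised = "AssertionError"

print(raised)  # AttributeError
```

Checking `hasattr(dev, "mem")` (or initializing `self.mem = None` up front) makes the assert behave as intended.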
George Hotz
62447c253d
viz cleanups [pr] (#8498)
* viz cleanups [pr]
* Update serve.py
2025-01-09 15:46:48 -08:00
geohotstan
299d333806
Add QLinearConv, QLinearMatMul, QLinearAdd, QLinearGlobalAveragePool to onnx (#8478)
* QLinearEverything
* ok ort verify passes
* this should be int instead
* cast to int then char to do wraparound
* cleaner
* move contrib ops to microsoft ops
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-09 15:08:53 -08:00
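The "cast to int then char" bullet above refers to the usual trick for emulating int8 wraparound in quantized ops: compute in a wide integer, then keep only the low byte reinterpreted as signed. A hedged pure-Python sketch of the idea (an illustration of the technique, not tinygrad's implementation):

```python
def wrap_to_int8(x: int) -> int:
    # keep the low 8 bits ("cast to char"), then reinterpret
    # them as a signed byte so values wrap mod 256
    b = x & 0xFF
    return b - 256 if b >= 128 else b

print(wrap_to_int8(300))   # 44   (300 - 256)
print(wrap_to_int8(-300))  # -44
print(wrap_to_int8(128))   # -128
```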
qazal
2fd068ffc0
delete empty op (#8544)
* simple delete EMPTY op
* there's no schedule for empty
2025-01-09 14:10:15 -05:00
qazal
f6eb0574f2
start tests for putting the tensor graph in a single kernel [pr] (#8542)
* start tests for putting the tensor graph in a single kernel [pr]
* parallel actually
* better view_left test
* test a softmax
* put all that in sym
2025-01-09 13:33:21 -05:00
qazal
83a8217cbf
hotfix: TRACK_MATCH_STATS=2 should not launch viz [pr] (#8543)
2025-01-09 11:10:15 -05:00
qazal
1efb1188d8
support pickling a realized BUFFER uop [pr] (#8541)
* try 2 at this diff
* process replay
* delete uops from buffer
* free buffers
* test_pickle_buffer_uop
2025-01-09 06:37:22 -05:00
qazal
7595352dfc
refactor buffer_view op structure [pr] (#8540)
* refactor buffer_view op [pr]
* only empty now
* same st
* empty shape is fine
2025-01-09 03:07:46 -05:00
eliotgolding
4c5c32ff5f
Small bug in _reshape_mask (#8538)
2025-01-08 22:11:24 -05:00
nimlgen
aa3d612df2
add script to install amd mockgpu on macOS (#8536)
* upload artifact every time
* hm
* sh script
* hm
* hm2
* hm2
* hm2
* no sudo
* def paths
* small comments
* text
* try auth for bigger limits
2025-01-09 01:29:25 +03:00
nimlgen
31fcfe764d
adjust hcq test for ci macos (#8534)
2025-01-08 16:18:31 +03:00
qazal
49abe6d3a6
little more compact tensor_uop_spec [pr] (#8533)
* little more compact tensor_uop_spec [pr]
* space
* fix
2025-01-08 08:01:53 -05:00
patrini32
21c7d7c71a
MOCKGPU amd test on OSX (#8505)
* add tests
* Refactor
* cache only amd/comgr/build (saves a lot of space)
* fix
* silence warning and add check for cache hit before installing cmake
* run only pytest
* use actions/cache
* lower timeout-minutes and add Device.DEFAULT step
* add nvidia to Device.DEFAULT check
* typo
* fix
* Check only for amd and run only 2 tests
2025-01-08 14:27:56 +03:00