eliotgolding
867004fbeb
use unravel in views_to_indexed_uops [pr] ( #8560 )
...
* use unravel in shape
* make process replay work
* earlier View.minify()
* fix
* fix tests
* mypy
* get rid of early minify
* fix
* linter
* clean and add test
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-12 10:25:55 -05:00
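A minimal pure-Python sketch of the unravel idea referenced in the commit above: turning a flat index into per-dimension coordinates for a given shape, with the same semantics as np.unravel_index (the names here are illustrative, not tinygrad's exact API).

    def unravel(shape, idx):
        coords = []
        for dim in reversed(shape):      # peel off the fastest-moving axis first
            coords.append(idx % dim)
            idx //= dim
        return tuple(reversed(coords))

    assert unravel((2, 3, 4), 11) == (0, 2, 3)   # 0*12 + 2*4 + 3 == 11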
nimlgen
38b5ac4d4a
mypy for mockgpu/cuda & dsp/run ( #8575 )
2025-01-12 18:25:39 +03:00
chenyu
def90b22f6
EVAL_BS=36 for bert [pr] ( #8576 )
...
3X faster eval compared to BS=6.
green https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/ka5p5sm9/overview
red https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/a7maxsxd/overview
2025-01-12 09:43:56 -05:00
qazal
ae241e96db
fix half4 on qcom and gpu ( #8573 )
...
* add test_setitem_half
* this fixes comma benchmark
2025-01-12 06:23:05 -05:00
qazal
cff1ee9038
add SINK folding from the tensor_map branch [pr] ( #8562 )
...
* delete is_constant from the scheduler
* add sink folding
* always give BUFFER uops Buffers [pr]
* spec for view, var (bind) and const
* add test_buffer_only_after_realize
* work
* 3 lines
* more work
2025-01-12 03:39:34 -05:00
qazal
87cbff3ac0
always give BUFFER uops Buffers [pr] ( #8572 )
...
* always give BUFFER uops Buffers [pr]
* add test_buffer_only_after_realize
2025-01-11 23:17:09 +02:00
qazal
98c9e23560
remove global PYTHONPATH setting in CI (test.yml) [pr] ( #8568 )
...
* remove global PYTHONPATH setting in CI [pr]
* only run mypy in tinygrad/
* still needed for benchmarks
2025-01-11 12:47:50 -05:00
geohotstan
815c505e1d
fixes from adapting tvm tests ( #8570 )
2025-01-11 11:38:36 -05:00
qazal
79738d768c
do not require PYTHONPATH=. for process replay [pr] ( #8567 )
2025-01-11 09:45:34 -05:00
qazal
a70d1bf439
move print_diff to process replay [pr] ( #8566 )
...
* move print_diff to process replay [pr]
* ruff rightfully complains
2025-01-11 09:28:45 -05:00
nimlgen
2f0856c1e2
qcom: use hwinterface ( #8565 )
...
* qcom: use hwinterface
* ops
* not needed anymore
2025-01-11 17:11:23 +03:00
qazal
60503c8621
use CAPTURE_PROCESS_REPLAY=1 in CI [pr] ( #8564 )
2025-01-11 06:03:48 -05:00
nimlgen
61665a63c9
am logs to debug2 ( #8563 )
2025-01-11 13:33:18 +03:00
George Hotz
c7acd40574
more aggressive onnx const creation [pr] ( #8561 )
2025-01-10 17:38:32 -08:00
ignaciosica
8891495996
minor arg spec check on wmma ( #8525 )
2025-01-10 15:42:56 -08:00
chenyu
d09897c2aa
allow double copy [pr] ( #8559 )
...
fixed the ring allreduce pattern and recovered most of the bert step time regression (10% faster); will double-check all benchmarks
2025-01-10 18:21:01 -05:00
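For context on the ring allreduce pattern mentioned above, here is a pure-Python simulation of the standard reduce-scatter + allgather schedule over N ranks; it is illustrative only, not the tinygrad copy path touched by this commit.

    def ring_allreduce(data):  # data[rank] = list of N chunks (numbers)
        n = len(data)
        # reduce-scatter: after n-1 steps rank r owns the fully reduced chunk (r+1) % n
        for step in range(n - 1):
            sends = [(r, (r - step) % n, data[r][(r - step) % n]) for r in range(n)]
            for src, chunk, val in sends:
                data[(src + 1) % n][chunk] += val
        # allgather: circulate each fully reduced chunk once around the ring
        for step in range(n - 1):
            sends = [(r, (r + 1 - step) % n, data[r][(r + 1 - step) % n]) for r in range(n)]
            for src, chunk, val in sends:
                data[(src + 1) % n][chunk] = val
        return data

    ranks = ring_allreduce([[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400], [1000, 2000, 3000, 4000]])
    assert all(r == [1111, 2222, 3333, 4444] for r in ranks)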
George Hotz
70fa65cd95
viz fixups + scheduler option [pr] ( #8557 )
2025-01-10 15:09:31 -08:00
nimlgen
f457cb64d6
am: do not reload fw each run ( #8466 )
...
* am do not reload fw each run
* works
* comment this
* clean + comment
* warn message
* linter
* move out pci en master
* useless
* more correct
* oops
* oops
2025-01-10 23:33:38 +03:00
nimlgen
337328e409
am: fini gpu after use ( #8556 )
...
* am: fini gpu after use
* mypy
2025-01-10 21:02:34 +03:00
chenyu
6a7f971fa0
hotfix max(DEBUG, 2) -> max(DEBUG.value, 2) [pr] ( #8553 )
2025-01-10 12:57:44 -05:00
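A minimal sketch of why the hotfix above is needed, using a hypothetical IntVar wrapper standing in for an env-var holder like DEBUG (the class is made up for illustration): max over the wrapper can return the wrapper object itself rather than an int, so the comparison has to be done on .value.

    class IntVar:
        def __init__(self, value): self.value = value
        def __gt__(self, other): return self.value > int(other)
        def __lt__(self, other): return self.value < int(other)

    DEBUG = IntVar(4)
    assert isinstance(max(DEBUG, 2), IntVar)   # returns the wrapper, not 4
    assert max(DEBUG.value, 2) == 4            # the hotfix: compare the plain int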
George Hotz
cd4edc5206
hotfix: pylint ignores runtime for speed
2025-01-10 09:07:18 -08:00
nimlgen
92b59c9b7a
test_hcq limits for mockgpu not (only) ci ( #8555 )
...
* test_hcq limits for mockgpu not (only) ci
* rm CI
2025-01-10 17:37:28 +03:00
George Hotz
9833fe83d8
more work on onnx imagenet [pr] ( #8552 )
...
* more work on onnx imagenet [pr]
* working quantization
* static quant
* benchmark onnx 0 dim
2025-01-09 20:28:18 -08:00
George Hotz
e172b759f0
more working ( #8550 )
2025-01-09 18:40:08 -08:00
chenyu
2cbb34535c
simpler allreduce script [pr] ( #8551 )
...
time everything at the tensor level and get the time from GlobalCounters.time_sum_s
2025-01-09 21:38:13 -05:00
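A minimal sketch of the timing approach this commit describes: run the op at the tensor level, then read the accumulated kernel time from GlobalCounters.time_sum_s. Kernel timings are only collected when DEBUG >= 2, so the Context below is an assumption about how the script runs; this is not the benchmark script itself.

    from tinygrad import Tensor
    from tinygrad.helpers import Context, GlobalCounters

    with Context(DEBUG=2):
        GlobalCounters.reset()
        out = (Tensor.rand(1024, 1024) @ Tensor.rand(1024, 1024)).realize()
        print(f"kernel time: {GlobalCounters.time_sum_s * 1e3:.3f} ms")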
chenyu
23c56817d8
update and clean up allreduce script [pr] ( #8549 )
...
make `run` able to run with ring only
2025-01-09 19:35:28 -05:00
George Hotz
5720871903
onnx consts are const [pr] ( #8548 )
2025-01-09 16:09:22 -08:00
chenyu
88661cd96f
fix checking DiskBuffer is opened [pr] ( #8547 )
...
`assert self.device.mem is not None` never triggered because accessing `.mem` raises AttributeError first
2025-01-09 18:58:36 -05:00
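A small self-contained sketch of the bug described above (the class is hypothetical, not tinygrad's DiskBuffer): when the attribute only exists after the device is opened, the attribute lookup inside the assert raises AttributeError before the `is not None` check can ever fail, so the assert never does its job.

    class DiskDevice:
        def open(self):
            self.mem = bytearray(16)       # .mem only exists after open()

    dev = DiskDevice()                     # not opened yet
    try:
        assert dev.mem is not None         # never reaches AssertionError...
    except AttributeError:
        pass                               # ...because looking up .mem raises first
    assert not hasattr(dev, "mem")         # a hasattr-style check fires as intended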
George Hotz
62447c253d
viz cleanups [pr] ( #8498 )
...
* viz cleanups [pr]
* Update serve.py
2025-01-09 15:46:48 -08:00
geohotstan
299d333806
Add QLinearConv, QLinearMatMul, QLinearAdd, QLinearGlobalAveragePool to onnx ( #8478 )
...
* QLinearEverything
* ok ort verify passes
* this should be int instead
* cast to int then char to do wraparound
* cleaner
* move contrib ops to microsoft ops
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-09 15:08:53 -08:00
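The "cast to int then char to do wraparound" step in the commit body refers to the usual quantization trick of accumulating in a wide integer and then narrowing to int8 so out-of-range values wrap modulo 256; a small pure-Python illustration (not the onnx QLinear* implementation itself):

    def wrap_to_int8(x):
        return ((x + 128) % 256) - 128     # two's-complement wrap into [-128, 127]

    assert wrap_to_int8(130) == -126
    assert wrap_to_int8(-200) == 56
    assert wrap_to_int8(300) == 44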
qazal
2fd068ffc0
delete empty op ( #8544 )
...
* simple delete EMPTY op
* there's no schedule for empty
2025-01-09 14:10:15 -05:00
qazal
f6eb0574f2
start tests for putting the tensor graph in a single kernel [pr] ( #8542 )
...
* start tests for putting the tensor graph in a single kernel [pr]
* parallel actually
* better view_left test
* test a softmax
* put all that in sym
2025-01-09 13:33:21 -05:00
qazal
83a8217cbf
hotfix: TRACK_MATCH_STATS=2 should not launch viz [pr] ( #8543 )
2025-01-09 11:10:15 -05:00
qazal
1efb1188d8
support pickling a realized BUFFER uop [pr] ( #8541 )
...
* try 2 at this diff
* process replay
* delete uops from buffer
* free buffers
* test_pickle_buffer_uop
2025-01-09 06:37:22 -05:00
qazal
7595352dfc
refactor buffer_view op structure [pr] ( #8540 )
...
* refactor buffer_view op [pr]
* only empty now
* same st
* empty shape is fine
2025-01-09 03:07:46 -05:00
eliotgolding
4c5c32ff5f
Small bug in _reshape_mask ( #8538 )
2025-01-08 22:11:24 -05:00
nimlgen
aa3d612df2
add script to install amd mockgpu on macOS ( #8536 )
...
* upload artifact every time
* hm
* sh script
* hm
* hm2
* hm2
* hm2
* no sudo
* def paths
* small comments
* text
* try auth for bigger limits
2025-01-09 01:29:25 +03:00
nimlgen
31fcfe764d
adjust hcq test for ci macos ( #8534 )
2025-01-08 16:18:31 +03:00
qazal
49abe6d3a6
little more compact tensor_uop_spec [pr] ( #8533 )
...
* little more compact tensor_uop_spec [pr]
* space
* fix
2025-01-08 08:01:53 -05:00
patrini32
21c7d7c71a
MOCKGPU amd test on OSX ( #8505 )
...
* add tests
* Refactor
* cache only amd/comgr/build (saves a lot of space)
* fix
* silence warning and add check for cache hit before installing cmake
* run only pytest
* use actions/cache
* lower timeout-minutes and add Device.DEFAULT step
* add nvidia to Device.DEFAULT check
* typo
* fix
* Check only for amd and run only 2 test
2025-01-08 14:27:56 +03:00
nimlgen
2f530adb04
hwiface: close fd when valid ( #8530 )
2025-01-08 10:43:59 +03:00
qazal
947de23cac
add VIEW(DEVICE) to tensor variable [pr] ( #8529 )
...
* add VIEW(DEVICE) to tensor variable [pr]
* bind 2
* restrict shapetracker
* move var and bind closer
* one less line
2025-01-08 01:39:42 -05:00
qazal
b22494b710
restrict tensor const ShapeTracker in spec [pr] ( #8447 )
...
* restrict tensor const ShapeTracker in spec [pr]
* pass sink srcs
* reject if any of the specs disagree
* deceive mypy
* viz
* default to float
* just check the view
* create_schedule is gone
* test_verify_arg is flaky
2025-01-07 19:05:11 -05:00
patrini32
afef69a37d
MOCKGPU on mac os ( #8520 )
...
* tweaks for macos
* fix
* fix
* typo
* remove nvidia changes
* remove nv related changes
* change address back
2025-01-07 20:27:43 +03:00
nimlgen
ab3ac2b58d
hw interface abstraction ( #8524 )
...
* use HWInterface in autogen
* mockgpu
* HWInterface
* more HWInterface
* fix
* fix
* old code
* fix
* implicit field definition
* add offset check to mockgpu too
* refactor
* forgot to pass flags + read rewrite
* test
* play with vfio
* nv: this should be kept
* try this
* vfio
* rm overwrite=True
* linter
* do not reinit kfd
* minor
* mypy
* mock
* init them once
---------
Co-authored-by: patrini32 <patrini23@proton.me>
2025-01-07 18:18:28 +03:00
qazal
0e97f807e0
test fixup prereqs for delete_buffer_view [pr] ( #8523 )
2025-01-07 11:52:18 +02:00
chenyu
85a4397f27
fix create_schedule_with_vars usage in allreduce benchmark [pr] ( #8522 )
...
* fix create_schedule_with_vars usage in allreduce benchmark [pr]
because I didn't know how to use it...
* increase time limit because tiny17 is slow
2025-01-07 01:30:01 -05:00
chenyu
0061dc7447
fix benchmark allreduce and add to ci [pr] ( #8521 )
2025-01-07 00:37:59 -05:00
geohotstan
c69f459c96
Add checking variable dimension to onnx ( #8518 )
...
* validate variable dims and fix buffer_parse to not use numpy
* fix var_dim parsing
* gah float16
* revert buffer_parse stuff
* revert that revert
* correct some error messages
* add some more debug msgs I find helpful
* tensor init noop
* add an assert just for the sake of it.
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-07 00:30:35 -05:00
nimlgen
5cb9443ebb
PROFILE is enabled when VIZ is enabled ( #8516 )
2025-01-06 19:47:16 +03:00