tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-23 22:08:08 -05:00

Author	SHA1	Message	Date
Francis Lata	76a03e950a	make kits19 dataset samples have small sizes (#8591 )	2025-01-14 08:27:45 -08:00
qazal	5aab2806f0	rename to test_tensor_uop + use upats for asserting [pr] (#8604 ) * rename to test_tensor_uop + use upats for asserting [pr] * fix pr	2025-01-14 05:09:56 -05:00
qazal	863abc7140	scheduling graph_rewrite prereqs for BLOCK in ASSIGN (#8598 ) * remove the BUF_LIMIT assert * skip the base one * work * work * good error * ok comment * shorter check	2025-01-14 03:01:59 -05:00
chenyu	d443e91d82	remove custom splits in Tensor.shard [pr] (#8602 ) towards even split only	2025-01-13 21:29:13 -05:00
chenyu	c4e33048c6	test Tensor.clone has a different lazydata [pr] (#8600 )	2025-01-13 20:13:44 -05:00
qazal	ae2229d727	assert kernel buffer limit at compile time [pr] (#8595 ) * remove the BUF_LIMIT assert * skip the base one	2025-01-13 16:32:07 -05:00
geohotstan	4abe631b56	fix onnx mobilenetv2-7-quantized.onnx (#8574 ) * is 67% considered fixed? * move test up * share function * add qgemm too * make sure qgemm comes out as int * actually that note is not right * remove qgemm (I did it wrong) and add it later lol.	2025-01-13 09:25:06 -08:00
George Hotz	d19c1c7f03	bump 75 -> 73 for test failure	2025-01-13 09:18:38 -08:00
nimlgen	d224d0ed7f	nv: fix fault info (#8587 ) * nv: fix fault info * and emu for amd * skip if not mock	2025-01-13 14:38:43 +03:00
qazal	586e730d32	use UOp.st for kernel reduce axes (#8499 ) * use UOp.st for kernel reduce axes [pr] * do not return dict	2025-01-13 06:24:11 -05:00
qazal	7562cc0399	better test for reduce swizzle + don't use double dtype [pr] (#8586 ) * better test_permute_rewrite * use float32	2025-01-13 05:02:21 -05:00
George Hotz	4ac4c1415a	free intermediate buffers in the jit [pr] (#8581 ) * free intermediate buffers in the jit [pr] * intermediates_freed * deallocate if not allocated * self._first_run is simpler	2025-01-12 15:41:41 -08:00
George Hotz	d817dc10db	start on test rewrite map [pr] (#8432 ) * start on test rewrite map [pr] * chatgpt writes dumb tests * comment out failing * fix that test * fix gc issue * oh, frame 2 * remove uop mutability * map is only the map * simplier + more tests * test tiny passes * tests that need to pass * parent test passes * child test passes * remove uop mutability [pr] * test fixups * most tests pass * more tests pass * lil test fixups * them too * fix test * unneeded * err, that * fix test_hcq * fix test failures * fix that test * tensor universe * does this pass test * Revert "does this pass test" This reverts commit `ed516b3169`. * Revert "tensor universe" This reverts commit `c21301852a`. * test_mutate_add passes * this can pass * Revert "Merge remote-tracking branch 'origin/no_uop_mutability' into test_rewrite_map" This reverts commit `657822dcdc`, reversing changes made to `2a126c145b`. * Revert "test_mutate_add passes" This reverts commit `ab4fc4c78e`. * correct enough * remove test_rewrite_map_schedule.py * viz * uops are immutable --------- Co-authored-by: qazal <qazal.software@gmail.com>	2025-01-12 13:13:51 -05:00
qazal	cde18fddce	fix DEBUG=2 output for copy runners [pr] (#8579 ) * fix DEBUG=2 output for copy runners [pr] * itemsize is constant	2025-01-12 12:03:01 -05:00
eliotgolding	867004fbeb	use unravel in views_to_indexed_uops [pr] (#8560 ) * use unravel in shape * make process replay work * earlier View.minify() * fix * fix tests * mypy * get rid of early minify * fix * linter * clean and add test --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-01-12 10:25:55 -05:00
nimlgen	38b5ac4d4a	mypy for mockgpu/cuda & dsp/run (#8575 )	2025-01-12 18:25:39 +03:00
qazal	ae241e96db	fix half4 on qcom and gpu (#8573 ) * add test_setitem_half * this fixes comma benchmark	2025-01-12 06:23:05 -05:00
qazal	cff1ee9038	add SINK folding from the tensor_map branch [pr] (#8562 ) * delete is_constant from the scheduler * add sink folding * always give BUFFER uops Buffers [pr] * spec for view, var (bind) and const * add test_buffer_only_after_realize * work * 3 lines * more work	2025-01-12 03:39:34 -05:00
qazal	87cbff3ac0	always give BUFFER uops Buffers [pr] (#8572 ) * always give BUFFER uops Buffers [pr] * add test_buffer_only_after_realize	2025-01-11 23:17:09 +02:00
qazal	79738d768c	do not require PYTHONPATH=. for process replay [pr] (#8567 )	2025-01-11 09:45:34 -05:00
qazal	a70d1bf439	move print_diff to process replay [pr] (#8566 ) * move print_diff to process replay [pr] * ruff rightfully complians	2025-01-11 09:28:45 -05:00
qazal	60503c8621	use CAPTURE_PROCESS_REPLAY=1 in CI [pr] (#8564 )	2025-01-11 06:03:48 -05:00
chenyu	d09897c2aa	allow double copy [pr] (#8559 ) fixed ring allreduce pattern and recovered most of the bert step time regression (10% faster), will double check all benchmark	2025-01-10 18:21:01 -05:00
chenyu	6a7f971fa0	hotfix max(DEBUG, 2) -> max(DEBUG.value, 2) [pr] (#8553 )	2025-01-10 12:57:44 -05:00
nimlgen	92b59c9b7a	test_hcq limits for mockgpu not (only) ci (#8555 ) * test_hcq limits for mockgpu not (only) ci * rm CI	2025-01-10 17:37:28 +03:00
George Hotz	9833fe83d8	more work on onnx imagenet [pr] (#8552 ) * more work on onnx imagenet [pr] * working quantization * static quant * benchmark onnx 0 dim	2025-01-09 20:28:18 -08:00
chenyu	2cbb34535c	simpler allreduce script [pr] (#8551 ) time everything on tensor level and get time from GlobalCounters.time_sum_s	2025-01-09 21:38:13 -05:00
chenyu	23c56817d8	update and clean up allreduce script [pr] (#8549 ) make `run` to able to run with ring only	2025-01-09 19:35:28 -05:00
geohotstan	299d333806	Add QLinearConv, QLinearMatMul, QLinearAdd, QLinearGlobalAveragePool to onnx (#8478 ) * QLinearEverything * ok ort verify passes * this should be int instead * cast to int then char to do wraparound * cleaner * move contrib ops to microsoft ops --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-01-09 15:08:53 -08:00
qazal	2fd068ffc0	delete empty op (#8544 ) * simple delete EMPTY op * there's no schedule for empty	2025-01-09 14:10:15 -05:00
qazal	f6eb0574f2	start tests for putting the tensor graph in a single kernel [pr] (#8542 ) * start tests for putting the tensor graph in a single kernel [pr] * parallel actually * better view_left test * test a softmax * put all that in sym	2025-01-09 13:33:21 -05:00
qazal	1efb1188d8	support pickling a realized BUFFER uop [pr] (#8541 ) * try 2 at this diff * process replay * delete uops from buffer * free buffers * test_pickle_buffer_uop	2025-01-09 06:37:22 -05:00
eliotgolding	4c5c32ff5f	Small bug in _reshape_mask (#8538 )	2025-01-08 22:11:24 -05:00
nimlgen	aa3d612df2	add script to install amd mockgpu on macOS (#8536 ) * upload artifact every time * hm * sh script * hm * hm2 * hm2 * hm2 * no sudo * def paths * small comments * text * try auth for bigger limits	2025-01-09 01:29:25 +03:00
nimlgen	31fcfe764d	adjust hcq test for ci macos (#8534 )	2025-01-08 16:18:31 +03:00
qazal	947de23cac	add VIEW(DEVICE) to tensor variable [pr] (#8529 ) * add VIEW(DEVICE) to tensor variable [pr] * bind 2 * restrict shapetracker * move var and bind closer * one less line	2025-01-08 01:39:42 -05:00
qazal	b22494b710	restrict tensor const ShapeTracker in spec [pr] (#8447 ) * restrict tensor const ShapeTracker in spec [pr] * pass sink srcs * reject if any of the specs disagree * deceive mypy * viz * default to float * just check the view * create_schedule is gone * test_verify_arg is flaky	2025-01-07 19:05:11 -05:00
patrini32	afef69a37d	MOCKGPU on mac os (#8520 ) * tweaks for macos * fix * fix * typo * remove nvidia changes * remove nv related changes * change address back	2025-01-07 20:27:43 +03:00
nimlgen	ab3ac2b58d	hw interface abstraction (#8524 ) * use HWInterface in autogen * mockgpu * HWInterface * more HWInterface * fix * fix * old code * fix * implicit field definition * add offset check to mockgpu too * refactor * forgot to pass flags + read rewrite * test * play with vfio * nv: this should be kept * try this * vfio * rm overwrite=True * linetr * do not reinit kfd * minor * mypy * mock * init them once --------- Co-authored-by: patrini32 <patrini23@proton.me>	2025-01-07 18:18:28 +03:00
qazal	0e97f807e0	test fixup prereqs for delete_buffer_view [pr] (#8523 )	2025-01-07 11:52:18 +02:00
chenyu	85a4397f27	fix create_schedule_with_vars usage in allreduce benchmark [pr] (#8522 ) * fix create_schedule_with_vars usage in allreduce benchmark [pr] because i didn't know how to use it... * increase time limit because tiny17 is slow	2025-01-07 01:30:01 -05:00
chenyu	0061dc7447	fix benchmark allreduce and add to ci [pr] (#8521 )	2025-01-07 00:37:59 -05:00
qazal	ed618a72e7	do not use subbuffer for bitcast (#8514 ) * do not use subbuffer for bitcast * edit that test * explicit test for ptx * ptx	2025-01-06 18:40:46 +02:00
qazal	547fd5078f	cleanups for COPY uop implementation and spec [pr] (#8513 )	2025-01-06 11:39:12 +02:00
qazal	ed121d235c	spec for CAST_BEFORE_VIEW=1 [pr] (#8512 )	2025-01-06 10:43:58 +02:00
qazal	eb7df92136	dedup COPY UOp [pr] (#8506 )	2025-01-06 10:37:20 +02:00
geohotstan	9229867fec	Support asymmetrical pads for all pooling functions (#8109 ) * implemented in tensor * apply onnx tests to asymmetrical pads * better onnx op ordering * correct ceil_mode asymmetrical * fix onnx_ops comments * a few more TODOs and fix some stupidity * fix some typing * fix test * mypy still a little messed up * refactor out pad struct transformation * add simple docs for now * add whatever tests possible * add tests for _resolve_pool_pads * better err msg * whoops didn't mean to include this * retry CI * enable asymmetric pads onnx tests * better docs --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-01-05 16:01:08 -05:00
nimlgen	9bc317d5d2	mockcuda (#8503 ) * init mockcuda * run gpu ocelot * fix * sfixes * disable broken tests * linter * these fails as well * pylint * myypy * this fails on real platforms as well * mypy please	2025-01-05 01:23:57 +03:00
qazal	036efa9157	use UOp.substitute for VIZ=1 [pr] (#8497 ) * use UOp.substitute for VIZ=1 [pr] * more acceptable	2025-01-04 20:00:29 +02:00
geohotstan	3dfc8e1706	Share a _resolve_pool_pads function for pool ops in Tensor (#8485 ) * _padding2d -> _resolve_pool_pads * rephrase err msg * even better error msg * check asymmetric first os people don't hit error twice * test against torch	2025-01-03 23:54:11 -05:00

3206 Commits