Commit Graph

7464 Commits

Author SHA1 Message Date
George Hotz
cd4edc5206 hotfix: pylint ignores runtime for speed 2025-01-10 09:07:18 -08:00
nimlgen
92b59c9b7a test_hcq limits for mockgpu not (only) ci (#8555)
* test_hcq limits for mockgpu not (only) ci

* rm CI
2025-01-10 17:37:28 +03:00
George Hotz
9833fe83d8 more work on onnx imagenet [pr] (#8552)
* more work on onnx imagenet [pr]

* working quantization

* static quant

* benchmark onnx 0 dim
2025-01-09 20:28:18 -08:00
George Hotz
e172b759f0 more working (#8550) 2025-01-09 18:40:08 -08:00
chenyu
2cbb34535c simpler allreduce script [pr] (#8551)
time everything at the tensor level and get the time from GlobalCounters.time_sum_s (a minimal sketch of this pattern follows this entry)
2025-01-09 21:38:13 -05:00
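A minimal sketch of that timing pattern, using an illustrative matmul workload rather than the benchmark's actual allreduce, and assuming kernel times are being recorded (e.g. with DEBUG=2):

```python
# Sketch only: diff GlobalCounters.time_sum_s around a realized tensor op.
# The matmul is a stand-in workload, not the allreduce from the script.
from tinygrad import Tensor
from tinygrad.helpers import GlobalCounters

start = GlobalCounters.time_sum_s
out = (Tensor.rand(1024, 1024) @ Tensor.rand(1024, 1024)).realize()
print(f"kernel time: {GlobalCounters.time_sum_s - start:.6f}s")
```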
chenyu
23c56817d8 update and clean up allreduce script [pr] (#8549)
make `run` able to run with ring only
2025-01-09 19:35:28 -05:00
George Hotz
5720871903 onnx consts are const [pr] (#8548) 2025-01-09 16:09:22 -08:00
chenyu
88661cd96f fix checking DiskBuffer is opened [pr] (#8547)
`assert self.device.mem is not None` never fired because accessing `.mem` raises AttributeError first (see the sketch after this entry)
2025-01-09 18:58:36 -05:00
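A minimal sketch of the pitfall, with hypothetical FakeDevice/FakeBuffer stand-ins for the real classes:

```python
# Hypothetical names; illustrates why the assert never fired:
# the attribute access raises AttributeError before the assert can evaluate.
class FakeDevice:
    pass  # .mem is only assigned once the device is actually opened

class FakeBuffer:
    def __init__(self, device):
        self.device = device

buf = FakeBuffer(FakeDevice())
try:
    assert buf.device.mem is not None  # AttributeError, never AssertionError
except AttributeError as e:
    print("assert never fired:", e)

# a hasattr check does fire before the device is opened
if not hasattr(buf.device, "mem"):
    print("device is not opened")
```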
George Hotz
62447c253d viz cleanups [pr] (#8498)
* viz cleanups [pr]

* Update serve.py
2025-01-09 15:46:48 -08:00
geohotstan
299d333806 Add QLinearConv, QLinearMatMul, QLinearAdd, QLinearGlobalAveragePool to onnx (#8478)
* QLinearEverything

* ok ort verify passes

* this should be int instead

* cast to int then char to do wraparound

* cleaner

* move contrib ops to microsoft ops

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-09 15:08:53 -08:00
qazal
2fd068ffc0 delete empty op (#8544)
* simple delete EMPTY op

* there's no schedule for empty
2025-01-09 14:10:15 -05:00
qazal
f6eb0574f2 start tests for putting the tensor graph in a single kernel [pr] (#8542)
* start tests for putting the tensor graph in a single kernel [pr]

* parallel actually

* better view_left test

* test a softmax

* put all that in sym
2025-01-09 13:33:21 -05:00
qazal
83a8217cbf hotfix: TRACK_MATCH_STATS=2 should not launch viz [pr] (#8543) 2025-01-09 11:10:15 -05:00
qazal
1efb1188d8 support pickling a realized BUFFER uop [pr] (#8541)
* try 2 at this diff

* process replay

* delete uops from buffer

* free buffers

* test_pickle_buffer_uop
2025-01-09 06:37:22 -05:00
qazal
7595352dfc refactor buffer_view op structure [pr] (#8540)
* refactor buffer_view op [pr]

* only empty now

* same st

* empty shape is fine
2025-01-09 03:07:46 -05:00
eliotgolding
4c5c32ff5f Small bug in _reshape_mask (#8538) 2025-01-08 22:11:24 -05:00
nimlgen
aa3d612df2 add script to install amd mockgpu on macOS (#8536)
* upload artifact every time

* hm

* sh script

* hm

* hm2

* hm2

* hm2

* no sudo

* def paths

* small comments

* text

* try auth for bigger limits
2025-01-09 01:29:25 +03:00
nimlgen
31fcfe764d adjust hcq test for ci macos (#8534) 2025-01-08 16:18:31 +03:00
qazal
49abe6d3a6 little more compact tensor_uop_spec [pr] (#8533)
* little more compact tensor_uop_spec [pr]

* space

* fix
2025-01-08 08:01:53 -05:00
patrini32
21c7d7c71a MOCKGPU amd test on OSX (#8505)
* add tests

* Refactor

* cache only amd/comgr/build (saves a lot of space)

* fix

* silence warning and add check for cache hit before installing cmake

* run only pytest

* use actions/cache

* lower timeout-minutes and add Device.DEFAULT step

* add nvidia to Device.DEFAULT check

* typo

* fix

* Check only for amd and run only 2 test
2025-01-08 14:27:56 +03:00
nimlgen
2f530adb04 hwiface: close fd when valid (#8530) 2025-01-08 10:43:59 +03:00
qazal
947de23cac add VIEW(DEVICE) to tensor variable [pr] (#8529)
* add VIEW(DEVICE) to tensor variable [pr]

* bind 2

* restrict shapetracker

* move var and bind closer

* one less line
2025-01-08 01:39:42 -05:00
qazal
b22494b710 restrict tensor const ShapeTracker in spec [pr] (#8447)
* restrict tensor const ShapeTracker in spec [pr]

* pass sink srcs

* reject if any of the specs disagree

* deceive mypy

* viz

* default to float

* just check the view

* create_schedule is gone

* test_verify_arg is flaky
2025-01-07 19:05:11 -05:00
patrini32
afef69a37d MOCKGPU on mac os (#8520)
* tweaks for macos

* fix

* fix

* typo

* remove nvidia changes

* remove nv related changes

* change address back
2025-01-07 20:27:43 +03:00
nimlgen
ab3ac2b58d hw interface abstraction (#8524)
* use HWInterface in autogen

* mockgpu

* HWInterface

* more HWInterface

* fix

* fix

* old code

* fix

* implicit field definition

* add offset check to mockgpu too

* refactor

* forgot to pass flags + read rewrite

* test

* play with vfio

* nv: this should be kept

* try this

* vfio

* rm overwrite=True

* linter

* do not reinit kfd

* minor

* mypy

* mock

* init them once

---------

Co-authored-by: patrini32 <patrini23@proton.me>
2025-01-07 18:18:28 +03:00
qazal
0e97f807e0 test fixup prereqs for delete_buffer_view [pr] (#8523) 2025-01-07 11:52:18 +02:00
chenyu
85a4397f27 fix create_schedule_with_vars usage in allreduce benchmark [pr] (#8522)
* fix create_schedule_with_vars usage in allreduce benchmark [pr]

because I didn't know how to use it...

* increase time limit because tiny17 is slow
2025-01-07 01:30:01 -05:00
chenyu
0061dc7447 fix benchmark allreduce and add to ci [pr] (#8521) 2025-01-07 00:37:59 -05:00
geohotstan
c69f459c96 Add checking variable dimension to onnx (#8518)
* validate variable dims and fix buffer_parse to not use numpy

* fix var_dim parsing

* gah float16

* revert buffer_parse stuff

* revert that revert

* correct some err msgs

* add some more debug msgs I find helpful

* tensor init noop

* add an assert just for the sake of it.

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-07 00:30:35 -05:00
nimlgen
5cb9443ebb PROFILE is enabled when VIZ is enabled (#8516) 2025-01-06 19:47:16 +03:00
qazal
ed618a72e7 do not use subbuffer for bitcast (#8514)
* do not use subbuffer for bitcast

* edit that test

* explicit test for ptx

* ptx
2025-01-06 18:40:46 +02:00
nimlgen
280143467b am: tune all sleep timings to match kernel (#8515)
* am: tune all sleep timings to match kernel

* rm
2025-01-06 18:03:57 +03:00
qazal
547fd5078f cleanups for COPY uop implementation and spec [pr] (#8513) 2025-01-06 11:39:12 +02:00
qazal
ed121d235c spec for CAST_BEFORE_VIEW=1 [pr] (#8512) 2025-01-06 10:43:58 +02:00
qazal
eb7df92136 dedup COPY UOp [pr] (#8506) 2025-01-06 10:37:20 +02:00
chenyu
76a138cdb6 simpler UOp.st [pr] (#8510) 2025-01-05 22:08:14 -05:00
chenyu
b6be407bc6 fix handcode_opt bert [pr] (#8509)
* fix handcode_opt bert [pr]

* too slow
2025-01-05 19:14:12 -05:00
geohotstan
9229867fec Support asymmetrical pads for all pooling functions (#8109)
* implemented in tensor

* apply onnx tests to asymmetrical pads

* better onnx op ordering

* correct ceil_mode asymmetrical

* fix onnx_ops comments

* a few more TODOs and fix some stupidity

* fix some typing

* fix test

* mypy still a little messed up

* refactor out pad struct transformation

* add simple docs for now

* add whatever tests possible

* add tests for _resolve_pool_pads

* better err msg

* whoops didn't mean to include this

* retry CI

* enable asymmetric pads onnx tests

* better docs

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-05 16:01:08 -05:00
uuuvn
c9c7f1be46 Remove unused R_AARCH64_CALL26 relocation (#8508)
The first iteration of the AMX fix used a symbol lookup + trampoline
approach, which required this relocation. I later replaced it by marking
the AMX function `static`, assuming the relocation would still be used
when the callee wasn't inlined. That turned out not to be the case:
a `static` callee can't be moved around by the linker at link time and
can't be overridden by other symbols (`static` means priority + local
visibility).
2025-01-06 00:00:21 +03:00
nimlgen
b4f4a3ac12 am: minor parts (#8507) 2025-01-05 23:05:21 +03:00
qazal
0e0cba2cfc move llvm_bf16_cast to the renderer [pr] (#8502)
* move llvm_bf16_cast to the renderer [pr]

* cast to half is fine too

* delete the old one

* wish i could just cast the ptr
2025-01-05 13:02:41 +02:00
chenyu
4143f6a7d9 unused from __future__ import annotations [pr] (#8504) 2025-01-04 23:11:01 -05:00
nimlgen
9bc317d5d2 mockcuda (#8503)
* init mockcuda

* run gpu ocelot

* fix

* sfixes

* disable broken tests

* linter

* these fails as well

* pylint

* mypy

* this fails on real platforms as well

* mypy please
2025-01-05 01:23:57 +03:00
George Hotz
ddad4d55da add typing to tqdm [pr] (#8500) 2025-01-04 13:55:52 -05:00
qazal
036efa9157 use UOp.substitute for VIZ=1 [pr] (#8497)
* use UOp.substitute for VIZ=1 [pr]

* more acceptable
2025-01-04 20:00:29 +02:00
uuuvn
615d5276b1 Suppress 'X warnings generated.' in MTLCompiler (#8489)
'-fno-caret-diagnostics' is what clang-tidy uses when the user passes --quiet
2025-01-04 10:22:37 -05:00
nimlgen
5df213d51e am: remove alloc frags logic (#8491) 2025-01-04 12:25:20 +03:00
geohotstan
3dfc8e1706 Share a _resolve_pool_pads function for pool ops in Tensor (#8485)
* _padding2d -> _resolve_pool_pads

* rephrase err msg

* even better error msg

* check asymmetric first so people don't hit the error twice

* test against torch
2025-01-03 23:54:11 -05:00
chenyu
6c639dee5c more informative kernel opt error messages [pr] (#8487) 2025-01-03 14:29:36 -05:00
uuuvn
5ffc50d58c Clang JIT (#8481)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-03 11:12:55 -05:00