tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-24 06:18:01 -05:00

Author	SHA1	Message	Date
chenyu	6a7f971fa0	hotfix max(DEBUG, 2) -> max(DEBUG.value, 2) [pr] (#8553 )	2025-01-10 12:57:44 -05:00
nimlgen	92b59c9b7a	test_hcq limits for mockgpu not (only) ci (#8555 ) * test_hcq limits for mockgpu not (only) ci * rm CI	2025-01-10 17:37:28 +03:00
George Hotz	9833fe83d8	more work on onnx imagenet [pr] (#8552 ) * more work on onnx imagenet [pr] * working quantization * static quant * benchmark onnx 0 dim	2025-01-09 20:28:18 -08:00
chenyu	2cbb34535c	simpler allreduce script [pr] (#8551 ) time everything on tensor level and get time from GlobalCounters.time_sum_s	2025-01-09 21:38:13 -05:00
chenyu	23c56817d8	update and clean up allreduce script [pr] (#8549 ) make `run` able to run with ring only	2025-01-09 19:35:28 -05:00
geohotstan	299d333806	Add QLinearConv, QLinearMatMul, QLinearAdd, QLinearGlobalAveragePool to onnx (#8478 ) * QLinearEverything * ok ort verify passes * this should be int instead * cast to int then char to do wraparound * cleaner * move contrib ops to microsoft ops --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-01-09 15:08:53 -08:00
qazal	2fd068ffc0	delete empty op (#8544 ) * simple delete EMPTY op * there's no schedule for empty	2025-01-09 14:10:15 -05:00
qazal	f6eb0574f2	start tests for putting the tensor graph in a single kernel [pr] (#8542 ) * start tests for putting the tensor graph in a single kernel [pr] * parallel actually * better view_left test * test a softmax * put all that in sym	2025-01-09 13:33:21 -05:00
qazal	1efb1188d8	support pickling a realized BUFFER uop [pr] (#8541 ) * try 2 at this diff * process replay * delete uops from buffer * free buffers * test_pickle_buffer_uop	2025-01-09 06:37:22 -05:00
eliotgolding	4c5c32ff5f	Small bug in _reshape_mask (#8538 )	2025-01-08 22:11:24 -05:00
nimlgen	aa3d612df2	add script to install amd mockgpu on macOS (#8536 ) * upload artifact every time * hm * sh script * hm * hm2 * hm2 * hm2 * no sudo * def paths * small comments * text * try auth for bigger limits	2025-01-09 01:29:25 +03:00
nimlgen	31fcfe764d	adjust hcq test for ci macos (#8534 )	2025-01-08 16:18:31 +03:00
qazal	947de23cac	add VIEW(DEVICE) to tensor variable [pr] (#8529 ) * add VIEW(DEVICE) to tensor variable [pr] * bind 2 * restrict shapetracker * move var and bind closer * one less line	2025-01-08 01:39:42 -05:00
qazal	b22494b710	restrict tensor const ShapeTracker in spec [pr] (#8447 ) * restrict tensor const ShapeTracker in spec [pr] * pass sink srcs * reject if any of the specs disagree * deceive mypy * viz * default to float * just check the view * create_schedule is gone * test_verify_arg is flaky	2025-01-07 19:05:11 -05:00
patrini32	afef69a37d	MOCKGPU on mac os (#8520 ) * tweaks for macos * fix * fix * typo * remove nvidia changes * remove nv related changes * change address back	2025-01-07 20:27:43 +03:00
nimlgen	ab3ac2b58d	hw interface abstraction (#8524 ) * use HWInterface in autogen * mockgpu * HWInterface * more HWInterface * fix * fix * old code * fix * implicit field definition * add offset check to mockgpu too * refactor * forgot to pass flags + read rewrite * test * play with vfio * nv: this should be kept * try this * vfio * rm overwrite=True * linetr * do not reinit kfd * minor * mypy * mock * init them once --------- Co-authored-by: patrini32 <patrini23@proton.me>	2025-01-07 18:18:28 +03:00
qazal	0e97f807e0	test fixup prereqs for delete_buffer_view [pr] (#8523 )	2025-01-07 11:52:18 +02:00
chenyu	85a4397f27	fix create_schedule_with_vars usage in allreduce benchmark [pr] (#8522 ) * fix create_schedule_with_vars usage in allreduce benchmark [pr] because i didn't know how to use it... * increase time limit because tiny17 is slow	2025-01-07 01:30:01 -05:00
chenyu	0061dc7447	fix benchmark allreduce and add to ci [pr] (#8521 )	2025-01-07 00:37:59 -05:00
qazal	ed618a72e7	do not use subbuffer for bitcast (#8514 ) * do not use subbuffer for bitcast * edit that test * explicit test for ptx * ptx	2025-01-06 18:40:46 +02:00
qazal	547fd5078f	cleanups for COPY uop implementation and spec [pr] (#8513 )	2025-01-06 11:39:12 +02:00
qazal	ed121d235c	spec for CAST_BEFORE_VIEW=1 [pr] (#8512 )	2025-01-06 10:43:58 +02:00
qazal	eb7df92136	dedup COPY UOp [pr] (#8506 )	2025-01-06 10:37:20 +02:00
geohotstan	9229867fec	Support asymmetrical pads for all pooling functions (#8109 ) * implemented in tensor * apply onnx tests to asymmetrical pads * better onnx op ordering * correct ceil_mode asymmetrical * fix onnx_ops comments * a few more TODOs and fix some stupidity * fix some typing * fix test * mypy still a little messed up * refactor out pad struct transformation * add simple docs for now * add whatever tests possible * add tests for _resolve_pool_pads * better err msg * whoops didn't mean to include this * retry CI * enable asymmetric pads onnx tests * better docs --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-01-05 16:01:08 -05:00
nimlgen	9bc317d5d2	mockcuda (#8503 ) * init mockcuda * run gpu ocelot * fix * sfixes * disable broken tests * linter * these fails as well * pylint * myypy * this fails on real platforms as well * mypy please	2025-01-05 01:23:57 +03:00
qazal	036efa9157	use UOp.substitute for VIZ=1 [pr] (#8497 ) * use UOp.substitute for VIZ=1 [pr] * more acceptable	2025-01-04 20:00:29 +02:00
geohotstan	3dfc8e1706	Share a _resolve_pool_pads function for pool ops in Tensor (#8485 ) * _padding2d -> _resolve_pool_pads * rephrase err msg * even better error msg * check asymmetric first os people don't hit error twice * test against torch	2025-01-03 23:54:11 -05:00
qazal	12fa4340b3	pickle ContextVars in process replay [pr] (#8484 ) * pickle ContextVars in process replay * add test_pickle_context_var [pr] * more realistic	2025-01-03 23:11:54 +08:00
qazal	bd4d7dc4eb	return becomes_map from the scheduler (#8483 ) * return becomes_map from the scheduler * fix test_schedule * fix abstractions2 * s/becomes/becomes_map	2025-01-03 22:47:21 +08:00
qazal	0d33391038	delete unused allow_buffer_view=True arg from bitcast [pr] (#8462 )	2025-01-03 22:20:46 +08:00
uuuvn	048643e7f9	Skip test that counts Ops.LOAD on CLANG+AMX (upcasts up to float16) (#8475 ) This test assumes that float4 is the max upcast and tests that 8 float loads are upcasted to 2 float4 loads, however on CLANG+AMX upcasts can be up to float16 and in this test we get one float8 load instead. The @unittest.skipIf line is copied from test_linearizer.py where a bunch of tests make similar assumptions about upcasts.	2025-01-02 17:17:49 -05:00
geohotstan	de306c615b	[fixed] onnx pool cleanup (#8474 ) * pool janitor duty * actually conv allows asymmetric pads * a little prettier	2025-01-02 16:56:10 -05:00
qazal	08c9d980dc	use const_like in uop zero folding [pr] (#8470 )	2025-01-03 01:05:09 +08:00
chenyu	6fa38367bf	Revert "onnx pool ops clean up (#8471 )" (#8472 ) This reverts commit `241db29ede`.	2025-01-02 11:04:34 -05:00
uuuvn	e7c6282dd6	Fix uop.st for CLANG+AMX (#8460 )	2025-01-02 18:01:41 +02:00
geohotstan	241db29ede	onnx pool ops clean up (#8471 )	2025-01-02 10:45:30 -05:00
geohotstan	c4b13e2f6d	add onnx DequantizeLinear (#8468 ) * is this right? * small changes * dont support float8 * mergeable?	2025-01-02 09:52:49 -05:00
qazal	f2bee34197	tests for symbolic_simple failing tensor const spec [pr] (#8469 ) * tests for symbolic_simple failing tensor const spec [pr] * mul is correct	2025-01-02 19:13:16 +08:00
chenyu	e5c85ec684	type annotation of resolve [pr] (#8467 ) it takes UOp|bool	2025-01-01 10:21:59 -05:00
nimlgen	c18307e749	AM driver (#6923 ) * connect to gpu * rlc init? * gfx comp start init * early init is hardoded, some progress with fw * gart * progress, next mqd * ring setup, still does not execute anything * ugh write correct reg * pci2: vm * pci2: start psp * vm seems to work * pci2: gfx start * pci2: fix psp ring resp * pci2: try ring * pci2: mes and some fixes * pci2: some progress * pci2: progress * pci2: mm * pci2: discovery * pci2: correct apertures * pci2: b * pci2: i * pci2: l * pci2: o * pci2: cmu * pci2: mes_kiq works * pci2: mes * pci2: kcq does not work( * pci2: unhalt gfx * ops_am * minor * check if amdgpu is there, or we will crash * bring back graph, it just works * less prints * do not init mes (not used) * remove unused files * ops_am: start move into core * ops_am: works * clcks, but still slower * faster + no mes_kiq * vm frags + remove mes * cleanup fw * gmc tiny cleanup * move to ops_amd * comment out what we dont really need * driverless * close in speed * am clean most of ips * gmc to ips * cleaner * new vm walker * comment old one * remove unsued autogens * last write ups * remove psp hardcoded values * more * add logs * ih * p2p and sdma * vfio hal and interrupts * smth * amd dev iface * minor after rebase * bind for sdma * Revert "bind for sdma" This reverts commit `a90766514d`. 
* tmp * debug new mm * ugh, allreduce hangs fixed * p1 * works * no pci.py * cleaner a bit * smth * tiny cleanups * cleaner a bit * pciiface * linter * linter 2 * linter 3 * linter * pylint * reverted unrelated changes * unrelated * cmp tool * ugh wrong fw * clockgating * unrelated * alloc smaller chunks * this * opt sigs * collect stat * ops * upd * proclogs * proclogs2 * vfio * ruff * linter pylint * oops * mypy p1 * mem fix * mypy p2 * mypy p3 * mypy p4 * correct * minor * more tests * linter in tests * pci_regs header * minor write up * setup * do not require libs --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-12-31 23:06:17 +03:00
chenyu	f3fdec940d	Tensor.mod (#8458 ) it's a python style mod. possibly can be cleaner with a floor div. relaxed the vmin for MOD slightly for cstyle negatives mod; it's more correct and might fix other bugs	2024-12-31 11:31:42 -05:00
George Hotz	4c94726bac	remove uop mutability [pr] (#8441 ) * remove uop mutability [pr] * test fixups * most tests pass * more tests pass * lil test fixups * them too * fix test * unneeded * err, that * fix test_hcq * fix test failures * fix that test * tensor universe * does this pass test * Revert "does this pass test" This reverts commit `ed516b3169`. * Revert "tensor universe" This reverts commit `c21301852a`. * proper spidering for uops * cleanups * all tensors * all tensors * slow but correct * fast * no WeakSet * faster * no need for list * revert that	2024-12-31 00:29:56 -05:00
George Hotz	e276b6eecd	use Tensor.replace [pr] (#8455 )	2024-12-30 23:20:46 -05:00
qazal	c7ec0ab674	delete unused View lt support (2) (#8451 ) * delete lt on view (2) * the scheduler uses symbolic_simple	2024-12-31 07:01:25 +08:00
qazal	866dfa1f23	create_schedule([x.lazydata]) -> x.schedule() in tests (#8449 )	2024-12-31 03:15:52 +08:00
George Hotz	180916257d	add children tracking to uop [pr] (#8448 )	2024-12-30 10:58:20 -05:00
George Hotz	29c14f1cbf	hotfix: update tests for no uop mut	2024-12-30 10:05:37 -05:00
qazal	7499139239	scheduler renames from the buffer_shape branch [pr] (#8444 ) * scheduler refactors and renames from the buffer_shape branch [pr] * all unmasked sts are allowed here * only renames	2024-12-30 16:33:38 +08:00
George Hotz	b71c51191b	tests from remove uop mutability [pr] (#8442 ) * tests from remove uop mutability [pr] * more test fix * simpler test fix * remove that	2024-12-29 12:14:10 -05:00
qazal	34987a03af	const copy folding spec + multi.py behavior [pr] (#8436 ) * const copy folding spec + multi behavior [pr] * copy from clang, move multi test	2024-12-29 23:12:13 +08:00


4433 Commits