* use HWInterface in autogen
* mockgpu
* HWInterface
* more HWInterface
* fix
* fix
* old code
* fix
* implicit field definition
* add offset check to mockgpu too
* refactor
* forgot to pass flags + read rewrite
* test
* play with vfio
* nv: this should be kept
* try this
* vfio
* rm overwrite=True
* linter
* do not reinit kfd
* minor
* mypy
* mock
* init them once
---------
Co-authored-by: patrini32 <patrini23@proton.me>
* implemented in tensor
* apply onnx tests to asymmetrical pads
* better onnx op ordering
* correct ceil_mode asymmetrical
* fix onnx_ops comments
* a few more TODOs and fix some stupidity
* fix some typing
* fix test
* mypy still a little messed up
* refactor out pad struct transformation
* add simple docs for now
* add whatever tests possible
* add tests for _resolve_pool_pads
* better err msg
* whoops didn't mean to include this
* retry CI
* enable asymmetric pads onnx tests
* better docs
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* init mockcuda
* run gpu ocelot
* fix
* small fixes
* disable broken tests
* linter
* these fail as well
* pylint
* mypy
* this fails on real platforms as well
* mypy please
* _padding2d -> _resolve_pool_pads
* rephrase err msg
* even better error msg
* check asymmetric first so people don't hit the error twice
* test against torch
This test assumes that float4 is the max upcast and checks that 8 float
loads are upcast to 2 float4 loads. However, on CLANG+AMX upcasts can
go up to float16, and in this test we get one float8 load instead.
The @unittest.skipIf line is copied from test_linearizer.py, where
a bunch of tests make similar assumptions about upcasts.
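For reference, a minimal sketch of the skip-guard pattern described above; the condition and names are illustrative, not the exact line from test_linearizer.py:

```python
import unittest
from tinygrad import Device  # assumed import path

class TestUpcast(unittest.TestCase):
  # illustrative condition: on CLANG the AMX path may upcast past float4,
  # turning 8 float loads into one float8 load instead of two float4 loads
  @unittest.skipIf(Device.DEFAULT == "CLANG", "CLANG+AMX upcasts can exceed float4")
  def test_eight_floats_upcast_to_two_float4(self):
    pass  # the real test inspects the generated kernel's load widths

if __name__ == "__main__":
  unittest.main()
```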
* connect to gpu
* rlc init?
* gfx comp start init
* early init is hardcoded, some progress with fw
* gart
* progress, next mqd
* ring setup, still does not execute anything
* ugh write correct reg
* pci2: vm
* pci2: start psp
* vm seems to work
* pci2: gfx start
* pci2: fix psp ring resp
* pci2: try ring
* pci2: mes and some fixes
* pci2: some progress
* pci2: progress
* pci2: mm
* pci2: discovery
* pci2: correct apertures
* pci2: b
* pci2: i
* pci2: l
* pci2: o
* pci2: cmu
* pci2: mes_kiq works
* pci2: mes
* pci2: kcq does not work(
* pci2: unhalt gfx
* ops_am
* minor
* check if amdgpu is there, or we will crash
* bring back graph, it just works
* less prints
* do not init mes (not used)
* remove unused files
* ops_am: start move into core
* ops_am: works
* clocks, but still slower
* faster + no mes_kiq
* vm frags + remove mes
* cleanup fw
* gmc tiny cleanup
* move to ops_amd
* comment out what we don't really need
* driverless
* close in speed
* am clean most of ips
* gmc to ips
* cleaner
* new vm walker
* comment old one
* remove unused autogens
* last write ups
* remove psp hardcoded values
* more
* add logs
* ih
* p2p and sdma
* vfio hal and interrupts
* smth
* amd dev iface
* minor after rebase
* bind for sdma
* Revert "bind for sdma"
This reverts commit a90766514d.
* tmp
* debug new mm
* ugh, allreduce hangs fixed
* p1
* works
* no pci.py
* cleaner a bit
* smth
* tiny cleanups
* cleaner a bit
* pciiface
* linter
* linter 2
* linter 3
* linter
* pylint
* reverted unrelated changes
* unrelated
* cmp tool
* ugh wrong fw
* clockgating
* unrelated
* alloc smaller chunks
* this
* opt sigs
* collect stat
* ops
* upd
* proclogs
* proclogs2
* vfio
* ruff
* linter pylint
* oops
* mypy p1
* mem fix
* mypy p2
* mypy p3
* mypy p4
* correct
* minor
* more tests
* linter in tests
* pci_regs header
* minor write up
* setup
* do not require libs
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
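Taken together, the AM driver commits above trace a hardware bring-up sequence. A purely illustrative sketch of that ordering, with hypothetical names (not the real ops_am API):

```python
class AMDev:
  """Hypothetical skeleton; step order inferred from the commits above."""
  def discovery(self): print("read IP discovery table")       # "pci2: discovery"
  def gmc_init(self):  print("set up GART + VM page tables")  # "gart", "vm seems to work"
  def psp_start(self): print("load firmware over PSP ring")   # "pci2: start psp"
  def ih_init(self):   print("enable interrupt handling")     # "ih"
  def gfx_start(self): print("RLC init, MQD, ring setup")     # "rlc init?", "ring setup"
  def sdma_init(self): print("bring up SDMA queues")          # "p2p and sdma"
  def init(self):
    for step in (self.discovery, self.gmc_init, self.psp_start,
                 self.ih_init, self.gfx_start, self.sdma_init):
      step()

AMDev().init()
```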
it's a python-style mod; possibly it can be cleaner with a floor div.
relaxed the vmin for MOD slightly for C-style negative mod; it's more correct and might fix other bugs
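To make the floored-vs-truncated distinction concrete, a small plain-Python sketch (names mine); it also shows why the vmin for MOD must admit negative results on C-style backends:

```python
def c_mod(a: int, b: int) -> int:
  # C-style remainder: division truncates toward zero,
  # so the result takes the sign of the dividend a
  return a - int(a / b) * b

# python's % floors, so the result takes the sign of the divisor b
print(-7 % 3)        # 2
print(c_mod(-7, 3))  # -1 -> a C-style MOD can be negative, hence the relaxed vmin
```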
* remove uop mutability [pr]
* test fixups
* most tests pass
* more tests pass
* lil test fixups
* them too
* fix test
* unneeded
* err, that
* fix test_hcq
* fix test failures
* fix that test
* tensor universe
* does this pass test
* Revert "does this pass test"
This reverts commit ed516b3169.
* Revert "tensor universe"
This reverts commit c21301852a.
* proper spidering for uops
* cleanups
* all tensors
* all tensors
* slow but correct
* fast
* no WeakSet
* faster
* no need for list
* revert that
* UOp ShapeTracker conceptual refactor [pr]
* add the UOp shape spec
* assign spec
* test a permuted assign
* lint + more work
* collapse assign after it swizzles the store [pr]
* more work, refine valid
* permute the other way
* shapetracker cleanup
* this assert should work now
instead of having a class var for the whole stack, store the old context in each Context.
also updated a test to check that a ContextVar created inside a Context is not cleared after the Context block exits
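A minimal sketch of the pattern, with hypothetical names: each Context instance snapshots only the values it overrides, so no class-level stack is needed, and a ContextVar created inside the block survives it:

```python
_registry: dict[str, "ContextVar"] = {}

class ContextVar:
  def __init__(self, key: str, default: int):
    self.key, self.value = key, default
    _registry[key] = self

class Context:
  def __init__(self, **kwargs): self.kwargs = kwargs
  def __enter__(self):
    # store the old values on this instance, not in a class var
    self.old = {k: _registry[k].value for k in self.kwargs if k in _registry}
    for k, v in self.kwargs.items():
      if k in _registry: _registry[k].value = v
    return self
  def __exit__(self, *exc):
    # restore only what this Context overrode; a ContextVar created inside
    # the block was never snapshotted, so it keeps its value afterwards
    for k, v in self.old.items(): _registry[k].value = v

DEBUG = ContextVar("DEBUG", 0)
with Context(DEBUG=2):
  INSIDE = ContextVar("INSIDE", 1)  # created inside the block
  assert DEBUG.value == 2
assert DEBUG.value == 0 and INSIDE.value == 1  # INSIDE survived the block
```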
* validate that FC exists before loading pretrained weights
* add test case for ResNet pretrained model without FC layer
* remove extra newline
* rename test case
* reraise exception if not handled by check
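A minimal sketch, with hypothetical names, of the validation these commits describe: fail with a clear message if the pretrained weights reference an fc layer the model no longer has, and reraise anything the check doesn't handle:

```python
def load_pretrained(model, weights: dict):
  for name, value in weights.items():
    target = model
    try:
      for attr in name.split("."):
        target = getattr(target, attr)  # walks e.g. "fc.weight"
    except AttributeError:
      # friendly error for the common case: fc was removed or replaced
      if name.startswith("fc."):
        raise ValueError(f"weights contain {name!r} but the model has no fc layer") from None
      raise  # not handled by the check: reraise the original exception
    print(f"would load {name} into {type(target).__name__}")

class Head:
  def __init__(self): self.weight = [0.0]
class Model:
  def __init__(self): self.fc = Head()

load_pretrained(Model(), {"fc.weight": [1.0]})    # ok
# load_pretrained(object(), {"fc.weight": [1.0]})  # -> ValueError
```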