tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-27 15:58:10 -05:00

Author	SHA1	Message	Date
nimlgen	280143467b	am: tune all sleep timings to match kernel (#8515 ) * am: tune all sleep timings to match kernel * rm	2025-01-06 18:03:57 +03:00
qazal	547fd5078f	cleanups for COPY uop implementation and spec [pr] (#8513 )	2025-01-06 11:39:12 +02:00
qazal	ed121d235c	spec for CAST_BEFORE_VIEW=1 [pr] (#8512 )	2025-01-06 10:43:58 +02:00
qazal	eb7df92136	dedup COPY UOp [pr] (#8506 )	2025-01-06 10:37:20 +02:00
chenyu	76a138cdb6	simpler UOp.st [pr] (#8510 )	2025-01-05 22:08:14 -05:00
chenyu	b6be407bc6	fix handcode_opt bert [pr] (#8509 ) * fix handcode_opt bert [pr] * too slow	2025-01-05 19:14:12 -05:00
geohotstan	9229867fec	Support asymmetrical pads for all pooling functions (#8109 ) * implemented in tensor * apply onnx tests to asymmetrical pads * better onnx op ordering * correct ceil_mode asymmetrical * fix onnx_ops comments * a few more TODOs and fix some stupidity * fix some typing * fix test * mypy still a little messed up * refactor out pad struct transformation * add simple docs for now * add whatever tests possible * add tests for _resolve_pool_pads * better err msg * whoops didn't mean to include this * retry CI * enable asymmetric pads onnx tests * better docs --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-01-05 16:01:08 -05:00
uuuvn	c9c7f1be46	Remove unused R_AARCH64_CALL26 relocation (#8508 ) First iteration of the AMX fix was using symbol lookup + trampoline approach which required this, however later i replaced it by marking amx function `static` and assumed that relocation was still used when callee wasn't inlined, however this turned out not to be the case because the callee can't be moved around by linker at link-time and can't be overloaded by other symbols (`static` means priority + local visibility)	2025-01-06 00:00:21 +03:00
nimlgen	b4f4a3ac12	am: minor parts (#8507 )	2025-01-05 23:05:21 +03:00
qazal	0e0cba2cfc	move llvm_bf16_cast to the renderer [pr] (#8502 ) * move llvm_bf16_cast to the renderer [pr] * cast to half is fine too * delete the old one * wish i could just cast the ptr	2025-01-05 13:02:41 +02:00
chenyu	4143f6a7d9	unused `from __future__ import annotations` [pr] (#8504 )	2025-01-04 23:11:01 -05:00
nimlgen	9bc317d5d2	mockcuda (#8503 ) * init mockcuda * run gpu ocelot * fix * sfixes * disable broken tests * linter * these fails as well * pylint * myypy * this fails on real platforms as well * mypy please	2025-01-05 01:23:57 +03:00
George Hotz	ddad4d55da	add typing to tqdm [pr] (#8500 )	2025-01-04 13:55:52 -05:00
qazal	036efa9157	use UOp.substitute for VIZ=1 [pr] (#8497 ) * use UOp.substitute for VIZ=1 [pr] * more acceptable	2025-01-04 20:00:29 +02:00
uuuvn	615d5276b1	Suppress 'X warnings generated.' in MTLCompiler (#8489 ) '-fno-caret-diagnostics' is what clang-tidy uses when user passes --quiet	2025-01-04 10:22:37 -05:00
nimlgen	5df213d51e	am: remove alloc frags logic (#8491 )	2025-01-04 12:25:20 +03:00
geohotstan	3dfc8e1706	Share a _resolve_pool_pads function for pool ops in Tensor (#8485 ) * _padding2d -> _resolve_pool_pads * rephrase err msg * even better error msg * check asymmetric first so people don't hit error twice * test against torch	2025-01-03 23:54:11 -05:00
chenyu	6c639dee5c	more informative kernel opt error messages [pr] (#8487 )	2025-01-03 14:29:36 -05:00
uuuvn	5ffc50d58c	Clang JIT (#8481 ) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-01-03 11:12:55 -05:00
qazal	12fa4340b3	pickle ContextVars in process replay [pr] (#8484 ) * pickle ContextVars in process replay * add test_pickle_context_var [pr] * more realistic	2025-01-03 23:11:54 +08:00
qazal	bd4d7dc4eb	return becomes_map from the scheduler (#8483 ) * return becomes_map from the scheduler * fix test_schedule * fix abstractions2 * s/becomes/becomes_map	2025-01-03 22:47:21 +08:00
qazal	c163b2c5f0	give copy a device: COPY(device, copyin) (#8482 )	2025-01-03 22:34:38 +08:00
qazal	0d33391038	delete unused allow_buffer_view=True arg from bitcast [pr] (#8462 )	2025-01-03 22:20:46 +08:00
nimlgen	5d37d33fc5	update typing.Optional to 3.10 for hcq (#8479 )	2025-01-03 16:20:49 +03:00
uuuvn	048643e7f9	Skip test that counts Ops.LOAD on CLANG+AMX (upcasts up to float16) (#8475 ) This test assumes that float4 is the max upcast and tests that 8 float loads are upcasted to 2 float4 loads, however on CLANG+AMX upcasts can be up to float16 and in this test we get one float8 load instead. The @unittest.skipIf line is copied from test_linearizer.py where a bunch of tests make similar assumptions about upcasts.	2025-01-02 17:17:49 -05:00
geohotstan	de306c615b	[fixed] onnx pool cleanup (#8474 ) * pool janitor duty * actually conv allows asymmetric pads * a little prettier	2025-01-02 16:56:10 -05:00
qazal	08c9d980dc	use const_like in uop zero folding [pr] (#8470 )	2025-01-03 01:05:09 +08:00
chenyu	6fa38367bf	Revert "onnx pool ops clean up (#8471 )" (#8472 ) This reverts commit `241db29ede`.	2025-01-02 11:04:34 -05:00
uuuvn	e7c6282dd6	Fix uop.st for CLANG+AMX (#8460 )	2025-01-02 18:01:41 +02:00
geohotstan	241db29ede	onnx pool ops clean up (#8471 )	2025-01-02 10:45:30 -05:00
geohotstan	c4b13e2f6d	add onnx DequantizeLinear (#8468 ) * is this right? * small changes * dont support float8 * mergeable?	2025-01-02 09:52:49 -05:00
qazal	f2bee34197	tests for symbolic_simple failing tensor const spec [pr] (#8469 ) * tests for symbolic_simple failing tensor const spec [pr] * mul is correct	2025-01-02 19:13:16 +08:00
Kyunghyun Park	dc9af4e2fc	[VIZ] fix hljs.highlightElement to correctly target <code/> (#8465 ) * hljs.highlightElement target code not pre * createPre * no style change * real no style change * remove unnecessary scroll bar * horizontal scrollbar appears only when scrolled all the way to the bottom * misc	2025-01-02 14:50:51 +08:00
chenyu	e5c85ec684	type annotation of resolve [pr] (#8467 ) it takes UOp|bool	2025-01-01 10:21:59 -05:00
George Hotz	e3c9cfad80	am driver: print on that assert (#8463 )	2024-12-31 18:01:59 -05:00
nimlgen	c18307e749	AM driver (#6923 ) * connect to gpu * rlc init? * gfx comp start init * early init is hardoded, some progress with fw * gart * progress, next mqd * ring setup, still does not execute anything * ugh write correct reg * pci2: vm * pci2: start psp * vm seems to work * pci2: gfx start * pci2: fix psp ring resp * pci2: try ring * pci2: mes and some fixes * pci2: some progress * pci2: progress * pci2: mm * pci2: discovery * pci2: correct apertures * pci2: b * pci2: i * pci2: l * pci2: o * pci2: cmu * pci2: mes_kiq works * pci2: mes * pci2: kcq does not work( * pci2: unhalt gfx * ops_am * minor * check if amdgpu is there, or we will crash * bring back graph, it just works * less prints * do not init mes (not used) * remove unused files * ops_am: start move into core * ops_am: works * clcks, but still slower * faster + no mes_kiq * vm frags + remove mes * cleanup fw * gmc tiny cleanup * move to ops_amd * comment out what we dont really need * driverless * close in speed * am clean most of ips * gmc to ips * cleaner * new vm walker * comment old one * remove unsued autogens * last write ups * remove psp hardcoded values * more * add logs * ih * p2p and sdma * vfio hal and interrupts * smth * amd dev iface * minor after rebase * bind for sdma * Revert "bind for sdma" This reverts commit `a90766514d`. * tmp * debug new mm * ugh, allreduce hangs fixed * p1 * works * no pci.py * cleaner a bit * smth * tiny cleanups * cleaner a bit * pciiface * linter * linter 2 * linter 3 * linter * pylint * reverted unrelated changes * unrelated * cmp tool * ugh wrong fw * clockgating * unrelated * alloc smaller chunks * this * opt sigs * collect stat * ops * upd * proclogs * proclogs2 * vfio * ruff * linter pylint * oops * mypy p1 * mem fix * mypy p2 * mypy p3 * mypy p4 * correct * minor * more tests * linter in tests * pci_regs header * minor write up * setup * do not require libs --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-12-31 23:06:17 +03:00
George Hotz	d4a1d5211e	bring back the DSP runtime	2024-12-31 12:01:42 -05:00
George Hotz	24de25b52f	example to benchmark onnx [pr] (#8459 ) * example to benchmark onnx [pr] * reset global count	2024-12-31 11:38:33 -05:00
chenyu	f3fdec940d	Tensor.mod (#8458 ) it's a python style mod. possibly can be cleaner with a floor div. relaxed the vmin for MOD slightly for cstyle negatives mod, it's more correct and might fix other bugs	2024-12-31 11:31:42 -05:00
qazal	ae00fa3b28	delete (slow) viz prepickle [pr] (#8456 )	2024-12-31 20:26:18 +08:00
George Hotz	4c94726bac	remove uop mutability [pr] (#8441 ) * remove uop mutability [pr] * test fixups * most tests pass * more tests pass * lil test fixups * them too * fix test * unneeded * err, that * fix test_hcq * fix test failures * fix that test * tensor universe * does this pass test * Revert "does this pass test" This reverts commit `ed516b3169`. * Revert "tensor universe" This reverts commit `c21301852a`. * proper spidering for uops * cleanups * all tensors * all tensors * slow but correct * fast * no WeakSet * faster * no need for list * revert that	2024-12-31 00:29:56 -05:00
George Hotz	e276b6eecd	use Tensor.replace [pr] (#8455 )	2024-12-30 23:20:46 -05:00
chenyu	19a54ae0b4	add Tensor.roll and Tensor.rearrange to doc (#8454 ) also moved rearrange in tensor.py to high level movement	2024-12-30 20:25:50 -05:00
Alessandro Benetti	12cccd8bc5	fix rearrange docs (#8453 ) * fix rearrange docs * just the typo	2024-12-30 20:04:06 -05:00
qazal	c7ec0ab674	delete unused View lt support (2) (#8451 ) * delete lt on view (2) * the scheduler uses symbolic_simple	2024-12-31 07:01:25 +08:00
George Hotz	803a47494e	Revert "Clang JIT (#8312 )" (#8452 ) This reverts commit `b6266c8e41`.	2024-12-30 17:49:35 -05:00
uuuvn	b6266c8e41	Clang JIT (#8312 ) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-12-30 17:37:53 -05:00
qazal	d157b20027	delete create_schedule, only create_schedule_with_vars [pr] (#8450 )	2024-12-31 04:20:53 +08:00
qazal	866dfa1f23	create_schedule([x.lazydata]) -> x.schedule() in tests (#8449 )	2024-12-31 03:15:52 +08:00
George Hotz	0addbad36d	Happy New Year! Let's get AM merged	2024-12-30 13:15:10 -05:00

10633 Commits