Commit Graph

7817 Commits

qazal
c52cd2b437 kernel op cleanups + use ScheduleItem [pr] (#9009) 2025-02-10 17:54:30 +01:00
chenyu
25fa5e4d5f enable backward tests in test_std_one_in_axis [pr] (#9007)
one correction=0 case is still broken

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-02-10 10:44:05 -05:00
qazal
d426f1ad6e don't open devices in lowering (#9008) 2025-02-10 15:28:51 +01:00
qazal
cfd3db7862 construct the schedule sink 2 (#8925)
* work

* delete preload

* fix metadata

* this can keep existing

* assign pruning

* dedup early

* bfs

* cycle asserts

* move assign check

* once
2025-02-10 22:23:02 +08:00
nimlgen
3e005ca0c2 am: resize bar0 to max supported (#9006) 2025-02-10 16:48:44 +03:00
nimlgen
07cb7e701c am: fix gfx usage at 100% (#9003)
* am: fix gfx usage at 100%

* not need

* not needed

* fix power con

* not supported on 7600
2025-02-10 16:48:23 +03:00
nimlgen
f91409f038 am: fix proclogs (#9004) 2025-02-10 16:38:58 +03:00
qazal
cd77e51810 fix tensor realization bug in #8975 (#8984)
* fix tensor realization bug in #8975

* that's a reshape now

* work

* works

* give those tests better names

* test when multiple mops result in the same ShapeTracker

* test_become_existing_buf_complex is enough

* that too
2025-02-10 13:51:30 +01:00
qazal
b17ec42b56 remove const_arg (#9002)
* remove const_arg

* use -m pytest

* remove test_const_arg test, variable arg on CONST does not exist.

* use base in test_const_dtype
2025-02-10 12:45:11 +01:00
George Hotz
0568720a68 delete revectorize (#9000)
* delete revectorize

* test vectorized LLVM/CLANG

* idk about that

* was that the segfault?
2025-02-10 18:32:35 +08:00
qazal
fd9f9ec772 realized base tensors become RESHAPE(BUFFER) [pr] (#8994) 2025-02-10 10:17:54 +01:00
George Hotz
910ae260cd dsp float4 fold + revectorize [pr] (#8995)
* dsp float4 fold [pr]

* revectorize

* fix reg issue

* no bool vectorize

* cleanups

* no need for that
2025-02-10 12:14:32 +08:00
George Hotz
e618efce22 COMMUTATIVE flipping is only for ints (#8996)
* COMMUTATIVE flipping is only for ints [pr]

* no pr

* comm fixes this
2025-02-10 12:01:28 +08:00
George Hotz
2983285315 use HEX_REG_QEMU_INSN_CNT from qemu as a DSP timer [pr] (#8993)
* use HEX_REG_QEMU_INSN_CNT from qemu as a DSP timer [pr]

* add quantize test to dsp

* fix tests

* older onnx

* debug, let's see what's happening
2025-02-10 11:07:35 +08:00
chenyu
9119716761 update Tensor.maximum (#8992)
now it's just broadcast and UOp.maximum
2025-02-09 21:26:27 -05:00
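The broadcasting half of that change follows the standard numpy-style rule: right-align the two shapes, and each pair of dimensions must be equal or contain a 1. A minimal plain-Python sketch of that rule (illustrative only, not tinygrad's actual implementation; `broadcast_shape` is a made-up name):

```python
from itertools import zip_longest

def broadcast_shape(s1, s2):
    # Right-align the shapes; each dim pair must match or contain a 1,
    # and the output dim is the larger of the two.
    out = []
    for a, b in zip_longest(reversed(s1), reversed(s2), fillvalue=1):
        if a != b and 1 not in (a, b):
            raise ValueError(f"cannot broadcast {s1} with {s2}")
        out.append(max(a, b))
    return tuple(reversed(out))

print(broadcast_shape((3, 1), (1, 4)))   # (3, 4)
print(broadcast_shape((5,), (2, 1, 5)))  # (2, 1, 5)
```

Once both operands share a shape, maximum reduces to a pointwise elementwise op, which is why the commit can describe it as "just broadcast and UOp.maximum".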
nimlgen
88add71c25 amd: increase sdma copy size (#8989)
* amd: increase sdma max copy size

* rm this

* fix

* fx

* ops
2025-02-09 20:53:35 +03:00
qazal
7eba5fb413 Tensor.empty is RESHAPE(BUFFER) (#8987)
* empty is RESHAPE(BUFFER)

* eh

* add test_empty_buf

* can we unsupport this

* linter

* Revert "can we unsupport this"

This reverts commit 0f71e1aadb.
2025-02-09 18:42:51 +01:00
qazal
44479f8ad6 raise ValueError in view reshape for negative dims [pr] (#8988) 2025-02-09 17:27:15 +01:00
nimlgen
c6c2373bc0 replace libpciaccess autogen with just pci regs (#8983)
* replace libpciaccess autogen with just pci regs

* add pci.py
2025-02-09 18:40:45 +03:00
qazal
55351ebb31 minimal failing test for #8975 [pr] (#8982) 2025-02-09 14:10:37 +01:00
nimlgen
e5a3f60fc2 am: remove libpciaccess dep (#8980)
* am: remove libpciaccess dep

* offset in mockhwiface

* op

* fake regions
2025-02-09 16:06:55 +03:00
nimlgen
52a69dd5e9 Revert "use am in training benchmarks (#8965)" (#8981)
This reverts commit 107e616857.
2025-02-09 15:43:45 +03:00
George Hotz
0b26cee2f1 fix some slow tests [pr] (#8979) 2025-02-09 15:57:04 +08:00
George Hotz
208097d488 try reducing testing deps [pr] (#8976)
* reduce testing deps

* break out test models

* add PR to models, add models to metal

* okay, not that

* mac cleanup

* mac typo

* other typo
2025-02-09 15:22:32 +08:00
George Hotz
6ffee2fca9 reduce speed example [pr] (#8978)
* reduce speed example

* fast like a nascar
2025-02-09 14:13:59 +08:00
Samuel Ayala
ac3765c043 use getpass instead of os.getlogin() (#8972) 2025-02-08 23:29:26 +03:00
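The motivation for that swap: `os.getlogin()` asks the OS for the controlling terminal's login name and raises `OSError` when there is none (daemons, containers, CI), while `getpass.getuser()` checks environment variables and falls back to the password database. A hedged sketch of the difference (`current_user` is a hypothetical helper, not the code in the PR):

```python
import getpass
import os

def current_user() -> str:
    # os.getlogin() requires a controlling terminal and can raise OSError
    # in CI/daemon contexts; getpass.getuser() checks LOGNAME/USER/etc.
    # and falls back to the pwd database, so it is the safer default.
    try:
        return os.getlogin()
    except OSError:
        return getpass.getuser()

print(current_user())
```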
qazal
308516e439 fix viz paginate + cleanups [pr] (#8973)
* fix viz paginate [pr]

* cleanups

* remove the extra font definition

* more work

* none for the first graph
2025-02-08 20:26:57 +01:00
nimlgen
107e616857 use am in training benchmarks (#8965)
* am in training benchmarks

* fix

* not needed anymore
2025-02-08 20:20:47 +03:00
nimlgen
79de980565 am: do not fork pci bars (#8969) 2025-02-08 19:03:17 +03:00
chenyu
0cac941af1 move xpow to sym instead of late_rewrite (#8968)
does not need to be in late_rewrite and can be simplified further
2025-02-08 10:09:24 -05:00
qazal
e7182bbb2c fix "fatal bad object" log in process replay [pr] (#8966) 2025-02-08 11:57:38 +01:00
uuuvn
9b9c1e14da Late MTLCompiler load (#8963)
Moved loading MTLCompiler (and the attempt to load normal llvm before it)
into MetalCompiler with a helper, like in CPUProgram
2025-02-08 17:29:23 +08:00
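The general pattern here is deferring a dynamic-library load (e.g. a `ctypes.CDLL`) from import time to first use, so importing the module stays cheap and a missing library only errors when actually needed. A generic sketch of that kind of lazy loading (illustrative only; `LazyLib` is a made-up name, not tinygrad's helper):

```python
import types

class LazyLib:
    """Defer an expensive load until the first attribute access."""
    def __init__(self, loader):
        self._loader = loader
        self._lib = None
    def __getattr__(self, name):
        # __getattr__ only fires for attributes not found normally, so
        # _loader/_lib lookups above don't recurse. Load once, then delegate.
        if self._lib is None:
            self._lib = self._loader()
        return getattr(self._lib, name)

# Example: nothing is loaded until lib.add is first touched.
calls = []
def load():
    calls.append("loaded")  # stands in for ctypes.CDLL(...)
    return types.SimpleNamespace(add=lambda a, b: a + b)

lib = LazyLib(load)
print(calls)           # not loaded yet
print(lib.add(2, 3))
print(calls)           # loaded exactly once
```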
George Hotz
a3c78d47b3 speed docs + upgrades [pr] (#8964)
* add some docs about speed [pr]

* better torch gemm

* enable locals on llvm/clang

* disable locals for beam speed on LLVM/CLANG

* 0x20 alignment in llvm allows ymm use
2025-02-08 17:28:52 +08:00
George Hotz
5bdd6a1cc4 increase CI speed with more runners [pr] (#8961)
* increase CI speed with more runners [pr]

* splits + cleanups [pr]

* more runners

* need that dep

* split that too

* can't be minimal

* move test readme

* bugfix + naming

* one more split

* bump to 22.04
2025-02-08 09:04:36 +08:00
nimlgen
11d50324d8 am: tiny cleanups (#8958)
* am: start cleanups

* am
2025-02-07 23:44:43 +03:00
chenyu
cfd28517df move pow folding tests to test_schedule [pr] (#8955)
doesn't really belong in test_const_folding
2025-02-07 12:51:43 -05:00
George Hotz
c2b4c43edb handle stride 0 reduce (#8068)
* handle stride 0 reduce [pr]

* more test fixups

* a few more

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-02-07 15:40:58 +01:00
qazal
cf21e27d78 little better VIEW simplifier pattern [pr] (#8954) 2025-02-07 12:55:54 +01:00
qazal
329013f577 fix UOp.metadata on KERNEL op [pr] (#8953)
* fix UOp.metadata on KERNEL op [pr]

* hotfix: is not None
2025-02-07 12:40:11 +01:00
George Hotz
4de084a835 cleanup ci, split docs/autogen, testing_minimal, LLVM Speed [pr] (#8952)
* cleanup ci [pr]

* testing_minimal

* add hypothesis to minimal

* fail tiktoken import okay

* add LLVM speed test

* llvm speed w/o beam
2025-02-07 19:01:59 +08:00
uuuvn
6090cbe3be Try to open llvm first when opening metal (#8949)
* Try to open llvm first when opening metal

* Use more specific FileNotFoundError
2025-02-07 18:58:37 +08:00
uuuvn
67b70e4f6c Fix incorrect __del__ (#8950)
CPython doesn't make any guarantees about the order in which globals like
`msg` or `libobjc` are destroyed when the interpreter shuts down

https://github.com/tinygrad/tinygrad/pull/8949 triggered an
unlucky ordering, which led to a bunch of errors at exit

There are also a bunch of other places where similar problems exist
2025-02-07 18:21:44 +08:00
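The hazard that commit describes is general: a `__del__` that looks up a module global can find it already torn down at interpreter exit. One common defensive sketch (illustrative, not the actual fix in that PR) is to bind whatever `__del__` needs at definition time:

```python
released = []

class Handle:
    def __init__(self, name):
        self.name = name
    # Binding the global as a default argument captures it now, so __del__
    # never performs a module-global lookup during interpreter shutdown
    # (when CPython may already have replaced the global with None).
    def __del__(self, _released=released):
        _released.append(self.name)

h = Handle("libobjc")
del h                # refcount drops to zero; __del__ runs immediately in CPython
print(released)
```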
George Hotz
9ed2d0dfa2 refactor into subactions (#8946)
* refactor into subactions

* this work?

* add shell

* move install opencl

* valid?

* support mac os x

* refactor other osx

* fix linux/osx

* fixes

* cleanups

* used everywhere

* no quotes

* quotes on true

* bugfixes

* this run?

* hardcode

* that

* process replay action

* fix checkout

* restore to branch

* fix caching

* fix osx python cache

* does replace function exist

* Revert "does replace function exist"

This reverts commit 622177c5a0.

* Revert "fix osx python cache"

This reverts commit e70d55cd93.

* user on osx to fix untar issue

* that
2025-02-07 18:06:44 +08:00
Ahmed Harmouche
133cacadde Autogen webgpu dawn, removing wgpu-py dependency (f16 support part 1) (#8646)
* Switch to dawn, all tests passing locally

* Use dawn-python

* Skip failing test

* Skip midcast and fix timestamp on metal ci

* Autogen webgpu

* Try fetch dawn lib again

* /usr/lib

* Without lib prefix

* Test autogen diff

* Delete webgpu support, move everything to ops_webgpu

* mypy fix

* Simplify, refactor

* Line savings

* No ResultContainer

* Type annotation for result

* Some more simplifications

* Why was this explicit sync used at all?

* Refactor: delete functions that are only used once

* Create shader module inline

* Clear unit tests cache, maybe that solves it

* That wasn't it

* Try deleting cache to pass failing weight compare

* weights_only=False for pytorch 2.6

* Simplify ctype array creation

* Remove nanosecond precision timestamps

* Simplify error handling

* Refactor, add back type annotations

* Deleted custom submit function, refactor

* read_buffer simplify

* Fix use after free, refactor

* Simplify supported_features

* Runtime docs

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-07 15:16:59 +08:00
George Hotz
dbda72f91d hotfix: raise line limit to 11200 for new webgpu backend 2025-02-07 14:29:20 +08:00
George Hotz
b1e1319972 ci speed on the enterprise plan [pr] (#8942) 2025-02-07 11:18:12 +08:00
Bhavya Gada
3b67712892 [bounty] Fix LLVM=1 NO_DEVECTORIZE=1 python3 test/test_ops.py TestOps.test_strided_conv2d_simple (#8937)
* fix LLVM=1 NO_DEVECTORIZE=1 python3 test/test_ops.py TestOps.test_strided_conv2d_simple

* remove expectedFailure

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-07 10:07:54 +08:00
George Hotz
f54242849d failing test for the devectorize [pr] (#8940)
* failing test for the devectorize [pr]

* add DEVECTORIZE to method_cache
2025-02-07 09:44:54 +08:00
nimlgen
ee1a0fb8ec am_smi: print device name (#8939) 2025-02-07 03:01:25 +03:00
chenyu
a092b6395d Tuple -> tuple, List -> list [pr] (#8936) 2025-02-06 14:21:19 -05:00
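That sweep is the PEP 585 modernization: since Python 3.9 the builtin containers are themselves generic, so `typing.Tuple` and `typing.List` imports can be dropped. A tiny example of the post-change style (`pairs` is a made-up function, not code from the PR):

```python
# PEP 585 style: builtin generics, no "from typing import List, Tuple" needed.
def pairs(xs: list[int]) -> list[tuple[int, int]]:
    return [(x, x + 1) for x in xs]

print(pairs([1, 2, 3]))  # [(1, 2), (2, 3), (3, 4)]
```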