tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-24 14:28:09 -05:00

Author	SHA1	Message	Date
eliotgolding	bb5ded85cc	Don't rewrite idiv to rshift when numerator is negative (#8885 ) * more conditions for shift rewrite mul/idiv * make ptx test uint so the new condition is true * delete idiv test * rewrite to 0 is wrong for idiv, as denominator is cast to 0 before division * mul/div by 2**(large count) is unsupported anyway	2025-02-05 07:47:33 +08:00
pedro	666b6149bc	Use full soname for libgcc_s in CPUProgram (#8642 ) (#8896 ) Number after .so is abi version, it is always 1 for libgcc_s. Most linux systems set default library versions via symlinks that are simply followed to get actual elf, however conda does it via linker scripts which ctypes doesn't follow (below contents of libgcc_s.so): ``` /* GNU ld script Use the shared library, but some functions are only in the static library. */ GROUP ( libgcc_s.so.1 -lgcc ) ``` ctypes.util.find_library thinks that this is the actual elf and ctypes.CDLL just loads this text file as a shared library. The result is: ``` File "/home/me/src/tinygrad/tinygrad/device.py", line 223, in CPUProgram helper_handle = ctypes.CDLL(ctypes.util.find_library('System' if OSX else 'gcc_s')) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/me/miniforge3/envs/tinygrad/lib/python3.12/ctypes/__init__.py", line 379, in __init__ self._handle = _dlopen(self._name, mode) ^^^^^^^^^^^^^^^^^^^^^^^^^ OSError: /home/me/miniforge3/envs/tinygrad/lib/libgcc_s.so: invalid ELF header ``` Co-authored-by: uuuvn <83587632+uuuvn@users.noreply.github.com>	2025-02-05 07:45:48 +08:00
chenyu	48349efdc1	copy is already contiguous (#8886 )	2025-02-04 17:53:33 -05:00
nimlgen	4c28235bd1	am: remove hardcodes (#8895 ) * am: remove hardcodes for 7900 * h	2025-02-05 00:52:53 +03:00
geohotstan	057c70b05f	add onnx_helpers to extra and add ort validate to benchmark_onnx (#8890 ) * start * log severity * only change this * change abstraction so it's more usable for huggingface --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-02-04 16:36:01 -05:00
chenyu	89eebd4bfb	pow cleanups (#8894 ) more readable	2025-02-04 15:52:57 -05:00
qazal	7a9e3247c2	simple start to the Kernel UOp [pr] (#8893 ) * simple start to a kernel [pr] * add the sched_sink and spec * rename kernels to sinks * pylint complains	2025-02-04 21:48:15 +01:00
qazal	b4e8878e01	remove tensor_uops tracking from ScheduleContext [pr] (#8892 ) * remove tensor_uops tracking from ScheduleContext [pr] * cleaner	2025-02-04 20:34:15 +01:00
qazal	6a0da51ed0	truncate process replay logs [pr] (#8891 ) * truncate process replay logs [pr] * work * max_lines * bump to 1K	2025-02-04 20:26:48 +01:00
qazal	c7c279a6bd	unbind ShapeTrackers without maintaining a cache [pr] (#8889 ) * replace with a try [pr] * check vars * ahaa	2025-02-04 19:43:41 +01:00
chenyu	61de654efa	minor shard cleanup [pr] (#8888 )	2025-02-04 13:22:31 -05:00
qazal	6ec7f1b00f	replace UPat(name="x") with UPat.var("x") [pr] (#8887 ) * replace UPat(name="x") with UPat.var("x") [pr] * a few more	2025-02-04 19:12:40 +01:00
qazal	c26b06eaeb	delete fold_img_cast [pr] (#8875 )	2025-02-04 18:43:45 +01:00
qazal	acf0baefee	process replay from tensor uops to kernel ast (#8883 ) * process replay from tensor uops to kernel ast * this dedups * switch back to string key	2025-02-04 18:09:20 +01:00
Ignacio Sica	dcf104ee68	ptx wmma render refactor (#8873 ) Co-authored-by: chenyu <chenyu@fastmail.com>	2025-02-04 11:01:23 -05:00
qazal	b92f36179d	don't use set in schedule + add GroupOp.All [pr] (#8882 ) * don't use set in schedule + add GroupOp.All [pr] * update that	2025-02-04 08:19:27 +01:00
George Hotz	56fa5c1191	dsp simulator (#8869 ) * dsp simulator * progress * fix * close on test tiny * working * less waste * line savings * Device DSP compiler * mock DSP at the bottom * DSP tests * docker caching * test update * need load * skip that test for CI DSP * last touch * ugh	2025-02-04 09:45:04 +08:00
chenyu	836cf42c2e	fix rand_like for multi (#8880 )	2025-02-03 19:00:14 -05:00
chenyu	746d899dbd	move multi axis to property (#8879 ) also updated tests so that axis is known prior to realize	2025-02-03 16:02:09 -05:00
nimlgen	fa90079370	amd: reallocate scratch (#8872 ) * amd: reallocate scratch * use it * oops * allocate default * mypy * ops * address realloc from none better * types correct * this better * ops * rm	2025-02-03 23:21:37 +03:00
chenyu	ec447a31e7	factor out get_axis in multi [pr] (#8878 ) ALU/REDUCE_AXIS/RESHAPE/PERMUTE can change axis. prereq to move this logic to ops.py	2025-02-03 14:39:08 -05:00
chenyu	cce26009f0	simplify pow to not call cos (#8877 ) use %2 instead of cos to detect even numbers	2025-02-03 12:54:18 -05:00
geohotstan	d1aa9f30bc	copy onnx_ops into onnx (#8876 ) * just copy it over * make OnnxOps a global var * some small style stuff * rerun CI but also some small clean up * some comments	2025-02-03 12:15:07 -05:00
Ali Ladjevardi	73c75d6ee1	DEFINE_LOCAL variable names start from temp0, not temp1 (#8870 )	2025-02-03 22:50:38 +08:00
qazal	b6c617272a	New schedule.py Order [pr] (#8874 )	2025-02-03 14:59:11 +02:00
George Hotz	b075aefc12	hotfix: revert llvm host_arch	2025-02-03 16:46:19 +08:00
George Hotz	a5753095dc	llvm cleanups [pr] (#8867 )	2025-02-03 15:32:41 +08:00
George Hotz	f484db0e63	dsp cleanups [pr] (#8866 )	2025-02-03 15:18:53 +08:00
George Hotz	af2c2837f6	hotfix: skip broken test, add KERNEL Op	2025-02-03 14:02:55 +08:00
qazal	565c37c681	start simplifying the scheduler context [pr] (#8830 )	2025-02-02 18:11:36 +02:00
qazal	d64af3c884	reorder simplifier and grouper logic in scheduler [pr] (#8861 )	2025-02-02 17:19:52 +02:00
qazal	83a904aaad	just schedule in test_recursive_pad [pr] (#8860 )	2025-02-02 15:01:24 +02:00
uuuvn	6dadb60c93	LLVM JIT (+autogen llvm instead of llvmlite) (#8486 ) * LLVM JIT * Autogen LLVM * Update autogen * Move things around * even more non-determinism * windows * more autogen weirdness * more windows stuff * blind windows development try 2 * more blind windows development * even more blind windows development * maybe i should just set up a windows vm... * why can't everyone just use sysv abi? * cleanup debugging stuff * unused import * icache flushing isn't required on x86 * merge jit_nt and jit_unix * more * Temporary hack to not segfault * better error * bad conflict resolution * Attempt to simplify support/llvm.py * More refactoring --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-02 19:52:42 +08:00
FICTURE7	66306b5321	Fix disk tensor assignment (#8855 ) * Add test for disk tensor assignment failure * Fix disk tensor assignment --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2025-02-02 13:50:34 +02:00
Ali Ladjevardi	6e523e4d17	Remove size arg from DEFINE_LOCAL [pr] (#8845 ) * remove size arg form DEFINE_LOCAL * make mypy happy * whitespace * dont change code in extra * revert to temp1 to pass pr	2025-02-02 19:47:32 +08:00
nimlgen	7841852870	hcq pci signal fuzzer (#8854 ) * hcq pci signal fuzzer * kk * correct	2025-02-01 23:42:27 +03:00
qazal	dc34a4146f	better process_replay context print [pr] (#8856 ) * better process_replay context print [pr] * test: revert push cast * Revert "test: revert push cast" This reverts commit `38a2aef6f8`.	2025-02-01 21:50:23 +02:00
chenyu	5b1fc4dcb2	push cast to branches in UOp where (#8850 )	2025-02-01 13:55:24 -05:00
chenyu	73ee2d74c0	raise RuntimeError for int base pow (#8852 ) current implementation is not precise and blocking other simplification change	2025-02-01 12:11:57 -05:00
qazal	72e1f41f8e	add unbind_vars pattern matcher (#8851 ) * add unbind_vars pattern matcher [pr] * this can be cvar * this is empty	2025-02-01 18:25:44 +02:00
nimlgen	b3fa76419a	am: move queues to gpus (#8848 ) * am: fix * add flsg for thos * do not depend on host parameter,	2025-02-01 18:02:52 +03:00
George Hotz	42d7c800a1	hotfix: add missing tinychat fonts + other assets	2025-02-01 09:34:44 +08:00
George Hotz	431a86615d	fix multi Ops.CONTIGUOUS_BACKWARD [pr] (#8843 )	2025-02-01 09:21:31 +08:00
Ahmed Harmouche	07d3676019	weights_only=False (#8839 )	2025-01-31 17:16:47 -05:00
nimlgen	741bbc900d	Revert "am: queues allocated on gpus (#8836 )" (#8837 ) This reverts commit `7bbb568dec`.	2025-01-31 22:53:41 +03:00
nimlgen	7bbb568dec	am: queues allocated on gpus (#8836 ) * am: fix * add flsg for thos	2025-01-31 22:14:43 +03:00
chenyu	1f730ae8f8	remove retain_graph in Tensor.backward [pr] (#8835 ) not used. gradient accumulation works directly	2025-01-31 13:41:26 -05:00
chenyu	0a59db936a	raise RuntimeError in schedule_step if not Tensor.training [pr] (#8834 )	2025-01-31 12:03:04 -05:00
qazal	af4f9d1aa9	use matchers to verify AST shape [pr] (#8828 ) * use matchers to verify kernel AST [pr] * work * use swizzle_cnt * add comment * imports * modified_ast comment * brief	2025-01-31 09:17:42 +02:00
George Hotz	643c09a6c6	tensor uop spec should be in spec.py [pr] (#8827 ) * tensor uop spec should be in spec.py [pr] * err, spec.py * print uops can stay	2025-01-31 13:54:04 +08:00
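A note on commit bb5ded85cc: rewriting an integer division by 2**k as a right shift by k is only equivalent when the numerator is non-negative. The sketch below assumes the generated kernels use C-style integer division, which truncates toward zero, while an arithmetic right shift rounds toward negative infinity. `c_idiv` and `idiv_pow2` are hypothetical helpers written for illustration; they are not tinygrad's actual rewrite rules.

```python
def c_idiv(a: int, b: int) -> int:
    """Integer division truncating toward zero, like C's `/` on ints."""
    q = abs(a) // abs(b)
    # Result is positive only when the operands have the same sign.
    return q if (a >= 0) == (b > 0) else -q

def idiv_pow2(x: int, k: int, x_nonneg: bool) -> int:
    """Hypothetical guarded rewrite of x / 2**k (truncating division).

    The shift is used only when x is known to be non-negative; otherwise
    the real division is kept, because x >> k floors instead of truncating.
    """
    if x_nonneg:
        return x >> k           # equals truncating division for x >= 0
    return c_idiv(x, 1 << k)    # keep the division for possibly negative x

for x in (7, -7, -8):
    print(x, c_idiv(x, 4), x >> 2)
```

For x = -7 the truncating division gives -1 while the shift gives -2, which is the mismatch the commit guards against; for exact multiples such as -8 the two happen to agree, so a test needs a non-multiple to catch the bug.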


7726 Commits
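A note on commit 666b6149bc: under conda, `libgcc_s.so` can be a GNU ld linker script rather than an ELF shared object, so `ctypes.util.find_library` returns a path that `ctypes.CDLL` then fails to load with "invalid ELF header". One way to see the difference is to check for the four-byte ELF magic, `\x7fELF`. `is_elf` and `load_libgcc_s` below are hypothetical helpers sketching the idea; the commit itself simply loads the full soname `libgcc_s.so.1`.

```python
import ctypes
import sys

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file

def is_elf(path: str) -> bool:
    """Return True if the file at `path` starts with the ELF magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == ELF_MAGIC

def load_libgcc_s() -> ctypes.CDLL:
    """Hypothetical helper: load libgcc_s by its full soname on Linux.

    The number after .so is the ABI version, which is always 1 for libgcc_s.
    Passing the full soname lets the dynamic loader resolve the real ELF file
    instead of a linker-script stub named libgcc_s.so.
    """
    if not sys.platform.startswith("linux"):
        raise OSError("this sketch only handles Linux")
    return ctypes.CDLL("libgcc_s.so.1")
```

Running `is_elf` on the linker-script `libgcc_s.so` quoted in the commit message would return False, which is why `ctypes.CDLL` rejects it.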