* add Tensor.split (#2677)
* fix mypy errors
* add list support for Tensor.split
* fix ruff comments
* match tensor.split api
* simplify split and test_split
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
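A quick illustration of the API added above, mirroring `torch.Tensor.split` (the exact signature is from memory and may differ slightly):

```python
from tinygrad.tensor import Tensor

t = Tensor.arange(10)
a, b = t.split(5)              # int: equal chunks of size 5
x, y, z = t.split([2, 3, 5])   # list support: explicit chunk sizes
print(a.shape, b.shape)        # (5,) (5,)
```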
* remove type check for LazyOp.src now that it's always LazyOp
also matched the MULACC criteria between interpreted and compiled (that probably needs to be refactored somewhere else)
* disable that test
* print DEBUG for TC=2 in CI
* enable TC=2
* no need to check src type
* LOAD has side effect
* don't push any local buffer
* update comment
* and BARRIER
* cleanup llama apply_rotary_emb and other helpers
used ellipsis and other higher-level tensor functions.
disabled the half @ half -> half tensor core as it fails uop dtype checks
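A hedged sketch of the kind of cleanup described: ellipsis indexing lets a helper slice the last axis without spelling out every leading dim (shapes and names here are made up, not the actual llama helper):

```python
from tinygrad.tensor import Tensor

x = Tensor.ones(2, 8, 4, 64)            # (batch, seq, heads, head_dim), illustrative
even, odd = x[..., 0::2], x[..., 1::2]  # pairwise features, rank-agnostic
```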
* keep hip 8x8->8 wmma
the compiler error was due to `error: call to 'max' is ambiguous` when we have max(int, float) in a kernel.
it was first fixed in 4380ccb1, the non-fp32 math PR, and further solidified with the dtype refactor.
the correct condition is that PADTO cannot be applied to a reduce axis, not that Reduce.MAX is among the ops.
even for Reduce.SUM it's possible that the reduce axis had a div before it, so the padded 0 becomes inf and the sum over it is incorrect.
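A worked toy example of that failure mode, with numpy standing in for the padded kernel: PADTO fills the reduce axis with 0s, and a div before the reduce turns those 0s into inf, corrupting the sum.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
padded = np.array([1.0, 2.0, 3.0, 0.0])  # reduce axis padded to length 4
print((1.0 / x).sum())       # ~1.833, the correct result
print((1.0 / padded).sum())  # inf: 1/0 from the padded element poisons the sum
```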
* return bool
* add tests to the type spec
* fix multinomial
* fix tril
* fix round
* fix NegativeLogLikelihoodLoss
* rm debug
* webgpu
* more webgpu
* bitwise or for adding two bools
* onnx ops don't need to cast anymore
* Revert "bitwise or for adding two bools"
This reverts commit b413babffa.
* workaround for metal neg
* just the tests in the type spec
* test dtypes of return values of cumsum, argmax/min, multinomial
cumsum behaves like sum, and functions that return an index return dtypes.default_int
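A minimal sketch of the dtype contract being tested (cumsum/argmax are real tinygrad API; the dtypes import path varies by version, older ones use tinygrad.helpers):

```python
from tinygrad.tensor import Tensor
from tinygrad import dtypes  # older versions: from tinygrad.helpers import dtypes

t = Tensor([1.0, 2.0, 3.0])
assert t.cumsum(0).dtype == t.dtype            # cumsum keeps the input dtype, like sum
assert t.argmax().dtype == dtypes.default_int  # index-returning ops use default_int
```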
* because webgpu is different
* these asserts should pass
* fix that assert
* ALU dtypes
* acc dtype for group_for_reduce
* cast image ALUs to the base dtype
* remove all casts from linearizer
* fix argmax
* fix multinomial
* fix __getitem__
* Revert "fix __getitem__"
This reverts commit 62ad719bfa.
* fix MemBuffer outputs being wrong when there is an arange + ALU with a different dtype
e.g. fancy slicing (int, float), bert embeddings (int, long)
this should be fixed in lazy instead of having to break the kernel
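A hedged repro of the mixed-dtype pattern named above: fancy slicing gathers from a float tensor with int indices, so an int arange feeds a float ALU in the same kernel (illustrative shapes, not the failing AST itself):

```python
from tinygrad.tensor import Tensor
from tinygrad import dtypes  # import path varies by version

w = Tensor.randn(10, 4)                      # float data
idx = Tensor([1, 3, 5], dtype=dtypes.int32)  # int indices -> arange-based gather
print(w[idx].shape)                          # (3, 4): int/float mix in one kernel
```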
* cleanup argmax fix
* fix matmul in ints
cast at the end
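What the fix guarantees, as I read it (an illustrative assertion, not the diff itself): an int @ int matmul comes back int, with any cast applied at the end rather than on the inputs.

```python
from tinygrad.tensor import Tensor
from tinygrad import dtypes  # import path varies by version

a = Tensor([[1, 2], [3, 4]], dtype=dtypes.int32)
assert (a @ a).dtype == dtypes.int32  # stays int; cast happens at the end
```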
* fix llama
* skip wrong hardcoded ASTs in the worlds dataset
* fix llama p2
* cleanup missing parts of the diff
---------
Co-authored-by: George Hotz <geohot@gmail.com>
* ww/Fixed Tensor.randint() to accept shape tuples
* ww/Wrote a test to cover this typo
* ww/Updated Tensor random functions to optionally take a shape tuple (,) or splatted *() args, to be more consistent (example below)
* ww/no lint no worries
* ww/Made peace with linter
* ww/Added a newline; can't reduce line length without reducing readability
* ww/reverted to using .mul
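The calling conventions these commits make consistent (illustrative call, grounded in the commit messages above):

```python
from tinygrad.tensor import Tensor

a = Tensor.randint(2, 3)    # splatted ints
b = Tensor.randint((2, 3))  # shape tuple, the case the typo broke
assert a.shape == b.shape == (2, 3)
```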
* space removal in formula and a single test to cover it
* space in torch einsum as well
* replacing spaces in the formula variable to support stripping all the spaces
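What the space handling enables (illustrative): formulas may contain spaces, as torch.einsum allows, and they are stripped before parsing.

```python
from tinygrad.tensor import Tensor

x, y = Tensor.ones(2, 3), Tensor.ones(3, 4)
out = Tensor.einsum("i j , j k -> i k", x, y)  # spaces in the formula are fine
assert out.shape == (2, 4)
```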
* better support for platform dependent flags
* osx test support
* removed unused import and made line length <150
* changed osx ci shm
* lstrip in case SharedMemory._name is passed
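A hedged sketch of that lstrip: on POSIX platforms, `multiprocessing.shared_memory` stores the name with a leading '/' in `SharedMemory._name`, so a passed-through name needs it stripped (the surrounding device code is assumed, not quoted):

```python
from multiprocessing import shared_memory

shm = shared_memory.SharedMemory(create=True, size=16)
clean = shm._name.lstrip("/")  # "psm_..." instead of "/psm_..." on macOS/Linux
shm.close(); shm.unlink()
```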