Commit Graph

10417 Commits

George Hotz
56f44bd10e move the compiler cache to be global (#2957)
* move the compiler cache to be global

* remove non-robust test

* remove dead code
2024-01-01 10:59:56 -08:00
George Hotz
063f465604 simpler webgpu (#2956)
* simpler webgpu

* skip that test
2024-01-01 10:28:59 -08:00
Shawn Hagler
fea20d71b3 add /opt/cuda/include directory (#2920) 2023-12-30 08:16:42 -08:00
chenyu
0d6e264c48 cleanup Tensor.triu and Tensor.tril (#2953)
`.where` handles the dtype and shape conversion for the scalar 0, so there's no need to use zeros_like
2023-12-29 22:27:18 -05:00
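A minimal numpy sketch of the pattern this cleanup relies on (a hypothetical helper, not the repo's exact code): a boolean mask plus `where(x, 0)` replaces zeros_like, since where casts and broadcasts the scalar 0 itself.

```python
import numpy as np

def triu_sketch(x: np.ndarray, k: int = 0) -> np.ndarray:
    rows, cols = np.indices(x.shape[-2:])
    keep = (cols - rows) >= k      # True on and above the k-th diagonal
    return np.where(keep, x, 0)    # the 0 picks up x's dtype and shape

print(triu_sketch(np.arange(9).reshape(3, 3)))
```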
chenyu
e53b96fdbb fix TC=2 tensor core op test (#2951)
* print DEBUG for TC=2 in CI

* enable TC=2

* no need to check src type

* LOAD has side effect

* don't push any local buffer

* update comment

* and BARRIER
2023-12-29 21:39:49 -05:00
chenyu
ad4472e6e8 cleanup llama apply_rotary_emb and other helpers (#2950)
* cleanup llama apply_rotary_emb and other helpers

used Ellipsis and other higher-level tensor functions.
disabled the half @ half -> half tensor core as it fails uop dtype checks

* keep hip 8x8->8 wmma
2023-12-29 11:39:15 -05:00
chenyu
61e255d197 use max for gpt2 and llama (#2949)
not using argmax yet because there's a multinomial outside of the function.
2023-12-28 23:26:00 -05:00
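A hedged sketch of what sampling with max instead of argmax can look like (hypothetical helper, not the repo's code): the max becomes a one-hot distribution, which a multinomial living outside the function can still consume.

```python
import numpy as np

def greedy_probs(logits: np.ndarray) -> np.ndarray:
    # one-hot at the max, renormalized so it stays a valid distribution
    one_hot = (logits == logits.max(axis=-1, keepdims=True)).astype(np.float32)
    return one_hot / one_hot.sum(axis=-1, keepdims=True)

print(greedy_probs(np.array([0.1, 2.0, 0.3])))  # [0. 1. 0.]
```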
chenyu
c7b106bf9c hotfix float4 only supports float and half (#2948)
#2942 broke coder
2023-12-28 20:23:52 -05:00
chenyu
2f67f1e580 remove obsolete TODO in beautiful_mnist (#2946)
the compiler error was due to `error: call to 'max' is ambiguous` when we had max(int, float) in a kernel.
it was first fixed in 4380ccb1, the non-fp32 math PR, and further solidified by the dtype refactor
2023-12-28 17:09:23 -05:00
chenyu
50f2e31d26 cleanup float4 grouping in global_load and global_store (#2942)
* cleanup float4 grouping in global_load and global_store

* fix test decorator
2023-12-27 14:10:04 -05:00
chenyu
54629b56d2 minor cleanup in kernel and linearizer (#2937)
* minor cleanup in kernel and linearizer

fewer long lines, fixed spacing, and colocated variables

* no deadline in hypothesis test
2023-12-26 12:05:32 -05:00
chenyu
820f2e054e fix PADTO optimization (#2935)
the correct condition is that PADTO cannot be applied to a reduce axis, not that Reduce.MAX appears in the ops.
even for Reduce.SUM it's possible that the reduce axis had a div before it, so the padded 0 becomes inf and the sum over it is incorrect.
2023-12-25 22:52:49 -05:00
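A tiny numpy illustration of the bug this commit describes (not the kernel IR itself): when the reduce axis saw a div, the 0 that PADTO pads in becomes inf and poisons even a SUM reduce.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0])
padded = np.pad(x, (0, 1))      # PADTO-style pad with 0: [1, 2, 4, 0]
print((1.0 / x).sum())          # 1.75, the correct reduction
print((1.0 / padded).sum())     # inf, because 1/0 = inf
```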
qazal
dca5e4fe74 tensor == tensor should be bool (#2916)
* return bool

* add tests to the type spec

* fix multinomial

* fix tril

* fix round

* fix NegativeLogLikelihoodLoss

* rm debug

* webgpu

* more webgpu

* bitwise or for adding two bools

* onnx ops dont need to cast anymore

* Revert "bitwise or for adding two bools"

This reverts commit b413babffa.

* workaround for metal neg

* just the tests in the type spec
2023-12-25 12:38:47 -05:00
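A quick check of the rule this PR establishes; the dtypes import path is an assumption, as it has moved between tinygrad versions.

```python
from tinygrad.tensor import Tensor
from tinygrad.dtype import dtypes  # import path assumed

# elementwise == now yields a bool tensor instead of the operands' dtype
assert (Tensor([1, 2, 3]) == Tensor([1, 0, 3])).dtype == dtypes.bool
```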
chenyu
8a8aed23d2 test dtypes of return values of cumsum, argmax/min, multinomial (#2933)
* test dtypes of return values of cumsum, argmax/min, multinomial

cumsum behaves like sum, and functions that return an index return dtypes.default_int

* because webgpu is different
2023-12-25 11:33:17 -05:00
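A condensed version of what the new tests check (import path assumed):

```python
from tinygrad.tensor import Tensor
from tinygrad.dtype import dtypes  # import path assumed

t = Tensor([2, 1, 3])
assert t.cumsum(0).dtype == t.sum().dtype      # cumsum promotes like sum
assert t.argmax().dtype == dtypes.default_int  # index results use default_int
```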
qazal
12996d3a7d green linearizer asserts for ops (#2800)
* these asserts should pass

* fix that assert

* ALU dtypes

* acc dtype for group_for_reduce

* cast image ALUs to the base dtype

* remove all casts from linearizer

* fix argmax

* fix multinomial

* fix __getitem__

* Revert "fix __getitem__"

This reverts commit 62ad719bfa.

* fix MemBuffer outputs being wrong when there is an arange + ALU with a different dtype

e.g. fancy slicing (int, float), bert embeddings (int, long)

this should be fixed in lazy instead of having to break the kernel

* cleanup argmax fix

* fix matmul in ints

cast in the end

* fix llama

* skip wrong hardcoded asts in the worlds dataset

* fix llama p2

* cleanup missing parts of the diff

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2023-12-25 10:41:54 -05:00
chenyu
1fb815e77e hotfix fix coder. RMSNorm cannot have float16 input (#2932)
* hotfix fix coder. RMSNorm cannot have float16 input

* update real world test due to new kernels

* more type casts
2023-12-25 02:28:11 -05:00
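A numpy illustration of why RMSNorm can't take float16 input: squaring moderate fp16 values already overflows (fp16 max is about 65504), so the mean-of-squares needs float32.

```python
import numpy as np

x = np.array([300.0], dtype=np.float16)
print((x * x).mean())                      # inf: 300 * 300 = 90000 > 65504
print((x.astype(np.float32) ** 2).mean())  # 90000.0, the intended value
```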
chenyu
b469fe3723 add CMPEQ (#2931)
* CMPEQ

* work

* fix onnx

* fix round

* fix webgpu

* prettier

* no PADTO in actions
2023-12-25 00:15:55 -05:00
Will
016aebcd84 Fixed Tensor.randint() not accepting tuple shapes (#2923)
* ww/Fixed Tensor.randint() to accept shape tuples ()

* ww/Wrote a test to cover this typo

* ww/Updated Tensor random objects to optionally take (,) or *() to be more consistent

* ww/no lint no worries

* ww/Made peace with linter

* ww/Added a new line; can't reduce line size without reducing readability

* ww/reverted to using .mul
2023-12-24 20:32:26 -05:00
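The two call styles this PR makes consistent, as a sketch; the keyword names are assumed from the usual randint signature.

```python
from tinygrad.tensor import Tensor

a = Tensor.randint(2, 3, low=0, high=10)    # varargs shape
b = Tensor.randint((2, 3), low=0, high=10)  # tuple shape, the case this fixes
assert a.shape == b.shape == (2, 3)
```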
chenyu
2dc99af169 clean up manual cast dtypes in tensor.py (#2930)
don't need a contiguous between two casts, and don't need to cast bool into float before mul
2023-12-24 13:03:41 -05:00
Isalia20
8de1fc2539 Einsum space fix (#2927)
* space removal in formula and a single test to cover it

* space in torch einsum as well

* replacing spaces in a `formula` variable to support stripping all the spaces
2023-12-24 01:23:27 -05:00
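A sketch of the behavior the new test covers: a formula with spaces parses the same as its compact form.

```python
from tinygrad.tensor import Tensor

x, y = Tensor.ones(2, 3), Tensor.ones(3, 4)
a = Tensor.einsum("ij,jk->ik", x, y)
b = Tensor.einsum("i j, j k -> i k", x, y)  # spaces are stripped now
assert a.shape == b.shape == (2, 4)
```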
chenyu
b55b55d56e use at least int32 and uint32 for sum output (#2926)
* use at least int32 and uint32 for sum output

* use the correct type for acc

* fix opencl

* llvm mulacc
2023-12-24 01:14:54 -05:00
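A quick check of the promotion rule (import path assumed): summing a small-int tensor accumulates in at least 32 bits.

```python
from tinygrad.tensor import Tensor
from tinygrad.dtype import dtypes  # import path assumed

t = Tensor([1, 2, 3], dtype=dtypes.int8)
assert t.sum().dtype == dtypes.int32  # promoted from int8 by this change
```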
chenyu
d424babe2c tensor.py cleanup around Tensor.slice (#2921)
use None for no-op slice and pad
2023-12-22 19:46:39 -05:00
chenyu
089703a390 cleanup test_dtype_alu (#2919)
wrapped long lines and lowered atol for METAL.sin to 2, since the absolute difference of two sines is bounded by 2
2023-12-22 17:29:31 -05:00
chenyu
3ba591c3fd less outdated abstraction.py (#2917)
removed some old terms and updated types and code pointers
2023-12-22 15:31:02 -05:00
chenyu
50927defad s/lazydata.realized/lazydata.base.realized/g (#2914)
* s/lazydata.realized/lazydata.base.realized/g

* not that
2023-12-22 14:45:13 -05:00
chenyu
2783e1b50d bugfix Tensor.item when it's unbased (#2913)
it's possible for a numel-1 tensor's lazydata to be unbased, so it should call lazydata.base.realized
2023-12-22 13:50:06 -05:00
Oleg Rybalko
c3133adb8c Disk shm refactor (#2912)
* better support for platform dependent flags

* osx test support

* removed unused import and made line length <150

* changed osx ci shm

* lstrip in case SharedMemory._name is passed
2023-12-22 09:23:37 -08:00
chenyu
3855432265 don't use numpy to create Tensor(None) (#2909)
* don't use numpy to create Tensor(None)

empty suffices

* parentheses
2023-12-22 01:07:44 -05:00
chenyu
50cfb1fb3a update onnx model links (#2908)
updated in https://github.com/onnx/models/pull/644
2023-12-22 00:19:41 -05:00
chenyu
1bbeb3fe2f remove the different rtol / atol for openpilot CUDA in benchmark (#2907)
not sure what the issue was, but it seems to be fixed on master
2023-12-21 22:23:39 -05:00
chenyu
a543d8bea8 fuzz default dtypes for some test_dtype tests (#2906)
* fuzz default dtypes for some test_dtype tests

* ocd

* setUp and tearDown
2023-12-21 22:00:21 -05:00
wozeparrot
5f3d5cfb02 catch cycles in print_tree (#2891)
* feat: smaller tree on references

* fix: shorter line

* fix: huh

* fix: should be all

* feat: cleaner

* fix: extra imports

* fix: pass by reference
2023-12-21 18:40:37 -08:00
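A hedged sketch of the shape of the fix (hypothetical code, not the repo's print_tree): nodes already seen are rendered as a reference instead of being recursed into, so a cyclic or heavily shared graph can't loop forever.

```python
def print_tree(node, seen=None, depth=0):
    if seen is None: seen = set()
    if id(node) in seen:
        print("  " * depth + f"<ref {type(node).__name__}>")  # cycle/shared node
        return
    seen.add(id(node))
    print("  " * depth + type(node).__name__)
    for child in getattr(node, "src", ()):
        print_tree(child, seen, depth + 1)
```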
George Hotz
4432cb17bb minor cleanups / remove that op (#2905) 2023-12-21 18:24:20 -08:00
chenyu
fd0ba33b38 onnx_ops formatting cleanup (#2904)
also removed a case in safe_numpy that always converted 0-dim arrays to 1-dim
2023-12-21 20:06:06 -05:00
George Hotz
5cac6338a4 apply the multitensor optimizations in lazy.py (#2901)
* apply the multitensor optimizations in lazy.py

* less lines

* hack for webgpu

* save a line
2023-12-21 13:55:49 -08:00
chenyu
5bf43c9634 reenable one onnx test failed due to dtype (#2902) 2023-12-21 15:50:02 -05:00
chenyu
677ae7673d use np.less and torch.lt for CMPLT (#2899)
also removed one unused output_type
2023-12-21 14:37:24 -05:00
qazal
d2e9245de8 render_locals takes a dtype (#2873)
Co-authored-by: chenyu <chenyu@fastmail.com>
2023-12-21 14:15:28 -05:00
chenyu
6116039f7b don't match dtype with first input in where (#2898)
* don't match dtype with first input in where

in `Tensor([1, 2, 3]).where(1.2, 2.3)`, the first `[1, 2, 3]` can be cast directly into bool without first casting to float (the broadcasted type)

* cast in one place
2023-12-21 13:02:15 -05:00
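The example from the PR description as runnable code (import path assumed): the condition casts straight to bool, and the output dtype comes from the two branches alone.

```python
from tinygrad.tensor import Tensor
from tinygrad.dtype import dtypes  # import path assumed

out = Tensor([1, 2, 3]).where(1.2, 2.3)
assert out.dtype == dtypes.default_float  # not influenced by the int condition
```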
chenyu
7dc3352877 increase stable diffusion validation threshold 1e-4 -> 3e-4 (#2897)
saw a flaky CI failure with 1.1e-4, and 3e-4 is a good number
2023-12-21 11:45:25 -05:00
qazal
24e79e0f53 Move the webgpu CMPLT hack to one place (#2895)
* move hacks to one place

* no casting in mlops, move to tensor

* ruff fix
2023-12-21 11:14:56 -05:00
George Hotz
852ef57ba4 fix readme typo 2023-12-21 08:06:24 -08:00
George Hotz
193109a88c hotfix: compare on ids 2023-12-20 23:47:50 -08:00
George Hotz
f6c7833f9f fast compare for lazyop (#2893) 2023-12-20 23:32:27 -08:00
chenyu
1500aca43d remove output_type in ops_cpu and ops_torch (#2892)
now that the input types are matched and checked in lazy, we can remove these output_types.
also removed the usage of least_upper_dtype in ops.py since we can just use the input type
2023-12-21 02:11:27 -05:00
chenyu
2d2c4980fe assert for elementwise dtypes in lazy (#2888)
* assert for elementwise dtypes in lazy

* no image hack

* check dtype of scalar for IMAGE=2
2023-12-21 01:42:32 -05:00
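A minimal sketch of the invariant this PR asserts (hypothetical helper, not the repo's code): by the time an elementwise op is built in lazy, all of its sources must already share one dtype.

```python
def check_elementwise_dtypes(srcs):
    dts = [s.dtype for s in srcs]
    assert all(dt == dts[0] for dt in dts), f"mismatched elementwise dtypes: {dts}"
```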
George Hotz
41b2a25be6 Fix exponential behavior in lazyops (#2890)
* add cache to ast_parse and lazyop builder

* add caches
2023-12-20 22:06:50 -08:00
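A generic sketch of the fix (hypothetical names, not the repo's code): an uncached recursive walk over a lazyop DAG re-expands shared subtrees exponentially often; memoizing on node identity makes each node cost O(1) after its first visit.

```python
def walk(op, cache=None):
    if cache is None: cache = {}
    if id(op) in cache:            # shared subtree: reuse, don't re-expand
        return cache[id(op)]
    res = cache[id(op)] = tuple(walk(s, cache) for s in op.src)
    return res
```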
George Hotz
8c4a0f8e15 Fix int child count (#2882)
* pad ops broke coder

* that contiguous fixes it

* Update lazy.py

* recursive add

* fix all

* revert that

* todo test
2023-12-20 21:06:27 -08:00
chenyu
8a04107d30 move the op casting logic from mlops to tensor try 2 (#2887)
* unary works

* where works

* add sub mul

* xor div

* CMPLT

* sparse_categorical_crossentropy

* image const

* sparse_categorical_crossentropy
2023-12-20 23:50:37 -05:00
George Hotz
7da2325dc7 get_lazyops() -> lazyops (#2884)
* get_lazyops() -> lazyops

* don't compare empty mem
2023-12-20 18:04:49 -08:00