Commit Graph

142 Commits

Author SHA1 Message Date
Francis Lata
6a62ece474 minor cleanups 2025-02-28 15:43:11 +00:00
Francis Lata
074e9f742b more typing fixes 2025-02-28 15:42:11 +00:00
Francis Lata
2c3417dfce Merge branch 'master' into retinanet_mlperf 2025-02-23 21:23:28 +00:00
Francis Lata
60c13c2932 update loss calculation for regression head and some cleanups 2025-02-23 21:22:33 +00:00
chenyu
1692087db5 _one_hot_along_dim input needs to be int (#9179)
* _one_hot_along_dim input needs to be int

indexing and onehot compare with arange, and non-int dtype is likely a bug
2025-02-20 09:00:43 -05:00
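A minimal sketch of the arange-comparison one-hot this guards, assuming tinygrad's public Tensor API (names and shapes here are illustrative, not the actual helper):

```python
# one-hot is built by comparing integer indices against arange, so a
# non-int index dtype is suspect: float indices compare unreliably
from tinygrad import Tensor, dtypes

idx = Tensor([2, 0, 1], dtype=dtypes.int32)  # class indices, must be int
onehot = (Tensor.arange(3) == idx.unsqueeze(-1)).where(1.0, 0.0)
print(onehot.numpy())  # [[0. 0. 1.] [1. 0. 0.] [0. 1. 0.]]
```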
Francis Lata
fc36f09b1e no need to return loaded keys for resnet 2025-02-19 20:35:03 +00:00
Francis Lata
41378e74a6 model init, hyperparam, and data preprocessing updates 2025-02-19 18:47:06 +00:00
chenyu
975c318dbc bert use int32 for input ids (#9173)
original data was int32 for these. float might have caused precision issues
2025-02-19 08:17:27 -05:00
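A quick illustration of the precision hazard (plain numpy, not the bert code): float32 has integer spacing greater than 1 above 2**24, so large ids stop round-tripping exactly.

```python
import numpy as np
print(np.float32(2**24 + 1) == 2**24 + 1)  # False: 16777217 rounds to 16777216
print(np.int32(2**24 + 1) == 2**24 + 1)    # True: int32 holds it exactly
```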
Francis Lata
f61b10450e Merge branch 'master' into retinanet_mlperf 2025-02-12 15:47:05 +00:00
divinity76
bec4f59ce8 workaround f16 cast ambiguity (#8935)
For unknown reasons, without this change, trying to execute "Llama 3.2 1B" fails with the error below. I do not know the performance impact of this change. I still can't get exo fully running, but this change lets me get further (before hitting a separate issue with VRAM allocation, a story for another day).

error: 
```
Failed to fetch completions: Error processing prompt (see logs with DEBUG>=2): Nvrtc Error 6, NVRTC_ERROR_COMPILATION <null>(18): error: more than one user-defined conversion from "nv_bfloat16" to "half" applies:
            function "__half::__half(float)" (declared at line 214 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(short)" (declared at line 227 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned short)" (declared at line 228 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(int)" (declared at line 229 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned int)" (declared at line 230 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(long long)" (declared at line 231 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned long long)" (declared at line 232 of /usr/include/cuda_fp16.hpp)
    *((half4*)((data0+(alu0+(gidx1<<14)+(lidx0<<11)+alu1)))) = make_half4(((half)(val0)),((half)(val1)),((half)(val2)),((half)(val3)));
                                                                                 ^

(the same "more than one user-defined conversion" error repeats three more times, once for each remaining make_half4 argument: val1, val2, val3)
4 errors detected in the compilation of "<null>".
```
2025-02-11 09:38:56 +08:00
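The ambiguity is that CUDA's `__half` has no single constructor accepting `nv_bfloat16`, so the compiler sees several equally valid conversions. A minimal sketch of the workaround's idea at the Tensor level (illustrative, not the actual patch, and assuming a device with bf16 support):

```python
# routing bf16 -> f32 -> f16 gives the generated CUDA one unambiguous
# conversion per value instead of the ambiguous __half(nv_bfloat16)
from tinygrad import Tensor, dtypes

x = Tensor([1.5, 2.5], dtype=dtypes.bfloat16)
y = x.cast(dtypes.float32).cast(dtypes.float16)  # instead of x.cast(dtypes.float16)
```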
Francis Lata
041481f910 Merge branch 'master' into retinanet_mlperf 2025-02-07 15:28:29 +00:00
chenyu
a092b6395d Tuple -> tuple, List -> list [pr] (#8936) 2025-02-06 14:21:19 -05:00
George Hotz
80089536e5 Revert "move llvm_bf16_cast to renderer for CLANG and LLVM [pr] (#8720)" (#8786)
This reverts commit af0452f116.
2025-01-28 18:59:02 +09:00
mesozoic-egg
af0452f116 move llvm_bf16_cast to renderer for CLANG and LLVM [pr] (#8720)
* handle bf16 via bitcasting for CLANG and LLVM

* On LLVM, skip float16 cast

* float32 on llvm lite, float32 elsewhere

* code format

* trigger pr

* move to rewriter

---------

Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.mail>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-28 18:16:43 +09:00
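The bitcast trick relies on bf16 being exactly the top 16 bits of a float32; a self-contained illustration (plain Python, not the renderer code):

```python
# bf16 -> f32 by widening the 16 raw bits into the high half of a uint32
import struct

def bf16_bits_to_float(bits16: int) -> float:
    return struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))[0]

print(bf16_bits_to_float(0x3FC0))  # 1.5 (0x3FC00000 reinterpreted as float32)
```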
Francis Lata
2177053076 Merge branch 'master' into retinanet_mlperf 2025-01-27 08:07:19 -08:00
chenyu
49b914ee69 simpler bert acc [pr] (#8714)
logit.log_softmax().argmax(-1) is equivalent to logit.argmax(-1)
2025-01-22 10:32:19 -05:00
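The equivalence holds because log_softmax only subtracts a per-row constant (the logsumexp), which cannot change which index is largest:

```python
# log_softmax(x) = x - logsumexp(x): a per-row shift, so argmax is unchanged
from tinygrad import Tensor

logits = Tensor([[2.0, 5.0, 1.0]])
assert logits.log_softmax(-1).argmax(-1).item() == logits.argmax(-1).item()
```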
Francis Lata
be2e97260d fix dtype for anchor inside dataloader and fix horizontal flip transformation 2025-01-20 22:45:25 -08:00
Francis Lata
bef389dec7 realize boxcoder's encoding 2025-01-19 15:59:28 -08:00
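For context, a sketch of what a Faster R-CNN style BoxCoder encode computes (names are illustrative and the per-coordinate weights are omitted; this is not the file being edited):

```python
# ground-truth boxes as (dx, dy, dw, dh) offsets relative to anchors, all xyxy
from tinygrad import Tensor

def encode(anchors: Tensor, gt: Tensor) -> Tensor:
    aw, ah = anchors[:, 2] - anchors[:, 0], anchors[:, 3] - anchors[:, 1]
    ax, ay = anchors[:, 0] + 0.5 * aw, anchors[:, 1] + 0.5 * ah
    gw, gh = gt[:, 2] - gt[:, 0], gt[:, 3] - gt[:, 1]
    gx, gy = gt[:, 0] + 0.5 * gw, gt[:, 1] + 0.5 * gh
    return Tensor.stack((gx - ax) / aw, (gy - ay) / ah, (gw / aw).log(), (gh / ah).log(), dim=1)
```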
Francis Lata
40d6752854 adjust regression loss to mask after L1 loss is calculated 2024-12-27 17:41:12 +00:00
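The ordering matters: masking after the elementwise L1 lets padded anchors contribute zero to the reduction. A hedged sketch (function and normalization are illustrative, not the repo's code):

```python
# elementwise L1 first, then mask, then normalize by the number of valid anchors
from tinygrad import Tensor

def masked_l1(pred: Tensor, target: Tensor, mask: Tensor) -> Tensor:
    l1 = (pred - target).abs()                        # per-element L1 loss
    return (l1 * mask).sum() / mask.sum().maximum(1)  # guard against division by zero
```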
Francis Lata
8441130822 Merge branch 'master' into retinanet_mlperf 2024-12-27 15:38:57 +00:00
Francis Lata
5755ac1f72 Fix FC layer ResNet load_from_pretrained error (#8387)
* validate that FC exists before loading pretrained weights

* add test case for ResNet pretrained model without FC layer

* remove extra newline

* rename test case

* reraise exception if not handled by check
2024-12-26 18:11:27 -05:00
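A hedged sketch of the guard's shape (the key-walking helper here is hypothetical, not the actual resnet.py code): skip the final FC weights when the model was built without a classification head, and let any other mismatch raise as before:

```python
# skip fc.* keys when the model has no FC layer; other missing keys still raise
def load_pretrained(model, state_dict):
    for key, value in state_dict.items():
        if key.startswith("fc.") and getattr(model, "fc", None) is None:
            continue  # no classification head: pretrained FC weights are skippable
        obj = model
        for part in key.split("."):  # walk e.g. "layer1.0.conv1.weight"
            obj = obj[int(part)] if part.isdigit() else getattr(obj, part)
        obj.assign(value)
```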
Francis Lata
c1a18e13ef make training work 2024-12-23 21:48:55 +00:00
Francis Lata
46e0bcf412 minor cleanup 2024-12-23 20:22:21 +00:00
Francis Lata
03eda98a31 Merge branch 'master' into retinanet_mlperf 2024-12-23 08:30:46 -08:00
Francis Lata
239d2a7214 explicitly check value for not None (#8382) 2024-12-23 11:12:39 -05:00
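The distinction matters because truthiness conflates `None` with legitimate falsy values:

```python
# "if value:" silently drops 0, 0.0, "" and empty containers; "is not None" doesn't
value = 0
if value: print("truthiness check")             # not printed, 0 is falsy
if value is not None: print("explicit check")   # printed, 0 is a real value
```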
Francis Lata
8defd337d8 fixes after helper refactor cleanup 2024-12-23 05:30:03 +00:00
Francis Lata
0e4fd5c573 revert helper changes 2024-12-23 05:16:13 +00:00
Francis Lata
fb689f7097 move BoxCoder to MLPerf helpers 2024-12-23 05:12:51 +00:00
Francis Lata
972127496e cleanup losses 2024-12-23 05:06:37 +00:00
Francis Lata
d57f7cc209 fix regression loss 2024-12-21 09:34:10 +00:00
Francis Lata
630267914f implement regression loss 2024-12-20 23:38:25 +00:00
Francis Lata
971d10361f revert anchors to use np 2024-12-20 16:56:50 +00:00
Francis Lata
759e1d6cbc make anchors use Tensors 2024-12-19 22:20:03 +00:00
Francis Lata
9b9e5871ed remove sigmoid when computing loss 2024-12-18 20:07:49 +00:00
Francis Lata
17eabdd1b2 cleanups + fix dataloader tests 2024-12-13 20:38:23 +00:00
Francis Lata
43e1f33d33 make ClassificationHead loss work 2024-12-13 20:20:27 +00:00
Francis Lata
827b2114e2 update focal loss to support masking 2024-12-10 23:00:32 +00:00
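For reference, a hedged sketch of a masked sigmoid focal loss in the RetinaNet style (Lin et al.; alpha and gamma are the paper's defaults, and this is not the repo's implementation):

```python
from tinygrad import Tensor

def focal_loss(logits: Tensor, targets: Tensor, mask: Tensor, alpha=0.25, gamma=2.0) -> Tensor:
    p = logits.sigmoid()
    ce = -(targets * p.log() + (1 - targets) * (1 - p).log())  # elementwise BCE
    p_t = targets * p + (1 - targets) * (1 - p)                # prob of the true class
    a_t = targets * alpha + (1 - targets) * (1 - alpha)
    loss = a_t * (1 - p_t) ** gamma * ce                       # down-weight easy examples
    return (loss * mask).sum() / mask.sum().maximum(1)         # masked mean over valid anchors
```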
Francis Lata
e5bc0c0485 start some work on classification loss 2024-12-10 16:33:16 +00:00
Francis Lata
bb6f6075cd Merge branch 'master' into retinanet_mlperf 2024-12-04 21:48:36 -05:00
Francis Lata
c3187087f7 QwQ-32B-Preview support (#7962)
* load weights with some debugging

* start running a prompt

* cleanup

* optionally permute layers and cleanup

* add validation for simple prompt

* small cleanup

* minor cleanup with formatting download links

* add a longer prompt

* add timing option

* some typings

* remove unused arg

* reset GlobalCounters

* minor cleanups
2024-12-04 21:46:37 -05:00
Francis Lata
b21a25e6fd Merge branch 'master' into retinanet_mlperf 2024-12-01 07:37:02 -05:00
chenyu
336a9b6bf3 remove dtype from llama precompute_freqs_cis (#7930)
do the cast based on the input's dtype in the first forward call instead
2024-11-27 22:28:40 -05:00
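A hedged sketch of the pattern described (the class is illustrative, not llama.py): build freqs_cis in the default dtype and cast lazily to match the activations.

```python
from tinygrad import Tensor

class FreqsCis:
    def __init__(self, freqs_cis: Tensor): self.freqs_cis = freqs_cis
    def get(self, x: Tensor) -> Tensor:
        if self.freqs_cis.dtype != x.dtype:                # first forward call
            self.freqs_cis = self.freqs_cis.cast(x.dtype)  # cast once, based on input
        return self.freqs_cis
```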
chenyu
3b26e51fce Tensor.cummax (#7854)
generalized the existing cumsum to take Ops.MAX in addition to Ops.ADD
2024-11-22 15:55:02 -05:00
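Usage is the running-maximum analogue of cumsum (assuming the axis argument mirrors cumsum's):

```python
# cummax reuses the cumsum scan with Ops.MAX in place of Ops.ADD
from tinygrad import Tensor

print(Tensor([1, 3, 2, 5, 4]).cummax(0).numpy())  # [1 3 3 5 5]
```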
Francis Lata
99efa2cfde Merge branch 'master' into retinanet_mlperf 2024-11-18 04:42:57 -08:00
geohotstan
f8056a74d6 combine pad2d with pad (#7677)
* I have pad2d, I have pad, uuh~, pad2dpad~

* fix some small things

* strategically placed cast hack

* fix more

* fix more more

* tests

* periods
2024-11-14 17:56:02 +08:00
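A sketch of the mapping being unified (padding values are illustrative): pad2d's flat [left, right, top, bottom] corresponds to pad's per-dim (before, after) pairs on the last two axes.

```python
from tinygrad import Tensor

x = Tensor.ones(1, 1, 2, 2)
l, r, t, b = 1, 1, 0, 0
y = x.pad(((0, 0), (0, 0), (t, b), (l, r)))  # same effect as pad2d([l, r, t, b])
print(y.shape)  # (1, 1, 2, 4)
```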
Francis Lata
0aad640465 Merge branch 'master' into retinanet_mlperf 2024-11-12 02:45:23 -08:00
Ahmed Harmouche
9c63c3d8ab These casts should only happen if these are supported (#7644) 2024-11-12 07:56:50 +08:00
Francis Lata
bb6f27d2f3 Merge branch 'master' into retinanet_mlperf 2024-11-04 19:19:22 -08:00
chenyu
fb694a63eb Tensor.erf (#7419)
the same one used in onnx and in bert.
2024-10-30 18:12:28 -04:00
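The bert connection is exact GELU, which is defined through erf; a small usage sketch:

```python
# exact GELU: gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2)))
from tinygrad import Tensor

x = Tensor([0.0, 1.0, -1.0])
gelu = 0.5 * x * (1 + (x / 2**0.5).erf())
print(gelu.numpy())  # ~[0.0, 0.841, -0.159]
```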
eliotgolding
e920f1d663 Llama 3.2 1B load from GGUF (#7295)
* gguf 1b-instruct

* not needed
2024-10-27 09:29:02 +08:00