tinygrad

Mirror of https://github.com/tinygrad/tinygrad.git, synced 2026-01-10 07:28:15 -05:00

Author	SHA1	Message	Date
terafo	aa60feda48	Fix naming conflict with huggingface datasets (#1161 ) * Rename in files * Move files * Moved to extra/datasets as suggested * Changes to files * Fixed stupid mistake --------- Co-authored-by: terafo <terafo@protonmail.com>	2023-07-07 10:43:44 -07:00
Yahya Lmallas	fd66d1ca00	fix Tensor.manual_seed() default to wrong type (#1168 ) * fix Tensor.manual_seed() default to wrong type None while it should be int * remove that tests	2023-07-07 10:42:48 -07:00
Stan	9b6e57eccd	helpers.py: improved test coverage + exception handling (#1165 ) * Fixes + improved test coverage for helpers.py - added exception handling in `proc`, if an exception was thrown, the thread would hang - made `_early_exec_process` catch any Exception, before if an exception was thrown before the process was started, it would hand the thread * Made `_early_exec_process` catch any Exception Otherwise, if an exception was thrown before the process was started, it would hang the thread. For example a type error for an argument passed to `subprocess.check_output` * Fixed `from tinygrad.helpers import Timing` import oops, for some reason my IDE cleaned that import from extra/helpers. * Fixed import in llama.py Another one that I skipped by accident, mybad * Extracted a class for tests of early exec * Normalize line endings, windows uses /r/n * Made `cross_process` not a daemon	2023-07-07 10:26:05 -07:00
Kunwar Raj Singh	8391648822	Over 90% on CIFAR with examples/hlb_cifar10.py (#1073 ) * fix eval, lr decay, best eval * 82.27 * 82.64 * 82.79, reproducable * add lr sched, 85.26 * 87.42 * 87.94 * 87.42 * tta with flip * training flip aug * refactor * using Tensor for LR is faster * 89.5 * refactor, flip only train set * 90.01 * 90.64 * eval jit * refactor * only JIT model * fix eval JIT * fix eval JIT * 90.82 * STEPS=900 reaches 90.22 * TTA envvar * TTA default 0 * fully jit training * refactor optim * fix sched * add label smoothing * param changes * patial gelu * OneCycle with pause * gelu maybe works * 90.12 * remove pause lr * maybe fix lr schedulers * scheduler test passing * comments * try mixup * shuffle! * add back the missing last eval * fix shuffle bugs * add mixup prob * fix mixup prob * 90.19 * correct mixup * correct mixup * correct mixup * 90.24 * 90.33 * refactor, add type hints * add gradient clipping * maybe fix test * full JIT * back to relu for now * pass mixup prob as param * add typehints * maybe CI works * try erf gelu * CI, types * remove useless import/ * refactor optim * refactor optim * try leakyrelu * try celu * gelu * 90.67 * remove grad clip * remove grad clip tests * revert params * add test for OneCycleLR * 90.62 * fix eval timing * fix eval timing again * so where i calculate mixup_prob matters --------- Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>	2023-07-06 20:46:22 -07:00
Barath	c5aea13a65	Fix evaluation stage in examples/transformer.py when using CUDA (#1150 ) * make test data as contiguous array * standardise contiguous array for all input data in cuda ops * swap to x.ravel	2023-07-06 18:07:10 -07:00
Rayan Hatout	9975f24452	Fold expand preceding reduce if the reduction is on the same axis as the expansion (#1134 ) * fold expands that precede a reduce if the reduction is on the same axis as the expansion * add deterministic test for SIMPLIFY_SUM_RESHAPE_EXPAND_SUM optimization * add a test case to make sure we don't fold reduce-expand-reduce on different axes	2023-07-06 13:41:05 -07:00
cheeetoo	f109af3cbb	Don't save parents unless needed (#1142 ) * don't save parents unless requires grad * keep del ctx since idk	2023-07-05 18:11:57 -07:00
Eli Frigo	801564f31b	Remove POW llop and add SQRT llop (#1104 ) * fixed division by zero for fast operations * made et closer to 0 * replace POW llop with SQRT * updated mlops to swap SQRT and POW llops * updated hlops to swap POW and SQRT * added sqrt llop to cpu runtime * added sqrt llop to cstyle codegen * added POW llop to llvm ir codegen * added SQRT llop to torch runtime * moved pow from mlops to hlops * found a better way to do reverse pow * fixed indentation * added SQRT llop to triton * update docs to match new llops * removed POW operator from assembly codegen * added sqrt and rsqrt to pow hlop * rewrote pow function in tensor.py * Adjust tolerance * Adjust for adamw * Reduce for Adam too * removed accidental leftover code * removed all of accidental code * added rsqrt test * removed pow from mlops again it was added back when resolving merge conflicts --------- Co-authored-by: Jacky Lee <jla524@sfu.ca>	2023-07-05 18:07:58 -07:00
cloud11665	b7369ffcff	add ptx formatter + syntax highlighter (#1128 )	2023-07-05 17:56:09 -07:00
Reza Rezvan	d1356cac27	Fix: Jacobian tests [WIP] (#1126 ) * Fix: Jacobian tests; num_jacobian either bugged or not accurate enough; * Fix: Jacobian tests; * Fix: Gradcheck;	2023-07-05 15:36:22 -07:00
nimlgen	d363d25ee2	fix imports for examples/transformer.py (#1136 )	2023-07-05 08:15:13 -07:00
Mehmet Kuzucu	c3173ff281	Add return statement to the train function (#1135 ) add a return statement to the train function in order to provide access to the losses and accuracies lists	2023-07-05 08:13:38 -07:00
wozeparrot	981d4980c4	feat: reword contributing (#1131 )	2023-07-04 22:17:47 -07:00
George Hotz	793a670187	from tensor cores + lb touchup (#1127 )	2023-07-04 15:45:20 -07:00
George Hotz	2f968f8547	ignore cloudpickle type for local mypy	2023-07-04 13:51:20 -07:00
George Hotz	87d21ea979	examples: simple conv bn	2023-07-04 13:50:26 -07:00
Reza Rezvan	535224ac20	Remove float64 (#1101 ) * Refactor: Remove float64 * Refactor: Remove unused imports * Refactor: Remove float64 * Refactor: Remove float64 * Refactor: Exclude float64 onnx backend * Add: Skip jacobian and gradcheck tests;	2023-07-04 08:40:51 -07:00
Daniel Hipke	b4ce23e4b8	Make cross_process use cloudpickle (#1118 ) * fix syntax issues in imagenet_download.py * use cloudpickle in cross_process to make it work in Python 3.9+ * add cross_process test * prevent unpickling on every function call * add cloudpickle to setup.py * add support for args/kwargs	2023-07-04 00:47:34 -07:00
George Hotz	c709dec8b5	gelu: weird test was broken for metal	2023-07-04 00:43:54 -07:00
George Hotz	daf8e1942f	sigmoid: test large postive also and add note	2023-07-04 00:18:31 -07:00
Kunwar Raj Singh	9e6067378f	Broken Sigmoid backward: Add test and mlop for Sigmoid (#1113 ) * Add failing sigmoid test * update more tests * add mlop for sigmoid * add back test * math.log(math.e) = 1 * remove divides --------- Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>	2023-07-04 00:14:22 -07:00
Daniel Hipke	d58a9603ab	Create COCO data directory if it doesn't exist. (#1114 ) * Create COCO data directory if it doesn't exist. * update paths to support windows	2023-07-03 18:15:53 -07:00
Anselm Coogan	a22aad7d32	Use generators instead of lists in `any`s and `all`s (#1111 ) * Use generators in any(..) instead of lists for better best-case * Use generators in all(...) instead of lists * enable R1729 in .pylintrc * revert import sorting --------- Co-authored-by: Anselm Coogan <anselm@scandit.com>	2023-07-03 16:06:06 -07:00
tricky-labyrinth	fd98f6cffa	Small fix to abstractions.py so it runs on Windows without throwing an AttributeError (#1109 ) Co-authored-by: Tricky Labyrinth <trickylabyrinth@gmail.com>	2023-07-03 13:44:49 -07:00
Mike Ovyan	651d080594	[perf] Replace more list comprehension with * (#1106 ) * [perf] Replace more list comprehension with * * comeback * final fix? * blind me * kill me * ? * rev * [none]	2023-07-03 10:49:23 -07:00
Frank Pinnola	2071e53da8	Handle broadcast flag on gemm (#1103 )	2023-07-02 22:15:07 -07:00
Taras Tsugrii	cbb5c655e5	[tensor][perf] Replace list comprehension with *. (#1102 ) It's more concise, idiomatic and faster: ``` In [8]: %timeit [1 for _ in range(100)] 2.12 µs ± 26.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each) In [9]: %timeit [1] * 100 515 ns ± 5.23 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) ```	2023-07-02 18:34:23 -07:00
David Hou	363fbfc2e4	do not emit loop end code for global+local loops in assembly kernel (#1100 )	2023-07-02 18:33:57 -07:00
Reza Rezvan	8ae9a054ae	Refactor nn.optim (#1091 ) * Refactor: nn.optim.py * Refactor: nn.optim.py; Fix all tests * Refactor: Replace all optim.get_parameters() * Refactor: Revert list comp. * Refactor: Replace optim.get_state_dict * Refactor: Change quickstart.md	2023-07-02 15:07:30 -07:00
Eli Frigo	10f1aeb144	fixed broken link (#1097 )	2023-07-02 15:06:59 -07:00
Rob Grossman	c8ddc34368	include missing queue in thneed load (#1095 )	2023-07-02 12:33:59 -07:00
nmarwell26	12ce68c1ee	Renamed examples/yolo to examples/vgg7_helpers because that directory contains no yolo-related code and only helper code for vgg7. This was confusing to a new user when trying to understand the examples. (#1086 )	2023-07-01 12:04:28 -07:00
Rob Grossman	2533a992e7	remove unused imports in models (#1088 )	2023-07-01 12:04:19 -07:00
geohotstan	575f75f613	hello (#1084 )	2023-07-01 01:29:35 -07:00
foreign-sub	574cbda979	Quickstart (#1015 ) * fix quickstart md * add quickstart to ci	2023-06-29 13:26:58 -07:00
Roelof van Dijk	542b2d93a5	Perf/cache string ops (#1078 ) * perf: remove extra function, include in cached getitem * perf: only calculate hash once per node --------- Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-06-29 13:23:11 -07:00
George Hotz	e234bf2298	hip matmul : add K support	2023-06-28 19:54:33 +00:00
George Hotz	0e93b9642a	hip matmul	2023-06-28 19:21:01 +00:00
Jacky Lee	754e54ebb9	Fix Tensor ceil and floor for whole numbers (#1071 ) * Works on non-special numbers * Test different cases	2023-06-27 23:22:17 -07:00
George Hotz	1f5d45ca8c	imagenet loader minor cleanups	2023-06-28 05:08:09 +00:00
George Hotz	6ec0a24706	imagenet eval in 1 min 28 sec	2023-06-28 04:23:26 +00:00
George Hotz	9fabdbd054	speed (#1070 )	2023-06-27 20:28:57 -07:00
George Hotz	d16c16ec28	new upcast works (#1066 ) * new upcast works * float4 try * fix unaligned float4 * disallow unaligned access * upcast dim * maybe good now * fix gpu half * vstore_half4 * fix deep image bugs * improve symbolic to fix issues * fix symbolic * cl test * this maybe * gcd of 1 is 1 * real fix for old python * improve fuzzer	2023-06-27 19:34:53 -07:00
ernie	4d703be6d7	fix typo (#1065 )	2023-06-27 10:56:54 -07:00
George Hotz	70c07dfea5	5k line max (#1064 )	2023-06-27 10:53:18 -07:00
George Hotz	c8d87eb8d4	strip whitespace	2023-06-27 10:11:43 -07:00
Rayan Hatout	23648538fa	fix folding of float4 add/mul (#1060 )	2023-06-26 20:59:29 -07:00
George Hotz	a98e361da0	torch speed test, add add	2023-06-26 18:55:27 -07:00
George Hotz	3e33befc1d	realize hotspots (#1059 ) * realize hotspots * no str check * minor changes * make this an assert * faster and more readable * nicer self.buffers * tests for weak op + LAZYCACHE=0	2023-06-26 18:31:18 -07:00
George Hotz	2977fb17f6	various touchups (#1058 ) * op isn't optional * barrier + named local buffers * end global and local loop together to avoid useless if statement * better comments	2023-06-26 15:41:23 -07:00
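The helpers.py commit (#1165) notes that a thread could hang when an exception was raised before it produced a result. Below is a minimal sketch of the general pattern, built around a hypothetical `run_in_thread` helper. It is not tinygrad's actual `proc` or `_early_exec_process` code. The worker catches any `Exception` and passes it back to the caller through a queue, so the caller re-raises it instead of blocking forever on a result that never arrives.

```python
import queue
import threading
from typing import Any, Callable

def run_in_thread(fn: Callable[..., Any], *args: Any) -> Any:
  """Hypothetical helper: run fn(*args) in a worker thread and return its result.

  Exceptions raised by fn are forwarded to the caller and re-raised there.
  """
  results: queue.Queue = queue.Queue(maxsize=1)

  def worker() -> None:
    try:
      results.put((True, fn(*args)))
    except Exception as e:  # catch everything so the queue always receives an item
      results.put((False, e))

  t = threading.Thread(target=worker, daemon=True)
  t.start()
  ok, value = results.get()  # never blocks forever: the worker always puts exactly one item
  t.join()
  if not ok:
    raise value
  return value
```

For example, `run_in_thread(int, "42")` returns `42`, and `run_in_thread(int, "oops")` raises `ValueError` in the calling thread. Without the `except` branch, the second call would wait on `results.get()` indefinitely.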
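The expand/reduce folding commit (#1134) relies on an arithmetic identity. If you expand a size-1 axis to size n and then sum over that same axis, you get the original value multiplied by n, so the expand and the reduce can be folded into one cheaper operation. The sketch below checks that identity on nested Python lists. `expand_axis1`, `sum_axis1` and `folded` are hypothetical helpers for illustration and do not reflect how tinygrad's lazy buffers or shapetracker implement the rewrite.

```python
from typing import List

Matrix = List[List[float]]

def expand_axis1(x: Matrix, n: int) -> Matrix:
  """Broadcast a (rows, 1) matrix to (rows, n), like an EXPAND op."""
  assert all(len(row) == 1 for row in x), "axis 1 must have size 1 to expand"
  return [[row[0]] * n for row in x]

def sum_axis1(x: Matrix) -> Matrix:
  """Sum-reduce axis 1, keeping it as a size-1 axis (keepdim=True)."""
  return [[sum(row)] for row in x]

def folded(x: Matrix, n: int) -> Matrix:
  """Folded form: expand-then-sum over the same axis is a multiply by n."""
  return [[row[0] * n] for row in x]
```

The identity only holds when the reduce runs over the axis that was expanded. If the reduce targets a different axis, the two operations do not cancel, which is why the commit also adds a test ensuring that case is not folded.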
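The sigmoid commits (#1113 and the follow-up that tests large positive inputs) add a dedicated mlop for sigmoid and check it at extreme values. Below is a plain-Python sketch of the two ideas involved. It is not the actual tinygrad mlop. The forward pass picks a form that never calls `exp` on a large positive number. The backward pass reuses the saved forward output through the identity sigmoid'(x) = s * (1 - s), where s = sigmoid(x).

```python
import math

def sigmoid(x: float) -> float:
  """Numerically stable logistic sigmoid: never calls exp() on a large positive value."""
  if x >= 0:
    return 1.0 / (1.0 + math.exp(-x))  # -x <= 0, so exp(-x) is in (0, 1]
  z = math.exp(x)  # x < 0, so z is in (0, 1) and cannot overflow
  return z / (1.0 + z)

def sigmoid_backward(out: float, grad_out: float) -> float:
  """Gradient w.r.t. the input, using the saved forward output `out`."""
  return grad_out * out * (1.0 - out)
```

The naive form `1 / (1 + math.exp(-x))` raises `OverflowError` in Python for x = -1000, because `math.exp(1000)` overflows. The branch above instead returns 0.0 for that input and 1.0 for x = 1000.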
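The `any`/`all` commit (#1111) replaces list arguments with generators and enables pylint's R1729 check. The benefit is short-circuiting. When given a list, `any()` can only run after every element has already been computed. When given a generator, it stops at the first truthy value. The sketch below counts how many elements each form evaluates. `is_positive` is a hypothetical predicate used only for illustration.

```python
from typing import List

def is_positive(x: int, seen: List[int]) -> bool:
  """Predicate that records every element it is asked about."""
  seen.append(x)
  return x > 0

values = [-1, 5, -2, -3, -4]

seen_list: List[int] = []
any([is_positive(v, seen_list) for v in values])  # the whole list is built first

seen_gen: List[int] = []
any(is_positive(v, seen_gen) for v in values)  # stops at the first True (the 5)

print(len(seen_list), len(seen_gen))  # 5 2
```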
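The two list-repetition commits (#1102 and #1106) replace comprehensions such as `[1 for _ in range(n)]` with `[1] * n`. For immutable elements such as ints, floats or tuples, the two forms are equivalent. With `*`, however, every slot holds a reference to the same object. That matters once the element is mutable, as the following sketch shows.

```python
n = 4

# Immutable elements: both forms build equal lists.
assert [1 for _ in range(n)] == [1] * n

# Mutable elements: `*` repeats a reference to one inner list n times.
shared = [[]] * n
shared[0].append("x")  # mutates the single inner list every slot points to

# A comprehension creates a fresh inner list for each slot.
fresh = [[] for _ in range(n)]
fresh[0].append("x")

print(shared)  # [['x'], ['x'], ['x'], ['x']]
print(fresh)   # [['x'], [], [], []]
```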
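The ceil/floor commit (#1071) fixes `Tensor.ceil` and `Tensor.floor` for whole-number inputs. The scalar sketch below shows one way that kind of bug can arise and how a truncation-based approach can avoid it. It is not tinygrad's implementation. Building these functions from truncation toward zero is common, but the correction step must only apply when truncation actually changed the value. A naive `ceil(x) = trunc(x) + 1` gives 3.0 for x = 2.0.

```python
import math

def floor_via_trunc(x: float) -> float:
  """floor(x) from truncation toward zero, correcting only when truncation moved the value."""
  t = float(math.trunc(x))
  return t - 1.0 if t > x else t  # trunc rounds negatives up; step down only in that case

def ceil_via_trunc(x: float) -> float:
  """ceil(x) from truncation toward zero, correcting only when truncation moved the value."""
  t = float(math.trunc(x))
  return t + 1.0 if t < x else t  # trunc rounds positives down; step up only in that case
```

For whole numbers such as 2.0 or -3.0, `t == x`, so both functions return the input unchanged. For 2.5 and -2.5, they agree with `math.floor` and `math.ceil`.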


2078 Commits