tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-25 06:48:22 -05:00

Author	SHA1	Message	Date
Mehmet Kuzucu	c3173ff281	Add return statement to the train function (#1135 ) add a return statement to the train function in order to provide access to the losses and accuracies lists	2023-07-05 08:13:38 -07:00
wozeparrot	981d4980c4	feat: reword contributing (#1131 )	2023-07-04 22:17:47 -07:00
George Hotz	793a670187	from tensor cores + lb touchup (#1127 )	2023-07-04 15:45:20 -07:00
George Hotz	2f968f8547	ignore cloudpickle type for local mypy	2023-07-04 13:51:20 -07:00
George Hotz	87d21ea979	examples: simple conv bn	2023-07-04 13:50:26 -07:00
Reza Rezvan	535224ac20	Remove float64 (#1101 ) * Refactor: Remove float64 * Refactor: Remove unused imports * Refactor: Remove float64 * Refactor: Remove float64 * Refactor: Exclude float64 onnx backend * Add: Skip jacobian and gradcheck tests;	2023-07-04 08:40:51 -07:00
Daniel Hipke	b4ce23e4b8	Make cross_process use cloudpickle (#1118 ) * fix syntax issues in imagenet_download.py * use cloudpickle in cross_process to make it work in Python 3.9+ * add cross_process test * prevent unpickling on every function call * add cloudpickle to setup.py * add support for args/kwargs	2023-07-04 00:47:34 -07:00
George Hotz	c709dec8b5	gelu: weird test was broken for metal	2023-07-04 00:43:54 -07:00
George Hotz	daf8e1942f	sigmoid: test large postive also and add note	2023-07-04 00:18:31 -07:00
Kunwar Raj Singh	9e6067378f	Broken Sigmoid backward: Add test and mlop for Sigmoid (#1113 ) * Add failing sigmoid test * update more tests * add mlop for sigmoid * add back test * math.log(math.e) = 1 * remove divides --------- Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>	2023-07-04 00:14:22 -07:00
Daniel Hipke	d58a9603ab	Create COCO data directory if it doesn't exist. (#1114 ) * Create COCO data directory if it doesn't exist. * update paths to support windows	2023-07-03 18:15:53 -07:00
Anselm Coogan	a22aad7d32	Use generators instead of lists in `any`s and `all`s (#1111 ) * Use generators in any(..) instead of lists for better best-case * Use generators in all(...) instead of lists * enable R1729 in .pylintrc * revert import sorting --------- Co-authored-by: Anselm Coogan <anselm@scandit.com>	2023-07-03 16:06:06 -07:00
tricky-labyrinth	fd98f6cffa	Small fix to abstractions.py so it runs on Windows without throwing an AttributeError (#1109 ) Co-authored-by: Tricky Labyrinth <trickylabyrinth@gmail.com>	2023-07-03 13:44:49 -07:00
Mike Ovyan	651d080594	[perf] Replace more list comprehension with * (#1106 ) * [perf] Replace more list comprehension with * * comeback * final fix? * blind me * kill me * ? * rev * [none]	2023-07-03 10:49:23 -07:00
Frank Pinnola	2071e53da8	Handle broadcast flag on gemm (#1103 )	2023-07-02 22:15:07 -07:00
Taras Tsugrii	cbb5c655e5	[tensor][perf] Replace list comprehension with *. (#1102 ) It's more concise, idiomatic and faster: ``` In [8]: %timeit [1 for _ in range(100)] 2.12 µs ± 26.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each) In [9]: %timeit [1] * 100 515 ns ± 5.23 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) ```	2023-07-02 18:34:23 -07:00
David Hou	363fbfc2e4	do not emit loop end code for global+local loops in assembly kernel (#1100 )	2023-07-02 18:33:57 -07:00
Reza Rezvan	8ae9a054ae	Refactor nn.optim (#1091 ) * Refactor: nn.optim.py * Refactor: nn.optim.py; Fix all tests * Refactor: Replace all optim.get_parameters() * Refactor: Revert list comp. * Refactor: Replace optim.get_state_dict * Refactor: Change quickstart.md	2023-07-02 15:07:30 -07:00
Eli Frigo	10f1aeb144	fixed broken link (#1097 )	2023-07-02 15:06:59 -07:00
Rob Grossman	c8ddc34368	include missing queue in thneed load (#1095 )	2023-07-02 12:33:59 -07:00
nmarwell26	12ce68c1ee	Renamed examples/yolo to examples/vgg7_helpers because that directory contains no yolo-related code and only helper code for vgg7. This was confusing to a new user when trying to understand the examples. (#1086 )	2023-07-01 12:04:28 -07:00
Rob Grossman	2533a992e7	remove unused imports in models (#1088 )	2023-07-01 12:04:19 -07:00
geohotstan	575f75f613	hello (#1084 )	2023-07-01 01:29:35 -07:00
foreign-sub	574cbda979	Quickstart (#1015 ) * fix quickstart md * add quickstart to ci	2023-06-29 13:26:58 -07:00
Roelof van Dijk	542b2d93a5	Perf/cache string ops (#1078 ) * perf: remove extra function, include in cached getitem * perf: only calculate hash once per node --------- Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-06-29 13:23:11 -07:00
George Hotz	e234bf2298	hip matmul : add K support	2023-06-28 19:54:33 +00:00
George Hotz	0e93b9642a	hip matmul	2023-06-28 19:21:01 +00:00
Jacky Lee	754e54ebb9	Fix Tensor ceil and floor for whole numbers (#1071 ) * Works on non-special numbers * Test different cases	2023-06-27 23:22:17 -07:00
George Hotz	1f5d45ca8c	imagenet loader minor cleanups	2023-06-28 05:08:09 +00:00
George Hotz	6ec0a24706	imagenet eval in 1 min 28 sec	2023-06-28 04:23:26 +00:00
George Hotz	9fabdbd054	speed (#1070 )	2023-06-27 20:28:57 -07:00
George Hotz	d16c16ec28	new upcast works (#1066 ) * new upcast works * float4 try * fix unaligned float4 * disallow unaligned access * upcast dim * maybe good now * fix gpu half * vstore_half4 * fix deep image bugs * improve symbolic to fix issues * fix symbolic * cl test * this maybe * gcd of 1 is 1 * real fix for old python * improve fuzzer	2023-06-27 19:34:53 -07:00
ernie	4d703be6d7	fix typo (#1065 )	2023-06-27 10:56:54 -07:00
George Hotz	70c07dfea5	5k line max (#1064 )	2023-06-27 10:53:18 -07:00
George Hotz	c8d87eb8d4	strip whitespace	2023-06-27 10:11:43 -07:00
Rayan Hatout	23648538fa	fix folding of float4 add/mul (#1060 )	2023-06-26 20:59:29 -07:00
George Hotz	a98e361da0	torch speed test, add add	2023-06-26 18:55:27 -07:00
George Hotz	3e33befc1d	realize hotspots (#1059 ) * realize hotspots * no str check * minor changes * make this an assert * faster and more readable * nicer self.buffers * tests for weak op + LAZYCACHE=0	2023-06-26 18:31:18 -07:00
George Hotz	2977fb17f6	various touchups (#1058 ) * op isn't optional * barrier + named local buffers * end global and local loop together to avoid useless if statement * better comments	2023-06-26 15:41:23 -07:00
George Hotz	f265e8523a	movement ops aren't really ops (#1056 )	2023-06-26 15:01:28 -07:00
Rayan Hatout	65cbaa3429	no need to slice A and B twice in LLaMa complex multiplication (#1054 )	2023-06-26 14:42:58 -07:00
George Hotz	571089f10e	Back off minor speed stuff for simplicity (#1053 ) * passing in buffers doesn't increase speed * functools.reduce * no more get_buffers	2023-06-26 14:42:17 -07:00
Rayan Hatout	dedbd970aa	Optimizations in lazy.py (#987 ) * optimizations in lazy.py * make mypy happy with stubs and fix the graph import hack * merge conflict in helpers.py	2023-06-26 13:55:42 -07:00
Roelof van Dijk	8bea6b6d35	perf/refactor_weakops (#1052 ) Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-06-26 10:13:33 -07:00
Roelof van Dijk	8c65f9324c	refactor: print formatting for llama timing (#1050 ) * refactor: print formatting for llama timing, report median and individual runs * feat: back to mean * fix: whitespace * fix: add mean to print --------- Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-06-26 09:49:31 -07:00
Roelof van Dijk	c604ef4beb	symbolic.py: faster Node.sum, faster SumNode.div (#1014 ) * refactor: replace isinstance with class check where possible * refactor: faster partition * fix; flake8 * feat: rework node.sum, correct list typing * fix: typo * feat: refactor sum * fix: pylint * refactor: simpler sum and factorize * feat; clean up sumnode div, all cpu tests pass * feat: simplify floordiv, cache factorization * don't factor numnodes at all * python 3.8 functools does not yet have @cache * fix: restore assert * refactor, fix failing tests * fix: address review comments * feat: rework, add specialization, remove cache * fix: remove specialization * feat: no tuple conversion, faster loop --------- Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-06-26 09:47:17 -07:00
Casey Primozic	52b7105f87	Dedup params in `Optimizer` (#1047 ) * Dedup params in optimizer * Passing the same tensor multiple times in the set of learnable params passed to optimizers can result in models completely failing to learn, but no errors are produced. This dedups tensors to avoid the problem. * Fix types * Use new variable to satisfy linter * Use `helpers.dedup` instead of `set()` to dedup params * Add test for duped params in optimizers	2023-06-26 00:49:23 -07:00
Kunwar Raj Singh	5d3310ce56	MaskRCNN Inference (#884 ) * MaskRCNN weights loading * backbone maybe works * backbone works, but resnet body atol 1e-3 * RPN Call, but veryy wrong output * fixed topk * RPN maybe works, not sure about nms * Fix cursed modules * add back editorconfig * Full call, wrong output * Full call works * fix mask * use NMS from retinanet * Removing extra funcs * refactor * readable * Add example to run model * remove filter * Fix split, batched inference is worse * Fix image sizes * Matching reference * merge master * add filter on top detections * cuda backend fixed * add model eval and spec * convert images to rgb * fix eval * simplify examples code * remove extra code * meshgrid using tinygrad * removing numpy * roi align, floor, ceil * remove numpy from level_mapper * remove numpy from pooler * Revert "Merge branch 'master' of github.com:kunwar31/tinygrad into mrcnn-inference" This reverts commit `4b95a3cb49`, reversing changes made to `98f2b1fa2e`. * roi align gather * fix master merge * revert to old floor, ceil as ints present in domain * use log2 op * fix indexes * weird bug with ints and gpu * weird bug with ints and gpu * refactors, add env var for gather * floor with contiguous, where * refactor topk, sort * remove staticmethod * refactor stride * remove log2 mlop * realize -> contiguous * refactor forward * remove num_classes, stride_in_1x1 from state * refactor forward * refactoring * flake8 * removing numpy in anchor gen, use numpy for gather, nonzero, optimize topk * keep using tinygrad for smaller gathers * fix empty tensors * comms * move from tensor.py * resnet test passing * add coco dataset back * fix spaces * add test for log2 * no need to create Tensors * no need to create Tensors --------- Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>	2023-06-25 15:37:51 -07:00
George Hotz	0f281e7b18	touchups	2023-06-25 15:24:26 -07:00
George Hotz	c8fbdeb48e	test speed llama (#1046 ) * test speed llama * oops, put it back * uses the real device codegen * just do it on the mac * pp * is faster? * Revert "is faster?" This reverts commit `42db542010`. * disable docker again for less load on CI	2023-06-25 15:22:56 -07:00

10417 Commits