tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-13 17:08:11 -05:00

Author	SHA1	Message	Date
George Hotz	758515dcc0	conv2d is an hlop (#589) * conv2d is an hlop * shorter conv * KOPT=-1 * alt imp * MULACC * smarter mulacc * pop conv * 7x7 -> 5x5 * didn't fix, that's not going to work * this is faster and matches old behavior * oh, non lazy just won't work with mulacc * mulacc in torch * bool types were creeping in * optimizer is actually better with hlop conv * fix pushing permutes issue * refactor einsum_mulacc * fix up readme * update readme * _image_conv2d * fix bias addition location * pushing permutes gets back to 200 kernels * conv cleanup * disable hlop conv * don't hide that in helpers	2023-02-23 17:52:31 -08:00
George Hotz	ab3a2ae9a2	fix test_resnet in onnx now that maxpool works	2023-02-23 08:41:47 -08:00
George Hotz	fd6082dcef	support all _pool2d. conv will eventually be an hlop	2023-02-23 08:19:47 -08:00
Mischa Untaga	5190784cbb	Fix Tensor random functions determinism with same seed (#580) * fix Tensor random functions determinism with same seed * long lived rng * TIL ClassVar typing	2023-02-22 19:08:43 -08:00
George Hotz	c8d89eb20e	avg/max pool strides	2023-02-22 18:00:48 -08:00
George Hotz	628ce067a1	add tests to mypy	2023-02-22 07:07:38 -08:00
George Hotz	104c3c5e73	oops, forgot that debug	2023-02-22 06:58:27 -08:00
Connor Henderson	9670bf1fd1	Add unsqueeze (#574) * Add unsqueeze * remove UNSQUEEZE from llops part of readme * make it an hlop	2023-02-20 20:14:59 -08:00
George Hotz	60008e55cd	sick of that failing	2023-02-19 13:05:37 -08:00
Martin Loretz	7e9a5e3f31	Refactor graph (#560) * Refactor graph * Add graph tests * Use CPUBuffer for graph tests * Remove the use of GlobalCounters	2023-02-19 10:41:30 -08:00
Kirill	7944cfdadc	Remove Tensor.data (#565)	2023-02-18 16:36:12 -08:00
Jacky Lee	9fd41632c6	Import get_parameters from tinygrad.nn (#559) * get_parameter is in optim * Update all imports for get_parameters * Clean up * use optim.get_paramters	2023-02-17 15:22:26 -08:00
George Hotz	fae7654924	fix sync issue	2023-02-17 12:42:45 -08:00
George Hotz	5e6265be6e	metal timing, fix speed test	2023-02-17 12:31:54 -08:00
George Hotz	121bd03cbd	metal globalcounters	2023-02-17 12:02:54 -08:00
Jacky Lee	e172f0087a	BatchNorm2D -> BatchNorm2d (#558) * BatchNorm2D -> BatchNorm2d * Fix typo	2023-02-16 12:31:49 -08:00
George Hotz	20a03d5017	woah, don't sync torch if it's not torch	2023-02-12 07:48:56 -08:00
George Hotz	de71c13934	test speed v torch uses jit	2023-02-12 07:43:17 -08:00
George Hotz	446442dbb3	fix tests symbolic	2023-02-11 15:16:47 -08:00
George Hotz	7a7046f264	sum_combine_num	2023-02-11 14:48:31 -08:00
George Hotz	7d33f2d659	CL.CACHE is over, GlobalCounters.cache is it	2023-02-11 12:00:14 -08:00
George Hotz	0a2035e015	oops, GPU isn't defined	2023-02-11 10:10:02 -08:00
George Hotz	3421d4af10	the jit has a test	2023-02-11 10:04:03 -08:00
George Hotz	b9f02671d3	oops, broke torch speed test	2023-02-10 16:13:53 -06:00
Jacky Lee	5c51ae8dbf	Show where tinygrad is faster in speed test vs torch (#549) * show where tinygrad is faster * don't change text color	2023-02-10 14:01:07 -06:00
George Hotz	c3cf17c6d0	Symbolic render (#550) * render symbolic * valid * fix shapetracker tests * render_python is the default * expr is gone * remove legacy behavior	2023-02-10 13:22:26 -06:00
Lucas Keller	56a06280c5	Testing/utils (#548) * New unittest for utils.py Unit test fetch in basic ways. Would have tested more fetches, but downloading stuff for tests is annoying and mocking is more dependencies. * Remove unused imports	2023-02-10 12:08:20 -06:00
George Hotz	5de850f6d5	assign buffer reuse (#547) * assign buffer reuse works * fix assign for torch and cpu * allow assign from numpy * fix llvm output_buffer * add some assign tests * fix assignment test * test should fail without lazy * env var to disable assign	2023-02-09 11:53:02 -06:00
George Hotz	473bbd3e35	fix graphs	2023-02-09 09:40:46 -06:00
George Hotz	3d63934995	refactor to keep cl in the runtime (#545) * refactor to keep cl in the runtime * fix thneed, rename cl to _cl * bugfix + _cuda * fix tests * thneed more correct	2023-02-08 16:46:09 -06:00
Mitchell Goff	ae4f0aeb5f	NumPy-like semantics for Tensor.__getitem__ (#506) * Rewrote Tensor.__getitem__ to fix negative indices and add support for np.newaxis/None * Fixed pad2d * mypy doesn't know about mlops methods * normal python behavior for out-of-bounds slicing * type: ignore * inlined idxfix * added comment for __getitem__ * Better comments, better tests, and fixed bug in np.newaxis	2023-02-08 08:59:46 -06:00
George Hotz	aebe75d9a2	remove val expansion (#539) * remove val expansion * types for all shapetracker functions: * more typing * add all the parens to the test * more types * fix tests * very minor speedup	2023-02-07 15:14:05 -06:00
Jared Z	7604b17fbf	TestZeroViewShapeTracker fix test (#481) * TestZeroViewST test * updated to align with st naming conventions in file * Update test_shapetracker.py	2023-02-07 06:17:55 -06:00
George Hotz	c073271f20	more symbolic correctness	2023-02-07 00:03:14 -06:00
George Hotz	e961fd3a04	more symbolic test, ModNode is wrong	2023-02-06 23:43:21 -06:00
George Hotz	8cfeb118d6	symbolic new test	2023-02-06 23:27:26 -06:00
George Hotz	c3d81bba2a	test_train: Adam -> SGD	2023-02-06 08:55:41 -06:00
Jacky Lee	ad4f6aa2cf	Add test for quick_gelu (#526) * Add test for quick_gelu * Bump PyTorch version for approximate	2023-02-03 20:01:39 -08:00
Jacky Lee	486f023e81	Rename Normalize and move to nn (#513) * Rename Normalize and move to nn * Match PyTorch for dim>1	2023-02-01 11:55:03 -08:00
George Hotz	cd97b036cc	A Triton backend for tinygrad (#470) * triton can add * print stuff from triton * write out file * ops triton working * reduce ops * sort of works * Triton bugfixes & implementation of remaining ops (#490) * padding * support pow, max, relu, gt0 * allocate return buffer * Fix reduce * Add tests for power op * Fix triton illegal memory accesses and memory leak (#512) * Fix mypy issue * Add triton to setup.py * Replace torch with pycuda * Use one cuda stream for data transfer and kernels * Remove triton submodule * Fix memory leak by using weakrefs for caching * Fix memory access by adding valid as mask for load * Fix invalid kernel launches by flattening the grid (#515) --------- Co-authored-by: Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>	2023-02-01 11:53:57 -08:00
Jacky Lee	799b3f185a	Refactor getenv into helpers (#508) * Refactor getenv into helpers * Remove unused os * Fix default value * Fix more defaults for CI * Fix bracket * Revert changes to openpilot/compile.py * Use getenv from helpers when possible	2023-01-31 15:09:09 -08:00
Jacky Lee	491e78d203	Add symbolic tests for correctness (#494) * [WIP] Add symbolic tests for correctness * Fix typo * Fix expected value for test_and_fold * Add more tests for symbolic * It is indeed right * Clean up * Check all strings * Put TODO back	2023-01-30 18:40:16 -08:00
George Hotz	7457f0d755	KOPT=2	2023-01-30 13:28:06 -08:00
George Hotz	cccfea4b25	factor out KOPT code	2023-01-30 13:13:55 -08:00
George Hotz	de2c419fd4	make_pair and first attempt at hlb_cifar10	2023-01-30 11:07:23 -08:00
George Hotz	2db272c7f7	Kernel Optimizer (#489) * kernel optimizer * 10x faster, but wrong. not good deal * move test -> extra * print x speedup * clcache * fix clcache + DEBUG * GFLOPS estimate * i==3	2023-01-29 17:15:00 -08:00
George Hotz	ebdec2b72f	fix optimizer	2023-01-29 00:23:06 -08:00
George Hotz	b0df4d99a0	os x profiling: this ratio is exact i believe	2023-01-28 19:02:51 -08:00
George Hotz	2f194aadad	loop unrolling upcast	2023-01-28 14:51:24 -08:00
George Hotz	381f3e92da	fix prints, add third conv	2023-01-28 14:10:27 -08:00


4667 Commits