Commit Graph

2136 Commits

George Hotz
45ecae1ab3 Revert "Match Torch speed for sum reduction on M1 (#1187)" (#1286)
This reverts commit 59af9b81c5.
2023-07-19 13:39:16 -07:00
chenyu
120ae74008 Enable JIT test for size 1 tensor (#1285) 2023-07-19 11:06:40 -07:00
chenyu
940b6fd21a Revert "Fix constant folding for Tensor([3]) (#1227)" (#1274)
This reverts commit ab645317c9.
2023-07-19 10:51:06 -07:00
chenyu
0aed3f73da More JIT test cases (#1280)
* More JIT test cases

* test against jit_cache directly

* remove unused
2023-07-19 10:45:43 -07:00
Francis Lam
3db57d3118 Fix llama.py to load and concatenate 13B, 30B, and 65B models (#1275) 2023-07-19 13:22:33 -04:00
George Hotz
d6637623e3 torch test touchup 2023-07-19 09:37:23 -07:00
Alexander Edwards
59af9b81c5 Match Torch speed for sum reduction on M1 (#1187)
* Add additional kernel when reducing multiple dimensions at once.

* Faster for smaller inputs

* Whitespace and naming

* Cleaner, guard for Metal only, and max 1 split rather than N

* Draft of different approach

* One additional kernel call for this test (as expected)
2023-07-19 09:18:58 -07:00
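A minimal numpy sketch of the split-reduction idea behind this (later reverted, see 45ecae1ab3 above) change; sizes and names are illustrative, not tinygrad's kernel code. Staging a big sum as two smaller reductions maps to two GPU kernel launches, and the first stage exposes many independent partial sums:

```python
import numpy as np

# "first kernel": 1024 independent partial sums over chunks of the input
x = np.random.rand(1 << 20).astype(np.float32)
partials = x.reshape(1 << 10, -1).sum(axis=1)
# "second kernel": combine the partials into the final result
total = partials.sum()
assert np.isclose(total, x.sum(), rtol=1e-4)
```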
Umut Zengin
fde9f0e60d Slice migrated in Eye op (#1281)
* Migrated from slice to pad and shrink, made cleaner

* Changed repeat with reshape and expand
2023-07-19 09:08:38 -07:00
chenyu
a5f5330d91 Add Fuzz Test symbolic / shapetracker to CI. (#1278)
* Fuzz test symbolic and shapetracker

This reverts commit d5773ddebff54c1ff608838076f0b4ff126b8aa8.

* mess again

* no tail

* test shapetracker too

* Revert mess and enable all tests

* removed leftover
2023-07-19 09:05:45 -07:00
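In the spirit of this commit, a toy fuzz loop (not the actual CI test) that hammers a floordiv identity of the kind the symbolic engine's rewrites depend on, over random inputs:

```python
import random

# Property: (a*k) // (b*k) == a // b for integer a and positive b, k.
# A symbolic/shapetracker fuzzer runs checks of this style in bulk.
for _ in range(10_000):
    a = random.randint(-1000, 1000)
    b = random.randint(1, 100)
    k = random.randint(1, 10)
    assert (a * k) // (b * k) == a // b, (a, b, k)
```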
David Hou
56ee97b37f dedup kernel args v2 (#1272)
* new version

* fix abstractions

* try remove test

* Revert "try remove test"

This reverts commit 2fc18a9f8e.

* assert_allclose

* minimize the test

* minimize the test

* minimize the test

* minimize the test

* Revert "minimize the test"

This reverts commit e0c0929596.

* Revert "minimize the test"

This reverts commit 88240551b1.

* Revert "minimize the test"

This reverts commit 78328a7ce2.

* Revert "minimize the test"

This reverts commit 989523fded.

* skip test inside body

* oops

* oops
2023-07-18 20:03:42 -07:00
wozeparrot
37cc33269a cl fixes for multigpu (#1276)
* feat: opencl fixes for multigpu usage

* clean: who needs this import anyways
2023-07-18 19:59:30 -07:00
Umut Zengin
fa0265b173 Fix: AssertionError Transpose/Permute when WHERE Op in LB (#1266) 2023-07-18 16:09:19 -04:00
chenyu
c96bf395df Enable JIT tests for supported devices, skip METAL and WEBGPU (#1265)
* Enable JIT test

* really test metal

* Skip some device
2023-07-18 11:40:37 -07:00
Umut Zengin
f8c539989e Re-open create cumsum speed test (#1255)
* Reduced tensor size in testing

* Update formatting test_speed_v_torch.py
2023-07-17 18:59:36 -07:00
George Hotz
ab3d281a6e Refactor MemOps (#1256)
* metal tests pass locally

* define global

* refactor DEFINE_GLOBAL

* move assembly out. it isn't tested

* fix llvm
2023-07-17 16:36:33 -07:00
Stan
ed472bffea Fix: negative axis in tensor.cumsum (#1261) 2023-07-17 16:16:38 -07:00
Oddity
64d39188ad Assembly ptx target current arch (#1250)
* updated .target to use the current arch version

* undid docstring
2023-07-17 08:45:43 -07:00
Adrian Kretz
5a8ad57163 Add WHERE ternary (or trinary?) op (#1196)
* Rename FusedOps to TernaryOps

* Support ternary broadcast

* Add where llop and mlop

* Make where op work in cstyle codegen

* Don't skip test_inf_where

* Add backward path to where op

* Use bool in cstyle codegen

* Add LLVM where op

* Add numpy where op

* Add torch where op

* Simplify where mlop

* Update documentation

* Forgot a rename

* Merged relevant changes from PR #1195 onto PR #1196

* Add test to cover changes to linearizer.ast_parse for WHERE op

Without this, METAL will try to use the ternary op on float4 and fail

* Make where op work in wgsl backend

* Allow ternary ops to be merged

* Make mypy happy

---------

Co-authored-by: Francis Lam <flam@alum.mit.edu>
2023-07-16 00:31:55 -07:00
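A hedged numpy sketch of what a WHERE ternary llop computes, including the backward rule this commit adds (the condition routes the upstream gradient); this is illustrative, not the tinygrad mlop itself:

```python
import numpy as np

def where_forward(cond: np.ndarray, x: np.ndarray, y: np.ndarray) -> np.ndarray:
    # elementwise select: x where cond is true, else y
    return np.where(cond, x, y)

def where_backward(cond: np.ndarray, grad_out: np.ndarray):
    # x receives the gradient where cond held, y receives it elsewhere;
    # the condition itself is non-differentiable
    return np.where(cond, grad_out, 0.0), np.where(cond, 0.0, grad_out)

cond = np.array([True, False, True])
print(where_forward(cond, np.array([1., 2., 3.]), np.array([9., 9., 9.])))  # [1. 9. 3.]
```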
Stan
91f797cd52 Moved mkdir in utils.download_file to diff line (#1249)
* Moved mkdir to diff line

`.mkdir` returns None rather than the directory being created, so the call can't be chained.

* use walrus operator to simplify
2023-07-16 00:30:46 -07:00
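A sketch of the shape of the fix, assuming a `download_file(url, fp)` helper like the one in tinygrad's utils (the body here is illustrative): since `Path.mkdir` returns None, the directory creation has to sit in its own expression, and the walrus operator keeps that compact:

```python
import pathlib
import urllib.request

def download_file(url: str, fp: str) -> pathlib.Path:
    # (p := Path(fp)) binds the path while .parent.mkdir runs purely for
    # its side effect -- mkdir returns None, so it can't be chained
    (p := pathlib.Path(fp)).parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, p)
    return p
```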
Yixiang Gao
a8f2c16f8e add contiguous (#1246) 2023-07-15 08:36:34 -07:00
Stan
872e2198fe Added nn.ConvTranspose1d (#1243)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-07-15 00:42:42 -07:00
Oddity
7399f6dad7 display sass for both cuda code and ptx (#1240)
* skip nvcc compile target cubin when using PTX

* actually we should generate sass for both ptx and cuda code

* Fixed formatting, should print the error anyway

* ensure subprocess.run throws exception

* fixed linting errors and checked before commit this time
2023-07-15 00:36:04 -07:00
Stan
264d467f2b Added tensor.squeeze and support for testing exceptions (#1241)
* WIP: `tensor.squeeze` function

* Added `test_except` param to `helper_test_op` to avoid false positives

* Extracted new method `helper_test_exception` for testing exceptions

* Made `squeeze` not throw IndexError when ndim == 0 and dim <= 0 to match PyTorch
2023-07-15 00:33:24 -07:00
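A standalone numpy sketch of the PyTorch-matching semantics described above (a hypothetical helper, not the tinygrad method): on a 0-d array, PyTorch accepts dim -1 or 0 as a no-op instead of raising IndexError:

```python
import numpy as np

def squeeze(x: np.ndarray, dim=None) -> np.ndarray:
    if dim is None:
        # drop every size-1 axis
        return x.reshape([s for s in x.shape if s != 1])
    if x.ndim == 0:
        # match PyTorch: a 0-d tensor accepts dim -1 or 0 and is returned as-is
        if dim not in (-1, 0):
            raise IndexError(f"dim {dim} out of range for 0-d array")
        return x
    if not -x.ndim <= dim < x.ndim:
        raise IndexError(f"dim {dim} out of range for {x.ndim}-d array")
    dim %= x.ndim
    # drop the axis only if it has size 1 (squeezing a non-1 axis is a no-op)
    return x.reshape([s for i, s in enumerate(x.shape) if i != dim or s != 1])
```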
Stan
a8f3b3f4ed Added test for nn.Conv1d (#1242) 2023-07-15 00:30:50 -07:00
David Hou
9c135c9450 add sqrt to ptx (#1236) 2023-07-13 07:26:11 -07:00
chenyu
32be39554c Simplify symbolic.SumNode.__floordiv__ logic (#1220) 2023-07-12 12:54:12 -07:00
Diogo
a9a1df785f Webgpu support (#1077)
* initial commit

* 81 passing

* 105 passing tests

* 148 passing

* CI tests

* install dep on ci

* try opencl pkgs

* try using vulkan

* down to only 6 failing

* refactor

* cleaning up

* another test skipped due to buffer limit

* linter

* segfault

* indent fix

* another segfault found

* small touchups

* Fix max and maxpool tests

* Add constant folding

* Add javascript export script

* better asserts in codegen

* manual upcasting

* reverted token type change

* skip safetensor test due to unsupported type

* FIx efficientnet and all other model tests

* Remove np copy

* fixed indent and missing import

* manually destroy the buffer

* revert back to length

* linter errors

* removed extra val

* skip broken tests

* skipping more tests

* Make the page pretty

* Save model weights as safetensor

* Fix imagenet to c test

* Fix second imagenet to c bug

* Async and parallel kernel compilation

* workgroup support

* reversed local size

* fixed non local bug

* correct local groups

* ci experiment

* removed typo

* Fix define local by using shared memory

* Refactor

* try running on mac

* match metal tests

* add more workers

* scope down tests

* trying windows runner

* fixed windows env

* see how many it can do

* merged master

* refactor

* missed refactor

* increase test suite coverage

* missing import

* whitespace in test_efficientnet.py

* getting there

* fixed reset

* fixed bufs

* switched to cstyle

* cleanup

* min/max rename

* one more linter issue

* fixed demo

* linter

* testing ci chrome

* add unsafe webgpu arg

* add build step

* remove WEBGPU from cmd line

* use module

* try forcing directx

* trying forced metal backend

* temp disable conv2d for CI

* disable conv_transpose2d

---------

Co-authored-by: 0x4d - Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-07-12 12:52:06 -07:00
Yosef Frost
613bcd945d Added Test Coverage to Int32 and Make Sure Tests Succeed (#1174)
* Added test coverage for int32 in `test/test_dtype.py`

Tests for int32 include:
- testing that int32 can be converted into a numpy array
- testing that float and int64 can be cast into int32
- testing that int32 can be cast into float and int64
- testing addition, multiplication, and matrix multiplication with int32
- testing that addition, multiplication, and matrix multiplication with int32 and either float or int64 get successfully cast into float and int64, respectively

Additional changes include testing that int8 casts into int32 and testing that float16 casts into int32

* Added type casting to the add, subtract, and divide binary operations

* Added automatic type casting when types differ to FusedOps.MULACC

I moved the match_types function back so that I could call it in einsum_mulacc where it would cast the types of the MULACC to be the same

* Added unit test for match_types and added type hints to the parameters

* Added tests for ops_cpu.match_types

* Changed ops_cpu.einsum logic to play nicely with PyTorch

Changed `tinygrad.runtime.ops_cpu.einsum_mulacc` logic to not perform type matching. Type matching was instead moved to the numpy_fxn_for_op dictionary in the ops_cpu file. Since ops_torch uses the same einsum_mulacc function, this should fix all the broken pytorch tests.

* empty commit to rerun ci

* reverting PR#1213 in attempt to fix broken test

* Removed all tests I added to see if they are causing CI issues

* Added back type matching tests

* removed type matching tests and added back int tests

* added back part of the type matching tests

* removed breaking type matching tests

* empty commit for testing

* added test back but inside comment

* removed a test from the comment to see if it breaks CI

* removed another function

* more testing

* emptied test comment

* cleaned up comments

* Added optimize=True flag to einsum_mulacc in ops_cpu.py

* Removed unnecessary imports from tests

* optimized match_types by removing unnecessary array copying
2023-07-12 10:29:15 -07:00
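A minimal sketch of what a `match_types` helper can do, using numpy's own promotion rules as a stand-in (the real ops_cpu implementation may differ): both operands are upcast to a common dtype before the binary op or MULACC runs:

```python
import numpy as np

def match_types(x: np.ndarray, y: np.ndarray):
    # promote both operands to their common dtype, e.g. int8 with int32 -> int32
    up = np.promote_types(x.dtype, y.dtype)
    return x.astype(up, copy=False), y.astype(up, copy=False)

a, b = match_types(np.arange(3, dtype=np.int8), np.ones(3, dtype=np.int32))
assert a.dtype == b.dtype == np.int32
```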
Roelof van Dijk
8f2e2f5ee2 style: else-after-return (#1216)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-07-12 10:26:38 -07:00
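For reference, the pattern this lint cleanup targets: once a branch returns, the `else` adds a level of nesting without changing behavior. A small before/after:

```python
def sign(x: float) -> int:
    if x >= 0:
        return 1
    else:          # flagged: else after return is redundant
        return -1

def sign_flat(x: float) -> int:
    if x >= 0:
        return 1
    return -1      # same behavior, one less indent level
```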
George Hotz
ab663c46e8 tensor cores: don't upcast if we can't. fix stable diffusion 2023-07-12 10:21:02 -07:00
Hey
4f72eb823c Outdated repository URL (#1218)
* Update outdated repo url

* Update more outdated repo url's
2023-07-11 23:14:19 -07:00
Roelof van Dijk
d0e21a7398 ci: don't install recommended packages for GPU (#1215)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-07-11 15:38:49 -07:00
Francis Lam
df86672bd4 Fix LazyBuffer SHUFFLE_PAD_OPS to prevent invalid pad movement (#1223)
In addition to div, any ops that will generate non-zero outputs from
zero inputs need to be guarded.
2023-07-11 15:30:35 -07:00
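A small numpy illustration of why the guard matters (illustrative only, not tinygrad code): with div, the zero padding maps to inf, so pushing a pad across the op changes the answer:

```python
import numpy as np

x = np.array([1.0, 2.0])
with np.errstate(divide="ignore"):
    pad_then_div = 1.0 / np.pad(x, (0, 2))   # [1. , 0.5, inf, inf] -- wrong
div_then_pad = np.pad(1.0 / x, (0, 2))       # [1. , 0.5, 0. , 0. ] -- intended
print(pad_then_div, div_then_pad)
```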
AN Long
f75de602df fix typo in stable diffusion example (#1219) 2023-07-11 15:26:40 -07:00
chenyu
ab645317c9 Fix constant folding for Tensor([3]) (#1227)
* Fix constant folding for Tensor([3])

* Remove duplicated prod import

* load in the same device

* better numpy

* add constant fold shape test cases

* improve tests
2023-07-11 14:01:32 -07:00
Carson Radtke
e2f6b09ffd [perf] optimize=True kwarg for np.einsum (#1213) 2023-07-09 18:31:04 -07:00
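The flag in question is numpy's own: `optimize=True` lets `np.einsum` choose a contraction order (and dispatch to BLAS where possible) instead of evaluating the naive nested loop, which matters for multi-operand contractions:

```python
import numpy as np

a, b, c = np.random.rand(8, 16), np.random.rand(16, 32), np.random.rand(32, 4)
# with optimize=True, numpy picks the cheaper pairwise order, e.g. (ab)c
out = np.einsum("ij,jk,kl->il", a, b, c, optimize=True)
assert out.shape == (8, 4)
```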
madt2709
bb316a42af Fix pow to work with negative tensors (#1191) 2023-07-09 17:33:04 -07:00
George Hotz
43385c7dbf remove contiguous on full (#1212) 2023-07-09 17:31:15 -07:00
Carson Radtke
13a1abf9e7 remove tuple from type annotation in Tensor.__init__ (#1211) 2023-07-09 16:27:07 -07:00
Roelof van Dijk
e27f098946 View as namedtuple, cached methods (#1075)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-07-09 14:26:02 -07:00
Carson Radtke
1eb0e0cb3f implement common subexpression elimination (#1204)
* implement common subexpr elimination

* Revert "implement common subexpr elimination"

This reverts commit 40c5487d20.

* move cse to ast_parse + add type annotations

* oneline if

* improve saved_exprs lookup
2023-07-09 14:22:53 -07:00
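A toy rendition of the technique (the name `saved_exprs` follows the commit's bullet, but this is a sketch, not the linearizer): cache each rendered expression the first time it is emitted and reuse its temporary afterwards, assuming expressions are side-effect free:

```python
# common subexpression elimination over a stream of rendered expressions:
# identical expressions compile to a single assignment
saved_exprs: dict[str, str] = {}
lines: list[str] = []

def render(expr: str) -> str:
    if (var := saved_exprs.get(expr)) is not None:
        return var  # seen before: reuse the existing temporary
    var = f"t{len(saved_exprs)}"
    lines.append(f"float {var} = {expr};")
    saved_exprs[expr] = var
    return var

render("a*b+c"); render("a*b+c")
assert lines == ["float t0 = a*b+c;"]  # emitted once, reused the second time
```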
George Hotz
beb4d3ab01 Tensor Cores 2: Local Buffers Edition (#1057)
* local buffers

* work

* works

* invert_strides

* work

* non tc

* fix shapetracker bug

* stride priority

* touchups

* gate tensor cores

* tensor core conv

* cleanups

* bug fixes

* fix metal_matmul

* fast tensor cores

* more speed

* buffer selection bug fix

* fix CI maybe

* ugh, CI is set to true, not 1

* tc allowed

* add_gl_dimension

* split out padding conv tests

* does padding add fail

* test_padded_conv2d_1x1

* skip metal ci stuff

* more strict on yellow

* float2

* strip parens

* fix float2

* touch up

* dtype

* strip parens

* no alias

* bugfix

* cast float2 and test tensor core ops

* oops, don't hardcode 4
2023-07-09 09:06:00 -07:00
George Hotz
67e34b356a good stuff from tensor cores branch (#1199) 2023-07-08 16:58:26 -07:00
George Hotz
7151382364 Refactor load/store before tensor cores (#1193)
* minor cleanups

* render_const

* now that's a nice refactor

* clean up vload/vstore

* clean up render_load

* debugs there

* dumb

* err, this?

* const float4

* what's failing

* bugfix

* statement includes semicolon

* bugfix
2023-07-08 15:54:58 -07:00
fluffy χατγιρλ
ef1909500e remove superfluous parentheses (#1197) 2023-07-08 15:11:02 -07:00
fluffy χατγιρλ
628ee46627 Fix bug where Tensor.randn returns inf (#1192)
* fix randn inf bug

* add test

* more compact test

* clarify test purpose
2023-07-08 12:03:46 -07:00
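The failure mode, sketched with a Box-Muller-style randn (assuming that formulation purely for illustration): if the uniform sample is exactly 0, log(0) is -inf and the output blows up, so the sample must be kept in (0, 1]:

```python
import numpy as np

def randn_box_muller(shape) -> np.ndarray:
    u1, u2 = np.random.random(shape), np.random.random(shape)
    u1 = np.where(u1 == 0.0, 1.0, u1)  # guard: log(0) would give -inf -> inf output
    return np.sqrt(-2.0 * np.log(u1)) * np.cos(2.0 * np.pi * u2)

assert np.isfinite(randn_box_muller((1000,))).all()
```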
George Hotz
d9c1d81e99 Revert "feat: cancel previous workflow runs on new commits (#1184)" (#1194)
This reverts commit d66a0c285d.
2023-07-08 11:26:13 -07:00
George Hotz
52600d532e add 20 minute timeout 2023-07-07 23:02:28 -07:00
wozeparrot
d66a0c285d feat: cancel previous workflow runs on new commits (#1184) 2023-07-07 22:55:35 -07:00
Jacky Lee
e0c2ae8984 Update file paths (#1179) 2023-07-07 18:41:58 -07:00