tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-02-14 08:35:17 -05:00

Author	SHA1	Message	Date
chenyu	fae08c4d48	fix Tensor.triu / Tensor.triu with boolean input (#4941 ) `where(self, 0)` incorrectly upcasted the output. `where(self, False)` is correct but looks unnatural, so added a cast at the end. Pattern matcher can fold the cast into where branches	2024-06-12 20:16:13 -04:00
chenyu	eb0f5b5660	failed test case for getitem with leading Nones (#4936 ) * failed test case for getitem with leading Nones torch matched numpy so tinygrad is incorrect. another repro ``` t = np.arange(12).reshape((3, 4)) print(t[None, None, np.array([1, 2])]) t = torch.arange(12).reshape((3, 4)) print(t[None, None, torch.tensor([1, 2])].numpy()) t = Tensor.arange(12).reshape(3, 4) print(t[None, None, Tensor([1, 2])].numpy()) ``` * # noqa	2024-06-12 16:19:42 -04:00
chenyu	1326f29e24	fix Tensor.gather shape checking criteria (#4932 ) it's fine if `self.shape[d] >= index.shape[d]` for all `d != dim`, not for all `d`	2024-06-12 13:10:14 -04:00
chenyu	798ea61377	widen test_ops [low, high] and more strict atol (#4906 ) default [low, high] changed from [-1.5, 1.5] to [-2, 2] (except tan). dropped several explicit atol if it's unnecessarily larger than default 1e-6. tested on mac, tinybox red / green	2024-06-10 20:47:09 -04:00
chenyu	c8cd637236	test case for Tensor.var reducing over size = 1 axis (#4902 ) backward failed when correction >= reducing n	2024-06-10 12:11:39 -04:00
chenyu	a70e8a80d7	test_ops test cmp with special floats (#4826 ) prepare to fix nan, it did not work with ge and le before either	2024-06-04 12:10:21 -04:00
chenyu	3afc914617	CMPEQ -> CMPNE and make it safe to pad (#4818 ) * CMPNE * new dataset	2024-06-03 18:02:15 -04:00
chenyu	4921de1945	fix cumsum of 0-d tensor (#4781 ) * fix cumsum of 0-d tensor * _resolve_dim for all	2024-05-30 12:41:09 -04:00
chenyu	4cf0eadf8f	failed test case for ellipsis in einsum (#4779 ) from #4156	2024-05-30 11:14:42 -04:00
chenyu	7e90026eb0	pow cleanup part 2 (#4727 ) more cleanups and fix 0 ** 0	2024-05-25 07:17:40 -04:00
chenyu	31358cbea5	change Tensor.stack to method (#4719 )	2024-05-24 17:04:19 -04:00
chenyu	47aba47f64	update Torch.gather api (#4692 ) * update Torch.gather api gather(self, dim, index) to match torch * fix that	2024-05-22 21:54:06 -04:00
George Hotz	07b350a8f4	new uops is an actual graph (#4560 ) * new uops is an actual graph * it's way slower * simpler * fix define acc * render_loop unique * ops test pass * add pattern matcher back, there's bugs * rewrite * use priority queue * recursive children * fix tests * fix tests with SINK * fix abstractions * fix assembly * simpler * link define_acc * fix DEFINE_ACC placement * type verify * full cmp * fix cmp * ACCESS_ACC * insert DEFINE_ACC * fix PHI * recursive rewrite * fix many tests * sum collapse * more patterns * correct change * fold arange * fix that lin test * space * big folding rule works * close * has more maxes, meh * cached node replace * set changed * simplest folding yet * works * works * DIV * all tests pass * del * fuzz linearizer fails * sum_collapse * test depth 2 cf * fix lin test 14 * fix clang depth * disable that * failure 14 is fixed * fix ptx * failure 27 is fixed * fix llama * run_cnt * Revert "Optimize PTX gated loads index calculation (#4304)" This reverts commit `d97d5a7689`. * fix uops loop * fix ptx bugs * add barrier * print * mem_type in ptx direct * bypass tests that fail in CI but pass locally * ptx remove ptr_ar * more ptx passing * fix ptx tests * assert compile support * remove model inference benchmark from red	2024-05-17 18:00:18 -07:00
chenyu	2119e0456d	redo simpler abs and sign (#4611 ) moved Sign logic to function.py, and backward always returns 0 to match torch. rewrite abs as `self * self.sign()`, so its backward also matches torch.	2024-05-15 18:19:46 -04:00
nimlgen	eb9689336e	nv mockgpu (#4600 ) * mockgpu nv * works * comment that out * fix merge * setup gpuocelot * install packages * not run all of them * passes * fix ci * almost * should pass * linter * linter 2 * try this? * ugn, not supported * ci * remove ticket from description * better descs	2024-05-15 23:46:08 +03:00
chenyu	2b0ee74bb6	lshift and rshift (#4591 )	2024-05-14 19:16:31 -04:00
George Hotz	02327b8adf	simple stuff from new_uops branch (#4563 )	2024-05-12 22:18:05 -07:00
chenyu	d3dc332c2e	Tensor.logsumexp (#4442 ) the subtract max part should share with safe softmax cleaner	2024-05-09 20:49:06 -04:00
qazal	23445db2b9	no skipped tests in RHIP (#4337 ) * delete skip * delete split skip * remu dev * compiler fails here * Revert "remu dev" This reverts commit `28b933d4eb`.	2024-04-28 12:23:05 -04:00
Obada Khalili	e4befa41d7	Fix in `_reshape_mask` (#4332 ) * handle reshape with remainder in _reshape_mask * remove trailing whitespace * use helper_test_op to generate tensors from shapes * test in shapetracker too * remove whitespace * revert property name in other class tests	2024-04-28 11:57:39 -04:00
George Hotz	55ae73e951	Replicate llm.c in tinygrad (#4179 ) * write llm.c and add a few new methods to tensor * training works * add jit * tests for new functions * test tolist * simple fix for onnx test failures (#4186) * write llm.c and add a few new methods to tensor * training works * add jit * tests for new functions * bump line count to 7500 * simplest fix * safenumpy tolist for now --------- Co-authored-by: George Hotz <geohot@gmail.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com> --------- Co-authored-by: geohotstan <135171913+geohotstan@users.noreply.github.com>	2024-04-16 15:40:48 +04:00
David González Martínez	980124a605	add lerp operation to tensor (#4102 ) * feat: add lerp operation to tensor * fix * style: fit in one line: * tests: test backward for lerp	2024-04-08 17:03:27 -07:00
geohotstan	183708b3fd	broadcast expand to match torch (#4085 ) * initial version * heh gimme grrrreen * version 2 * clean ups * some test confusion * fix onnx * rename to _broadcast_tensors * improved errors and test * fixed? * some test fixup * version 3 lol * comments * cleaner * add failure test for expand to 0 test * 1 more assertRaises test * make err msg better * also rewrite the expand onnx op? :s	2024-04-07 16:23:13 -04:00
Akshit Talwar	750ecf8fef	replace slice by pad/shrink in _pool (#4082 )	2024-04-05 11:47:22 -04:00
chenyu	fe03725b21	const fold cast unrealized_unpadded_const (#4047 ) * const fold unrealized_unpadded_const changed the underlying arg directly * CAST_BEFORE_VIEW folds some * fix const index in getitem	2024-04-03 12:31:24 -04:00
chenyu	793ab0512e	use ctypes to truncate float64 and float32 in uops (#3986 ) this fixed the softmax.argmax bug for ops_python as the float is truncated to float32	2024-03-28 23:56:50 -04:00
chenyu	4ecd5789ab	#include <tgmath.h> in ops_clang (#3927 ) * different clang sqrt/log2/exp2/sin function based on dtype fixed softmax_argmax issue in #3552 for clang. * tgmath.h * revert those	2024-03-25 17:48:57 -04:00
Alejandro F Queiruga	556dcfb8f2	Fix the result permutation in einsum (#3895 ) * Fix permutation of result indices in einsum. * Delete stray line used for breaking tests * Fix linter error by renaming twice-used variable --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-03-23 15:48:19 -04:00
chenyu	f271cd682b	use _resolve_dim in argmax (#3846 ) also added comment of the behavior if there are multiple, and more tests	2024-03-20 20:17:30 -04:00
chenyu	ac866eaf5a	disable simplify_phi_loops (#3812 ) * disable simplify_phi_loops this breaks BEAM search GPT2. * skip that	2024-03-18 19:25:26 -04:00
chenyu	f599c6e7f4	test output dtypes match in test_ops (#3703 ) need to cast some torch output to int32 because torch default returns int64 for index related function close #2797	2024-03-12 12:44:40 -04:00
chenyu	02ca067bdf	use default_float.np to construct test data in test_ops (#3701 ) first step of #2797	2024-03-12 11:58:20 -04:00
Patrick Tsai	971d7f5d7c	O(n) arange attempt (#3530 ) * It works? * Clamp correctly * Refactor * Make code better * Undo some stuff * First step to trying to make floats work * Floats work in Python op but not metal because int div is different Python integerdivision was implemented as // which rounds towards negative infinity, but C integer division rounds towards 0 so there is an off-by-1 division error * arange does cumsum with ints and then multiplies by step This is so loop optimization can remain int only * Undo a lot of symbolic changes * Final check * Cleanup * There can be multiple phis * Fix multiple phi op removal * const sets dtype correctly * Fix bugs * Fix a couple bugs and add loop vars to resolve * missed one * Don't trim too many ops * Fix symbolic test * Use ones instead of full * Delete test * Lint passes * max node error * Small updates to loop logic * Remove unnecessary changes * We are getting somewhere * Simple case * Fix * rm, prn * Better * If NumNode doesn't work then continue * clamp is needed for arange(256) * Move everything into the optim fn * Replace correctly * Order optimizations better * Delete * mypy * Test for simplification * Rename * Fix test * update test description * Undo more * Cleanup * No replaced_ops map * Fix lint * AssertionError * back again * Reinstate assertion * Return true and make diff not as big * Bigger range for test * Change cumsum impl * fix bug * make big cumsum work * lint * Undo cumsum 2-stage removal * No while helper * optional min/max clamping * floats work * rm giant arange test * fix python cast None * Check phi parents * one phi allowed per where * Fix one phi per where * Rework iteration * Delete assertions * convert to int * Try mul -1 instead of neg for hip..? 
* Remove one phi per where requirements * one accum only * Lint * should simplify a loop at a time * Don't get rid of loop explcitly * Need to iterate backwards * lint * unary neg * Make optim work for onnx and sum_pad_collapse * Better message * filter alu ops correctly * Fix the limiter * lint and simplify * Add it back * off by one error * test wheres and phis * test max ops and non-if stuff * <= * cast_scalar * Oops * Change test * Pass loop uops instead of a modified map * Cut param transfer between linearizer and uops * Fix issues * Fix lint * fix efficientnet python 3.8 invalid syntax * distinct vars in seen_vars * accurate var names --------- Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-03-11 16:09:20 -07:00
George Hotz	69ca7f7bf9	changes for teenygrad (#3665 ) * changes for teenygrad * upd * simpler test	2024-03-09 15:30:34 -08:00
Obada Khalili	b5cbf1792a	Fix `Tensor.cumsum` when axis of length 0 is selected (#3473 ) * fix Tensor.cumsum when axis of length 0 is selected * add cumsum regression test * define padding left size in a separate line	2024-03-09 08:26:41 -08:00
reddyn12	660df3cff1	Add test for .softmax.argmax (#3559 ) * Add broken test for known issue * skip PYTHON * skip PYTHON * fix commit --------- Co-authored-by: schlimeszn <schlimeszn@gmail.com> Co-authored-by: reddyn <nikidsniper@gmail.com>	2024-03-02 20:51:52 -08:00
George Hotz	aa9b013d79	add constant folding for WHERE in uops (#3584 ) * add constant folding for WHERE in uops * prereqs for generic constant folding * fix test * disable slow overflow logic * make that test faster	2024-03-02 10:37:14 -08:00
chenyu	d89e3c4e08	enable METAL tests now runner is M1 and no fast-math (#3523 )	2024-02-28 14:14:23 -05:00
David Friehs	2fe98b64bb	fix Tensor.split not passing dim to Tensor.chunk (#3490 )	2024-02-24 07:53:11 -05:00
chenyu	1eb24af63b	fix softmax and log_softmax for 0d tensor (#3463 ) matched torch to take axis \in [-1, 0] and used axis=None internally	2024-02-21 11:30:30 -05:00
George Hotz	871ba73e65	_reduce_op is axis based now (#3462 ) * _reduce_op is axis based now * axis_ * update lin failures * disable that * fix shape	2024-02-21 16:36:31 +01:00
geohotstan	5eb4c902f6	correct division dtype casting (#3405 ) * Happy New Year * fix: exclude floordiv onnx tests * fix: less weird if statements in div * Good luck in the Year of the Dragon * fix: tempfix onnx div * fix: use reference impl for div	2024-02-15 19:34:40 -05:00
Obada Khalili	18bb6a22e0	make tensors sizes smaller in maxpool2d tests (#3417 )	2024-02-15 15:53:52 +01:00
chenyu	1156a27619	cleanup atol in test_ops (#3368 ) removed the explicit set value if it's the same as default 1e-6, or higher but can be set to default.	2024-02-10 19:44:44 -05:00
George Hotz	c32ea95d7d	Python uop emulator (#3327 ) * start uop emu * tiny_add passes * more ops * emulate the whole warp * test_gemm passes * metal gemm test pass * works on big gemm * works on big gemm * more tests pass * touch ups * fix mypy * cleanups * exp2 mypy * arch is where it belongs * actually emulate tensor cores * fix test * new style	2024-02-08 19:24:55 +01:00
chenyu	b110c4a7b8	explicitly set input low and high in test_ops (#3347 ) easier to set `(low, high)` than figuring out a,b for `(x+a)*b`. this pr kept the same input ranges	2024-02-08 04:11:45 -05:00
chenyu	0d2dacb549	test intermediate tensors created by function have same device as input (#3338 ) run on TORCH since it's the fastest one on CI. caught a bug in multinomial, and update the behavior of fancy index and gather to move the indices Tensor to same device as self.	2024-02-07 09:24:36 -05:00
chenyu	ca66be6a70	add failed Tensor.pow test cases (#3334 ) tried refactoring pow and found some bugs	2024-02-07 04:28:24 -05:00
chenyu	d9ef8e25b3	fix Tensor.var with 0 in reduce dim. (#3324 ) fix when correction is too big. it seems to only work when input size is 0 though. torch can output -inf in var when correction is too big, which does not make sense.	2024-02-05 20:59:13 -05:00
Obada Khalili	ee25f73283	Fix Tensor.mean to compute the mean correctly when 0-length axes are selected (#3318 ) * fix Tensor.mean to compute the mean correctly when 0-length axes are selected * add a regression test * rename sum variable to sum_t to avoid conflict with built-in function * refactor Tensor.mean to have fewer lines	2024-02-05 01:40:37 -05:00
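
Commit 971d7f5d7c in the log above notes that Python's `//` rounds toward negative infinity while C integer division rounds toward zero, which produced an off-by-one result for negative operands. A minimal sketch of that difference in plain Python follows. `c_style_div` is a hypothetical helper written for illustration only; it is not code from tinygrad.

```python
def c_style_div(a: int, b: int) -> int:
    """Integer division that truncates toward zero, as C does.

    Hypothetical helper for illustration; not part of tinygrad.
    """
    if b == 0:
        raise ZeroDivisionError("division by zero")
    q = abs(a) // abs(b)          # magnitude of the quotient
    same_sign = (a >= 0) == (b >= 0)
    return q if same_sign else -q  # negate when the signs differ


# Python's floor division and C-style truncation agree for same-sign operands
# and differ by one when the exact quotient is negative and not an integer.
floor_result = -7 // 2             # floors toward negative infinity
trunc_result = c_style_div(-7, 2)  # truncates toward zero
```

The two results differ only when the operands have opposite signs and the division is inexact, which is the case the commit message describes.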


375 Commits
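
Commit 1326f29e24 in the log states the corrected `Tensor.gather` shape criterion: `self.shape[d] >= index.shape[d]` is required for all `d != dim`, not for all `d`. The check can be sketched in plain Python as below. `gather_shapes_ok` is a hypothetical name, the equal-rank requirement is an assumption borrowed from torch's `gather`, and `dim` is assumed to be non-negative; none of this is taken from tinygrad's source.

```python
from typing import Sequence


def gather_shapes_ok(self_shape: Sequence[int], index_shape: Sequence[int], dim: int) -> bool:
    """Return True if index_shape is acceptable for a gather along `dim`.

    Hypothetical helper for illustration; not tinygrad's implementation.
    Assumes a non-negative `dim` and equal rank for both shapes.
    """
    if len(self_shape) != len(index_shape):
        return False
    # Every axis except `dim` must satisfy self.shape[d] >= index.shape[d];
    # the gathered axis itself is unconstrained.
    return all(
        s >= i
        for d, (s, i) in enumerate(zip(self_shape, index_shape))
        if d != dim
    )


# index is longer than self along dim=1, which is allowed
ok_along_dim = gather_shapes_ok((3, 4), (2, 10), dim=1)
# index is longer than self along axis 0 while gathering on dim=1, which is rejected
bad_off_dim = gather_shapes_ok((3, 4), (5, 4), dim=1)
```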