tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-24 22:38:16 -05:00

Author	SHA1	Message	Date
chenyu	e7ff5102cf	failed test in test_pattern_matcher (#4080) something about the PTX rewrite is incorrect that it has duplicated rewritten uops	2024-04-05 02:53:50 -04:00
George Hotz	3de855ea50	don't use SVM memory in KFD (#4072) * don't use SVM memory in KFD * copy from fd * cleanups * transfer * hacks * ops_hsa * tighter API	2024-04-04 17:33:21 -07:00
chenyu	c1cffed1df	add LazyOp.dtype (#4073) an inferred cached_property. removed all cases that use get_lazyop_info just to get the dtype of an op. prereq to remove InterpretedFlopCounter	2024-04-04 17:38:19 -04:00
Szymon Ożóg	82b7b9655f	test for dtype set (#4069)	2024-04-04 11:24:33 -04:00
geohotstan	1a1dd1c1a7	add and enable tests for indexing const folding (#4068) * enable test in test_indexing * added tests * rename stuff * del a test case cuz it's loadops.copy	2024-04-04 10:46:28 -04:00
Szymon Ożóg	ba118abfec	improved caching for pointer arithmetics in ptx (#3922) * improved caching for pointer arithmetics * Add test for pointer arithmetics caching * Refactor test	2024-04-04 07:33:48 -07:00
George Hotz	7181ffd630	HWCopyQueue in KFD (#4042) * HWCopyQueue in KFD * hw compute queue * test * move test * more tests * fix wait * fix multimap * mes crash * tests pass but slow * stuff is working * one more test	2024-04-03 20:14:24 -07:00
chenyu	e3c0ac9fbf	remove old envvar "OPT" (#4060)	2024-04-03 14:55:21 -04:00
chenyu	406cb5fd90	const fold ReduceOps (#4059)	2024-04-03 14:39:28 -04:00
chenyu	fe03725b21	const fold cast unrealized_unpadded_const (#4047) * const fold unrealized_unpadded_const changed the underlying arg directly * CAST_BEFORE_VIEW folds some * fix const index in getitem	2024-04-03 12:31:24 -04:00
Szymon Ożóg	e5a9bff899	Add pattern matcher tests, move uop transforms from assembly to pattern matcher (#4056)	2024-04-03 09:06:43 -07:00
chenyu	f61ed869f5	Use exec_alu for lazy const folding (#4039)	2024-04-02 20:52:05 -04:00
chenyu	85edc493b0	uops const fold rules to prevent tautological compare warnings (#4041) * uops const fold rules to prevent tautological compare warnings `bool < false` is false, `true < bool` is false, `a == a` is true, `a != a` is false * not true for nan * and nan does not work with llvm * full truth table test * revert a==a * comments and indents	2024-04-02 16:45:58 -04:00
Patrick Tsai	0147174ad6	Embedding in one kernel (#4036) * Embedding is in one kernel * embedding is one kernel * rm extra line * newline * bert test counts state vars? * add a test? * move items around --------- Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>	2024-04-02 11:38:21 -04:00
Dan Hoffman	5311b45053	re-enable has_local check for linearizer test (#4034) Co-authored-by: Dan Hoffman <daniel.hoffman@intel.com>	2024-04-02 00:02:03 -04:00
George Hotz	7425a0c646	CommandQueue is the future (#3950) * start of command queue * cq work * runs * cleanup * outs set * read is gone * future buffer work * command queue is better * command queue works * loadops * delete unneeded * command queue works * upd * fix tests * use CommandQueue in compile * delay sync	2024-04-01 17:35:48 -07:00
chenyu	82440d3416	don't call contiguous for unpadded const into multi tensor (#4032) * don't call contiguous for unpadded const into multi tensor fixed multi const folding for sharded const. still wip, need to be careful that this does not break multi device cache somewhere * ehh need a memory test for that * simple sharded memory test	2024-04-01 19:22:14 -04:00
chenyu	77a68fc52f	test examples for multi tensor const folding (#4031) works with literal const operand now because it's copied to each shard and handled by lazy. does not work for sharded const	2024-04-01 16:53:43 -04:00
chenyu	379d52548d	const fold left const operand for ADD and MUL (#4029) * const fold left const operand for ADD and MUL * neg have dtype issue	2024-04-01 15:09:04 -04:00
chenyu	0e02d074bd	fix Tensor.pow folding for exponent 0 and 1 (#4025)	2024-03-31 19:57:23 -04:00
mmmkkaaayy	a4ae9352bd	delete irrelevant JIT regression test (#4024)	2024-03-31 19:35:35 -04:00
chenyu	d3f27761b0	move const folding of ADD/SUB/MUL from tensor to lazy (#4020) * move const folding of ADD/SUB/MUL from tensor to lazy will do div and pow separately. * fix onnx adding with None	2024-03-31 16:35:36 -04:00
chenyu	7f859593b8	fix _to_const_val and const folding around it (#4017) * fix _to_const_val and const folding around it is_unrealized_contiguous_const is too strict and almost never hit if const is expanded. suffice to check if there's no pad * that test is folded * test_const_folding	2024-03-31 13:09:23 -04:00
chenyu	c71627fee6	move GlobalCounter to helpers (#4002) break circular import between ops and buffer	2024-03-30 00:30:30 -04:00
George Hotz	9eef44521b	ScheduleItem uses Buffer (#3995) * schedule Buffer * update * update tests * master * works * remove LoadOps.WAIT * fix compile2 * bad test * rename and note	2024-03-29 20:50:27 -07:00
George Hotz	8f1e34a2a0	early src delete (#3996) * early src delete * fix bad test * fix test_linearizer	2024-03-29 19:46:07 -07:00
George Hotz	f916aadaea	external that test	2024-03-29 19:35:50 -07:00
George Hotz	c42ed8e99c	don't reschedule	2024-03-29 19:17:37 -07:00
chenyu	b43e470f80	always use f32 for rand source of randn (#3998) * always use f32 for source of randn fixed bfloat16 randn to not have inf. don't really care about float64. threefry is float32 based too * HSA is broken	2024-03-29 17:04:34 -04:00
chenyu	6b6461122e	test case Tensor.randn should be finite (#3994) * test case Tensor.randn should be finite there's a hack to fix float16, need a generic solution that works with bf16 and threefry * skip not supported * bfloat16 local is wrong * skip RHIP	2024-03-29 14:51:02 -04:00
chenyu	d9ff636cf5	use is to compare with enum (#3993) * use is to compare with enum currently it's mixed between `==` and `is`, moved all to `is` * more	2024-03-29 13:02:56 -04:00
chenyu	7bc560ec49	remove outdated bf16 comments in test_dtype (#3987)	2024-03-29 00:56:18 -04:00
uuuvn	8a40d7d423	Shape changing bitcast and assert bitcast in disk (#3973) * Shape changing bitcast * only support it on disk * basic test * more tests * RuntimeError instead of assert * create unique temp files * move tests that use disk to test_disk_tensor * linter * remove assert on error messages * that's RuntimeError now --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-03-28 21:49:10 -07:00
chenyu	793ab0512e	use ctypes to truncate float64 and float32 in uops (#3986) this fixed the softmax.argmax bug for ops_python as the float is truncated to float32	2024-03-28 23:56:50 -04:00
chenyu	c4c243f79d	update test_uops _equal to use assert_allclose (#3981) it handles nan	2024-03-28 22:14:45 -04:00
reddyn12	9b5e15db6e	Mamba Implementation (#3456) * first commit * state back to orig * mamba comparisions * rm file * rename file * use Tensor.einsum and mke default model 370M * Cleaned code and made a comparision test * Simplyfy pull request. Only has 1 mamba implementation now. * Update prompt * rm whitespaces * last space * remove Einops dependency * rm unused code * add tests * rm print statement * rm imports * skip CLANG * Update skipIf description * skip model test in CI and add CLANG fix * rm Device import * don't be stupid * Fix conv assign When the prompt is too short, the logic for conv_state assign messes up. This can be fixed when padding the tokenized array to min length of 4. I padded using the empty string token, but idk if proper practice is to use the PAD token * fix p1 * temp * fix jit import --------- Co-authored-by: schlimeszn <schlimeszn@gmail.com> Co-authored-by: reddyn <nikidsniper@gmail.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-03-28 17:49:12 -07:00
chenyu	1fa0351acb	fix DEFINE_ACC invalid_value to have same type as localtype (#3980)	2024-03-28 19:21:17 -04:00
chenyu	b47f6cebb2	LinearizerOptions -> CompilerOptions (#3978)	2024-03-28 17:50:23 -04:00
George Hotz	42b9d999ea	Buffer isn't always allocated (#3974) * buffer alloc * allocate * missing allocates * last one	2024-03-28 13:33:47 -07:00
chenyu	bfcaa2f70e	assert `__setitem__` if used other than disk (#3972) * assert `__setitem__` if used other than disk * that is not implemented	2024-03-28 12:16:38 -04:00
Francis Lam	7c5729a3bd	wmma: refactor to remove wmma_func and create TC funcs as needed (#3945) * wmma: refactor to remove wmma_func and create TC funcs as needed * test_linearizer: disable bf16 CUDA during emulation testing * cstyle: clean up creation of CUDA vec dtypes * extra/gemm: add option to accumulate to bfloat16 * cleanups * benchmark: add CUDA bfloat16 matmul * more cleanups	2024-03-27 16:43:09 -04:00
George Hotz	60639cccac	hotfix: RuntimeError for assign	2024-03-27 11:18:48 -07:00
qazal	9fb573d73c	DAG cycle asserts (#3955) * assert cycles * these are cycle errors * flip to positive	2024-03-27 11:09:59 -07:00
geohotstan	bd3a7d068c	correct device for validation test in model benchmark CI (#3960) * fix tests * add clang back for only metal * change the name to reflect CLANG being ran * add back cuda	2024-03-27 13:40:06 -04:00
chenyu	6c7df1445b	enforce UOps.CONST arg has python type based on dtype (#3952) added an assert in uops, remove the cast in renderer	2024-03-27 01:41:38 -04:00
George Hotz	68ca4d4276	split to schedule.py (#3949) * split to schedule.py * split	2024-03-26 21:02:46 -07:00
George Hotz	150ea2eb76	create engine folder and move code (#3948) * retry * older tf * that	2024-03-26 20:38:03 -07:00
Francis Lam	5530b0cbed	fuzz_linearizer: reduce debug verbosity and make easier for CI usage (#3942) * fuzz_linearizer: reduce debug verbosity and make easier for CI usage * rename FUZZ_BEAM to FUZZ_ALL_ACTIONS (not choosing a subset) * skip simple ASTs (easier to use with LOGOPS output) * don't fuzz a previously seen AST * add options to allow non-zero --expected-failures * clean up naming and use set	2024-03-26 16:25:24 -04:00
nimlgen	e2d6f76723	_alloc and _free with options (#3934) * _alloc has options * linter * fix hsa	2024-03-26 09:11:41 -07:00
chenyu	72d617a37d	opencl on OSX does not support fp16 extension (#3931) running `GPU=1 python -m pytest -rA test/test_dtype.py::TestHalfDtype::test_casts_from` on mac would fail.	2024-03-25 19:50:17 -04:00

4618 Commits