Commit Graph

3386 Commits

Guy Leroy
0dba34b81c Fix backward fn for < and == (#3037)
* fix no grad fn for < and ==

* remove 2 line breaks

* Remove deprecated autograd variable

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-01-14 20:39:52 -08:00
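
A quick illustration of why `<` and `==` need no backward function: comparisons are piecewise-constant in their inputs, so their gradient is zero almost everywhere. This is a numpy finite-difference sketch of that fact, not the tinygrad change itself.

```python
import numpy as np

# Comparison ops are piecewise-constant, so their derivative is zero
# (almost everywhere) and no backward fn is needed.
def finite_diff(f, x, eps=1e-3):
  return (f(x + eps) - f(x - eps)) / (2 * eps)

x = np.array([0.5, 1.9, 3.5])  # points away from the threshold
print(finite_diff(lambda v: (v < 2.0).astype(np.float32), x))   # [0. 0. 0.]
print(finite_diff(lambda v: (v == 2.0).astype(np.float32), x))  # [0. 0. 0.]
```
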
chenyu
db965a0c74 remove numpy from ops_torch (#3124)
updated the mnist test to cast labels to int8, avoiding the hack around torch's uint8 cast issue
2024-01-14 22:46:57 -05:00
George Hotz
1f9aee8b6f remove numpy from device (#3123)
* remove numpy from device

* fix tests

* np item

* cleanups

* simplify with as_buffer

* no toCPU

* tinygradic

* cast to scalar
2024-01-14 19:36:05 -08:00
George Hotz
ea5824657d move fromcpu out of lazy.py (#3122)
* move fromcpu out of lazy.py

* fix abstractions2
2024-01-14 18:21:08 -08:00
George Hotz
96345061d3 hotfix: ptrdtype compare was broken 2024-01-14 18:08:22 -08:00
Jyotirmaya Mahanta
26e0faf656 make DType a dataclass (#3111)
* remove np from DType

* convert to dataclass

* remove dunder hash, eq, ne overrides from ImageDType

* is dataclass required for PtrDType?

* fix GPU tests

* reduce lines

* revert changes to np

* minor cleanup
2024-01-14 17:15:59 -08:00
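
Roughly what the dataclass conversion buys (an illustrative sketch, not the actual tinygrad `DType` definition; field names beyond priority/itemsize/name are assumptions): a frozen dataclass supplies value-based `__eq__`, `__hash__`, and `__repr__`, so the hand-written dunder overrides can go.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative sketch only, not the real tinygrad DType.
@dataclass(frozen=True, order=True)
class DTypeSketch:
  priority: int            # decides upcast order
  itemsize: int            # bytes per element
  name: str
  fmt: Optional[str] = None

f32 = DTypeSketch(11, 4, "float", "f")
assert f32 == DTypeSketch(11, 4, "float", "f")             # auto __eq__
assert len({f32, DTypeSketch(11, 4, "float", "f")}) == 1   # auto __hash__ (frozen)
```
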
Yixiang Gao
c13d51da1d add device options for tests in multigpu (#3121) 2024-01-14 15:17:47 -08:00
chenyu
79f4627fbc fix conversation: llama generates a token, not a prob, now (#3120) 2024-01-14 13:10:01 -05:00
chenyu
152ef7fc79 minor cleanups of onnx_ops (#3116) 2024-01-14 02:15:24 -05:00
chenyu
fb3f8f7597 move sample inside jit for beautiful_mnist (#3115)
also removed .realize() for jit functions since jit does it automatically now. a little more beautiful
2024-01-14 01:36:30 -05:00
chenyu
a313e63a9b add Tensor.var (#3114)
also updated MeanVarianceNormalization and made test_ops test tensors of var and std smaller
2024-01-14 01:11:08 -05:00
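
A minimal usage sketch for the new `Tensor.var`, assuming a torch-like signature (`axis`, `keepdim`, and Bessel's correction on by default); the exact keyword names are an assumption here.

```python
from tinygrad.tensor import Tensor

x = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(x.var().numpy())                        # sample variance over all elements
print(x.var(axis=1).numpy())                  # per-row variance, shape (2,)
print(x.var(axis=1, correction=0).numpy())    # population variance (divide by N)
```
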
chenyu
c658aa4fbf minor cleanup of test_disk_tensor (#3112) 2024-01-13 20:54:58 -05:00
chenyu
9c73d2724f cleanup ops_disk type annotation and redundant str cast (#3110) 2024-01-13 16:56:48 -05:00
chenyu
a300fea2a4 failed test case due to cast resets shapetracker (#3109)
cast implicitly resets shapetracker and makes it contiguous (for disk tensor), which fails for Interpreted backend if inputs contain non-contiguous st.
2024-01-13 12:46:51 -05:00
nimlgen
cf1d0a6704 no exceptions in __del__ when module creation fails in hip/cuda (#3107) 2024-01-13 12:03:55 -05:00
chenyu
12f28ac9d4 catch runtime error in search._time_program (#3106)
return inf if the search encounters runtime errors.
2024-01-12 21:53:13 -05:00
chenyu
f018a55ea1 update NumNode.__hash__ to be hash(self.b) (#3105)
with this, for `a := NumNode(x)`, `a == b` implies `hash(a) == hash(b)`
2024-01-12 19:46:21 -05:00
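
The point of the change is Python's hash/eq contract: anything a `NumNode` compares equal to must also hash equal, otherwise dict/set lookups that rely on equality silently miss. A minimal sketch of the contract (not the real `NumNode`):

```python
class NumNodeSketch:
  def __init__(self, b: int): self.b = b
  def __eq__(self, other): return self.b == getattr(other, "b", other)
  def __hash__(self): return hash(self.b)  # the fix: hash what __eq__ compares

a = NumNodeSketch(4)
assert a == NumNodeSketch(4) and hash(a) == hash(NumNodeSketch(4))
assert a == 4 and hash(a) == hash(4)  # also consistent with plain ints
```
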
chenyu
c3c35f9142 flag to profile mixtral - 1.7 tok/s now (#3104) 2024-01-12 18:54:27 -05:00
chenyu
e078e2d060 add half @ half to mac benchmark (#3103) 2024-01-12 16:38:41 -05:00
Francis Lam
ddbdb52f77 wmma: enable METAL half tensor cores and clean up cstyle (#3095)
* wmma: enable METAL half tensor cores and clean up cstyle

* revert simple_matmul rand changes and break line in tensor

* added metal fp16->fp32 tensor core
2024-01-12 16:25:28 -05:00
chenyu
f96fc6e9d4 fix gpt2 with empty prompt take 2 (#3102)
logits would be empty, so we need to replace them with ones before sampling; also, we cannot reshape with -1 when there's a 0 in the other axes
2024-01-12 14:46:36 -05:00
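
A numpy sketch of the reshape constraint mentioned here: with zero total elements and a 0 among the other target axes, a `-1` dimension cannot be inferred, which is why the empty logits are replaced with ones before sampling.

```python
import numpy as np

logits = np.empty((0, 8))            # stand-in for logits of an empty prompt
print(logits.reshape(4, -1).shape)   # (4, 0): the non-zero dims pin -1 to 0
try:
  logits.reshape(0, -1)              # ambiguous: -1 could be anything
except ValueError as e:
  print(e)
```
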
chenyu
ca46d3541b Revert "fix gpt2 with empty prompt" (#3101) 2024-01-12 14:27:41 -05:00
chenyu
1d7f01bc6d fix gpt2 with empty prompt (#3100)
logits would be empty, so we need to replace them with ones before sampling; also, we cannot reshape with -1 when there's a 0 in the other axes
2024-01-12 14:18:17 -05:00
SnakeOnex
0c49d38ba7 replace with tensor op (#3099) 2024-01-12 14:13:40 -05:00
chenyu
f3a50b4e40 fix broadcasted logic if there's 0 in shapes (#3097)
* fix broadcasted logic if there's 0 in shapes

a size-1 dim should always expand into 0, not the other way around. fixed matmul with 0 in input shapes.
forward only for now; backward is more involved and would need changes to the 0-size shortcuts

* fix tests
2024-01-12 13:32:43 -05:00
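
The rule the fix enforces matches numpy broadcasting: a size-1 axis expands into a size-0 axis (giving 0), never the reverse. A numpy sketch of the intended semantics:

```python
import numpy as np

a = np.ones((1, 3))
b = np.ones((0, 3))
print((a + b).shape)                               # (0, 3): the 1 expands into 0
print((np.ones((0, 4)) @ np.ones((4, 5))).shape)   # (0, 5): matmul with a 0 dim
```
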
SnakeOnex
025fbf4e80 One hot in tensor.py (#3093)
* onehot in Tensor.py

* one_hot tests

* works for all shapes, not just 1

* pylint

* not a static method

* moved around, num_classes mandatory

* pylint

* pylint

* space & moving

* formatting

* moved tests
2024-01-12 13:31:18 -05:00
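
A minimal usage sketch for the new `Tensor.one_hot` (per this PR, `num_classes` is mandatory and any input shape works, gaining a trailing `num_classes` axis); the exact output dtype is not assumed here.

```python
from tinygrad.tensor import Tensor

labels = Tensor([[0, 2], [1, 2]])
onehot = labels.one_hot(3)     # shape (2, 2, 3)
print(onehot.numpy())          # e.g. labels[0,1] == 2 -> [0, 0, 1]
```
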
chenyu
7086d77db1 bugfix do not reset shapetracker of 0 size lazybuffer (#3096)
it might be coming from an expand, and resetting results in an incorrect stride. caught by the interpreted backend
2024-01-11 23:22:52 -05:00
Yixiang Gao
13e872b53f add multigpu support for llama attention (#3064)
* add llama attention test for multigpu

* test fails

* kv cache trying to shrink on sharded axis

* mask None works for scale dot product

* kv cache seems to be working but scale dot product breaks

* scaled dot product works, but the last linear layer failed

* running into the reshape case where it could be wrong for multigpu

* making sure it was the reshape

* adding contiguous doesn't solve

* need to shard more properly

* remove reshape test

* minor adjustment to scale dot product attention test

* weights are sharded wrong

* continue fix new weight sharding

* clean up

* fix attention when start_pos is 0

* remove print

* add TODOs for the best multigpu interface
2024-01-11 16:31:02 -08:00
chenyu
dcf7ecaaff update jit type annotation post lazy rewrite (#3091) 2024-01-11 15:49:30 -05:00
chenyu
0fe6904351 use device from LinearizerOptions in kernel search (#3090)
* use device from LinearizerOptions in kernel search

removed all Device.DEFAULT in search.py

* pass device string for parallel pickle

* device for interpreted backends in LinearizerOptions
2024-01-11 14:46:03 -05:00
chenyu
93e3f952aa use BEAM=2 instead of BEAM=4 in cuda ci gpt2 (#3089)
BEAM=2 is faster and takes less search time. investigating why BEAM2+BEAM4 is slower than BEAM2 alone
2024-01-11 13:21:06 -05:00
chenyu
f502c9b08f minor cleanup of View.reshape (#3088)
* minor cleanup of View.reshape

removed some redundant logic

* new_strides

* revert that
2024-01-11 13:05:54 -05:00
chenyu
f40299c3fe remove the third merging state in view._merge_dims (#3085)
no logic depends on state == 0 or state == 2
2024-01-11 12:07:43 -05:00
chenyu
7f9590d357 hotfix disable flaky mac runner wino cifar (#3087) 2024-01-11 11:57:05 -05:00
Yixiang Gao
adcc844755 cat works (#3086) 2024-01-11 08:25:20 -08:00
chenyu
cdeab9ad97 mem_estimate is always int, not symbolic (#3083)
* mem_estimate is always int, not symbolic

op_estimate can be symbolic, but mem_estimate is always int, thus we don't need to sym_infer it.
fixed some long lines too. update_stats is a very big function

* operator does not need underscores
2024-01-10 23:39:51 -05:00
Francis Lam
162fa61a32 wmma: clean up device specific tensor core code (#3081) 2024-01-10 21:03:09 -05:00
chenyu
d218d13885 minor cleanups of lazy.py (#3080) 2024-01-10 20:17:56 -05:00
chenyu
56dda33fc6 Tensor.expand resolves the new_shape before shortcut return (#3078)
similar to how reshape is done. also updated the shrink shortcut criteria to read similarly to pad
2024-01-10 14:29:15 -05:00
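
A sketch of what resolving new_shape before the shortcut means in practice, assuming -1 in expand keeps the existing axis (as in reshape); the exact semantics at this commit are an assumption.

```python
from tinygrad.tensor import Tensor

x = Tensor.ones(2, 1, 4)
print(x.expand(-1, 3, -1).shape)   # -1 resolved first, giving (2, 3, 4)
print(x.expand(2, 1, 4).shape)     # resolved shape equals self.shape -> shortcut no-op
```
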
Yixiang Gao
6842476ca6 better test demonstration (#3077)
* a better test demonstration

* fix white space
2024-01-10 10:50:52 -08:00
chenyu
507e0afba0 fix onehot and jit in examples/transformer (#3073)
trained to 0.999 in < 6 seconds on M1 Max consistently
2024-01-10 02:22:41 -05:00
chenyu
4342fccc83 filter_strides -> canonicalize_strides (#3072) 2024-01-10 01:06:48 -05:00
chenyu
023f5df0e9 simpler idxs_to_idx (#3071) 2024-01-10 00:30:10 -05:00
George Hotz
2495ca95c7 early gate the graph (#3070) 2024-01-09 20:17:13 -08:00
George Hotz
ff0d6e4551 jit autorealizes output (#3069) 2024-01-09 20:10:22 -08:00
George Hotz
ae83733431 hotfix: examples/transformer.py 2024-01-09 19:28:09 -08:00
chenyu
145718a90f unbind view or shapetracker also returns var_val (#3067)
* unbind view or shapetracker also returns var_val

4% faster for llama compile time

* one line less

* unbound_views
2024-01-09 21:45:05 -05:00
jxdv
ef3aa6d7fb update gh actions (#3033)
* update checkout actions

* update upload artifact

* update setup python

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-01-09 17:52:22 -08:00
George Hotz
3f80c1a098 speedtweaks3: apply shouldn't use the tensor constructor (#3065)
* speedtweaks3: apply shouldn't use the tensor constructor

* replace 0 size with CONST, not 0 in shape
2024-01-09 17:42:33 -08:00
George Hotz
0abe72b677 hotfix: use is for enum compare, a few more 2024-01-09 16:53:13 -08:00