tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-28 08:17:58 -05:00

Author	SHA1	Message	Date
Paul Gustafson	6bb65cd02e	fix off-by-one error in st_equal (#3131 ) * fix off by one error * whitespace	2024-01-15 11:32:13 -08:00
George Hotz	44c05919c1	dtype fmt (#3132 ) * dtype fmt * three ways to access	2024-01-15 11:31:54 -08:00
nimlgen	5ec66938de	remove np from metal graph (#3129 )	2024-01-15 11:44:35 -05:00
Jyotirmaya Mahanta	2ef09ca641	update test_ptr_ne (#3130 )	2024-01-15 11:36:29 -05:00
chenyu	e39cd3e7f2	update env_vars.md (#3127 ) mostly removed deprecated ones. not clear how to maintain this especially for extra/examples	2024-01-15 01:06:56 -05:00
chenyu	537fb8b0b8	separate try except blocks in onnx2torch in model benchmark (#3126 ) exceptions can be raised from either model conversion or individual backend failed. openpilot on torch mps works, but does not work with torch cpu. separate the exception block so that the benchmark can include torch mps for openpilot.	2024-01-15 00:39:33 -05:00
Guy Leroy	0dba34b81c	Fix backward fn for `<` and `==` (#3037 ) * fix no grad fn for < and == * remove 2 line breaks * Remove deprecated autograd variable --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-01-14 20:39:52 -08:00
chenyu	db965a0c74	remove numpy from ops_torch (#3124 ) updated mnist test to cast label to int8 and avoid hacking cast issue of torch uint8	2024-01-14 22:46:57 -05:00
George Hotz	1f9aee8b6f	remove numpy from device (#3123 ) * remove numpy from device * fix tests * np item * cleanups * simplify with as_buffer * no toCPU * tinygradic * cast to scalar	2024-01-14 19:36:05 -08:00
George Hotz	ea5824657d	move fromcpu out of lazy.py (#3122 ) * move fromcpu out of lazy.py * fix abstractions2	2024-01-14 18:21:08 -08:00
George Hotz	96345061d3	hotfix: ptrdtype compare was broken	2024-01-14 18:08:22 -08:00
Jyotirmaya Mahanta	26e0faf656	make DType a dataclass (#3111 ) * remove np from DType * convert to dataclass * remove dunder hash, eq, ne overrides from ImageDType * is dataclass required for PtrDType? * fix GPU tests * reduce lines * revert changes to np * minor cleanup	2024-01-14 17:15:59 -08:00
Yixiang Gao	c13d51da1d	add device options for tests in multigpu (#3121 )	2024-01-14 15:17:47 -08:00
chenyu	79f4627fbc	fix conversation: llama generates token not prob now (#3120 )	2024-01-14 13:10:01 -05:00
chenyu	152ef7fc79	minor cleanups of onnx_ops (#3116 )	2024-01-14 02:15:24 -05:00
chenyu	fb3f8f7597	move sample inside jit for beautiful_mnist (#3115 ) also removed .realize() for jit functions since jit does it automatically now. a little more beautiful	2024-01-14 01:36:30 -05:00
chenyu	a313e63a9b	add Tensor.var (#3114 ) also updated MeanVarianceNormalization and made test_ops test tensors of var and std smaller	2024-01-14 01:11:08 -05:00
chenyu	c658aa4fbf	minor cleanup of test_disk_tensor (#3112 )	2024-01-13 20:54:58 -05:00
chenyu	9c73d2724f	cleanup ops_disk type annotation and redundant str cast (#3110 )	2024-01-13 16:56:48 -05:00
chenyu	a300fea2a4	failed test case due to cast resets shapetracker (#3109 ) cast implicitly resets shapetracker and makes it contiguous (for disk tensor), which fails for Interpreted backend if inputs contain non-contiguous st.	2024-01-13 12:46:51 -05:00
nimlgen	cf1d0a6704	no exceptions in __del__ when module creation is failed in hip/cuda (#3107 )	2024-01-13 12:03:55 -05:00
chenyu	12f28ac9d4	catch runtime error in search._time_program (#3106 ) return inf if search encountered runtime errors.	2024-01-12 21:53:13 -05:00
chenyu	f018a55ea1	update NumNode.__hash__ to be hash(self.b) (#3105 ) with this, `a:=NumNode(x) == b` implies `hash(a) == hash(b)`	2024-01-12 19:46:21 -05:00
chenyu	c3c35f9142	flag to profile mixtral - 1.7 tok/s now (#3104 )	2024-01-12 18:54:27 -05:00
chenyu	e078e2d060	add half @ half to mac benchmark (#3103 )	2024-01-12 16:38:41 -05:00
Francis Lam	ddbdb52f77	wmma: enable METAL half tensor cores and clean up cstyle (#3095 ) * wmma: enable METAL half tensor cores and clean up cstyle * revert simple_matmul rand changes and break line in tensor * added metal fp16->fp32 tensor core	2024-01-12 16:25:28 -05:00
chenyu	f96fc6e9d4	fix gpt2 with empty prompt take 2 (#3102 ) logits would be empty so need to replace that with ones before sampling, also cannot reshape with -1 when there's 0 in other axes	2024-01-12 14:46:36 -05:00
chenyu	ca46d3541b	Revert "fix gpt2 with empty prompt" (#3101 )	2024-01-12 14:27:41 -05:00
chenyu	1d7f01bc6d	fix gpt2 with empty prompt (#3100 ) logits would be empty so need to replace that with ones before sampling, also cannot reshape with -1 when there's 0 in other axes	2024-01-12 14:18:17 -05:00
SnakeOnex	0c49d38ba7	replace with tensor op (#3099 )	2024-01-12 14:13:40 -05:00
chenyu	f3a50b4e40	fix broadcasted logic if there's 0 in shapes (#3097 ) * fix broadcasted logic if there's 0 in shapes should always expand into 0, not the other way around. fixed matmul with 0 in input shapes. for forwards for now though, backward is more involved and would need to change 0 size shortcuts * fix tests	2024-01-12 13:32:43 -05:00
SnakeOnex	025fbf4e80	One hot in tensor.py (#3093 ) * onehot in Tensor.py * one_hot tests * works for all shapes, not just 1 * pylint * not a static method * moved around, num_classes mandatory * pylint * pylint * space & moving * formatting * moved tests	2024-01-12 13:31:18 -05:00
chenyu	7086d77db1	bugfix do not reset shapetracker of 0 size lazybuffer (#3096 ) it might be coming from an expand, and resetting results incorrect stride. caught by interpreted backend	2024-01-11 23:22:52 -05:00
Yixiang Gao	13e872b53f	add multigpu support for llama attention (#3064 ) * add llama attention test for multigpu * test fails * kv cache trying to shrink on sharded axis * mask None works for scale dot product * kv cache seems to be working but scale dot product breaks * scaled dot product works, but the last linear layer failed * running into the reshape case where it could be wrong for multigpu * making sure it was the reshape * adding contiguous doesn't solve * need to shard more properly * remove reshape test * minor adjustment to scale dot product attention test * weights are sharded wrong * continue fix new weight sharding * clean up * fix attention when start_pos is 0 * remove print * add TODOs for the best multigpu interface	2024-01-11 16:31:02 -08:00
chenyu	dcf7ecaaff	update jit type annotation post lazy rewrite (#3091 )	2024-01-11 15:49:30 -05:00
chenyu	0fe6904351	use device from LinearizerOptions in kernel search (#3090 ) * use device from LinearizerOptions in kernel search removed all Device.DEFAULT in search.py * pass device string for parallel pickle * device for interpreted backends in LinearizerOptions	2024-01-11 14:46:03 -05:00
chenyu	93e3f952aa	use BEAM=2 instead of BEAM=4 in cuda ci gpt2 (#3089 ) BEAM=2 is faster and less search time. investigating why BEAM2+BEAM4 is slower than BEAM2 alone	2024-01-11 13:21:06 -05:00
chenyu	f502c9b08f	minor cleanup of View.reshape (#3088 ) * minor cleanup of View.reshape removed some redundant logic * new_strides * revert that	2024-01-11 13:05:54 -05:00
chenyu	f40299c3fe	remove the third merging state in view._merge_dims (#3085 ) no logic depends on state == 0 or state == 2	2024-01-11 12:07:43 -05:00
chenyu	7f9590d357	hotfix disable flaky mac runner wino cifar (#3087 )	2024-01-11 11:57:05 -05:00
Yixiang Gao	adcc844755	cat works (#3086 )	2024-01-11 08:25:20 -08:00
chenyu	cdeab9ad97	mem_estimate is always int, not symbolic (#3083 ) * mem_estimate is always int, not symbolic op_estimate can be symbolic, but mem_estimate is always int, thus we don't need to sym_infer it. fixed some long lines too. update_stats is a very big function * operator does not need underscores	2024-01-10 23:39:51 -05:00
Francis Lam	162fa61a32	wmma: clean up device specific tensor core code (#3081 )	2024-01-10 21:03:09 -05:00
chenyu	d218d13885	minor cleanups of lazy.py (#3080 )	2024-01-10 20:17:56 -05:00
chenyu	56dda33fc6	Tensor.expand resolves the new_shape before shortcut return (#3078 ) similar to how reshape is done. also updated shrink shortcut criteria to read similar to pad	2024-01-10 14:29:15 -05:00
Yixiang Gao	6842476ca6	better test demonstration (#3077 ) * a better test demonstration * fix white space	2024-01-10 10:50:52 -08:00
chenyu	507e0afba0	fix onehot and jit in examples/transformer (#3073 ) trained to 0.999 in < 6 seconds on M1 Max consistently	2024-01-10 02:22:41 -05:00
chenyu	4342fccc83	filter_strides -> canonicalize_strides (#3072 )	2024-01-10 01:06:48 -05:00
chenyu	023f5df0e9	simpler idxs_to_idx (#3071 )	2024-01-10 00:30:10 -05:00
George Hotz	2495ca95c7	early gate the graph (#3070 )	2024-01-09 20:17:13 -08:00

3392 Commits