tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-30 17:28:24 -05:00

Author	SHA1	Message	Date
geohotstan	423d823c50	add GatherND and ScatterND to onnx ops (#8241) * implemented * this implementation is now correct * this is fine I guess * better variable names * finally correct gathernd * add a note * eh just leave it at this for now * teeny adjustment	2024-12-19 00:35:04 -05:00
chenyu	9789a83064	hotfix DEBUG in speed_v_theoretical.py conv (#8266) infinite loop with manual DEBUG set `DEBUG=2 python test/external/speed_v_theoretical.py -k conv` ``` File "/Users/chenyu/code/tinygrad/tinygrad/helpers.py", line 95, in __ge__ def __ge__(self, x): return self.value >= x ^^^^^^^^^^^^^^^ [Previous line repeated 4984 more times] RecursionError: maximum recursion depth exceeded in comparison ```	2024-12-15 19:44:45 -05:00
qazal	67e66ac1ab	hotfix: schedule_uop in process replay (#8260) * hotfix: schedule_uop in process replay * notes	2024-12-15 21:24:54 +08:00
chenyu	62e19649c0	lower test_conv_3x3_256_32_32_256_256 (#8226) tiny7 is slow	2024-12-13 17:15:53 -05:00
qazal	5864627abe	process replay filter warnings [pr] (#8199)	2024-12-13 17:43:43 +08:00
George Hotz	8a04a3a77a	rename LazyBuffer -> UOp [pr] (#8169) * rename LazyBuffer -> UOp [pr] * fix docs	2024-12-11 16:15:52 -08:00
chenyu	155f7df599	lower test_gemm_4096 expectation on green (#8152) getting 119 sometimes, so lowered to 115	2024-12-10 18:05:12 -05:00
qazal	07b6d5cf63	assign early folding (#8093) * assign early folding [pr] * move to to_si * - * fix generate_dataset * diff too big * no recreation, no diff * gzip * new sops from tiny10 * final try	2024-12-07 17:02:55 +08:00
chenyu	564b3a3e1b	onnx Bitwise ops (#8095) free stuff!	2024-12-06 16:58:09 -05:00
chenyu	d000c08f04	fix return type of Tensor.pow (#8091) int to power of int should return int etc, it hints that we would like to have Ops.POW	2024-12-06 13:38:29 -05:00
geohotstan	5184410fc3	combine get inputs and type_parse function in onnx [fixed] (#8081) * 1 is simpler than 2 * variable name * change error wording * shapes for sequence type must be homogeneous * bug fix for model benchmark * fix comments too --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-12-06 12:34:47 -05:00
chenyu	b73d9a7d24	Revert "combine get inputs and type_parse function in onnx (#8069)" (#8079) This reverts commit `074a67a6eb`.	2024-12-06 08:04:21 -05:00
geohotstan	074a67a6eb	combine get inputs and type_parse function in onnx (#8069) * 1 is simpler than 2 * variable name * change error wording * shapes for sequence type must be homogeneous	2024-12-06 07:42:35 -05:00
chenyu	5c6ed5dba6	lower test_conv_3x3_256_32_32_256_256 expectation (#8060) failed https://github.com/tinygrad/tinygrad/actions/runs/12182799887/job/33982676812#step:9:210	2024-12-05 10:30:56 -05:00
George Hotz	20878be2af	lower test_gemv_4096_16384 expectations	2024-12-05 12:08:26 +08:00
geohotstan	5ce8090d42	simple onnx_ops cleanups (#8003) * simple clean ups first * more work * kinda have adam * ooo momentum worked nicely * almost there * wow.. is the onnx test wrong * nicer optim stuff * just skip that test * small comment changes * use naming convention from other parts of codebase --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-12-04 15:33:03 -05:00
chenyu	0693158d28	lower v_theoretical gemv on red (#8042) tiny7 is still slower https://github.com/tinygrad/tinygrad/actions/runs/12166149038/job/33931736130#step:8:209	2024-12-04 13:59:40 -05:00
George Hotz	08657cb7b0	hotfix: bump expectations in speed_v_theoretical	2024-12-04 19:00:33 +08:00
George Hotz	ea65c79ba2	hotfix: don't spam BEAM debug in speed_v_theoretical	2024-12-04 18:47:16 +08:00
George Hotz	09b00b1b04	hotfix: use kernel timings instead of python timings in speed_v_theoretical	2024-12-04 18:36:17 +08:00
uuuvn	e9c5b23ba1	Use MTLCompiler directly (v2) (#7920) * Use MTLCompiler directly (v2) * to_block_literal and REQUEST_TYPE_COMPILE * Rewrite command encoding * Revert to_block_literal * Maybe that's more readable to some people? * Typo and comment about stdlib caching * Update ops_metal.py * Update ops_metal.py * Update ops_metal.py --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-12-04 16:36:48 +08:00
George Hotz	09eac42fd6	cache indexed uops in st [pr] (#8008) * cache indexed uops in st [pr] * remove arg from range	2024-12-03 21:27:07 +08:00
George Hotz	b8bf5b2787	minor uop speedups [pr] (#8002) * minor uop cleaner [pr] * free uop creation speed by removing WeakValueDictionary * a lil faster * disable that test * lines * and it doesn't print non hit patterns	2024-12-03 17:04:48 +08:00
George Hotz	0905f87b68	hotfix: print only kernel time	2024-12-03 14:25:08 +08:00
chenyu	b91fa24387	script to run regressed sd conv on metal (#7995) * script to run regressed sd conv on metal this and other similar `conv2d + add` kernels contributed to most of the speed regression * # ruff: noqa: E501	2024-12-02 15:34:27 -05:00
qazal	b797aee720	uop global buf number tracking try 2 [pr] (#7912) * uop buffer init small refactor [pr] * add early * this way it doesn't need late * buffer_num * itertools.count * count from 0 * down to 380	2024-12-02 14:45:17 +08:00
George Hotz	cbcc1c20eb	second try at block linearize (#7892) * second try at block linearize * weeee, works for lil matmul * it's so beautiful * test tiny passes * fix bugs * combine matching BLOCKENDS * wrapping * test lin failures passes * those failures were fake * flip sort order * fix ptx tests * deal with store better * dumb ptx fix * expect less * reduce lines * reduce lines * less lines and cleaner * no defaultdict * tighter * simpler block_parent_count	2024-12-02 13:43:09 +08:00
George Hotz	6c1efb9a72	hotfix: amd gemv was flaky	2024-12-02 11:08:24 +08:00
chenyu	bb23469f93	lower conv threshold on red (#7948)	2024-11-28 13:31:06 -05:00
chenyu	f54508549f	don't search conv weight init in speed_v_theoretical (#7943)	2024-11-28 10:03:18 -05:00
geohotstan	cea5853cfa	add Tensor.scatter (#7737) * working I think * where are my onnx scatter tests?? * forward_only for now * try if nan hack fix NV * looks like issue is different... CUDA WHY * oops that was wrong. Try if this fixes CUDA * simpler multiply * actually finish this up tmrw morning :x * fix tests? * improve tests * improve test and implementation * fix ruff * complete but lots of expected failure... * reviewed tests * add onnx tests * is this a processing op? * add return type to indicate that it's not in-place * final cleanups * use or and improve tests a little * add masked_index_select * call it masked_setitem instead * try * FIXED --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-11-27 10:52:04 -05:00
George Hotz	9d0038bccb	small changes from block linearizer [pr] (#7888) * small changes from block linearizer [pr] * fix test_gc	2024-11-25 15:27:04 +08:00
chenyu	5c5b1b994c	less flaky benchmarks (#7855) JIT=2 for metal cifar with HALF, and lower tflops for nv test_gemm_4096. failures in https://github.com/tinygrad/tinygrad/actions/runs/11980239535/job/33404098428?pr=7830	2024-11-22 16:39:39 -05:00
qazal	9828277c03	view doesn't have buffer, fix the tests [pr] (#7841) * view doesn't have buffer, fix the tests [pr] * need assigns	2024-11-22 20:41:55 +08:00
George Hotz	e9ae2ccd09	_prg to match _buf [pr] (#7816)	2024-11-21 12:44:48 +08:00
George Hotz	c5d458ce02	BufferSpec and ProgramSpec [pr] (#7814) * BufferSpec and ProgramSpec [pr] * delete preallocate, it's unused * Revert "delete preallocate, it's unused" This reverts commit `dcfcfaccde`.	2024-11-21 12:18:05 +08:00
George Hotz	9df5a62c5e	unify to HWQueue [pr] (#7812) * unify to HWCommandQueue [pr] * all is HWQueue	2024-11-21 10:33:08 +08:00
chenyu	11cea00090	lower vs_theoretical conv tflops threshold for nv (#7811) less flaky	2024-11-20 20:03:49 -05:00
George Hotz	eb0bb7dc0b	final dname to device [pr] (#7806) * final dname to device [pr] * oops, fix nv	2024-11-20 20:20:28 +08:00
George Hotz	bc977fec53	dname -> device [pr] (#7804) * dname -> device [pr] * a few more * only one left	2024-11-20 17:57:14 +08:00
George Hotz	d71fe7faa5	rename allocator methods to not conflict [pr] (#7788) * rename allocator methods to not conflict [pr] * forgot those * transfer + offset	2024-11-20 00:10:29 +08:00
qazal	1e31b5ba6b	hotfix: ctx doesn't impact process replay [pr] (#7785)	2024-11-19 20:17:01 +08:00
ignaciosica	597a239e28	Remove UnaryOps, BinaryOps, TernaryOps, MetaOps [pr] (#7725) * remove unaryops * remove ternaryops * remove metaops * hotfix * remove binaryops * hotfix: test_pattern_matcher --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2024-11-16 20:56:56 +08:00
qazal	e84d089ef1	delete ReduceOps, only use REDUCE_AXIS (#7667)	2024-11-13 19:04:27 +08:00
chenyu	1884f021e3	add conv3x3 to speed_v_theoretical (#7658) * add conv3x3 to speed_v_theoretical * show test duration	2024-11-12 16:41:56 -05:00
chenyu	962dafb467	use randn in speed_v_theoretical instead of rand (#7656) * use randn in speed_v_theoretical instead of rand this made green gemv 20% faster... but why? * update threshold	2024-11-12 15:00:32 -05:00
chenyu	6159790ab8	add gemv to speed_v_theoretical (#7654) * add gemv to speed_v_theoretical getting ~300GB/s if we just count the memory of inputs and output * better green numbers * flip	2024-11-12 11:19:35 -05:00
chenyu	99f29e50b2	update speed_v_theoretical numbers (#7647) better amd after set compute profile	2024-11-11 20:05:13 -05:00
chenyu	773d5b60bf	beam benchmark tests (#7638) * beam benchmark tests * lower AMD number somehow * less flaky	2024-11-11 18:11:18 -05:00
nimlgen	4d81b7952a	qcom match texture/sampler descriptors to OpenCL (#7622) * qcom ioctl compare more regs * bug fix	2024-11-11 21:56:51 +03:00

599 Commits