chenyu
f54508549f
don't search conv weight init in speed_v_theoretical ( #7943 )
2024-11-28 10:03:18 -05:00
geohotstan
cea5853cfa
add Tensor.scatter ( #7737 )
...
* working I think
* where are my onnx scatter tests??
* forward_only for now
* try if nan hack fix NV
* looks like issue is different... CUDA WHY
* oops that was wrong. Try if this fixes CUDA
* simpler multiply
* actually finish this up tmrw morning :x
* fix tests?
* improve tests
* improve test and implementation
* fix ruff
* complete but lots of expected failure...
* reviewed tests
* add onnx tests
* is this a processing op?
* add return type to indicate that it's not in-place
* final cleanups
* use or and improve tests a little
* add masked_index_select
* call it masked_setitem instead
* try
* FIXED
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-27 10:52:04 -05:00
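The scatter op added above follows the usual torch-style semantics, where along dim=0 the result satisfies out[index[i][j]][j] = src[i][j], and (as the commit notes) is not in-place. A minimal pure-Python sketch of those semantics for 2-D inputs; the helper name and shapes are illustrative, not tinygrad's implementation:

```python
# Hypothetical sketch of scatter along dim=0 for 2-D lists:
# out[index[i][j]][j] = src[i][j]. Returns a new result; the
# original target is left untouched, matching the non-in-place note above.
import copy

def scatter_dim0(target, index, src):
    out = copy.deepcopy(target)  # not in-place
    for i, row in enumerate(index):
        for j, dst_row in enumerate(row):
            out[dst_row][j] = src[i][j]
    return out

base = [[0, 0], [0, 0], [0, 0]]
idx = [[0, 2], [1, 0]]
src = [[1, 2], [3, 4]]
print(scatter_dim0(base, idx, src))  # [[1, 4], [3, 0], [0, 2]]
print(base)                          # unchanged: [[0, 0], [0, 0], [0, 0]]
```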
George Hotz
9d0038bccb
small changes from block linearizer [pr] ( #7888 )
...
* small changes from block linearizer [pr]
* fix test_gc
2024-11-25 15:27:04 +08:00
chenyu
5c5b1b994c
less flaky benchmarks ( #7855 )
...
JIT=2 for metal cifar with HALF, and lower tflops for nv test_gemm_4096. failures in https://github.com/tinygrad/tinygrad/actions/runs/11980239535/job/33404098428?pr=7830
2024-11-22 16:39:39 -05:00
qazal
9828277c03
view doesn't have buffer, fix the tests [pr] ( #7841 )
...
* view doesn't have buffer, fix the tests [pr]
* need assigns
2024-11-22 20:41:55 +08:00
George Hotz
e9ae2ccd09
_prg to match _buf [pr] ( #7816 )
2024-11-21 12:44:48 +08:00
George Hotz
c5d458ce02
BufferSpec and ProgramSpec [pr] ( #7814 )
...
* BufferSpec and ProgramSpec [pr]
* delete preallocate, it's unused
* Revert "delete preallocate, it's unused"
This reverts commit dcfcfaccde.
2024-11-21 12:18:05 +08:00
George Hotz
9df5a62c5e
unify to HWQueue [pr] ( #7812 )
...
* unify to HWCommandQueue [pr]
* all is HWQueue
2024-11-21 10:33:08 +08:00
chenyu
11cea00090
lower vs_theoretical conv tflops threshold for nv ( #7811 )
...
less flaky
2024-11-20 20:03:49 -05:00
George Hotz
eb0bb7dc0b
final dname to device [pr] ( #7806 )
...
* final dname to device [pr]
* oops, fix nv
2024-11-20 20:20:28 +08:00
George Hotz
bc977fec53
dname -> device [pr] ( #7804 )
...
* dname -> device [pr]
* a few more
* only one left
2024-11-20 17:57:14 +08:00
George Hotz
d71fe7faa5
rename allocator methods to not conflict [pr] ( #7788 )
...
* rename allocator methods to not conflict [pr]
* forgot those
* transfer + offset
2024-11-20 00:10:29 +08:00
qazal
1e31b5ba6b
hotfix: ctx doesn't impact process replay [pr] ( #7785 )
2024-11-19 20:17:01 +08:00
ignaciosica
597a239e28
Remove UnaryOps, BinaryOps, TernaryOps, MetaOps [pr] ( #7725 )
...
* remove unaryops
* remove ternaryops
* remove metaops
* hotfix
* remove binaryops
* hotfix: test_pattern_matcher
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-16 20:56:56 +08:00
qazal
e84d089ef1
delete ReduceOps, only use REDUCE_AXIS ( #7667 )
2024-11-13 19:04:27 +08:00
chenyu
1884f021e3
add conv3x3 to speed_v_theoretical ( #7658 )
...
* add conv3x3 to speed_v_theoretical
* show test duration
2024-11-12 16:41:56 -05:00
chenyu
962dafb467
use randn in speed_v_theoretical instead of rand ( #7656 )
...
* use randn in speed_v_theoretical instead of rand
this made green gemv 20% faster... but why?
* update threshold
2024-11-12 15:00:32 -05:00
chenyu
6159790ab8
add gemv to speed_v_theoretical ( #7654 )
...
* add gemv to speed_v_theoretical
getting ~300GB/s if we just count the memory of inputs and output
* better green numbers
* flip
2024-11-12 11:19:35 -05:00
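The ~300GB/s figure above comes from counting only the bytes of the matrix, the input vector, and the output vector. A sketch of that arithmetic; the 4096x4096 half-precision shape and the timing are illustrative assumptions, not the commit's actual benchmark parameters:

```python
# Hypothetical bandwidth estimate for y = A @ x, counting only the memory
# of inputs and output as the commit describes. Shapes/timing are assumed.
def gemv_gbps(n: int, m: int, bytes_per_elem: int, seconds: float) -> float:
    moved = (n * m + m + n) * bytes_per_elem  # A is n*m, x is m, y is n
    return moved / seconds / 1e9

# e.g. a 4096x4096 half-precision (2-byte) gemv finishing in 112 us:
print(round(gemv_gbps(4096, 4096, 2, 112e-6), 1))  # ~299.7 GB/s
```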
chenyu
99f29e50b2
update speed_v_theoretical numbers ( #7647 )
...
better amd after set compute profile
2024-11-11 20:05:13 -05:00
chenyu
773d5b60bf
beam benchmark tests ( #7638 )
...
* beam benchmark tests
* lower AMD number somehow
* less flaky
2024-11-11 18:11:18 -05:00
nimlgen
4d81b7952a
qcom match texture/sampler descriptors to OpenCL ( #7622 )
...
* qcom ioctl compare more regs
* bug fix
2024-11-11 21:56:51 +03:00
chenyu
8ca422e21a
script to compare kernel opt with BEAM ( #7604 )
...
interesting that on m1 max hcopt wins BEAM 2 about 20% of the time
2024-11-08 17:40:28 -05:00
Harald Schäfer
e7cbc29f48
openpilot benchmark: add cast from numpy to benchmark ( #7593 )
...
* openpilot benchmark: add cast from numpy to benchmark
* whitespace
* comment
2024-11-08 19:31:00 +08:00
George Hotz
205befa788
move is_dtype_supported to device [pr] ( #7575 )
2024-11-07 20:38:03 +08:00
Carl Basho
630a7f37cf
update tests ( #7554 )
...
Co-authored-by: John Doe <null@mail.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-05 11:35:15 -05:00
chenyu
207bca6cea
set PAGE_SIZE=1 and generate new dataset ( #7559 )
...
13080 rows in total. both generating and loading this are pretty broken now. filters are wrong for example
2024-11-05 11:25:01 -05:00
George Hotz
99bd4372a5
Ops.ALU is no more, the arg is just an op ( #7525 )
...
* op arg alu [pr]
* more
* more passing
* fix more tests
* more tests passing
* fix single failing test
* so much cleaner
* noop to not have process replay trigger
* fix ptx
2024-11-05 00:22:22 +08:00
George Hotz
0c19b6298b
rename ops to have unique names ( #7522 )
2024-11-04 17:09:45 +08:00
George Hotz
c8bf09b7d4
s/UOps/Ops ( #7500 )
...
* s/UOps/Ops [pr]
* fix
2024-11-03 11:26:10 +08:00
qazal
e955aa1bee
hotfix: process replay ( #7418 )
2024-10-30 22:45:40 +02:00
George Hotz
4e2895f8d2
safe changes from new dtype branch [pr] ( #7397 )
...
* safe changes from new dtype branch [pr]
* only image test on GPU
2024-10-30 17:18:48 +08:00
qazal
51c0c8d27e
cacheable small graph rewrite ( #7371 )
2024-10-29 22:28:13 +08:00
qazal
e46edc22aa
use unittest helpers in TestTensorMetadata [pr] ( #7329 )
...
* use unittest helpers in TestTensorMetadata [pr]
* fix that
* 5 args
2024-10-28 18:38:30 +08:00
qazal
8d9459f281
always run process replay with contextvars ( #7323 )
...
* always run process replay with contextvars [pr]
* not the last two
* extra
* no pr
2024-10-27 20:44:42 +02:00
nimlgen
293714610a
capture beam log runtime errors ( #7311 )
2024-10-26 13:59:45 +03:00
qazal
d482d927a8
hotfix: nobody uses [run_process_replay] [pr] ( #7264 )
2024-10-24 13:37:29 +03:00
chenyu
f890d1cbbd
remove PUSH_PERMUTES from external_test_opt ( #7232 )
...
remove old comments and update kernel count for test_convnext
2024-10-23 00:11:34 -04:00
qazal
dae908299e
full_ast_rewrite api with ScheduleItemContext ( #7223 )
2024-10-22 23:17:05 +03:00
chenyu
ea016b55d1
don't throw in fuzz_linearizer ( #7148 )
...
already broken on master and needs a fix. don't throw so it doesn't block other PRs
2024-10-18 09:28:30 -04:00
nimlgen
45db7d9045
fuzz qcom vs opencl ( #7130 )
...
* fuzz qcom vs opencl
* fix nv
* better?
* typo
* open both devs
2024-10-17 18:49:08 +03:00
George Hotz
ded1b38b84
minor dtype cleanup [pr] ( #7124 )
...
* minor dtype cleanup [pr]
* use ptr() function
2024-10-17 17:41:23 +08:00
nimlgen
39ab67e9ef
beam capture and replay in fuzz ( #7099 )
...
* beam capture and replay in fuzz
* clean a bit
2024-10-16 20:26:58 +03:00
qazal
40f33c110b
big graph var_vals as rewrite context ( #7007 )
...
* var_vals as rewrite context
* no default arg
* add st var_vals
* delete some stuff
* add the rewrite rule again
* extra
* this whole part is preschedule
* test with a second context
* redo
* i always forget tensor variable
2024-10-16 07:31:44 +03:00
qazal
390171d686
delete SAVE_SCHEDULE=1 [pr] ( #7087 )
2024-10-16 07:13:20 +03:00
George Hotz
3169cb386d
remove graph [pr] ( #7085 )
2024-10-16 11:40:07 +08:00
nimlgen
b025495e5c
fuzz nv vs cuda ( #7066 )
...
* fuzz nv vs cuda
* fixes
* smth
* um
* cmp the same
* dnrt
* correct gpfifo scan
* fix
2024-10-15 22:22:40 +03:00
qazal
09de958855
move print_diff to test/helpers ( #7071 )
2024-10-15 22:00:39 +03:00
chenyu
fbaab30fe3
add timing to fuzz_linearizer ( #7056 )
...
and applied smaller FUZZ_MAX_SIZE. this is getting quite slow in CI
2024-10-14 11:57:41 -04:00
qazal
0ef186d4be
scheduler internal api cleanups [pr] ( #7052 )
...
* delete external_benchmark_ast.py [pr]
* cleanup 2
* random
2024-10-14 15:56:10 +03:00
chenyu
bd8ecf7fd6
remove NumNode ( #7035 )
2024-10-13 16:42:19 -04:00