Francis Lata
99efa2cfde
Merge branch 'master' into retinanet_mlperf
2024-11-18 04:42:57 -08:00
ignaciosica
597a239e28
Remove UnaryOps, BinaryOps, TernaryOps, MetaOps [pr] ( #7725 )
...
* remove unaryops
* remove ternaryops
* remove metaops
* hotfix
* remove binaryops
* hotfix: test_pattern_matcher
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-16 20:56:56 +08:00
Francis Lata
a0c0a77f54
Merge branch 'master' into retinanet_mlperf
2024-11-13 21:30:12 -08:00
qazal
e84d089ef1
delete ReduceOps, only use REDUCE_AXIS ( #7667 )
2024-11-13 19:04:27 +08:00
chenyu
1884f021e3
add conv3x3 to speed_v_theoretical ( #7658 )
...
* add conv3x3 to speed_v_theoretical
* show test duration
2024-11-12 16:41:56 -05:00
chenyu
962dafb467
use randn in speed_v_theoretical instead of rand ( #7656 )
...
* use randn in speed_v_theoretical instead of rand
this made green gemv 20% faster... but why?
* update threshold
2024-11-12 15:00:32 -05:00
chenyu
6159790ab8
add gemv to speed_v_theoretical ( #7654 )
...
* add gemv to speed_v_theoretical
getting ~300GB/s if we just count the memory of inputs and output
* better green numbers
* flip
2024-11-12 11:19:35 -05:00
Francis Lata
0aad640465
Merge branch 'master' into retinanet_mlperf
2024-11-12 02:45:23 -08:00
chenyu
99f29e50b2
update speed_v_theoretical numbers ( #7647 )
...
better AMD numbers after setting the compute profile
2024-11-11 20:05:13 -05:00
chenyu
773d5b60bf
beam benchmark tests ( #7638 )
...
* beam benchmark tests
* lower AMD number somehow
* less flaky
2024-11-11 18:11:18 -05:00
nimlgen
4d81b7952a
qcom match texture/sampler descriptors to OpenCL ( #7622 )
...
* qcom ioctl compare more regs
* bug fix
2024-11-11 21:56:51 +03:00
Francis Lata
bf2dc3ae33
Merge branch 'master' into retinanet_mlperf
2024-11-09 17:00:30 -08:00
chenyu
8ca422e21a
script to compare kernel opt with BEAM ( #7604 )
...
interesting that on M1 Max hcopt beats BEAM=2 about 20% of the time
2024-11-08 17:40:28 -05:00
Harald Schäfer
e7cbc29f48
openpilot benchmark: add cast from numpy to benchmark ( #7593 )
...
* openpilot benchmark: add cast from numpy to benchmark
* whitespace
* comment
2024-11-08 19:31:00 +08:00
George Hotz
205befa788
move is_dtype_supported to device [pr] ( #7575 )
2024-11-07 20:38:03 +08:00
Carl Basho
630a7f37cf
update tests ( #7554 )
...
Co-authored-by: John Doe <null@mail.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-05 11:35:15 -05:00
chenyu
207bca6cea
set PAGE_SIZE=1 and generate new dataset ( #7559 )
...
13080 rows in total. both generating and loading this are pretty broken now. filters are wrong, for example
2024-11-05 11:25:01 -05:00
Francis Lata
bb6f27d2f3
Merge branch 'master' into retinanet_mlperf
2024-11-04 19:19:22 -08:00
George Hotz
99bd4372a5
Ops.ALU is no more, the arg is just an op ( #7525 )
...
* op arg alu [pr]
* more
* more passing
* fix more tests
* more tests passing
* fix single failing test
* so much cleaner
* noop to not have process replay trigger
* fix ptx
2024-11-05 00:22:22 +08:00
George Hotz
0c19b6298b
rename ops to have unique names ( #7522 )
2024-11-04 17:09:45 +08:00
George Hotz
c8bf09b7d4
s/UOps/Ops ( #7500 )
...
* s/UOps/Ops [pr]
* fix
2024-11-03 11:26:10 +08:00
qazal
e955aa1bee
hotfix: process replay ( #7418 )
2024-10-30 22:45:40 +02:00
George Hotz
4e2895f8d2
safe changes from new dtype branch [pr] ( #7397 )
...
* safe changes from new dtype branch [pr]
* only image test on GPU
2024-10-30 17:18:48 +08:00
qazal
51c0c8d27e
cachable small graph rewrite ( #7371 )
2024-10-29 22:28:13 +08:00
qazal
e46edc22aa
use unittest helpers in TestTensorMetadata [pr] ( #7329 )
...
* use unittest helpers in TestTensorMetadata [pr]
* fix that
* 5 args
2024-10-28 18:38:30 +08:00
qazal
8d9459f281
always run process replay with contextvars ( #7323 )
...
* always run process replay with contextvars [pr]
* not the last two
* extra
* no pr
2024-10-27 20:44:42 +02:00
Francis Lata
e5d37f26f6
Merge branch 'master' into retinanet_mlperf
2024-10-26 15:36:23 -07:00
nimlgen
293714610a
capture beam log runtime errors ( #7311 )
2024-10-26 13:59:45 +03:00
Francis Lata
8a5cbb14e4
Merge branch 'master' into retinanet_mlperf
2024-10-25 22:56:30 -07:00
Francis Lata
6e3efd4ed6
add validation set test
2024-10-25 22:55:49 -07:00
Francis Lata
2586555bd3
clean up reference dataset implementation + ruff changes
2024-10-25 22:13:48 -07:00
Francis Lata
1344871a15
add back normalization and negate it in test
2024-10-25 21:50:42 -07:00
Francis Lata
4b21a8fb8d
got dataloader with normalize working
2024-10-25 20:25:07 -07:00
qazal
d482d927a8
hotfix: nobody uses [run_process_replay] [pr] ( #7264 )
2024-10-24 13:37:29 +03:00
chenyu
f890d1cbbd
remove PUSH_PERMUTES from external_test_opt ( #7232 )
...
remove old comments and update kernel count for test_convnext
2024-10-23 00:11:34 -04:00
qazal
dae908299e
full_ast_rewrite api with ScheduleItemContext ( #7223 )
2024-10-22 23:17:05 +03:00
Francis Lata
967438ca71
Merge branch 'master' into retinanet_mlperf
2024-10-22 02:48:51 -07:00
Francis Lata
ec146da5cf
trim dataloader related code needed from ref
2024-10-22 02:48:11 -07:00
Francis Lata
d9d65b9537
cleanup dataloader test and revert shm path
2024-10-19 17:32:58 -07:00
chenyu
ea016b55d1
don't throw in fuzz_linearizer ( #7148 )
...
already broken on master and needs a fix. don't throw so other PRs aren't blocked
2024-10-18 09:28:30 -04:00
nimlgen
45db7d9045
fuzz qcom vs opencl ( #7130 )
...
* fuzz qcom vs opencl
* fix nv
* better?
* typo
* open both devs
2024-10-17 18:49:08 +03:00
George Hotz
ded1b38b84
minor dtype cleanup [pr] ( #7124 )
...
* minor dtype cleanup [pr]
* use ptr() function
2024-10-17 17:41:23 +08:00
Francis Lata
4bebe61a9c
add dataloader + test
2024-10-16 15:38:47 -04:00
Francis Lata
3d857d758e
Merge branch 'master' into retinanet_mlperf
2024-10-16 15:36:37 -04:00
nimlgen
39ab67e9ef
beam capture and replay in fuzz ( #7099 )
...
* beam capture and replay in fuzz
* clean a bit
2024-10-16 20:26:58 +03:00
Francis Lata
498141c579
Merge branch 'master' into retinanet_mlperf
2024-10-16 10:14:39 -04:00
qazal
40f33c110b
big graph var_vals as rewrite context ( #7007 )
...
* var_vals as rewrite context
* no default arg
* add st var_vals
* delete some stuff
* add the rewrite rule again
* extra
* this whole part is preschedule
* test with a second context
* redo
* i always forget tensor variable
2024-10-16 07:31:44 +03:00
qazal
390171d686
delete SAVE_SCHEDULE=1 [pr] ( #7087 )
2024-10-16 07:13:20 +03:00
George Hotz
3169cb386d
remove graph [pr] ( #7085 )
2024-10-16 11:40:07 +08:00
nimlgen
b025495e5c
fuzz nv vs cuda ( #7066 )
...
* fuzz nv vs cuda
* fixes
* smth
* um
* cmp the same
* dnrt
* correct gpfifo scan
* fix
2024-10-15 22:22:40 +03:00