George Hotz
0d39bb5de1
rename to get_kernelize_map ( #10465 )
2025-05-22 11:44:44 -07:00
George Hotz
577a0b4cfa
openpilot compile4 (wip) ( #10407 )
...
* openpilot compile4
* add copies
* remove junk
2025-05-22 10:47:34 -07:00
chenyu
67d1364106
update LOGMLPERF in red resnet run_and_time ( #10416 )
2025-05-19 13:23:33 -04:00
qazal
90eb3c0e5d
add MobileNetV2 benchmark to comma CI ( #10250 )
...
* add MobileNetV2 to comma CI
* symlink imagenet
* also the signature
* comment that out
* need imagenetmock
* same train and test set
* quantize on CPU=1
* verbose
* need __hexagon_divsf3
* 0x858d6c15
* quant cpu + CC=clang-19
2025-05-19 18:22:50 +03:00
chenyu
485e80da69
run_and_time for resnet ci ( #10405 )
2025-05-18 23:39:57 -04:00
George Hotz
411392dfb7
move files into uop dir ( #10399 )
...
* move files into uop dir [pr]
* tinygrad.uop is a thing
* fix uop docs, no pr
* fix viz
2025-05-18 11:38:28 -07:00
George Hotz
0b733ba75e
multi device training with GPT2 [pr] ( #10375 )
...
* multi device training with GPT2 [pr]
* Update grouper.py
2025-05-17 15:33:56 -07:00
wozeparrot
12a1ccc680
clean: double import ( #10345 )
2025-05-15 20:15:09 -07:00
wozeparrot
1ed04f993b
move benchmark stat tracking to influxdb ( #10185 )
2025-05-15 16:14:56 -07:00
George Hotz
568d6d96e7
small changes from new multi [pr] ( #10318 )
2025-05-14 20:50:59 -07:00
George Hotz
bfc30fa6ea
hotfix: typo in shm_name
2025-05-14 19:34:52 -07:00
George Hotz
2bc54b3e22
manually handle OSX
2025-05-14 19:17:51 -07:00
George Hotz
ab460486d7
Revert "resnet dataloader osx ( #10316 )"
...
This reverts commit aef336930a.
2025-05-14 19:15:07 -07:00
George Hotz
aef336930a
resnet dataloader osx ( #10316 )
...
* mlperf dataloader on mac
* resnet dataloader [pr]
* simple should work
2025-05-14 18:31:26 -07:00
chenyu
fbaa26247a
randn_like in minrf ( #10298 )
...
tested that it trains to similar loss
2025-05-14 07:59:50 -04:00
George Hotz
98c84a711d
min rectified flow example [pr] ( #10252 )
...
* work on minrf example
* more
* jit sample
* t is tensor not const
* fixes
* more convs
* fix dropout
* don't print
* 504
* big patch
* onehot
* touch
* use embeddings
* dumb uses final layer
* act
* non fl
* match
* tp
* 3
* of
* ppsz
* normal
* add adln
* no t
* weird transformer
* weird transformer
* contig
* actual speed fix
* dumb
* cb
* 0
* t is 0
* mort-t
* args
* dumb days are over
* readable
* contig
* no more t mask
* mask_t
* init to zero
* clean
* steps
* work
* tt
* t
* solid
2025-05-11 18:36:44 -07:00
Adam Van Ymeren
a28ca0680f
update dead link ( #10242 )
2025-05-09 19:59:52 -04:00
Rory Clear
9f2931ae67
Fix yolo load failing silently ( #10046 )
...
* wait for js before loading model
* use f32
* revert html changes, try both cameras and remove f16 req
* clean
2025-05-07 11:46:09 -07:00
Kevin Buhler
363481e2fb
correct misspelled words ( #10165 )
2025-05-05 08:12:41 -07:00
chenyu
4a04098389
fix llama3 with nf4 quantize ( #10107 )
...
also int8 outputs are wrong
2025-04-29 15:14:36 -04:00
qazal
a59d18da21
hack for VIZ=1 with examples/llama ( #10103 )
...
* hack for VIZ=1 with examples/llama
* move it alongside BEAM=0
2025-04-29 23:42:17 +08:00
chenyu
3eba3d6ee9
don't pass model in convert_from_huggingface and convert_from_gguf ( #10094 )
...
it only needs n_layers
2025-04-28 20:11:19 -04:00
chenyu
610ee79b22
cherry pick mlperf5.0 branch to master ( #10089 )
2025-04-28 15:36:56 -04:00
George Hotz
b341296304
hotfix: save sdxl ram
2025-04-27 12:09:45 -04:00
George Hotz
68c5f7ba80
load fast in sdxl ( #10072 )
...
* load fast in sdxl
* back to that with the ret
* no context
2025-04-27 11:58:51 -04:00
George Hotz
4b8ef6ce78
hotfix: sdxl corealize
2025-04-27 10:41:46 -04:00
George Hotz
1253819151
make beautiful indexing use a Variable ( #10063 )
...
* make beautiful indexing use a Variable
* stunning test
* better color
* training is broken
* fix tests
* fix variable indexing
* fix test
* no contiguous
* revert that
* revert that too
* indexing two bind
* skip for webgpu
* make not slow
2025-04-27 08:22:38 -04:00
Rory Clear
a13a43c4fe
yolo 416 to 640 res ( #10047 )
2025-04-26 20:45:58 -04:00
George Hotz
ea5dddc537
reduce collapse generic ( #10045 )
...
* reduce collapse generic
* new arange folder
* new range folding
* correct with sym
* all tests pass
* indexing ops passes
* failing tests
* fix tests, remove unused
* revert that
* torch indexing is fast
* skip on webgpu
* touchups
* comments
2025-04-26 09:13:24 -04:00
Rory Clear
3a189fa561
More yolo processing in tinygrad ( #9928 )
...
* more tg less np
* update webgpu html for new compile
* resize boxes
* remove text
* add back note
* fix indentation
* fix indentation
* remove magic num
* remove now unused funcs
* back to numpy nms
* no loop
* fix iou suppression
* update test
* dont suppress other classes
* add working scale
* fix expected value, rounded up 0.24 was being counted
* add postprocess bool for onnx test
* fix indents
* clean
* clean
* fix indent
* remove print
* fix indent
* remove unused import
* remove hardcoded 0.25
* space
* spacing
* clean label_predictions func
* remove single item lists
* space
* use postprocess output in test
* space
* clean
* clean
* remove redundant threshold
* remove redundant threshold
* clean
* rename var
* move loop into func
* unhardcode iou_threshold
* remove unused values
* clean
* add note
* clean
* keep const
* move back funcs
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 16:21:46 -04:00
chenyu
74c6cf8be3
lint mlperf model_train ( #10038 )
2025-04-24 16:19:44 -04:00
chenyu
a25abf55e3
retinanet only call postprocess_detections with RUNMLPERF ( #10017 )
...
during setup only need to compile `_eval_step().numpy()`
2025-04-23 20:45:38 -04:00
chenyu
65faa1d94b
explicit device in mlperf scripts ( #10015 )
2025-04-23 17:11:52 -04:00
chenyu
a3f938dbee
remove retinanet INITMLPERF from beam script ( #10011 )
...
it only controls logging; whether real data is loaded is solely controlled by RUNMLPERF
2025-04-23 14:32:54 -04:00
Francis Lata
5542aeb0e4
RetinaNet MLPerf flag updates ( #10009 )
...
* add RUNMLPERF and update INITMLPERF usage
* update scripts to use RUNMLPERF
2025-04-23 13:00:34 -04:00
George Hotz
de0504276b
pop 0 is slow [pr] ( #10007 )
2025-04-23 17:00:59 +01:00
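The `pop 0 is slow` change above points at a general Python pitfall: `list.pop(0)` shifts every remaining element left on each call, so draining a list from the front is O(n²) overall, while `collections.deque.popleft()` pops in O(1). A minimal generic sketch (plain Python, not the tinygrad code from the PR):

```python
from collections import deque

def drain_front_list(items: list) -> list:
    # list.pop(0) shifts all remaining elements left each call: O(n) per pop
    out = []
    while items:
        out.append(items.pop(0))
    return out

def drain_front_deque(items: list) -> list:
    # deque.popleft() removes from the front in O(1) per pop
    dq = deque(items)
    out = []
    while dq:
        out.append(dq.popleft())
    return out

print(drain_front_list([3, 1, 2]))   # [3, 1, 2]
print(drain_front_deque([3, 1, 2]))  # [3, 1, 2]
```

Both drain in the same order; only the asymptotic cost differs, which is why replacing front-pops on a list is a pure win.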
chenyu
d3a8d5c128
print postprocess_detections time in retinanet eval ( #10005 )
...
`BS=96 BASEDIR="/raid/datasets/openimages" MODEL=retinanet python examples/mlperf/model_eval.py`
```
...
loaded dataset @ 8.64s
loaded initial data @ 12.57s
****** 619.97 ms to enqueue, 46042.13 ms to realize ( 116.22 ms fetching, 45399.58 ms postprocess_detections). 0.09 examples/sec. 0.83 TFLOPS @ 59.23s
****** 147.49 ms to enqueue, 37362.16 ms to realize ( 146.96 ms fetching, 36618.84 ms postprocess_detections). 0.11 examples/sec. 1.03 TFLOPS @ 96.74s
****** 152.85 ms to enqueue, 37244.08 ms to realize ( 120.67 ms fetching, 36235.19 ms postprocess_detections). 0.11 examples/sec. 1.04 TFLOPS @ 134.14s
****** 146.39 ms to enqueue, 37279.85 ms to realize ( 65.07 ms fetching, 36233.56 ms postprocess_detections). 0.11 examples/sec. 1.04 TFLOPS @ 171.56s
****** 152.41 ms to enqueue, 37264.04 ms to realize ( 127.08 ms fetching, 36196.10 ms postprocess_detections). 0.11 examples/sec. 1.04 TFLOPS @ 208.98s
****** 151.29 ms to enqueue, 36868.08 ms to realize ( 142.73 ms fetching, 36153.07 ms postprocess_detections). 0.11 examples/sec. 1.05 TFLOPS @ 246.00s
****** 136.41 ms to enqueue, 37325.04 ms to realize ( 90.29 ms fetching, 36573.38 ms postprocess_detections). 0.11 examples/sec. 1.04 TFLOPS @ 283.46s
```
2025-04-23 11:39:56 -04:00
chenyu
c39128133c
retinanet green scripts ( #9996 )
...
also removed realize in data_get and used empty for fake data. slightly bigger lr. https://wandb.ai/chenyuxyz/MLPerf-RetinaNet/runs/8skid0e8?nw=nwuserchenyuxyz
2025-04-23 08:28:03 -04:00
chenyu
fb89d9a584
retinanet eval combine output on GPUS[0] ( #9966 )
...
eval 35 sec -> 20 sec. it was spending 13 seconds assembling output tensor on CPU backend. GPUS[0] seems to have enough memory, otherwise we can lower EVAL_BS
2025-04-22 07:43:51 -04:00
chenyu
5294c32279
dev scripts for retinanet ( #9968 )
...
also BASE_DIR -> BASEDIR for consistency, and move wandb up a bit for more accurate timing
2025-04-21 17:54:56 -04:00
Francis Lata
defa1e77f6
get the proper dataset count ( #9962 )
2025-04-21 12:11:37 -04:00
Francis Lata
d7e247f329
RetinaNet INITMLPERF support ( #9950 )
...
* fixes to make fake data work
* fix eval beam
* fix merge issue
2025-04-21 10:32:05 -04:00
Francis Lata
ea4cb2c715
small cleanups ( #9947 )
2025-04-20 20:33:20 -04:00
chenyu
6c30948df6
hand_coded_optimizations returns list[Opt] [pr] ( #9938 )
...
new api looks like `k.apply_opts(hand_coded_optimizations(k))`
2025-04-19 20:26:59 -04:00
chenyu
3fdba48fc7
update bert green and README ( #9934 )
...
submission candidate
2025-04-18 21:21:28 -04:00
chenyu
617b45748f
fuse embedding for bert on red ( #9925 )
...
also updated BEAM param and use AMD driver for actual run. 535ms step
2025-04-18 07:20:25 -04:00
chenyu
e2ed673c94
FUSE_ARANGE_UINT to not fuse uint ( #9915 )
...
hack to bypass rand, can FUSE_ARANGE on green for 6ms per step
2025-04-16 18:49:38 -04:00
chenyu
e8024c8281
faster bert global_norm ( #9901 )
...
tinyamd 2% faster. also updated beam params, which is another 2-3% faster.
updated mlperf doc and steps too
2025-04-15 18:24:44 -04:00
Sieds Lykles
91ccf1c343
Off by one error in start_pos ( #9792 )
...
Variable upper bound is inclusive
2025-04-15 15:07:13 -04:00
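The start_pos fix above hinges on the convention named in the commit body: when a symbolic variable's upper bound is inclusive, the bound itself is a legal value, so there are `bound + 1` legal values, and treating the bound as exclusive drops one. A hypothetical sketch in plain Python (not tinygrad's `Variable` API):

```python
def legal_values(vmin: int, vmax_inclusive: int) -> list:
    # with an inclusive upper bound, vmax itself is legal, so Python's
    # exclusive range() must extend one past it
    return list(range(vmin, vmax_inclusive + 1))

# treating the inclusive bound as exclusive would yield 4 values, not 5:
# that is the off-by-one
print(len(legal_values(0, 4)))  # 5
```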
Francis Lata
31483050c0
add eval_freq flag ( #9894 )
2025-04-15 06:42:40 -04:00