tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-29 16:58:18 -05:00

Author	SHA1	Message	Date
George Hotz	272bea5100	GraphRunner (#4375 ) * GraphRunner * new metal graph * update hsa for graph runner * put var_vals back * move that clear after the capture	2024-05-01 10:27:13 -07:00
chenyu	077ea6926c	remove downcast_half in sum (#4376 ) breaks boolean mean and other stuff	2024-05-01 11:46:44 -04:00
George Hotz	bd49d2854a	hotfix: skip fetch tests always	2024-05-01 08:43:26 -07:00
George Hotz	b683d0f496	hotfix: 100% accuracy is wrong	2024-05-01 08:07:18 -07:00
George Hotz	8bcf533a84	gitignore open-images-v6TEST	2024-05-01 13:55:38 +00:00
qazal	ea06f657df	fusion tests from test_opt (#4357 ) * opt tests * more sgd * batchnorm * models stay in external	2024-05-01 16:44:12 +03:00
George Hotz	995d264666	hotfix: add CNAME to put docs at docs.tinygrad.org	2024-04-30 23:17:35 -07:00
chenyu	683b7c605a	pad first batch of imagenet dataloader and update eval (#4368 ) * pad first batch of imagenet dataloader and update eval * pad zero instead of empty for training	2024-05-01 00:21:52 -04:00
wozeparrot	4a26718ca9	feat: tinyboxgreen (#4365 )	2024-04-30 19:05:37 -04:00
Francis Lam	16838eae08	mlperf/resnet: update tinybox_red parameters to new best values (#4364 ) about 27 minutes to setup and 345ms/110TF steps	2024-04-30 18:08:12 -04:00
George Hotz	27ee49bf30	tensor variable (#4362 ) * tensor variable support * consttype without variable? * __setitem__ * symbolic mean works * arange test * more tests * a few more tests	2024-04-30 14:08:57 -07:00
nimlgen	d2f89615b2	remove aql remnants in amd (#4346 )	2024-04-30 23:36:02 +03:00
Francis Lam	0d33c54d99	kernel: change PADTO check to allow up to 4x padding (#4354 ) * kernel: change PADTO check to allow up to 4x padding also optionally remove PADTO from the search action space with BEAM_PADTO=0. * fix test_linearizer test_tensor_cores_padded tests * update resnet runs to use SPLIT_REDUCEOP=1 * fix up search TC axis and amt checking * fix up the dimensions of the TC tests	2024-04-30 15:29:34 -04:00
Elias Wahl	babe87a8ae	BERT: Checkpoint loading tests (#4359 ) * Move checkpoint init to helpers. Add test * linters * Move the steps outside of the main train loop * Move data_get * data_get belongs to helpers	2024-04-30 14:43:41 -04:00
Francis Lam	c12bcabb07	search: fix actions space checks to ignore TC axis and amt (#4360 ) * search: fix actions space checks to ignore TC axis and amt * add test for number of actions in get_linearizer_actions	2024-04-30 14:02:22 -04:00
chenyu	fdc8fabae5	disable flaky mac gpt2 beam benchmark and add back cifar mac with JIT=2 (#4358 ) * debug flaky mac gpt2 beam run * disable for now	2024-04-30 10:41:37 -04:00
George Hotz	d325be2540	update docs (#4356 ) * update docs * nn.md * mnist cleanups * rhip test is very slow	2024-04-30 16:51:42 +09:00
Sohaib	a2d81514fd	just get dtype from kwargs (#4355 )	2024-04-30 16:26:14 +09:00
Francis Lam	a9a1fa6bbf	wmma: add reduce axis choice to TC action space (#4328 ) * wmma: add reduce axis choice to TC action space * add test for TC multi-reduce axis choice	2024-04-29 19:15:39 -04:00
chenyu	93abcd3113	fix function.py sum backward without downcast_half (#4353 ) without downcast_half, sum output dtype can be different from input dtype. cast back to input dtype in function.py	2024-04-29 17:53:02 -04:00
Francis Lam	18c61ce077	test/fuzz_linearizer: add --atol/rtol and change half distribution (#4352 )	2024-04-29 15:53:59 -04:00
Elias Wahl	71ff68b445	dropout after eval step (#4351 )	2024-04-29 15:47:21 -04:00
Elias Wahl	27613dd881	MLPerf BERT: Main training loop (#4288 ) * BERT language modeling head + trunc normal initializers * add train loop + helpers * shuffle in dataloaders + slight changes in main loop * beam change * Minor changes * random.shuffle * HParam update * Use deque for dataloader * wandb bert project name * half fixes * BENCHMARK + remove epoch * cast + print() --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-04-29 14:35:27 -04:00
Sohaib	61c97d5305	refactor ops_gpu ctypes (#4331 ) * refactor ops_gpu ctypes - remove redundant byref as ctypes automatically handles passing `type` as `POINTER(type)` - use walrus operator instead of init_c_var when possible * clSetKernelArg argtype is POINTER(None)	2024-04-30 01:33:34 +08:00
qazal	cc1797673e	all fusion opportunities (#4348 )	2024-04-29 19:32:23 +03:00
chenyu	f363f39e83	fix dtype of const folded sum (#4349 ) const folding sum should return in the same dtype the same as regular sum, which can be different from input dtype	2024-04-29 11:40:45 -04:00
geohotstan	bf412aeb80	use tolist instead of numpy for extracting parameters in onnx (#4333 ) * still some numpy left * all pass * oops indent * fix up safe_python * to_python_const	2024-04-29 10:48:20 -04:00
qazal	774a9b0bca	override assign_target in fuzz_schedule (#4342 ) * store assign_targets * cleanup * override target	2024-04-29 11:04:04 +03:00
Francis Lata	bb849a57d1	[MLPerf] UNet3D dataloader (#4343 ) * add support for train/val datasets for kits19 * split dataset into train and val sets * add tests for kits19 dataloader * add MLPerf dataset tests to CI * update unet3d model_eval script * fix linting * add nibabel * fix how mock dataset gets created * update ref implementation with permalink and no edits * clean up test and update rand_flip implementation * cleanups	2024-04-28 22:34:18 -04:00
chenyu	82d0ed3cf3	cap default dataset wikipedia max_workers to 32 (#4345 ) 64 on tinybox OOM	2024-04-28 21:55:21 -04:00
chenyu	c1d8d425eb	fix mean of half tensor if sum is greater than half.max (#4327 ) sum of half does acc in float32 already, add an arg to not downcast to half and use that in mean	2024-04-28 18:04:54 -04:00
qazal	e027879475	hotfix: remove double assignment (#4340 )	2024-04-28 13:41:31 -04:00
qazal	23445db2b9	no skipped tests in RHIP (#4337 ) * delete skip * delete split skip * remu dev * compiler fails here * Revert "remu dev" This reverts commit `28b933d4eb`.	2024-04-28 12:23:05 -04:00
Obada Khalili	e4befa41d7	Fix in `_reshape_mask` (#4332 ) * handle reshape with remainder in _reshape_mask * remove trailing whitespace * use helper_test_op to generate tensors from shapes * test in shapetracker too * remove whitespace * revert property name in other class tests	2024-04-28 11:57:39 -04:00
Timmy	664b563c91	Add `insert_before` to Linearizer Functions (#4320 ) * adding insert_before to linearizer functions * uop insert_before test case * formatting * more formatting * more formatting * syntax * removing self.cast * addressing err * removing noqa s	2024-04-28 11:38:36 -04:00
qazal	3372bea322	reduce children fusion tests (#4321 ) * base tests * real-world tests	2024-04-28 11:14:02 -04:00
Arnav Mehta	f3de17912f	added the download if not present missing function (#4318 )	2024-04-28 16:31:08 +08:00
geohotstan	bc36940c28	fix (#4319 )	2024-04-28 16:29:04 +08:00
nimlgen	8d1649d8c2	raise error when too many resources requested in nv (#4324 )	2024-04-27 23:48:51 +03:00
qazal	c6c12ba94a	save schedule graph pre validation (#4317 )	2024-04-27 12:06:15 +03:00
Victor Ziliang Peng	40264c7d1e	Update index.md (#4315 )	2024-04-27 15:12:44 +08:00
chenyu	24a6342950	add mem/s to external_benchmark_resnet (#4309 )	2024-04-26 20:07:17 -04:00
Francis Lam	1f2642c73b	kernel: fix calculation of smem size to ignore UNROLL (#4308 ) * kernel: fix calculation of smem size to ignore UNROLL * simplify prod array	2024-04-26 14:34:56 -04:00
Szymon Ożóg	de832d26c6	disable bfloat16 from ptx tests (#4305 )	2024-04-26 01:20:10 -04:00
chenyu	ec65aea32f	resnet stop the script once hit target (#4303 ) * resnet stop the script once hit target * comment	2024-04-25 23:54:56 -04:00
chenyu	1891ebb655	make ring allreduce chunks a multiple of 2^n if possible (#4302 ) in resnet, instead of chunking as [43691, 43691, 43691, 43691, 43690, 43690], chunk as [43712, 43712, 43680, 43680, 43680, 43680] and those can have 32 local. more than 2X faster for the applicable kernels and overall 1% for resnet	2024-04-25 23:45:28 -04:00
George Hotz	1e37c4a7a1	minor llm.c improvements	2024-04-26 11:15:31 +08:00
chenyu	3ec4b745d6	JIT=2 for mac cifar benchmark (#4300 ) also double BS for resnet training benchmark to match submission target	2024-04-25 18:33:40 -04:00
David Hou	c2dbe2a78b	new split reduce heuristic try 2 (#4294 ) * new split reduce heuristic * update comment * rename --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-04-25 18:14:15 -04:00
Szymon Ożóg	f1ebcffb87	Ptx beam fix (#4296 ) * Fix beam search for PTX * fix ptr arm test	2024-04-25 15:39:39 -04:00

10417 Commits