Commit Graph

1225 Commits

Sieds Lykles
572a3c15c6 Move Ops.SPECIAL arg to src (#11918)
* initial moving bound to src

* arg to src

* remove import

* fixup linearizer

* arg to src

* fix test_uop_graph

* fix more tests

* fix python renderer

* get const value from const uop

* ssimplify uop estimates

* fix webgpu locals

* fix old test

* gate Ops.SPECIAL in linearizer

* use ssimplify() for local/global_size

* remove toposort gate_parents_instead_of_self

* fix rendering in comment

* cleanup

* rename and add comments

* add BottomUpGate with test
2025-09-04 09:31:44 +02:00
George Hotz
5cf42dc4db add Scheduler to replace Kernel with POSTOPT=2 (#11924)
* ** simple kernel to replace Kernel for postopt

* support old

* fix beam

* beaming

* beam on old

* bring tensor cores back

* raise

* postbeam

* test ops passes on mac

* skip that

* postopt default

* gate that

* fix tensor cores

* a few test fixes

* dsp fix

* tc fix

* loop

* support swap

* test_gemv

* fix beam for variable

* test opts from high level stuff

* range annoying

* compile slow

* metal slow

* better beam

* no POSTBEAM

* fix nolocals

* hc opt mostly works

* put that back

* lil

* some work

* fix that

* POSTOPT 2

* fix tests

* no postopt 2

* work

* back

* padded tensor cores

* shift_to

* postopt 0 passes?

* write PADTO

* fix padded tensor cores

* compare hcopt

* 18000 lines

* should pass tests

* fix rangeify

* put types back
2025-09-03 19:23:30 -07:00
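
For context, tinygrad toggles like POSTOPT are typically plain env/context variables; a minimal usage sketch, assuming POSTOPT is wired up as a standard tinygrad ContextVar (the Tensor and Context APIs are real; POSTOPT's behavior here is inferred from the commit title):

```python
from tinygrad import Tensor
from tinygrad.helpers import Context

# run a small matmul with the new post-opt scheduler path enabled;
# Context temporarily overrides tinygrad context variables, like POSTOPT=2 on the CLI
with Context(POSTOPT=2):
    a, b = Tensor.rand(256, 256), Tensor.rand(256, 256)
    (a @ b).realize()
```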
qazal
c7bb561ef9 remu: add v_rsq_f32_e32 instruction (#11947)
https://github.com/tinygrad/tinygrad/pull/11936 introduces a change to
the AMD LLVM renderer that outputs this instruction. Adding both the 32-
and 64-bit variants.
2025-09-01 11:29:31 +03:00
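
For reference, the instruction computes a reciprocal square root; a minimal sketch of its scalar semantics in Python (illustrative only, not remu's actual implementation, which also has to model the hardware's edge cases and precision):

```python
import math, struct

def v_rsq_f32(x: float) -> float:
    # 1/sqrt(x), with the result squeezed back through float32
    # the way a 32-bit vector register would hold it
    if x < 0: return math.nan     # rsq of a negative is NaN
    if x == 0: return math.inf    # rsq of +0 is +inf
    r = 1.0 / math.sqrt(x)
    return struct.unpack('<f', struct.pack('<f', r))[0]
```

The v_rsq_f64 variant is the same operation at 64-bit width.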
George Hotz
afad7d0cd1 remove dtype from range, it will be dtypes.index soon [pr] (#11914)
* remove dtype from range, it will be dtypes.index soon [pr]

* a few more
2025-08-29 09:52:07 -07:00
George Hotz
394c2d1db1 update Kernel API in tests + move optimize_local_size (#11907) 2025-08-28 15:12:47 -07:00
Ben Waldron
ea1be2e4cd [bounty] Remove using reshape to register symbolic shape (#11771)
* Modify tests and start work towards removing symbolic reshape

* Refactor symbolic reshape

* fix small error

* much cleaner + fix more tests

* Can remove this now

* Update test_symbolic_ops and test_tiny

* Couple more tests

* Unused import

* More tests and add EXPAND to Tensor.empty

* Fix test beam search

* all int

* Fix rangeify by adding shrink

* Remove OOB check and so fix test_symbolic_jit

* test_symbolic_jit doesn't need OOB Context anymore either

* Should remove that test now

* Cleanups part 1

* fix linters

* Final cleanups

* Don't reassign inside for loop

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-28 12:30:49 -04:00
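
For background, the idiom this bounty removes looked roughly like the sketch below: binding a Variable and pushing it through reshape was how a symbolic dim got registered (Variable and bind are existing tinygrad APIs; the exact pre-PR call sites are paraphrased):

```python
from tinygrad import Tensor, Variable

# pre-PR idiom: reshape with a bound Variable to register a symbolic shape
i = Variable("i", 1, 10).bind(3)        # symbolic dim with range [1, 10], bound to 3
t = Tensor.rand(3, 4).reshape(i, 4)     # shape becomes (i, 4) instead of (3, 4)
```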
nimlgen
874c1db4af am: init support for aql (#11888) 2025-08-28 18:41:46 +03:00
George Hotz
27701ef823 add locals support to rangeify (#11826) 2025-08-24 14:03:12 -07:00
chenyu
fb8ee02424 Tensor.logaddexp (#11793) 2025-08-23 09:15:00 -04:00
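
For reference, logaddexp is log(exp(x) + exp(y)) computed without overflow; a minimal scalar sketch of the standard stable formulation (illustrative, not tinygrad's actual implementation):

```python
import math

def logaddexp(x: float, y: float) -> float:
    # factor out the max so the remaining exponent is always <= 0
    m = max(x, y)
    return m + math.log1p(math.exp(-abs(x - y)))
```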
chenyu
d0d39885c3 onnx in tinygrad (#11675) 2025-08-14 19:57:21 -04:00
chenyu
48c4033ae1 fix pylint for onnx (#11673)
* fix pylint for onnx

* too long
2025-08-14 18:48:02 -04:00
nimlgen
4176b24264 amd: support xcc in regs (#11670)
* amd: support xcc in regs

* mockamd

* typo
2025-08-14 21:20:11 +03:00
nimlgen
d747eeed32 amd logs parser based on device (#11669) 2025-08-14 19:49:33 +03:00
geohotstan
1e904155e3 Add Onnx Huggingface to test/models/test_onnx.py (#11468)
* BOOM

* cache extra/huggingface/models/

* why max buffer size is not 0

* override MAX_BUFFER_SIZE

* less models

* remove more models and change cache dir to already cached dir

* only metal

* less is more?

* remove check ops

* why is this not setting the ENVVAR

* ughhhhh just test in models

* only cpu and gpu

* only cpu actually

* just override it idk

* final

* move extra dependencies up top

* simplification

* fix print

* make README better

* revert ops_disk fix for now

* clean up test_onnx

* remove testing fashion clip model cuz sloooowwwwww

* actually let METAL run this

* fix comment mistake

* fix download path in run_models

* does this work?

* cleanup setup and teardown

* contextvar like this?

* prove model is cached

* do I need to increment DOWNLOAD_CACHE_VERSION?

* see if cached with incremented DOWNLOAD_CACHE_VERSION

* use warnings to see if the model exists

* revert DOWNLOAD_CACHE_VERSION stuff and clean up

* add retry to download

* nit
2025-08-14 11:16:41 -04:00
kevvz
e2873a3a41 [bounty] Muon optim (#11414)
* newton schulz

* add muon + move newton schulz to tensor

* compact newton schulz

* better tests

* cleanup

* add comments for muon

* cleanup

* add export with tests

* match muon optim with test optim

* cleanup

* unused import

* correct comment

* whitespace

* move export

* muon test fix

* match reference impl + tests

* remove export by moving muon device

* add credit

* cleanup

* remove print

* spacing

* spacing

* comma

* cleanup

* removal

* fix tests + optim momentum

* consistent is not / not

* more consistency

* fix test

* cleanup

* fix the nones

* remove comment

* cast

* comment

* comment

* muon teeny test

* muon flag beautiful mnist

* set steps

* steps as hyperparam

* match default test steps

* name

* large cleanup

* dont care about steps

* nesterov false default

* match each other impl

* steps

* switch nest

* swap defaults

* update docstring

* add no nesterov test

* ban fuse_optim

* prints

* classical momentum

* alternative condition

* recon

* pre + post wd

* false default

* detach

* signature changes

* context

* swap order

* big cleanup

* 0 step instead

* parity

* remove fuse

* remove fused

* better paper

* assert message

* correct shape check + eps

* multidim

* add eps

* cleanup

* correct assert message

* lint

* better tests

* naming

* ns_steps,ns_params

* update docstring

* docstring

* match sgd and muon together

* sandwich

* add back fused

* parity

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-08-13 14:27:55 -04:00
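
For background, Muon orthogonalizes each 2D momentum update with a Newton-Schulz iteration before applying it; a minimal numpy sketch of that iteration, using the quintic coefficients from the public Muon reference implementation (a paraphrase, not tinygrad's actual optim code):

```python
import numpy as np

def newton_schulz(G: np.ndarray, steps: int = 5, eps: float = 1e-7) -> np.ndarray:
    # push the singular values of G toward 1, approximating the nearest
    # (semi-)orthogonal matrix U @ V.T from the SVD G = U @ S @ V.T
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + eps)       # normalize so the iteration converges
    if G.shape[0] > G.shape[1]: X = X.T     # iterate on the wide orientation
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if G.shape[0] > G.shape[1] else X
```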
geohotstan
cf7224ce3e fully lint onnx.py (#11647)
* mypy

* ruff ruff ruff
2025-08-13 08:22:06 -07:00
geohotstan
925555b62a Fix onnx Domain bug (#11650) 2025-08-13 08:20:50 -07:00
chenyu
3fb79bb43a minor onnx cleanups (#11642) 2025-08-13 01:05:19 -04:00
chenyu
e9e5a08a04 simplify onnx cubic (#11641)
we can drop the double where and abs since we know which ranges the inputs map into
2025-08-12 19:57:31 -04:00
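
For context, cubic resize weights come from the piecewise Keys kernel, defined on |x| with an inner branch for |x| <= 1 and an outer branch for 1 < |x| < 2. With fractional offset t in [0, 1), the four taps sit at known distances 1+t, t, 1-t, 2-t, so each weight is a single polynomial and the per-tap abs/where can go; a minimal sketch of that simplification (illustrative, not the onnx.py code):

```python
def cubic_weights(t: float, a: float = -0.75):
    # distances of the four taps from the sample point: 1+t, t, 1-t, 2-t
    w0 = a*(1+t)**3 - 5*a*(1+t)**2 + 8*a*(1+t) - 4*a   # outer branch, distance in (1, 2)
    w1 = (a+2)*t**3 - (a+3)*t**2 + 1                   # inner branch, distance in [0, 1)
    w2 = (a+2)*(1-t)**3 - (a+3)*(1-t)**2 + 1           # inner branch
    w3 = a*(2-t)**3 - 5*a*(2-t)**2 + 8*a*(2-t) - 4*a   # outer branch
    return w0, w1, w2, w3                              # weights sum to 1
```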
geohotstan
ad9dec25b3 combine onnx parser and onnx (#11485)
* start

* more

* fix onnx_runner test

* pass

* patch for disk and add domains from huggingface

* simpler docs

* revert domain changes

* rerun ci

* revert onnx ops test change

* add fix from strenum stuff

* correct way

* revert correct way to leave the fix for another PR

* test segfault

* Revert "test segfault"

This reverts commit 4e1aaf41e7.

* remove some unnecessary documentation

* test segfault again

* Revert "test segfault again"

This reverts commit 56fc5f03e7.

* try gemini suggested patch for sys._getframe

* keep trying with gemini

* revert not working gemini suggestions and try faulthandler

* remove pythonfaulthandler

* trigger CI a few times

* minimize diff

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-12 12:56:39 -04:00
Joshua Kissoon
c44760c89d torch backend: fix arange, add linalg.cross, add tests (#11628) 2025-08-11 23:34:41 -04:00
geohotstan
27bcb9fd1c Support cubic mode for ONNX Resize OP (#11612)
* start

* add reference

* this is so much slower

* this makes sense but differs from the official impl, yet the results are still correct...?

* add a comment

* Just keep it simple for now since I don't fully get it yet

* address comments

* correct

* teeny clean up

* another small comment improvement lol
2025-08-11 11:49:30 -04:00
geohotstan
b0dab6a4cd onnx Resize OP clean up (#11603)
* start

* slight clean up
2025-08-10 14:10:39 -04:00
chenyu
ef17af85c6 remove .float call in llama logit (#11598)
* remove .float call in llama logit

* bfloat item
2025-08-10 00:02:18 -04:00
chenyu
3e64467322 remove freqs_cis contiguous in llama (#11597) 2025-08-09 21:11:12 -04:00
qazal
793ace530e update amd_uop_matmul.py import (#11581)
Using this for testing SQTT
2025-08-08 17:07:35 +03:00
George Hotz
82be8abfd2 move opt under codegen (#11569) 2025-08-07 14:19:17 -07:00
chenyu
702e38dc19 remove FUSE_ARANGE_UINT (#11567)
also add IGNORE_OOB=1 to bert runs. Lowered BS on tinybox to 90 since 96 OOMs during eval without a reset
2025-08-07 16:49:06 -04:00
geohotstan
1163292759 move onnx_parser into onnx (#11530) 2025-08-06 10:46:27 -04:00
nimlgen
eafc7fda12 upd perfetto (#11528) 2025-08-06 14:00:34 +03:00
nimlgen
4877aa965a ast seems to probe nv as well (#11494) 2025-08-04 11:47:07 +03:00
George Hotz
8ff03806e8 add llama layers (#11460)
* add llama layers

* add contig bw for speed
2025-07-31 16:28:04 -07:00
George Hotz
474ee9daa5 hotfix: add contiguous_backward to llama 2025-07-31 15:07:12 -07:00
kevvz
c3cfcb50cb Add linalg_det and test for torch backend (#11405)
* add linalg_det and test

* space

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-30 22:04:44 -04:00
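
For reference, ops like this land in the tinygrad torch backend, which registers tinygrad as a torch device; a hedged usage sketch (the import path and "tiny" device string are assumptions about how the backend is loaded; torch.linalg.det itself is the real torch API):

```python
import torch
import tinygrad.frontend.torch  # assumed import that registers the "tiny" device

x = torch.randn(3, 3, device="tiny")
print(torch.linalg.det(x))  # det now dispatches to tinygrad instead of a native backend
```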
wozeparrot
825b6a2505 feat: llama3 dataloader (#11340) 2025-07-30 13:27:55 -07:00
nimlgen
5fc5bb5237 ci: clear processes (#11434)
* unified hcq_smi for management

* fix

* fix

* no reset for amd
2025-07-30 22:15:18 +03:00
George Hotz
4f26a9ad32 check elements_per_thread in tensorcore [pr] (#11435) 2025-07-30 11:55:48 -07:00
George Hotz
1bef2d80c1 unrolls are all in the same scope (#11429)
* unrolls are all in the same scope

* fix that import
2025-07-29 16:55:37 -07:00
George Hotz
03909f2772 permute locals for HL uop matmul (#11412)
* permute locals for HL uop matmul

* parens fix that

* permutes

* 20 TFLOPS
2025-07-29 08:19:59 -07:00
George Hotz
735ad5f10d kernel4 and 5 in uops (#11411)
* move simplify views to merge views

* add amd kernel 4

* Revert "move simplify views to merge views"

This reverts commit 1e07dff384.

* k4 in python

* kernel4 written in uops

* k5 support

* cleanups
2025-07-28 19:35:48 -07:00
George Hotz
fddc645668 HL=2 top matmul (#11406)
* HL=2 top matmul

* top colored
2025-07-28 12:32:38 -07:00
George Hotz
dfeee63d30 uop matmul work (#11388)
* uop matmul work

* works with locals
2025-07-26 21:23:55 -07:00
George Hotz
2c70eaf18c fix load / barrier (#11386)
* fix load / barrier

* cleanups

* fix CI
2025-07-26 10:27:37 -07:00
George Hotz
466ab5a3f2 store/load not pass through index (#11381)
* noop

* fix noop

* store cat is NOOP

* store dtype is void

* stores aren't passed through anymore

* meh, skip those for ptx

* correct ptx skip

* hl runs
2025-07-25 21:01:47 -07:00
chenyu
3d68feb67d minor onnx Gather cleanup (#11375)
removed a type ignore and one error code skip
2025-07-25 21:08:08 -04:00
George Hotz
490a93902c define reg doesn't have init anymore (#11365)
* define reg doesn't have init anymore

* remove that

* no special logic for dr

* fix amd uop matmul
2025-07-24 19:15:49 -07:00
George Hotz
0602b22086 kernel spec (#11359)
* kernel spec

* ops.VIEW

* work
2025-07-24 12:45:38 -07:00
George Hotz
b0dc97d1f7 write out kernel 3 in uops (#11352)
* write out kernel 3 in uops

* matmul is correct

* gemm passes spec

* bugfix to match speed

* cleanups
2025-07-23 17:32:38 -07:00
chenyu
86e7504111 mypy check extra/onnx.py (#11348)
instead of running tests with 3.10, add onnx to mypy, which would have caught the StrEnum regression. Several type annotations fail mypy but don't affect running the code; those were skipped for now
2025-07-23 12:42:59 -04:00
chenyu
960da9319d Remove StrEnum in onnx for python 3.10 (#11345)
some training tests failed, looks like a parsing error?
2025-07-23 11:52:25 -04:00
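
For context, enum.StrEnum only exists on Python 3.11+, which is why it breaks 3.10; the classic mix-in spelling below is the usual 3.10-compatible replacement (an illustrative example, not the actual onnx.py enum):

```python
from enum import Enum

class AttributeKind(str, Enum):  # hypothetical enum; the str mix-in works on 3.10
    FLOAT = "FLOAT"
    INT = "INT"

# members are str instances and compare equal to their values, like StrEnum members
assert AttributeKind.FLOAT == "FLOAT"
assert isinstance(AttributeKind.INT, str)
```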