Commit Graph

1235 Commits

chenyu
0e266f376c ops_gpu -> ops_cl (#12103) 2025-09-10 15:15:48 -04:00
nimlgen
fb96394ff5 auto-select available compilers (#12094)
* device: auto select compilers

* fix

* metal+opencl

* nv/cuda

* test without ptx

* ptx

* fix tests

* fix

* fix test

* rename

* test + cleaner

* xx

* ops

* better test

* win?

* um?

* types

* debug

* win??

* sep rung

* wtf?

* debug

* skip win

* revert this

* types
2025-09-10 19:52:01 +03:00
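
As a rough sketch of the idea behind the auto-selection commit above (hypothetical names, not tinygrad's real device.py API): walk the candidate compilers for a device and take the first toolchain that is actually installed.

```python
def first_available_compiler(candidates):
  # try each candidate compiler constructor in preference order; a
  # constructor is expected to raise if its toolchain is missing
  for make_compiler in candidates:
    try:
      return make_compiler()
    except Exception:
      continue
  raise RuntimeError("no available compiler for this device")

def ptx(): raise FileNotFoundError("ptxas not found")  # simulate a missing toolchain
def clang(): return "clang-compiler"

assert first_available_compiler([ptx, clang]) == "clang-compiler"
```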
nimlgen
9182948951 remove llvm_bf16_cast (#12075) 2025-09-08 20:51:15 +03:00
nimlgen
10ac427aaa cpu threading (#11951)
* start cpu threading

* fix

* fix2

* fix

* hacks?

* threads

* minor

* no dsp

* dsp 2

* n

* more

* test

* xm

* cleaner

* readable

* f

* reorder

* when no threads

* rangeify

* typos

* not needed

* reapply

* remove this

* linter

* fixed cpu count in ci

* fix

* fixes

* rm

* typo

* sort based on speed

* test if test works in ci

* Revert "test if test works in ci"

This reverts commit 1f05edb531.

* do not pad thread
2025-09-06 16:13:43 +03:00
Sieds Lykles
c6c16b2946 var_vals uses str for var (#12011)
* var_vals is str,int

* remove imports

* remove print

* fix test

* change var_vals in hcq

* update test_hcq

* fix multitensor _device_num var

* fix syminfer test

* shorten line

* p.vars stays list[Variable]

* shorten line

* vars is back to tuple[Variable, ...]

* change var_vals in extra

* change var_vals from shapetracker

* var_vals is str:int

* fix signature
2025-09-06 04:16:12 +02:00
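
A schematic sketch of this change: var_vals goes from being keyed by Variable objects to being keyed by the variable's name. The Variable class below is a hypothetical stand-in (the real one is a UOp).

```python
class Variable:
  # hypothetical stand-in for tinygrad's Variable UOp
  def __init__(self, name: str, vmin: int, vmax: int):
    self.name, self.vmin, self.vmax = name, vmin, vmax

start_pos = Variable("start_pos", 0, 4096)

old_var_vals = {start_pos: 16}                   # before: keyed by the Variable object
var_vals: dict[str, int] = {start_pos.name: 16}  # after: keyed by its name (str)

# a lookup no longer depends on holding the exact same Variable instance
assert var_vals["start_pos"] == 16
```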
George Hotz
38dcadf07b delete kernel.py (#12040)
* delete kernel.py

* delete that file

* rip and tear

* don't test search

* imports

* fix torch frontend

* not a part of regen
2025-09-05 15:52:07 -07:00
George Hotz
870f63d9cc add WARP axistype, fix postopt bugs (#12033)
* postopt is 83% match

* warp is bright CYAN

* beautiful mnist beam works

* fix shutdown bug
2025-09-05 10:36:55 -07:00
George Hotz
f8e2dd4dd1 investigate opts mismatches (#12020) 2025-09-05 07:40:29 -07:00
George Hotz
431666da74 POSTOPT=2 work (#12012)
* POSTOPT=2 work

* bugfixes

* add chain in one place

* tensor cores match

* better hcopt check

* match from old

* Change POSTOPT ContextVar value to 0

* we didn't need to check that
2025-09-04 16:55:56 -07:00
George Hotz
70ce29b630 test pyrender (#12005)
* test pyrender

* make them print

* switch to pyrendered
2025-09-04 11:48:40 -07:00
Sieds Lykles
572a3c15c6 Move Ops.SPECIAL arg to src (#11918)
* initial moving bound to src

* arg to src

* remove import

* fixup linearizer

* arg to src

* fix test_uop_graph

* fix more tests

* fix python renderer

* get const value from const uop

* ssimplify uop estimates

* fix webgpu locals

* fix old test

* gate Ops.SPECIAL in linearizer

* use ssimplify() for local/global_size

* remove toposort gate_parents_instead_of_self

* fix rendering in comment

* cleanup

* rename and add comments

* add BottomUpGate with test
2025-09-04 09:31:44 +02:00
George Hotz
5cf42dc4db add Scheduler to replace Kernel with POSTOPT=2 (#11924)
* ** simple kernel to replace Kernel for postopt

* support old

* fix beam

* beaming

* beam on old

* bring tensor cores back

* raise

* postbeam

* test ops passes on mac

* skip that

* postopt default

* gate that

* fix tensor cores

* a few test fixes

* dsp fix

* tc fix

* loop

* support swap

* test_gemv

* fix beam for variable

* test opts from high level stuff

* range annoying

* compile slow

* metal slow

* better beam

* no POSTBEAM

* fix nolocals

* hc opt mostly works

* put that back

* lil

* some work

* fix that

* POSTOPT 2

* fix tests

* no postopt 2

* work

* back

* padded tensor cores

* shift_to

* postopt 0 passes?

* write PADTO

* fix padded tensor cores

* compare hcopt

* 18000 lines

* should pass tests

* fix rangeify

* put types back
2025-09-03 19:23:30 -07:00
qazal
c7bb561ef9 remu: add v_rsq_f32_e32 instruction (#11947)
https://github.com/tinygrad/tinygrad/pull/11936 introduces a change to
the AMD LLVM renderer that outputs this instruction. Adding both the 32-
and 64-bit variants.
2025-09-01 11:29:31 +03:00
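
remu itself is written in Rust, but as a rough Python sketch of what the new instruction computes (helper names are illustrative):

```python
import math, struct

def _f32(x: float) -> float:
  # round to IEEE-754 binary32, like a 32-bit VGPR
  return struct.unpack("f", struct.pack("f", x))[0]

def v_rsq_f32(x: float) -> float:
  # reciprocal square root, 1/sqrt(x); hardware returns a fast
  # approximation, this sketch computes it exactly and rounds to f32
  return _f32(1.0 / math.sqrt(_f32(x)))

def v_rsq_f64(x: float) -> float:
  return 1.0 / math.sqrt(x)

assert v_rsq_f32(4.0) == 0.5  # 1/sqrt(4)
```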
George Hotz
afad7d0cd1 remove dtype from range, it will be dtypes.index soon [pr] (#11914)
* remove dtype from range, it will be dtypes.index soon [pr]

* a few more
2025-08-29 09:52:07 -07:00
George Hotz
394c2d1db1 update Kernel API in tests + move optimize_local_size (#11907) 2025-08-28 15:12:47 -07:00
Ben Waldron
ea1be2e4cd [bounty] Remove using reshape to register symbolic shape (#11771)
* Modify tests and start work towards removing symbolic reshape

* Refactor symbolic reshape

* fix small error

* much cleaner + fix more tests

* Can remove this now

* Update test_symbolic_ops and test_tiny

* Couple more tests

* Unused import

* More tests and add EXPAND to Tensor.empty

* Fix test beam search

* all int

* Fix rangeify by adding shrink

* Remove OOB check and so fix test_symbolic_jit

* test_symbolic_jit doesn't need OOB Context anymore either

* Should remove that test now

* Cleanups part 1

* fix linters

* Final cleanups

* Don't reassign inside for loop

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-28 12:30:49 -04:00
nimlgen
874c1db4af am: init support for aql (#11888) 2025-08-28 18:41:46 +03:00
George Hotz
27701ef823 add locals support to rangeify (#11826) 2025-08-24 14:03:12 -07:00
chenyu
fb8ee02424 Tensor.logaddexp (#11793) 2025-08-23 09:15:00 -04:00
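
Assuming logaddexp follows its standard definition, log(exp(a) + exp(b)) evaluated stably, it corresponds to the identity max(a, b) + log(1 + exp(-|a - b|)); a small sketch with tinygrad ops:

```python
from tinygrad import Tensor

a, b = Tensor([1.0, 100.0]), Tensor([2.0, 3.0])
out = a.logaddexp(b)

# the stable identity: the naive form would overflow on exp(100.0)
m = a.maximum(b)
ref = m + (1 + (-(a - b).abs()).exp()).log()
print(out.numpy(), ref.numpy())
```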
chenyu
d0d39885c3 onnx in tinygrad (#11675) 2025-08-14 19:57:21 -04:00
chenyu
48c4033ae1 fix pylint for onnx (#11673)
* fix pylint for onnx

* too long
2025-08-14 18:48:02 -04:00
nimlgen
4176b24264 amd: support xcc in regs (#11670)
* amd: support xcc in regs

* mockamd

* typo
2025-08-14 21:20:11 +03:00
nimlgen
d747eeed32 amd logs parser based on device (#11669) 2025-08-14 19:49:33 +03:00
geohotstan
1e904155e3 Add Onnx Huggingface to test/models/test_onnx.py (#11468)
* BOOM

* cache extra/huggingface/models/

* why max buffer size is not 0

* override MAX_BUFFER_SIZE

* less models

* remove more models and change cache dir to already cached dir

* only metal

* less is more?

* remove check ops

* why is this not setting the ENVVAR

* ughhhhh just test in models

* only cpu and gpu

* only cpu actually

* just override it idk

* final

* move extra dependencies up top

* simplification

* fix print

* make README better

* revert ops_disk fix for now

* clean up test_onnx

* remove testing fashion clip model cuz sloooowwwwww

* actually let METAL run this

* fix comment mistake

* fix download path in run_models

* does this work?

* cleanup setup and teardown

* contextvar like this?

* prove model is cached

* do I need to increment DOWNLOAD_CACHE_VERSION?

* see if cached with incremented DOWNLOAD_CACHE_VERSION

* use warnings to see if the model exists

* revert DOWNLOAD_CACHE_VERSION stuff and clean up

* add retry to download

* nit
2025-08-14 11:16:41 -04:00
kevvz
e2873a3a41 [bounty] Muon optim (#11414)
* newton schulz

* add muon + move newton schulz to tensor

* compact newton schulz

* better tests

* cleanup

* add comments for muon

* cleanup

* add export with tests

* match muon optim with test optim

* cleanup

* unused import

* correct comment

* whitespace

* move export

* muon test fix

* match reference impl + tests

* remove export by moving muon device

* add credit

* cleanup

* remove print

* spacing

* spacing

* comma

* cleanup

* removal

* fix tests + optim momentum

* consistent is not/ not

* more consistency

* fix test

* cleanup

* fix the nones

* remove comment

* cast

* comment

* comment

* muon teeny test

* muon flag beautiful mnist

* set steps

* steps as hyperparam

* match default test steps

* name

* large cleanup

* dont care about steps

* nesterov false default

* match each other impl

* steps

* switch nest

* swap defaults

* update docstring

* add no nesterov test

* ban fuse_optim

* prints

* classical momentum

* alternative condition

* recon

* pre + post wd

* false default

* detach

* signature changes

* context

* swap order

* big cleanup

* 0 step instead

* parity

* remove fuse

* remove fused

* better paper

* assert message

* correct shape check + eps

* multidim

* add eps

* cleanup

* correct assert message

* lint

* better tests

* naming

* ns_steps,ns_params

* update docstring

* docstring

* match sgd and muon together

* sandwich

* add back fused

* parity

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-08-13 14:27:55 -04:00
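
Muon orthogonalizes the momentum update with a Newton-Schulz iteration. A minimal NumPy sketch, assuming the quintic coefficients from Keller Jordan's reference implementation (the Tensor version added here may differ in details):

```python
import numpy as np

def newton_schulz(G: np.ndarray, steps: int = 5, eps: float = 1e-7) -> np.ndarray:
  a, b, c = 3.4445, -4.7750, 2.0315   # reference quintic coefficients
  X = G / (np.linalg.norm(G) + eps)   # Frobenius scaling: singular values <= 1
  if transpose := G.shape[0] > G.shape[1]:
    X = X.T  # iterate on the wide orientation so X @ X.T stays small
  for _ in range(steps):
    A = X @ X.T
    X = a * X + (b * A + c * A @ A) @ X  # quintic polynomial in the singular values
  return X.T if transpose else X

O = newton_schulz(np.random.randn(64, 32))
print(np.linalg.svd(O, compute_uv=False))  # singular values driven toward 1
```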
geohotstan
cf7224ce3e fully lint onnx.py (#11647)
* mypy

* ruff ruff ruff
2025-08-13 08:22:06 -07:00
geohotstan
925555b62a Fix onnx Domain bug (#11650) 2025-08-13 08:20:50 -07:00
chenyu
3fb79bb43a minor onnx cleanups (#11642) 2025-08-13 01:05:19 -04:00
chenyu
e9e5a08a04 simplify onnx cubic (#11641)
we can drop the double where and abs since we know which ranges the inputs map into
2025-08-12 19:57:31 -04:00
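
For reference, the Keys cubic kernel is piecewise in |x| over [0,1] and [1,2]; since the resize taps for a fractional offset t in [0,1) land at known distances, each weight can be specialized. A NumPy check of the identity (the in-tree code may look different), assuming ONNX's default cubic_coeff_a = -0.75:

```python
import numpy as np

A = -0.75  # ONNX Resize default cubic_coeff_a

def W(x):
  # Keys cubic convolution kernel, generic piecewise form
  ax = abs(x)
  if ax <= 1: return (A + 2)*ax**3 - (A + 3)*ax**2 + 1
  if ax < 2: return A*ax**3 - 5*A*ax**2 + 8*A*ax - 4*A
  return 0.0

def weights(t):
  # taps sit at distances 1+t, t, 1-t, 2-t: the outer two are always in
  # [1,2] and the inner two in [0,1], so abs() and the range select go away
  return [A*(1+t)**3 - 5*A*(1+t)**2 + 8*A*(1+t) - 4*A,
          (A + 2)*t**3 - (A + 3)*t**2 + 1,
          (A + 2)*(1-t)**3 - (A + 3)*(1-t)**2 + 1,
          A*(2-t)**3 - 5*A*(2-t)**2 + 8*A*(2-t) - 4*A]

t = 0.3
assert np.allclose(weights(t), [W(1+t), W(t), W(1-t), W(2-t)])
```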
geohotstan
ad9dec25b3 combine onnx parser and onnx (#11485)
* start

* more

* fix onnx_runner test

* pass

* patch for disk and add domains from huggingface

* simpler docs

* revert domain changes

* rerun ci

* revert onnx ops test change

* add fix from strenum stuff

* correct way

* revert correct way to leave the fix for another PR

* test segfault

* Revert "test segfault"

This reverts commit 4e1aaf41e7.

* remove some unnecessary documentation

* test segfault again

* Revert "test segfault again"

This reverts commit 56fc5f03e7.

* try gemini suggested patch for sys._getframe

* keep trying with gemini

* revert not working gemini suggestions and try faulthandler

* remove pythonfaulthandler

* trigger CI a few times

* minimize diff

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-12 12:56:39 -04:00
Joshua Kissoon
c44760c89d torch backend: fix arange, add linalg.cross, add tests (#11628) 2025-08-11 23:34:41 -04:00
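
For the cross product, the standard 3-vector formula along the last axis is enough to compose from existing ops; an illustrative sketch (not necessarily how #11628 maps torch.linalg.cross):

```python
from tinygrad import Tensor

def cross(a: Tensor, b: Tensor) -> Tensor:
  a0, a1, a2 = a[..., 0], a[..., 1], a[..., 2]
  b0, b1, b2 = b[..., 0], b[..., 1], b[..., 2]
  return Tensor.stack(a1*b2 - a2*b1, a2*b0 - a0*b2, a0*b1 - a1*b0, dim=-1)

print(cross(Tensor([1.0, 0.0, 0.0]), Tensor([0.0, 1.0, 0.0])).numpy())  # [0. 0. 1.]
```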
geohotstan
27bcb9fd1c Support cubic mode for ONNX Resize OP (#11612)
* start

* add reference

* this is so much slower

* this makes sense but differs from the official impl, yet results are still correct..?

* add a comment

* Just keep it simple for now since I don't fully get it yet

* address comments

* correct

* teeny clean up

* another small comment improvement lol
2025-08-11 11:49:30 -04:00
geohotstan
b0dab6a4cd onnx Resize OP clean up (#11603)
* start

* slight clean up
2025-08-10 14:10:39 -04:00
chenyu
ef17af85c6 remove .float call in llama logit (#11598)
* remove .float call in llama logit

* bfloat item
2025-08-10 00:02:18 -04:00
chenyu
3e64467322 remove freqs_cis contiguous in llama (#11597) 2025-08-09 21:11:12 -04:00
qazal
793ace530e update amd_uop_matmul.py import (#11581)
Using this for testing SQTT
2025-08-08 17:07:35 +03:00
George Hotz
82be8abfd2 move opt under codegen (#11569) 2025-08-07 14:19:17 -07:00
chenyu
702e38dc19 remove FUSE_ARANGE_UINT (#11567)
Also add IGNORE_OOB=1 to bert runs. Lowered BS on tinybox to 90 since 96 OOMed during eval without reset.
2025-08-07 16:49:06 -04:00
geohotstan
1163292759 move onnx_parser into onnx (#11530) 2025-08-06 10:46:27 -04:00
nimlgen
eafc7fda12 upd perfetto (#11528) 2025-08-06 14:00:34 +03:00
nimlgen
4877aa965a ast seems to probe nv as well (#11494) 2025-08-04 11:47:07 +03:00
George Hotz
8ff03806e8 add llama layers (#11460)
* add llama layers

* add contig bw for speed
2025-07-31 16:28:04 -07:00
George Hotz
474ee9daa5 hotfix: add contiguous_backward to llama 2025-07-31 15:07:12 -07:00
kevvz
c3cfcb50cb Add linalg_det and test for torch backend (#11405)
* add linalg_det and test

* space

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-30 22:04:44 -04:00
wozeparrot
825b6a2505 feat: llama3 dataloader (#11340) 2025-07-30 13:27:55 -07:00
nimlgen
5fc5bb5237 ci: clear processes (#11434)
* unified hcq_smi for management

* fix

* fix

* no reset for amd
2025-07-30 22:15:18 +03:00
George Hotz
4f26a9ad32 check elements_per_thread in tensorcore [pr] (#11435) 2025-07-30 11:55:48 -07:00
George Hotz
1bef2d80c1 unrolls are all in the same scope (#11429)
* unrolls are all in the same scope

* fix that import
2025-07-29 16:55:37 -07:00
George Hotz
03909f2772 permute locals for HL uop matmul (#11412)
* permute locals for HL uop matmul

* parens fix that

* permutes

* 20 TFLOPS
2025-07-29 08:19:59 -07:00
George Hotz
735ad5f10d kernel4 and 5 in uops (#11411)
* move simplify views to merge views

* add amd kernel 4

* Revert "move simplify views to merge views"

This reverts commit 1e07dff384.

* k4 in python

* kernel4 written in uops

* k5 support

* cleanups
2025-07-28 19:35:48 -07:00