* feat: initial tinyfs device
* feat: don't allow compute on tinyfs device
* feat: tensor helpers to load and store
* feat: bufferview for tinyfs
* fix: keep copy sizes correct
* fix: recv large
* clean: unneeded
* feat: comment
* clean: unneeded
* clean: remove
* clean: remove
* feat: get request tag
* feat: rename to cloud
* feat: send request_id
* feat: start computing tree
* feat: compute store tree on this side
* feat: jank chunked load
* feat: more debugging
* feat: rename to just load and store
* feat: correct chunk count
* fix: fix load for < 1mb
* feat: comments
* feat: don't truncate on block devices
* feat: better way of testing block device
* feat: don't need to pad that much
* feat: connect to nodes directly on load
* feat: cache connections
* feat: don't hard code chunk size
* feat: close mmap when closing file handle
* feat: don't overwrite stuff on disk if storing from disk
* clean: debug print
* fix: close mmap
* feat: await workers
* feat: fast copy from tinyfs to disk
* feat: don't copy to device on last
* feat: use single socket per device
* feat: raid in tinyfs
* clean: remove import
* clean: type
* feat: maintain single event loop
* feat: lower worker count
* feat: use connection pool
* feat: fetch mapping in its own process
* fix: release lock
* feat: don't fetch if exists
* feat: req id only on stores
* feat: always fetch
* fix: rangeify
* feat: allow specifying raid root
* fix: dealloc buffer
* feat: start supporting non-zero offset
* clean: use cleaner
* feat: don't pass to threadpool
* clean: typing
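
The block above sketches a remote tinyfs/cloud device: tensors are loaded and stored in chunks over sockets, with one cached connection per node and a configurable chunk size. Purely as an illustrative sketch (not tinygrad's actual implementation; `CHUNK_SIZE`, `_CONNECTIONS`, `store_file`, and `load_file` are made-up names), the pattern looks roughly like this:

```python
import socket, os, struct

CHUNK_SIZE = 1 << 20                           # hypothetical default; the real chunk size is configurable
_CONNECTIONS: dict[str, socket.socket] = {}    # cache one socket per node/device

def _get_conn(addr: str, port: int = 8080) -> socket.socket:
  """Return a cached connection to a node, creating it on first use."""
  key = f"{addr}:{port}"
  if key not in _CONNECTIONS:
    _CONNECTIONS[key] = socket.create_connection((addr, port))
  return _CONNECTIONS[key]

def store_file(addr: str, path: str):
  """Stream a local file to a node in fixed-size chunks, sending the total size first."""
  conn = _get_conn(addr)
  conn.sendall(struct.pack("<Q", os.path.getsize(path)))
  with open(path, "rb") as f:
    while chunk := f.read(CHUNK_SIZE):
      conn.sendall(chunk)

def load_file(addr: str, path: str, size: int):
  """Receive `size` bytes from a node and write them to disk, chunk by chunk."""
  conn = _get_conn(addr)
  with open(path, "wb") as f:
    remaining = size
    while remaining > 0:
      chunk = conn.recv(min(CHUNK_SIZE, remaining))
      if not chunk: raise ConnectionError("node closed connection early")
      f.write(chunk)
      remaining -= len(chunk)
```

Caching the socket per node is what lets repeated loads skip reconnect cost, and keeping the chunk size a parameter rather than a literal matches the "don't hard code chunk size" commit.
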
* Slice to unbind symbolic
* use vmax for now
* assert shape in reshape is valid
* update test_symbolic_ops to use shrink instead of reshape
* remove infer_with_bound_values for now
* symbolic output doesn't have symbolic strides
* symbolic jit tests use shrink to unregister symbolic
* update test
* update more tests
* wrap vmax in int()
* only create a new st if the store is not an assign
* unwrap st
* comments
* start cpu threading
* fix
* fix2
* fix
* hacks?
* threads
* minor
* no dsp
* dsp 2
* n
* more
* test
* xm
* cleaner
* readable
* f
* reorder
* when no threads
* rangeify
* typos
* not needed
* reapply
* remove this
* linter
* fixed cpu count in ci
* fix
* fixes
* rm
* typo
* sort based on speed
* test if test works in ci
* Revert "test if test works in ci"
This reverts commit 1f05edb531.
* do not pad thread
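
The threading commits above split CPU work across a fixed thread count (pinned in CI) without padding the last chunk. A generic sketch of that partitioning, using only standard-library names rather than anything from tinygrad:

```python
import os
from concurrent.futures import ThreadPoolExecutor

THREADS = int(os.getenv("THREADS", os.cpu_count() or 1))  # CI can pin this to a fixed count

def saxpy_chunk(a: float, x: list[float], y: list[float], start: int, end: int):
  # each worker owns a contiguous, non-overlapping slice of the output
  for i in range(start, end):
    y[i] += a * x[i]

def saxpy_threaded(a: float, x: list[float], y: list[float]):
  n = len(x)
  step = (n + THREADS - 1) // THREADS   # ceil-divide; the last chunk may be short (no padding)
  with ThreadPoolExecutor(max_workers=THREADS) as pool:
    for s in range(0, n, step):
      pool.submit(saxpy_chunk, a, x, y, s, min(s + step, n))

x, y = [1.0] * 10, [2.0] * 10
saxpy_threaded(0.5, x, y)
print(y)  # [2.5, 2.5, ...]
```
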
* var_vals is str,int
* remove imports
* remove print
* fix test
* change var_vals in hcq
* update test_hcq
* fix multitensor _device_num var
* fix syminfer test
* shorten line
* p.vars stays list[Variable]
* shorten line
* vars is back to tuple[Variable, ...]
* change var_vals in extra
* change var_vals from shapetracker
* var_vals is str:int
* fix signature
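
The var_vals commits change the binding map for symbolic variables from Variable-keyed to name-keyed (`dict[str, int]`), while `p.vars` stays a `tuple[Variable, ...]`. A minimal stand-alone sketch of that lookup, with a stand-in `Variable` dataclass rather than tinygrad's real one:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variable:
  # stand-in for a symbolic variable with a name and an inclusive [vmin, vmax] range
  expr: str
  vmin: int
  vmax: int

# program metadata keeps the full Variable objects...
p_vars: tuple[Variable, ...] = (Variable("i", 1, 16), Variable("j", 1, 8))

# ...but runtime bindings are keyed by name only: dict[str, int]
var_vals: dict[str, int] = {"i": 4, "j": 2}

def launch_args(vars: tuple[Variable, ...], vals: dict[str, int]) -> list[int]:
  """Resolve each Variable to its bound value by name, checking it stays in range."""
  out = []
  for v in vars:
    val = vals[v.expr]
    assert v.vmin <= val <= v.vmax, f"{v.expr}={val} out of range [{v.vmin}, {v.vmax}]"
    out.append(val)
  return out

print(launch_args(p_vars, var_vals))  # [4, 2]
```
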
* POSTOPT=2 work
* bugfixes
* add chain in one place
* tensor cores match
* better hcopt check
* match from old
* Change POSTOPT ContextVar value to 0
* we didn't need to check that
* ** simple kernel to replace Kernel for postopt
* support old
* fix beam
* beaming
* beam on old
* bring tensor cores back
* raise
* postbeam
* test ops passes on mac
* skip that
* postopt default
* gate that
* fix tensor cores
* a few test fixes
* dsp fix
* tc fix
* loop
* support swap
* test_gemv
* fix beam for variable
* test opts from high level stuff
* range annoying
* compile slow
* metal slow
* better beam
* no POSTBEAM
* fix nolocals
* hc opt mostly works
* put that back
* lil
* some work
* fix that
* POSTOPT 2
* fix tests
* no postopt 2
* work
* back
* padded tensor cores
* shift_to
* postopt 0 passes?
* write PADTO
* fix padded tensor cores
* compare hcopt
* 18000 lines
* should pass tests
* fix rangeify
* put types back
* Modify tests and start work towards removing symbolic reshape
* Refactor symbolic reshape
* fix small error
* much cleaner + fix more tests
* Can remove this now
* Update test_symbolic_ops and test_tiny
* Couple more tests
* Unused import
* More tests and add EXPAND to Tensor.empty
* Fix test beam search
* all int
* Fix rangeify by adding shrink
* Remove OOB check and so fix test_symbolic_jit
* test_symbolic_jit doesn't need OOB Context anymore either
* Should remove that test now
* Cleanups part 1
* fix linters
* Final cleanups
* Don't reassign inside for loop
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* BOOM
* cache extra/huggingface/models/
* why max buffer size is not 0
* override MAX_BUFFER_SIZE
* less models
* remove more models and change cache dir to already cached dir
* only metal
* less is more?
* remove check ops
* why is this not setting the ENVVAR
* ughhhhh just test in models
* only cpu and gpu
* only cpu actually
* just override it idk
* final
* move extra dependencies up top
* simplification
* fix print
* make README better
* revert ops_disk fix for now
* clean up test_onnx
* remove fashion clip model test because it is too slow
* actually let METAL run this
* fix comment mistake
* fix download path in run_models
* does this work?
* cleanup setup and teardown
* contextvar like this?
* prove model is cached
* do I need to increment DOWNLOAD_CACHE_VERSION?
* see if cached with incremented DOWNLOAD_CACHE_VERSION
* use warnings to see if the model exists
* revert DOWNLOAD_CACHE_VERSION stuff and clean up
* add retry to download
* nit
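
The final commits add a retry to the model download path. A generic sketch of such a wrapper, assuming nothing about tinygrad's actual fetch helper (`download_with_retry` and its parameters are made up):

```python
import time, urllib.request

def download_with_retry(url: str, dest: str, attempts: int = 3, backoff: float = 1.0) -> str:
  """Try the download a few times, sleeping with exponential backoff between failures."""
  for i in range(attempts):
    try:
      urllib.request.urlretrieve(url, dest)
      return dest
    except OSError:
      if i == attempts - 1: raise
      time.sleep(backoff * (2 ** i))  # 1s, 2s, 4s, ...
  return dest
```
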