tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-09 15:08:02 -05:00

Author	SHA1	Message	Date
George Hotz	4a151e7533	make xcode signing happy, waiting for entitlement (#12712)	2025-10-16 10:20:34 +08:00
Daniel	d65bd669f8	update tiny torch backend hook (#12575) * update the backend to fix torch deprecation warning * use param_hook to avoid full backward hook needlessly firing on inputs which do not require gradients * fix indentation --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-10-15 14:02:33 -04:00
Christopher Milan	0aabc1e938	Mesa NIR backend (NAK/LLVMpipe) (#12089) * nak works * TestOps::test_add works * testop has no crashes * fix bool casts * fix typo * add disassemble * RANGE and locals/regs * simplify NAKCompiler * disass cleanup * cleanup nir codegen * almost all tests passing * cleanup notes in extra/ * old notes * only import nak if NIR=1 * fix new SPECIAL syntax * fix local/shared memory * more tests passing * add DEFINE_VAR support * llvmpipe kinda works * diskcache * some mypy stuff * lvp passing test_ops.py * fix imports * actually fix imports * remove 'stdout' * fix llvm import * fix mypy issues * nicer errors * simpler test_dtype skips * test lvp in CI * fix github action syntax * fix more actions typos * switch to mesa 25.1.0 * diskcache_put * better generation for lvp nir_options * b64encode shader blobs * Revert diskcache changes This reverts commits `930fa3de8a` and `8428c694b3`. * general cleanup * better error messages * fix llvm import * fix windows tests * link with libm and libgcc_s * fix some errors * dont check for 'float4' * NIR uses pointer arithmetic * use tinymesa * bump tinymesa * bump tinymesa again * update lvp nir_options * print nir shader with DEBUG * simplify LVPCompiler * more tests * "gated" STORE * NAK is cacheable * more tests * all tests pass locally for NAK * test autogen in CI * autogen deps * more deps * fix uop_gc * fix macos * mypy * save 2 lines * save two more lines * save 1 line * save 4 lines * save more lines * Revert "save more lines" This reverts commit `dd3a720c5a`. * save more lines * fix LVP on windows * refactor * reorganize some code * refactor lib_gpu * move LVP check * out of order loads * remove support.mesa * bump tinymesa version * simplify LVP jit * macos * macos ci * shell: bash * testing * more testing * compute brew prefix * stupid typo * actually fix * lib * stdout on macos * inline gallivm_compile_module * Revert "inline gallivm_compile_module" This reverts commit `b65983b151`. * elf macos * semicolon * inherit from CPULLVMCompiler * ruff * disas test * fix libm linking * default is fine actually * arm works * add elf loader link test * fix NAK beam * pylint is too smart by half --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com> Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>	2025-10-15 17:38:33 +08:00
nimlgen	aa81bde150	amd: usb4/thunderbolt on macs (#12641) * tbgpu * works * cleaner * this * zero size * h * fix * simpler * prio over usb * c * not needed * linter * this way * mappings * mypy * mypy * mypy 2 * nn	2025-10-15 13:02:01 +08:00
wozeparrot	f228c03f9f	fetch raid from cloud (#10799) * feat: initial tinyfs device * feat: don't allow compute on tinyfs device * feat: tensor helpers to load and store * feat: bufferview for tinyfs * fix: keep copy sizes correct * fix: recv large * clean: unneeded * feat: comment * clean: unneeded * clean: remove * clean: remove * feat: get request tag * feat: rename to cloud * feat: send request_id * feat: start computing tree * feat: compute store tree on this side * feat: jank chunked load * feat: more debugging * feat: rename to just load and store * feat: correct chunk count * fix: fix load for < 1mb * feat: comments * feat: don't truncate on block devices * feat: better way of testing block device * feat: don't need to pad that much * feat: connect to nodes directly on load * feat: cache connections * feat: don't hard code chunk size * feat: close mmap when closing file handle * feat: don't overwrite stuff on disk if storing from disk * clean: debug print * fix: close mmap * feat: await workers * feat: fast copy from tinyfs to disk * feat: don't copy to device on last * feat: use single socket per device * feat: raid in tinyfs * clean: remove import * clean: type * feat: maintain single event loop * feat: lower worker count * feat: use connection pool * feat: fetch mapping in its own process * fix: release lock * feat: don't fetch if exists * feat: req id only on stores * feat: always fetch * fix: rangeify * feat: allow specifying raid root * fix: dealloc buffer * feat: start support non 0 offset * clean: use cleaner * feat: don't pass to threadpool * clean: typing	2025-10-14 07:53:55 -07:00
George Hotz	fb61f3519f	remove assign contiguous hack (#12659) * remove assign contiguous hack * remove bad contiguous usage in torch backend * assign	2025-10-14 16:42:14 +08:00
qazal	cd6aeebfee	sqtt: osx decoder installer (#12637)	2025-10-13 17:26:12 +08:00
nimlgen	89be3590aa	amd: sqtt on gfx12 (#12564) * amd: sqtt on gfx12 * cleaner * thi * and this * ops * ugh * back * rm this * rm	2025-10-10 17:54:14 +08:00
wozeparrot	f12e2a75db	feat: add thunderkittens (#12590)	2025-10-10 00:32:33 -07:00
nimlgen	1309cea247	rocprof parser in extra (#12569) * rocprof parser * viewer * vw * skip	2025-10-10 14:56:42 +08:00
chenyu	c8dfd10257	ShapeTracker.real_strides -> is_expanded [pr] (#12579) only keep the used part	2025-10-09 22:52:45 -04:00
George Hotz	9b66c2b0b7	fix weekly commits table (i didn't know we linted extra)	2025-10-10 09:23:33 +08:00
George Hotz	658b96cbfb	weekly commits table	2025-10-10 09:15:41 +08:00
nimlgen	a11b686c71	amd: sqtt for all gfx11 (#12546) * amd: general sqtt for gfx11 * target * ops * no gfx12 here	2025-10-09 17:04:06 +08:00
chenyu	ae51bdd06a	remove trivial use of RANGEIFY flag (#12550) some tests need update still	2025-10-09 02:29:38 -04:00
George Hotz	2653147cb7	delete the lowerer (#12526)	2025-10-08 21:58:18 +08:00
chenyu	e701106a64	remove FUSE_ARANGE (#12511) it was the default already	2025-10-08 04:54:07 -04:00
nimlgen	4a756a37d8	amd: support rocm7 (#12502) * amd: support rocm7 * mock	2025-10-08 14:30:39 +08:00
George Hotz	514d2a0774	merge tagless reshapes (#12474) * merge tagless reshapes * cleanup	2025-10-07 13:57:58 +08:00
George Hotz	b4509fba31	thundermittens (#12471) * thundermittens * give device a type	2025-10-07 11:47:39 +08:00
George Hotz	0f25b4b289	move frontend dir to nn [pr] (#12470)	2025-10-07 10:42:22 +08:00
hooved	0f804c9a83	Stable Diffusion model init for mlperf (#12314) * include clip pr diff * updated unet and sd init * dehardcode default device * revert beam hang workaround --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-10-02 02:28:41 -04:00
qazal	e8c595c29e	remu: add new instructions introduced in RANGEIFY (#12363) * add v_mad_i64_i32 for test_output_padded_conv_transpose2d * run amd test_ops * skip test_masked_select	2025-09-30 12:36:29 +03:00
hooved	c2689c505e	Clip model updates for Stable Diffusion mlperf training (#12313) * stable diffusion mlperf clip changes * add clip tests * set gelu as attribute * add more tests * factor out GPUS * rerun CI * add imports to if blocks * remove unneeded axis * add clip tests to CI * move clip tests * add deps, disable max buf size	2025-09-29 21:50:14 -04:00
Sieds Lykles	cc038b31b6	Shrink instead of reshape to unregister symbolic (#12241) * Slice to unbind symbolic * use vmax for now * assert shape in reshape is valid * update test_symbolic_ops to use shrink instead of reshape * remove infer_with_bound_values for npw * symbolic output doesnt have symbolic strides * symbolic jit tests use shrink to unregister symbolic * update test * update more tests * wrap vmax in int() * only create a new st if the store is not an assigne * unwrap st * comments	2025-09-19 06:04:35 +02:00
qazal	a388d2cb1a	remove PROFILE=1 option, it's just VIZ=1 [pr] (#12176) * remove PROFILE=1 option, it's just VIZ=1 [pr] * sqtt * sqtt 2 * return last * rename	2025-09-15 12:51:50 +03:00
hooved	e1fef895b1	don't hardcode weights path (#12171)	2025-09-15 00:33:47 -04:00
chenyu	12a910f1d2	update torch 2.8 (#12172) support _reshape_alias. something is wrong with one case of unfold	2025-09-14 15:19:03 -04:00
chenyu	0e266f376c	ops_gpu -> ops_cl (#12103)	2025-09-10 15:15:48 -04:00
nimlgen	fb96394ff5	auto-select available compilers (#12094) * device: auto select compilers * fix * metal+opencl * nv/cuda * test without ptx * ptx * fix tests * fix * fix test * rename * test + cleaner * xx * ops * better test * win? * um? * types * debug * win?? * sep rung * wtf? * debug * skip win * revert this * types	2025-09-10 19:52:01 +03:00
nimlgen	9182948951	remove llvm_bf16_cast (#12075)	2025-09-08 20:51:15 +03:00
nimlgen	10ac427aaa	cpu threading (#11951) * start cpu threading * fix * fix2 * fix * hacks? * threads * minor * no dsp * dsp 2 * n * more * test * xm * cleaner * readable * f * reorder * when no threads * rangeify * typos * not needed * reapply * remoev this * linter * fixed cpu count in ci * fix * fixes * rm * typo * sort based on speed * test if test works in ci * Revert "test if test works in ci" This reverts commit `1f05edb531`. * do not pad thread	2025-09-06 16:13:43 +03:00
Sieds Lykles	c6c16b2946	`var_vals` uses str for var (#12011) * var_vals is str,int * remove imports * remove print * fix test * change var_vals in hcq * update test_hcq * fix multitensor _device_num var * fix syminfer test * shorten line * p.vars stays list[Variable] * shorten line * vars is back to tuple[Variable, ...] * change var_vals in extra * change var_vals from shapetracker * var_vals is str:int * fix signature	2025-09-06 04:16:12 +02:00
George Hotz	38dcadf07b	delete kernel.py (#12040) * delete kernel.py * delete that file * rip and tear * don't test search * imports * fix torch frontend * not a part of regen	2025-09-05 15:52:07 -07:00
George Hotz	870f63d9cc	add WARP axistype, fix postopt bugs (#12033) * postopt is 83% match * warp is bright CYAN * beautiful mnist beam works * fix shutdown bug	2025-09-05 10:36:55 -07:00
George Hotz	f8e2dd4dd1	investigate opts mismatches (#12020)	2025-09-05 07:40:29 -07:00
George Hotz	431666da74	POSTOPT=2 work (#12012) * POSTOPT=2 work * bugfixes * add chain in one place * tensor cores match * better hcopt check * match from old * Change POSTOPT ContextVar value to 0 * we didn't need to check that	2025-09-04 16:55:56 -07:00
George Hotz	70ce29b630	test pyrender (#12005) * test pyrender * make them print * switch to pyrendered	2025-09-04 11:48:40 -07:00
Sieds Lykles	572a3c15c6	Move Ops.SPECIAL arg to src (#11918) * initial moving bound to src * arg to src * remove import * fixup linearizer * arg to src * fix test_uop_graph * fix more tests * fix python renderer * get const value from const uop * ssimplify uop estimates * fix webgpu locals * fix old test * gate Ops.SPECIAL in linearizer * use ssimplify() for local/global_size * remove toposort gate_parents_instead_of_self * fix rendering in comment * cleanup * rename and add comments * add BottomUpGate with test	2025-09-04 09:31:44 +02:00
George Hotz	5cf42dc4db	add Scheduler to replace Kernel with POSTOPT=2 (#11924) * ** simple kernel to replace Kernel for postopt * support old * fix beam * beaming * beam on old * bring tensor cores back * raise * postbeam * test ops passes on mac * skip that * postopt default * gate that * fix tensor cores * a few test fixes * dsp fix * tc fix * loop * support swap * test_gemv * fix beam for variable * test opts from high level stuff * range annoying * compile slow * metal slow * better beam * no POSTBEAM * fix nolocals * hc opt mostly works * put that back * lil * some work * fix that * POSTOPT 2 * fix tests * no postopt 2 * work * back * padded tensors cores * shift_to * postopt 0 passes? * write PADTO * fix padded tensor cores * compare hcopt * 18000 lines * should pass tests * fix rangeify * put types back	2025-09-03 19:23:30 -07:00
qazal	c7bb561ef9	remu: add v_rsq_f32_e32 instruction (#11947) https://github.com/tinygrad/tinygrad/pull/11936 introduces a change to the AMD LLVM renderer that outputs this instruction. Adding both 32 and 64 bit variants.	2025-09-01 11:29:31 +03:00
George Hotz	afad7d0cd1	remove dtype from range, it will be dtypes.index soon [pr] (#11914) * remove dtype from range, it will be dtypes.index soon [pr] * a few more	2025-08-29 09:52:07 -07:00
George Hotz	394c2d1db1	update Kernel API in tests + move optimize_local_size (#11907)	2025-08-28 15:12:47 -07:00
Ben Waldron	ea1be2e4cd	[bounty] Remove using reshape to register symbolic shape (#11771) * Modify tests and start work towards removing symbolic reshape * Refactor symbolic reshape * fix small error * much cleaner + fix more tests * Can remove this now * Update test_symbolic_ops and test_tiny * Couple more tests * Unused import * More tests and add EXPAND to Tensor.empty * Fix test beam search * all int * Fix rangeify by adding shrink * Remove OOB check and so fix test_symbolic_jit * test_symbolic_jit doesn't need OOB Context anymore either * Should remove that test now * Cleanups part 1 * fix linters * Final cleanups * Don't reassign inside for loop --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-08-28 12:30:49 -04:00
nimlgen	874c1db4af	am: init support for aql (#11888)	2025-08-28 18:41:46 +03:00
George Hotz	27701ef823	add locals support to rangeify (#11826)	2025-08-24 14:03:12 -07:00
chenyu	fb8ee02424	Tensor.logaddexp (#11793)	2025-08-23 09:15:00 -04:00
chenyu	d0d39885c3	onnx in tinygrad (#11675)	2025-08-14 19:57:21 -04:00
chenyu	48c4033ae1	fix pylint for onnx (#11673) * fix pylint for onnx * too long	2025-08-14 18:48:02 -04:00
nimlgen	4176b24264	amd: support xcc in regs (#11670) * amd: support xcc in regs * mockamd * typong	2025-08-14 21:20:11 +03:00

1263 Commits