tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-10 15:38:29 -05:00

Author	SHA1	Message	Date
George Hotz	4a151e7533	make xcode signing happy, waiting for entitlement (#12712 )	2025-10-16 10:20:34 +08:00
chenyu	c3278e5622	clean up old tests (#12708 )	2025-10-15 17:53:17 -04:00
chenyu	b8cf35fb77	print macOS version in CI (#12705 )	2025-10-15 15:05:33 -04:00
Daniel	d65bd669f8	update tiny torch backend hook (#12575 ) * update the backend to fix torch deprecation warning * use param_hook to avoid full backward hook needlessly firing on inputs which do not require gradients * fix indentation --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-10-15 14:02:33 -04:00
nimlgen	db5ae846aa	nv: do not use va_addr for cpu accesses (#12697 ) * nv: do not use va_addr for cpu accesses * mypy	2025-10-15 22:48:12 +08:00
nimlgen	3ab23af829	nv: copy prog with copyin (#12701 ) * nv: copy prog with copyin * to bytes * fix test	2025-10-15 22:48:01 +08:00
nimlgen	fafbf3daea	memory: reserve ptable (#12702 )	2025-10-15 22:47:50 +08:00
George Hotz	85a907605c	hotfix: only 20 steps of beautiful_mnist_torch, some CI machines are slow	2025-10-15 22:29:34 +08:00
Christopher Milan	e1996d358c	use RTLD_GLOBAL on macos (#12699 )	2025-10-15 22:24:50 +08:00
chenyu	312c622d35	support None in pad_to and shrink_to (#12700 )	2025-10-15 09:25:31 -04:00
George Hotz	612e3d6143	replace mop arg with vectorized index (#12695 ) * replace mop arg with vectorized index * tests passing * better viz * no compile4	2025-10-15 20:50:06 +08:00
wozeparrot	9ec4c06d7d	feat: one request per device (#12698 )	2025-10-15 05:22:07 -07:00
Sieds Lykles	99aa3bd5f9	reduce collapse reduce only the cut range (#12687 )	2025-10-15 13:57:41 +02:00
Sieds Lykles	91ac4f1f92	late merging of where and load (#12694 )	2025-10-15 13:33:06 +02:00
qazal	768dc952de	viz ui cleanups / renaming (#12691 ) * better viz names * delete unused * don't use opacity, it's multiplicative * keep styles * scrollbar coloring * pyrender doesn't work here beautiful_mnist r_64_16_32_36@lower all index dtypes	2025-10-15 18:40:22 +08:00
chenyu	2e50ed0767	increase timeout of resnet cron (#12693 ) does not finish in 6 hours now	2025-10-15 06:08:58 -04:00
Christopher Milan	0aabc1e938	Mesa NIR backend (NAK/LLVMpipe) (#12089 ) * nak works * TestOps::test_add works * testop has no crashes * fix bool casts * fix typo * add disassemble * RANGE and locals/regs * simplify NAKCompiler * disass cleanup * cleanup nir codegen * almost all tests passing * cleanup notes in extra/ * old notes * only import nak if NIR=1 * fix new SPECIAL syntax * fix local/shared memory * more tests passing * add DEFINE_VAR support * llvmpipe kinda works * diskcache * some mypy stuff * lvp passing test_ops.py * fix imports * actually fix imports * remove 'stdout' * fix llvm import * fix mypy issues * nicer errors * simpler test_dtype skips * test lvp in CI * fix github action syntax * fix more actions typos * switch to mesa 25.1.0 * diskcache_put * better generation for lvp nir_options * b64encode shader blobs * Revert diskcache changes This reverts commits `930fa3de8a` and `8428c694b3`. * general cleanup * better error messages * fix llvm import * fix windows tests * link with libm and libgcc_s * fix some errors * dont check for 'float4' * NIR uses pointer arithmetic * use tinymesa * bump tinymesa * bump tinymesa again * update lvp nir_options * print nir shader with DEBUG * simplify LVPCompiler * more tests * "gated" STORE * NAK is cacheable * more tests * all tests pass locally for NAK * test autogen in CI * autogen deps * more deps * fix uop_gc * fix macos * mypy * save 2 lines * save two more lines * save 1 line * save 4 lines * save more lines * Revert "save more lines" This reverts commit `dd3a720c5a`. * save more lines * fix LVP on windows * refactor * reorganize some code * refactor lib_gpu * move LVP check * out of order loads * remove support.mesa * bump tinymesa version * simplify LVP jit * macos * macos ci * shell: bash * testing * more testing * compute brew prefix * stupid typo * actually fix * lib * stdout on macos * inline gallivm_compile_module * Revert "inline gallivm_compile_module" This reverts commit `b65983b151`. * elf macos * semicolon * inherit from CPULLVMCompiler * ruff * disas test * fix libm linking * default is fine actually * arm works * add elf loader link test * fix NAK beam * pylint is too smart by half --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com> Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>	2025-10-15 17:38:33 +08:00
qazal	f0268d13f6	cleanup viz server (#12688 )	2025-10-15 15:58:36 +08:00
nimlgen	aa81bde150	amd: usb4/thunderbolt on macs (#12641 ) * tbgpu * works * cleaner * this * zero size * h * fix * simpler * prio over usb * c * not needed * linter * this way * mappings * mypy * mypy * mypy 2 * nn	2025-10-15 13:02:01 +08:00
George Hotz	236c4590c3	use margs as intermediate for new style mops (#12686 ) * use marg to prepare for movement op change * clean up forced reshape * move marg * more marg * more	2025-10-15 12:43:00 +08:00
qazal	7597e1dcac	pyrender in viz (#12682 ) * pyrender in viz * keep profile still print_tree * keep special in render	2025-10-15 11:53:30 +08:00
qazal	60e03eec37	viz: add View Program option (#12683 )	2025-10-15 11:37:51 +08:00
George Hotz	a59439d013	use UOp.shape property instead of UOp.st (#12664 ) * work on shape property * reshape causing issues * more mops * all mops * need to cache it * _shape is like _device * mostly works * shape is good * const uses _shape * fix tests * size doesn't use st * close * test is broken * one less st * hack for 3 op assign * oops, i didn't mean to change that * support emulate in the NullDevice * reproed failure in emulation * fix wmma	2025-10-15 10:01:34 +08:00
chenyu	89df6f611d	reenable sdxl mac benchmark (#12680 ) also updated faster sd step times	2025-10-14 17:36:17 -04:00
chenyu	d25ceffe8d	update padto opts tests (#12679 )	2025-10-14 17:00:42 -04:00
chenyu	e8380968f2	add venv_sd_mlperf to gitignore (#12676 ) training stable diffusion stuff	2025-10-14 12:51:36 -04:00
wozeparrot	f228c03f9f	fetch raid from cloud (#10799 ) * feat: initial tinyfs device * feat: don't allow compute on tinyfs device * feat: tensor helpers to load and store * feat: bufferview for tinyfs * fix: keep copy sizes correct * fix: recv large * clean: unneeded * feat: comment * clean: unneeded * clean: remove * clean: remove * feat: get request tag * feat: rename to cloud * feat: send request_id * feat: start computing tree * feat: compute store tree on this side * feat: jank chunked load * feat: more debugging * feat: rename to just load and store * feat: correct chunk count * fix: fix load for < 1mb * feat: comments * feat: don't truncate on block devices * feat: better way of testing block device * feat: don't need to pad that much * feat: connect to nodes directly on load * feat: cache connections * feat: don't hard code chunk size * feat: close mmap when closing file handle * feat: don't overwrite stuff on disk if storing from disk * clean: debug print * fix: close mmap * feat: await workers * feat: fast copy from tinyfs to disk * feat: don't copy to device on last * feat: use single socket per device * feat: raid in tinyfs * clean: remove import * clean: type * feat: maintain single event loop * feat: lower worker count * feat: use connection pool * feat: fetch mapping in its own process * fix: release lock * feat: don't fetch if exists * feat: req id only on stores * feat: always fetch * fix: rangeify * feat: allow specifying raid root * fix: dealloc buffer * feat: start support non 0 offset * clean: use cleaner * feat: don't pass to threadpool * clean: typing	2025-10-14 07:53:55 -07:00
chenyu	70dd297a05	BS=96 for bert (#12675 ) 96 trains fine now	2025-10-14 09:07:43 -04:00
Sieds Lykles	852d80dff9	better where on load folding (#12651 ) * move where clauses to load * shorten line * drop clauses if they are duplicated * add rule for swapped where branch * where on ungated load * dont move clause if load is in the clause * parse_valid returns None * no data dependent branches * fix rule * enable swapped rule * remove those	2025-10-14 13:30:47 +02:00
nimlgen	c7e63601fd	gfx1200 tc for AMD_LLVM (#12673 )	2025-10-14 19:17:48 +08:00
George Hotz	db4a359374	fix up some slow tests that launch python (#12672 ) * fix up some slow tests that launch python * svd nonfull in parallel * split test_advancedindex	2025-10-14 19:13:55 +08:00
nimlgen	4918c827c2	amd: lib_gpu does not need cpu_access (#12670 )	2025-10-14 18:34:34 +08:00
nimlgen	0c9d47deab	hcq: add alignment to kernargs (#12669 )	2025-10-14 18:33:12 +08:00
qazal	d3bfcd3277	minor patches for SQTT over usb on gfx12 (#12627 ) * disable cpu_access in the sqtt buffer allocation not sure if this is required, it results in a very slow call to pcie_mem_write over USB GPU, removing it worked fine. * fix itrace_se_mask on gfx12 on gfx11 it gave 6 se, on gfx11 this value is 2 so no instructions were traced. * Revert "fix itrace_se_mask on gfx12" This reverts commit `0644adbcd1`.	2025-10-14 18:07:46 +08:00
Sieds Lykles	1e6e5a0efd	`parse_valid` returns None instead of raising (#12663 ) * parse_valid returns None * change there too	2025-10-14 11:57:38 +02:00
qazal	471bd30d16	cleanup viz/serve.py (#12665 ) * use load_pickle * update comment	2025-10-14 17:50:39 +08:00
George Hotz	fb61f3519f	remove assign contiguous hack (#12659 ) * remove assign contiguous hack * remove bad contiguous usage in torch backend * assign	2025-10-14 16:42:14 +08:00
George Hotz	30ee7c4c26	cleanup Device usage in Tensor (#12662 )	2025-10-14 16:22:22 +08:00
Sieds Lykles	e06cbfcb8a	combine `pm_drop_and_clauses` (#12660 ) * combine those * wino kernels decreased	2025-10-14 10:09:41 +02:00
George Hotz	84d4589ed4	remove pylint from pre-commit and CI (#12658 ) * remove pylint from pre-commit and CI * multidevice test is fast * faster pre-commit * 8 is faster than 4 * better name * how did that typecheck?	2025-10-14 15:39:59 +08:00
qazal	8ecaf839e2	cleanup UOp tracing [pr] (#12657 )	2025-10-14 14:50:59 +08:00
George Hotz	b9eb5b5d49	clean up the LLM tokenizer (#12653 ) * clean up the LLM tokenizer * simple tokenizer is actually simple * ugh write good code	2025-10-14 14:22:01 +08:00
qazal	a9ef93176f	viz: add colored text helper (#12654 )	2025-10-14 13:05:26 +08:00
George Hotz	ecdc7539a2	add typing to MathTraits (#12650 ) * add typing to MathTraits * fix assign	2025-10-14 12:35:20 +08:00
qazal	9bf032de69	viz: keep focused shape in view (#12648 )	2025-10-14 10:49:08 +08:00
chenyu	77b5e6774e	fix bert training config (#12647 ) FREE_INTERMEDIATE=0 REWRITE_STACK_LIMIT=500000	2025-10-13 15:03:47 -04:00
nimlgen	f1041dc0ac	pylint 4.0.0 (#12642 ) * cpu: fix spacing * fix pylint * fix pylint * pylint 4.0.0 * lambda * keep eval for now * im so sorry	2025-10-13 23:28:36 +08:00
wozeparrot	47e0c43976	feat: Tensor.{load, store} (#12629 )	2025-10-13 08:04:41 -07:00
chenyu	0f776c6e46	examples/mlperf/training_submission_v6.0 (#12644 ) copied from v5.1	2025-10-13 09:58:25 -04:00
Sieds Lykles	e0139fafc1	UOp symbolic tests use eval to check against string (#12643 )	2025-10-13 14:19:42 +02:00


10606 Commits