tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-08 22:48:25 -05:00

Author	SHA1	Message	Date
George Hotz	65a0a31475	AMD mi350x matmul from stream (#13040 ) * works * working mfma * 120 TFLOPS * regs * 192 TFLOPS * try pipelining * something * notes * contract * linter to 3.11 * that was a bug	2025-11-01 17:55:19 +08:00
nimlgen	f6786c1bfd	autogen: py314 (#13038 ) * autogen: py314 * bump py?	2025-11-01 04:02:19 +08:00
nimlgen	4b001ec723	amd: pmc in mockgpu (#13000 ) * amd: pmc in mockgpu * fix * do not open in ci	2025-10-30 01:52:02 +08:00
George Hotz	5e01cc299b	zero len ranges fail (#12974 ) * zero len ranges fail * fix Python backend * fix llvm * fix ptx * yolo fix nir * this works... * always store... * always store... * Revert "always store..." This reverts commit `0816cf344d`.	2025-10-28 22:49:55 +08:00
George Hotz	e936aa7974	cleanups from if range branch (#12973 )	2025-10-28 20:58:47 +08:00
George Hotz	2832954bcb	test with IGNORE_OOB=0 (#12960 )	2025-10-28 10:32:19 +08:00
George Hotz	7784cec48e	pytest-split on spec (#12959 )	2025-10-28 10:09:01 +08:00
George Hotz	25c2da1579	check SPEC=2 in CI (#12945 ) * check SPEC=2 in CI * split SPEC=2 * fast enough	2025-10-27 21:53:57 +08:00
George Hotz	8a941d95a4	SPEC=2 is full spec, SPEC=1 is default (#12910 ) * SPEC=1 passes all tests * just use SPEC, not __debug__	2025-10-25 11:10:43 +08:00
chenyu	4b7329001d	clean up test_avg_pool3d (#12905 )	2025-10-24 14:31:36 -04:00
chenyu	154b4f9f40	test FUSE_OPTIM=1 test/test_optim.py (#12895 )	2025-10-23 15:54:27 -04:00
b1tg	60d7e232f2	cuda fp8 (#12782 ) * cuda fp8 * tensor core * tc test * clean * clean pm	2025-10-21 15:05:25 -04:00
Harald Schäfer	587ccc0e5c	compile3: make selftests opt-in (#12851 )	2025-10-21 11:32:27 -07:00
Harald Schäfer	addc54b96c	Simplify openpilot compile3.py (#12748 ) * Simpler compile3 * tests * remove default args * onnx file is still fp16 * self-test FP16 too * allow test disable * absurd tolerance * Just do latest * Try simplest * use later models * kernel count not relevant if speed is good * dead improts * Revert "dead improts" This reverts commit `f68c2cd15d`. * Revert "kernel count not relevant if speed is good" This reverts commit `0955ca4ee0`. * add back kernal count check on latest model	2025-10-18 10:12:22 -04:00
George Hotz	1d1e1d9d88	delete the ShapeTracker (#12720 ) * delete the ShapeTracker * fix tests * fix more * fix gc test	2025-10-16 15:36:22 +08:00
George Hotz	592e86f6f5	remove UOp.st (#12716 ) * remove UOp.st * fix tests * torch backend disable	2025-10-16 14:44:09 +08:00
George Hotz	85a907605c	hotfix: only 20 steps of beautiful_mnist_torch, some CI machines are slow	2025-10-15 22:29:34 +08:00
George Hotz	612e3d6143	replace mop arg with vectorized index (#12695 ) * replace mop arg with vectorized index * tests passing * better viz * no compile4	2025-10-15 20:50:06 +08:00
Christopher Milan	0aabc1e938	Mesa NIR backend (NAK/LLVMpipe) (#12089 ) * nak works * TestOps::test_add works * testop has no crashes * fix bool casts * fix typo * add disassemble * RANGE and locals/regs * simplify NAKCompiler * disass cleanup * cleanup nir codegen * almost all tests passing * cleanup notes in extra/ * old notes * only import nak if NIR=1 * fix new SPECIAL syntax * fix local/shared memory * more tests passing * add DEFINE_VAR support * llvmpipe kinda works * diskcache * some mypy stuff * lvp passing test_ops.py * fix imports * actually fix imports * remove 'stdout' * fix llvm import * fix mypy issues * nicer errors * simpler test_dtype skips * test lvp in CI * fix github action syntax * fix more actions typos * switch to mesa 25.1.0 * diskcache_put * better generation for lvp nir_options * b64encode shader blobs * Revert diskcache changes This reverts commits `930fa3de8a` and `8428c694b3`. * general cleanup * better error messages * fix llvm import * fix windows tests * link with libm and libgcc_s * fix some errors * dont check for 'float4' * NIR uses pointer arithmetic * use tinymesa * bump tinymesa * bump tinymesa again * update lvp nir_options * print nir shader with DEBUG * simplify LVPCompiler * more tests * "gated" STORE * NAK is cacheable * more tests * all tests pass locally for NAK * test autogen in CI * autogen deps * more deps * fix uop_gc * fix macos * mypy * save 2 lines * save two more lines * save 1 line * save 4 lines * save more lines * Revert "save more lines" This reverts commit `dd3a720c5a`. * save more lines * fix LVP on windows * refactor * reorganize some code * refactor lib_gpu * move LVP check * out of order loads * remove support.mesa * bump tinymesa version * simplify LVP jit * macos * macos ci * shell: bash * testing * more testing * compute brew prefix * stupid typo * actually fix * lib * stdout on macos * inline gallivm_compile_module * Revert "inline gallivm_compile_module" This reverts commit `b65983b151`. 
* elf macos * semicolon * inherit from CPULLVMCompiler * ruff * disas test * fix libm linking * default is fine actually * arm works * add elf loader link test * fix NAK beam * pylint is too smart by half --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com> Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>	2025-10-15 17:38:33 +08:00
George Hotz	a59439d013	use UOp.shape property instead of UOp.st (#12664 ) * work on shape property * reshape causing issues * more mops * all mops * need to cache it * _shape is like _device * mostly works * shape is good * const uses _shape * fix tests * size doesn't use st * close * test is broken * one less st * hack for 3 op assign * oops, i didn't mean to change that * support emulate in the NullDevice * reproed failure in emulation * fix wmma	2025-10-15 10:01:34 +08:00
George Hotz	84d4589ed4	remove pylint from pre-commit and CI (#12658 ) * remove pylint from pre-commit and CI * multidevice test is fast * faster pre-commit * 8 is faster than 4 * better name * how did that typecheck?	2025-10-14 15:39:59 +08:00
Sieds Lykles	e537e895b1	drop unused invalid conditions (#12635 ) * drop where conditions if the ranges are not used inside the index * remove allow_any_len	2025-10-13 10:52:21 +02:00
chenyu	8f5f57c7d9	smaller CNT fuzz shapetracker (#12626 )	2025-10-12 08:52:30 -04:00
Sieds Lykles	772a8dfe31	reshape uses valid when simplifying (#12597 ) * reshape uses valid when simplifying * try with IGNORE_OOB=0 * is it this test? * skipif gpuocelot	2025-10-11 17:02:54 +02:00
Sieds Lykles	cbdc13279d	fix openpilot gated reads (#12570 ) * fix gated image counts * slice correctly	2025-10-10 04:52:57 +02:00
chenyu	a0cbbc35ad	remove LLAMA_LAYERS in ci (#12562 )	2025-10-09 04:46:41 -04:00
nimlgen	658c566e22	vars in gated_read_image_count (#12486 ) * vars in gated_read_image_count * nc	2025-10-09 14:54:15 +08:00
chenyu	942022c309	smaller LLAMA_LAYER in Test llama 3 training (#12516 ) very slow now	2025-10-08 05:10:51 -04:00
chenyu	e701106a64	remove FUSE_ARANGE (#12511 ) it was the default already	2025-10-08 04:54:07 -04:00
chenyu	da1f46ff3f	remove RANGEIFY specific test jobs (#12507 )	2025-10-08 04:12:04 -04:00
George Hotz	403fdfcfd4	check spec in test, cleanup vectorize render (#12484 )	2025-10-07 17:05:50 +08:00
chenyu	8ad5f9e74f	skip slow benchmarks (#12481 ) * skip slow benchmarks padded tc is already slow, rest are slow with rangeify (correct if run locally) * relax more	2025-10-07 03:28:56 -04:00
chenyu	1823a5043f	don't check MAX_BUFFER_SIZE on NULL (#12461 )	2025-10-05 22:09:29 -04:00
chenyu	74b04f7dca	test beautiful_mnist_multigpu (#12455 ) * test beautiful_mnist_multigpu another example that fails with RANGEIFY * now i remember * MAX_BUFFER_SIZE=0	2025-10-05 08:45:01 -04:00
chenyu	98163832e4	update RANGEIFY test_cast_padded (#12421 ) * update RANGEIFY test_cast_padded * update test	2025-10-02 04:37:35 -04:00
chenyu	37beef6de3	add null bert training test in ci (#12420 ) fails with RANGEIFY `RuntimeError: children not making progress`	2025-10-02 04:05:19 -04:00
b1tg	ec177c80c2	rangeify: fix test_where_fold (llvm) (#12416 ) * rangeify: fix test_where_fold (AMD_LLVM) * rm comment	2025-10-02 02:57:49 -04:00
qazal	d1c868f990	fix limit_bufs with multi (#12414 )	2025-10-02 05:51:56 +03:00
qazal	5b649616ff	rangeify: detect and assert cycles (#12405 ) * rangeify: assert cycles * rng=2 * any	2025-10-02 03:39:43 +03:00
b1tg	ac3d457d5e	rangeify: TestReduceOpsConstFolding (#12397 ) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-10-01 17:58:19 +08:00
chenyu	6c95b1f39d	explicitly set device for CI unit test (#12399 )	2025-10-01 05:16:54 -04:00
chenyu	689ab9151b	more RANGEIFY tests (#12393 ) would have caught the load alt regression without adding too many tests	2025-10-01 03:43:58 -04:00
b1tg	154d114364	rangeify: fix abstractions2.py (#12386 ) * rangeify: fix abstractions2.py * tests * lint * only abstractions2 * base	2025-10-01 09:58:56 +03:00
b1tg	da52006bde	rangeify: fix test_scatter_reduce (#12380 ) * rangeify: fix test_scatter_reduce * ext_vector_type * set alignment=1 on boolean	2025-09-30 23:26:36 -04:00
chenyu	8def8145e4	ALLOWED_KERNEL_COUNT openpilot 0.9.4 with RANGEIFY (#12381 )	2025-09-30 22:58:59 -04:00
qazal	26247573e1	rangeify multi tests on gpu (#12376 ) * rangeify multi tests on gpu * fix limit_bufs	2025-10-01 04:53:04 +03:00
chenyu	b4a4817c9c	fix rangeigy test_linalg (#12365 )	2025-09-30 06:28:35 -04:00
b1tg	c9ef5d8fe5	rangeify: fix test_tensor_index_overflow (CPU_LLVM=1) (#12362 ) * rangeify: fix test_tensor_index_overflow (CPU_LLVM=1) * add test --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-09-30 05:55:15 -04:00
qazal	6a56d3c859	rangeify: only test correctness in multi (#12339 ) * work * more work * back here * skip tests * work	2025-09-30 09:55:59 +03:00
George Hotz	ab6b0d3a21	enable cleanup_dead_axes (#12351 ) * enable cleanup_dead_axes * don't mess with user contig * correct tag behavior * double reshape isn't correct * block on assign too * skip messing with symbolic * Fix tests * disable RANGEIFY=2 * test w rangeify	2025-09-30 14:09:39 +08:00

835 Commits