Commit Graph

1143 Commits

Gaétan Lepage
6fd7ce3832 migrate to pyproject.toml (#13189)
* migrate to pyproject.toml

* move mypy config to pyproject.toml
2025-11-11 09:09:27 -08:00
chenyu
60e55d9a2d line count 18500 (#13191) 2025-11-10 13:52:13 -05:00
chenyu
6c48c87e51 improved ASSERT_MIN_STEP_TIME (#13182)
* improved ASSERT_MIN_STEP_TIME

getting close: current time + 1ms, then round up

* relax
2025-11-09 16:41:12 -05:00
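
A note on the #13182 heuristic above: it amounts to a small piece of arithmetic, taking the currently measured step time, adding 1ms of headroom, then rounding up. A minimal Python sketch of that calculation; the function name and its usage are hypothetical and do not appear in this log:

    import math

    def step_time_threshold_ms(current_ms: float) -> int:
        # hypothetical illustration of "current time + 1ms, then round up":
        # add 1ms of headroom to the measured step time, then take the ceiling
        return math.ceil(current_ms + 1.0)

    assert step_time_threshold_ms(12.3) == 14  # 12.3ms + 1ms -> rounded up to 14ms
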
chenyu
e1d46de8f8 update GROUPTOP heuristic more (#13178)
reverts #13176
2025-11-09 02:31:12 -05:00
chenyu
8e868dced8 only GROUPTOP one reduce kernel (#13176)
* only GROUPTOP one reduce kernel

* ALLOWED_GATED_READ_IMAGE=148
2025-11-08 22:38:44 -05:00
George Hotz
42b34cf83d bottom up linearizer (#13133)
* bottom up linearizer

* late stores

* more complete

* remove broken heuristic

* upcast size

* opt

* more conservative

* it needs that

* disable opencl half on QCOM

* fix

* make that a real test

* cpu test okay

* ptx skip

* end is after the range
2025-11-06 15:30:32 -08:00
chenyu
54141e9cb9 DISABLE_COMPILER_CACHE=1 in speed_v_theoretical (#13096) 2025-11-04 11:28:18 -05:00
chenyu
ddf01fdb15 revert mlperf.yml setting (#13080) 2025-11-03 15:24:13 -05:00
chenyu
a317d6e625 extra/amdpci/setup_python_cap.sh (#13070) 2025-11-02 19:19:36 -05:00
chenyu
ad501ce50a mlperf cron install tqdm (#13069)
one more...
2025-11-02 18:09:27 -05:00
chenyu
2c8d619147 mlperf cron install influxdb3-python (#13068) 2025-11-02 17:55:40 -05:00
chenyu
4c22f089fc mlperf cron install tensorflow try 2 (#13067) 2025-11-02 17:11:01 -05:00
chenyu
c58cf91850 mlperf cron install tensorflow (#13066) 2025-11-02 16:48:05 -05:00
chenyu
74db65cf72 update mlperf bert LOGMLPERF (#13065) 2025-11-02 15:26:37 -05:00
chenyu
b18293de96 train bert in mlperf cron (#13064)
more relevant now
2025-11-02 15:04:02 -05:00
George Hotz
036ee9f84c Self type + mixins (#13056)
* use Self type

* mixin

* fix later
2025-11-02 13:30:01 +08:00
George Hotz
65a0a31475 AMD mi350x matmul from stream (#13040)
* works

* working mfma

* 120 TFLOPS

* regs

* 192 TFLOPS

* try pipelining

* something

* notes

* contract

* linter to 3.11

* that was a bug
2025-11-01 17:55:19 +08:00
nimlgen
f6786c1bfd autogen: py314 (#13038)
* autogen: py314

* bump py?
2025-11-01 04:02:19 +08:00
George Hotz
5eb87ab131 hotfix: bump cifar time to 350 2025-10-30 17:29:20 +08:00
nimlgen
4b001ec723 amd: pmc in mockgpu (#13000)
* amd: pmc in mockgpu

* fix

* do not open in ci
2025-10-30 01:52:02 +08:00
b1tg
bb307b9e81 fix fp8 vectorization (#12977)
* fix fp8 vectorization

* add fp8 tc to benchmark
2025-10-28 13:55:30 -04:00
George Hotz
5e01cc299b zero len ranges fail (#12974)
* zero len ranges fail

* fix Python backend

* fix llvm

* fix ptx

* yolo fix nir

* this works...

* always store...

* always store...

* Revert "always store..."

This reverts commit 0816cf344d.
2025-10-28 22:49:55 +08:00
George Hotz
e936aa7974 cleanups from if range branch (#12973) 2025-10-28 20:58:47 +08:00
George Hotz
2832954bcb test with IGNORE_OOB=0 (#12960) 2025-10-28 10:32:19 +08:00
George Hotz
7784cec48e pytest-split on spec (#12959) 2025-10-28 10:09:01 +08:00
b1tg
45e2f916a3 add quantize fp8 in llama3 (#12893)
* add quantize fp8 in llama3

* don't truncate fp8 alu result

* cast to float32 before matmul

* --model weights/LLaMA-3/8B-SF-DPO/

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-10-27 10:22:57 -04:00
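
The bullets in #12893 above describe a common low-precision pattern: store the weights in a narrow float format, but cast to float32 before the matmul so the result is not truncated. A minimal, hypothetical illustration with numpy, where float16 stands in for fp8 since the exact fp8 dtype is not shown in this log:

    import numpy as np

    def quantize(w: np.ndarray, qdtype=np.float16) -> tuple[np.ndarray, np.float32]:
        # store weights in a narrow dtype plus a per-tensor scale (float16 stands in for fp8)
        scale = np.float32(np.abs(w).max() / np.finfo(qdtype).max)
        return (w / scale).astype(qdtype), scale

    def matmul_dequant(x: np.ndarray, wq: np.ndarray, scale: np.float32) -> np.ndarray:
        # cast to float32 before the matmul so the accumulation is not truncated
        return x.astype(np.float32) @ (wq.astype(np.float32) * scale)

    x = np.random.randn(2, 8).astype(np.float32)
    w = np.random.randn(8, 4).astype(np.float32)
    wq, s = quantize(w)
    print(matmul_dequant(x, wq, s).shape)  # (2, 4)
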
George Hotz
25c2da1579 check SPEC=2 in CI (#12945)
* check SPEC=2 in CI

* split SPEC=2

* fast enough
2025-10-27 21:53:57 +08:00
George Hotz
8a941d95a4 SPEC=2 is full spec, SPEC=1 is default (#12910)
* SPEC=1 passes all tests

* just use SPEC, not __debug__
2025-10-25 11:10:43 +08:00
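
#12910 above fixes the semantics (SPEC=1 is the default level of checking, SPEC=2 runs the full spec) but this log does not show the mechanism; below is a minimal, hypothetical sketch of an environment-variable-gated check level, with stand-in helpers that are not from the repository:

    import os

    SPEC = int(os.getenv("SPEC", "1"))  # SPEC=1 default checks, SPEC=2 full spec

    def basic_spec(op: str) -> None:
        # stand-in for the always-on checks
        assert op, "op must be non-empty"

    def full_spec(op: str) -> None:
        # stand-in for the exhaustive (and slower) checks
        assert op.isupper(), f"unexpected op name: {op!r}"

    def verify(op: str) -> None:
        basic_spec(op)
        if SPEC >= 2:
            full_spec(op)

    verify("ADD")  # run with SPEC=2 to enable the full spec, as CI does per #12945 above
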
chenyu
4b7329001d clean up test_avg_pool3d (#12905) 2025-10-24 14:31:36 -04:00
chenyu
154b4f9f40 test FUSE_OPTIM=1 test/test_optim.py (#12895) 2025-10-23 15:54:27 -04:00
wozeparrot
6e00dec95d feat: pin openpilot 0.10.1 models (#12878) 2025-10-22 14:57:54 -07:00
chenyu
f0831c8c30 add 0.10.0 to comma benchmark (#12875)
* add 0.10.0 to comma benchmark

disabled the 0.10.1 ones, which are pinned to master; they do not work because the benchmark uses the cached old version

* that's pinned
2025-10-22 15:18:21 -04:00
George Hotz
726988fa4b late ifs try 2 (#12865)
* late ifs try 2

* fix image

* fix that test

* panic

* ptx fixups

* preserve toposort

* those pass locally

* Revert "those pass locally"

This reverts commit 063409f828.

* no ls

* make that explicit
2025-10-22 18:49:27 +08:00
chenyu
6d86e962c7 update ASSERT_MIN_STEP_TIME (#12857)
0.10.1 driving_policy is good now; driving_vision and dmonitoring still need to be fast
2025-10-21 22:46:07 -04:00
b1tg
60d7e232f2 cuda fp8 (#12782)
* cuda fp8

* tensor core

* tc test

* clean

* clean pm
2025-10-21 15:05:25 -04:00
Harald Schäfer
587ccc0e5c compile3: make selftests opt-in (#12851) 2025-10-21 11:32:27 -07:00
wozeparrot
62e7b8b870 feat: just use compile3 (#12849) 2025-10-21 07:56:50 -07:00
Christopher Milan
68c045bf0a NIR: Check for brew packages tinymesa and tinymesa_cpu (#12739)
* brew install tinymesa_cpu

* brew --prefix tinygrad_cpu too

* fix brew paths

* check both brew paths

* better errors

* handle failure
2025-10-21 09:38:43 +08:00
wozeparrot
990e8b97ee feat: log openpilot 0.10.1 times (#12816) 2025-10-20 18:30:34 -07:00
chenyu
350a4754a9 Update openpilot models (#12780)
* Update openpilot models

* Update slower model

* fix that

---------

Co-authored-by: Bruce Wayne <harald.the.engineer@gmail.com>
2025-10-18 20:32:35 -04:00
Harald Schäfer
addc54b96c Simplify openpilot compile3.py (#12748)
* Simpler compile3

* tests

* remove default args

* onnx file is still fp16

* self-test FP16 too

* allow test disable

* absurd tolerance

* Just do latest

* Try simplest

* use later models

* kernel count not relevant if speed is good

* dead improts

* Revert "dead improts"

This reverts commit f68c2cd15d.

* Revert "kernel count not relevant if speed is good"

This reverts commit 0955ca4ee0.

* add back kernel count check on latest model
2025-10-18 10:12:22 -04:00
chenyu
285534ce64 delete DONT_REALIZE_EXPAND and DONT_GROUP_REDUCES (#12744)
does nothing now
2025-10-16 14:11:33 -04:00
chenyu
53478c741d relax ASSERT_MIN_STEP_TIME for space lab policy (#12742) 2025-10-16 11:40:36 -04:00
George Hotz
1d1e1d9d88 delete the ShapeTracker (#12720)
* delete the ShapeTracker

* fix tests

* fix more

* fix gc test
2025-10-16 15:36:22 +08:00
George Hotz
592e86f6f5 remove UOp.st (#12716)
* remove UOp.st

* fix tests

* torch backend disable
2025-10-16 14:44:09 +08:00
chenyu
b8cf35fb77 print macOS version in CI (#12705) 2025-10-15 15:05:33 -04:00
George Hotz
85a907605c hotfix: only 20 steps of beautiful_mnist_torch, some CI machines are slow 2025-10-15 22:29:34 +08:00
George Hotz
612e3d6143 replace mop arg with vectorized index (#12695)
* replace mop arg with vectorized index

* tests passing

* better viz

* no compile4
2025-10-15 20:50:06 +08:00
chenyu
2e50ed0767 increase timeout of resnet cron (#12693)
it no longer finishes within 6 hours
2025-10-15 06:08:58 -04:00
Christopher Milan
0aabc1e938 Mesa NIR backend (NAK/LLVMpipe) (#12089)
* nak works

* TestOps::test_add works

* testop has no crashes

* fix bool casts

* fix typo

* add disassemble

* RANGE and locals/regs

* simplify NAKCompiler

* disass cleanup

* cleanup nir codegen

* almost all tests passing

* cleanup notes in extra/

* old notes

* only import nak if NIR=1

* fix new SPECIAL syntax

* fix local/shared memory

* more tests passing

* add DEFINE_VAR support

* llvmpipe kinda works

* diskcache

* some mypy stuff

* lvp passing test_ops.py

* fix imports

* actually fix imports

* remove 'stdout'

* fix llvm import

* fix mypy issues

* nicer errors

* simpler test_dtype skips

* test lvp in CI

* fix github action syntax

* fix more actions typos

* switch to mesa 25.1.0

* diskcache_put

* better generation for lvp nir_options

* b64encode shader blobs

* Revert diskcache changes

This reverts commits 930fa3de8a and 8428c694b3.

* general cleanup

* better error messages

* fix llvm import

* fix windows tests

* link with libm and libgcc_s

* fix some errors

* dont check for 'float4'

* NIR uses pointer arithmetic

* use tinymesa

* bump tinymesa

* bump tinymesa again

* update lvp nir_options

* print nir shader with DEBUG

* simplify LVPCompiler

* more tests

* "gated" STORE

* NAK is cacheable

* more tests

* all tests pass locally for NAK

* test autogen in CI

* autogen deps

* more deps

* fix uop_gc

* fix macos

* mypy

* save 2 lines

* save two more lines

* save 1 line

* save 4 lines

* save more lines

* Revert "save more lines"

This reverts commit dd3a720c5a.

* save more lines

* fix LVP on windows

* refactor

* reorganize some code

* refactor lib_gpu

* move LVP check

* out of order loads

* remove support.mesa

* bump tinymesa version

* simplify LVP jit

* macos

* macos ci

* shell: bash

* testing

* more testing

* compute brew prefix

* stupid typo

* actually fix

* lib

* stdout on macos

* inline gallivm_compile_module

* Revert "inline gallivm_compile_module"

This reverts commit b65983b151.

* elf macos

* semicolon

* inherit from CPULLVMCompiler

* ruff

* disas test

* fix libm linking

* default is fine actually

* arm works

* add elf loader link test

* fix NAK beam

* pylint is too smart by half

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2025-10-15 17:38:33 +08:00