* add f16/f32 mfma support for MI300
- add 16x16 mfma shape support for f16 with f32 acc
- add ops_python mfma emulation
- add arch to AMDRenderer
* minor cleanup
* minor cleanup
* add mfma emulation task to ci
* add back todo
* hotfix: comment
* add tc=3 job to ci
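For reference, a minimal numpy sketch of the math a single 16x16x16 MFMA tile performs with f16 inputs and an f32 accumulator, which is what the ops_python emulation above has to reproduce; it ignores the per-lane register layout of the real instruction and is not the emulator's actual code:

```python
import numpy as np

def mfma_16x16x16_f16_f32(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> np.ndarray:
    # one 16x16x16 tile: D = A @ B + C, f16 inputs accumulated in f32
    assert a.shape == (16, 16) and b.shape == (16, 16) and c.shape == (16, 16)
    return a.astype(np.float32) @ b.astype(np.float32) + c.astype(np.float32)
```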
* sqtt
* docs
* multi-device
* ProfileSQTTEvent
* exec update
* 256mb default
* don't let people hang their gpus
* bitfields from autogen
* asic info from mesa
* more bitfields from autogen
* SQTT_ITRACE_SE_MASK
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
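The bitfield commits above pack SQTT register fields such as SQTT_ITRACE_SE_MASK from autogenerated headers. A generic sketch of that kind of field insertion is below; the shift/width values and example are placeholders, not the real register layout:

```python
def pack_field(reg: int, value: int, shift: int, width: int) -> int:
    # insert `value` into bits [shift, shift+width) of `reg`, masking off overflow
    mask = ((1 << width) - 1) << shift
    return (reg & ~mask) | ((value << shift) & mask)

# placeholder example: enable instruction trace on shader engines 0 and 1
reg = pack_field(0, 0b11, shift=0, width=4)
```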
* add torch inplace tests
* first set of tests passing
* wrap all inplace funcs, add more tests
* fixes and wrap more functions
* fix all uint8 tests to avoid slow tests
* fix the one test
* another test, another fix
* and one more, works for ddp now
* something on contiguous, cleanup
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
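One common way to wrap an in-place torch function is to route it through the out-of-place version and copy the result back into the input tensor, which is roughly what these tests exercise; the helper below is a hypothetical sketch, not the backend's actual wrapper:

```python
import torch

def wrap_inplace(out_of_place):
    # hypothetical: implement foo_() as self.copy_(foo(self, ...)), returning self
    def inplace(self: torch.Tensor, *args, **kwargs) -> torch.Tensor:
        self.copy_(out_of_place(self, *args, **kwargs))
        return self
    return inplace

add_ = wrap_inplace(torch.add)  # e.g. an add_ built from torch.add
```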
* enable loading >2 GiB buffer from disk on macOS
* handle None case raised by mypy
* add test
* revert fix to repro bug in CI
* tell CI to run a unit test for macOS
* reapply fix
* yml changes
* torch backend remove meta decomps and add test
* torch backend bump timeout for tests
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
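Assuming the macOS limitation here is the usual one (a single read(2) of more than INT_MAX bytes fails with EINVAL), a chunked read like the sketch below works around it; the helper name and chunk size are illustrative, not the actual fix:

```python
import os

CHUNK = 1 << 30  # 1 GiB per syscall, comfortably under the ~2 GiB cap

def read_exact(fd: int, size: int) -> bytes:
    # read `size` bytes in chunks so no single read(2) exceeds the macOS limit
    out = bytearray()
    while len(out) < size:
        chunk = os.read(fd, min(CHUNK, size - len(out)))
        if not chunk: break  # EOF
        out.extend(chunk)
    return bytes(out)
```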
* fixes from chatgpt for torch backend
* shrink support
* add stride support
* comment cleanup
* a few more
* work
* import the stream hack
* llvm multi auto
* rig up torch's testing framework [pr]
* support more movement ops
* dec on expand
* fix tests
* work
* fix tests
* a few more
* decomps + opt hook
* installed pytest
* boom
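One standard way to express a strided slice using only pad/reshape/shrink-style movement ops, relevant to the "add stride support" commit above though not necessarily how it is implemented there, is shown in this numpy illustration:

```python
import numpy as np

def every_kth(x: np.ndarray, k: int) -> np.ndarray:
    # emulate x[::k] using only pad, reshape, and a shrink-style unit slice
    padded = np.pad(x, (0, (-x.shape[0]) % k))  # pad length up to a multiple of k
    return padded.reshape(-1, k)[:, 0]

assert np.array_equal(every_kth(np.arange(10), 3), np.arange(10)[::3])
```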
* fix webgpu
* use exact variable names in tests so that AI can read them more easily
* add a tag for a specific test name, e.g. testing a specific dtype
* fix ruff
* astype everything
* dtype in array creation
* just arange
* is 67% considered fixed?
* move test up
* small cleanups
* share function
* add qgemm as well
* add qgemm too
* make sure qgemm comes out as int
* take out qgemm for now
* fixed test
* add correct qgemm
* addressing feedback here too, early naive fix for now
* simplify bias and c to be minimalistic enough to test correctness
* refactored qlinearops
* maybe these asserts aren't the best..
* fix test
* updated tests to cover new ops
* try to add to CI
* move test_onnx_ops into testextra/
* more attention tests
* qlinear_add atol=1
* attention still not fully correct
* it is what it is
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
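As a reference for the qgemm/QLinear work above: the ONNX QLinearMatMul op dequantizes its inputs, multiplies in float, and requantizes with saturation. A naive numpy version is sketched below (exact rounding/saturation follow the ONNX spec, and small rounding differences are why the tests above settle for atol=1):

```python
import numpy as np

def qlinear_matmul(a, a_scale, a_zp, b, b_scale, b_zp, y_scale, y_zp, out_dtype=np.uint8):
    # naive reference: dequantize inputs, matmul in float, requantize and saturate
    af = (a.astype(np.int32) - int(a_zp)) * np.float32(a_scale)
    bf = (b.astype(np.int32) - int(b_zp)) * np.float32(b_scale)
    y = np.rint(af @ bf / np.float32(y_scale)) + int(y_zp)
    info = np.iinfo(out_dtype)
    return np.clip(y, info.min, info.max).astype(out_dtype)
```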
* Solve dims too large errors on webgpu
* Simplify divisor find
* Test square root divisor
* Fix lint
* Refactor into group_dims and split_dims
* Refactor
* Fix lint
* Add back max check in _group_dims
* Prefer grouping over split
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
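The WebGPU fix above groups and splits launch dimensions so that none exceeds the device limit; the sketch below covers only the split half, searching for a divisor near the square root so both factors stay small (hypothetical helper, not the actual group_dims/split_dims code):

```python
def split_dim(n: int, limit: int) -> tuple[int, int]:
    # split a dimension n > limit into (a, b) with a * b == n and both under limit,
    # searching downward from sqrt(n) so the two factors are as balanced as possible
    if n <= limit: return (n, 1)
    for d in range(int(n ** 0.5), 1, -1):
        if n % d == 0 and n // d <= limit:
            return (n // d, d)
    raise ValueError(f"no exact split of {n} fits under {limit}")
```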
* add patches
* add osx test in ci
* macos specific uvm, gpfifo mask
* only do that for now
* Revert "add patches"
This reverts commit 80d3112a57.
* use fork for now
* workflow only one worker
* merge osxtests with tests
* Revert "merge osxtests with tests"
This reverts commit 3461c8f46c.
* macos pagesize 16384
---------
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
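The "macos pagesize 16384" commit reflects that Apple-silicon macOS uses 16 KiB pages rather than 4 KiB, so allocation sizes and mapping offsets have to be rounded to the host page size, roughly like this (illustrative helper only):

```python
import mmap

PAGE = mmap.PAGESIZE  # 16384 on Apple-silicon macOS, typically 4096 elsewhere

def round_up_to_page(size: int) -> int:
    # round an allocation size up to a whole number of host pages
    return (size + PAGE - 1) & ~(PAGE - 1)
```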
- Ensure the selected backend environment variable is persisted to the next step via $GITHUB_ENV.
- It doesn't actually persist on Windows unless the shell is explicitly set to bash.
- Add an assertion to ensure the selected backend is actually used.
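For context, GitHub Actions only persists environment variables across steps when they are appended as KEY=VALUE lines to the file named by $GITHUB_ENV; a plain export inside a step does not survive. A minimal Python illustration of that mechanism (the variable name/value are placeholders, and the workflow itself does this from a bash step as noted above):

```python
import os

# append KEY=VALUE to the $GITHUB_ENV file so later workflow steps see the variable
with open(os.environ["GITHUB_ENV"], "a") as f:
    f.write("BACKEND=CPU\n")  # placeholder name and value
```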
* WebGPU f16 support
* Don't enable f16 yet
* dtype tests passing after bitcast fix
* Maybe all WebGPU green?
* Require shader-f16 in examples
* Minor wgsl touchup
* 1 line shorter
* Simpler
* Add transcendental support
* log2 NaN location mismatch on Vulkan
* NaN skips
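The bitcast fix above concerns reinterpreting f16 values as 16-bit integers without conversion; a minimal numpy round-trip of that operation (not the WGSL code itself) looks like:

```python
import numpy as np

vals = np.array([1.0, -2.5, 65504.0], dtype=np.float16)  # 65504 is the largest finite f16
bits = vals.view(np.uint16)                    # bitcast: same bytes, reinterpreted as u16
assert np.array_equal(bits.view(np.float16), vals)  # lossless round-trip
```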