Commit Graph

1088 Commits

Author SHA1 Message Date
chenyu
efa8dfe7fb test cron job to run resnet (#10368) 2025-05-17 08:57:02 -04:00
wozeparrot
1ed04f993b move benchmark stat tracking to influxdb (#10185) 2025-05-15 16:14:56 -07:00
Ignacio Sica
47b3055fe2 set fail-fast behavior (#10336) 2025-05-15 11:24:45 -07:00
George Hotz
50181ab09f hotfix: bump to 13500 lines 2025-05-14 18:49:59 -07:00
George Hotz
7a3d4de59a hotfix: add GRAPH_ONE_KERNEL=1 to UsbGPU openpilot test 2025-05-14 14:50:37 -07:00
George Hotz
f1130ab3d3 openpilot benchmark test (#10290)
* openpilot benchmark test

* that
2025-05-13 22:49:28 -07:00
George Hotz
ec46f658d7 openpilot llvm test [pr] (#10288) 2025-05-13 16:51:41 -07:00
uuuvn
ddff9857b8 Remote properties is a dataclass (#10283)
Not strictly required for anything, but soon there will be around 4 new
properties, and having it be a huge json just seems like bad taste.

It also seems right not to have a separate endpoint for this, just a
`GetProperties` request that returns a repr of this, similar to how
requests are sent in `BatchRequest`.

This will also make switching to anything other than HTTP much simpler
if that is ever required for any reason, e.g. just a TCP stream of
`BatchRequest`s.
2025-05-13 11:56:58 -07:00
uuuvn
ba87eca0f1 Remote multi (basic) (#10269)
* Basic remote multi support

Simplest thing to be able to use remote with multiple GPUs; very slow
because there are no transfers (copyin/copyout for cross-device copies)

* tests
2025-05-13 09:52:47 -07:00
chenyu
ad5cb2717d FUSE_ARANGE=1 in bert bench (#10263)
still fails, maybe something multi-related

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-05-13 09:12:19 -04:00
chenyu
0015b3921f sleep more in CI Remove amdgpu (#10261)
see if this is less flaky
2025-05-12 08:13:44 -04:00
hooved
7b4f05fd00 Add test for correctness of Infinity in WebGPU (#10201)
* use function for infinity instead of uniform

* test infinity math locally

* test infinity math in CI

* make pytest available to MacOS (WebGPU)

* revert to master except failing webgpu test
2025-05-08 05:20:05 -07:00
nimlgen
7d6ed1b1e9 hotfix: mac ci (#10210)
* fixed?

* cmnt
2025-05-08 14:13:23 +03:00
nimlgen
ba52fce4b2 usbgpu: benchmark in ci (#10208)
* usbgpu: benchmark

* usbgpu: benchmark
2025-05-08 12:02:04 +03:00
uuuvn
dba073e5c0 Less messy broken graph on paravirtualized metal workaround (#10182)
* Less messy broken graph on paravirtualized metal workaround

GitHub CI macOS runners use paravirtualized Metal, which is broken with
graph (some comments say that ICB in particular is broken, but in my
testing it was sometimes fine and other times hit an assert inside
Metal's code related to resources, so not sure).

> Assertion failed: (resource != nil), function -[IOGPUMetalResource initWithResource:], file IOGPUMetalResource.m, line 458.

This can be reproduced locally with any virtualization software (like UTM)
that can create macOS VMs with Apple's own virtualization framework.

* unused import
2025-05-06 20:41:02 +03:00
wozeparrot
10437904cd refactor: ops_cloud -> ops_remote [pr] (#10166) 2025-05-05 15:59:51 -07:00
George Hotz
e07d8b147a hotfix: don't OOM in the osx unit test 2025-05-04 17:53:55 -07:00
George Hotz
a0240d8c2b lil work on llvm speed (#10157)
* lil work on llvm speed

* llvm failing test

* 1e-4

* simpler failing test

* once is fine

* gpt suggests this syntax change

* bump that debug
2025-05-04 16:37:26 -07:00
George Hotz
fe0724eebf prebuild all rewrites [pr] (#10154)
* prebuild all rewrites [pr]

* fix that

* tests pass with linearizer
2025-05-04 13:01:18 -07:00
qazal
230a369708 remove some IGNORE_OOB [pr] (#10142)
* remove some IGNORE_OOB

* remove fuzz_schedule stuff

* test with global

* add for amd ci
2025-05-03 01:16:14 +03:00
nimlgen
16e5376ae8 line limit 12800 for usb (#10130) 2025-05-01 16:57:44 +03:00
George Hotz
ef011ff5f9 flip Ops.COPY order [pr] (#10122)
* flip Ops.COPY order [pr]

* fix copy and support multi device copy in _device
2025-05-01 00:26:24 -04:00
Ignacio Sica
bf5fb97498 fix AMD_LLVM bf16 tc for gfx1100 (#10102)
* fix amd_llvm bf16 tc

* cleanup pattern
2025-04-30 20:06:38 -03:00
chenyu
4a04098389 fix llama3 with nf4 quantize (#10107)
also int8 outputs are wrong
2025-04-29 15:14:36 -04:00
Ignacio Sica
9d5677c12c fix ptx linearizer bug 2 [pr] (#9967)
* check for local buffer

* hotfix

* add test_tensor_cores_emulation run for ptx
2025-04-29 14:30:07 -03:00
Ignacio Sica
58cf8cd493 add support for "shared_mem" for LLVM (#10093)
* init llvm shared

* add test_tensor_cores_emulation run for llvm
2025-04-29 08:56:36 -04:00
Ignacio Sica
bda116d773 fix use_tensor_cores propagation (#10048)
* propagate use_tensor_cores

* add use_tensor_core to arg in test and search

* bugfix

* get TC val from ContextVar in search

* revert minor space change

* add tc emulation test to ci and benchmark

* revert

* revert whitespace change

* remove test for ptx

* add comment and remove llvm test run
2025-04-28 19:30:50 -03:00
chenyu
e996584685 olmoe in mac benchmark (#10077) 2025-04-27 21:07:02 -04:00
George Hotz
b6d2effaf5 assign is contiguous (#10066)
* assign is contiguous

* disable process replay for SDXL
2025-04-27 08:40:33 -04:00
George Hotz
ea5dddc537 reduce collapse generic (#10045)
* reduce collapse generic

* new arange folder

* new range folding

* correct with sym

* all tests pass

* indexing ops passes

* failing tests

* fix tests, remove unused

* revert that

* torch indexing is fast

* skip on webgpu

* touchups

* comments
2025-04-26 09:13:24 -04:00
chenyu
74c6cf8be3 lint mlperf model_train (#10038) 2025-04-24 16:19:44 -04:00
Ignacio Sica
51ca19d061 set test_tensor_cores_padded_amd to expectedFailure (#10036)
* init

* add expected failure to correctly track progress

* hotfix

* skip for amd_llvm as well

* add skip

* add pr number

* move comment to amd test

* change reason
2025-04-24 17:11:40 -03:00
b1tg
914d89fa0b fix tensor cores for gfx1201 (#9838)
* fix tensor cores for gfx1201

* fix typo

* fix python wmma

* AMDLLVMRenderer with arch + AMDLLVM tensor_cores

* fix ci

* clean up

* more tensor cores for RDNA4

* fix half/half, bfloat16/float, bfloat16/bfloat16 for amd_llvm

---------

Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 14:57:41 -04:00
uuuvn
779aa1e2e9 Enable image tests on cloud if clouddev supports image (#9903)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 14:30:12 -04:00
uuuvn
29a12b19ea Add macos CLOUD tests (#10033)
A lot more work is required to enable all of them and move them into the
osxtests matrix; for now I created a separate runner for them (copied from WebGPU)

Will add test/test_graph.py to those tests in #9876
2025-04-24 14:14:13 -04:00
uuuvn
754d789f51 Fix and enable jit tests on CLOUD (#10031) 2025-04-24 18:39:31 +03:00
George Hotz
4e2ccfddc6 ci refactor to split AMD/NVIDIA [pr] (#10024)
* ci refactor to split AMD [pr]

* split

* split amd tests

* explicit 0
2025-04-24 08:59:54 -04:00
Sieds Lykles
e75be6eafc [bounty] [pr] index validation with z3 (#9981)
* index validation with z3

* Change comment

* toposort -> toposort()

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 08:06:08 -04:00
Ignacio Sica
023b1c28a2 test_tensor_cores_padded refactor (#9724)
* set pad t 3 for amd padded tc test

* change pad for amd regardless CI

* test tc padded uops and correctness separately

* add test_tensor_cores_padded_uops test to ci

* remove redundant check for amd device

* cleanup
2025-04-18 17:05:54 -03:00
George Hotz
44e4934167 fast pattern matcher [pr] (#9737)
* FastPatternMatcher

* works without that

* fix test pickle

* strict len

* compile match function

* dynamic compile

* fast

* faster

* compile

* track

* a lot faster

* clean up

* dup or

* faster and simpler

* fast match doesn't support store

* plane

* minor refactor

* real speed

* don't imply return None

* upat

* fix test

* heard you wanted more speed

* no generator

* split cf

* early fixup

* fxn fixup

* reconstruct_function

* Revert "reconstruct_function"

This reverts commit 37dac010ab.

* simpler stuff

* too big

* upat compile error

* cleanups

* don't cache that

* cleanups

* 10 -> 15
2025-04-14 15:24:41 +01:00
George Hotz
355739fc94 switch to universal match [pr] (#9879)
* switch to universal match [pr]

* 10 -> 15
2025-04-14 09:15:37 +01:00
chenyu
6896197978 relax ATOL for TC half tests more (#9847) 2025-04-11 03:20:22 -04:00
George Hotz
f666dd14eb fix get reduce contraction with test (#9834) 2025-04-10 22:24:21 +08:00
chenyu
566e389585 more relaxed ATOL for HALF=1 simple_matmul test (#9823)
it's a function of N, so it's only updated in the test command
2025-04-10 00:46:16 -04:00
chenyu
06a928b341 higher ATOL for half input TC test (#9821)
flaky
2025-04-09 23:57:25 -04:00
uuuvn
3ee317ffed Fix kfd autogen and verify it in ci (#9818)
Had to autogen newer uapi headers for #9746 (the dmabuf export ioctl is missing);
submitting just the fix without updating to the newer headers, as they are only
needed for the InfiniBand stuff
2025-04-10 09:53:42 +08:00
chenyu
c5db5b83b9 add SHOULD_USE_TC=1 check to simple_matmul (#9802)
* add SHOULD_USE_TC=1 check to simple_matmul

also zero-centered the random input and updated atol for tf32

* ATOL=2e-2 for HALF
2025-04-09 02:24:42 -04:00
George Hotz
78caf55154 Revert "FP8 support on NVIDIA (#8631)"
This reverts commit 2c8e4ea865.
2025-04-09 12:27:41 +08:00
George Hotz
14928fecff Revert "fix TF32 tensor core dropped in tc_sm89 (#9798)"
This reverts commit 7c9a96824f.
2025-04-09 12:27:39 +08:00
chenyu
7c9a96824f fix TF32 tensor core dropped in tc_sm89 (#9798)
also add `SHOULD_USE_TC=1` to verify TC is applied in simple_matmul
2025-04-08 23:20:50 -04:00