tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-20 04:18:13 -05:00

Author	SHA1	Message	Date
chenyu	1c1f578490	DISABLE_COMPILER_CACHE in sdxl search (#10614)	2025-06-03 09:22:25 -04:00
chenyu	4ab3391e6f	`set -o pipefail` for mlperf run_and_time (#10577) also run the 5.1 script in ci cron job	2025-05-30 16:36:44 -04:00
wozeparrot	5e3c4a8431	fix: comma testsig (#10568)	2025-05-29 19:00:07 -07:00
George Hotz	ee12e801a3	optional fused optimizers (#10549) * enumerate cases of Tensors in the JIT * optional fused optimizers * add fused optimizer test * move that there * ugh	2025-05-28 13:50:30 -07:00
Sieds Lykles	ae02a1e232	[bounty] Z3 symbolic fuzzer [pr] (#10514) * First version, caught a bug? * Nicely print failure to reproduce * Remove that * Put the assert back * Change fuzzing to use testing_unit so it has z3 * Test key to match * Add rule * Add test * Add test for edge case 0 * Merge patterns * update comment * consistent whitespace * whitespace * add condition * add test * update comment * use Variable * fuzzer using z3_renderer * Cleaned up printing and debugging * working new fuzzer * change some comments and printing * more formatting * fuzz failures in seperate file * fix fstring * more tests * naming * remove added line * remove comment * print number of skipped expressions * use self.assertEqual --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-05-28 16:28:37 -04:00
chenyu	23e41f523a	sdxl also run with cached search (#10546)	2025-05-28 06:51:56 -04:00
chenyu	fffdc4d31c	workflow to run sdxl with search (#10543)	2025-05-27 17:25:41 -04:00
uuuvn	c29c46853f	Very basic mock sqtt (#10512) This mockgpu sqtt emulation will just ignore basically everything and end up with a 0x1000 size trace full of zeroes, but just testing for things like register rename is better than nothing i guess	2025-05-26 14:38:28 -07:00
chenyu	2eeea373af	add BENCHMARK_LOG for mlperf resnet cron (#10516)	2025-05-25 22:00:29 -04:00
b1tg	a1f64af92d	ci: setup llvm for amdremote (#10507) Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-05-25 21:52:27 -04:00
wozeparrot	7c81f9f95e	fix: gate mlperf workflow (#10515)	2025-05-25 17:06:21 -07:00
George Hotz	6b8eb5fec2	split mlperf to its own red benchmark run (#10492) * Add mmapeak implementation for 7900 XTX * Change identation * Use a template instead of multiple assebly files * Fix output formatting * Reduce register file bank conflicts * More accurate measurement for quick instructions * Add support for gfx1201 * RDNA4 wmma requires less VGRPs * RDNA4 does not have s_cmpk instructions * Add v_wmma_i32_16x16x32_iu4 for gfx1201 * Add sparse wmma instructions * split to tinybox red MLPerf Benchmark --------- Co-authored-by: Panagiotis Kourouklidis <panagiotis.kourouklidis@gmail.com>	2025-05-23 17:12:41 -07:00
George Hotz	bf2a0907be	gate the mockdsp behind MOCKDSP=1 [pr] (#10486)	2025-05-23 11:44:02 -07:00
uuuvn	3ca5680920	Test remote in benchmark (#10304) hlb cifar is fast so added it, can add bert too if you think it's ok 6 real gpus to test multigraph and transfers + accuracy validation should probably be added to tinystats too, i don't know how though Co-authored-by: chenyu <chenyu@fastmail.com>	2025-05-23 12:12:57 -04:00
chenyu	c5acb4e06e	run mlperf resnet daily (#10482) Runs at 08:05 UTC (12:05 AM Pacific Time)	2025-05-23 07:16:20 -04:00
chenyu	116d9e6306	run mlperf resnet on red box (#10413) also made push to `update_mlperf` branch trigger	2025-05-19 12:48:36 -04:00
George Hotz	f1fe1f93c1	hotfix: 14000 lines	2025-05-19 09:40:53 -07:00
qazal	90eb3c0e5d	add MobileNetV2 benchmark to comma CI (#10250) * add MobileNetV2 to comma CI * symlink imagenet * also the signature * comment that out * need imagenetmock * same train and test set * quantize on CPU=1 * verbose * need __hexagon_divsf3 * 0x858d6c15 * quant cpu + CC=clang-19	2025-05-19 18:22:50 +03:00
George Hotz	b06291077c	no amdgpu kernel driver (#10408) * no amdgpu kernel driver * don't test hip * lower req	2025-05-18 20:52:39 -07:00
chenyu	485e80da69	run_and_time for resnet ci (#10405)	2025-05-18 23:39:57 -04:00
uuuvn	0f825e12f2	Remote fixedvars (#10371) * amd mockgpu graph support For testing remote graph stuff (prompted by #10371) in ci * Remote fixedvars Somehow none of existing tests failed when fixedvars were added, looking what to add as an regression test for this --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-05-18 09:57:13 -07:00
uuuvn	27c12be471	amd mockgpu graph support (#10385) For testing remote graph stuff (prompted by #10371) in ci	2025-05-18 09:43:16 -07:00
chenyu	9b4e2a75cd	symlink datasets in mlperf workflow (#10391)	2025-05-18 03:26:05 -04:00
qazal	0294bfe507	simpler can_pad (#10364) * simpler can_pad [pr] * 3 kernels * tests * less kernels	2025-05-18 10:00:07 +03:00
chenyu	efa8dfe7fb	test cron job to run resnet (#10368)	2025-05-17 08:57:02 -04:00
chenyu	c798f2f427	brew --quiet to suppress already installed warnings (#10346) example https://github.com/tinygrad/tinygrad/actions/runs/15057000247	2025-05-15 23:31:18 -04:00
wozeparrot	1ed04f993b	move benchmark stat tracking to influxdb (#10185)	2025-05-15 16:14:56 -07:00
Ignacio Sica	47b3055fe2	set fail-fast behavior (#10336)	2025-05-15 11:24:45 -07:00
George Hotz	50181ab09f	hotfix: bump to 13500 lines	2025-05-14 18:49:59 -07:00
George Hotz	7a3d4de59a	hotfix: add GRAPH_ONE_KERNEL=1 to UsbGPU openpilot test	2025-05-14 14:50:37 -07:00
George Hotz	f1130ab3d3	openpilot benchmark test (#10290) * openpilot benchmark test * that	2025-05-13 22:49:28 -07:00
George Hotz	ec46f658d7	openpilot llvm test [pr] (#10288)	2025-05-13 16:51:41 -07:00
uuuvn	ddff9857b8	Remote properties is a dataclass (#10283) Not strictly required for anything but soon there will be like 4 new properties and having it be a huge json just seems like a bad taste. It also seems right to not have a separate endpoint for this, just `GetProperties` request that returns a repr of this similar to how requests are sent in `BatchRequest`. This will also make a switch to anything other than http much simpler if it will be required for any reason, like just a tcp stream of `BatchRequest`s	2025-05-13 11:56:58 -07:00
uuuvn	ba87eca0f1	Remote multi (basic) (#10269) * Basic remote multi support Simplest thing to be able to use remote with multiple gpus, very slow because no transfers (copyin copyout for cross-device copies) * tests	2025-05-13 09:52:47 -07:00
chenyu	ad5cb2717d	FUSE_ARANGE=1 in bert bench (#10263) still fails, something multi related maybe Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2025-05-13 09:12:19 -04:00
chenyu	0015b3921f	sleep more in CI Remove amdgpu (#10261) see if this is less flaky	2025-05-12 08:13:44 -04:00
hooved	7b4f05fd00	Add test for correctness of Infinity in WebGPU (#10201) * use function for infinity instead of uniform * test infinity math locally * test infinity math in CI * make pytest available to MacOS (WebGPU) * revert to master except failing webgpu test	2025-05-08 05:20:05 -07:00
nimlgen	7d6ed1b1e9	hotfix: mac ci (#10210) * fixed? * cmnt	2025-05-08 14:13:23 +03:00
nimlgen	ba52fce4b2	usbgpu: benchmark in ci (#10208) * usbgpu: benchmark * usbgpu: benchmark	2025-05-08 12:02:04 +03:00
uuuvn	dba073e5c0	Less messy broken graph on paravirtualized metal workaround (#10182) * Less messy broken graph on paravirtualized metal workaround GitHub CI macOS runners use paravirtualized metal which is broken with graph (some comments say that ICB in particular is broken but in my testing it was fine sometimes, but other times hitting an assert inside metal's code related to resouces, so not sure). > Assertion failed: (resource != nil), function -[IOGPUMetalResource initWithResource:], file IOGPUMetalResource.m, line 458. This can be reproduced locally with any virtualization software (like utm) that can create macOS VMs with apple's own virtualization framework. * unused import	2025-05-06 20:41:02 +03:00
wozeparrot	10437904cd	refactor: ops_cloud -> ops_remote [pr] (#10166)	2025-05-05 15:59:51 -07:00
George Hotz	b68f036551	default on OSX is llvm 19 (#10159)	2025-05-04 18:13:50 -07:00
George Hotz	e07d8b147a	hotfix: don't OOM in the osx unit test	2025-05-04 17:53:55 -07:00
George Hotz	a0240d8c2b	lil work on llvm speed (#10157) * lil work on llvm speed * llvm failing test * 1e-4 * simpler failing test * once is fine * gpt suggests this syntax change * bump that debug	2025-05-04 16:37:26 -07:00
George Hotz	fe0724eebf	prebuild all rewrites [pr] (#10154) * prebuild all rewrites [pr] * fix that * tests pass with linearizer	2025-05-04 13:01:18 -07:00
qazal	230a369708	remove some IGNORE_OOB [pr] (#10142) * remove some IGNORE_OOB * remove fuzz_schedule stuff * test with global * add for amd ci	2025-05-03 01:16:14 +03:00
nimlgen	16e5376ae8	line limit 12800 for usb (#10130)	2025-05-01 16:57:44 +03:00
George Hotz	ef011ff5f9	flip Ops.COPY order [pr] (#10122) * flip Ops.COPY order [pr] * fix copy and support multi device copy in _device	2025-05-01 00:26:24 -04:00
Ignacio Sica	bf5fb97498	fix `AMD_LLVM` bf16 tc for `gfx1100` (#10102) * fix amd_llvm bf16 tc * cleanup pattern	2025-04-30 20:06:38 -03:00
chenyu	4a04098389	fix llama3 with nf4 quantize (#10107) also int8 outputs is wrong	2025-04-29 15:14:36 -04:00


1021 Commits