tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-23 13:58:00 -05:00

Author	SHA1	Message	Date
George Hotz	5dc1bc6070	switch get_kernel -> get_program [pr] (#10817) * switch get_kernel -> get_program [pr] * fix tests	2025-06-15 12:26:50 -07:00
wozeparrot	eb739bb96a	hotfix: lower threshold (#10786)	2025-06-11 19:36:20 -04:00
chenyu	612cdf5146	move fuzz_shape_ops to run with other fuzzer (#10767) * move fuzz_shape_ops to run with other fuzzer * don't skip CPU	2025-06-10 17:43:04 -04:00
b1tg	52c49dd4f3	fix onnx ci (#10762) Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-06-10 14:28:40 -04:00
George Hotz	f84c320548	better external_benchmark_schedule [pr] (#10722)	2025-06-09 10:26:11 -07:00
b1tg	24d328e313	onnx parser (#10435) * onnx parser * fix compile, lint * onnx.load -> onnx_load * compatible with ModelProto * fix test external_test_onnx_ops.py * fix tests * fix signed int * reduce to 261 lines * fix TypeProto.Optional * debug for _parse_message, add TypeProto.Sequence, cleanup * onnx_load from Tensor * remove BufferedReader * 174 lines and reduce tensor copy * cleanup * use onnx_load in external_model_benchmark.py * fix qcom test * [onnx] parser support external data --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2025-06-09 12:44:28 -04:00
George Hotz	81b9c04574	move high level stuff to unit tests [pr] (#10708) * move high level stuff to unit tests [pr] * process replay on unit tests * fix pr, less compute * set omp num threads * set 200MB buffer size limit * delete junk * fix tests * faster * move test_indexing to unit * faster	2025-06-08 14:05:56 -07:00
George Hotz	32e9949052	rename lazydata to uop (#10698)	2025-06-08 08:42:22 -07:00
leopf	eb7305e6a4	Tensor.keccak("sha3_256") (#7186) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com> Co-authored-by: George Hotz <geohot@gmail.com> Co-authored-by: wozeparrot <wozeparrot@gmail.com>	2025-06-06 15:24:05 -07:00
wozeparrot	0d86f8d375	fix failed threefry (#10646)	2025-06-05 17:17:42 -07:00
chenyu	46811d0d3c	minor external_model_benchmark cleanup (#10644)	2025-06-05 14:13:28 -04:00
chenyu	80ebce421d	remove metal buffer limit in external_model_benchmark [pr] (#10642) not needed anymore	2025-06-05 13:00:51 -04:00
wozeparrot	4d1686f767	clean: becnhmark -> benchmark (#10620)	2025-06-03 19:28:18 -07:00
qazal	910cabb081	add kernel count to grouper process replay differ [pr] (#10611)	2025-06-03 15:21:27 +03:00
qazal	3cc73a0172	simpler process replay main loop [pr] (#10588) * simpler process replay main loop [pr] * use logging * default to 1	2025-06-01 15:03:21 +03:00
qazal	dc882d3d7d	merge process replay and viz captures [pr] (#10581) * refactoring * test script * work * more work * diff * repr splits lines correctly * that * add location * add location * also don't need name_override * k.copy * [pr] * name_override 2 * err	2025-06-01 12:30:10 +03:00
George Hotz	b3b43a82c4	remove Tensor.no_grad, it's meaningless now [pr] (#10556)	2025-05-28 22:20:02 -07:00
Sieds Lykles	ae02a1e232	[bounty] Z3 symbolic fuzzer [pr] (#10514 ) * First version, caught a bug? * Nicely print failure to reproduce * Remove that * Put the assert back * Change fuzzing to use testing_unit so it has z3 * Test key to match * Add rule * Add test * Add test for edge case 0 * Merge patterns * update comment * consistent whitespace * whitespace * add condition * add test * update comment * use Variable * fuzzer using z3_renderer * Cleaned up printing and debugging * working new fuzzer * change some comments and printing * more formatting * fuzz failures in seperate file * fix fstring * more tests * naming * remove added line * remove comment * print number of skipped expressions * use self.assertEqual --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-05-28 16:28:37 -04:00
geohotstan	fd9f236a82	move test over (#10508)	2025-05-25 21:51:51 -04:00
George Hotz	0d39bb5de1	rename to get_kernelize_map (#10465)	2025-05-22 11:44:44 -07:00
qazal	df4cbb69e9	move fuzz_schedule.py to extra [pr] (#10444)	2025-05-21 10:07:24 +03:00
chenyu	29624af872	skip commavq in external_model_benchmark (#10439) precision issue with different onnxruntime version	2025-05-21 01:45:33 -04:00
nimlgen	2895198c36	am: download regs (#10419) * am: download regs * x * linter * mypy * after merge * raise * fixed name * fix * xx * remove * missing reg * missing reg * move to online * ops	2025-05-20 18:59:56 +03:00
George Hotz	b06291077c	no amdgpu kernel driver (#10408) * no amdgpu kernel driver * don't test hip * lower req	2025-05-18 20:52:39 -07:00
George Hotz	411392dfb7	move files into uop dir (#10399) * move files into uop dir [pr] * tinygrad.uop is a thing * fix uop docs, no pr * fix viz	2025-05-18 11:38:28 -07:00
qazal	9e2089dcd4	don't raise Exception in process replay [pr] (#10392) * don't raise Exception in process replay [pr] * continue generating diffs unless [pr] is set, exit(1) otherwise * change * works	2025-05-18 11:23:23 +03:00
qazal	e9e5b54e43	grouper cleanups and merge with insert_kernels [pr] (#10349) * grouper cleanups and merge with insert_kernels [pr] * remove that	2025-05-16 14:39:56 +03:00
wozeparrot	1ed04f993b	move benchmark stat tracking to influxdb (#10185)	2025-05-15 16:14:56 -07:00
qazal	1770e00c41	only CAPTURE_PROCESS_REPLAY=1 + add filterwarnings back [pr] (#10292)	2025-05-14 11:58:42 +03:00
qazal	1c97338be5	enable process replay assert for schedule [pr] (#10280) * enable process replay assert for schedule * start at unique+1	2025-05-14 11:10:47 +03:00
uuuvn	7bc4864bc4	Make `dev` a property of `Allocator` (#10286 ) * Make `dev` a property of `Allocator` (this is a prereq refactor for #10285) At least `BufferXfer.copy` accesses it assuming it's always present, currently most devices just add this property on their own repeating the same code over and over again. This is also a bit footguny, see `RemoteAllocator` that named this property `device` instead of `dev`, i could obviously just change that in one place but doing it globally seems like a better solution (and it reduces code duplication too). `MallocAllocator` is a bit special, but passing `None` works just fine. * typing * ignore type instead of cast	2025-05-13 17:01:01 -07:00
nimlgen	6f42bf8b54	usbgpu: 10 steps in benchmark to hit cache (#10273)	2025-05-13 17:06:50 +03:00
geohotstan	1c4ab6b991	ONNX add tests against ORT (#10270) * start * clean up * indicate file location too	2025-05-13 04:03:52 -04:00
nimlgen	2145bce3f9	usbgpu: copyin size is 16k (#10240) * usbgpu: copyin size is 16k * ush	2025-05-09 22:12:54 +03:00
nimlgen	267ba9b592	usbgpu: better names in copy speed benchmark (#10212)	2025-05-08 16:12:37 +03:00
nimlgen	ba52fce4b2	usbgpu: benchmark in ci (#10208) * usbgpu: benchmark * usbgpu: benchmark	2025-05-08 12:02:04 +03:00
wozeparrot	10437904cd	refactor: ops_cloud -> ops_remote [pr] (#10166)	2025-05-05 15:59:51 -07:00
George Hotz	a0240d8c2b	lil work on llvm speed (#10157) * lil work on llvm speed * llvm failing test * 1e-4 * simpler failing test * once is fine * gpt suggests this syntax change * bump that debug	2025-05-04 16:37:26 -07:00
George Hotz	36ccaa88a6	move merge views [pr] (#10156) * move merge views [pr] * move flow to __init__ [pr]	2025-05-04 14:41:47 -07:00
George Hotz	5f3f162606	cache rewrites for renderer [pr] (#10155) * add caching to rewrites for renderer [pr] * remove that * update ebs	2025-05-04 13:45:15 -07:00
nimlgen	45bf7c5b81	am: add allocation bench (#10135) * init allocation bench * sorryg * betetr	2025-05-02 13:51:07 +03:00
nimlgen	30bd6a619f	usb gpu (#8766 ) * start gpu * progress * fixes * read correct * libusb * libusb works * support asm24 * hmm * one access file * fix extra * start AMBar * works on am * back to usb * patch fw * full fast write into a bar * ugh, minus one gpus, next please * mute libusb for now * usb for asm24 * 63 * hmm * ops * rescan * and gpu shoudl be there * enumerate them? * usbgpu bus 4, 100% reliable (draft) * lil * works * comments * add DEBUG * cleaner * simplest * Revert "simplest" This reverts commit `1d00354c16`. * Revert "cleaner" This reverts commit `c5662de956`. * assert we find gpu * that's simpler * this back * simpler? * correcT * work * nonsense * works with more checks * this works * the 6s in the right place * reliable now * fix after reboot * set config * 1s timeouts * close to fw loading * streams * usbhub works * endpoints * fix * want to test tiny10 * move to tiny 10 * fix gpu * ugly speed * smth * mostly broken, but signals and dmas * do not reset gpu every time * changes to run kernels * ugh, not working * t10 * pg and sc files * some prog * um? * somehow it works * patched for 24 * some tries * minimal * moving * back to working * so sloooooow * move to controller * usb.py rewrite * rework * cleaner 1 * cleaner 2 * cleaner 3 * new abstractions * aft merge * init controller * cleaner 4 * cleaner 5 * patcher + tiny changes * ignore that * cleaner 6 * after rebase * cleaner 7 * bring it back * start linter war * linter 2 * autogen was missing * fix autogen * typing * better? * mypy * extra/legacy rename and cleaner * shuffle * better printing * tiny changes and tests --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-05-01 18:03:47 +03:00
qazal	93bf8764f2	do not open devices in lowering (#10101) * do not open devices in lowering [pr] * ctx=opts * ctx * fuzz test	2025-04-29 23:18:16 +08:00
George Hotz	427471550a	hotfix: amd tflops to 74 and some external_benchmark_sdxl_softmax stuff	2025-04-29 09:02:27 -04:00
George Hotz	73c2f6602f	test sdxl softmax (#10096)	2025-04-28 21:55:50 -04:00
Ignacio Sica	bda116d773	fix `use_tensor_cores` propagation (#10048) * propagate use_tensor_cores * add use_tensor_core to arg in test and search * bugfix * get TC val from ContextVar in search * revert minor space change * add tc emulation test to ci and benchmark * revert * revert whitespace change * remove test for ptx * add comment and remove llvm test run	2025-04-28 19:30:50 -03:00
qazal	d13c100981	don't sort dims in verify_sink_dims [pr] (#10059) * don't sort dims in verify_sink_dims [pr] * 1 can exist with n * put process_replay warn last * assert shape is the same * bring that back	2025-04-26 23:24:30 +08:00
quortus	5cdc96409e	Update outdated renderer.render calls (#10044)	2025-04-26 07:35:19 -04:00
nimlgen	0fc85a2b0a	hcqfuzz: init (#10049) * hcqfuzz: init * fix fuzz * linter * graph * taht test * update readme	2025-04-25 23:19:21 +03:00
Rory Clear	3a189fa561	More yolo processing in tinygrad (#9928 ) * more tg less np * update webgpu html for new compile * resize boxes * remove text * add back note * fix indentation * fix indentation * remove magic num * remove now unused funcs * back to numpy nms * no loop * fix iou suppression * update test * dont suppress other classes * add working scale * fix expected value, rounded up 0.24 was being counted * add postprocess bool for onnx test * fix indents * clean * clean * fix indent * remove print * fix indent * remove unused import * remove hardcoded 0.25 * space * spacing * clean label_predictions func * remove single item lists * space * use postprocess output in test * space * clean * clean * remove redundant threshold * remove redundant threshold * clean * rename var * move loop into func * unhardcode iou_threshold * remove unused values * clean * add note * clean * keep const * move back funcs --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-04-24 16:21:46 -04:00


870 Commits