tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-23 05:48:08 -05:00

Author	SHA1	Message	Date
chenyu	0d8a0d7a96	update test_multi_const_folding_tensor to include pow (#11635 ) pow folds now	2025-08-12 13:35:37 -04:00
Sieds Lykles	4d6e407eb0	Extend fast_idiv to negative ints (#11632 ) * fast idiv for signed ints * Add rule and test * fix tests * redo fuzz_fast_idiv to do negative ints as well * adjust comments * remove unused imports	2025-08-12 19:34:49 +02:00
qazal	17adbe86d8	hotfix: do not default to capturing args in track_rewrites (#11634 )	2025-08-12 20:01:24 +03:00
geohotstan	ad9dec25b3	combine onnx parser and onnx (#11485 ) * start * more * fix onnx_runner test * pass * patch for disk and add domains from huggingface * simpler docs * revert domain changes * rerun ci * revert onnx ops test change * add fix from strenum stuff * correct way * revert correct way to leave the fix for another PR * test segfault * Revert "test segfault" This reverts commit `4e1aaf41e7`. * remove some unnecessary documentation * test segfault again * Revert "test segfault again" This reverts commit `56fc5f03e7`. * try gemini suggested patch for sys._getframe * keep trying with gemini * revert not working gemini suggestions and try faulthandler * remove pythonfaulthandler * trigger CI a few times * minimize diff --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-08-12 12:56:39 -04:00
Sieds Lykles	4c3982c44e	Take sign out of mod (#11631 ) * Add rule and test * fix tests	2025-08-12 18:44:36 +02:00
qazal	e28605e324	rename profile point event fields [pr] (#11633 )	2025-08-12 19:11:21 +03:00
nimlgen	8a7be0a747	metal: workaround for transfers sync issue (#11622 ) * metal: workaround for transfers sync issue * metal tracsfer sync is broken * hm * rm it? * keep it	2025-08-12 16:16:34 +03:00
qazal	efe8b5611d	move ProfilePointEvent out of device.py [pr] (#11630 ) Generic profiling events exist in helpers so they can be imported from everywhere in tinygrad.	2025-08-12 09:58:32 +03:00
chenyu	0d7075f2de	assign should broadcast input tensor (#11629 ) fixed test_assign_broadcast	2025-08-11 23:36:35 -04:00
Joshua Kissoon	c44760c89d	torch backend: fix arange, add linalg.cross, add tests (#11628 )	2025-08-11 23:34:41 -04:00
George Hotz	ca41b5e38b	skip_0 in graph rewrite [pr] (#11627 ) * skip_0 in graph rewrite [pr] * no track_rewrites on test * use dict instead of set	2025-08-11 18:29:04 -07:00
Sardor	ca7a641442	fix bugs at examples/yolov3.py (#11614 ) * Update load_weight. Give valid model url * Fix bug in iou function	2025-08-11 21:14:47 -04:00
chenyu	0c97d6de1b	don't round pow output for int pow int (#11625 ) also added atol=0 and big pows for the tests	2025-08-11 20:57:47 -04:00
chenyu	d623f6d850	support int Tensor pow to const non-negative int (#11624 ) matches torch	2025-08-11 19:50:19 -04:00
chenyu	857a830dcc	fix test_arange_float_step (#11623 )	2025-08-11 16:58:42 -04:00
chenyu	0806677b51	rewrite sort idx (#11613 )	2025-08-11 16:20:56 -04:00
George Hotz	700c11597b	switch contextvars.ContextVar to _ContextVar (#11621 )	2025-08-11 12:20:09 -07:00
ttomsa	ae0c3cfff6	change clang -march flag to -mcpu on arm (#10970 ) Co-authored-by: wozeparrot <wozeparrot@gmail.com>	2025-08-11 13:38:48 -04:00
geohotstan	27bcb9fd1c	Support cubic mode for ONNX Resize OP (#11612 ) * start * add reference * this is so much slower * this makes sense but differs from official impl, but results are still correct..? * add a comment * Just keep it simple for now since I don't fully get it yet * address comments * correct * teeny clean up * another small comment improvement lol	2025-08-11 11:49:30 -04:00
nimlgen	d2bb1bcb97	cloud: a bit better err handling (#11616 ) * cloud: err propagation to client * fix * print exc * linter * excs * fix * hm * flaky	2025-08-11 15:51:22 +03:00
qazal	6a232ccdac	viz: add tiny range drawing helper (#11620 ) * viz: add tiny range drawing helper * less	2025-08-11 15:15:43 +03:00
qazal	e768773e13	viz: use colors helper (#11618 )	2025-08-11 13:10:15 +03:00
qazal	7d6c0a8cc7	viz: refactor progress msg (#11617 )	2025-08-11 13:01:36 +03:00
chenyu	630edcffd8	remove .float calls in olmoe (#11610 ) still matches torch	2025-08-10 20:33:22 -04:00
chenyu	a67e0917c3	list indexing can normalize in python (#11609 ) * list indexing can normalize in python list index does not need to be normalized in tensor * update those	2025-08-10 20:02:38 -04:00
chenyu	1181ec0cd2	few more tensor indexing test cases (#11608 )	2025-08-10 18:56:42 -04:00
George Hotz	996c907c0b	rewrite not ready + children machinery (#11607 ) * rewrite not ready + children machinery * it doesn't like track rewrites	2025-08-10 15:28:30 -07:00
Sieds Lykles	1875bc69f9	Late rewrite rules for CMPLT (#11591 ) * add rules * more rules * fix comment spelling * remove two rules	2025-08-10 22:18:13 +02:00
nimlgen	5403a4aeaf	null dev: support offset on buffers (#11606 ) * null dev: support offset on buffers * nolimit	2025-08-10 21:58:37 +03:00
geohotstan	b0dab6a4cd	onnx Resize OP clean up (#11603 ) * start * slight clean up	2025-08-10 14:10:39 -04:00
Sieds Lykles	10540414cd	Add Ops.CMPEQ (#10431 ) * Add op * add to Groupop.ALU * fix spec * fix ptx * temporary pickle by name to see process replay * add Ops.EQ to binary ops * Actuall rename properly * add test to assert CMPEQ is being used * Ops.CMPEQ is automatic cast to bool * add Ops.CMPEQ to llvm * add Ops.CMPEQ to llvm	2025-08-10 13:13:16 +02:00
chenyu	f7aa1b85fe	minor sort cleanups (#11602 )	2025-08-10 01:51:23 -04:00
chenyu	dfb702ef33	fix sort for small dim (#11601 ) * fix sort for small dim * fixed test_sort_empty	2025-08-10 01:17:41 -04:00
chenyu	ef17af85c6	remove .float call in llama logit (#11598 ) * remove .float call in llama logit * bfloat item	2025-08-10 00:02:18 -04:00
chenyu	dd3d2eb36c	add training llama3 test in ci (#11599 )	2025-08-09 22:35:39 -04:00
chenyu	3e64467322	remove freqs_cis contiguous in llama (#11597 )	2025-08-09 21:11:12 -04:00
chenyu	7338ffead0	small beautiful_mnist update (#11596 ) gather is fast now. there's a conv/bw kernel that only gets fast with BEAM, but whole thing runs < 5 seconds now regardless	2025-08-09 19:51:14 -04:00
chenyu	45baec1aab	model parallel llama (#11588 ) MP=8 GRADIENT_ACC_STEPS=3 BS=1 DEFAULT_FLOAT=bfloat16 OPTIM_DTYPE=bfloat16 LLAMA3_SIZE=70B SEQLEN=512 PYTHONPATH=. MODEL=llama3 python3 examples/mlperf/model_train.py	2025-08-09 16:54:27 -04:00
nimlgen	09bc377da3	search: print runtime failures on debug (#11593 )	2025-08-09 23:01:19 +03:00
nimlgen	14f99ff1a1	amd: doorbell_cpu_addr is not used (#11592 ) * amd: doorbell_cpu_addr is not used * hm	2025-08-09 20:03:21 +03:00
Sieds Lykles	01c770c77b	Fix z3 float cast in indexing (#11590 ) * adjust dtype of z3_renderer and add rule for cast * dtypes.bool is also cast noop * add regression test * make embedding smaller * even smaller test	2025-08-09 17:59:23 +02:00
Sieds Lykles	10d388499d	Refactor optional.py (#11578 ) * move fast_idiv to transcendental * move optional.py * adjust comment * change import * mypy needs this?	2025-08-09 17:35:05 +02:00
nimlgen	20e46a175c	do not use disk with usb (#11119 ) * not use disk with usb * better name	2025-08-09 11:58:02 +03:00
qazal	53179953fc	viz: factor out memory graph render (#11586 )	2025-08-08 20:18:11 +03:00
qazal	8ce72d3fad	simpler disassembly table spec (#11583 ) * simpler disassembly table spec * update ui * move to scalar/vec render	2025-08-08 17:59:26 +03:00
qazal	44a222a9b2	viz: move resource usage summary to server (#11582 )	2025-08-08 17:08:28 +03:00
qazal	793ace530e	update amd_uop_matmul.py import (#11581 ) Using this for testing SQTT	2025-08-08 17:07:35 +03:00
chenyu	b232c60def	benchmark openpilot 0.9.9 (#11575 ) * benchmark openpilot 0.9.9 not sure what to do with the 0.9.7 ones with IMAGE=2 and validate * name	2025-08-08 01:26:14 -04:00
qazal	16f0edbe90	pass opts arg in get_program process replay [pr] (#11571 ) * fix ptx process replay * keyword arg * renderer is also optional [pr] * test_linearizer fixup * name function order is args,ret,kwargs * can use opts_to_apply * pass through p.applied_opts * sink_arg * now it opens devices too	2025-08-08 03:05:09 +03:00
qazal	960cc6533a	pass through name function args in track_rewrites (#11572 )	2025-08-08 02:28:52 +03:00

10633 Commits