tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-07 03:00:26 -04:00

Author	SHA1	Message	Date
nimlgen	ea7f2f779c	hcq: p2p nv-amd (#11195 ) * hcq: p2p between diff devices * fix	2025-07-12 18:53:34 +03:00
qazal	d3ec63a5c3	viz: add base class for unittests (#11178 )	2025-07-11 13:58:03 +03:00
nimlgen	fb278c6a02	do not recreate Compiled.profile_events in helper_collect_profile (#11171 )	2025-07-10 23:55:12 +03:00
qazal	bde80c0cdf	record GraphEvents in metal graph (#11145 ) * record GraphEvents in metal graph * add TestProfiler.test_graph, revert old stuff * move profile capture to MetalGraph * comment * don't double record graph command buffers * wait_check * explicit delete	2025-07-10 21:32:06 +03:00
chenyu	7db07e5f2c	don't narrow range of CAST on bool/unsigned (#11156 )	2025-07-09 22:20:09 -04:00
George Hotz	4156baee93	break swizzle into three chunks [pr] (#11153 ) * break swizzle into three chunks [pr] * test failed	2025-07-09 15:30:34 -07:00
George Hotz	53ae153404	tc should be in opt (#11148 ) * tc should be in opt [pr] * fix import	2025-07-09 14:12:21 -07:00
nimlgen	b6981404ed	memory: use page shifts in memory manager (#11149 ) * memory: use page shifts in memory manager * fix	2025-07-09 22:05:00 +03:00
qazal	5c1d215b41	viz: add Graph stream (#11144 ) * viz: stack an event for the entire batch * multi * whitespace * work * multi graph, Graph gets its own row	2025-07-09 20:56:46 +03:00
George Hotz	2893feb9f6	cleanups for kernel.py (#11143 ) * cleanups for kernel.py * fixups	2025-07-08 18:10:25 -07:00
George Hotz	359bed74f8	axis type tracking [pr] (#11137 ) * axis type tracking [pr] * keep update_info * keep legacy colors * update tests to apply_opt	2025-07-08 14:16:25 -07:00
chenyu	dada3f5bf3	skip some new onnx tests (#11135 ) these fails on master with latest onnx	2025-07-08 16:12:48 -04:00
qazal	3dfc0ff887	move cpu_profile and shared ProfileEvents from device.py to helpers [pr] (#11126 ) * move cpu_profile and shared ProfileEvents to helpers [pr] * TestProfiler.test_cpu_profile * update test_viz.py * TestProfiler.test_profile_multiops ordering, it's different streams now	2025-07-08 12:14:03 +03:00
George Hotz	f7d4638e05	start LLM app, tons of clean up required. target is 200 line ollama (#11068 ) * start LLM app, tons of clean up required. target is 200 line ollama * kind of works * simpler * add k/v cache * with SYM=1, it loops * no rope cache * simpler * more cleanups * cleanups * works * argparse and comments * from gguf * generate is a function * no copy from cpu * fix max context pass in * test * improve test * ai2_arc * fix 8B, use less ram * 136 lines	2025-07-07 17:09:46 -07:00
chenyu	341a686799	Tensor.diagonal (#11122 ) only implemented main diagonal for 2-D tensors. with diagonal and qr, we can get determinant	2025-07-07 16:21:26 -04:00
Sieds Lykles	584fd6af5a	Fix division by zero and mask bug in add views (#11088 ) * merge view infinite loop test * adjust condition in `x//d -> x//(-d)-1` Fix division by zero in add views * adjust offset end * fix typo in comment * add target to test_merge_views_variable * fix view incorrectly being masked * ssimplify strides and offset of the new view to canonicalize * remove print in test --------- Co-authored-by: qazal <qazal.software@gmail.com>	2025-07-07 10:05:47 -07:00
Nino Risteski	a1a146a499	adding enable_gqa in SDPA (#11097 ) Co-authored-by: wozeparrot <wozeparrot@gmail.com>	2025-07-06 23:25:33 -07:00
chenyu	7468959f4b	Tensor.argsort (#11112 )	2025-07-06 13:56:35 -04:00
kevvz	b7af9cf849	clean svd tests, set full_matrices false in torch backend (#11113 ) * clean tests, set full_matrices false * add more shape asserts	2025-07-06 13:55:49 -04:00
chenyu	ba88ec3ad0	pipe linalg svd to torch (#11109 ) and found a bug in svd	2025-07-06 08:37:25 -04:00
chenyu	845a4d32bc	Tensor.diag (#11108 ) also updated Tensor.eye to use it	2025-07-05 23:03:02 -04:00
ttomsa	4905af4ae0	remove invalid int div test (#11106 ) * rm test * also rm this	2025-07-05 18:57:55 -04:00
qazal	81781dc12b	viz: renames and spacing changes to tracing (#11102 )	2025-07-05 18:40:39 +03:00
qazal	7619bf35e7	cleanup: remove disabled TestIndexingOrdering (#11101 ) * cleanup: remove disabled TestIndexingOrdering * don't import kernelize internals	2025-07-05 18:14:37 +03:00
qazal	4fcfaa0ef7	viz: switch to TracingKey (#11100 ) * viz: switch to TracingKey * tuple * order is name, keys, fmt * add test_tracing_key	2025-07-05 17:46:18 +03:00
qazal	3d8569f6d8	hotfix: infinite loop in tracking pattern matcher (#11094 ) * failing test * fix that * given matchers	2025-07-04 19:55:26 +03:00
nimlgen	01f3c4f44d	memory: simpler paddr allocation logic (#11090 ) * memory: new paddr allocation logic * am fix * am refactrros * fix * mypy * use it * am	2025-07-04 17:00:36 +03:00
qazal	988540f401	support capturing cpu_profile on error (#11078 ) * support capturing cpu_profile on error * spacing * pylint complains	2025-07-04 11:53:12 +03:00
chenyu	a2f5a54458	move sparse_categorical_crossentropy to test_ops (#11083 ) also flattened the tests	2025-07-03 21:40:54 -04:00
chenyu	7c8ccb0267	sparse_categorical_crossentropy cleanup [pr] (#11082 )	2025-07-03 18:32:52 -04:00
chenyu	678cabc6f2	use argfix in Tensor.stack (#11077 ) works for multiple Tensor args or single tuple/list of Tensors, but not the mixed	2025-07-03 12:15:11 -04:00
qazal	b695e8c4d6	viz: remove support for naming with self (#11076 )	2025-07-03 17:29:14 +03:00
Sieds Lykles	53985297bd	add test, fix rewrite rule and raise error on division by zero (#11073 )	2025-07-03 08:25:06 -04:00
George Hotz	d049639221	little setitem test (#11064 ) * setitem has one less realize, why broken * put realize back	2025-07-02 15:10:24 -07:00
George Hotz	3b85534df0	outerworld range test [pr] (#11059 ) * outerworld range test [pr] * bound range * grad acc test * more tests * 5 steps is fine	2025-07-02 14:28:44 -07:00
qazal	ad155f5454	print inputs to get_program in process replay [pr] (#11051 ) * print inputs to get_program in process replay [pr] * colors * keep dataclass default escapes * Revert "keep dataclass default escapes" This reverts commit `c6db7e8a7a`. * note for ast_repr * add that back	2025-07-02 20:20:01 +03:00
qazal	a919b8325b	add test_kernel_info (#11054 ) * add test_kernel_info * reorder	2025-07-02 19:48:12 +03:00
kevvz	3b041d188f	[bounty] Singular Value Decomposition (#10875 ) * inital commit * add qr + expand svd to full matrix * add odd number support * add linalg tests * qr supports dims of arbitrary size * add qr tests * svd supports dims of arbitrary size * small cleanip * improvements over svd batch handling * improve linalg tests * make u_pad match q shape * add nonfull matrix tests * little less verbose nonfull svd test * added dtypes on svd + return vt instead of vt * lint * more lint * lint + set seed * small fix * small lint * lint * add int casting to indices and shapes * remove int from shape tuple in svd * small cleanup * add return types * reuse inverse_permute * refactoring * whitespace * remove regularization term to prevent bad outputs on ill conditioned matrices * remove seed * refactor * lint * refactor * spacing * remove clone * line reduction * smarter heuristic for iterations_per_round * add big test * lint * turns out no constant needed? * wrap tests * some small matrices need the constant * remove realize --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-07-02 09:06:03 -07:00
Ahmed Harmouche	e992ed10dc	WebGPU on Windows (#10890 ) * WebGPU on Windows * Fix dawn-python install * New test * pydeps * Minor fix * Only install dawn-python on windows webgpu --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-07-02 08:38:45 -07:00
chenyu	4626e9c172	is_numpy_ndarray helper [pr] (#11050 )	2025-07-02 09:12:53 -04:00
qazal	452b22c9b6	fix process replay diff in PYTHON device [pr] (#11052 ) * fix process replay diff in PYTHON device [pr] The PYTHON backend pickles and encodes UOps, the encoded binary can't be directly diffed in process replay. * note	2025-07-02 11:06:46 +03:00
geohotstan	8ebf0abaae	ONNX external_test_onnx_backend use PYTHON device for model (#10915 ) * try * ruff check --fix * no skip test * hmmmmmmm I don't get this D: * run CI again * why is PYTHON device faster than CPU? * run ci again and fix lint * actually doesn't PYTHON device make sense here? * see cpu speed again * Revert "see cpu speed again" This reverts commit `1e366f2256`. * trigger CI * pretty good --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-07-01 12:11:17 -04:00
qazal	8b0871ac31	viz: test for no lockup on infinite loop (#11041 ) * viz: add test infinite loop fallback * assert * continue til the end * work * bring that back * fallback to nop	2025-07-01 17:44:20 +03:00
b1tg	fcbefde8f5	fix DiskDevice reuse (#11039 ) * fix DiskDevice reuse * fix mypy and DiskDevice.count * mypy * add test --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-07-01 10:29:21 -04:00
George Hotz	0597735f28	remove TC=3 not porting this (#11045 )	2025-06-30 15:12:49 -07:00
George Hotz	cccfe6b422	hotfix: test_no_inf_loop_bottom_up	2025-06-30 14:21:45 -07:00
George Hotz	b829331219	infinite loop detect in fixed_point_rewrite [pr] (#11038 )	2025-06-30 08:57:29 -07:00
George Hotz	cb531dba42	detect infinite loop in graph rewrite [pr] (#11036 )	2025-06-30 08:15:13 -07:00
qazal	2ea4737930	viz: fix newlines breaking label colors (#11030 ) * viz: fix newlines breaking label colors * TestViz.test_colored_label * TestWordWrap	2025-06-30 13:39:44 +03:00
George Hotz	5911b71404	early support for bidirectional pattern matcher (#11027 ) * early support for bidirectional pattern matcher * expose it and add a test * no bottom up arg there * disable flaky test	2025-06-29 16:54:07 -07:00

3999 Commits