Commit Graph

10633 Commits

Author | SHA1 | Message | Date
nimlgen
fb278c6a02 do not recreate Compiled.profile_events in helper_collect_profile (#11171) 2025-07-10 23:55:12 +03:00
George Hotz
5c5eb92ed4 tc unroll after upcast [pr] (#11170) 2025-07-10 13:43:50 -07:00
George Hotz
05613c8cac use shape str for tensor cores upcast/reduce [pr] (#11168)
* use shape str for tensor cores upcast/reduce [pr]

* reduce axis count isn't fixed
2025-07-10 13:10:58 -07:00
nimlgen
cc6ed30f4f nv: relative lv addressing in NVPageTableEntry (#11164) 2025-07-10 22:35:50 +03:00
chenyu
439d033af9 update the README matmul example (#11167)
don't call rand and numpy to show that it's indeed one kernel
2025-07-10 14:47:29 -04:00
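
A minimal sketch of the single-kernel matmul the README example is about, assuming the Tensor.empty form (the exact README snippet may differ):

```python
# Hedged sketch: Tensor.empty avoids a rand kernel and a numpy copy,
# so DEBUG=2 should show a single generated matmul kernel.
from tinygrad import Tensor

N = 1024
a, b = Tensor.empty(N, N), Tensor.empty(N, N)
c = (a @ b).realize()  # run with DEBUG=2 to inspect the one kernel
```
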
qazal
bde80c0cdf record GraphEvents in metal graph (#11145)
* record GraphEvents in metal graph

* add TestProfiler.test_graph, revert old stuff

* move profile capture to MetalGraph

* comment

* don't double record graph command buffers

* wait_check

* explicit delete
2025-07-10 21:32:06 +03:00
George Hotz
8ce3d5906b use shape_str for tensor cores (#11165) 2025-07-10 09:10:36 -07:00
nimlgen
581397110f nv: use classes in GSP_IP (#11163) 2025-07-10 17:47:12 +03:00
nimlgen
705de6b8a6 nv: parse sizes of ctx buffers (#11161) 2025-07-10 17:46:48 +03:00
qazal
dcc9704b6b viz: profile RewriteSteps in TINY device (#11125)
* viz: profile RewriteSteps in TINY device

* use TracingKey with category

* split by whitespace

* add tracing.py

* work

* tracing_key

* TRACK_MATCH_STATS=3, can this be in defaults?

* fallback name

* work

* javascript

* measure text is slow

* checkout

* profile graph_rewrite/graph_rewrite_map

* change that

* no as

* finally

* work

* linking works
2025-07-10 17:45:57 +03:00
Pyry Kovanen
32117402dd metal: fix incorrect _free on interpreter exit (#11158) 2025-07-10 14:01:30 +03:00
qazal
3d610f6d2b viz: small ui cleanup (#11157)
* viz: small ui cleanup

* 2
2025-07-10 11:43:36 +03:00
chenyu
7db07e5f2c don't narrow range of CAST on bool/unsigned (#11156) 2025-07-09 22:20:09 -04:00
George Hotz
e154a66f43 unroll axis 0 in tensor core (#11155)
* unroll is 0 in tc [pr]

* flip order of upcast/reduce in tensor core

* Revert "flip order of upcast/reduce in tensor core"

This reverts commit e564e38bcd.
2025-07-09 17:28:23 -07:00
George Hotz
b7742ad9e4 migrate to string swizzle [pr] (#11154) 2025-07-09 16:57:53 -07:00
George Hotz
4156baee93 break swizzle into three chunks [pr] (#11153)
* break swizzle into three chunks [pr]

* test failed
2025-07-09 15:30:34 -07:00
George Hotz
ca2dc95433 swizzle in tc can't be none [pr] (#11152) 2025-07-09 14:44:23 -07:00
George Hotz
53ae153404 tc should be in opt (#11148)
* tc should be in opt [pr]

* fix import
2025-07-09 14:12:21 -07:00
wozeparrot
6697d0089d initial gfx950 kfd support (#11151)
* feat: initial gfx950 support

* fix: lint
2025-07-09 13:45:16 -07:00
George Hotz
262054be52 gfx950 tc support (#11150) 2025-07-09 13:30:42 -07:00
nimlgen
b6981404ed memory: use page shifts in memory manager (#11149)
* memory: use page shifts in memory manager

* fix
2025-07-09 22:05:00 +03:00
qazal
5c1d215b41 viz: add Graph stream (#11144)
* viz: stack an event for the entire batch

* multi

* whitespace

* work

* multi graph, Graph gets its own row
2025-07-09 20:56:46 +03:00
George Hotz
22305260e0 move tc to tc.py [pr] (#11147) 2025-07-09 10:55:56 -07:00
George Hotz
2893feb9f6 cleanups for kernel.py (#11143)
* cleanups for kernel.py

* fixups
2025-07-08 18:10:25 -07:00
George Hotz
b11ca104e9 axis cleanups [pr] (#11142) 2025-07-08 17:07:26 -07:00
chenyu
7ce9e45474 mypy onnx_parser (#11141) 2025-07-08 19:50:28 -04:00
George Hotz
a1b8f3e64f delete info from kernel [pr] (#11139)
* delete info from kernel [pr]

* update kernel info

* delete info
2025-07-08 15:53:13 -07:00
George Hotz
359bed74f8 axis type tracking [pr] (#11137)
* axis type tracking [pr]

* keep update_info

* keep legacy colors

* update tests to apply_opt
2025-07-08 14:16:25 -07:00
chenyu
dada3f5bf3 skip some new onnx tests (#11135)
these fail on master with latest onnx
2025-07-08 16:12:48 -04:00
chenyu
ffcc557986 lint onnx and onnx_parser (#11134) 2025-07-08 15:28:35 -04:00
George Hotz
3238d21cd1 add finalized to kernel [pr] (#11132)
* add finalized to kernel [pr]

* add copy
2025-07-08 11:06:17 -07:00
George Hotz
289a411f5f hotfix: remove unused GBARRIER, CONTIGUOUS color is GBARRIER 2025-07-08 10:31:06 -07:00
nimlgen
43650169f4 nv: switch headers to 570.144 to match gsp (#11131) 2025-07-08 20:29:06 +03:00
quortus
790b05ab12 [pr] Unify CONTIGUOUS and GBARRIER (#11121)
* Unify CONTIGUOUS and GBARRIER

* Simplify rules
2025-07-08 10:27:23 -07:00
nimlgen
b516fe71b4 nv: return real struct in _alloc_boot_struct (#11130) 2025-07-08 20:04:43 +03:00
qazal
3dfc0ff887 move cpu_profile and shared ProfileEvents from device.py to helpers [pr] (#11126)
* move cpu_profile and shared ProfileEvents to helpers [pr]

* TestProfiler.test_cpu_profile

* update test_viz.py

* TestProfiler.test_profile_multiops ordering, it's different streams now
2025-07-08 12:14:03 +03:00
George Hotz
397826f0b4 add a test for 1B llm (#11124)
* add a test for 1B llm

* fix mbs

* add apps to release
2025-07-07 18:47:25 -07:00
George Hotz
f7d4638e05 start LLM app, tons of clean up required. target is 200 line ollama (#11068)
* start LLM app, tons of clean up required. target is 200 line ollama

* kind of works

* simpler

* add k/v cache

* with SYM=1, it loops

* no rope cache

* simpler

* more cleanups

* cleanups

* works

* argparse and comments

* from gguf

* generate is a function

* no copy from cpu

* fix max context pass in

* test

* improve test

* ai2_arc

* fix 8B, use less ram

* 136 lines
2025-07-07 17:09:46 -07:00
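
A hedged sketch of the "generate is a function" plus k/v-cache idea described in this PR; `model` here is a hypothetical callable (model(tokens, start_pos) -> logits for the last position), not the real app interface:

```python
# Sketch only: the prompt is fed once, then each step feeds a single new token,
# because the k/v cache keeps the keys/values for everything before start_pos.
from tinygrad import Tensor

def generate(model, prompt: list[int], max_new: int) -> list[int]:
  tokens, start_pos = list(prompt), 0
  for _ in range(max_new):
    logits = model(Tensor([tokens[start_pos:]]), start_pos)  # hypothetical model call
    start_pos = len(tokens)
    tokens.append(int(logits.argmax(axis=-1).item()))        # greedy sampling for brevity
  return tokens
```
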
chenyu
341a686799 Tensor.diagonal (#11122)
Only the main diagonal of 2-D tensors is implemented. With diagonal and qr, we can get the determinant.
2025-07-07 16:21:26 -04:00
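
A hedged sketch of the determinant idea from this message, assuming a qr() method as referenced by the nearby qr/svd commits; only the absolute value is shown, since det(Q) is +/-1 and recovering its sign takes extra bookkeeping:

```python
from tinygrad import Tensor

A = Tensor.rand(4, 4)
Q, R = A.qr()                        # assumed API, per the qr/svd work above
abs_det = R.diagonal().prod().abs()  # |det(A)| = |det(R)|, the product of R's diagonal
print(abs_det.item())
```
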
Sieds Lykles
584fd6af5a Fix division by zero and mask bug in add views (#11088)
* merge view infinite loop test

* adjust condition in `x//d -> x//(-d)*-1`

* Fix division by zero in add views

* adjust offset end

* fix typo in comment

* add target to test_merge_views_variable

* fix view incorrectly being masked

* ssimplify strides and offset of the new view to canonicalize

* remove print in test

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-07-07 10:05:47 -07:00
nimlgen
71377cd233 nv: parse falcon app descs (#11118) 2025-07-07 18:14:14 +03:00
nimlgen
9a573a1d99 nv: finalize nvdev (#11117)
* nv: finalize nvdev

* typo
2025-07-07 16:31:59 +03:00
nimlgen
fa59c05282 nv: import flags from system (#11115)
* nv: import flags from system

* not used
2025-07-07 14:46:49 +03:00
Nino Risteski
a1a146a499 adding enable_gqa in SDPA (#11097)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-06 23:25:33 -07:00
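
A hedged sketch of grouped-query attention via the new enable_gqa flag, assuming the torch-style convention where the query has more heads than key/value; exact tinygrad keyword names may differ:

```python
from tinygrad import Tensor

B, Hq, Hkv, T, D = 1, 8, 2, 16, 64
q = Tensor.rand(B, Hq, T, D)
k = Tensor.rand(B, Hkv, T, D)
v = Tensor.rand(B, Hkv, T, D)
# with enable_gqa the 2 kv heads are shared across the 8 query heads
out = q.scaled_dot_product_attention(k, v, is_causal=True, enable_gqa=True)
print(out.shape)  # (1, 8, 16, 64)
```
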
nimlgen
b73e89110e nv: align allocations for perf (#11114) 2025-07-06 22:32:11 +03:00
chenyu
7468959f4b Tensor.argsort (#11112) 2025-07-06 13:56:35 -04:00
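
A short usage sketch of Tensor.argsort, assuming the numpy/torch convention of ascending indices along the last axis:

```python
from tinygrad import Tensor

t = Tensor([3.0, 1.0, 2.0])
idx = t.argsort()
print(idx.tolist())     # [1, 2, 0]
print(t[idx].tolist())  # [1.0, 2.0, 3.0]
```
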
kevvz
b7af9cf849 clean svd tests, set full_matrices false in torch backend (#11113)
* clean tests, set full_matrices false

* add more shape asserts
2025-07-06 13:55:49 -04:00
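
For the full_matrices flag mentioned above, a minimal sketch of the reduced-SVD shape convention in the torch API the backend pipes to:

```python
import torch

A = torch.randn(5, 3)
U, S, Vh = torch.linalg.svd(A, full_matrices=False)
# reduced factors: U is (5, 3) rather than (5, 5), S has min(m, n) = 3 values
print(U.shape, S.shape, Vh.shape)
```
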
qazal
a556f50668 viz: small ui fixes (#11110)
* share styling of ctx-list and metadata

* scrollbar-gutter: stable prevents layout shift when changing steps

* margin-left makes left side unaligned
2025-07-06 17:05:36 +03:00
chenyu
ba88ec3ad0 pipe linalg svd to torch (#11109)
and found a bug in svd
2025-07-06 08:37:25 -04:00
chenyu
845a4d32bc Tensor.diag (#11108)
also updated Tensor.eye to use it
2025-07-05 23:03:02 -04:00
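
A hedged sketch of Tensor.diag for a 1-D input, assuming the numpy-style behavior (vector to square matrix on the main diagonal) implied by Tensor.eye reusing it:

```python
from tinygrad import Tensor

print(Tensor([1, 2, 3]).diag().tolist())  # [[1, 0, 0], [0, 2, 0], [0, 0, 3]]
print(Tensor.ones(3).diag().tolist())     # same layout as Tensor.eye(3)
```
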