Commit Graph

10490 Commits

George Hotz
d81acbeef6 multi: move shrink after copy (#10109)
* multi: move shrink after copy

* passing now
2025-04-30 10:29:51 -04:00
qazal
67bd8489ad grouper cleanups [pr] (#10113) 2025-04-30 18:54:47 +08:00
nimlgen
b4c9a3d8f4 hcq: use mmio iface in copies (#10111)
* hcq: use mmio iface in copies

* linter

* fix_am

* am
2025-04-30 11:05:13 +03:00
nimlgen
5c7d004da5 hcq: refactor int ptrs to hcqbuffers (#10105)
* hcq: refactor int ptrs to hcqbuffers

* more refactors

* linter

* use in allocator

* test fix

* fix

* ops

* final?

* simpler

* keep this for now
2025-04-30 00:12:18 +03:00
chenyu
573bbb9746 Revert "remove TransformerBlock contiguous in llama (#10104)" (#10108)
This reverts commit b8d07dcc54.
2025-04-29 15:28:38 -04:00
chenyu
4a04098389 fix llama3 with nf4 quantize (#10107)
also int8 outputs are wrong
2025-04-29 15:14:36 -04:00
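
The nf4 fix above concerns 4-bit NormalFloat (NF4) quantization from the QLoRA paper: weights are scaled per block by their absmax, and each value is stored as the 4-bit index of the nearest entry in a fixed 16-level codebook. A minimal round-trip sketch, assuming a generic uniform codebook rather than the actual NF4 levels or tinygrad's kernels:

```python
import numpy as np

# hypothetical 16-entry codebook; real NF4 levels are normal-distribution
# quantiles from the QLoRA paper, not this uniform grid
CODEBOOK = np.linspace(-1.0, 1.0, 16)

def quantize_block(w: np.ndarray) -> tuple[np.ndarray, float]:
    scale = float(np.abs(w).max())                     # per-block absmax scale
    idx = np.abs(w[:, None] / scale - CODEBOOK).argmin(axis=1)
    return idx.astype(np.uint8), scale                 # 4-bit indices + one scale

def dequantize_block(idx: np.ndarray, scale: float) -> np.ndarray:
    return (CODEBOOK[idx] * scale).astype(np.float32)  # lookup, then rescale

w = np.random.randn(64).astype(np.float32)
idx, scale = quantize_block(w)
print(np.abs(dequantize_block(idx, scale) - w).max())  # small reconstruction error
```
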
George Hotz
9c1b80499f names for graph rewrites + null device supports exp and friends (#10106) 2025-04-29 14:28:20 -04:00
chenyu
b8d07dcc54 remove TransformerBlock contiguous in llama (#10104) 2025-04-29 14:15:39 -04:00
Ignacio Sica
9d5677c12c fix ptx linearizer bug 2 [pr] (#9967)
* check for local buffer

* hotfix

* add test_tensor_cores_emulation run for ptx
2025-04-29 14:30:07 -03:00
qazal
a59d18da21 hack for VIZ=1 with examples/llama (#10103)
* hack for VIZ=1 with examples/llama

* move it alongside BEAM=0
2025-04-29 23:42:17 +08:00
qazal
93bf8764f2 do not open devices in lowering (#10101)
* do not open devices in lowering [pr]

* ctx=opts

* ctx

* fuzz test
2025-04-29 23:18:16 +08:00
George Hotz
c3ff308abb range has only one src now [pr] (#10100)
* range has only one op now

* fix z3 checker

* ci fix

* needs shell

* try pip ensure update

* that ensurepip is useless

* upgrade pip before cache

* windows happy?
2025-04-29 10:31:05 -04:00
George Hotz
427471550a hotfix: amd tflops to 74 and some external_benchmark_sdxl_softmax stuff 2025-04-29 09:02:27 -04:00
Ignacio Sica
58cf8cd493 add support for "shared_mem" for LLVM (#10093)
* init llvm shared

* add test_tensor_cores_emulation run for llvm
2025-04-29 08:56:36 -04:00
qazal
ad7546c931 assert in test_indexing_two_bind instead of silent fail (#10099)
* assert in test_indexing_two_bind instead of silent fail

* debuggable

* skip test_simple_train
2025-04-29 20:23:25 +08:00
George Hotz
cee220a1ab always expand ssa on wheres (#9697)
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-04-29 20:08:41 +08:00
qazal
3b67f56c02 kernelize some llama realizes (#10098) 2025-04-29 18:39:56 +08:00
qazal
cbf7347cd6 display viz rewrites with tabbing if they are subrewrites (#10097)
* display viz rewrites with tabbing if they are subrewrites

* update viz api
2025-04-29 17:57:21 +08:00
George Hotz
73c2f6602f test sdxl softmax (#10096) 2025-04-28 21:55:50 -04:00
George Hotz
eaceafecae do fusion locally (#10095)
* do fusion locally

* oops, that's the right way

* explicit delete closure
2025-04-28 20:45:37 -04:00
chenyu
3eba3d6ee9 don't pass model in convert_from_huggingface and convert_from_gguf (#10094)
it only needs n_layers
2025-04-28 20:11:19 -04:00
George Hotz
a2d0684fc1 test_attention_simple_view (#10092)
* test_attention_simple_view

* correct comment
2025-04-28 20:01:22 -04:00
Ignacio Sica
bda116d773 fix use_tensor_cores propagation (#10048)
* propagate use_tensor_cores

* add use_tensor_core to arg in test and search

* bugfix

* get TC val from ContextVar in search

* revert minor space change

* add tc emulation test to ci and benchmark

* revert

* revert whitespace change

* remove test for ptx

* add comment and remove llvm test run
2025-04-28 19:30:50 -03:00
George Hotz
d32f5e9f3a improve rendering of shapes in viz + investigate symbolic [pr] (#10091) 2025-04-28 16:44:09 -04:00
Sieds Lykles
dbb7aee02e Split constant in div with negative x (#10088)
* add rule

* change test

* lower complexity limit

* remove offset in fold_unrolled_divs

* remove import

* add one more condition
2025-04-28 16:24:14 -04:00
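
The constant split above rests on a floor-division identity that holds for every integer x, negative or not: writing c = q*d + r with 0 <= r < d, the multiple-of-d part of an added constant pulls out of the division. A quick check of the arithmetic (this shows the identity itself, not tinygrad's exact rewrite rule):

```python
# (x + c)//d == q + (x + r)//d  where  c = q*d + r  and  0 <= r < d
def split_const(x: int, c: int, d: int) -> int:
    q, r = divmod(c, d)      # Python guarantees 0 <= r < d for d > 0
    return q + (x + r) // d

assert all(split_const(x, 7, 4) == (x + 7) // 4 for x in range(-100, 100))
```
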
chenyu
610ee79b22 cherry pick mlperf5.0 branch to master (#10089) 2025-04-28 15:36:56 -04:00
chenyu
459a223202 simpler Literal annotation in code_for_workitem [pr] (#10087) 2025-04-28 14:59:25 -04:00
nimlgen
dcd9a633c3 am: load minimum fw (#10083)
* am: load minimum psp parts

* try this

* remove ME & PFP
2025-04-28 21:28:05 +03:00
George Hotz
ecff82a698 fixing single kernel softmax: resolve (#10086)
* fixing single kernel softmax: resolve

* add failing lin test
2025-04-28 13:46:20 -04:00
George Hotz
4c242b0483 hotfix: tests all pass on metal local 2025-04-28 12:09:00 -04:00
George Hotz
690dac79b5 don't modify the ranges on reduce rewrite (#10062)
* bug in div range folding

* simpler

* oh, this is right for indexing, but the div mod folding needs to be fixed

* reenable

* Passing test_complexity_w_unroll2 (#10068)

* Passing

* remove non_folded_divs

* Add check for negative term in div folding

* Add test

* bump that limit

* fix casted

---------

Co-authored-by: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
2025-04-28 12:01:19 -04:00
quortus
5130759605 Make sure clang always inlines batched functions (#10037) 2025-04-28 10:48:24 -04:00
George Hotz
c4a50f9d89 fix full shape in kernel.py [pr] (#10085)
* fix full shape in kernel.py

* fix that heuristic

* full shape in shapetracker is fast

* fix process replay [pr]

* simpler

* this

* i'm just going to ignore that one
2025-04-28 09:32:58 -04:00
qazal
ac37510f60 remu: only write v_cmp result if exec is set (#10084) 2025-04-28 20:31:52 +08:00
qazal
d6b436a815 remu bugfix with -0.0 negation (#10082) 2025-04-28 15:46:42 +08:00
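
Signed-zero negation is a classic emulator pitfall: implementing float negation as `0.0 - x` loses the sign flip on zeros, because IEEE 754 negation must toggle the sign bit unconditionally while `0.0 - 0.0` rounds to `+0.0`. A two-line illustration of the general pitfall (not necessarily remu's exact bug):

```python
import math

x = 0.0
print(math.copysign(1.0, -x))       # -1.0: true negation yields -0.0
print(math.copysign(1.0, 0.0 - x))  # +1.0: subtraction from zero does not
```
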
nimlgen
15e4302784 am: optimize zeroing out boot structs (#10081) 2025-04-28 10:15:32 +03:00
nimlgen
68e5ab8552 am: fix typo in fw loading (#10080) 2025-04-28 09:45:00 +03:00
chenyu
e996584685 olmoe in mac benchmark (#10077) 2025-04-27 21:07:02 -04:00
George Hotz
732e172961 don't require contiguous after fuse (#10074) 2025-04-27 13:17:22 -04:00
qazal
1aed04ec12 cpu is ground truth in VALIDATE_WITH_CPU=1 [pr] (#10067) 2025-04-28 01:14:21 +08:00
George Hotz
129bddde74 lin failure from SINGLE_KERNEL_SOFTMAX (#10073)
* lin failure from SINGLE_KERNEL_SOFTMAX

* fix lin issue

* more pure diff
2025-04-27 13:02:10 -04:00
George Hotz
b341296304 hotfix: save sdxl ram 2025-04-27 12:09:45 -04:00
George Hotz
68c5f7ba80 load fast in sdxl (#10072)
* load fast in sdxl

* back to that with the ret

* no context
2025-04-27 11:58:51 -04:00
George Hotz
768eb94c3e disable debug for load_state_dict [pr] (#10070) 2025-04-27 11:11:56 -04:00
George Hotz
4b8ef6ce78 hotfix: sdxl corealize 2025-04-27 10:41:46 -04:00
George Hotz
b6d2effaf5 assign is contiguous (#10066)
* assign is contiguous

* disable process replay for SDXL
2025-04-27 08:40:33 -04:00
George Hotz
1253819151 make beautiful indexing use a Variable (#10063)
* make beautiful indexing use a Variable

* stunning test

* better color

* training is broken

* fix tests

* fix variable indexing

* fix test

* no contiguous

* revert that

* revert that too

* indexing two bind

* skip for webgpu

* make not slow
2025-04-27 08:22:38 -04:00
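
For context, a bound Variable is tinygrad's public symbolic-int mechanism (the commit changes how the indexing internals use one). A minimal sketch of the user-facing API, assuming the top-level Variable export used in examples/llama.py:

```python
from tinygrad import Variable

# a Variable is a named symbolic int with a [min, max] range; bind() attaches a
# concrete runtime value so one compiled kernel can serve every value in range
v = Variable("start_pos", 0, 127).bind(17)
var, val = v.unbind()  # recover the symbolic Variable and its bound value
print(var, val)
```
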
Rory Clear
a13a43c4fe yolo 416 to 640 res (#10047) 2025-04-26 20:45:58 -04:00
chenyu
4c1ce1a299 don't simplify if div folding resulted in negative numerator (#10064)
* don't simplify if div folding resulted in negative numerator

* test
2025-04-26 17:01:18 -04:00
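
One way a div fold goes wrong once the numerator can be negative: floor division (what the symbolic rules assume) and C-style truncated division disagree exactly there, so a simplification that is sound for non-negative numerators can change results downstream. A minimal illustration of the divergence:

```python
def trunc_div(a: int, b: int) -> int:
    # rounds toward zero, like C's integer division
    return -(-a // b) if (a < 0) != (b < 0) else a // b

for a in (3, -3):
    print(a, a // 4, trunc_div(a, 4))  # agree at 3; at -3, floor gives -1, trunc gives 0
```
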
George Hotz
1805403821 fix rand arange folding (#10060)
* test rand range

* --amend

* fix rand arange folding

* reduce_rangeless fix
2025-04-26 12:24:05 -04:00
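
The arange folding above refers to tinygrad's optimization of reducing over Tensor.arange in closed form, so gather-style patterns never materialize the counter tensor. A sketch of the kind of pattern the fold targets (illustrative; the fold itself happens during scheduling):

```python
from tinygrad import Tensor

emb = Tensor.rand(1000, 64)  # embedding table
ids = Tensor([3, 41, 7])     # token ids to look up

# gather written as a matmul against a one-hot mask built from arange; the
# arange comparison can fold to index math instead of a 1000-wide tensor
onehot = (Tensor.arange(1000).reshape(1, 1000) == ids.reshape(3, 1)).float()
rows = (onehot @ emb).realize()  # shape (3, 64)
```
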