chenyu
b8d07dcc54
remove TransformerBlock contiguous in llama (#10104)
2025-04-29 14:15:39 -04:00
Ignacio Sica
9d5677c12c
fix ptx linearizer bug 2 [pr] (#9967)
* check for local buffer
* hotfix
* add test_tensor_cores_emulation run for ptx
2025-04-29 14:30:07 -03:00
qazal
a59d18da21
hack for VIZ=1 with examples/llama (#10103)
* hack for VIZ=1 with examples/llama
* move it alongside BEAM=0
2025-04-29 23:42:17 +08:00
qazal
93bf8764f2
do not open devices in lowering (#10101)
* do not open devices in lowering [pr]
* ctx=opts
* ctx
* fuzz test
2025-04-29 23:18:16 +08:00
George Hotz
c3ff308abb
range has only one src now [pr] (#10100)
* range has only one op now
* fix z3 checker
* ci fix
* needs shell
* try pip ensure update
* that ensurepip is useless
* upgrade pip before cache
* windows happy?
2025-04-29 10:31:05 -04:00
George Hotz
427471550a
hotfix: amd tflops to 74 and some external_benchmark_sdxl_softmax stuff
2025-04-29 09:02:27 -04:00
Ignacio Sica
58cf8cd493
add support for "shared_mem" for LLVM (#10093)
* init llvm shared
* add test_tensor_cores_emulation run for llvm
2025-04-29 08:56:36 -04:00
qazal
ad7546c931
assert in test_indexing_two_bind instead of silent fail (#10099)
* assert in test_indexing_two_bind instead of silent fail
* debuggable
* skip test_simple_train
2025-04-29 20:23:25 +08:00
George Hotz
cee220a1ab
always expand ssa on wheres (#9697)
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-04-29 20:08:41 +08:00
qazal
3b67f56c02
kernelize some llama realizes (#10098)
2025-04-29 18:39:56 +08:00
qazal
cbf7347cd6
display viz rewrites with tabbing if they are subrewrites (#10097)
* display viz rewrites with tabbing if they are subrewrites
* update viz api
2025-04-29 17:57:21 +08:00
George Hotz
73c2f6602f
test sdxl softmax (#10096)
2025-04-28 21:55:50 -04:00
George Hotz
eaceafecae
do fusion locally (#10095)
* do fusion locally
* oops, that's the right way
* explicit delete closure
2025-04-28 20:45:37 -04:00
chenyu
3eba3d6ee9
don't pass model in convert_from_huggingface and convert_from_gguf (#10094)
it only needs n_layers
2025-04-28 20:11:19 -04:00
George Hotz
a2d0684fc1
test_attention_simple_view (#10092)
* test_attention_simple_view
* correct comment
2025-04-28 20:01:22 -04:00
Ignacio Sica
bda116d773
fix use_tensor_cores propagation (#10048)
* propagate use_tensor_cores
* add use_tensor_core to arg in test and search
* bugfix
* get TC val from ContextVar in search
* revert minor space change
* add tc emulation test to ci and benchmark
* revert
* revert whitespace change
* remove test for ptx
* add comment and remove llvm test run
2025-04-28 19:30:50 -03:00
George Hotz
d32f5e9f3a
improve rendering of shapes in viz + investigate symbolic [pr] (#10091)
2025-04-28 16:44:09 -04:00
Sieds Lykles
dbb7aee02e
Split constant in div with negative x (#10088)
* add rule
* change test
* lower complexity limit
* remove offset in fold_unrolled_divs
* remove import
* add one more condition
2025-04-28 16:24:14 -04:00
chenyu
610ee79b22
cherry pick mlperf5.0 branch to master (#10089)
2025-04-28 15:36:56 -04:00
chenyu
459a223202
simpler Literal annotation in code_for_workitem [pr] (#10087)
2025-04-28 14:59:25 -04:00
nimlgen
dcd9a633c3
am: load minimum fw (#10083)
* am: load minimum psp parts
* try this
* remove me & pfp
2025-04-28 21:28:05 +03:00
George Hotz
ecff82a698
fixing single kernel softmax: resolve (#10086)
* fixing single kernel softmax: resolve
* add failing lin test
2025-04-28 13:46:20 -04:00
George Hotz
4c242b0483
hotfix: tests all pass on metal local
2025-04-28 12:09:00 -04:00
George Hotz
690dac79b5
don't modify the ranges on reduce rewrite (#10062)
* bug in div range folding
* simpler
* oh, this is right for indexing, but the div mod folding needs to be fixed
* reenable
* Passing test_complexity_w_unroll2 (#10068)
* Passing
* remove non_folded_divs
* Add check for negative term in div folding
* Add test
* bump that limit
* fix casted
---------
Co-authored-by: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
2025-04-28 12:01:19 -04:00
quortus
5130759605
Make sure clang always inlines batched functions (#10037)
2025-04-28 10:48:24 -04:00
George Hotz
c4a50f9d89
fix full shape in kernel.py [pr] (#10085)
* fix full shape in kernel.py
* fix that heuristic
* full shape in shapetracker is fast
* fix process replay [pr]
* simpler
* this
* i'm just going to ignore that one
2025-04-28 09:32:58 -04:00
qazal
ac37510f60
remu: only write v_cmp result if exec is set (#10084)
2025-04-28 20:31:52 +08:00
qazal
d6b436a815
remu bugfix with -0.0 negation (#10082)
2025-04-28 15:46:42 +08:00
nimlgen
15e4302784
am: optimize zeroing out boot structs (#10081)
2025-04-28 10:15:32 +03:00
nimlgen
68e5ab8552
am: fix typo in fw loading (#10080)
2025-04-28 09:45:00 +03:00
chenyu
e996584685
olmoe in mac benchmark (#10077)
2025-04-27 21:07:02 -04:00
George Hotz
732e172961
don't require contiguous after fuse (#10074)
2025-04-27 13:17:22 -04:00
qazal
1aed04ec12
cpu is ground truth in VALIDATE_WITH_CPU=1 [pr] (#10067)
2025-04-28 01:14:21 +08:00
George Hotz
129bddde74
lin failure from SINGLE_KERNEL_SOFTMAX (#10073)
* lin failure from SINGLE_KERNEL_SOFTMAX
* fix lin issue
* more pure diff
2025-04-27 13:02:10 -04:00
George Hotz
b341296304
hotfix: save sdxl ram
2025-04-27 12:09:45 -04:00
George Hotz
68c5f7ba80
load fast in sdxl (#10072)
* load fast in sdxl
* back to that with the ret
* no context
2025-04-27 11:58:51 -04:00
George Hotz
768eb94c3e
disable debug for load_state_dict [pr] (#10070)
2025-04-27 11:11:56 -04:00
George Hotz
4b8ef6ce78
hotfix: sdxl corealize
2025-04-27 10:41:46 -04:00
George Hotz
b6d2effaf5
assign is contiguous (#10066)
* assign is contiguous
* disable process replay for SDXL
2025-04-27 08:40:33 -04:00
George Hotz
1253819151
make beautiful indexing use a Variable (#10063)
* make beautiful indexing use a Variable
* stunning test
* better color
* training is broken
* fix tests
* fix variable indexing
* fix test
* no contiguous
* revert that
* revert that too
* indexing two bind
* skip for webgpu
* make not slow
2025-04-27 08:22:38 -04:00
Rory Clear
a13a43c4fe
yolo 416 to 640 res (#10047)
2025-04-26 20:45:58 -04:00
chenyu
4c1ce1a299
don't simplify if div folding resulted in negative numerator (#10064)
* don't simplify if div folding resulted in negative numerator
* test
2025-04-26 17:01:18 -04:00
George Hotz
1805403821
fix rand arange folding (#10060)
* test rand range
* --amend
* fix rand arange folding
* reduce_rangeless fix
2025-04-26 12:24:05 -04:00
qazal
d13c100981
don't sort dims in verify_sink_dims [pr] (#10059)
* don't sort dims in verify_sink_dims [pr]
* 1 can exist with n
* put process_replay warn last
* assert shape is the same
* bring that back
2025-04-26 23:24:30 +08:00
George Hotz
c80fe6d5fc
handle some fancier reduces (#10057)
* reduce_unparented
* handle fancier reduces
* fold more
* bugfix
2025-04-26 11:20:15 -04:00
nimlgen
e08270c1ba
nv: fix program init for no-args kernels (#10058)
2025-04-26 18:08:53 +03:00
George Hotz
11113c9d07
reduce_unparented (#10056)
2025-04-26 09:48:16 -04:00
George Hotz
ea5dddc537
reduce collapse generic (#10045)
* reduce collapse generic
* new arange folder
* new range folding
* correct with sym
* all tests pass
* indexing ops passes
* failing tests
* fix tests, remove unused
* revert that
* torch indexing is fast
* skip on webgpu
* touchups
* comments
2025-04-26 09:13:24 -04:00
quortus
5cdc96409e
Update outdated renderer.render calls (#10044)
2025-04-26 07:35:19 -04:00
nimlgen
e055b9422f
am: fix mmap failures (#10054)
2025-04-26 14:21:28 +03:00