tinygrad

Mirror of https://github.com/tinygrad/tinygrad.git, synced 2026-01-09 23:18:04 -05:00

Author	SHA1	Message	Date
qazal	cde4fd3be3	do not view_left assign + elementwise sources always have a shape [pr] (#9491)	2025-03-18 17:42:51 +08:00
George Hotz	117b7a16ef	VALIDATE_WITH_CPU [pr] (#9488) * VALIDATE_WITH_CPU [pr] * fix test	2025-03-18 15:15:04 +08:00
qazal	935cd01f56	simple failing test for graph_rewrite children [pr] (#9489) * simple failing test for graph_rewrite children [pr] * lint * update too	2025-03-18 13:07:21 +08:00
George Hotz	d20494e6d7	move buffer logic to Buffer [pr] (#9487) * move buffer logic to Buffer [pr] * pass shape into as_typed_buffer * pass shape into as_typed_buffer * work * cleaner * fix tests	2025-03-18 11:21:21 +08:00
qazal	3be228182f	unbind Tensor variables last [pr] (#9486) * reorder do_realize [pr] * move merge_views * unbind all variables at the end [pr]	2025-03-18 09:52:01 +08:00
qazal	b44f9c409a	reorder do_realize [pr] (#9485) * reorder do_realize [pr] * move merge_views	2025-03-18 09:30:10 +08:00
nimlgen	a82c9332d3	am: rename soc21 to soc (#9482)	2025-03-18 08:54:26 +08:00
qazal	b100fc0b20	split the rule that uses context in scheduler simplifier [pr] (#9484) * split the rule that uses context in scheduler simplifier [pr] * add	2025-03-18 08:12:26 +08:00
Anish Umale	5e58f4b65b	Tiny backend test_ops fix part 3 (#9483) * extract straightforward things from https://github.com/tinygrad/tinygrad/pull/9302 * pass dtype and device for ones_like	2025-03-17 18:01:51 -04:00
TJ	9fcef4d009	add masked_select to tensor.py (#9468) * add masked_select to tensor.py * fix tests --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-03-17 16:05:36 -04:00
chenyu	4f8eac59ea	failed test case for threefry (#9469) * failed test case for threefry not sure if it's always like this, but increment before _threefry_random_bits is incorrect. the counts should start with random numbers generated so far. use jax to generate 20 + 20 + 10 random numbers, the first 20 + 20 matches and the last 10 are different. just moving increment after _threefry_random_bits matches the number but jit test failes * workaround * why is this different? * revert those * and that	2025-03-17 14:52:10 -04:00
b1tg	6dd8e5ba7c	refactor llvm compiler (#9403) * refactor LLVMCompiler * new interface * automatic configuration --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-03-18 00:13:49 +08:00
geohotstan	53d6f1e1bb	Add bitonic cat sort (#9422) * poc * repeated values fail, sigh * is this being timed out? * fix up down names * bitonic v2, does this run? * bitonic v3, faster * bitonic v3.1, faster * bitonic v3.1.1, same speed unlucky * support dim and indices * bitonic v3.2, simpler code, TODO repeated indices * bruv gimme green for once cmon * cat (stack) implementation, slow but maybe one day when cat is fast meow * revert to v3.2 * bitonic v4, who let the cats out edition * clean up variable names * figured out repeated indices :D * ruff check --fix * use sort for topk * add Tensor.sort everywhere * fix docs and add some types * slightly better variable names * am I doing torch inplace correctly? * delegate sort to values_stable * add a contig, faster first sort * maybe don't test_inplace --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-03-17 12:01:23 -04:00
chenyu	f53be010d7	lower bert learning rate (#9481) slightly better. first sub 3hr run https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/0or96ink/overview	2025-03-17 10:49:56 -04:00
qazal	e03c0aacf2	more explicit DONT_PUSH_VIEWS [pr] (#9479) * more explicit DONT_PUSH_VIEWS [pr] * update tests to not handcode ast * lint * test_recursive_swizzle and test_simple_store_reshape	2025-03-17 20:43:21 +08:00
qazal	3b00a778ba	fix view_left for unsafe pad ops [pr] (#9478)	2025-03-17 19:02:02 +08:00
qazal	813f713edc	merge_views for buffer ops + create valids last (#9472) * merge_views for buffer ops + create valids last * view.arg * pass	2025-03-17 17:15:44 +08:00
qazal	bd1f71c1e2	simple failing test for extra ops in VALID [pr] (#9474) * simple failing test for extra valids [pr] * this has DEBUG=4	2025-03-17 17:02:40 +08:00
qazal	e26caf4c3a	hotfix: skip test_mean_half_precision_underflow on amd ci (#9476) The global size is very large (781250 gidx) and the emulated version takes more than 1 minute to execute the kernel.	2025-03-17 16:47:48 +08:00
George Hotz	824c5f41ac	dsp work try 3 (#9475) * dsp work try 3 * padding	2025-03-17 16:42:12 +08:00
George Hotz	242daa4f9a	ptrcat (#9473)	2025-03-17 16:06:37 +08:00
George Hotz	52ae9af4dd	Fast DSP for MobileNetV2 (try 2) (#9467) * Fast DSP for MobileNetV2 (try 2) * enable fast path on uchar * fix tests	2025-03-17 15:10:36 +08:00
George Hotz	15ee742afa	add get_children_map to uop (#9470) * add get_children_map to uop * update_children * fix new children	2025-03-17 14:36:13 +08:00
chenyu	d2cfbd8a4d	bert lower learning rate and total steps (#9466) closer to the other submission with BS=240. converged with 10% less epochs	2025-03-16 17:21:20 -04:00
George Hotz	09e7708b49	minimum change for rdna4 [pr] (#9455)	2025-03-16 13:39:24 +08:00
qazal	be2161652b	reorder into swizzler + ast_fixup [pr] (#9456)	2025-03-15 09:00:14 +01:00
George Hotz	cb7a7f69c7	quantization preprocessor from DSP, should be universal (#9437) * quantization preprocessor from DSP, should be universal * touchups * fix tests	2025-03-15 07:49:37 +08:00
chenyu	ca5064a5b6	remove Kernel.float4_axis [pr] (#9448)	2025-03-14 17:54:32 -04:00
chenyu	0e591baf43	redo simple_matmul change (#9450) numpy does not support bfloat16	2025-03-14 17:53:52 -04:00
chenyu	b0f63d3c04	Revert "`simple_matmul.py` uses np to generate random (#9438)" (#9449) This reverts commit `14018050c1`.	2025-03-14 17:14:22 -04:00
Ignacio Sica	14018050c1	`simple_matmul.py` uses np to generate random (#9438) * np generates randoms * hotfix: use generator for int dtype * float32 as default dtype for float generator * use np.float32 instead of stirng * add dtype= to integers generator * change import _to_np_dtype source	2025-03-14 17:36:50 -03:00
qazal	2a50e6440d	filter sink by DONT_PUSH_VIEWS + remove extra base [pr] (#9446)	2025-03-14 21:27:46 +01:00
qazal	3af7a08a06	ast_fixup in one graph_rewrite pass [pr] (#9444)	2025-03-14 20:14:31 +01:00
nimlgen	bd4ae5ac53	am: hotfix: import modules (#9443) * am: hotfix: import modules * hmm	2025-03-15 03:10:18 +08:00
nimlgen	77a8430616	am: use smu based on discovery (#9441)	2025-03-15 02:10:45 +08:00
uuuvn	5ff90cb261	am: less magic values (#9440)	2025-03-15 02:10:35 +08:00
Ignacio Sica	459d0cd14f	add arch to AMDRenderer and HIPRenderer (#9431)	2025-03-13 13:06:27 -03:00
nimlgen	357e364ab8	am: turn off unord dispatch (#9433)	2025-03-13 23:59:28 +08:00
chenyu	99b0287e4e	add GROUP and GROUPTOP to test_arange (#9432) it does not grow quadratically, but it's not 0 ops now	2025-03-13 11:28:38 -04:00
qazal	90ffa9bd45	swizzle without buffer ops try 2 [pr] (#9427) * add DONT_PUSH_VIEWS to matchers * swizzle without buffer ops try 2 [pr] * swizzle reduceop * simple failing test * fix failing test * s/on/for	2025-03-13 10:00:40 +01:00
qazal	4df2b6347d	hotfix: bump tinybox red training CI timeout to 30 minutes (#9426)	2025-03-13 09:31:44 +01:00
George Hotz	931436204c	hotfix: 12000 lines, for AMD stuff	2025-03-13 10:48:14 +08:00
George Hotz	bfc68d1953	add gep rules to simplify (#9419) * add gep rules to simplify * ws * flipped direction	2025-03-13 09:46:25 +08:00
geohotstan	0bed9b6cd2	benchmark huggingface onnx models (#8493) * add ability to ORT=1 * test_vs_ort * useless f * actually have benchmark take in modelproto for more flexibility in huggingface stuff * ok runs * good * oops fix benchmark_onnx __main__ * 224 as default * add ORT=1 option to huggingface_onnx * use Tensor to get_input * add abilty to do single onnx model testing * better names * merge properly... * copy in onnx_helpers * better * decent script * need to add debug tool first * new limit usage * why did narrowing_error come back.. * pretty decent * revert validate change * more ops bug fixes * revert unnecessary changes * fix InstanceNorm too * remove op from O4 * minimize diff * address old feedback * unsure of this, just revert * remove that assert * working attention * to_python_const Attention * cant init from np constant so just do this * final * fix bug in attention * attention clean ups * add hard TODOs and REPOPATH and TRUNCATE envvar * fix input_ids default value * final * fix scatter * cleaner _prepare_quantize * use new attention and tempfile for huggingface script * more stats * update * remove outdated code * big refactor to something usable by CI * booooooom * clean up * update to using yaml as env var input * add dry run * try * valid pad * use argparser and fix gather bug * ignore all yaml * tiny bit more polish * woah ignoring all yaml was not right * typo * decouple huggingface_onnx_run debug run with huggingface_onnx_download * bug fix for downloading single model * WOOOO ok much better * oops argparse 'required' is an invalid argument for positionals * oops argparse 'required' is an invalid argument for positionals * add assert * fix types --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-03-12 20:13:12 -04:00
chenyu	4992958dae	update bert beam params (#9423) BEAM_MIN_PROGRESS=5 for setup speed	2025-03-12 13:00:41 -04:00
qazal	12978f0d05	reorder contiguous/assign ast rules [pr] (#9420) * apply setitem ShapeTracker when creating store [pr] * comments + early contiguous remove * better * linter	2025-03-12 12:13:27 +01:00
George Hotz	5f6d5b057d	expand index isn't grouping by access size (#9418) * expand index isn't grouping by access size * split_load_store * scalar vec * +correct_load_store * vectorized and * correct_load_store always * simplify before divides	2025-03-12 17:24:10 +08:00
George Hotz	815ad0b7a8	support load/store grouping in DEVECTORIZE=0 (#9409)	2025-03-12 11:34:37 +08:00
Priyank Patel	4714c4f9ad	torch backend multigpu - add devices and tests (#9414) * add multi-device support and tests * simplify	2025-03-12 11:33:11 +08:00
chenyu	22fc0a2e36	bert sum acc in half (#9412) also BS=96	2025-03-11 23:03:15 -04:00

8167 Commits