Commit Graph

10490 Commits

Author SHA1 Message Date
chenyu
ca5064a5b6 remove Kernel.float4_axis [pr] (#9448) 2025-03-14 17:54:32 -04:00
chenyu
0e591baf43 redo simple_matmul change (#9450)
numpy does not support bfloat16
2025-03-14 17:53:52 -04:00
chenyu
b0f63d3c04 Revert "simple_matmul.py uses np to generate random (#9438)" (#9449)
This reverts commit 14018050c1.
2025-03-14 17:14:22 -04:00
Ignacio Sica
14018050c1 simple_matmul.py uses np to generate random (#9438)
* np generates randoms

* hotfix: use generator for int dtype

* float32 as default dtype for float generator

* use np.float32 instead of string

* add dtype= to integers generator

* change import _to_np_dtype source
2025-03-14 17:36:50 -03:00
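The bullets above describe moving simple_matmul's random inputs to NumPy's Generator API, with float32 as the default float dtype and an explicit dtype= on the integers generator. A minimal sketch of that usage (variable names here are illustrative, not the script's actual code):

```python
import numpy as np

# Generator-based randoms, per the PR bullets:
# float generator defaults to float32, integers generator gets dtype=.
rng = np.random.default_rng(0)
A = rng.random((4, 4), dtype=np.float32)              # float inputs
B = rng.integers(0, 10, size=(4, 4), dtype=np.int32)  # int inputs
```

Note the revert above: np.random has no bfloat16 support, so this path only covers NumPy-native dtypes.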
qazal
2a50e6440d filter sink by DONT_PUSH_VIEWS + remove extra base [pr] (#9446) 2025-03-14 21:27:46 +01:00
qazal
3af7a08a06 ast_fixup in one graph_rewrite pass [pr] (#9444) 2025-03-14 20:14:31 +01:00
nimlgen
bd4ae5ac53 am: hotfix: import modules (#9443)
* am: hotfix: import modules

* hmm
2025-03-15 03:10:18 +08:00
nimlgen
77a8430616 am: use smu based on discovery (#9441) 2025-03-15 02:10:45 +08:00
uuuvn
5ff90cb261 am: less magic values (#9440) 2025-03-15 02:10:35 +08:00
Ignacio Sica
459d0cd14f add arch to AMDRenderer and HIPRenderer (#9431) 2025-03-13 13:06:27 -03:00
nimlgen
357e364ab8 am: turn off unord dispatch (#9433) 2025-03-13 23:59:28 +08:00
chenyu
99b0287e4e add GROUP and GROUPTOP to test_arange (#9432)
it does not grow quadratically, but it's not 0 ops now
2025-03-13 11:28:38 -04:00
qazal
90ffa9bd45 swizzle without buffer ops try 2 [pr] (#9427)
* add DONT_PUSH_VIEWS to matchers

* swizzle without buffer ops try 2 [pr]

* swizzle reduceop

* simple failing test

* fix failing test

* s/on/for
2025-03-13 10:00:40 +01:00
qazal
4df2b6347d hotfix: bump tinybox red training CI timeout to 30 minutes (#9426) 2025-03-13 09:31:44 +01:00
George Hotz
931436204c hotfix: 12000 lines, for AMD stuff 2025-03-13 10:48:14 +08:00
George Hotz
bfc68d1953 add gep rules to simplify (#9419)
* add gep rules to simplify

* ws

* flipped direction
2025-03-13 09:46:25 +08:00
geohotstan
0bed9b6cd2 benchmark huggingface onnx models (#8493)
* add ability to ORT=1

* test_vs_ort

* useless f

* actually have benchmark take in modelproto for more flexibility in huggingface stuff

* ok runs

* good

* oops fix benchmark_onnx __main__

* 224 as default

* add ORT=1 option to huggingface_onnx

* use Tensor to get_input

* add ability to do single onnx model testing

* better names

* merge properly...

* copy in onnx_helpers

* better

* decent script

* need to add debug tool first

* new limit usage

* why did narrowing_error come back..

* pretty decent

* revert validate change

* more ops bug fixes

* revert unnecessary changes

* fix InstanceNorm too

* remove op from O4

* minimize diff

* address old feedback

* unsure of this, just revert

* remove that assert

* working attention

* to_python_const Attention

* can't init from np constant so just do this

* final

* fix bug in attention

* attention clean ups

* add hard TODOs and REPOPATH and TRUNCATE envvar

* fix input_ids default value

* final

* fix scatter

* cleaner _prepare_quantize

* use new attention and tempfile for huggingface script

* more stats

* update

* remove outdated code

* big refactor to something usable by CI

* booooooom

* clean up

* update to using yaml as env var input

* add dry run

* try

* valid pad

* use argparser and fix gather bug

* ignore all yaml

* tiny bit more polish

* woah ignoring all yaml was not right

* typo

* decouple huggingface_onnx_run debug run from huggingface_onnx_download

* bug fix for downloading single model

* WOOOO ok much better

* oops argparse 'required' is an invalid argument for positionals

* add assert

* fix types

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-12 20:13:12 -04:00
chenyu
4992958dae update bert beam params (#9423)
BEAM_MIN_PROGRESS=5 for setup speed
2025-03-12 13:00:41 -04:00
qazal
12978f0d05 reorder contiguous/assign ast rules [pr] (#9420)
* apply setitem ShapeTracker when creating store [pr]

* comments + early contiguous remove

* better

* linter
2025-03-12 12:13:27 +01:00
George Hotz
5f6d5b057d expand index isn't grouping by access size (#9418)
* expand index isn't grouping by access size

* split_load_store

* scalar vec

* +correct_load_store

* vectorized and

* correct_load_store always

* simplify before divides
2025-03-12 17:24:10 +08:00
George Hotz
815ad0b7a8 support load/store grouping in DEVECTORIZE=0 (#9409) 2025-03-12 11:34:37 +08:00
Priyank Patel
4714c4f9ad torch backend multigpu - add devices and tests (#9414)
* add multi-device support and tests

* simplify
2025-03-12 11:33:11 +08:00
chenyu
22fc0a2e36 bert sum acc in half (#9412)
also BS=96
2025-03-11 23:03:15 -04:00
nimlgen
f995b465b8 am: set doorbell offsets to nb (#9413) 2025-03-12 10:35:47 +08:00
qazal
95e0f069be hotfix: gitignore *.log [pr] (#9410) 2025-03-11 21:39:19 +01:00
nimlgen
78ebade125 Merge pull request #9408 from nimlgen/hcq_progress_during_wait
hcq: reset timer on progress in signal.wait
2025-03-11 19:40:23 +08:00
George Hotz
e174c6c3bc new devectorizer (#9331)
* new devectorizer

* lidx

* test linearizer passes

* fix images

* fix unfoldable image load

* delete unused

* improve fix_unfoldable_image_load

* working for image

* fixup types

* fixup transcendental

* cast_vec

* cleaner transcendental

* skip failing test

* err, flip that

* not devec

* sqrt
2025-03-11 18:47:56 +08:00
qazal
69fac5fe89 Merge pull request #9407 from tinygrad/no_const_after_sym
no const/view in schedule sink after sym [pr]
2025-03-11 12:24:09 +02:00
nimlgen
4d09ea4c06 hcq: reset timer on progress in signal.wait 2025-03-11 10:02:14 +00:00
qazal
fa69fd3afc no const/view in schedule sink after sym [pr] 2025-03-11 10:58:38 +01:00
George Hotz
68f062c8be cast_vec on transcendental (#9406) 2025-03-11 17:30:46 +08:00
uuuvn
e85001b6ee SQTT profiling (#9278)
* sqtt

* docs

* multi-device

* ProfileSQTTEvent

* exec update

* 256mb default

* don't let people hang their gpus

* bitfields from autogen

* asic info from mesa

* more bitfields from autogen

* SQTT_ITRACE_SE_MASK

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-03-11 13:19:56 +08:00
George Hotz
2780e2027e devectorize prereqs [pr] (#9404) 2025-03-11 12:33:29 +08:00
Priyank Patel
beed00eabe fix torch backend memory leak (#9395)
* fix leak, realize everything on torch optim step

* only realize a subset

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-03-11 10:48:20 +08:00
chenyu
01e8b60911 acc_dtype -> dtype (#9402)
matched numpy and torch
2025-03-10 16:05:30 -04:00
qazal
59dfb234eb replace hardcoded ast with tensors in TestSwizzle [pr] (#9401) 2025-03-10 19:33:57 +01:00
Priyank Patel
796c3bbb23 torch: support in-place operations on views (#9371)
* add torch inplace tests

* first set of tests passing

* wrap all inplace funcs, add more tests

* fixes and wrap more functions

* fix all uint8 tests to avoid slow tests

* fix the one test

* another test, another fix

* and one more, works for ddp now

* something on contiguous, cleanup

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-03-10 23:29:00 +08:00
qazal
2afc7759a7 sink in kernel op [pr] (#9397)
* sink in kernel op [pr]

* metadata
2025-03-10 13:13:42 +01:00
George Hotz
25847080f0 olmoe (from stream, wip) (#9390)
* olmoest working (but not)

* it's correct

* compare ropes

* old code wasn't wrong

* default device

* no metal

* fix permute

* working

* more minimal
2025-03-10 13:46:33 +08:00
geohotstan
1d64c12f2b add Topk to tensor (#9343)
* terrible but somewhat working impl

* linux behaves differently than macos?

* slightly better impl

* small clean up; haven't figured this out yet

* better

* torch has different behavior on linux and macos for duplicated values

* add sum docs

* fix test

* add torch return_type test

* add an exception test

* wrap_fxn instead, and move op lower in order

* better repeated values test

* rerun ci
2025-03-09 20:01:42 -04:00
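The commit above adds a Topk method to Tensor. tinygrad's implementation is not shown in this log, but the basic top-k operation it exposes can be sketched in plain NumPy (a hypothetical helper, not tinygrad's code; ties are where the Linux/macOS torch behavior mentioned in the bullets diverges):

```python
import numpy as np

def topk(x: np.ndarray, k: int):
    # top-k largest values of a 1-D array, sketch only
    idx = np.argpartition(x, -k)[-k:]    # indices of the k largest, unordered
    idx = idx[np.argsort(x[idx])[::-1]]  # order them by descending value
    return x[idx], idx

vals, idx = topk(np.array([3.0, 1.0, 4.0, 1.0, 5.0]), 2)  # vals: [5.0, 4.0]
```

For duplicated values the index order is implementation-defined, which is why the PR adds a dedicated repeated-values test.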
qazal
a1f41fadf6 test_schedule cleanups + add DONT_GROUP_REDUCES [pr] (#9392)
* test_schedule cleanups + add DONT_GROUP_REDUCES [pr]

* replace with test_swizzle_reduceop

* delete duplicate tests

* test_allow_push_permutes

* one kernel tests
2025-03-09 15:01:08 +01:00
wozeparrot
b6fe5ab4dd fix: correct gfx10 ctl stack size (#9384) 2025-03-09 13:03:20 +08:00
qazal
456697d0be always create kernels for assign/contiguous/copy [pr] (#9388) 2025-03-08 15:32:06 +01:00
qazal
286b480f82 do not replace assign with the offset buffer [pr] (#9387) 2025-03-08 11:57:44 +01:00
qazal
ecfccdea8e remove views from the kernel graph minimum diff (#9385)
* remove views from the kernel graph

* notes
2025-03-08 10:14:42 +01:00
qazal
0d2762c010 prep refactor for adding buffer ops last [pr] (#9383)
* prep refactor for adding buffer ops last [pr]

* freeze buffers

* add swizzle_reduceop

* shape for reduceop_view_right

* simpler elementwise_view_right

* add shapetracker to const

* only const

* from process replay
2025-03-08 08:00:14 +01:00
b1tg
bde0347618 amd: support relocatable elf (#9380)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-03-08 02:21:49 +08:00
nimlgen
243078dda9 am: optimize tlb usage (#9049)
* am: optimize tlb usage

* fixes

* comments

* tiny
2025-03-07 19:37:29 +03:00
qazal
46720294d6 reorder ScheduleItem creation [pr] (#9379) 2025-03-07 17:20:53 +01:00
qazal
dc89dae994 remove unmasked valid after swizzles (#9377) 2025-03-07 16:43:16 +01:00