Commit Graph

9054 Commits

nimlgen
346b8542da nv: fix inval from gpu_get_id_info_v2 (#10670) 2025-06-07 00:54:32 +03:00
chenyu
bdede4924e fix odd number in get_test_global_size (#10671)
factor might not be an integer if the input global_size has an odd number in it
2025-06-06 17:31:35 -04:00
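A minimal sketch of the failure mode (a hypothetical reimplementation, not the actual tinygrad helper): when a test shrinks a launch grid, naively halving an odd dimension yields a non-integer size, so the shrink has to be chosen to keep every dimension integral.

```python
import math

def shrink_global_size(global_size: list[int], max_size: int) -> tuple[list[int], float]:
  # shrink the grid until its volume fits under max_size, keeping sizes integral
  gs, factor = list(global_size), 1.0
  while math.prod(gs) > max_size:
    i = max(range(len(gs)), key=lambda j: gs[j])  # largest dimension
    new = (gs[i] + 1) // 2                        # ceiling halve: 7 -> 4, never 3.5
    factor *= gs[i] / new
    gs[i] = new
  return gs, factor

print(shrink_global_size([7, 5, 3], 16))  # ([2, 2, 3], ~8.75)
```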
George Hotz
bf4ffc054c mstack replaces scheduler complexity (#10654)
* mstack replaces scheduler complexity

* leave that one

* contiguous

* work

* upd

* minimal failing test

* simpler

* attention is broken

* fix transformer

* failing tests

* real fix for llama

* kv cache test

* jit multi assign test

* better tests

* comment

* fix jit issue

* traverse after buf_uop
2025-06-06 11:31:41 -07:00
George Hotz
7f0f97aa76 new test_multitensor tests (#10667)
* new test_multitensor tests

* cleanup scheduler
2025-06-06 10:26:28 -07:00
qazal
5170f387b3 remove UOp.metaop [pr] (#10664)
* little simpler UOp.const_like [pr]

* remove UOp.metaop

* bind

* remove

* min diff

* that comment is fine
2025-06-06 16:21:48 +03:00
chenyu
4a6d84c4c3 hotfix llama start_pos vmax is max_context-1 (#10659)
* hotfix llama start_pos vmax is max_context-1

fixed `IGNORE_OOB=0 python3 examples/llama3.py --size 1B --benchmark --temperature 0`

* hotfix: multitensor transformer test tests kv cache

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2025-06-06 00:41:25 -04:00
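For context, the fix amounts to tightening the symbolic upper bound on the decode position (a minimal sketch, assuming tinygrad's `Variable(name, min, max)` API and an illustrative `max_context`):

```python
from tinygrad import Variable

max_context = 1024
# start_pos indexes into the kv cache, so its largest legal value is
# max_context - 1; a vmax of max_context would allow a one-past-the-end
# read, which IGNORE_OOB=0 flags
start_pos = Variable("start_pos", 0, max_context - 1)
```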
George Hotz
5eb6e1e65a Revert "hotfix: multitensor transformer test tests kv cache"
This reverts commit ad9f88419a.
2025-06-05 21:15:34 -07:00
George Hotz
ad9f88419a hotfix: multitensor transformer test tests kv cache 2025-06-05 21:08:57 -07:00
George Hotz
8325c4f192 tests for multi assign (#10658)
* tests for multi assign

* transformer tests

* add that assert
2025-06-05 20:56:40 -07:00
wozeparrot
0d86f8d375 fix failed threefry (#10646) 2025-06-05 17:17:42 -07:00
chenyu
e67642d430 update doc example for multinomial (#10657)
also added a missing `s` in many places for consistency
2025-06-05 20:16:52 -04:00
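In the spirit of that doc update, a usage sketch for `Tensor.multinomial` (assuming the `num_samples`/`replacement` keyword signature; sampled indices vary run to run):

```python
from tinygrad import Tensor

# draw 4 category indices from unnormalized weights, with replacement
weights = Tensor([0.25, 0.25, 0.5])
samples = weights.multinomial(num_samples=4, replacement=True)
print(samples.numpy())  # e.g. [2 0 2 1]
```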
Eitan Turok
61352b8aa2 Add some more docs (#10634)
* more docs

* Add multinomial to ops

* better doc
2025-06-05 19:40:37 -04:00
qazal
884b6cf288 remove gbarrier on const (#10656) 2025-06-06 02:36:52 +03:00
chenyu
ff1aad7b69 fix const float pow to int tensor (#10655)
was incorrectly cast to int
2025-06-05 19:15:12 -04:00
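A minimal check of the fixed behavior (a sketch, assuming the usual promotion of a float constant with an int tensor to `dtypes.float`):

```python
from tinygrad import Tensor, dtypes

t = Tensor([1, 2, 3])             # int tensor exponent
out = 0.5 ** t                    # float constant base
assert out.dtype == dtypes.float  # the bug cast the 0.5 const to int
print(out.numpy())                # [0.5, 0.25, 0.125]
```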
George Hotz
6619f17e26 force store to be contiguous (#10652) 2025-06-05 15:42:54 -07:00
wozeparrot
37e1ef1be3 feat: cleanup old AM processes (#10653) 2025-06-05 15:41:00 -07:00
George Hotz
baba274a76 minimal mstack pr to fix allreduce (#10649)
* minimal mstack pr to fix allreduce

* fix webgpu
2025-06-05 15:14:53 -07:00
George Hotz
4c315f8e17 MSTACK little non-functional changes (#10648) 2025-06-05 13:20:22 -07:00
b1tg
79d04d1baf AMD_LLVM: support mfma for mi300x (#10625)
* amd llvm: support mfma for mi300x

* don't pass self

* refactor wmma render

* arch as lambda arg

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-06-05 15:55:44 -04:00
chenyu
46811d0d3c minor external_model_benchmark cleanup (#10644) 2025-06-05 14:13:28 -04:00
qazal
26afbc954f delete redundant tests from test_schedule [pr] (#10643) 2025-06-05 20:08:39 +03:00
chenyu
80ebce421d remove metal buffer limit in external_model_benchmark [pr] (#10642)
not needed anymore
2025-06-05 13:00:51 -04:00
qazal
28c4997236 check for matching shape order in fused reduce (#10641)
* failing test

* shapes match with ones removed
2025-06-05 19:37:22 +03:00
qazal
1190062812 prevent grouper can_chase while fusing arange [pr] (#10623) 2025-06-05 18:50:21 +03:00
uuuvn
69f7778985 refactor renderer launch bounds [pr] (#10617) 2025-06-05 08:38:04 -07:00
qazal
8c5ea00522 push permutes through fused reduces (#10628)
* fix pushing reshapes through reduceops

* reduceop_view_right should assert on ndims mismatch

* update that, view.reshape asserts it
2025-06-05 16:14:04 +03:00
qazal
8db0ba1161 simpler swizzle_reducop + comments [pr] (#10638) 2025-06-05 13:54:49 +03:00
qazal
ed37f29184 remove unused lib directory from viz setup [pr] (#10639) 2025-06-05 13:54:31 +03:00
chenyu
f6d7db25b7 simpler unbind_view [pr] (#10636) 2025-06-05 01:03:27 -04:00
chenyu
d0969f5a1f cleanup multi tests (#10635) 2025-06-05 00:28:44 -04:00
qazal
571c0296a9 linearizer failure from FUSE_ARANGE default diff (#10629)
* start with test_arange_sum

* test_arange_avgpool2d

* device.renderer.supports_float4
2025-06-04 19:11:52 +03:00
qazal
5056d21b29 add failing TestSchedule.test_arange_sum [pr] (#10627) 2025-06-04 17:23:59 +03:00
gill
9acaa6bc9a Fix button layout in viz UI for safari (#10621)
Co-authored-by: Utkarsh Gill <engelbart@Utkarshs-MacBook-Pro.local>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-06-04 15:33:22 +03:00
Xingyu
7a1bfb668d Implement linalg_eigh function for tensor eigenvalue decomposition in torch backend (#10612)
* Implement private _linalg_eigh function for tensor eigenvalue decomposition in torch backend

* Add unit test for linalg.eigh function in TestTorchBackend

This test verifies the eigenvalue decomposition of a 2x2 tensor using the linalg.eigh function, ensuring the computed eigenvalues and reconstructed tensor match the expected results.
2025-06-04 07:59:50 -04:00
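Roughly the shape of such a test (a sketch of the `torch.linalg.eigh` semantics; the actual test runs against the tinygrad torch backend device):

```python
import torch

# symmetric 2x2 with known eigenvalues 1 and 3
A = torch.tensor([[2.0, 1.0], [1.0, 2.0]])
w, v = torch.linalg.eigh(A)
# reconstruct A from its eigendecomposition: A = V diag(w) V^T
recon = v @ torch.diag(w) @ v.T
assert torch.allclose(w, torch.tensor([1.0, 3.0]), atol=1e-6)
assert torch.allclose(recon, A, atol=1e-6)
```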
qazal
7114b6ab31 viz browser tests (#10626)
* viz browser tests

* expect failure if js/ isn't included

* back green
2025-06-04 14:58:24 +03:00
Fang-Pen Lin
b0913295d2 Add missing js files in python package data for viz (#10624) 2025-06-04 10:49:43 +03:00
wozeparrot
4d1686f767 clean: becnhmark -> benchmark (#10620) 2025-06-03 19:28:18 -07:00
chenyu
18e9ec3ea1 add wino cifar to search benchmark (#10615)
* add wino cifar to search benchmark

* FUSE_OPTIM=1

* revert those
2025-06-03 20:38:43 -04:00
Bhavya Gada
bafd0c30d7 fix some minor typos and grammar (#10619) 2025-06-03 15:55:25 -07:00
nimlgen
4381b54543 am: disable page migration (#10608)
* am: disable page migration

* fixed

* enable

* fix

* typo

* fix check
2025-06-03 18:51:28 +03:00
chenyu
1c1f578490 DISABLE_COMPILER_CACHE in sdxl search (#10614) 2025-06-03 09:22:25 -04:00
qazal
ce9f12dc13 reorder cast before masking constants (#10609)
* failing test from fuzzer

* .numpy() handles bfloat16 better

* const->view->cast becomes const->cast->view

* update TestMovedConstFolding.test_cast_padded
2025-06-03 15:44:03 +03:00
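Why the reorder is sound, with numpy standing in for a masked view (a sketch; the actual rewrite operates on tinygrad UOps): casting commutes with the zero fill that a masked view introduces.

```python
import numpy as np

c = np.float64(2.5)
# const -> view (zero-pad) -> cast
view_then_cast = np.pad(np.full(3, c), 1).astype(np.float32)
# const -> cast -> view (zero-pad)
cast_then_view = np.pad(np.full(3, c.astype(np.float32)), 1)
assert (view_then_cast == cast_then_view).all()
```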
qazal
910cabb081 add kernel count to grouper process replay differ [pr] (#10611) 2025-06-03 15:21:27 +03:00
chenyu
26dee71bc1 hotfix don't overwrite acc dtype in scatter_reduce (#10606)
dtype is inferred by the individual reduce
2025-06-02 21:17:01 -04:00
ihar
ba02a6331e removed unnecessary 'isinstance(data, UOp)' check (#10605) 2025-06-02 20:58:14 -04:00
nimlgen
07de095b27 am: more info on PFs (#10602)
* am: more info on PFs

* fix
2025-06-02 23:48:40 +03:00
qazal
b8fb2ba829 rename to finalize_gbarrier [pr] (#10596) 2025-06-02 12:55:31 +03:00
Ahmed Harmouche
650404a143 [webgpu] Proper shared mem size for packed types (#10585)
* Proper shared mem size in webgpu

* Add test

* Refactor test
2025-06-01 20:18:33 -04:00
qazal
00822603ec allow stacking of VIEW UOps [pr] (#10532)
* allow stacking of VIEW UOps [pr]

* merge_views is first

* simpler

* loc for pr, this needs a helper

* keep

* diff [pr]

* formatting
2025-06-01 23:27:04 +03:00
qazal
3cc73a0172 simpler process replay main loop [pr] (#10588)
* simpler process replay main loop [pr]

* use logging

* default to 1
2025-06-01 15:03:21 +03:00