George Hotz
48918fa75a
fix disktensor offset issue ( #3532 )
2024-02-28 17:22:17 -08:00
Caleb Bunch
0b1fc5888a
fix 'Import Error: cannot import name compile_cuda from tinygrad.runtime.ops_cuda' error in extra/gemm/cuda_matmul.py ( #3531 )
2024-02-28 17:15:32 -08:00
David Friehs
275971e616
fix: align .split, .chunk and .unsqueeze with torch, add fuzz tests ( #3505 )
...
this fixes .split where self.shape[dim] is not evenly divisible by
sizes - delegating to .chunk is always the wrong choice here:
- tensor((5,)).split(4) should result in (tensor((4,)), tensor((1,))),
but previously gave (tensor((3,)), tensor((2,)))
this also fixes issues in .split and .chunk where tensors with
shape[dim]==0 led to empty tuples/lists when the tensor itself should
have been returned instead
because tinygrad is expected to fail in all cases where torch fails,
tinygrad is now strict: sizes must sum to the given dimension's length
in .split, num must be non-null for .chunk, and only valid dims are
allowed in .unsqueeze
2024-02-28 17:06:39 -08:00
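A minimal sketch of the fixed .split semantics (assuming tinygrad's Tensor API; the shapes come from the commit body above):
```
from tinygrad import Tensor

t = Tensor.arange(5)             # shape (5,)
parts = t.split(4)               # pieces of size 4 along dim 0
# with this fix the shapes are (4,) and (1,), matching torch,
# instead of the old .chunk-based result of (3,) and (2,)
print([p.shape for p in parts])  # [(4,), (1,)]
```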
George Hotz
e7cda40d52
Revert "hotfix: disable metal graph"
...
This reverts commit 3541602877.
2024-02-28 16:25:12 -08:00
George Hotz
42eb8de0d4
Revert "move all reduces to the end in lazy ( #3475 )" ( #3529 )
...
This reverts commit 2113e1eb63.
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-02-28 16:24:10 -08:00
chenyu
0c6846f9fc
failed test case for disk tensor assign into dtype int64 ( #3527 )
...
failing case for #3510, marked as expectedFailure for now
2024-02-28 17:52:21 -05:00
chenyu
d89e3c4e08
enable METAL tests now runner is M1 and no fast-math ( #3523 )
2024-02-28 14:14:23 -05:00
chenyu
1136e2a82a
skipIf(not( -> skipUnless( in test_linearizer_failures (#3519 )
...
if these behave weirdly in CI, we might need to disable them there
2024-02-28 13:48:47 -05:00
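The change follows the standard unittest idiom; a self-contained illustration of the pattern (ON_LINUX and the test are hypothetical):
```
import sys
import unittest

ON_LINUX = sys.platform == "linux"

class TestExample(unittest.TestCase):
  # before: @unittest.skipIf(not ON_LINUX, "linux only")
  @unittest.skipUnless(ON_LINUX, "linux only")  # same behavior, reads directly
  def test_linux_only(self):
    self.assertTrue(ON_LINUX)
```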
George Hotz
3541602877
hotfix: disable metal graph
2024-02-28 10:33:34 -08:00
George Hotz
c34d382a1e
bump to macos-14 M1 ( #3520 )
...
* bump to macos-14 M1
* bump cache key
* no -n auto
* jit=2
* real tensor cores
2024-02-28 10:28:25 -08:00
George Hotz
505ac6ac96
Revert "check buffers are seeable by other gpu before transfer ( #3504 )" ( #3522 )
...
This reverts commit db2cf48828.
2024-02-28 10:26:27 -08:00
nimlgen
db2cf48828
check buffers are seeable by other gpu before transfer ( #3504 )
2024-02-28 10:24:50 -08:00
wozeparrot
da32c37346
use hash as key for beam ( #3516 )
...
* feat: use hash as key for beam
* feat: bump db version
2024-02-28 10:19:01 -08:00
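A hypothetical sketch of the idea (names are illustrative, not tinygrad's actual cache code): key the BEAM search cache by a fixed-length digest of the kernel description rather than the raw string.
```
import hashlib

def beam_cache_key(kernel_desc: str) -> str:
  # a stable, fixed-length key regardless of how long the kernel
  # description grows; bumping the db version invalidates old entries
  return hashlib.sha256(kernel_desc.encode()).hexdigest()
```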
uuuvn
1f5c24798b
Raise exception if MTLCommandBuffer fails ( #3465 )
2024-02-28 10:14:08 -08:00
nimlgen
08ef77c721
hsa multigpu graph ( #3403 )
...
* init hsa multigraph
* better handling of accesses to buffers
* revert sdma0 only when copies from fd
2024-02-28 09:40:53 -08:00
chenyu
fa88e1d0d0
cleanup lazy reduce ( #3517 )
...
* cleanup lazy reduce
removed a useless assert now that the arg is axis, and cleaned up the split logic
* stride can be symbolic with int shape
2024-02-28 08:15:01 -05:00
chenyu
2127c1c6c2
test for the split reduce kernel ( #3515 )
...
somehow this was not tested
2024-02-27 21:29:25 -05:00
nimlgen
94b7ac7a29
no cuda compile helper ( #3512 )
2024-02-28 01:50:10 +01:00
chenyu
88939c3347
fix Node.max can be symbolic ( #3514 )
...
Also made sure that taking max twice yields an int.
2024-02-27 17:21:31 -05:00
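A hedged helper illustrating the contract (int_max is hypothetical; per the commit, .max may itself be a Node, but taking max twice reaches a plain int):
```
from tinygrad.shape.symbolic import Node

def int_max(n: Node) -> int:
  # n.max may itself be symbolic (a Node); per this fix,
  # taking .max once more is guaranteed to yield an int
  m = n.max
  return m if isinstance(m, int) else m.max
```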
chenyu
969b57f0fe
enable symbolic_ops and jits test of two vars ( #3513 )
2024-02-27 11:17:46 -05:00
wozeparrot
ea4b8e5b1f
feat: don't hardcode the arch ( #3511 )
2024-02-27 07:58:03 -05:00
Francis Lam
11da65bccd
test/external/fuzz_linearizer: add a FUZZ_MAX_SIZE option ( #3455 )
...
* test/external/fuzz_linearizer: add a FUZZ_MAX_SIZE option
this allows us to limit the size of the kernel, reducing running
times by avoiding kernels that take a long time
* fix spacing and re-order to put parameters together
2024-02-27 07:34:59 -05:00
qazal
a29cd6d464
run f64 increased precision tests on remu ( #3509 )
...
* run the test in CI
* temp: use the pre-release
* Revert "temp: use the pre-release"
This reverts commit 28e8571421.
2024-02-26 18:01:07 -05:00
chenyu
b1426f3a4c
cleanup SumNode mod ( #3503 )
2024-02-26 11:10:55 -05:00
chenyu
61605ccc69
Remove special case of SumNode div SumNode ( #3502 )
2024-02-26 09:42:06 -05:00
Francis Lam
39d75f0d58
test_linearizer_failures: add more METAL examples ( #3495 )
...
these were obtained from running fuzz_linearizer on METAL
2024-02-26 10:19:05 +01:00
chenyu
b154089884
float64 function support for HIP ( #3492 )
...
* float64 function support for HIP
* not CI
2024-02-24 09:46:20 -05:00
chenyu
35aff8b0c2
properly exclude PYTHON backend and support of half ( #3491 )
...
should be able to run in CI with python 3.12
2024-02-24 09:22:06 -05:00
David Friehs
2fe98b64bb
fix Tensor.split not passing dim to Tensor.chunk ( #3490 )
2024-02-24 07:53:11 -05:00
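A small usage sketch of the fixed behavior (hypothetical shapes; assumes Tensor.split takes a dim keyword as in torch):
```
from tinygrad import Tensor

t = Tensor.ones(2, 6)
# with the fix, dim is forwarded to the underlying .chunk call
print([p.shape for p in t.split(2, dim=1)])  # [(2, 2), (2, 2), (2, 2)]
```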
Caleb Bunch
b41761488d
change specific string 'CLANG' to DEVICE variable in abstractions2.py ( #3488 )
2024-02-24 07:51:39 -05:00
chenyu
c032df520b
minor symbolic type related cleaups ( #3489 )
2024-02-23 16:44:43 -05:00
Carson Radtke
15df9406d6
fix exec_alu(UnaryOps.SQRT, <...>, (0,)) + add test ( #3487 )
...
* fix exec_alu(UnaryOps.SQRT, <...>, (0,)) + add test
* sqrt(0) != nan
* fix tabs
2024-02-23 18:28:00 +01:00
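A quick check of the fixed behavior (import locations are assumed from tinygrad at the time of this commit):
```
from tinygrad.ops import UnaryOps, exec_alu  # assumed import path
from tinygrad.dtype import dtypes            # assumed import path

# per the fix, the python-emulated ALU returns 0.0 for sqrt(0), not nan
print(exec_alu(UnaryOps.SQRT, dtypes.float32, (0,)))  # 0.0
```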
nimlgen
52567da07f
jit grapher simplified ( #3478 )
2024-02-23 16:20:16 +01:00
George Hotz
2113e1eb63
move all reduces to the end in lazy ( #3475 )
...
* move all reduces to the end in lazy
* apply as reshape, not permute
2024-02-23 15:49:11 +01:00
David Hou
5cfcc2a8d7
support MLB reshaping on-axis for evenly sharded ( #3484 )
...
* support MLB reshaping on-axis for evenly sharded
* update test
* not -> !=
2024-02-23 07:51:36 -05:00
chenyu
358a24eae6
symbolic use mod for rmod and use floordiv for rfloordiv ( #3485 )
2024-02-23 01:05:13 -05:00
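A small hedged example (assumes tinygrad.shape.symbolic; the bounds are chosen so the results are fully determined by them):
```
from tinygrad.shape.symbolic import Variable

a = Variable("a", 4, 10)
# an int on the left-hand side now reuses the regular mod/floordiv paths
print(3 % a)   # stays 3, since a >= 4
print(3 // a)  # simplifies to 0, since a >= 4
```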
nimlgen
6d048a0c0b
cache collector optimizations are allowed only for kernel operations ( #3476 )
2024-02-22 12:26:57 +01:00
George Hotz
7698781389
Revert "wmma: add CUDA tensor core ( #3464 )" ( #3474 )
...
This reverts commit e9cef13f0b.
2024-02-22 11:58:16 +01:00
Francis Lam
e9cef13f0b
wmma: add CUDA tensor core ( #3464 )
2024-02-22 11:57:08 +01:00
wozeparrot
57678012e1
Upload correct benchmark artifact ( #3471 )
...
* fix: correct filename
* fix: why is this .py?
2024-02-22 01:14:16 -05:00
chenyu
ab40c0cf93
clean up long lines in symbolic ( #3469 )
2024-02-21 21:57:44 -05:00
chenyu
7c0fc40123
enable test IMAGE=2 PYTHON=1 python3 test/test_ops.py TestOps.test_simple_conv2d ( #3468 )
2024-02-21 18:30:12 -05:00
chenyu
77d2a4c12a
regenerate kernel dataset after reduce arg to axis change ( #3467 )
...
```
./extra/optimization/generate_dataset.sh
gzip /tmp/sops
mv /tmp/sops.gz extra/datasets/
```
2024-02-21 18:16:13 -05:00
David Hou
f513c37e64
support same uidx in multiple shape positions ( #3205 )
...
* support same uidx in multiple shape positions
* rename var
* update comment
* add contiguous index check to global_store too
* update comment
* small change
* is this better?
* smh
* smaller change?
* get rid of more changes
* get rid of more changes
* is this even making anything better
* comment
* fix test
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-02-21 19:37:03 +01:00
chenyu
1eb24af63b
fix softmax and log_softmax for 0d tensor ( #3463 )
...
matched torch by taking axis in [-1, 0] and using axis=None internally
2024-02-21 11:30:30 -05:00
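A minimal sketch of the fixed behavior (assuming a 0-d tensor constructed from a python scalar):
```
from tinygrad import Tensor

t = Tensor(2.0)                 # 0-d tensor, shape ()
print(t.softmax().numpy())      # 1.0, matching torch
print(t.log_softmax().numpy())  # 0.0
```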
George Hotz
871ba73e65
_reduce_op is axis based now ( #3462 )
...
* _reduce_op is axis based now
* axis_
* update lin failures
* disable that
* fix shape
2024-02-21 16:36:31 +01:00
George Hotz
22a90cbb15
change frontend reduce API to use axis ( #3460 )
...
* change frontend API to axis
* switch lazy to also take axis input
2024-02-21 12:26:17 +01:00
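After this change the frontend reduces take axis directly; a minimal usage sketch (shapes are illustrative):
```
from tinygrad import Tensor

t = Tensor.ones(2, 3, 4)
print(t.sum(axis=1).shape)       # (2, 4)
print(t.max(axis=(0, 2)).shape)  # (3,)
```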
chenyu
6c1063ba39
add mypy --strict-equality to pre-commit ( #3458 )
...
matched CI mypy behavior
2024-02-21 03:41:05 -05:00
chenyu
02683a8659
gate the cast before movements in lazy ( #3452 )
...
it made gpt2 slower (2ms -> 2.5ms on 3090, 7ms -> 8ms on M1 Max with BEAM=2);
disabled it in the gpt2 benchmark until the full issue is understood
2024-02-20 09:36:22 -05:00
chenyu
0d326a48b8
fix LtNode simplification when lhs and rhs contain same variables ( #3451 )
...
* fix LtNode simplification when lhs and rhs contain same variables
`(Variable("a", 1, 5) < Variable("a", 1, 5))` should eval to `NumNode(0)`
* fix with less perf impact
2024-02-20 09:06:55 -05:00
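Per the commit body, the failing case and its fixed result (Variable import assumed from tinygrad.shape.symbolic):
```
from tinygrad.shape.symbolic import Variable

# a comparison whose lhs and rhs contain the same variables can never
# hold strictly; with the fix this evaluates to NumNode(0)
print(Variable("a", 1, 5) < Variable("a", 1, 5))  # 0
```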