Commit Graph

3661 Commits

Author · SHA1 · Message · Date
Mark McLoughlin
2e82c5b7a4 README: ops_cpu and ops_torch have been removed (#3539)
Removed by pull request #3399
2024-02-29 10:22:11 -05:00
nimlgen
b05776ef3e fix addresses of dispatch packets (#3534) 2024-02-29 05:43:55 -08:00
geohotstan
9268a8b154 remove MULACC (#3459)
* init

* removed mulacc

* is uoptimize the problem?

* lol hax make work temporarily fix l8er

* revert extra/ changes

* clean up

* flaky metal tests?

* add back mulacc for metal

* revert last commit

* try skipping linearizer_failure tests

* skip flammit tests... cuz tests all work locally

* try narrow down exact linearizer failure test

* try 2

* try 4

* generated code is the exact same wtf why CI fails

* code for 15 and 17 are exact same with or without mulacc, this should pass

* try only 1 failure

* try garbage collecting lol...

* try del variables lol

* try gcing after del lol...

* is diskcache the problem???

* try disabling opts cache idk

* try remove hack

* try disable github metal cache...

* try CACHELEVEL=0 :D idk anymore

* try increase newCommandQueueWithMaxCommandBufferCount_, im almost out of ideas...

* revert

* actually not a HACK

* oops
2024-02-29 07:40:40 -05:00
qazal
94fc0fd546 uop the float4 acc upcast in group_for_reduce kernels (#3466)
* simplest one

* but i can trust this will be cached correctly

* wait that was wrong too

* cleanup

* test_reduce_upcast for single reduce case

* a late accumulator always outputs to gds

lint
2024-02-28 17:33:47 -08:00
George Hotz
48918fa75a fix disktensor offset issue (#3532) 2024-02-28 17:22:17 -08:00
Caleb Bunch
0b1fc5888a fix 'Import Error: cannot import name compile_cuda from tinygrad.runtime.ops_cuda' error in extra/gemm/cuda_matmul.py (#3531) 2024-02-28 17:15:32 -08:00
David Friehs
275971e616 fix: align .split, .chunk and .unsqueeze with torch, add fuzz tests (#3505)
this fixes .split where self.shape[dim] is not perfectly divisible by
sizes - .chunk is always the wrong choice here:
 - tensor((5,)).split(4) should result in (tensor((4,)), tensor((1,)))
   was (tensor((3,)), tensor((2,)))

this also fixes issues in .split and .chunk where tensors with
shape[dim]==0 lead to empty tuples/lists when the tensor itself should
have been returned instead

because tinygrad is expected to fail in all cases where torch fails,
tinygrad will now be strict: sizes must sum to the size of the passed
dimension in .split, num must be non-null for .chunk, and only valid
dims are allowed in .unsqueeze
2024-02-28 17:06:39 -08:00
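For reference, a minimal sketch of the torch semantics being matched here, using torch itself (tinygrad's .split/.chunk are expected to agree after this fix):

```python
import torch

t = torch.arange(5)  # shape (5,)

# .split(4): fixed piece size with a smaller trailing remainder
assert [p.numel() for p in t.split(4)] == [4, 1]

# .chunk(2): near-equal pieces, which is why it was the wrong choice for .split
assert [c.numel() for c in t.chunk(2)] == [3, 2]
```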
George Hotz
e7cda40d52 Revert "hotfix: disable metal graph"
This reverts commit 3541602877.
2024-02-28 16:25:12 -08:00
George Hotz
42eb8de0d4 Revert "move all reduces to the end in lazy (#3475)" (#3529)
This reverts commit 2113e1eb63.

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-02-28 16:24:10 -08:00
chenyu
0c6846f9fc failed test case for disk tensor assign into dtype int64 (#3527)
failing case for #3510, marked as expectedFailure for now
2024-02-28 17:52:21 -05:00
chenyu
d89e3c4e08 enable METAL tests now runner is M1 and no fast-math (#3523) 2024-02-28 14:14:23 -05:00
chenyu
1136e2a82a skipIf(not( -> skipUnless( in test_linearizer_failures (#3519)
if these behave weirdly in CI, we might need to disable them there
2024-02-28 13:48:47 -05:00
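For reference, the standard-library equivalence behind the rename (the IN_CI flag is a hypothetical stand-in for the real condition):

```python
import unittest

IN_CI = True  # hypothetical stand-in for the actual condition being tested

class Example(unittest.TestCase):
    @unittest.skipIf(not IN_CI, "CI only")     # before: double negative
    def test_old(self): pass

    @unittest.skipUnless(IN_CI, "CI only")     # after: same behavior, clearer intent
    def test_new(self): pass
```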
George Hotz
3541602877 hotfix: disable metal graph 2024-02-28 10:33:34 -08:00
George Hotz
c34d382a1e bump to macos-14 M1 (#3520)
* bump to macos-14 M1

* bump cache key

* no -n auto

* jit=2

* real tensor cores
2024-02-28 10:28:25 -08:00
George Hotz
505ac6ac96 Revert "check buffers are seeable by other gpu before transfer (#3504)" (#3522)
This reverts commit db2cf48828.
2024-02-28 10:26:27 -08:00
nimlgen
db2cf48828 check buffers are seeable by other gpu before transfer (#3504) 2024-02-28 10:24:50 -08:00
wozeparrot
da32c37346 use hash as key for beam (#3516)
* feat: use hash as key for beam

* feat: bump db version
2024-02-28 10:19:01 -08:00
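A hedged sketch of the idea (illustrative names only, not tinygrad's actual API): key the BEAM cache by a stable hash of the kernel description rather than the raw string, which is also why the db version had to be bumped (old keys no longer match):

```python
import hashlib

def beam_cache_key(kernel_desc: str) -> str:
    # fixed-length, stable key derived from an arbitrarily long kernel description
    return hashlib.sha256(kernel_desc.encode()).hexdigest()

assert len(beam_cache_key("some long kernel ast repr")) == 64  # sha256 hex digest
```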
uuuvn
1f5c24798b Raise exception if MTLCommandBuffer fails (#3465) 2024-02-28 10:14:08 -08:00
nimlgen
08ef77c721 hsa multigpu graph (#3403)
* init hsa multigraph

* better handling of accesses to buffers

* revert sdma0 only when copies from fd
2024-02-28 09:40:53 -08:00
chenyu
fa88e1d0d0 cleanup lazy reduce (#3517)
* cleanup lazy reduce

removed a now-useless assert (arg is an axis now) and cleaned up the split logic

* stride can be symbolic with int shape
2024-02-28 08:15:01 -05:00
chenyu
2127c1c6c2 test for the split reduce kernel (#3515)
somehow this was not tested
2024-02-27 21:29:25 -05:00
nimlgen
94b7ac7a29 no cuda compile helper (#3512) 2024-02-28 01:50:10 +01:00
chenyu
88939c3347 fix Node.max can be symbolic (#3514)
Also made sure that taking max twice yields an int.
2024-02-27 17:21:31 -05:00
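Illustrative only (hypothetical helper and fake nodes; the real Node lives in tinygrad.shape.symbolic): since a Node's .max may itself be symbolic, bottoming out in an int can take a second .max, which is the invariant this commit enforces:

```python
def concrete_max(node):
    # .max may return an int or another symbolic node; one more .max
    # is guaranteed to produce an int after this fix
    m = node.max
    return m if isinstance(m, int) else m.max

class FakeNode:
    def __init__(self, m): self.max = m

inner = FakeNode(10)     # .max is an int
outer = FakeNode(inner)  # .max is another (symbolic) node
assert concrete_max(inner) == 10 and concrete_max(outer) == 10
```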
chenyu
969b57f0fe enable symbolic_ops and jits test of two vars (#3513) 2024-02-27 11:17:46 -05:00
wozeparrot
ea4b8e5b1f feat: don't hardcode the arch (#3511) 2024-02-27 07:58:03 -05:00
Francis Lam
11da65bccd test/external/fuzz_linearizer: add a FUZZ_MAX_SIZE option (#3455)
* test/external/fuzz_linearizer: add a FUZZ_MAX_SIZE option

this allows us to limit the kernel size and reduce running
times by skipping kernels that take a long time

* fix spacing and re-order to put parameters together
2024-02-27 07:34:59 -05:00
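A hedged sketch of how such an env-var knob typically works (os.getenv used for self-containedness; everything beyond the FUZZ_MAX_SIZE name is illustrative):

```python
import os

FUZZ_MAX_SIZE = int(os.getenv("FUZZ_MAX_SIZE", "0"))  # 0 means no limit

def should_fuzz(kernel_size: int) -> bool:
    # skip kernels whose size exceeds the cap, keeping fuzz runs fast
    return FUZZ_MAX_SIZE == 0 or kernel_size <= FUZZ_MAX_SIZE
```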
qazal
a29cd6d464 run f64 increased precision tests on remu (#3509)
* run the test in CI

* temp: use the pre-release

* Revert "temp: use the pre-release"

This reverts commit 28e8571421.
2024-02-26 18:01:07 -05:00
chenyu
b1426f3a4c cleanup SumNode mod (#3503) 2024-02-26 11:10:55 -05:00
chenyu
61605ccc69 Remove special case of SumNode div SumNode (#3502) 2024-02-26 09:42:06 -05:00
Francis Lam
39d75f0d58 test_linearizer_failures: add more METAL examples (#3495)
these were obtained from running fuzz_linearizer on METAL
2024-02-26 10:19:05 +01:00
chenyu
b154089884 float64 function support for HIP (#3492)
* float64 function support for HIP

* not CI
2024-02-24 09:46:20 -05:00
chenyu
35aff8b0c2 properly exclude PYTHON backend and support of half (#3491)
should be able to run in CI with Python 3.12
2024-02-24 09:22:06 -05:00
David Friehs
2fe98b64bb fix Tensor.split not passing dim to Tensor.chunk (#3490) 2024-02-24 07:53:11 -05:00
Caleb Bunch
b41761488d change specific string 'CLANG' to DEVICE variable in abstractions2.py (#3488) 2024-02-24 07:51:39 -05:00
chenyu
c032df520b minor symbolic type related cleaups (#3489) 2024-02-23 16:44:43 -05:00
Carson Radtke
15df9406d6 fix exec_alu(UnaryOps.SQRT, <...>, (0,)) + add test (#3487)
* fix exec_alu(UnaryOps.SQRT, <...>, (0,)) + add test

* sqrt(0) != nan

* fix tabs
2024-02-23 18:28:00 +01:00
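The semantics the fix enforces, checked with the standard library (exec_alu/UnaryOps are tinygrad internals named in the title; this only illustrates the expected math):

```python
import math

assert math.sqrt(0.0) == 0.0  # sqrt(0) != nan: it is exactly 0.0
# only negative inputs are the error/NaN case under IEEE-754 semantics
try:
    math.sqrt(-1.0)
except ValueError:
    pass
```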
nimlgen
52567da07f jit grapher simplified (#3478) 2024-02-23 16:20:16 +01:00
George Hotz
2113e1eb63 move all reduces to the end in lazy (#3475)
* move all reduces to the end in lazy

* apply as reshape, not permute
2024-02-23 15:49:11 +01:00
David Hou
5cfcc2a8d7 support MLB reshaping on-axis for evenly sharded (#3484)
* support MLB reshaping on-axis for evenly sharded

* update test

* not -> !=
2024-02-23 07:51:36 -05:00
chenyu
358a24eae6 symbolic use mod for rmod and use floordiv for rfloordiv (#3485) 2024-02-23 01:05:13 -05:00
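For reference, the reflected-operator dispatch this commit relies on (plain Python semantics; the Sym class is illustrative, not tinygrad code):

```python
class Sym:
    def __init__(self, name): self.name = name
    def __mod__(self, other): return f"({self.name} % {other})"
    def __rmod__(self, other): return f"({other} % {self.name})"        # int % Sym lands here
    def __floordiv__(self, other): return f"({self.name} // {other})"
    def __rfloordiv__(self, other): return f"({other} // {self.name})"  # int // Sym lands here

s = Sym("x")
assert 7 % s == "(7 % x)"    # Python calls s.__rmod__(7)
assert 7 // s == "(7 // x)"  # Python calls s.__rfloordiv__(7)
```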
nimlgen
6d048a0c0b cache collector optimizations are allowed only for kernel operations (#3476) 2024-02-22 12:26:57 +01:00
George Hotz
7698781389 Revert "wmma: add CUDA tensor core (#3464)" (#3474)
This reverts commit e9cef13f0b.
2024-02-22 11:58:16 +01:00
Francis Lam
e9cef13f0b wmma: add CUDA tensor core (#3464) 2024-02-22 11:57:08 +01:00
wozeparrot
57678012e1 Upload correct benchmark artifact (#3471)
* fix: correct filename

* fix: why is this .py?
2024-02-22 01:14:16 -05:00
chenyu
ab40c0cf93 clean up long lines in symbolic (#3469) 2024-02-21 21:57:44 -05:00
chenyu
7c0fc40123 enable test IMAGE=2 PYTHON=1 python3 test/test_ops.py TestOps.test_simple_conv2d (#3468) 2024-02-21 18:30:12 -05:00
chenyu
77d2a4c12a regenerate kernel dataset after reduce arg to axis change (#3467)
```
./extra/optimization/generate_dataset.sh
gzip /tmp/sops
mv /tmp/sops.gz extra/datasets/
```
2024-02-21 18:16:13 -05:00
David Hou
f513c37e64 support same uidx in multiple shape positions (#3205)
* support same uidx in multiple shape positions

* rename var

* update comment

* add contiguous index check to global_store too

* update comment

* small change

* is this better?

* smh

* smaller change?

* get rid of more changes

* get rid of more changes

* is this even making anything better

* comment

* fix test

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-02-21 19:37:03 +01:00
chenyu
1eb24af63b fix softmax and log_softmax for 0d tensor (#3463)
matched torch to take axis ∈ [-1, 0] and used axis=None internally
2024-02-21 11:30:30 -05:00
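Sketch of the 0-d behavior being matched, using torch as the reference (tinygrad's softmax/log_softmax are expected to agree):

```python
import torch

x = torch.tensor(3.0)                  # 0-d tensor: valid axes are only -1 and 0
assert x.softmax(0).item() == 1.0      # softmax over a single element is 1
assert x.softmax(-1).item() == 1.0
assert x.log_softmax(0).item() == 0.0  # log(1) == 0
```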
George Hotz
871ba73e65 _reduce_op is axis based now (#3462)
* _reduce_op is axis based now

* axis_

* update lin failures

* disable that

* fix shape
2024-02-21 16:36:31 +01:00
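For reference, axis-based reduction semantics illustrated with numpy (a stand-in, not tinygrad's internal _reduce_op signature, which per the title is now axis-based rather than shape-based):

```python
import numpy as np

x = np.ones((2, 3, 4))
# axis-based reduce: name the axes to collapse; the result shape follows
assert x.sum(axis=1).shape == (2, 4)
assert x.sum(axis=(0, 2)).shape == (3,)
```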