Commit Graph

3651 Commits

Author SHA1 Message Date
chenyu
d89e3c4e08 enable METAL tests now that the runner is M1 and fast-math is disabled (#3523) 2024-02-28 14:14:23 -05:00
chenyu
1136e2a82a skipIf(not( -> skipUnless( in test_linearizer_failures (#3519)
if these behave weirdly in CI, we might need to disable them
2024-02-28 13:48:47 -05:00
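
For context, `skipUnless(cond)` is the positive form of `skipIf(not cond)`; a minimal sketch of the equivalence using stock `unittest` (the `CI` flag is a stand-in):

```python
import unittest

CI = False  # stand-in flag; the real tests key off the actual CI env

class TestExample(unittest.TestCase):
  @unittest.skipIf(not CI, "only runs in CI")    # old spelling
  def test_old(self): pass

  @unittest.skipUnless(CI, "only runs in CI")    # new spelling, same behavior
  def test_new(self): pass
```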
George Hotz
3541602877 hotfix: disable metal graph 2024-02-28 10:33:34 -08:00
George Hotz
c34d382a1e bump to macos-14 M1 (#3520)
* bump to macos-14 M1

* bump cache key

* no -n auto

* jit=2

* real tensor cores
2024-02-28 10:28:25 -08:00
George Hotz
505ac6ac96 Revert "check buffers are visible to the other GPU before transfer (#3504)" (#3522)
This reverts commit db2cf48828.
2024-02-28 10:26:27 -08:00
nimlgen
db2cf48828 check buffers are visible to the other GPU before transfer (#3504) 2024-02-28 10:24:50 -08:00
wozeparrot
da32c37346 use hash as key for beam (#3516)
* feat: use hash as key for beam

* feat: bump db version
2024-02-28 10:19:01 -08:00
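
A sketch of the idea behind this change; the helper name and the kernel's string form are assumptions, not tinygrad's actual code. Changing the key format invalidates existing cache entries, hence the db version bump.

```python
import hashlib

# hypothetical helper: key the BEAM cache db on a fixed-size, stable hash of
# the kernel's string representation rather than the raw string itself
def beam_cache_key(kernel_repr: str) -> str:
  return hashlib.sha256(kernel_repr.encode()).hexdigest()
```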
uuuvn
1f5c24798b Raise exception if MTLCommandBuffer fails (#3465) 2024-02-28 10:14:08 -08:00
nimlgen
08ef77c721 hsa multigpu graph (#3403)
* init hsa multigraph

* better handling of accesses to buffers

* revert sdma0 only when copies from fd
2024-02-28 09:40:53 -08:00
chenyu
fa88e1d0d0 cleanup lazy reduce (#3517)
* cleanup lazy reduce

removed a useless assert now that the arg is an axis, and cleaned up the split logic

* stride can be symbolic with int shape
2024-02-28 08:15:01 -05:00
chenyu
2127c1c6c2 test for the split reduce kernel (#3515)
somehow this was not tested
2024-02-27 21:29:25 -05:00
nimlgen
94b7ac7a29 no cuda compile helper (#3512) 2024-02-28 01:50:10 +01:00
chenyu
88939c3347 fix Node.max can be symbolic (#3514)
Also made sure taking max twice yields an int.
2024-02-27 17:21:31 -05:00
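
A sketch of the "taking max twice" guarantee, assuming tinygrad-style nodes whose `.max` may itself be a Node (the helper is hypothetical):

```python
# hypothetical helper: if a node's max bound is itself symbolic, the bound
# of that bound is guaranteed concrete after this fix
def concrete_max(node):
  m = node.max
  return m if isinstance(m, int) else m.max
```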
chenyu
969b57f0fe enable symbolic_ops and jits test of two vars (#3513) 2024-02-27 11:17:46 -05:00
wozeparrot
ea4b8e5b1f feat: don't hardcode the arch (#3511) 2024-02-27 07:58:03 -05:00
Francis Lam
11da65bccd test/external/fuzz_linearizer: add a FUZZ_MAX_SIZE option (#3455)
* test/external/fuzz_linearizer: add a FUZZ_MAX_SIZE option

this allows us to limit the kernel size and reduce running time by skipping kernels that take a long time

* fix spacing and re-order to put parameters together
2024-02-27 07:34:59 -05:00
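
A hypothetical sketch of such a size gate; only the `FUZZ_MAX_SIZE` name comes from the commit, the helper and the size metric are assumptions:

```python
import os

FUZZ_MAX_SIZE = int(os.getenv("FUZZ_MAX_SIZE", "0"))

def should_fuzz(kernel_size: int) -> bool:
  # 0 means no limit; otherwise skip kernels too large to run quickly
  return FUZZ_MAX_SIZE == 0 or kernel_size <= FUZZ_MAX_SIZE
```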
qazal
a29cd6d464 run f64 increased precision tests on remu (#3509)
* run the test in CI

* temp: use the pre-release

* Revert "temp: use the pre-release"

This reverts commit 28e8571421.
2024-02-26 18:01:07 -05:00
chenyu
b1426f3a4c cleanup SumNode mod (#3503) 2024-02-26 11:10:55 -05:00
chenyu
61605ccc69 Remove special case of SumNode div SumNode (#3502) 2024-02-26 09:42:06 -05:00
Francis Lam
39d75f0d58 test_linearizer_failures: add more METAL examples (#3495)
these were obtained from running fuzz_linearizer on METAL
2024-02-26 10:19:05 +01:00
chenyu
b154089884 float64 function support for HIP (#3492)
* float64 function support for HIP

* not CI
2024-02-24 09:46:20 -05:00
chenyu
35aff8b0c2 properly exclude PYTHON backend and support of half (#3491)
should be able to run in CI with Python 3.12
2024-02-24 09:22:06 -05:00
David Friehs
2fe98b64bb fix Tensor.split not passing dim to Tensor.chunk (#3490) 2024-02-24 07:53:11 -05:00
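
A small example of the fixed behavior, assuming tinygrad's torch-style `split`/`chunk` semantics:

```python
from tinygrad.tensor import Tensor

t = Tensor.ones(4, 6)
# before the fix, dim was dropped and chunking always happened along dim 0
a, b = t.split(3, dim=1)
assert a.shape == (4, 3) and b.shape == (4, 3)
```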
Caleb Bunch
b41761488d change specific string 'CLANG' to DEVICE variable in abstractions2.py (#3488) 2024-02-24 07:51:39 -05:00
chenyu
c032df520b minor symbolic type related cleanups (#3489) 2024-02-23 16:44:43 -05:00
Carson Radtke
15df9406d6 fix exec_alu(UnaryOps.SQRT, <...>, (0,)) + add test (#3487)
* fix exec_alu(UnaryOps.SQRT, <...>, (0,)) + add test

* sqrt(0) != nan

* fix tabs
2024-02-23 18:28:00 +01:00
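
The invariant under test, as a standalone sketch (`alu_sqrt` is a hypothetical stand-in for the emulator's dispatch):

```python
import math

# sqrt of a negative input is nan, but exactly 0.0 must map to 0.0
def alu_sqrt(x: float) -> float:
  return math.sqrt(x) if x >= 0 else float("nan")

assert alu_sqrt(0.0) == 0.0 and not math.isnan(alu_sqrt(0.0))
```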
nimlgen
52567da07f jit grapher simplified (#3478) 2024-02-23 16:20:16 +01:00
George Hotz
2113e1eb63 move all reduces to the end in lazy (#3475)
* move all reduces to the end in lazy

* apply as reshape, not permute
2024-02-23 15:49:11 +01:00
David Hou
5cfcc2a8d7 support MLB reshaping on-axis for evenly sharded (#3484)
* support MLB reshaping on-axis for evenly sharded

* update test

* not -> !=
2024-02-23 07:51:36 -05:00
chenyu
358a24eae6 symbolic use mod for rmod and use floordiv for rfloordiv (#3485) 2024-02-23 01:05:13 -05:00
nimlgen
6d048a0c0b cache collector optimizations are allowed only for kernel operations (#3476) 2024-02-22 12:26:57 +01:00
George Hotz
7698781389 Revert "wmma: add CUDA tensor core (#3464)" (#3474)
This reverts commit e9cef13f0b.
2024-02-22 11:58:16 +01:00
Francis Lam
e9cef13f0b wmma: add CUDA tensor core (#3464) 2024-02-22 11:57:08 +01:00
wozeparrot
57678012e1 Upload correct benchmark artifact (#3471)
* fix: correct filename

* fix: why is this .py?
2024-02-22 01:14:16 -05:00
chenyu
ab40c0cf93 clean up long lines in symbolic (#3469) 2024-02-21 21:57:44 -05:00
chenyu
7c0fc40123 enable test IMAGE=2 PYTHON=1 python3 test/test_ops.py TestOps.test_simple_conv2d (#3468) 2024-02-21 18:30:12 -05:00
chenyu
77d2a4c12a regenerate kernel dataset after reduce arg to axis change (#3467)
```
./extra/optimization/generate_dataset.sh
gzip /tmp/sops
mv /tmp/sops.gz extra/datasets/
```
2024-02-21 18:16:13 -05:00
David Hou
f513c37e64 support same uidx in multiple shape positions (#3205)
* support same uidx in multiple shape positions

* rename var

* update comment

* add contiguous index check to global_store too

* update comment

* small change

* is this better?

* smh

* smaller change?

* get rid of more changes

* get rid of more changes

* is this even making anything better

* comment

* fix test

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-02-21 19:37:03 +01:00
chenyu
1eb24af63b fix softmax and log_softmax for 0d tensor (#3463)
matched torch by accepting axis ∈ [-1, 0] and using axis=None internally
2024-02-21 11:30:30 -05:00
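
A minimal example of the fixed behavior, assuming a 0-d tensor built from a Python float:

```python
from tinygrad.tensor import Tensor

x = Tensor(2.0)                 # 0-d tensor
print(x.softmax().item())       # 1.0: softmax over a single element
print(x.log_softmax().item())   # 0.0
```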
George Hotz
871ba73e65 _reduce_op is axis based now (#3462)
* _reduce_op is axis based now

* axis_

* update lin failures

* disable that

* fix shape
2024-02-21 16:36:31 +01:00
George Hotz
22a90cbb15 change frontend reduce API to use axis (#3460)
* change frontend API to axis

* switch lazy to also take axis input
2024-02-21 12:26:17 +01:00
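
The axis-based frontend in action; a short usage sketch:

```python
from tinygrad.tensor import Tensor

t = Tensor.ones(2, 3)
print(t.sum().shape)        # ()    full reduce
print(t.sum(axis=0).shape)  # (3,)
print(t.sum(axis=1).shape)  # (2,)
```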
chenyu
6c1063ba39 add mypy --strict-equality to pre-commit (#3458)
matched the CI mypy behavior
2024-02-21 03:41:05 -05:00
chenyu
02683a8659 gate the cast before movements in lazy (#3452)
it made gpt2 slower (2ms -> 2.5ms on 3090, 7ms -> 8ms on M1 Max with BEAM=2).
disabled it in the gpt2 benchmark until the full issue is understood
2024-02-20 09:36:22 -05:00
chenyu
0d326a48b8 fix LtNode simplification when lhs and rhs contain same variables (#3451)
* fix LtNode simplification when lhs and rhs contain same variables

`(Variable("a", 1, 5) < Variable("a", 1, 5))` should eval to `NumNode(0)`

* fix with less perf impact
2024-02-20 09:06:55 -05:00
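
A minimal sketch of the interval reasoning behind the fix, with a hypothetical `Var` standing in for symbolic `Variable`s:

```python
# hypothetical stand-in for symbolic variables carrying [vmin, vmax] bounds
class Var:
  def __init__(self, name, vmin, vmax):
    self.name, self.vmin, self.vmax = name, vmin, vmax
  def __lt__(self, other):
    if self.name == other.name: return 0  # same variable: a < a is never true
    if self.vmax < other.vmin: return 1   # bounds prove the comparison true
    if other.vmax <= self.vmin: return 0  # bounds prove it false
    return NotImplemented                 # cannot simplify

assert (Var("a", 1, 5) < Var("a", 1, 5)) == 0
```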
George Hotz
1b6e890ef2 uops flop counter (#3373)
* factor out winograd functions

* test counter

* uops flop counter

* more correct

* ish

* correct

* cleanup

* tests for uops flop counter

* tests still fail

* fix symbolic uops flop cnt

* fix symbolic uops flop cnt

* hmm, it's an alu

* uops alu resolve

* relax that
2024-02-20 09:36:30 +01:00
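
A toy sketch of the idea: walk a linear uop list and charge each ALU op once per enclosing loop iteration (the uop format here is invented for illustration):

```python
# toy flop counter over (op, arg) pairs; the format is hypothetical
def count_flops(uops):
  flops, trips = 0, []
  for op, arg in uops:
    if op == "LOOP": trips.append(arg)   # arg is the loop trip count
    elif op == "ENDLOOP": trips.pop()
    elif op == "ALU":
      mult = 1
      for t in trips: mult *= t
      flops += mult                      # one op per enclosing iteration
  return flops

print(count_flops([("LOOP", 4), ("ALU", "ADD"), ("ENDLOOP", None)]))  # 4
```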
Patrick Tsai
9dd64b1f5f Fix python cast uint/int overflow (#3448)
* Fix numpy uint/int overflow

* lol

* Works

* Update

* Move overflow test to float64/float32

* One line

* Update

* One more

---------

Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
2024-02-20 09:20:43 +01:00
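
A minimal sketch of overflow-correct integer casting in pure Python, assuming two's-complement wraparound (the helper is hypothetical):

```python
def wrap_cast(value: int, bits: int, signed: bool) -> int:
  value &= (1 << bits) - 1                   # wrap into the unsigned range
  if signed and value >= (1 << (bits - 1)):
    value -= 1 << bits                       # fold into the signed range
  return value

print(wrap_cast(256, 8, signed=False))  # 0: uint8 wraps around
print(wrap_cast(200, 8, signed=True))   # -56: int8 wraps negative
```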
qazal
7864fb69d1 delete MovementOps (#3434)
* delete MovementOps

* keep extra/to_movement_ops.py
2024-02-19 23:21:44 +01:00
nimlgen
015d414786 fix gpu page fault by ensuring code memory persistence during execution (#3435)
* fix page fault for exec image memory

* no new noqa: E501
2024-02-19 13:40:53 +01:00
Daniel Yeh
0a4029c519 fix path to models folder (#3442)
Co-authored-by: Chen-Chen Yeh <ge96noj@mytum.de>
2024-02-19 13:35:57 +01:00
Patrick Tsai
ac9d94a068 Cast correctly in python emulator (dtype tests pass) (#3446)
* Cast correctly in python emulator

* Update test yml and fix lint

* make ruff pass

* mypy passes

---------

Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
2024-02-19 13:34:02 +01:00