tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-09 15:08:02 -05:00

Author	SHA1	Message	Date
chenyu	6c1063ba39	add mypy --strict-equality to pre-commit (#3458 ) matched ci mypy behavior	2024-02-21 03:41:05 -05:00
chenyu	02683a8659	gate the cast before movements in lazy (#3452 ) it made gpt2 slower (2ms -> 2.5ms on 3090, 7ms -> 8ms on M1 Max with BEAM=2). disabled it in gpt2 benchmark before understanding the full issue	2024-02-20 09:36:22 -05:00
chenyu	0d326a48b8	fix LtNode simplification when lhs and rhs contain same variables (#3451 ) * fix LtNode simplification when lhs and rhs contain same variables `(Variable("a", 1, 5) < Variable("a", 1, 5))` should eval to `NumNode(0)` * fix with less perf impact	2024-02-20 09:06:55 -05:00
George Hotz	1b6e890ef2	uops flop counter (#3373 ) * factor out winograd functions * test counter * uops flop counter * more correct * ish * correct * cleanup * tests for uops flop counter * tests still fail * fix symbolic uops flop cnt * fix symbolic uops flop cnt * hmm, it's an alu * uops alu resolve * relax that	2024-02-20 09:36:30 +01:00
Patrick Tsai	9dd64b1f5f	Fix python cast uint/int overflow (#3448 ) * Fix numpy uint/int overflow * lol * Works * Update * Move overflow test to float64/float32 * One line * Update * One more --------- Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>	2024-02-20 09:20:43 +01:00
qazal	7864fb69d1	delete MovementOps (#3434 ) * delete MovementOps * keep extra/to_movement_ops.py	2024-02-19 23:21:44 +01:00
nimlgen	015d414786	fix gpu page fault by ensuring code memory persistence during execution (#3435 ) * fix pf for exec image memory * no new noqa: E501	2024-02-19 13:40:53 +01:00
Daniel Yeh	0a4029c519	fix path to models folder (#3442 ) Co-authored-by: Chen-Chen Yeh <ge96noj@mytum.de>	2024-02-19 13:35:57 +01:00
Patrick Tsai	ac9d94a068	Cast correctly in python emulator (dtype tests pass) (#3446 ) * Cast correctly in python emulator * Update test yml and fix lint * make ruff pass * mypy passes --------- Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>	2024-02-19 13:34:02 +01:00
chenyu	ddec76e9c4	remove unused LtNode.__floordiv__ (#3445 )	2024-02-18 22:12:54 -05:00
chenyu	86efdf0b34	remove create_rednode (#3444 ) handle Node collapsing into NumNode similar to OpNode	2024-02-18 21:08:19 -05:00
chenyu	2da734920e	use __getnewargs__ to fix unpickling Variable (#3441 ) it's recommended to use __getnewargs__ to update the args of classes that use __new__ when unpickling. It's preferred because it does not change the __new__ behavior.	2024-02-18 10:28:37 -05:00
nimlgen	5647148937	fix hip invalid ordinal (#3440 )	2024-02-18 08:31:44 -05:00
chenyu	8c0e85fdaf	limit symbolic substitute var_vals to have NumNode or Variable (#3438 ) this can greatly reduce the possible output types of substitute	2024-02-18 01:29:44 -05:00
George Hotz	6b4f734dc1	hotfix: better copy stats	2024-02-16 16:52:39 +01:00
George Hotz	c7fda10aa0	hotfix: disk doesn't sync	2024-02-16 16:46:48 +01:00
chenyu	230fc33d5b	limit sint to be Union[int, Variable, MulNode, SumNode] (#3430 ) * limit sint to be Union[int, Variable, MulNode, SumNode] these are the only allowed nodes in a Tensor shape * stride can be sint	2024-02-16 10:05:46 -05:00
George Hotz	fe97a85014	the compiler is a driver (#3427 )	2024-02-16 10:18:09 +01:00
zku	2d702ca073	If feasible, do not truncate float64 down to float32 in cstyle renderer (#3420 ) * do not truncate float64 precision * use l suffix to try avoid overload confusion * long line, ruff bloats the function otherwise * fmt * remove long double suffix (l), it's sufficient to have the float32 (f) suffix to avoid function overload ambiguity; add test showcasing rtol=1e-12 precision increase, the test fails without the renderer changes * use more reasonable test values, same as test_int_to_float_unary_func * disable test for CUDACPU, does not support half and segfaults on some operations per dtypes_alu test * disable test for HIP, renderer does not support f64 precision * do not use noqa E501, break up condition	2024-02-16 10:08:59 +01:00
chenyu	30f26279c5	add back "CPU" in test_onnx_backend supports_device (#3426 ) the onnx tests were all skipped.	2024-02-16 00:49:30 -05:00
xarkes	28a8b72024	Remove Interpreted device & remaining CPU/TORCH ref (#3423 ) * Remove Interpreted device & remaining CPU/TORCH ref * Oops * supports_device was useful * Fix doc wording --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-02-16 00:30:21 -05:00
chenyu	6efa68f97b	remove use of TORCH in pre-commit (#3424 ) it's silently using DEFAULT after removing TORCH	2024-02-15 19:38:37 -05:00
geohotstan	5eb4c902f6	correct division dtype casting (#3405 ) * Happy New Year * fix: exclude floordiv onnx tests * fix: less weird if statements in div * Good luck in the Year of the Dragon * fix: tempfix onnx div * fix: use reference impl for div	2024-02-15 19:34:40 -05:00
George Hotz	5de660ca0d	disk runner (prereq for interpreted removal) (#3421 ) * disk runner * simpler diskrunner	2024-02-15 18:14:05 +01:00
qazal	e1a57fe58a	test the behavior, not the implementation (#3419 )	2024-02-15 17:23:42 +01:00
George Hotz	b1c0d8c99d	remove cpu and torch backends (#3399 ) * remove cpu and torch backends * don't copy to cpu * use clang instead of cpu * multitensor gathers on the first device * clang is cpu + use default * fixup * bugfix	2024-02-15 16:55:39 +01:00
Obada Khalili	75f7e21a80	Make tests in `test/test_ops.py` pass for Python emulator (#3384 ) * fix OverflowError in UnaryOps.EXP2 * avoid accessing outputs for void uops * skip execution for UOps.IF and UOps.ENDIF * initialize bytearray to the correct size in UOps.DEFINE_LOCAL * validate len of input that has .sz > 1 * remove comment in code * reinitialize loop of already iterated * validate first value in input to be a list for inputs with .sz > 1 * add python ops tests to CI * skip long runtime tests for PYTHON backend * respect dtype.sz arg in UOps.CONST, and remove incorrect validation in UOps.STORE * use math.inf instead of float('inf') * handle 0 args to UnaryOps.LOG2 * handle load op with default of .sz > 1 * initialize the loop correctly using UOps.LOOP arg * remove unnecessary TODO comment * remove newline * select a subset of 22 ops tests to skip in CI when PYTHON=1 * handle gated UOps.LOAD referencing values that have .sz > 1 * Revert "select a subset of 22 ops tests to skip in CI when PYTHON=1" This reverts commit `7674fee81d`. * skip tests in python backend CI command * push fix lost in conflict resolve * Revert "skip long runtime tests for PYTHON backend" This reverts commit `5dd2a0376e`. * clear loop state after last iteration	2024-02-15 16:40:25 +01:00
Obada Khalili	18bb6a22e0	make tensors sizes smaller in maxpool2d tests (#3417 )	2024-02-15 15:53:52 +01:00
Maciej Fijalkowski	736c74b010	Rename .sz to .count on DType (#3413 ) * rename .sz for .count on dtype (and ANETensor for completeness) * revert the changes to extra, as per review * try to make linter happier * remove the change to extra	2024-02-15 15:03:49 +01:00
qazal	7919a1e6ec	dtypes: delete the float cast in realize.py (#3401 ) * remove float cast * cast scalars to the correct value in creation time * cast scalar in the correct place * wrong, use y_dtype * make consts have a unique cache key * add cast_scalar back * test_load_cache_const_bufs * add bool dtype * test_const_dtype * fix linters	2024-02-15 14:20:30 +01:00
nimlgen	002bf380b0	hsa runtime (#3382 ) * hsa init * handles transfer * linter * clean up hwqueue * fix sync freezes * print errors	2024-02-15 14:14:34 +01:00
George Hotz	93eceef727	remove cpu prereqs (#3410 )	2024-02-15 13:45:06 +01:00
George Hotz	a40df14fef	ops_ext to replace cpu import (#3409 ) * ops_ext to replace cpu import * don't allow zero copy with as buffer * memoryview(bytearray * reenable test * fix jit issue	2024-02-15 13:03:42 +01:00
George Hotz	ede4fd4705	hotfix: test_jit_copyin	2024-02-15 12:37:53 +01:00
George Hotz	6356474d6d	Revert "ops_ext to replace cpu import (#3406 )" (#3408 ) This reverts commit `91eb93f85a`.	2024-02-15 12:16:10 +01:00
George Hotz	91eb93f85a	ops_ext to replace cpu import (#3406 ) * ops_ext to replace cpu import * don't allow zero copy with as buffer * memoryview(bytearray * reenable test	2024-02-15 12:14:58 +01:00
qazal	49cb1fee54	run test_indexing on remu (#3404 ) * emulated ops_hip infra * add int4 * include test_indexing in remu * Revert "Merge branch 'remu-dev-mac'" This reverts commit `6870457e57`, reversing changes made to `3c4c8c9e16`.	2024-02-15 11:52:40 +01:00
qazal	9d4d63fcfc	dynamic tc function render (#3387 ) hip cant be done right now	2024-02-15 11:19:46 +01:00
chenyu	3c4c8c9e16	bump db version to 11 (#3398 ) followup after disabled fast math on metal.	2024-02-14 10:13:18 -05:00
qazal	27f4de2ce4	delete half_prekernel (#3388 ) * generic rendering of half and bf16 hotfix * fix uops + regression test * fix the test for metal's half4 * uop.uop fixup * mypy with --strict-equality, fix ops_gpu	2024-02-14 15:40:48 +01:00
chenyu	078a2603d5	set metal fast math default to 0 (disabled) (#3370 ) * set metal fast math default to 0 (disabled) It's a correctness fix because we use inf and nan. Let's see how slow it is * skip failed onnx tests * tmp DISABLE_COMPILER_CACHE=1 in metal benchmark * Revert "tmp DISABLE_COMPILER_CACHE=1 in metal benchmark" This reverts commit `22267df380`.	2024-02-14 11:42:33 +01:00
Francis Lam	668324d92b	wmma: protect TC locals from modification and use only LOCAL (#3379 ) also remove unnecessary upcast_dim from tensor_core and calculate it from the dimensions and thread sizes	2024-02-13 10:19:35 +01:00
Francis Lam	f1ad01fd91	test_linearizer_failures: add new linearizer compile failure on METAL (#3380 )	2024-02-12 20:28:34 -05:00
George Hotz	ce1f9f5556	hotfix: new linearizer docs	2024-02-12 18:56:30 +01:00
George Hotz	2e60012bcf	move create schedule and delete old API (#3377 ) * move create schedule and delete old API * fix test multitensor	2024-02-12 18:10:45 +01:00
George Hotz	41efaa848c	move graph.py and jit.py into features (#3376 ) * move graph.py into features * move jit into features * fix quickstart	2024-02-12 17:34:34 +01:00
George Hotz	0f6cde243d	import from wino_cleanup (#3374 )	2024-02-12 16:26:50 +01:00
George Hotz	f47e297d4e	refactor: END -> ENDLOOP	2024-02-12 15:46:18 +01:00
George Hotz	29d68ae637	uops endif (#3372 ) * use is instead of == * add endif	2024-02-12 15:43:37 +01:00
George Hotz	1d45f3899d	use is instead of == (#3371 )	2024-02-12 15:35:55 +01:00
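
Commit 2da734920e above notes that `__getnewargs__` is the recommended way to supply arguments to `__new__` when unpickling. A minimal sketch of that mechanism follows. The `Variable` class here is a hypothetical stand-in written for illustration, not tinygrad's actual implementation, and its attribute names (`name`, `lo`, `hi`) are assumptions.

```python
import pickle


class Variable:
    """Hypothetical stand-in class whose __new__ requires arguments."""

    def __new__(cls, name, lo, hi):
        # All construction happens in __new__, so unpickling must be able
        # to call it with the original arguments.
        obj = super().__new__(cls)
        obj.name, obj.lo, obj.hi = name, lo, hi
        return obj

    def __getnewargs__(self):
        # With pickle protocol 2 or newer, these values are passed to
        # __new__ on load; __new__ itself needs no special-casing.
        return (self.name, self.lo, self.hi)


v = Variable("a", 1, 5)
restored = pickle.loads(pickle.dumps(v))
print(restored.name, restored.lo, restored.hi)
```

Without `__getnewargs__`, loading would call `Variable.__new__` with no arguments and raise a `TypeError`, which is the kind of failure the commit message describes avoiding without changing `__new__`.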


3610 Commits