tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-09 15:08:02 -05:00

Author	SHA1	Message	Date
George Hotz	81baf3eed3	bring ptx back (#3623 ) * bring ptx back * ptx back * fix define var * fix a few bugs * bugfixes * fixes * fix llvm bug * fix test bug	2024-03-06 13:34:21 -08:00
George Hotz	568353fa84	hotfix: bump line count to 6500	2024-03-06 07:52:18 -08:00
chenyu	c3b8d285aa	cleanup uops (#3605 ) using `is` to compare with enums, remove long lines and slightly more compact	2024-03-04 11:03:14 -05:00
George Hotz	770707b376	hotfix: gpuocelot no rebuild	2024-03-02 15:57:38 -08:00
Francis Lam	162dfb07d9	fuzz_linearizer: fix uops and add to test.yml (#3588 )	2024-03-02 15:03:42 -08:00
Francis Lam	e17f1821a7	wmma: add CUDA tensor core and fix test_speed_v_torch failure (#3544 )	2024-03-01 17:51:02 -08:00
chenyu	b7e555f6c0	run test_linearizer_failures on PYTHON backend (#3565 ) * run test_linearizer_failures on PYTHON backend only test 1, some have hanging issues and gated store is not implemented * --durations=20 * two less slow ones	2024-03-01 17:00:18 -05:00
George Hotz	5a6e151844	no barrier side effect (#3550 ) * no barrier side effect * finish barrier removal	2024-02-29 18:10:04 -08:00
George Hotz	2c19ab6561	define var (#3548 ) * define var * remove vars from there * fix python symbolic ops * fix llvm * pypath	2024-02-29 16:43:27 -08:00
George Hotz	c34d382a1e	bump to macos-14 M1 (#3520 ) * bump to macos-14 M1 * bump cache key * no -n auto * jit=2 * real tensor cores	2024-02-28 10:28:25 -08:00
George Hotz	7698781389	Revert "wmma: add CUDA tensor core (#3464 )" (#3474 ) This reverts commit `e9cef13f0b`.	2024-02-22 11:58:16 +01:00
Francis Lam	e9cef13f0b	wmma: add CUDA tensor core (#3464 )	2024-02-22 11:57:08 +01:00
chenyu	7c0fc40123	enable test IMAGE=2 PYTHON=1 python3 test/test_ops.py TestOps.test_simple_conv2d (#3468 )	2024-02-21 18:30:12 -05:00
chenyu	77d2a4c12a	regenerate kernel dataset after reduce arg to axis change (#3467 ) `./extra/optimization/generate_dataset.sh`, `gzip /tmp/sops`, `mv /tmp/sops.gz extra/datasets/`	2024-02-21 18:16:13 -05:00
George Hotz	871ba73e65	_reduce_op is axis based now (#3462 ) * _reduce_op is axis based now * axis_ * update lin failures * disable that * fix shape	2024-02-21 16:36:31 +01:00
qazal	7864fb69d1	delete MovementOps (#3434 ) * delete MovementOps * keep extra/to_movement_ops.py	2024-02-19 23:21:44 +01:00
Patrick Tsai	ac9d94a068	Cast correctly in python emulator (dtype tests pass) (#3446 ) * Cast correctly in python emulator * Update test yml and fix lint * make ruff pass * mypy passes --------- Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>	2024-02-19 13:34:02 +01:00
George Hotz	b1c0d8c99d	remove cpu and torch backends (#3399 ) * remove cpu and torch backends * don't copy to cpu * use clang instead of cpu * multitensor gathers on the first device * clang is cpu + use default * fixup * bugfix	2024-02-15 16:55:39 +01:00
Obada Khalili	75f7e21a80	Make tests in `test/test_ops.py` pass for Python emulator (#3384 ) * fix OverflowError in UnaryOps.EXP2 * avoid accessing outputs for void uops * skip execution for UOps.IF and UOps.ENDIF * initialize bytearray to the correct size in UOps.DEFINE_LOCAL * validate len of input that has .sz > 1 * remove comment in code * reinitialize loop of already iterated * validate first value in input to be a list for inputs with .sz > 1 * add python ops tests to CI * skip long runtime tests for PYTHON backend * respect dtype.sz arg in UOps.CONST, and remove incorrect validation in UOps.STORE * use math.inf instead of float('int') * handle 0 args to UnaryOPs.LOG2 * handle load op with default of .sz > 1 * initialize the loop correctly using UOps.LOOP arg * remove unnecessary TODO comment * remove newline * select a subset of 22 ops tests to skip in CI when PYTHON=1 * handle gated UOps.LOAD referencing values that have .sz > 1 * Revert "select a subset of 22 ops tests to skip in CI when PYTHON=1" This reverts commit `7674fee81d`. * skip tests in python backend CI command * push fix lost in conflict resolve * Revert "skip long runtime tests for PYTHON backend" This reverts commit `5dd2a0376e`. * clear loop state after last iteration	2024-02-15 16:40:25 +01:00
qazal	49cb1fee54	run test_indexing on remu (#3404 ) * emulated ops_hip infra * add int4 * include test_indexing in remu * Revert "Merge branch 'remu-dev-mac'" This reverts commit `6870457e57`, reversing changes made to `3c4c8c9e16`.	2024-02-15 11:52:40 +01:00
qazal	27f4de2ce4	delete half_prekernel (#3388 ) * generic rendering of half and bf16 hotfix * fix uops + regression test * fix the test for metal's half4 * uop.uop fixup * mypy with --strict-equality, fix ops_gpu	2024-02-14 15:40:48 +01:00
qazal	c8fd66a131	Run RDNA3 tensor core tests in CI (#3367 ) * add test_linearizer * skip test_padto_matmul	2024-02-11 19:54:06 -05:00
Francis Lam	ce21fdfb67	ops_python: add HIP tensor core mock and refactor METAL (#3354 ) * ops_python: add HIP tensor core mock and refactor METAL * Add tests to CI * add DEBUG=2 to full tests --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-02-09 12:46:06 +01:00
George Hotz	b385234961	oops, change to 3.12 (#3357 )	2024-02-09 12:21:06 +01:00
George Hotz	7726eef464	ops_python: add image support (#3356 ) * ops_python: add image support * uops tests in their own CI * fix ci	2024-02-09 12:02:06 +01:00
George Hotz	c32ea95d7d	Python uop emulator (#3327 ) * start uop emu * tiny_add passes * more ops * emulate the whole warp * test_gemm passes * metal gemm test pass * works on big gemm * works on big gemm * more tests pass * touch ups * fix mypy * cleanups * exp2 mypy * arch is where it belongs * actually emulate tensor cores * fix test * new style	2024-02-08 19:24:55 +01:00
chenyu	0d2dacb549	test intermediate tensors created by function have same device as input (#3338 ) run on TORCH since it's the fastest one on CI. caught a bug in multinomial, and update the behavior of fancy index and gather to move the indices Tensor to same device as self.	2024-02-07 09:24:36 -05:00
qazal	5b46b0ff3d	Simple RDNA3 emulator (#2974 ) * mockhip->hipcpu * allocate buffers * launch a kernel read_asm api * run remu in CI * remu 0.0.2, real test ops * simple driver * 0.0.3, all test_ops * run the latest emulator * 9 minutes is way too long, drop backprop in CI * bring back the backward pass * Revert "bring back the backward pass" This reverts commit `3781e1bc56`. * Print slowest tests * emulated device directly in ops_hip * fix ruff, override mypy for specific rules * test in the same code path - hip backend env variables - install packages and verify autogen - run certain tests - remove the other hip tests path - verify Device.DEFAULT * remove the emulated hip in extra --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-01-30 10:39:28 -08:00
George Hotz	0aad8d238b	rebuild ocelot (#3259 ) * rebuild * strip trailing whitespace	2024-01-26 18:46:36 -08:00
George Hotz	03a6bc59c1	move autogen to runtime/autogen (#3254 )	2024-01-26 12:44:19 -08:00
George Hotz	a3869ffd46	move gpuctypes in tree (#3253 ) * move gpuctypes in tree * fix mypy * regex exclude * autogen sh * mypy exclude * does that fix it * fix mypy * add hip confirm * verify all autogens * build clang2py * opencl headers * gpu on 22.04	2024-01-26 12:25:03 -08:00
chenyu	bc92c4cc32	onnx Einsum, CumSum, DepthToSpace, SpaceToDepth (#3252 ) * onnx Einsum, CumSum, DepthToSpace, SpaceToDepth Einsum inner product and `...` are not supported * --durations=20	2024-01-26 10:47:53 -05:00
George Hotz	aa0d1b6330	hotfix: don't use noqa: E702 that's just dumb	2024-01-24 20:01:00 -08:00
jxdv	ef3aa6d7fb	update gh actions (#3033 ) * update checkout actions * update upload artifact * update setup python --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-01-09 17:52:22 -08:00
chenyu	1d730b8853	remove ACCUM_FP32 in simple_matmul.py (#3045 ) * remove ACCUM_FP32 in simple_matmul.py accumulate for half inputs is always in float * move test llama compile speed to metal	2024-01-08 17:37:57 -05:00
George Hotz	50754f1494	add caches there (#3042 ) * add caches there * no curl	2024-01-08 13:02:16 -08:00
George Hotz	c5a941d466	webgl backend in extra (#3041 ) * WebGL WIP * 84% of ops passing test * tests passing 100% * Cleanup, refactor * Shave off some lines * Work on dtypes * TestOps at 100% again * Efficient net shaders compile in browser webgl2 * Compile all efficientnet shaders in browser * Create empty textures for tensor buffers * Run program. Up next weight loading * Exported WebGL model working * Add tests, refactor * Explicit cast alu for GLSL * Fix CI tests * WebGL efficientnet demo * Compile and run yolov8 in browser * Fix imports * Simplify yolo compile * Fix boolbool and cast cmplt to float More tests * Do std tests pass on CI? * Skip std tests on CI * Remove explicit_cast_alu hack, and solve it in code_for_op * Move to new dtype-less alloc api * Remove local size hack: optimize local_size only if device has local * Remove glsl.py, and move content to cstyle * dont_use_locals in opts * Fix dtype tests * type_map in CStyleLanguage * Make core changes smaller, cleaner, refactor export_model and demo * Skip pad_slice * Simplify: render_const, render_conditional * solve bool alu for other binops, cleaner ops_webgl * Fix noopt hack * Remove some skipIfs * WebGL image hack * type_names is a better name * global_max * Fix dtype import * Fix type_names -> type_map * Fix lint * Remove webgpu, back to 5k lines (#3040) * remove webgpu * max 5000 lines * revert those to master * retain that cstyle --------- Co-authored-by: Ahmed Harmouche <ahmedharmouche92@gmail.com>	2024-01-08 09:29:13 -08:00
George Hotz	8cbcd1b342	Remove webgpu, back to 5k lines (#3040 ) * remove webgpu * max 5000 lines	2024-01-08 09:10:07 -08:00
George Hotz	60abc62a3f	fast hip read (#3014 ) * fast hip read * hip read faster * fix tests * to_mv * simplify * bump to 6k lines	2024-01-05 10:33:13 -08:00
George Hotz	f494b9d463	simple multitensor API (#2903 ) * simple multitensor API * test multitensor * mt work * new api * copies * all but data parallel * allreduce there * works, but axis sharded * fix all mt tests * features/multi * work * backprop * fix tests * tests passing * mt progress * cleanups * less lines * tensor cleanup * save more lines * mypy passes * fix tests * skip for cuda too * bump download cache	2024-01-02 17:49:44 -08:00
chenyu	e53b96fdbb	fix TC=2 tensor core op test (#2951 ) * print DEBUG for TC=2 in CI * enable TC=2 * no need to check src type * LOAD has side effect * don't push any local buffer * update comment * and BARRIER	2023-12-29 21:39:49 -05:00
George Hotz	7da2325dc7	get_lazyops() -> lazyops (#2884 ) * get_lazyops() -> lazyops * don't compare empty mem	2023-12-20 18:04:49 -08:00
George Hotz	ca59054463	fix shapetracker math (#2861 ) * proper test * all st math good now * fix real_strides bug	2023-12-19 22:17:34 -08:00
chenyu	1231ec5a02	run the sz.py line count at the end of linter ci (#2857 )	2023-12-19 16:33:12 -05:00
George Hotz	6617dcf095	move graph to runtime, check line count with sz.py (#2842 ) * move graph to runtime, check line count with sz.py * oops, didn't save * dtype aliases * restore comment, REALCOUNT	2023-12-18 20:30:06 -08:00
George Hotz	80f53245e8	shapetracker add and invert (#2828 ) * invert (broken) * decent invert * shapetracker invert works * plus is meh, invert is good * support invert mask * a few more invert tests * shapetracker math invert test	2023-12-18 16:03:27 -08:00
chenyu	73cadfbb3c	Remove pytest markers (#2831 ) * remove pytest marker * fix some, skip some * tweak * fix * skip slow * skip more	2023-12-18 18:53:28 -05:00
George Hotz	051402625e	remove pushing contig + fix linearizer bug (#2798 ) * remove that logic * fix test, move LOADs * fix repeat issue on LLVM * with_phi	2023-12-16 09:36:31 -08:00
George Hotz	c6eb618013	tests from new lazy branch (#2774 ) * tests from new lazy branch * fix lin 11 * that was needed * doesn't fail * mark * meant that * llvm passes	2023-12-14 23:06:39 -08:00
Ahmed Harmouche	4b01839774	support vals on WebGPU, run more tests (#2668 ) * Vals on webgpu, run more tests * Skip slow tests, run symbolic ops tests * Balance out tests	2023-12-07 16:45:21 -08:00


276 Commits