57 Commits

Author SHA1 Message Date
chenyu
1d730b8853 remove ACCUM_FP32 in simple_matmul.py (#3045)
* remove ACCUM_FP32 in simple_matmul.py

accumulation for half inputs is always in float

* move test llama compile speed to metal
2024-01-08 17:37:57 -05:00
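
The point of that change, accumulation for half inputs always happening in float, can be illustrated with a small sketch in the spirit of simple_matmul.py (the tinygrad calls Tensor.rand and dtypes.half are assumed here, not taken from the commit):

# Hedged sketch: half-precision matmul compared against a float32 reference.
# If accumulation happens in float, a loose rtol is enough for the check.
import numpy as np
from tinygrad import Tensor, dtypes

N = 512
a = Tensor.rand(N, N, dtype=dtypes.half)
b = Tensor.rand(N, N, dtype=dtypes.half)
c = (a @ b).numpy()                                      # accumulated in float
ref = a.numpy().astype(np.float32) @ b.numpy().astype(np.float32)
np.testing.assert_allclose(c, ref, rtol=1e-2)            # loose rtol for half inputs
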
George Hotz
a280cfe169 move dtypes to dtype.py (#2964)
* move dtypes to dtype.py

* fix urllib
2024-01-01 14:58:48 -08:00
George Hotz
c81ce9643d move globalcounters to ops (#2960)
* move globalcounters to ops

* missed a few

* sick of that failing
2024-01-01 14:21:02 -08:00
George Hotz
7da2325dc7 get_lazyops() -> lazyops (#2884)
* get_lazyops() -> lazyops

* don't compare empty mem
2023-12-20 18:04:49 -08:00
Rory Clear
f409b57854 update metal matmul and matvec for new device style (#2732)
* update for new device style

* create device before compile

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2023-12-17 16:15:07 -05:00
Nguyen Nguyen Phuong
07cf45e133 fix cuda matmul (#2725) 2023-12-12 07:59:31 -08:00
George Hotz
b5fd160b39 hotfix: increase rtol on simple_matmul 2023-12-11 10:10:29 -08:00
George Hotz
a73579919f mlx benchmark, a lil slower than tg 2023-12-05 19:00:43 -08:00
George Hotz
0be5d16950 only 62 gflops (#2629) 2023-12-05 13:28:24 -08:00
Yixiang Gao
fde44aed76 update hip_matmul with new abstraction (#2605) 2023-12-04 13:37:10 -08:00
Jake
5588922884 Update cuda_matmul.py (#2495) 2023-11-28 19:46:01 -08:00
George Hotz
3f137b134a jax parallel matmul example 2023-11-28 13:48:11 -08:00
Davi Silva
186ac77ec3 Update hip_matmul.py (#2480) 2023-11-27 18:36:19 -08:00
George Hotz
9e07824542 move device to device.py (#2466)
* move device to device.py

* pylint test --disable R,C,W,E --enable E0611

* fix tests
2023-11-27 11:34:37 -08:00
George Hotz
0cbf6c1811 move things, clean up extra (#2292)
* move things

* idk why pylint needs that now

* delete unused
2023-11-13 20:18:40 -08:00
Rory Clear
553688f12a update metal matmul and matvec for compile api (#2238) 2023-11-08 08:08:35 -08:00
George Hotz
2f7aab3d13 move optimize_local_size (#2221)
* move optimize_local_size

* interpret_ast
2023-11-05 21:00:52 -08:00
George Hotz
5472a14544 openpilot compile2 (#1977)
* start compile2

* tweak

* why are there two more kernels?

* minor cleanups

* don't break onnx tests

* add __metadata__ support to safetensors

* no early realize in onnx

* cleanups

* bugfix

* clean up image type, add optimize

* opt to match old

* try that

* opt work

* run compile2

* optimizer

* print more

* prerealize

* imp

* NOLOCALS works

* no locals means no locals

* support fractional globals

* all locals welcome

* int that

* cleanups

* show gemv regression

* clean up diff

* use idx for the cond

* nolocals

---------

Co-authored-by: Comma Device <device@comma.ai>
2023-10-15 20:39:46 -07:00
George Hotz
8db92bd060 fix tvm gemm example 2023-10-08 05:57:41 -07:00
Francis Lam
dece9958f8 wmma: clean up to make WMMA arg order consistent (#2014)
also add cache defeat to extra/gemm/simple_matmul.py
2023-10-07 17:45:40 -07:00
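
The "cache defeat" mentioned here is presumably the usual benchmarking trick of regenerating inputs between timed runs so a cached or memoized result is never what gets measured; a minimal sketch of the idea (illustrative only, not the actual simple_matmul.py code):

# Hedged sketch of defeating result caching when timing a matmul:
# fresh random inputs every iteration, so no run can reuse a prior result.
import time
import numpy as np

N, iters = 1024, 5
for _ in range(iters):
    a = np.random.rand(N, N).astype(np.float32)   # new inputs each run
    b = np.random.rand(N, N).astype(np.float32)
    st = time.perf_counter()
    c = a @ b
    et = time.perf_counter() - st
    print(f"{2*N**3/et*1e-9:.2f} GFLOPS")
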
Francis Lam
0ba75c4370 optimizer: add matvec optimizations (#1972)
* optimizer: add matvec optimizations

* renderer: fix alignment of shared memory in opencl
2023-10-04 14:16:27 -07:00
George Hotz
717451a244 Revert "optimizer: add matvec optimizations (#1753)" (#1959)
This reverts commit f520323054.
2023-10-03 00:28:42 -07:00
Francis Lam
f520323054 optimizer: add matvec optimizations (#1753)
* optimizer: add matvec optimizations

* Update optimizer.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-10-03 00:01:59 -07:00
Francis Lam
f445e056ed wmma: add test and tensor core shape (#1925) 2023-09-28 18:04:28 -07:00
George Hotz
c36d0e3bd8 tvm import hook 2023-09-28 09:24:32 -07:00
qazal
d0e752003d fixes (#1893) 2023-09-22 07:20:27 +08:00
George Hotz
4613c9e77c add tvm example, formatting (#1813)
* add tvm example

* no realize
2023-09-07 11:50:41 -07:00
Pavol Rusnak
52a92bf95d use class Foo: instead of class Foo(): (#1797)
* use class Foo: instead of class Foo():

* add ruff linter, copy settings from .flake8 to ruff.toml
2023-09-06 12:20:25 -07:00
George Hotz
a6d842af7a move device to ops (#1646)
* move device to ops

* mlops types

* 2 lines
2023-08-23 08:30:17 -07:00
George Hotz
e464442adf WMMA for 7900XTX (#1563)
* go

* hip no LRU

* work

* works

* 16 TFLOPS

* 29 TFLOPS

* 30 TFLOPS

* never mind, it's 60 TFLOPS

* fix metal WMMA

* put hip alloc back
2023-08-19 09:07:23 -07:00
George Hotz
c417cd3c97 fast HIP gemm -> 100 TFLOPS (#1476)
* fast HIP gemm

* wmma

* correct b

* fix spilling

* 60 TFLOPS

* 64 TFLOPS

* 65 TFLOPS
2023-08-09 06:54:15 -07:00
David Hou
3300d0aeaf syncthreads before wmma (#1389)
(venv) chaos@tiny3:~/tinygrad$ KX=2 KY=2 N=2048 python extra/gemm/hip_matmul.py
   4194304    289.60 us, would be  59322.55 GFLOPS matmul, 173.80 GB/s
2023-07-31 17:05:49 -07:00
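
The GFLOPS figure in that output follows from the usual 2*N^3 flop count for an N x N matmul; a quick check of the numbers quoted above:

# Sanity check of the commit message figures: 2*N^3 FLOPs for an N=2048
# square matmul over the reported 289.60 us kernel time.
N = 2048
t = 289.60e-6                      # seconds
flops = 2 * N**3                   # multiply-adds counted as two ops
print(flops / t / 1e9)             # ~59322 GFLOPS, matching the message
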
George Hotz
37fa7e96fb Revert "update editorconfig, enforce via CI (#1343)" (#1380)
This reverts commit da2efecbe2.
2023-07-31 10:35:50 -07:00
Pavol Rusnak
da2efecbe2 update editorconfig, enforce via CI (#1343)
* update editorconfig to set unix-style newlines and trim whitespace

* add editorconfig github action to the CI

* fix whitespace
2023-07-30 18:44:30 -07:00
George Hotz
67e34b356a good stuff from tensor cores branch (#1199) 2023-07-08 16:58:26 -07:00
George Hotz
b8dfbba703 hip_matmul: f16 gemm 2048x2048 gets 36 TFLOPS 2023-07-08 00:35:45 +00:00
George Hotz
e234bf2298 hip matmul : add K support 2023-06-28 19:54:33 +00:00
George Hotz
0e93b9642a hip matmul 2023-06-28 19:21:01 +00:00
Casey Primozic
805eef10dd Add tensorflow GEMM benchmark script (#1000)
* Modelled closely after the existing torch benchmark script, adapted slightly for tensorflow
2023-06-18 10:57:45 -07:00
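
A minimal TensorFlow GEMM timing loop in the spirit of that script might look as follows (a sketch under stated assumptions, not the benchmarked file itself; tf.random.normal and tf.matmul are standard TensorFlow APIs, the sizes and warmup handling are illustrative):

# Hedged sketch of a TensorFlow GEMM benchmark loop.
import time
import tensorflow as tf

N = 4096
a = tf.random.normal((N, N))             # float32 by default
b = tf.random.normal((N, N))
tf.matmul(a, b)                          # warmup
st = time.perf_counter()
c = tf.matmul(a, b)
_ = c.numpy()                            # force execution to finish
et = time.perf_counter() - st
print(f"{2*N**3/et*1e-9:.2f} GFLOPS")
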
George Hotz
fe71282ba1 faster RDNA assembly backend (#990)
* fast asm

* torch gemm
2023-06-16 12:06:38 -07:00
George Hotz
90fff82c8a Rdna (#776)
* assembler maybe

* custom asm

* rdna3 on quiet

* trigger crashes

* fixed notes

* non-fatal rdna2 crash

* Crash4

* improve rdna sniffer

* comments

* improve sniffer

* asm

* 131 TFLOPS RDNA3

* opt simple matmul

* todos
2023-05-16 05:33:57 -07:00
George Hotz
59d0d168cd FLOAT16 off works 2023-04-19 15:34:56 -07:00
George Hotz
3d15769a8f 50 TFLOPS cuda matmul 2023-04-19 14:38:24 -07:00
George Hotz
0b5a0b9ba4 winograd comment 2023-04-16 03:36:51 -07:00
George Hotz
8b777af571 metal_conv gets over 10.4 TFLOPS... 2023-04-15 03:31:22 -07:00
George Hotz
d66e682205 metal matmul from tcores branch 2023-04-14 23:29:29 -07:00
George Hotz
68e45fca18 metal_matmul: bw and torch sync 2023-03-23 08:02:04 -07:00
George Hotz
bd6c3c31a9 compare to torch 2023-03-22 23:58:37 -07:00
George Hotz
c3a3db75c7 fix metal matmul example 2023-03-22 23:42:51 -07:00
George Hotz
1a039306d2 good changes from llama branch (#671)
* good changes from llama

* transpose behavior changed
2023-03-09 20:51:22 -08:00