tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-07 03:00:26 -04:00

Author	SHA1	Message	Date
George Hotz	0f89660ce4	Revert "change clang -march flag to -mcpu on arm (#10841 )" (#10942 ) This reverts commit `897e42fd1b`.	2025-06-23 16:48:28 -07:00
ttomsa	897e42fd1b	change clang -march flag to -mcpu on arm (#10841 ) * change clang -march flag to -mcpu with fp16 disassembly test * fix * add capstone to macos dependencies * just check no cast in test * rm import * woops * lets check * move check * llvm init before cpu chcek * try this * bump autogen llvm version * also update libclang? * revert * add comment * skip llvm test and add comment * linter	2025-06-23 16:28:48 -07:00
George Hotz	0629e45332	remove cpu graph (#10836 ) * remove cpu graph, it's different from the others * remote was blacklisting CPUGraph * remove cpugraph from dsp	2025-06-16 11:40:58 -07:00
George Hotz	413e223d6e	Revert "remove cpu graph, it's different from the others (#10743 )" (#10745 ) This reverts commit `3d64a98432`.	2025-06-09 22:40:48 -07:00
George Hotz	3d64a98432	remove cpu graph, it's different from the others (#10743 ) * remove cpu graph, it's different from the others * remote was blacklisting CPUGraph	2025-06-09 22:17:10 -07:00
quortus	9e49721c47	CPUGraph support for clang (#10014 ) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-04-24 07:52:35 -04:00
Alexey Zaytsev	3bce5ad2b4	clang should not emit the .comment section (#9859 ) This section gets included in the finanl image, and we get a lot of garbage with DEBUG=7	2025-04-12 10:59:11 +08:00
chenyu	2e7c2780a9	CLANG -> CPU (#9189 )	2025-02-20 18:03:09 -05:00
George Hotz	b1c0d8c99d	remove cpu and torch backends (#3399 ) * remove cpu and torch backends * don't copy to cpu * use clang instead of cpu * multitensor gathers on the first device * clang is cpu + use default * fixup * bugfix	2024-02-15 16:55:39 +01:00
chenyu	ca7973f61c	clean up einsum_mulacc (#3312 ) * clean up einsum_mulacc * push get_strides * stride * get_strides for ndim	2024-02-04 06:21:19 -05:00
Obada Khalili	b4ea0e18e3	Fix dot product on buffers with zero strides (#3303 ) * skip matacc opt if the all src buffers of mul op are const buffers * add noqa directive for long test * unskip MALACC opt * ensure that a_axes at least includes summation axes in order to perform np.einsum correctly * add regression test for mulacc op * compute a_slices using a_axes * refactor helper of function to retrieve axes and slices for nonzero strides as well as summation axes * include a regression test that uses and to test the behaviour indirectly	2024-02-04 05:15:06 -05:00
geohotstan	842053873d	fix neg logical_not inconsistencies (#3222 ) * try * test: add logical_not tests * gah im retarded, but this doesn't match types for const() * fix: can't we jsut do this? * big change: I don't actually know what I'm doing * WOOO IM JUST CHANGING EVERYTHING WOW probably gon revert later * BYE BYE noqa: E501 * fix: less lines and add test * fix: rm 2 redundant tests * fix: eq with False so we don't unintentionally implicit upcast, but it's bool anyways so w/e	2024-01-24 11:48:40 -05:00
George Hotz	23b084e70a	add device name to device, all are constructed (#3221 )	2024-01-23 20:34:56 -08:00
George Hotz	cc2969f690	simpler cstyle (#2966 ) * simpler cstyle * save lines	2024-01-01 16:20:10 -08:00
chenyu	b469fe3723	add CMPEQ (#2931 ) * CMPEQ * work * fix onnx * fix round * fix webgpu * prettier * no PADTO in actions	2023-12-25 00:15:55 -05:00
chenyu	677ae7673d	use np.less and torch.lt for CMPLT (#2899 ) also removed one unused output_type	2023-12-21 14:37:24 -05:00
chenyu	1500aca43d	remove output_type in ops_cpu and ops_torch (#2892 ) now the input types are matched and checked in lazy, we can remove these output_type. also remove the usage of least_upper_dtype in ops.py since we can just use the input type	2023-12-21 02:11:27 -05:00
chenyu	2d2c4980fe	assert for elementwise dtypes in lazy (#2888 ) * assert for elementwise dtypes in lazy * no image hack * check dtype of scalar for IMAGE=2	2023-12-21 01:42:32 -05:00
chenyu	959d9cfed4	clean up ops_torch and ops_cpu (#2819 )	2023-12-17 19:35:19 -05:00
chenyu	91adb119b8	remove match_type in ops_torch and ops_cpu (#2817 ) * remove match_type in ops_torch and ops_cpu input dtypes are aligned and casted in mlops * dict union only after python3.9 * fix that * fix Sigmoid forward cast	2023-12-17 15:32:30 -05:00
George Hotz	6d6eb9302d	ruff checks the max line length is 150 (#2734 ) * ruff checks the max line length is 150 * fix tensor.py * a lot more * done	2023-12-12 17:34:47 -08:00
wozeparrot	6d58c19736	binaryops xor (#2627 ) * feat: initial xor * feat: numpy xor * feat: llvm xor * feat: quick test for xor * feat: slightly working xor in torch * feat: xor in tensor * feat: slightly better test	2023-12-05 13:21:42 -08:00
George Hotz	d6b404ac11	No dtype alloc (#2570 ) * fix all allocs * improve docs * ugh fix fake alloc	2023-12-02 13:29:40 -08:00
George Hotz	f5de21e753	fast path for copy (#2548 ) * fast copy * ruff first * flat_mv on malloc * order + webgpu test	2023-12-01 11:34:47 -08:00
chenyu	7fec966b5e	bye bye NOOP (#2534 ) * bye bye NOOP * SIN * NEG	2023-11-30 23:10:35 -08:00
George Hotz	12fa846122	zero copy (#2531 ) * zero copy * zero copy test * loads coder in milliseconds * zero copy for cpu and torch * src_from_buffer is None * SLOW_METAL_COPY there	2023-11-30 18:38:41 -08:00
George Hotz	2c363b5f0b	new style device (#2530 ) * cpu tests pass * torch works * works * metal works * fix ops_disk * metal jit works * fix openpilot * llvm and clang work * fix webgpu * docs are rly broken * LRU works on metal * delete comment * revert name to ._buf. LRU only on Compiled * changes * allocator * allocator, getting closer * lru alloc * LRUAllocator * all pass * metal * cuda * test examples * linearizer * test fixes * fix custom + clean realize * fix hip * skip tests * fix tests * fix size=0 * fix MOCKHIP * fix thneed * copy better * simple * old style metal copy * fix thneed * np reshape * give cuda a device	2023-11-30 17:07:16 -08:00
George Hotz	6707f2588e	use copyin (#2500 ) * it's always copyin * all RawBuffer are RawBufferCopyIn * cleanups * this fixes it * requirements='C' * more correct	2023-11-29 09:34:00 -08:00
George Hotz	5629fc368c	Use Buffer.STORE at the end of ASTs (#2494 ) * work * store broken * interpreteds work * this passes * symbolic cpu * fix tests * fix opt tests * images fail * fix InterpretedFlopCounter * stupid hack for images	2023-11-28 20:11:37 -08:00
George Hotz	ab5d14d4ba	MEM -> LOAD (#2492 ) * MEM -> LOAD * keep legacy working	2023-11-28 16:46:37 -08:00
George Hotz	756b01f46f	why were these ever called buffer (#2483 )	2023-11-27 21:02:07 -08:00
George Hotz	9e07824542	move device to device.py (#2466 ) * move device to device.py * pylint test --disable R,C,W,E --enable E0611 * fix tests	2023-11-27 11:34:37 -08:00
George Hotz	8e9cdef61f	clean up the buffers (#2447 ) * clean up the buffers * remove allocate_output * functools.lru_cache is methodcache * add TestShapeTrackerSize * cache_clear * no 0 sz buffer, add _ on functions that shouldn't be imported * fix size * if -> while	2023-11-26 11:02:29 -08:00
George Hotz	8f89e21fca	torch and numpy don't share ops anymore (#2412 ) * torch and numpy don't share ops anymore * that should be filtered out elsewhere * still const * graph + enet example cleanup * hmm, we do still need it because of symbolic	2023-11-23 16:58:10 -08:00
George Hotz	0505c5ea50	remove force_wait, refactor to graph (#2405 ) * remove force_wait * refactor * get rid of stupid ASTRunner * fix del in diskbuffer * BufferOps.FROM_UNDERLYING * put offset in the rawbuffer * fix bugs * use exec	2023-11-23 12:46:07 -08:00
George Hotz	9b58d4cb37	cleanup unused movement ops (#2353 ) * cleanup_mops * no expand * nothing * revert that * add comment * add correctness check to disk tensor	2023-11-18 09:19:02 -08:00
chenyu	d2c0035c73	add back as_strided, move rebuilt mops to extra (#2344 ) * add back as_strided, move rebuilt mops to extra * negative stride for ops_cpu * Revert "negative stride for ops_cpu" This reverts commit `a13b6815ac`. * skip that * style	2023-11-17 14:34:30 -05:00
forcefieldsovereign	b64738e1d6	Remove AS_STRIDED from shapetracker (#2216 ) * very close * remove comment * negative strides working * almost everything passes * calculate offset with list comprehension * some cleanup * got disk load working * review suggestions * fix after merge * overlap working * did it * clean * fixed disk load * lint * mypy * removed as_strided * trying without simplify * added back simplify * make sure expanding to smaller shape * cleanup * removed comment * removed env file * trying whisper test again * onnx test sqlite issue * working on test * finished test * eliminate unnecessary shrink-then-pad * don't shrink buffer * added strides check * added to ci under linters * switch issue * allow symbolic stride * removed .env * isinstance * adjust strides for double expand * cleanup * needed to add type hint for mypy * set pythonpath	2023-11-15 15:50:17 -05:00
George Hotz	70a65c201e	JIT support in Interpreted (#2314 ) * factor that out * jit is supported everywhere * fix some tests * there's no jit supported device, the jit is everywhere * fix test uops	2023-11-15 11:13:38 -08:00
George Hotz	4da2ddea6e	Interpreted cleanups (#2312 ) * move the compiler out of ops * don't return realized * var_vals filter, fix custom * typing	2023-11-15 09:02:23 -08:00
George Hotz	4f7b1ac0d2	cleanups before interpreted jit (#2306 ) * jit mnist * InterpretedFlopCounter doesn't rely on Interpreted * allocator for cpu and torch * types for exec_ast * fix type issues * fix onnx, remove print * always self.from_underlying	2023-11-14 21:44:25 -08:00
geohotstan	b853e9bb8c	Onnx 1.15.0 gogogo (#2217 ) * lol * lol * add GELULULULUL * onnx 1.50 * fuk torch bool neg * exclude regex tests * exclude dequantizelinear for now * is sunny in philly * damn it affinegrid * fixed auto_pad VALID * skip 0 shape tests * add temporary cast in Reduces * tests should pass now * added comments and cleanup * try moving dequantizelinear to onnx.py * fixed dequantizedlinear? * cleanup * try? * float16 segfaults LLVM CI..??? * cleanup comments * pin to 1.50.0 * remove use of -np.inf cuz numpy is kill * 1.50? lol I'm actually retarded * thx for review, muhbad * moved Gelu higher up	2023-11-10 15:36:48 -08:00
George Hotz	9ea0448103	compile interpreted to python code (#2208 ) * sort of works * interpreted * fix flopcounter * interpreted * simpler * type * functools compile ast * lose a line * delete extra file * no self.method_cache	2023-11-03 09:16:12 -07:00
George Hotz	73a6ed7862	Apply ShapeTracker in interpreted backends (#1846 ) * applying st * tests pass * minor cleanups * torch too * hack * contiguous * move mops * contig in BN * tests should pass * make torch fast * make zeros and ones contig by default * no contig there * fix padding with expanding * might fix tests * still doesn't fix bug, but should be there * Revert "still doesn't fix bug, but should be there" This reverts commit `8ea92f3e07`. * minor cleanups	2023-09-23 10:05:13 +08:00
geohotstan	9af5645ba3	onnx full passing (#1076 ) * 1 * 83 failed * learning how git works * lol idk * zero shape aaaa * space lol * aaa * test check * haha * fixed gather * 73 failing * 71 failing * 68 failing * added some debug * fking resize * lol * 62 failing * 58 failling fucking did nearest resize hell yeah * clean up * 56 failing * janitor duty * lol * 53 failing * hi mom * 50 failing * added linear interp, but coord_trans is wrong * did lin interpolation woohoo * 43 failing * 40 failing * temporary Gather fix * 39 failing * fixed slice onnxver<10 * 37 failing * 35 failing * excluded tests that use float64 * 32 failing with hacks * added _batchnorm() for 3D 5D batchnorm, 29 failing * changed ALLOWED_KERNEL_COUNT from 199 to 207 * added improved Gather op, reverted ALLOWED_KERNEL_COUNT commit * support Round op * added storage_order/indices maxpool, 27 failing * support maxunpool, 25 failures * support Gradient, 23 failures * merged new where * added Adam * cleanups * added Momentum and Nesterov Momentum * added Adagrad * support sequence_type, 20 failing * ugh git * I give up on cubic interp :D, 9 failing * sexy 1 liner gather, much improved, wow * polished gather to make it shine bright like a diamond * clean 1 liner for gather * improved readability of gather * uhh * clean up * more clean up * WHITEspace * implemented SoftmaxCrossEntropyLoss op * added comments and cleaned up if statements * update * thank based wozeparrot for pow and new GatherElements * CPU and TORCH all pass \| cast float64 -> float32 for all fromCPU() * _nearest_gather() failing on yolo * reverted ops_cpu change and added assert in Resize * added comments for resize for multiple channels * oops * merge * test * switched np.pad to Tensor.pad for constant padding * gah * gah2 * sexy reflect pad with movementops -> add * delete commented out lines * edge mode pad sexy as well * trying out model_benchmark * revert gitignore change lol * init * Revert "init" This reverts commit `682bf2073a`. * wrote cast workaround for CPU, CPU and TORCH all pass * wrote cast workaround for CPU, CPU and TORCH all pass * skipped tests w/ 0 shape for METAL and GPU * excluded tests for CLANG, CPU, TORCH, CLANG pass * fixed hacky ConvTranspose * gotta figure out autopad * UOps.STORE support cast bool -> float * small fix for fast gather * reverted 0 shape skipped tests * oops missed a file * added comment * fixed slice op hack * First commit to pr * More trig ops * More trig ops * format * isinf support * More ops * changed onnx_ops to use our new gather :D * Det op bug fix * rebase * fixed some tests * det broken and slow * fixed compress to use new gather * implemented argmax argmin * support variable types in type_proto * support Upsample and Identity sequence * we support float64 now and tinygrad support automatic broadcasting * added EyeLike op * resize does support multiple channels now actually * yolov8 onnx runs successfully * added batch size 1 * oops * finally fixed type_proto I think * fixed some llvm bugs * del whitespaces * added ZenginU Format PR * test * oops * added float64 exclude tests back * more skipped tests * try * ok openpilot pass * flake8 pass * woooooohooo * revert external_model_benchmark changes * perf tested gather * removed promote types from ops_cpu * numerical errors from 1681 is fixed --------- Co-authored-by: ZenginU <umutzengin00@gmail.com>	2023-09-05 13:23:32 -07:00
George Hotz	e17b1af160	UnaryOps.NEG (#1749 )	2023-09-03 12:44:26 -07:00
Roelof van Dijk	62536d6000	perf: use enumerate where possible (#1692 ) Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-08-30 10:41:51 -07:00
George Hotz	d24f936501	just cmplt (#1493 ) * just cmplt * fix maximum * don't save, there's no backward * ugh, no slot either * eq is a scam	2023-08-08 13:58:10 -07:00
George Hotz	d67e248d9b	simple bitcast 2 (#1445 ) * simple bitcast 2 * bc 2 * empty * Revert "empty" This reverts commit `d8ee083655`.	2023-08-06 00:30:50 -07:00
geohotstan	4056f97187	Gather (#1329 )	2023-07-25 15:05:41 -04:00

1 2

78 Commits