tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-25 06:48:22 -05:00

Author	SHA1	Message	Date
chenyu	13575f080a	remove bitcast backward in function.py (#7031 ) bitcast cannot backward	2024-10-13 10:08:27 -04:00
Markiian Novosad	8831c691e2	Add slice parameter type checking to disallow Tensor usage for slices (#6967 ) * add support for single el tensors for slices * rm trailing spaces * cleanup long lines * remove tensor in slice support, add comprehensive err msg * cleanup getitem, add slice type check * Edit err message	2024-10-11 16:20:21 -04:00
chenyu	e4c0743188	failed example for logcumsumexp (#6936 ) need cummax for numerical stability	2024-10-07 10:55:45 -04:00
jeffzh4ng	19a7e41113	implement logcumsumexp (#6921 ) * implement logcumsumexp * change axis=None to axis=0	2024-10-06 10:45:36 -04:00
George Hotz	c178dc1071	faster uops ci [run_process_replay] (#6774 )	2024-09-26 20:15:01 +08:00
George Hotz	e945fa9c5c	put local on the PtrDtype [run_process_replay] (#6656 ) * put local on the PtrDtype [run_process_replay] * those are local too	2024-09-23 10:29:17 +08:00
Gaétan Lepage	f214bb140d	test: relax tolerance of test_broadcastdot (#6560 )	2024-09-17 03:26:39 -04:00
chenyu	b2c286f567	fix typing for test_ops (#6520 ) mostly passed TYPED=1 python3 -m pytest -n=auto test/test_ops.py. one last test specifically set an invalid value to test the exception, and to ignore that we need to import typeguard. And to get a working version of typeguard, we would need to get rid of dependency on tensorflow_addons because it requires a very old version of typeguard	2024-09-15 06:18:36 -04:00
chenyu	7df4373fd9	tensor reduction touchup (#6402 ) - fixing spacing - use get_args to get valid Literal values and raise ValueError to match, and a test for that - use `Y` to be consistent	2024-09-08 03:55:51 -04:00
Irakli Salia	2e01efc35f	tensor roll (#6375 ) * tensor roll function and tests * fix type annotations * reduce line count * more readable	2024-09-07 05:14:28 +08:00
Tim Becker	dfb818788e	Support `reduction` parameter in more loss functions (#6302 )	2024-09-07 05:11:20 +08:00
Oleg Rybalko	64f1384f5b	Einsum ellipsis support (#6333 ) * working ellipsis expansion * refactor * fix commas in output * add capital letters * refactor	2024-09-05 10:08:55 +08:00
nimlgen	326a77336e	qcom remove some tests skips (#6353 )	2024-09-04 15:38:18 +03:00
Vyacheslav Pachkov	4c33192a8b	add qcom runtime (#5213 ) * qcom: driver init * autogen stubs for msm_kgsl also fixup ioctls to show numbers instead of _IOW macros * autogen: add adreno commands and registers * ops_qcom: QcomAllocator + signals * fix EDEADLK in hwqueue, init timestamps, use opencl compiler for qcom * qcom: we do not really need all these constants input/output is enough * qcom: perfctr for CS (do not really need all the rest) * qcom: HALFREGFOOTPRINT and FULLREGFOOTPRINT are set to be around max * qcom: explicitly set instruction len based on the shader size * ops_qcom: Program init extracts shader from open cl binary sets input/output buffers allocates stack sets cs mode runs shader * use data64_le from helpers * ops_qcom: use fill_kernargs for filling i/o buffers * ops_qcom: add QcomCopyQueue just for api & set kernargs_args_offset * new signals & fix exec * add QCOM to the list of supported devices * correct QcomComputeQueue._wait using CP_WAIT_REG_MEM * fix exec, synchronize before copyout * correct setting num_units for ST_SHADER * fix gpu hangs on sigs with CP_MEM_WRITE, it is uncached mem anyway * extract offsets to kernel arguments from opencl binary * extract constants values and offsets from opencl binary * handle KGSL_MEMFLAGS_USE_CPU_MAP correctly * align kernel name to 4 bytes when skipping kernel opencl struct * skip to consts directly using an offset from opencl binary header * fix alloc * get halfreg and fullreg from opencl bin * set unmultipled global sizes as kernel group in HLSQ_CS_NDRANGE * parse prg offset from open cl binary * save loc with HLSQ_CS_CNTL. set this with HLSQ_CONTROL_2_REG * support for vals in _fill_kernargs * support 16-bit constants * use KGSL_CONTEXT_NO_FAULT_TOLERANCE for contexts this helps to not fall down when executing big kernels /* Don't time out if the context has disabled it */ if (drawobj->context->flags & KGSL_CONTEXT_NO_FAULT_TOLERANCE) return; minor changes of _exec * QCOMRenderer * disable HCQGraph for demo. TOOD: support HCQ update api * support HCQ - remove copy queue - add updates - add strides for buffs and vars for QCOM * bufs_stride * clean ups * linter * call super().__init__(value) in QcomSignal * disable=unused-import * mypy * type ignore when queue is on the device * fix * query gpu_id. Will be useful for selecting commands e.g. CP_EVENT_WRITE vs CP_EVENT_WRITE7 * working timestamps * free context after device is done * move gpu stack to the device * reserve some space with lib_gpu for gpu to write to this fixes test_interpolate_bilinear * exclude tests that fails with GPU=1 on qualcomm * lint * unmap mem in _gpu_free * ctxt priority and preemtion policy * remove old qcom * pass size to self.device.allocator.free * skip tests only on qcom * use kgsl and adreno defines instead of numeric vals * use allocator for allocating lib_gpu * update to QcomArgsState from master * intermediate commit while conquering images * enable image tests on qcom * fix shader disasm size, dump textures stuff * working images * allow signals to be 0 * set branchstack from OpenCL binary Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com> * set shared memory size from OpenCL binary Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com> * update images in QcomArgsState & less loc for images * set stack sizes from OpenCL binary Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com> * stack allocation based on OpenCL binary Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com> * better autogen for kgsl and adreno. no more bitshifts Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com> * cleanup commit for parse cl lib Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com> * dont forget actual generated files * refactor + less loc Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com> * device.py back * lint * ruff * timestamp divisor Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com> * fix tex fmt & round global size Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com> * dtypes * 19.2MHz * -1 loc in _update_exec * remove noqa --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com> Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>	2024-09-02 19:35:47 +03:00
pedro	7de4eac8f7	add support and tests for nearest modes in interpolate, adapt uint8 bilinear to torch implementation (#6308 ) * add `nearest` mode to interpolate matching pytorch `nearest` which is knowingly buggy + relevant TestsOps * add `nearest-exact` mode to interpolate matching pytorch `nearest-exact` + relevant TestOps * fix uint8 bilinear interpolation by matching custom torch implementation * implement uint8 lerp with torch interpolation trick without converting it to float	2024-08-28 21:59:51 -07:00
Max-We	ab2714423b	Add einsum tests (#6286 ) Co-authored-by: Maximilian Weichart <maximilian.weichart@icloud.com>	2024-08-26 09:09:25 -07:00
chenyu	af7c04ff57	Tensor.__floordiv__ (#6283 ) support Tensor.__floordiv__ and friends	2024-08-26 09:43:40 -04:00
chenyu	da5cf11859	fix acc init value for MUL (#6263 )	2024-08-23 23:19:44 -04:00
chenyu	590c0922b6	Tensor.prod (#6250 ) * Tensor.prod a new reduce op! * onnx ReduceProd	2024-08-23 10:06:32 -04:00
Gabe Caldwell	bdd6325f31	default num_classes value for one_hot (#6182 ) * num_classes=-1 If num_classes set to -1, the number of classes will be inferred as one greater than the largest class value in the input tensor. * num_classes desc comment to explain num_classes default and what that means. * replacing ' with `	2024-08-19 12:07:14 -07:00
Alessandro Benetti	9328248610	support for std_mean and cross_entropy (#6181 ) * support for std_mean and cross_entropy (#3) * Cross entropy and std mean support * remove extra examples	2024-08-19 12:06:44 -07:00
George Hotz	553ae9ebc0	bilinear interp uint8 fails (#6103 ) * new test for e2e compile failures * fix bug * bilinear interp uint8 fails * better tests	2024-08-15 19:34:39 -07:00
chenyu	4a65010de8	remove CUDACPU flag in tests [run_process_replay] (#5902 ) no longer used	2024-08-04 16:06:38 -04:00
chenyu	b392b8edc3	increase atol and rtol test_gemm_fp16 (#5866 ) * increase atol and rtol test_gemm_fp16 made it pass with NOOPT which has larger accumulated error * revert that	2024-08-01 19:09:58 -04:00
chenyu	defd89e8e0	unify negative shape creation to raise ValueError (#5817 ) [run_process_replay]	2024-07-30 13:42:59 -04:00
P4ssenger	6742a4789a	Add check for negative dimension in view (#5790 ) * add check for negative dimension in view * add negative dim tests * move check to tensor level * fix error message * move check to view create --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-07-30 13:26:27 -04:00
samm393	573e0f9a48	remove float division from idiv in python_alu (#5777 ) * removes float division from idiv in python_alu * add test * cleaner logic * pass clang unsigned literals correctly * suffix ULL instead of U --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-07-29 12:14:12 -04:00
George Hotz	053550c3f3	remove MERGE opt, cleanup wmma upcast (#5669 ) * remove MERGE opt, cleanup wmma upcast * upcast first * fix broken vectorize folding rule	2024-07-23 20:43:42 -07:00
George Hotz	e3f00ac77d	Fix cuda tc emu test (#5663 ) * fix acc folding for NV tensor cores * fix correctness of reduce_before_expand * fix test emulated CUDA tensor cores * test_gemm_fp16 on some devices	2024-07-23 15:04:25 -07:00
George Hotz	386fb5e7f8	folding without UNMUL (#5628 ) * folding without UNMUL * fix failures, index_collapse * import ReduceOps * test_arange_4096 isn't folding	2024-07-21 20:14:44 -07:00
George Hotz	0ad87021e2	move acc to end (#5568 ) * move acc to end * confirmed pictures are the same * relax that * Update test_ops.py	2024-07-19 03:06:52 -07:00
chenyu	6e405b0a2b	add 0d tensor to trunc/floor/ceil/round tests (#5512 ) existing trunc test passes backward but its backward is incorrect in general. added tests that would fail	2024-07-16 16:48:25 -04:00
Tobias Fischer	87a2ef2bc2	Add Interpolate Function (#5482 ) * add interpolate function * fixed linter issue * reduced sizes in test --------- Co-authored-by: wozeparrot <wozeparrot@gmail.com>	2024-07-16 09:44:01 -07:00
Tobias Fischer	e219103677	Add Pad to Pooling (#5488 )	2024-07-14 21:50:20 -07:00
Tobias Fischer	5849130cbb	gather negative dim fix (#5486 )	2024-07-14 20:20:53 -04:00
chenyu	00813a92a0	update Tensor.eye api to match torch (#5433 ) * update Tensor.eye api to match torch input is n for nrows and optional m for ncols * space * fix onnx	2024-07-12 20:25:12 -04:00
chenyu	64986f949c	more transcend math tests in ci (#5368 ) * more transcend math tests in ci test large input to trig functions that hit different reduction algo, and test TRANSCENDENTAL=2 for all backend * no CUDACPU * try that	2024-07-10 21:19:09 -04:00
chenyu	0f0940225a	fix Tensor.all and Tensor.any for PTX (#5335 ) supported boolean acc and boolean phi. and rewrite boolean max to uint8 max	2024-07-08 18:15:04 -04:00
chenyu	6856f915d6	Tensor.any and Tensor.all (#5320 ) does not work in ptx yet due to how boolean tensor is handled	2024-07-07 14:36:00 -04:00
chenyu	2029cb7047	support passing None to Tensor.clip (#5319 ) passing None for no upper bound or no lower bound	2024-07-07 13:04:22 -04:00
chenyu	c1e330f302	Tensor.int and Tensor.bool (#5317 )	2024-07-07 11:52:58 -04:00
George Hotz	e53b164e1a	small changes from lowerer (#5266 )	2024-07-02 15:03:54 -07:00
George Hotz	3df47bc21e	OpenELM + repeat_interleave (#5234 ) * start writing openelm * progress...hit bug * repeat_interleave support * gqa * add rotary embedding * spp * i think it runs correctly * broken * output is good now * cleanups * no io_uring on android	2024-06-30 15:18:39 -07:00
hikettei	ad1ca7da64	[Feature] Added BinaryOps.AND/BinaryOps.OR (#5223 ) * [Feature] Added BinaryOps.AND/BinaryOps.OR * Add: __rand__, __ror__	2024-06-29 17:20:25 -07:00
chenyu	ee0c6dfc15	build Tensor._tri with movements only (#5110 ) * build Tensor._tri with movements only doesn't need arange, saved a kernel in attention mask * simpler, more tests	2024-06-23 00:07:36 -04:00
chenyu	20fabd8a5b	update Tensor.triu and Tensor.tril (#5109 ) renamed arg to `diagonal` that matches torch api, and added document and examples	2024-06-22 21:59:50 -04:00
George Hotz	9f875123b6	small changes from lowerer. [run_process_replay] [no_assert] (#5102 )	2024-06-22 11:09:35 -07:00
chenyu	166a2b19b5	fix reduce axis of 0d tensors (#5089 ) `x.sum(())` is fine, and `x.sum((1,))` should throw IndexError	2024-06-21 13:51:40 -04:00
chenyu	36b4a492a1	explicitly check getitem indices can have at most one ellipsis (#5087 ) * explicitly check getitem indices can have at most one ellipsis previous error with multiple `...`: ``` if index_type not in [None, int, slice, Tensor]: raise IndexError(f"{index_type=} not supported") ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ IndexError: index_type=<class 'ellipsis'> not supported ``` this pr: ``` if len(ellipsis_idx) > 1: raise IndexError("an index can only have a single ellipsis ('...')") ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ IndexError: an index can only have a single ellipsis ('...') ``` * oh we have that already * test that * test these	2024-06-21 12:33:18 -04:00
chenyu	f6d6760f71	don't cast tuple to list before creating Tensor (#5071 ) Tensor constructor supports creating from tuple now	2024-06-20 13:32:56 -04:00

434 Commits