Commit Graph

1241 Commits

Author SHA1 Message Date
Jacky Lee
754e54ebb9 Fix Tensor ceil and floor for whole numbers (#1071)
* Works on non-special numbers

* Test different cases
2023-06-27 23:22:17 -07:00
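
The edge case behind this fix: a ceil/floor built naively from truncation also bumps values that are already whole. A plain-Python sketch of the branch structure involved — illustration only, not the tinygrad kernel code, and `floor_`/`ceil_` are hypothetical names:

```python
def floor_(x: float) -> float:
    t = float(int(x))  # truncate toward zero
    # negative non-integers truncate to one above their floor
    return t - 1.0 if x < t else t

def ceil_(x: float) -> float:
    t = float(int(x))
    # whole numbers must map to themselves: only bump when a
    # fractional part actually remains (the case this commit fixes)
    return t + 1.0 if x > t else t

assert ceil_(2.0) == 2.0 and ceil_(2.3) == 3.0
assert floor_(-2.0) == -2.0 and floor_(-1.5) == -2.0
```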
George Hotz
9fabdbd054 speed (#1070) 2023-06-27 20:28:57 -07:00
George Hotz
d16c16ec28 new upcast works (#1066)
* new upcast works

* float4 try

* fix unaligned float4

* disallow unaligned access

* upcast dim

* maybe good now

* fix gpu half

* vstore_half4

* fix deep image bugs

* improve symbolic to fix issues

* fix symbolic

* cl test

* this maybe

* gcd of 1 is 1

* real fix for old python

* improve fuzzer
2023-06-27 19:34:53 -07:00
George Hotz
c8d87eb8d4 strip whitespace 2023-06-27 10:11:43 -07:00
Rayan Hatout
23648538fa fix folding of float4 add/mul (#1060) 2023-06-26 20:59:29 -07:00
George Hotz
3e33befc1d realize hotspots (#1059)
* realize hotspots

* no str check

* minor changes

* make this an assert

* faster and more readable

* nicer self.buffers

* tests for weak op + LAZYCACHE=0
2023-06-26 18:31:18 -07:00
George Hotz
2977fb17f6 various touchups (#1058)
* op isn't optional

* barrier + named local buffers

* end global and local loop together to avoid useless if statement

* better comments
2023-06-26 15:41:23 -07:00
George Hotz
f265e8523a movement ops aren't really ops (#1056) 2023-06-26 15:01:28 -07:00
George Hotz
571089f10e Back off minor speed stuff for simplicity (#1053)
* passing in buffers doesn't increase speed

* functools.reduce

* no more get_buffers
2023-06-26 14:42:17 -07:00
Rayan Hatout
dedbd970aa Optimizations in lazy.py (#987)
* optimizations in lazy.py

* make mypy happy with stubs and fix the graph import hack

* merge conflict in helpers.py
2023-06-26 13:55:42 -07:00
Roelof van Dijk
8bea6b6d35 perf/refactor_weakops (#1052)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-06-26 10:13:33 -07:00
Roelof van Dijk
c604ef4beb symbolic.py: faster Node.sum, faster SumNode.div (#1014)
* refactor: replace isinstance with class check where possible

* refactor: faster partition

* fix: flake8

* feat: rework node.sum, correct list typing

* fix: typo

* feat: refactor sum

* fix: pylint

* refactor: simpler sum and factorize

* feat: clean up sumnode div, all cpu tests pass

* feat: simplify floordiv, cache factorization

* don't factor numnodes at all

* python 3.8 functools does not yet have @cache

* fix: restore assert

* refactor, fix failing tests

* fix: address review comments

* feat: rework, add specialization, remove cache

* fix: remove specialization

* feat: no tuple conversion, faster loop

---------

Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-06-26 09:47:17 -07:00
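
The sum rework in this entry centers on factorizing like terms ("simpler sum and factorize", "cache factorization"). A hedged sketch of the core idea using plain (name, coefficient) pairs rather than tinygrad's Node classes; `factorize` is an illustrative name:

```python
from collections import defaultdict

def factorize(terms):
    # terms are (name, coefficient) pairs; constants use name=None.
    # Collect coefficients so 2*x + 3*x + 1 collapses to 5*x + 1,
    # dropping anything that cancels to zero.
    coeffs = defaultdict(int)
    for name, coeff in terms:
        coeffs[name] += coeff
    return {n: c for n, c in coeffs.items() if c != 0}

assert factorize([("x", 2), ("x", 3), (None, 1)]) == {"x": 5, None: 1}
assert factorize([("x", 1), ("x", -1)]) == {}
```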
Casey Primozic
52b7105f87 Dedup params in Optimizer (#1047)
* Dedup params in optimizer

 * Passing the same tensor multiple times in the set of learnable params passed to optimizers can result in models completely failing to learn, with no errors produced. This dedups tensors to avoid the problem (see the sketch after this entry).

* Fix types

* Use new variable to satisfy linter

* Use `helpers.dedup` instead of `set()` to dedup params

* Add test for duped params in optimizers
2023-06-26 00:49:23 -07:00
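
For context on the `helpers.dedup` choice over `set()`: a set loses ordering and first-seen semantics, which matter for optimizer bookkeeping. A rough sketch of the behavior this use case needs — not necessarily `helpers.dedup`'s actual code:

```python
def dedup(params):
    # drop repeated tensors while keeping first-seen order; identity
    # is the right notion of "same", since one tensor passed twice
    # silently breaks learning (per the commit message above)
    seen, out = set(), []
    for p in params:
        if id(p) not in seen:
            seen.add(id(p))
            out.append(p)
    return out

w = object()  # stand-in for a Tensor
assert dedup([w, w, w]) == [w]
```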
Kunwar Raj Singh
5d3310ce56 MaskRCNN Inference (#884)
* MaskRCNN weights loading

* backbone maybe works

* backbone works, but resnet body atol 1e-3

* RPN Call, but very wrong output

* fixed topk

* RPN maybe works, not sure about nms

* Fix cursed modules

* add back editorconfig

* Full call, wrong output

* Full call works

* fix mask

* use NMS from retinanet

* Removing extra funcs

* refactor

* readable

* Add example to run model

* remove filter

* Fix split, batched inference is worse

* Fix image sizes

* Matching reference

* merge master

* add filter on top detections

* cuda backend fixed

* add model eval and spec

* convert images to rgb

* fix eval

* simplify examples code

* remove extra code

* meshgrid using tinygrad

* removing numpy

* roi align, floor, ceil

* remove numpy from level_mapper

* remove numpy from pooler

* Revert "Merge branch 'master' of github.com:kunwar31/tinygrad into mrcnn-inference"

This reverts commit 4b95a3cb49, reversing
changes made to 98f2b1fa2e.

* roi align gather

* fix master merge

* revert to old floor, ceil as ints present in domain

* use log2 op

* fix indexes

* weird bug with ints and gpu

* weird bug with ints and gpu

* refactors, add env var for gather

* floor with contiguous, where

* refactor topk, sort

* remove staticmethod

* refactor stride

* remove log2 mlop

* realize -> contiguous

* refactor forward

* remove num_classes, stride_in_1x1 from state

* refactor forward

* refactoring

* flake8

* removing numpy in anchor gen, use numpy for gather, nonzero, optimize topk

* keep using tinygrad for smaller gathers

* fix empty tensors

* comms

* move from tensor.py

* resnet test passing

* add coco dataset back

* fix spaces

* add test for log2

* no need to create Tensors

* no need to create Tensors

---------

Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
2023-06-25 15:37:51 -07:00
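
One of the numpy-removal steps above is "meshgrid using tinygrad". The underlying pattern is broadcasting two 1-D ranges against each other so each grid cell gets an (x, y) pair, which is what the anchor generator consumes. A pure-Python sketch of the semantics, not the tensor implementation:

```python
def meshgrid(xs, ys):
    # x varies along columns, y varies along rows
    grid_x = [[x for x in xs] for _ in ys]
    grid_y = [[y for _ in xs] for y in ys]
    return grid_x, grid_y

gx, gy = meshgrid([0, 1, 2], [0, 1])
assert gx == [[0, 1, 2], [0, 1, 2]]
assert gy == [[0, 0, 0], [1, 1, 1]]
```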
George Hotz
c8fbdeb48e test speed llama (#1046)
* test speed llama

* oops, put it back

* uses the real device codegen

* just do it on the mac

* pp

* is faster?

* Revert "is faster?"

This reverts commit 42db542010.

* disable docker again for less load on CI
2023-06-25 15:22:56 -07:00
Francesco Castelli
6ff720103e Reduce tensor dot line count and fixed 1d tensor dot (#1045)
* fixed tensor.dot

* no 1d dot for image=1

* shorter lines

* add 3d dot tests
2023-06-25 10:32:45 -07:00
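
The 1-D fix follows numpy's matmul convention: a 1-D left operand temporarily grows a leading axis, a 1-D right operand grows a trailing axis, and the grown axes are squeezed back out of the 2-D result. A shape-level sketch with a hypothetical `dot_shape` helper, not the actual tensor.py code:

```python
def dot_shape(a: tuple, b: tuple) -> tuple:
    # promote 1-D operands to 2-D, matmul, then squeeze the
    # promoted axes back out (2-D-after-promotion case only)
    sl, sr = len(a) == 1, len(b) == 1
    if sl: a = (1,) + a
    if sr: b = b + (1,)
    assert a[-1] == b[0], "contraction dim must match"
    out = (a[0], b[-1])
    if sr: out = out[:-1]
    if sl: out = out[1:]
    return out

assert dot_shape((3,), (3,)) == ()      # vector . vector -> scalar
assert dot_shape((2, 3), (3,)) == (2,)  # matrix . vector -> vector
assert dot_shape((3,), (3, 4)) == (4,)  # vector . matrix -> vector
```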
cloud11665
2407690d82 add cuda on cpu tests (#1020) 2023-06-22 14:15:50 -07:00
Eli Frigo
e09219df0f fixed division by zero for fast kernels (#1021)
* fixed division by zero for fast operations

* made et closer to 0
2023-06-22 14:02:53 -07:00
George Hotz
18892242b0 global -> group (#1007)
* global -> group

* allow None for local_size in custom function

* lil local

* comment on shape

* fix cuda

* smart local cast

* better local heuristic

* fix ptx, and work_dim cleanup

* fix metal

* fix ops test

* fix openpilot jit

* no more optlocal

* might fix metal tests

* try metal now

* see generated metal code

* test free removal. REVERT THIS

* mergable
2023-06-21 11:50:43 -07:00
Casey Primozic
aab9ee0fca Add RDNA3 assembler UOps.CAST partial support + other fixes/improvements (#1012)
* Add support for one case of `UOps.CAST` for RDNA3 assembler

 * Adds support for casting from `bool` -> `float32`.  Seems like a very common operation that is required in many places.
 * Fix bool register definition for vector operations
   * Use `vcc_lo` instead of `vcc`, which seems to be required since it's configured to use wavefront_size=32
 * Add vector support for some places that were scalar only in register definition and comparison ops
 * Fix some issues in what seems to be defunct `external_test_image.py`
   * Some tests still don't pass for other reasons, but it at least runs now and one broken test is now fixed

* Refactor RDNA3 assembler register definition

 * Unify multi-register code between dtypes and combine with single-register allocation since they're all untyped registers at the end of the day
2023-06-20 11:34:10 -07:00
Casey Primozic
651d6ea457 Minor improvements + cleanup to ops_gpu.py (#1006)
* Minor improvements + cleanup to `ops_gpu.py`

 * Add some previously undocumented environment variables from `ops_gpu.py` to `env_vars.md`
 * Update debug print for OpenCL to print the devices that will be used post-filtering with `CL_EXCLUDE`
 * Remove a couple unused or superfluous variables and assignments
 * Use `fromimport` shorthand to shave off a couple precious LOC
 * Couple small whitespace changes to clean things up

* Revert change to ordering of OpenCL devices

* Small refactor for OpenCL context creation
2023-06-18 21:26:40 -07:00
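
`CL_EXCLUDE`, one of the env vars this commit documents in `env_vars.md`, filters OpenCL devices out by name before use. A guess at the shape of that filtering — the comma-separated format is an assumption here, so check `env_vars.md` for the real contract:

```python
import os

def visible_devices(device_names):
    # skip any device whose name appears in CL_EXCLUDE
    # (comma-separated list assumed, not confirmed)
    exclude = set(filter(None, os.getenv("CL_EXCLUDE", "").split(",")))
    return [d for d in device_names if d not in exclude]
```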
George Hotz
5428b5d774 good changes from tensor_cores branch (#1005)
* good changes from tensor_cores branch

* touchups

* real_strides fixup

* refactor merge_views
2023-06-18 20:28:06 -07:00
George Hotz
b14b7bc749 don't make HIP the default...it's slower 2023-06-18 19:11:39 +00:00
Alex Wang
3d63c71e27 HIP backend (#750)
* llama works for HIP backend

* Use hipMemcpyAsync; Less lines of code

* Remove unused code

* Refactor

* Add comments; hipDeviceSynchronize

* HIP over GPU; Remove PyHIP dependency

* Cleanups

* Fix mypy check

* Merge master; Dump assembly code
2023-06-18 11:35:57 -07:00
George Hotz
c690eeaca9 flip mulacc to save a line (#997) 2023-06-17 16:47:55 -07:00
Diogo
d2b837c1d9 Adds floor/ceil (#989)
* floor ceil impl

* control casting in numpy
2023-06-17 10:56:21 -07:00
George Hotz
fe71282ba1 faster RDNA assembly backend (#990)
* fast asm

* torch gemm
2023-06-16 12:06:38 -07:00
George Hotz
ba56ee6020 RDNA assembly backend ($1000 bounty) (#787)
* Revert "Revert "ops rdna""

This reverts commit 0400315078.

* Revert "Revert "writing 2""

This reverts commit 325a3bf2cf.

* no dump

* 2x 2

* simple asm

* local size

* sub

* lil work

* support args != 3

* assembler work

* generate that

* ptx assembler

* begin index renderer

* max

* ptx loops

* gemms work

* valid works

* asm working a bit more

* close

* passing all ops tests

* ptx is a codegen only, not a backend

* ptx

* float16 support

* rdna goes here

* install types

* make amd disassemble

* ansilen for pretty print

* fix ptx log2/exp2

* assemblyinstruction

* new asm

* working gemm

* fix cmp

* more passing

* mod

* ptx works again

* rdna3 add works

* log exp

* sin is sin 2pi

* fix types

* progress

* loops work

* rdna xyz

* better addressing

* cleanups

* handle exception in early process

* div support

* rdna float4

* locals work

* fix neg index

* cast

* smaller diff

* yaml

* import only if selected

* fromimport

* types

* this all needs rewriting

* a few more
2023-06-16 09:33:18 -07:00
George Hotz
dca084f227 minor == to is touchups 2023-06-15 17:11:12 -07:00
blake
041d96083c clang rt for msvc (#986)
* added platform config for clang runtime and tempfile dir for xplatform /tmp

* flake8 lint

* mypy lint

* pythonic?

* python?

* return darwin cflags

* <lines

* lint;
2023-06-15 17:06:44 -07:00
George Hotz
039f0d372f delete ltypes (#984)
* delete ltypes

* only upcast float types

* test dtype on mac passes

* ugh, these upcasts
2023-06-15 16:24:45 -07:00
Rayan Hatout
2d567ef688 Optimizations in tensor.py (#974)
* optimizations in tensor.py

* make mypy happy

* revert split of Function class
2023-06-14 08:44:35 -07:00
Diogo
0629791cbd F64 support (#976)
* initial commit

* added osx check for opencl

* added llvm f64 conversions

* typo in llvmir

* more tests and modified unsupported error

* fixed linting error

* added pragma fp64

* simplified exclusion for OSX

* fixed device check and also added it to cast func

* added ifdef check for fp16 in ops_gpu

* Revert "added ifdef check for fp16 in ops_gpu"

This reverts commit 92de754d48.

* f64 prekernel signature match f16

* moved condition to buffer init
2023-06-13 21:31:31 -07:00
George Hotz
ba4eadb04c PTX assembly support (#977)
* ptx assembly

* all ops tests pass

* fix tests
2023-06-13 12:31:42 -07:00
Rayan Hatout
727416201f Shapetracker optimizations (#966)
* optimizations in shapetracker.py

* revert micro-optimizations in assertions

* make mypy happy

* list comp instead of map in get_unsafe_resize_offset

* list comp instead of map in get_unsafe_resize_offset
2023-06-12 18:13:21 -07:00
cloud11665
5f13e7c3cf cuda: fix fp16, uint8, int64, half4 codegen (#968)
* cuda: add uchar, int64 typedefs

* cuda: fix float16 codegen

* fuck it, half4 stub. llama time!

* inline fp16 half4, revert changes to CStyleLanguage

* add inline just in case

* remove half4 operators

* use dict
2023-06-12 11:15:44 -07:00
Diogo
613c74ca9f maintain input tensor dtype (#969) 2023-06-12 10:12:47 -07:00
Diogo
2d4370b487 Adds tril & triu support (#936)
* triu & tril support

* lint and kernel count error

* switched shape indices

* larger shape tests

* reverted numpy removal until #942 is resolved
2023-06-09 22:13:20 -07:00
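
triu/tril reduce to comparing row and column indices and masking. A minimal sketch of the mask using plain lists — on-device it would be built from arange-style tensors instead:

```python
def triu_mask(rows, cols, k=0):
    # keep (i, j) when j >= i + k; tril flips the comparison
    return [[1 if j >= i + k else 0 for j in range(cols)]
            for i in range(rows)]

assert triu_mask(3, 3) == [[1, 1, 1],
                           [0, 1, 1],
                           [0, 0, 1]]
```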
George Hotz
c62c64f0b7 remove GeNode (#965) 2023-06-09 21:48:56 -07:00
George Hotz
2c324d0685 fix metal uaf (#964) 2023-06-09 21:28:06 -07:00
George Hotz
df40a9c238 EXP+LOG -> EXP2+LOG2 (#954)
* EXP+LOG -> EXP2+LOG2

* update docs
2023-06-08 10:57:31 -07:00
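
The primitive swap works because natural exp/log are a single constant multiply away from their base-2 counterparts:

```python
import math

# exp(x) = exp2(x * log2(e))        log(x) = log2(x) * ln(2)
LOG2_E, LN_2 = 1 / math.log(2), math.log(2)

def exp(x: float) -> float: return 2.0 ** (x * LOG2_E)
def log(x: float) -> float: return math.log2(x) * LN_2

assert abs(exp(1.0) - math.e) < 1e-12
assert abs(log(math.e) - 1.0) < 1e-12
```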
Diogo
666d151f8a Onnx slice fixups (#952)
* resolved some slice test errors and added some more debugging logs

* use same device in cumsum

* increased float priority

* onnx debug output matches input
2023-06-07 19:44:30 -07:00
SnakeOnex
990fc40219 made seed None by default -> numpy picks a random seed (#946)
* made seed None by default -> numpy picks a random seed

* fixed _seed type

* set the seed to unix timestamp

* make filetype int only
2023-06-06 13:06:23 -07:00
Diogo
3bb38c3518 limit split to 1 due to windows path containing : (#944) 2023-06-06 10:27:54 -07:00
cloud11665
43ea1614b0 fix inf/nan codegen (#935)
* fix inf/nan codegen

* remove nasty oneliner, fix -inf

* inf/nan const mul/div tests
2023-06-05 11:24:09 -07:00
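
The underlying problem: printing a Python inf/nan float straight into a kernel does not yield a valid C literal, so constant mul/div on those values (the cases the new tests cover) produced uncompilable code. A minimal sketch of the kind of rendering that avoids it — illustrative, not tinygrad's actual renderer:

```python
import math

def render_const(x: float) -> str:
    # NAN and INFINITY are standard <math.h> macros, unlike repr(x)
    if math.isnan(x): return "NAN"
    if math.isinf(x): return "-INFINITY" if x < 0 else "INFINITY"
    return f"{x}f"

assert render_const(float("-inf")) == "-INFINITY"
assert render_const(1.5) == "1.5f"
```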
Filip Dimitrovski
78460034ff Initial ellipsis support when slicing Tensors (#843)
* Initial ellipsis support when slicing Tensors

* Better comments in ellipsis slicing

* Formatting
2023-06-05 07:52:49 -07:00
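
Ellipsis support boils down to rewriting `...` into however many full slices are needed to cover the remaining dimensions. A sketch of that expansion with a hypothetical helper name:

```python
def expand_ellipsis(indices: tuple, ndim: int) -> tuple:
    # replace a single Ellipsis with enough full slices that the
    # index tuple addresses every dimension
    if Ellipsis not in indices: return indices
    i = indices.index(Ellipsis)
    fill = ndim - (len(indices) - 1)
    return indices[:i] + (slice(None),) * fill + indices[i + 1:]

# t[..., 0] on a 4-D tensor reads as t[:, :, :, 0]
assert expand_ellipsis((Ellipsis, 0), 4) == (slice(None), slice(None), slice(None), 0)
```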
Steven Anderson
79613eb83e Test min (#932)
* fix __neg__ defaulting to float32 due to 0.0

* fixed __neg__ always defaulting to float32

* fixed openpilot (OpenCL) Test
2023-06-05 00:03:30 -07:00
Tom Edwards
5bbcbd145c Add cumsum with n-dim inputs (#922)
* add cumsum with n-dim inputs, over arbitrary axis + relevant tests

* increased rtol for cumsum test

* move test_cumsum into test_ops

* skip arange test for images as it relies on cumsum

* Fix typo

* rewrite cumsum to work with images
2023-06-04 16:55:23 -07:00
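
One way to get an any-axis, n-dim cumsum out of simple primitives is to move the target axis last and express the scan as a matmul with a lower-triangular ones matrix; whether the final image-friendly rewrite does exactly this is not confirmed here. A plain-Python sketch of the triangular-matmul identity:

```python
def cumsum_last(rows):
    # out[i] = sum over j <= i of x[j], i.e. x times a matrix
    # with tri[i][j] = 1 when j <= i, applied row by row
    n = len(rows[0])
    tri = [[1 if j <= i else 0 for j in range(n)] for i in range(n)]
    return [[sum(r[j] * tri[i][j] for j in range(n)) for i in range(n)]
            for r in rows]

assert cumsum_last([[1, 2, 3], [4, 0, 1]]) == [[1, 3, 6], [4, 4, 5]]
```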
Steven Anderson
657e642e3a Fixed test suite for Clip (#912)
* Fixed test suite for Clip

* fixed issue with clip when taking large negative numbers as min

* Remove typings
2023-06-04 09:01:01 -07:00
kposborne2
0b88c5f923 Eliminate LoadOps.FROMCPU (#920)
* Add fromCPU method to init LazyBuffer to eliminate LoadOps.FROMCPU

* squish

* remove failing test

* seems logical

* Revert "seems logical"

This reverts commit bbdcdc8713.

* inline and remove assertion

* fromCPU staticmethod, defer non-cpu device to loadop

* restore test
2023-06-04 08:55:50 -07:00