tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-22 21:38:10 -05:00

Author	SHA1	Message	Date
Kunwar Raj Singh	9e6067378f	Broken Sigmoid backward: Add test and mlop for Sigmoid (#1113 ) * Add failing sigmoid test * update more tests * add mlop for sigmoid * add back test * math.log(math.e) = 1 * remove divides --------- Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>	2023-07-04 00:14:22 -07:00
Reza Rezvan	8ae9a054ae	Refactor nn.optim (#1091 ) * Refactor: nn.optim.py * Refactor: nn.optim.py; Fix all tests * Refactor: Replace all optim.get_parameters() * Refactor: Revert list comp. * Refactor: Replace optim.get_state_dict * Refactor: Change quickstart.md	2023-07-02 15:07:30 -07:00
geohotstan	575f75f613	hello (#1084 )	2023-07-01 01:29:35 -07:00
Jacky Lee	754e54ebb9	Fix Tensor ceil and floor for whole numbers (#1071 ) * Works on non-special numbers * Test different cases	2023-06-27 23:22:17 -07:00
George Hotz	d16c16ec28	new upcast works (#1066 ) * new upcast works * float4 try * fix unaligned float4 * disallow unaligned access * upcast dim * maybe good now * fix gpu half * vstore_half4 * fix deep image bugs * improve symbolic to fix issues * fix symbolic * cl test * this maybe * gcd of 1 is 1 * real fix for old python * improve fuzzer	2023-06-27 19:34:53 -07:00
George Hotz	a98e361da0	torch speed test, add add	2023-06-26 18:55:27 -07:00
George Hotz	3e33befc1d	realize hotspots (#1059 ) * realize hotspots * no str check * minor changes * make this an assert * faster and more readable * nicer self.buffers * tests for weak op + LAZYCACHE=0	2023-06-26 18:31:18 -07:00
George Hotz	2977fb17f6	various touchups (#1058 ) * op isn't optional * barrier + named local buffers * end global and local loop together to avoid useless if statement * better comments	2023-06-26 15:41:23 -07:00
Rayan Hatout	dedbd970aa	Optimizations in lazy.py (#987 ) * optimizations in lazy.py * make mypy happy with stubs and fix the graph import hack * merge conflict in helpers.py	2023-06-26 13:55:42 -07:00
Roelof van Dijk	8c65f9324c	refactor: print formatting for llama timing (#1050 ) * refactor: print formatting for llama timing, report median and individual runs * feat: back to mean * fix: whitespace * fix: add mean to print --------- Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-06-26 09:49:31 -07:00
Casey Primozic	52b7105f87	Dedup params in `Optimizer` (#1047 ) * Dedup params in optimizer * Passing the same tensor multiple times in the set of learnable params passed to optimizers can result in models completely failing to learn, but no errors are produced. This dedups tensors to avoid the problem. * Fix types * Use new variable to satisfy linter * Use `helpers.dedup` instead of `set()` to dedup params * Add test for duped params in optimizers	2023-06-26 00:49:23 -07:00
Kunwar Raj Singh	5d3310ce56	MaskRCNN Inference (#884 ) * MaskRCNN weights loading * backbone maybe works * backbone works, but resnet body atol 1e-3 * RPN Call, but veryy wrong output * fixed topk * RPN maybe works, not sure about nms * Fix cursed modules * add back editorconfig * Full call, wrong output * Full call works * fix mask * use NMS from retinanet * Removing extra funcs * refactor * readable * Add example to run model * remove filter * Fix split, batched inference is worse * Fix image sizes * Matching reference * merge master * add filter on top detections * cuda backend fixed * add model eval and spec * convert images to rgb * fix eval * simplify examples code * remove extra code * meshgrid using tinygrad * removing numpy * roi align, floor, ceil * remove numpy from level_mapper * remove numpy from pooler * Revert "Merge branch 'master' of github.com:kunwar31/tinygrad into mrcnn-inference" This reverts commit `4b95a3cb49`, reversing changes made to `98f2b1fa2e`. * roi align gather * fix master merge * revert to old floor, ceil as ints present in domain * use log2 op * fix indexes * weird bug with ints and gpu * weird bug with ints and gpu * refactors, add env var for gather * floor with contiguous, where * refactor topk, sort * remove staticmethod * refactor stride * remove log2 mlop * realize -> contiguous * refactor forward * remove num_classes, stride_in_1x1 from state * refactor forward * refactoring * flake8 * removing numpy in anchor gen, use numpy for gather, nonzero, optimize topk * keep using tinygrad for smaller gathers * fix empty tensors * comms * move from tensor.py * resnet test passing * add coco dataset back * fix spaces * add test for log2 * no need to create Tensors * no need to create Tensors --------- Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>	2023-06-25 15:37:51 -07:00
George Hotz	0f281e7b18	touchups	2023-06-25 15:24:26 -07:00
George Hotz	c8fbdeb48e	test speed llama (#1046 ) * test speed llama * oops, put it back * uses the real device codegen * just do it on the mac * pp * is faster? * Revert "is faster?" This reverts commit `42db542010`. * disable docker again for less load on CI	2023-06-25 15:22:56 -07:00
Jacky Lee	5d16cc283f	Docker fix (#1039 ) * Docker test * Remove extra installs * Don't run full test * No need for testing dependencies	2023-06-25 10:38:58 -07:00
Francesco Castelli	6ff720103e	Reduce tensor dot line count and fixed 1d tensor dot (#1045 ) * fixed tensor.dot * no 1d dot for image=1 * shorter lines * add 3d dot tests	2023-06-25 10:32:45 -07:00
cloud11665	2407690d82	add cuda on cpu tests (#1020 )	2023-06-22 14:15:50 -07:00
George Hotz	18892242b0	global -> group (#1007 ) * global -> group * allow None for local_size in custom function * lil local * comment on shape * fix cuda * smart local cast * better local heuristic * fix ptx, and work_dim cleanup * fix metal * fix ops test * fix openpilot jit * no more optlocal * might fix metal tests * try metal now * see generated metal code * test free removal. REVERT THIS * mergable	2023-06-21 11:50:43 -07:00
Casey Primozic	aab9ee0fca	Add RDNA3 assembler `UOps.CAST` partial support + other fixes/improvements (#1012 ) * Add support for one case of `UOps.CAST` for RDNA3 assembler * Adds support for casting from `bool` -> `float32`. Seems like a very common operation that is required in many places. * Fix bool register definition for vector operations * Use `vcc_lo` instead of `vcc` which seems to be required since it's configured to use wavefront_size=32 * Add vector support for some places that were scalar only in register definition and comparison ops * Fix some issues in what seems to be defunct `external_test_image.py` * Some tests still don't pass for other reasons, but it at least runs now and one broken test is now fixed * Refactor RDNA3 assembler register definition * Unify multi-registor code between dtypes and combine with single-register allocation since they're all untyped registers at the end of the day	2023-06-20 11:34:10 -07:00
Diogo	57d3aa76a5	Windows & Ubuntu CLANG CI support (#1011 ) * matrix strategy * push env to GITHUB_ENV * use printf instead of echo * use temp helper function for cross os paths * use path join * switched to using temp helper function * skip test on windows due to memory limit * small fix * removed semi * touchups * clean up * seperate tests * test changes to test_utils on windows * small refactor * more cleanups * undo helpers change * only skip if in CI and WINDOWS	2023-06-19 09:33:24 -07:00
George Hotz	0d4c4f4e9e	metal ci attempt (#1010 ) * metal ci attempt * skip failing ops tests * skip in the ops test * no dtype test	2023-06-19 09:23:55 -07:00
George Hotz	0ac84d5e94	exclude a few more onnx tests	2023-06-19 08:51:29 -07:00
George Hotz	0fd648dff4	exclude more dumb onnx tests	2023-06-19 08:51:29 -07:00
George Hotz	5428b5d774	good changes from tensor_cores branch (#1005 ) * good changes from tensor_cores branch * touchups * real_strides fixup * refactor merge_views	2023-06-18 20:28:06 -07:00
Diogo	d2b837c1d9	Adds floor/ceil (#989 ) * floor ceil impl * control casting in numpy	2023-06-17 10:56:21 -07:00
sehaj	775287ed91	Add yolov8 implementation (#806 ) * added SPPF module from yolov8 * added conv_block, bottleneck modules * cleaned modules * c2f example * spf changes * C2f * fixed and tested bottleneck * improved detect class * tested spf and conv * checked c2f * DFL structure * fixed dfl * added dist2bbox function * added dist2bbox function * added and tested make_anchors function for the head * keeping functions above * creating the detection head * fixing head * untested blocks a. scale_boxes b. clip_boxes c. xywh2xyxy d. box_iou * head works * structure fixx * added darknet (backbone) * yolov8 neck, and intialize bias function while detection * fixed spacing * yolov8 class, init bias, and fixed c2f * forward pass almost working * fixed net structure * init bias not needed, forward pass working * load weights boilerplate * load weights done? * all variants loading! * post process: clip_boxes, scale_boxes, xywh2xyxy, and box_iou(untested) * fix scale_boxes * box_iou fixed and tested * created the pre nms function * fix nms * fixed load weights, apparently the latest commit broke something, excluding num_batches_tracked * added letterbox and pre_tranform for pre_process function * fixed letterbox, pre_transform and added preprocess function * custom NMS done, integrated prepare_boxes and nms, improved box_iou * added postprocess function till parsing * added draw_bounding_boxes_and_save function * testing full flow * using fetch for class names * fixed make_anchors + all tinygrad now * added command line arguments, weight downloading * single image for now only * made draw boxes more efficient * made NMS functions efficient * made compute_transform better * v8 working now, inference is done * prints objects detected in console now * fixed image loading (pre processing) * batch post processing * created initial tests * fixes bounding box thickness AND added get_detected_classes_with_frequency function * cleaning for testing * two tests * added url option for image, removed need for specifiying arguments * tests complete, but lots on things are printed on screen by ultralytics * remove parse arguments * fixed weight location * fixed colours of classes, and black font when high brightness * minor changes * TODOs for later * removed use of torch, using .npz weights * fixed tests * one path for fetch * preprocess now in tinygrad, plus test fix for that * updated tests * fix tests * no class labels needed * Add files via upload * Update showcase.md * Update showcase.md * added safe tensors as weights, and tests fix for that * safe tensors test * using safe_load * using tinygrad functions now to load weights * update tests --------- Co-authored-by: r3sist-uniq <amanmatreja@gmail.com> Co-authored-by: r3sist <72573738+r3sist-uniq@users.noreply.github.com>	2023-06-16 18:55:19 -07:00
George Hotz	ba56ee6020	RDNA assembly backend ($1000 bounty) (#787 ) * Revert "Revert "ops rdna"" This reverts commit `0400315078`. * Revert "Revert "writing 2"" This reverts commit `325a3bf2cf`. * no dump * 2x 2 * simple asm * local size * sub * lil work * support args != 3 * assembler work * generate that * ptx assembler * begin index renderer * max * ptx loops * gemms work * valid works * asm working a bit more * close * passing all ops tests * ptx is a codegen only, not a backend * ptx * float16 support * rdna goes here * install types * make amd disassemble * ansilen for pretty print * fix ptx log2/exp2 * assemblyinstruction * new asm * working gemm * fix cmp * more passing * mod * ptx works again * rdan3 add works * log exp * sin is sin 2pi * fix types * progress * loops work * rdna xyz * better addressing * cleanups * handle exception in early process * div support * rdna float4 * locals work * fix neg index * cast * smaller diff * yaml * import only if selected * fromimport * types * this all needs rewriting * a few more	2023-06-16 09:33:18 -07:00
George Hotz	039f0d372f	delete ltypes (#984 ) * delete ltypes * only upcast float types * test dtype on mac passes * ugh, these upcasts	2023-06-15 16:24:45 -07:00
Diogo	0629791cbd	F64 support (#976 ) * initial commit * added osx check for opencl * added llvm f64 conversions * typo in llvmir * more tests and modified unsupported error * fixed linting error * added pragma fp64 * simplified exclusion for OSX * fixed device check and also added it to cast func * added ifdef check for fp16 in ops_gpu * Revert "added ifdef check for fp16 in ops_gpu" This reverts commit `92de754d48`. * f64 prekernel signature match f16 * moved condition to buffer init	2023-06-13 21:31:31 -07:00
George Hotz	80e665bddb	a couple new tests	2023-06-13 12:36:05 -07:00
Diogo	2d4370b487	Adds tril & triu support (#936 ) * triu & tril support * lint and kernel count error * switched shape indicies * larger shape tests * reverted numpy removal until #942 is resolved	2023-06-09 22:13:20 -07:00
George Hotz	48e9461197	broken tests for #862 and #942	2023-06-09 22:02:59 -07:00
George Hotz	c62c64f0b7	remove GeNode (#965 )	2023-06-09 21:48:56 -07:00
George Hotz	2c324d0685	fix metal uaf (#964 )	2023-06-09 21:28:06 -07:00
Diogo	666d151f8a	Onnx slice fixups (#952 ) * resolved some slice test errors and added some more debugging logs * use same device in cumsum * increased float priority * onnx debug ouput match input	2023-06-07 19:44:30 -07:00
cloud11665	43ea1614b0	fix inf/nan codegen (#935 ) * fix inf/nan codegen * remove nasty oneliner, fix -inf * inf/nan const mul/div tests	2023-06-05 11:24:09 -07:00
Filip Dimitrovski	78460034ff	Initial ellipsis support when slicing Tensors (#843 ) * Initial ellipsis support when slicing Tensors * Better comments in ellipsis slicing * Formatting	2023-06-05 07:52:49 -07:00
Tom Edwards	5bbcbd145c	Add cumsum with n-dim inputs (#922 ) * add cumsum with n-dim inputs, over arbitrary axis + relevant tests * increased rtol for cumsum test * move test_cumsum into test_ops * skip arange test for images as relies on cumsum * Fix typo * rewrite cumsum to work with images	2023-06-04 16:55:23 -07:00
MohammedAlkhrashi	2b4baa97e9	exclude string type from external_test_onnx_backend.py (#918 )	2023-06-03 19:10:52 -07:00
George Hotz	791530045d	Refactor LoadOps (#910 ) * test * work * upd test * loadops * cleanups * real ones * remove LazyNumpyArray * fix assign test * remove range * np.require * llama uses arange kernels * no caching consts * fix enet * torch load support * tests cleanup * fix shufflenet * fix image * fix torch_load test	2023-06-03 09:40:43 -07:00
George Hotz	d58586bb17	safetensors! (#903 ) * safetensors test * safe_save * load back with real safetensors * bugfix in device name. add simple torch_load * it works for llama, but it's slower... * mmap * no intermediate * load mmaped * readinto speed * not ready yet * revert that	2023-06-02 13:41:09 -07:00
Alexey Zaytsev	5feee9c94b	Fix .std() tests on torch=1.13 (#904 )	2023-06-02 07:33:51 -07:00
George Hotz	4d28d55683	add nn layer tests	2023-06-01 21:34:24 -07:00
George Hotz	8a928ed2f3	nn init matches torch (#901 )	2023-06-01 21:24:11 -07:00
wozeparrot	bfea5215e9	Add weight decay to SGD (#883 ) * feat: add weight decay to sgd * fix: fix tests	2023-06-01 13:13:18 -07:00
SnakeOnex	67a7674787	added conv1d tests -> simple, padding, stride, asymmetric padding (#896 ) * added conv1d tests -> simple, padding, stride, asymmetric padding * fixed linting * skip conv1d tests for images	2023-06-01 13:10:37 -07:00
Joqsan	ef129bcb85	Zero dim Tensor support (#777 ) * add and reorganize test_slice_* tests * refactor Tensor.__getitem__() * preliminary tests for 1) 0D tensors and 2) varargs for Tensor.zeros and Tensor.ones * always compare shapes of the numpy arrays obtained from tinygrad and torch tensors * add more tests for 0D support * remove test_tensor.test_slicing(). All slicing tests at test/test_ops.py * add zero-dim support * make test_end2end.py consistent with 0dim support * add test for tensor with zero in shape * don't simplify ones if shape is () * skip tests that need zero-size tensor support. - zero-size tensor support not related to 0dim tensors. * add tests for __getitem__() supporting strides >= 1 * refactor __getitem__: support for strides >= 1 * minor refactors and add comments to __getitem__ * add tests for slices with negative steps * add support for slices with negative strides	2023-06-01 11:32:02 -07:00
kposborne2	ae83e9844c	add output_padding to transposed conv (#875 )	2023-06-01 00:03:22 -07:00
Tom Edwards	115903a15c	Add unbiased std and corresponding tests (#881 ) * add unbiased std and corresponding tests * replaced unbiased with correction + tests	2023-05-31 16:32:36 -07:00
Bartłomiej Jargut	447b5847e2	Added test for empty tensor for Tensor.numel(), added missing numel call (#880 ) * Added few missing return typehints for tensor.py * added test for empty tensor for Tensor.numel() * fixed missing numel call in test_numel --------- Co-authored-by: deefi <dee7ine@gmail.com>	2023-05-31 12:28:47 -07:00

... 75 76 77 78 79 ...

4433 Commits