tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-14 09:28:04 -05:00

Author	SHA1	Message	Date
cloud11665	2407690d82	add cuda on cpu tests (#1020)	2023-06-22 14:15:50 -07:00
George Hotz	18892242b0	global -> group (#1007) * global -> group * allow None for local_size in custom function * lil local * comment on shape * fix cuda * smart local cast * better local heuristic * fix ptx, and work_dim cleanup * fix metal * fix ops test * fix openpilot jit * no more optlocal * might fix metal tests * try metal now * see generated metal code * test free removal. REVERT THIS * mergable	2023-06-21 11:50:43 -07:00
Casey Primozic	aab9ee0fca	Add RDNA3 assembler `UOps.CAST` partial support + other fixes/improvements (#1012) * Add support for one case of `UOps.CAST` for RDNA3 assembler * Adds support for casting from `bool` -> `float32`. Seems like a very common operation that is required in many places. * Fix bool register definition for vector operations * Use `vcc_lo` instead of `vcc` which seems to be required since it's configured to use wavefront_size=32 * Add vector support for some places that were scalar only in register definition and comparison ops * Fix some issues in what seems to be defunct `external_test_image.py` * Some tests still don't pass for other reasons, but it at least runs now and one broken test is now fixed * Refactor RDNA3 assembler register definition * Unify multi-registor code between dtypes and combine with single-register allocation since they're all untyped registers at the end of the day	2023-06-20 11:34:10 -07:00
Diogo	57d3aa76a5	Windows & Ubuntu CLANG CI support (#1011) * matrix strategy * push env to GITHUB_ENV * use printf instead of echo * use temp helper function for cross os paths * use path join * switched to using temp helper function * skip test on windows due to memory limit * small fix * removed semi * touchups * clean up * seperate tests * test changes to test_utils on windows * small refactor * more cleanups * undo helpers change * only skip if in CI and WINDOWS	2023-06-19 09:33:24 -07:00
George Hotz	0d4c4f4e9e	metal ci attempt (#1010) * metal ci attempt * skip failing ops tests * skip in the ops test * no dtype test	2023-06-19 09:23:55 -07:00
George Hotz	0ac84d5e94	exclude a few more onnx tests	2023-06-19 08:51:29 -07:00
George Hotz	0fd648dff4	exclude more dumb onnx tests	2023-06-19 08:51:29 -07:00
George Hotz	5428b5d774	good changes from tensor_cores branch (#1005) * good changes from tensor_cores branch * touchups * real_strides fixup * refactor merge_views	2023-06-18 20:28:06 -07:00
Diogo	d2b837c1d9	Adds floor/ceil (#989) * floor ceil impl * control casting in numpy	2023-06-17 10:56:21 -07:00
sehaj	775287ed91	Add yolov8 implementation (#806) * added SPPF module from yolov8 * added conv_block, bottleneck modules * cleaned modules * c2f example * spf changes * C2f * fixed and tested bottleneck * improved detect class * tested spf and conv * checked c2f * DFL structure * fixed dfl * added dist2bbox function * added dist2bbox function * added and tested make_anchors function for the head * keeping functions above * creating the detection head * fixing head * untested blocks a. scale_boxes b. clip_boxes c. xywh2xyxy d. box_iou * head works * structure fixx * added darknet (backbone) * yolov8 neck, and intialize bias function while detection * fixed spacing * yolov8 class, init bias, and fixed c2f * forward pass almost working * fixed net structure * init bias not needed, forward pass working * load weights boilerplate * load weights done? * all variants loading! * post process: clip_boxes, scale_boxes, xywh2xyxy, and box_iou(untested) * fix scale_boxes * box_iou fixed and tested * created the pre nms function * fix nms * fixed load weights, apparently the latest commit broke something, excluding num_batches_tracked * added letterbox and pre_tranform for pre_process function * fixed letterbox, pre_transform and added preprocess function * custom NMS done, integrated prepare_boxes and nms, improved box_iou * added postprocess function till parsing * added draw_bounding_boxes_and_save function * testing full flow * using fetch for class names * fixed make_anchors + all tinygrad now * added command line arguments, weight downloading * single image for now only * made draw boxes more efficient * made NMS functions efficient * made compute_transform better * v8 working now, inference is done * prints objects detected in console now * fixed image loading (pre processing) * batch post processing * created initial tests * fixes bounding box thickness AND added get_detected_classes_with_frequency function * cleaning for testing * two tests * added url option for image, removed need for specifiying arguments * tests complete, but lots on things are printed on screen by ultralytics * remove parse arguments * fixed weight location * fixed colours of classes, and black font when high brightness * minor changes * TODOs for later * removed use of torch, using .npz weights * fixed tests * one path for fetch * preprocess now in tinygrad, plus test fix for that * updated tests * fix tests * no class labels needed * Add files via upload * Update showcase.md * Update showcase.md * added safe tensors as weights, and tests fix for that * safe tensors test * using safe_load * using tinygrad functions now to load weights * update tests --------- Co-authored-by: r3sist-uniq <amanmatreja@gmail.com> Co-authored-by: r3sist <72573738+r3sist-uniq@users.noreply.github.com>	2023-06-16 18:55:19 -07:00
George Hotz	ba56ee6020	RDNA assembly backend ($1000 bounty) (#787) * Revert "Revert "ops rdna"" This reverts commit `0400315078`. * Revert "Revert "writing 2"" This reverts commit `325a3bf2cf`. * no dump * 2x 2 * simple asm * local size * sub * lil work * support args != 3 * assembler work * generate that * ptx assembler * begin index renderer * max * ptx loops * gemms work * valid works * asm working a bit more * close * passing all ops tests * ptx is a codegen only, not a backend * ptx * float16 support * rdna goes here * install types * make amd disassemble * ansilen for pretty print * fix ptx log2/exp2 * assemblyinstruction * new asm * working gemm * fix cmp * more passing * mod * ptx works again * rdan3 add works * log exp * sin is sin 2pi * fix types * progress * loops work * rdna xyz * better addressing * cleanups * handle exception in early process * div support * rdna float4 * locals work * fix neg index * cast * smaller diff * yaml * import only if selected * fromimport * types * this all needs rewriting * a few more	2023-06-16 09:33:18 -07:00
George Hotz	039f0d372f	delete ltypes (#984) * delete ltypes * only upcast float types * test dtype on mac passes * ugh, these upcasts	2023-06-15 16:24:45 -07:00
Diogo	0629791cbd	F64 support (#976) * initial commit * added osx check for opencl * added llvm f64 conversions * typo in llvmir * more tests and modified unsupported error * fixed linting error * added pragma fp64 * simplified exclusion for OSX * fixed device check and also added it to cast func * added ifdef check for fp16 in ops_gpu * Revert "added ifdef check for fp16 in ops_gpu" This reverts commit `92de754d48`. * f64 prekernel signature match f16 * moved condition to buffer init	2023-06-13 21:31:31 -07:00
George Hotz	80e665bddb	a couple new tests	2023-06-13 12:36:05 -07:00
Diogo	2d4370b487	Adds tril & triu support (#936) * triu & tril support * lint and kernel count error * switched shape indicies * larger shape tests * reverted numpy removal until #942 is resolved	2023-06-09 22:13:20 -07:00
George Hotz	48e9461197	broken tests for #862 and #942	2023-06-09 22:02:59 -07:00
George Hotz	c62c64f0b7	remove GeNode (#965)	2023-06-09 21:48:56 -07:00
George Hotz	2c324d0685	fix metal uaf (#964)	2023-06-09 21:28:06 -07:00
Diogo	666d151f8a	Onnx slice fixups (#952) * resolved some slice test errors and added some more debugging logs * use same device in cumsum * increased float priority * onnx debug ouput match input	2023-06-07 19:44:30 -07:00
cloud11665	43ea1614b0	fix inf/nan codegen (#935) * fix inf/nan codegen * remove nasty oneliner, fix -inf * inf/nan const mul/div tests	2023-06-05 11:24:09 -07:00
Filip Dimitrovski	78460034ff	Initial ellipsis support when slicing Tensors (#843) * Initial ellipsis support when slicing Tensors * Better comments in ellipsis slicing * Formatting	2023-06-05 07:52:49 -07:00
Tom Edwards	5bbcbd145c	Add cumsum with n-dim inputs (#922) * add cumsum with n-dim inputs, over arbitrary axis + relevant tests * increased rtol for cumsum test * move test_cumsum into test_ops * skip arange test for images as relies on cumsum * Fix typo * rewrite cumsum to work with images	2023-06-04 16:55:23 -07:00
MohammedAlkhrashi	2b4baa97e9	exclude string type from external_test_onnx_backend.py (#918)	2023-06-03 19:10:52 -07:00
George Hotz	791530045d	Refactor LoadOps (#910) * test * work * upd test * loadops * cleanups * real ones * remove LazyNumpyArray * fix assign test * remove range * np.require * llama uses arange kernels * no caching consts * fix enet * torch load support * tests cleanup * fix shufflenet * fix image * fix torch_load test	2023-06-03 09:40:43 -07:00
George Hotz	d58586bb17	safetensors! (#903) * safetensors test * safe_save * load back with real safetensors * bugfix in device name. add simple torch_load * it works for llama, but it's slower... * mmap * no intermediate * load mmaped * readinto speed * not ready yet * revert that	2023-06-02 13:41:09 -07:00
Alexey Zaytsev	5feee9c94b	Fix .std() tests on torch=1.13 (#904)	2023-06-02 07:33:51 -07:00
George Hotz	4d28d55683	add nn layer tests	2023-06-01 21:34:24 -07:00
George Hotz	8a928ed2f3	nn init matches torch (#901)	2023-06-01 21:24:11 -07:00
wozeparrot	bfea5215e9	Add weight decay to SGD (#883) * feat: add weight decay to sgd * fix: fix tests	2023-06-01 13:13:18 -07:00
SnakeOnex	67a7674787	added conv1d tests -> simple, padding, stride, asymmetric padding (#896) * added conv1d tests -> simple, padding, stride, asymmetric padding * fixed linting * skip conv1d tests for images	2023-06-01 13:10:37 -07:00
Joqsan	ef129bcb85	Zero dim Tensor support (#777) * add and reorganize test_slice_* tests * refactor Tensor.__getitem__() * preliminary tests for 1) 0D tensors and 2) varargs for Tensor.zeros and Tensor.ones * always compare shapes of the numpy arrays obtained from tinygrad and torch tensors * add more tests for 0D support * remove test_tensor.test_slicing(). All slicing tests at test/test_ops.py * add zero-dim support * make test_end2end.py consistent with 0dim support * add test for tensor with zero in shape * don't simplify ones if shape is () * skip tests that need zero-size tensor support. - zero-size tensor support not related to 0dim tensors. * add tests for __getitem__() supporting strides >= 1 * refactor __getitem__: support for strides >= 1 * minor refactors and add comments to __getitem__ * add tests for slices with negative steps * add support for slices with negative strides	2023-06-01 11:32:02 -07:00
kposborne2	ae83e9844c	add output_padding to transposed conv (#875)	2023-06-01 00:03:22 -07:00
Tom Edwards	115903a15c	Add unbiased std and corresponding tests (#881) * add unbiased std and corresponding tests * replaced unbiased with correction + tests	2023-05-31 16:32:36 -07:00
Bartłomiej Jargut	447b5847e2	Added test for empty tensor for Tensor.numel(), added missing numel call (#880) * Added few missing return typehints for tensor.py * added test for empty tensor for Tensor.numel() * fixed missing numel call in test_numel --------- Co-authored-by: deefi <dee7ine@gmail.com>	2023-05-31 12:28:47 -07:00
Alexey Zaytsev	b58d875937	Add Tensor.ndim .element_size .is_floating_point (#876)	2023-05-31 09:00:35 -07:00
Diogo	1272d8526a	Llvm int support (#866) * added int val support to llvm * lint fix * added types * fix merge issues	2023-05-30 17:49:26 -07:00
Nima Khodaveisi	5670123d88	Add tensor.numel (#869) * add tensor.numel * add tensor.numel	2023-05-30 16:08:09 -07:00
Diogo	0dab8edc97	support Int64 type in cstyle gen (#860) * added metal int64 and some simple tests * removed bool return type def * typo in test * also missing in clang and gpu runtimes * switched order for opencl * increased atol and removed new line in kernel prefix	2023-05-30 16:04:46 -07:00
Ubaidullah Khan	502e33652f	add Tensor.full and Tensor.full_like and reuse them (#852) * add Tensor.ones_like() * add full_like and full and reuse in zeros,ones * add tests for full and full_like	2023-05-29 17:48:09 -07:00
Rabia Eda Yılmaz	3075988468	Added kaiming_uniform initialization for Conv2d and Linear layers (#756) * added kaiming_uniform init for conv2d and linear layers * fix: set getattr * up * fix: set getattr * fix comments * better does not mean it is good * more nonlinearities * added test checks the distribution of default relu option * prettier * fix kernel size * edit distribution of returned tensor * complete tests and fix fan_mode * added higher dim test * prettier test * fix silly blank * just leaky_relu mode * default fan in and leaky relu * update params * fix test * shorter * generalize Tensor.uniform and adjust kaiming init - added low and high parameters to Tensor.uniform function, so it can have a specific range (default is 0 to 1) - adjusted return line of kaiming_uniform * range from -1 to 1 * delete comment * adjusted test_uniform * fixed * delete comment	2023-05-29 15:09:55 -07:00
Ubaidullah Khan	0e89c3f456	zeros_like use dtype if specified else default to tensor’s dtype (#848)	2023-05-29 11:38:34 -07:00
Diogo	1a5d72f812	Onnx ops And, Or, Xor, Not (#847) * onnx and, or, xor, not * added bool type to llvm and clang * removed float conversion * switched where op to use tensor func	2023-05-29 11:09:20 -07:00
George Hotz	ddc9dafe62	tighten up the kernel count tests	2023-05-29 08:48:54 -07:00
Ubaidullah Khan	c825cc4774	use tensor dtype for zeros_like() (#842) * use tensor dtype for zeros_like() * add tests for zeros_like dtype * iterate over dtypes * remove space * remove print * fix test, iterate over a list	2023-05-29 08:05:50 -07:00
Marcello Fuschi	6ea5df19b2	Fix conv_transpose2d asymmetric padding (#840)	2023-05-29 07:57:06 -07:00
wozeparrot	2fd2fb6380	int8/uint8 support (#837) * feat: int8 support * feat: uint8 support * feat: int8 tests * fix: fix uint8 on clang * feat: test casting between int8/uint8/float16/float32 * clean: way cleaner dtype tests * feat: preprocess_imagenet using the correct dtype * feat: add test for overflow between uint8 and int8	2023-05-28 23:15:06 -07:00
Jacky Lee	5d212864b5	Add MLPerf UNet3D model (#775) * Add ResNet inference test and cannon * Test with ResNet50 * test_car works with resnet fix * Add KiTS19 dataset * KiTS19: Implement iterate * No batch load for this dataset * Save results on iterate * Implement dice score * Add data prep and eval functions * Resolve shape issue * Conversion works but wrong values * Segfaults when load_from_pretrained is called * Fix segfault and assign properly * Final result generated, though very slow * Store and load final result to save time * Fix typo in finalize * Score computes * More bug fixes, dice score is very low * Working broken code * Assign output values to result * Getting a much higher score now * Fix dataset preprocessing * Mean DICE score of 88.5 * Ugh, typo * Attempt to reimplement model * Rename layers * Tiny model works, kinda * Accuracy? gone * Implement InstanceNorm and match torch * Test instance norm 2d and 3d * Combined input block with downsample block * Tiny model works, support strided convtranspose * Commands to download dataset * Clean up a bit * unet3d_v2 -> unet3d * Remove duplicated code * Oops, put tests back	2023-05-28 20:38:19 -07:00
George Hotz	59f9bcd4a4	Disktensors! (#819) * make empty a real thing * start ops_disk * disk tensor works * interpreted cleanup * slice write to disk * preprocess imagenet * fix custom function	2023-05-28 15:40:37 -07:00
Marcello Fuschi	6d49925a26	Add max_pool2d dilation (#833)	2023-05-28 15:16:48 -07:00
wozeparrot	7460bd9b02	Add LAMB optimizer (#821) * feat: initial lamb optimizer * feat: corrently match tf impl and add test	2023-05-28 15:09:05 -07:00

4667 Commits