Commit Graph

4667 Commits

Author SHA1 Message Date
cloud11665
2407690d82 add cuda on cpu tests (#1020) 2023-06-22 14:15:50 -07:00
George Hotz
18892242b0 global -> group (#1007)
* global -> group

* allow None for local_size in custom function

* lil local

* comment on shape

* fix cuda

* smart local cast

* better local heuristic

* fix ptx, and work_dim cleanup

* fix metal

* fix ops test

* fix openpilot jit

* no more optlocal

* might fix metal tests

* try metal now

* see generated metal code

* test free removal. REVERT THIS

* mergable
2023-06-21 11:50:43 -07:00
Casey Primozic
aab9ee0fca Add RDNA3 assembler UOps.CAST partial support + other fixes/improvements (#1012)
* Add support for one case of `UOps.CAST` for RDNA3 assembler

 * Adds support for casting from `bool` -> `float32`.  Seems like a very common operation that is required in many places.
 * Fix bool register definition for vector operations
   * Use `vcc_lo` instead of `vcc` which seems to be required since it's configured to use wavefront_size=32
 * Add vector support for some places that were scalar only in register definition and comparison ops
 * Fix some issues in what seems to be defunct `external_test_image.py`
   * Some tests still don't pass for other reasons, but it at least runs now and one broken test is now fixed

* Refactor RDNA3 assembler register definition

 * Unify multi-register code between dtypes and combine with single-register allocation since they're all untyped registers at the end of the day
2023-06-20 11:34:10 -07:00
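The bool -> float32 cast this commit lowers in the RDNA3 assembler corresponds to an ordinary tensor-level cast. A minimal, purely illustrative sketch (the dtypes import path has moved between tinygrad versions, and on most backends of this era comparisons already yield 0/1 floats):

```python
from tinygrad.tensor import Tensor
from tinygrad.helpers import dtypes  # newer versions: from tinygrad import dtypes

x = Tensor([0.0, 1.0, 2.0])
mask = x > 0.5                            # boolean-valued comparison
print(mask.cast(dtypes.float32).numpy())  # [0. 1. 1.]
```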
Diogo
57d3aa76a5 Windows & Ubuntu CLANG CI support (#1011)
* matrix strategy

* push env to GITHUB_ENV

* use printf instead of echo

* use temp helper function for cross os paths

* use path join

* switched to using temp helper function

* skip test on windows due to memory limit

* small fix

* removed semi

* touchups

* clean up

* separate tests

* test changes to test_utils on windows

* small refactor

* more cleanups

* undo helpers change

* only skip if in CI and WINDOWS
2023-06-19 09:33:24 -07:00
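The "only skip if in CI and WINDOWS" item above boils down to a conditional test skip. A hedged sketch, assuming a `CI` helper flag and using hypothetical test names rather than the PR's exact code:

```python
import platform, unittest
from tinygrad.helpers import CI  # assumption: a flag that is truthy when running under CI

class TestBigAlloc(unittest.TestCase):
  @unittest.skipIf(CI and platform.system() == "Windows", "Windows CI runners hit a memory limit")
  def test_large_buffer(self):
    ...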
George Hotz
0d4c4f4e9e metal ci attempt (#1010)
* metal ci attempt

* skip failing ops tests

* skip in the ops test

* no dtype test
2023-06-19 09:23:55 -07:00
George Hotz
0ac84d5e94 exclude a few more onnx tests 2023-06-19 08:51:29 -07:00
George Hotz
0fd648dff4 exclude more dumb onnx tests 2023-06-19 08:51:29 -07:00
George Hotz
5428b5d774 good changes from tensor_cores branch (#1005)
* good changes from tensor_cores branch

* touchups

* real_strides fixup

* refactor merge_views
2023-06-18 20:28:06 -07:00
Diogo
d2b837c1d9 Adds floor/ceil (#989)
* floor ceil impl

* control casting in numpy
2023-06-17 10:56:21 -07:00
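A small usage sketch of the new ops:

```python
from tinygrad.tensor import Tensor

t = Tensor([-2.5, -1.5, 0.5, 1.5])
print(t.floor().numpy())  # [-3. -2.  0.  1.]
print(t.ceil().numpy())   # [-2. -1.  1.  2.]
```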
sehaj
775287ed91 Add yolov8 implementation (#806)
* added SPPF module from yolov8

* added conv_block, bottleneck modules

* cleaned modules

* c2f example

* spf changes

* C2f

* fixed and tested bottleneck

* improved detect class

* tested spf and conv

* checked c2f

* DFL structure

* fixed dfl

* added dist2bbox function

* added dist2bbox function

* added and tested make_anchors function for the head

* keeping functions above

* creating the detection head

* fixing head

* untested blocks a. scale_boxes b. clip_boxes c. xywh2xyxy d. box_iou

* head works

* structure fix

* added darknet (backbone)

* yolov8 neck, and initialize bias function for detection

* fixed spacing

* yolov8 class, init bias, and fixed c2f

* forward pass almost working

* fixed net structure

* init bias not needed, forward pass working

* load weights boilerplate

* load weights done?

* all variants loading!

* post process: clip_boxes, scale_boxes, xywh2xyxy, and box_iou(untested)

* fix scale_boxes

* box_iou fixed and tested

* created the pre nms function

* fix nms

* fixed load weights, apparently the latest commit broke something, excluding num_batches_tracked

* added letterbox and pre_transform for pre_process function

* fixed letterbox, pre_transform and added preprocess function

* custom NMS done, integrated prepare_boxes and nms, improved box_iou

* added postprocess function till parsing

* added draw_bounding_boxes_and_save function

* testing full flow

* using fetch for class names

* fixed make_anchors + all tinygrad now

* added command line arguments, weight downloading

* single image for now only

* made draw boxes more efficient

* made NMS functions efficient

* made compute_transform better

* v8 working now, inference is done

* prints objects detected in console now

* fixed image loading (pre processing)

* batch post processing

* created initial tests

* fixed bounding box thickness and added get_detected_classes_with_frequency function

* cleaning for testing

* two tests

* added url option for image, removed need for specifying arguments

* tests complete, but lots of things are printed to the screen by ultralytics

* remove parse arguments

* fixed weight location

* fixed colours of classes, and black font when high brightness

* minor changes

* TODOs for later

* removed use of torch, using .npz weights

* fixed tests

* one path for fetch

* preprocess now in tinygrad, plus test fix for that

* updated tests

* fix tests

* no class labels needed

* Add files via upload

* Update showcase.md

* Update showcase.md

* added safe tensors as weights, and tests fix for that

* safe tensors test

* using safe_load

* using tinygrad functions now to load weights

* update tests

---------

Co-authored-by: r3sist-uniq <amanmatreja@gmail.com>
Co-authored-by: r3sist <72573738+r3sist-uniq@users.noreply.github.com>
2023-06-16 18:55:19 -07:00
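The YOLOv8 port above leans on a handful of box utilities (xywh2xyxy, box_iou, NMS). A minimal NumPy sketch of the first two, for orientation only and not the PR's exact code:

```python
import numpy as np

def xywh2xyxy(b):
  # (cx, cy, w, h) -> (x1, y1, x2, y2)
  out = b.astype(float)
  out[..., 0] = b[..., 0] - b[..., 2] / 2
  out[..., 1] = b[..., 1] - b[..., 3] / 2
  out[..., 2] = b[..., 0] + b[..., 2] / 2
  out[..., 3] = b[..., 1] + b[..., 3] / 2
  return out

def box_iou(a, b):
  # pairwise IoU between (N, 4) and (M, 4) boxes in xyxy format
  tl = np.maximum(a[:, None, :2], b[None, :, :2])
  br = np.minimum(a[:, None, 2:], b[None, :, 2:])
  inter = np.clip(br - tl, 0, None).prod(-1)
  area_a = (a[:, 2:] - a[:, :2]).prod(-1)
  area_b = (b[:, 2:] - b[:, :2]).prod(-1)
  return inter / (area_a[:, None] + area_b[None, :] - inter)
```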
George Hotz
ba56ee6020 RDNA assembly backend ($1000 bounty) (#787)
* Revert "Revert "ops rdna""

This reverts commit 0400315078.

* Revert "Revert "writing 2""

This reverts commit 325a3bf2cf.

* no dump

* 2x 2

* simple asm

* local size

* sub

* lil work

* support args != 3

* assembler work

* generate that

* ptx assembler

* begin index renderer

* max

* ptx loops

* gemms work

* valid works

* asm working a bit more

* close

* passing all ops tests

* ptx is a codegen only, not a backend

* ptx

* float16 support

* rdna goes here

* install types

* make amd disassemble

* ansilen for pretty print

* fix ptx log2/exp2

* assemblyinstruction

* new asm

* working gemm

* fix cmp

* more passing

* mod

* ptx works again

* rdna3 add works

* log exp

* sin is sin 2pi

* fix types

* progress

* loops work

* rdna xyz

* better addressing

* cleanups

* handle exception in early process

* div support

* rdna float4

* locals work

* fix neg index

* cast

* smaller diff

* yaml

* import only if selected

* fromimport

* types

* this all needs rewriting

* a few more
2023-06-16 09:33:18 -07:00
George Hotz
039f0d372f delete ltypes (#984)
* delete ltypes

* only upcast float types

* test dtype on mac passes

* ugh, these upcasts
2023-06-15 16:24:45 -07:00
Diogo
0629791cbd F64 support (#976)
* initial commit

* added osx check for opencl

* added llvm f64 conversions

* typo in llvmir

* more tests and modified unsupported error

* fixed linting error

* added pragma fp64

* simplified exclusion for OSX

* fixed device check and also added it to cast func

* added ifdef check for fp16 in ops_gpu

* Revert "added ifdef check for fp16 in ops_gpu"

This reverts commit 92de754d48.

* f64 prekernel signature match f16

* moved condition to buffer init
2023-06-13 21:31:31 -07:00
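A short sketch of what the new dtype enables (the dtypes import path differs in newer tinygrad versions, and float64 is excluded on OSX OpenCL per the commits above):

```python
from tinygrad.tensor import Tensor
from tinygrad.helpers import dtypes  # newer versions: from tinygrad import dtypes

t = Tensor([1, 2, 3], dtype=dtypes.float64)
print(t.dtype, (t / 3).numpy())  # double-precision arithmetic on supported backends
```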
George Hotz
80e665bddb a couple new tests 2023-06-13 12:36:05 -07:00
Diogo
2d4370b487 Adds tril & triu support (#936)
* triu & tril support

* lint and kernel count error

* switched shape indices

* larger shape tests

* reverted numpy removal until #942 is resolved
2023-06-09 22:13:20 -07:00
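A quick usage sketch (the positional diagonal-offset argument is assumed to follow the usual torch/numpy convention):

```python
from tinygrad.tensor import Tensor

t = Tensor.ones(3, 3)
print(t.tril().numpy())   # lower triangle kept, upper zeroed
print(t.triu(1).numpy())  # strictly-upper triangle (diagonal offset 1)
```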
George Hotz
48e9461197 broken tests for #862 and #942 2023-06-09 22:02:59 -07:00
George Hotz
c62c64f0b7 remove GeNode (#965) 2023-06-09 21:48:56 -07:00
George Hotz
2c324d0685 fix metal uaf (#964) 2023-06-09 21:28:06 -07:00
Diogo
666d151f8a Onnx slice fixups (#952)
* resolved some slice test errors and added some more debugging logs

* use same device in cumsum

* increased float priority

* onnx debug output matches input
2023-06-07 19:44:30 -07:00
cloud11665
43ea1614b0 fix inf/nan codegen (#935)
* fix inf/nan codegen

* remove nasty oneliner, fix -inf

* inf/nan const mul/div tests
2023-06-05 11:24:09 -07:00
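The "inf/nan const mul/div tests" bullet exercises inf/-inf/nan constants in the generated kernels; a tiny example of the kind of expression involved:

```python
import math
from tinygrad.tensor import Tensor

print((Tensor([1.0, -1.0]) * math.inf).numpy())  # [ inf -inf]
print((Tensor([0.0]) * math.inf).numpy())        # [nan]
```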
Filip Dimitrovski
78460034ff Initial ellipsis support when slicing Tensors (#843)
* Initial ellipsis support when slicing Tensors

* Better comments in ellipsis slicing

* Formatting
2023-06-05 07:52:49 -07:00
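A small example of the new indexing form:

```python
from tinygrad.tensor import Tensor

t = Tensor.ones(2, 3, 4)
print(t[..., 0].shape)  # (2, 3) -- ellipsis stands in for the leading dims
print(t[0, ...].shape)  # (3, 4)
```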
Tom Edwards
5bbcbd145c Add cumsum with n-dim inputs (#922)
* add cumsum with n-dim inputs, over arbitrary axis + relevant tests

* increased rtol for cumsum test

* move test_cumsum into test_ops

* skip arange test for images as it relies on cumsum

* Fix typo

* rewrite cumsum to work with images
2023-06-04 16:55:23 -07:00
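A usage sketch of the generalized cumsum (the `axis` keyword name is assumed from the commit message):

```python
from tinygrad.tensor import Tensor

t = Tensor([[1, 2, 3], [4, 5, 6]])
print(t.cumsum(axis=1).numpy())  # rows become 1 3 6 and 4 9 15
print(t.cumsum(axis=0).numpy())  # columns become 1 2 3 and 5 7 9
```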
MohammedAlkhrashi
2b4baa97e9 exclude string type from external_test_onnx_backend.py (#918) 2023-06-03 19:10:52 -07:00
George Hotz
791530045d Refactor LoadOps (#910)
* test

* work

* upd test

* loadops

* cleanups

* real ones

* remove LazyNumpyArray

* fix assign test

* remove range

* np.require

* llama uses arange kernels

* no caching consts

* fix enet

* torch load support

* tests cleanup

* fix shufflenet

* fix image

* fix torch_load test
2023-06-03 09:40:43 -07:00
George Hotz
d58586bb17 safetensors! (#903)
* safetensors test

* safe_save

* load back with real safetensors

* bugfix in device name. add simple torch_load

* it works for llama, but it's slower...

* mmap

* no intermediate

* load mmaped

* readinto speed

* not ready yet

* revert that
2023-06-02 13:41:09 -07:00
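A sketch of the round trip this adds; the import path is an assumption (these helpers have moved between tinygrad versions):

```python
from tinygrad.tensor import Tensor
from tinygrad.state import safe_save, safe_load  # location varies by version

state = {"weight": Tensor.ones(3, 3), "bias": Tensor.zeros(3)}
safe_save(state, "/tmp/model.safetensors")
loaded = safe_load("/tmp/model.safetensors")  # mmap-backed load, per the commit messages
print(loaded["weight"].shape)
```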
Alexey Zaytsev
5feee9c94b Fix .std() tests on torch=1.13 (#904) 2023-06-02 07:33:51 -07:00
George Hotz
4d28d55683 add nn layer tests 2023-06-01 21:34:24 -07:00
George Hotz
8a928ed2f3 nn init matches torch (#901) 2023-06-01 21:24:11 -07:00
wozeparrot
bfea5215e9 Add weight decay to SGD (#883)
* feat: add weight decay to sgd

* fix: fix tests
2023-06-01 13:13:18 -07:00
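A hedged usage sketch; the keyword name is inferred from the PR title, not confirmed against the signature:

```python
from tinygrad.tensor import Tensor
from tinygrad.nn.optim import SGD

w = Tensor.ones(4, 4, requires_grad=True)
opt = SGD([w], lr=0.01, weight_decay=1e-4)  # assumed keyword name
opt.zero_grad()
(w * w).sum().backward()
opt.step()
```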
SnakeOnex
67a7674787 added conv1d tests -> simple, padding, stride, asymmetric padding (#896)
* added conv1d tests -> simple, padding, stride, asymmetric padding

* fixed linting

* skip conv1d tests for images
2023-06-01 13:10:37 -07:00
Joqsan
ef129bcb85 Zero dim Tensor support (#777)
* add and reorganize test_slice_* tests

* refactor Tensor.__getitem__()

* preliminary tests for 1) 0D tensors and 2) varargs for Tensor.zeros and Tensor.ones

* always compare shapes of the numpy arrays obtained from tinygrad and torch tensors

* add more tests for 0D support

* remove test_tensor.test_slicing(). All slicing tests at test/test_ops.py

* add zero-dim support

* make test_end2end.py consistent with 0dim support

* add test for tensor with zero in shape

* don't simplify ones if shape is ()

* skip tests that need zero-size tensor support.

- zero-size tensor support not related to 0dim tensors.

* add tests for __getitem__() supporting strides >= 1

* refactor __getitem__: support for strides >= 1

* minor refactors and add comments to __getitem__

* add tests for slices with negative steps

* add support for slices with negative strides
2023-06-01 11:32:02 -07:00
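A few of the new behaviours in one sketch:

```python
from tinygrad.tensor import Tensor

t = Tensor(3.0)
print(t.shape)                   # () -- zero-dimensional tensor
x = Tensor([[1, 2, 3], [4, 5, 6]])
print(x[:, ::-1].numpy())        # columns reversed (negative-step slicing)
print(Tensor.zeros(2, 3).shape)  # varargs shape for zeros/ones
```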
kposborne2
ae83e9844c add output_padding to transposed conv (#875) 2023-06-01 00:03:22 -07:00
Tom Edwards
115903a15c Add unbiased std and corresponding tests (#881)
* add unbiased std and corresponding tests

* replaced unbiased with correction + tests
2023-05-31 16:32:36 -07:00
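A short sketch; the `correction` keyword follows the PR wording ("replaced unbiased with correction") and is otherwise an assumption:

```python
from tinygrad.tensor import Tensor

t = Tensor([1.0, 2.0, 3.0, 4.0])
print(t.std().numpy())              # unbiased (Bessel-corrected) by default
print(t.std(correction=0).numpy())  # population standard deviation
```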
Bartłomiej Jargut
447b5847e2 Added test for empty tensor for Tensor.numel(), added missing numel call (#880)
* Added a few missing return typehints for tensor.py

* added test for empty tensor for Tensor.numel()

* fixed missing numel call in test_numel

---------

Co-authored-by: deefi <dee7ine@gmail.com>
2023-05-31 12:28:47 -07:00
Alexey Zaytsev
b58d875937 Add Tensor.ndim .element_size .is_floating_point (#876) 2023-05-31 09:00:35 -07:00
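A quick sketch of the new introspection helpers (method-vs-property spelling assumed to follow the torch convention):

```python
from tinygrad.tensor import Tensor

t = Tensor.ones(2, 3)
print(t.ndim)                 # 2
print(t.element_size())       # 4 -- bytes per float32 element
print(t.is_floating_point())  # True
```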
Diogo
1272d8526a Llvm int support (#866)
* added int val support to llvm

* lint fix

* added types

* fix merge issues
2023-05-30 17:49:26 -07:00
Nima Khodaveisi
5670123d88 Add tensor.numel (#869)
* add tensor.numel

* add tensor.numel
2023-05-30 16:08:09 -07:00
Diogo
0dab8edc97 support Int64 type in cstyle gen (#860)
* added metal int64 and some simple tests

* removed bool return type def

* typo in test

* also missing in clang and gpu runtimes

* switched order for opencl

* increased atol and removed new line in kernel prefix
2023-05-30 16:04:46 -07:00
Ubaidullah Khan
502e33652f add Tensor.full and Tensor.full_like and reuse them (#852)
* add Tensor.ones_like()

* add full_like and full and reuse in zeros,ones

* add tests for full and full_like
2023-05-29 17:48:09 -07:00
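A usage sketch; the argument order (shape, then fill value) is assumed to follow torch:

```python
from tinygrad.tensor import Tensor

a = Tensor.full((2, 3), 7.0)  # constant-filled tensor of a given shape
b = Tensor.full_like(a, 0.0)  # matches a's shape (and dtype)
print(a.numpy(), b.shape)
```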
Rabia Eda Yılmaz
3075988468 Added kaiming_uniform initialization for Conv2d and Linear layers (#756)
* added kaiming_uniform init for conv2d and linear layers

* fix: set getattr

* up

* fix: set getattr

* fix comments

* better does not mean it is good

* more nonlinearities

* added test

checks the distribution of default relu option

* prettier

* fix kernel size

* edit distribution of returned tensor

* complete tests and fix fan_mode

* added higher dim test

* prettier test

* fix silly blank

* just leaky_relu mode

* default fan in and leaky relu

* update params

* fix test

* shorter

* generalize Tensor.uniform and adjust kaiming init

- added low and high parameters to Tensor.uniform function, so it can have a specific range (default is 0 to 1)
- adjusted return line of kaiming_uniform

* range from -1 to 1

* delete comment

* adjusted test_uniform

* fixed

* delete comment
2023-05-29 15:09:55 -07:00
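A hedged sketch of the initializers this PR touches; the exact keyword names are assumptions based on the commit messages:

```python
from tinygrad.tensor import Tensor

w = Tensor.kaiming_uniform(64, 32, 3, 3)        # fan-in bound with the default leaky_relu gain
u = Tensor.uniform(16, 16, low=-1.0, high=1.0)  # uniform now takes an explicit range
print(w.shape, u.shape)
```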
Ubaidullah Khan
0e89c3f456 zeros_like use dtype if specified else default to tensor’s dtype (#848) 2023-05-29 11:38:34 -07:00
Diogo
1a5d72f812 Onnx ops And, Or, Xor, Not (#847)
* onnx and, or, xor, not

* added bool type to llvm and clang

* removed float conversion

* switched where op to use tensor func
2023-05-29 11:09:20 -07:00
George Hotz
ddc9dafe62 tighten up the kernel count tests 2023-05-29 08:48:54 -07:00
Ubaidullah Khan
c825cc4774 use tensor dtype for zeros_like() (#842)
* use tensor dtype for zeros_like()

* add tests for zeros_like dtype

* iterate over dtypes

* remove space

* remove print

* fix test, iterate over a list
2023-05-29 08:05:50 -07:00
Marcello Fuschi
6ea5df19b2 Fix conv_transpose2d asymmetric padding (#840) 2023-05-29 07:57:06 -07:00
wozeparrot
2fd2fb6380 int8/uint8 support (#837)
* feat: int8 support

* feat: uint8 support

* feat: int8 tests

* fix: fix uint8 on clang

* feat: test casting between int8/uint8/float16/float32

* clean: way cleaner dtype tests

* feat: preprocess_imagenet using the correct dtype

* feat: add test for overflow between uint8 and int8
2023-05-28 23:15:06 -07:00
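A small sketch of the new integer dtypes, including the uint8/int8 overflow case the tests cover (the dtypes import path differs in newer versions):

```python
from tinygrad.tensor import Tensor
from tinygrad.helpers import dtypes  # newer versions: from tinygrad import dtypes

t = Tensor([1, 2, 255], dtype=dtypes.uint8)
print(t.cast(dtypes.int8).numpy())     # 255 wraps to -1
print(t.cast(dtypes.float32).numpy())  # [  1.   2. 255.]
```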
Jacky Lee
5d212864b5 Add MLPerf UNet3D model (#775)
* Add ResNet inference test and cannon

* Test with ResNet50

* test_car works with resnet fix

* Add KiTS19 dataset

* KiTS19: Implement iterate

* No batch load for this dataset

* Save results on iterate

* Implement dice score

* Add data prep and eval functions

* Resolve shape issue

* Conversion works but wrong values

* Segfaults when load_from_pretrained is called

* Fix segfault and assign properly

* Final result generated, though very slow

* Store and load final result to save time

* Fix typo in finalize

* Score computes

* More bug fixes, dice score is very low

* Working broken code

* Assign output values to result

* Getting a much higher score now

* Fix dataset preprocessing

* Mean DICE score of 88.5

* Ugh, typo

* Attempt to reimplement model

* Rename layers

* Tiny model works, kinda

* Accuracy? gone

* Implement InstanceNorm and match torch

* Test instance norm 2d and 3d

* Combined input block with downsample block

* Tiny model works, support strided convtranspose

* Commands to download dataset

* Clean up a bit

* unet3d_v2 -> unet3d

* Remove duplicated code

* Oops, put tests back
2023-05-28 20:38:19 -07:00
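The evaluation above is driven by the Sørensen–Dice score; a minimal NumPy sketch, not the PR's exact implementation:

```python
import numpy as np

def dice_score(pred, target, eps=1e-6):
  # overlap-based similarity between two binary masks
  pred, target = pred.astype(bool), target.astype(bool)
  inter = np.logical_and(pred, target).sum()
  return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 1]])
print(dice_score(a, b))  # 2*2 / (3+3) ≈ 0.667
```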
George Hotz
59f9bcd4a4 Disktensors! (#819)
* make empty a real thing

* start ops_disk

* disk tensor works

* interpreted cleanup

* slice write to disk

* preprocess imagenet

* fix custom function
2023-05-28 15:40:37 -07:00
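A hedged sketch of a disk-backed tensor; the `disk:<path>` device string follows later tinygrad usage and is an assumption here, as is the requirement that the backing file already exists with enough bytes:

```python
from tinygrad.tensor import Tensor

# assumes /tmp/weights.bin exists and holds at least 1024 * 4 bytes
t = Tensor.empty(1024, device="disk:/tmp/weights.bin")
print(t.shape, t.device)
```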
Marcello Fuschi
6d49925a26 Add max_pool2d dilation (#833) 2023-05-28 15:16:48 -07:00
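A usage sketch for the new argument (keyword names assumed to mirror torch's max_pool2d):

```python
from tinygrad.tensor import Tensor

t = Tensor.ones(1, 1, 8, 8)
print(t.max_pool2d(kernel_size=(2, 2), dilation=2).shape)  # pooling window sampled with gaps of 2
```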
wozeparrot
7460bd9b02 Add LAMB optimizer (#821)
* feat: initial lamb optimizer

* feat: correctly match tf impl and add test
2023-05-28 15:09:05 -07:00
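A hedged sketch of driving the new optimizer; anything beyond the class name and `lr` is an assumption:

```python
from tinygrad.tensor import Tensor
from tinygrad.nn.optim import LAMB

w = Tensor.ones(8, 8, requires_grad=True)
opt = LAMB([w], lr=1e-3)
opt.zero_grad()
(w * w).sum().backward()
opt.step()
```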