44 Commits

Author SHA1 Message Date
sehaj
775287ed91 Add yolov8 implementation (#806)
* added SPPF module from yolov8

* added conv_block, bottleneck modules

* cleaned modules

* c2f example

* spf changes

* C2f

* fixed and tested bottleneck

* improved detect class

* tested spf and conv

* checked c2f

* DFL structure

* fixed dfl

* added dist2bbox function

* added dist2bbox function

* added and tested make_anchors function for the head

* keeping functions above

* creating the detection head

* fixing head

* untested blocks a. scale_boxes b. clip_boxes c. xywh2xyxy d. box_iou

* head works

* structure fix

* added darknet (backbone)

* yolov8 neck, and initialize bias function while detection

* fixed spacing

* yolov8 class, init bias, and fixed c2f

* forward pass almost working

* fixed net structure

* init bias not needed, forward pass working

* load weights boilerplate

* load weights done?

* all variants loading!

* post process: clip_boxes, scale_boxes, xywh2xyxy, and box_iou(untested)

* fix scale_boxes

* box_iou fixed and tested

* created the pre nms function

* fix nms

* fixed load weights, apparently the latest commit broke something, excluding num_batches_tracked

* added letterbox and pre_transform for pre_process function

* fixed letterbox, pre_transform and added preprocess function

* custom NMS done, integrated prepare_boxes and nms, improved box_iou

* added postprocess function till parsing

* added draw_bounding_boxes_and_save function

* testing full flow

* using fetch for class names

* fixed make_anchors + all tinygrad now

* added command line arguments, weight downloading

* single image for now only

* made draw boxes more efficient

* made NMS functions efficient

* made compute_transform better

* v8 working now, inference is done

* prints objects detected in console now

* fixed image loading (pre processing)

* batch post processing

* created initial tests

* fixes bounding box thickness AND added get_detected_classes_with_frequency function

* cleaning for testing

* two tests

* added url option for image, removed need for specifying arguments

* tests complete, but lots of things are printed on screen by ultralytics

* remove parse arguments

* fixed weight location

* fixed colours of classes, and black font when high brightness

* minor changes

* TODOs for later

* removed use of torch, using .npz weights

* fixed tests

* one path for fetch

* preprocess now in tinygrad, plus test fix for that

* updated tests

* fix tests

* no class labels needed

* Add files via upload

* Update showcase.md

* Update showcase.md

* added safe tensors as weights, and tests fix for that

* safe tensors test

* using safe_load

* using tinygrad functions now to load weights

* update tests

---------

Co-authored-by: r3sist-uniq <amanmatreja@gmail.com>
Co-authored-by: r3sist <72573738+r3sist-uniq@users.noreply.github.com>
2023-06-16 18:55:19 -07:00
Diogo
2d4370b487 Adds tril & triu support (#936)
* triu & tril support

* lint and kernel count error

* switched shape indices

* larger shape tests

* reverted numpy removal until #942 is resolved
2023-06-09 22:13:20 -07:00
George Hotz
2c324d0685 fix metal uaf (#964) 2023-06-09 21:28:06 -07:00
Diogo
666d151f8a Onnx slice fixups (#952)
* resolved some slice test errors and added some more debugging logs

* use same device in cumsum

* increased float priority

* onnx debug output match input
2023-06-07 19:44:30 -07:00
MohammedAlkhrashi
2b4baa97e9 exclude string type from external_test_onnx_backend.py (#918) 2023-06-03 19:10:52 -07:00
George Hotz
791530045d Refactor LoadOps (#910)
* test

* work

* upd test

* loadops

* cleanups

* real ones

* remove LazyNumpyArray

* fix assign test

* remove range

* np.require

* llama uses arange kernels

* no caching consts

* fix enet

* torch load support

* tests cleanup

* fix shufflenet

* fix image

* fix torch_load test
2023-06-03 09:40:43 -07:00
Diogo
1a5d72f812 Onnx ops And, Or, Xor, Not (#847)
* onnx and, or, xor, not

* added bool type to llvm and clang

* removed float conversion

* switched where op to use tensor func
2023-05-29 11:09:20 -07:00
George Hotz
ddc9dafe62 tighten up the kernel count tests 2023-05-29 08:48:54 -07:00
wozeparrot
7460bd9b02 Add LAMB optimizer (#821)
* feat: initial lamb optimizer

* feat: correctly match tf impl and add test
2023-05-28 15:09:05 -07:00
SnakeOnex
1b337b5533 ONNX tests exclude all unsupported filetype tests (#832) 2023-05-28 13:31:20 -07:00
George Hotz
26014a0fa1 add convtranspose (#809)
* add convtranspose

* onnx convtranspose
2023-05-26 12:35:03 -07:00
wozeparrot
0dc333cfab Promote Embedding to nn (#798)
* feat: promote Embedding to nn

* fix: fix failing test

* feat: add test with jit

* feat: rewrite embedding to no longer need stacked for loops

* clean+fix: don't know how that happened
2023-05-25 18:39:45 -07:00
Diogo
c19ef0fcce Add sin/cos/tan (#794)
* added sin/cos/tan

* fix lint

* added onnx ops support
2023-05-25 09:04:56 -07:00
George Hotz
f2a964f447 nocopy (#764) 2023-05-05 09:32:06 -07:00
George Hotz
f28df9900f multidevice works (#763)
* basic multigpu working

* better multigpu test

* upper

* touchups

* cl sync
2023-05-04 01:04:58 -07:00
George Hotz
7ecf4dff68 multi cl_queue (#762)
* multi cl_queue

* only platforms 1

* gpus first, then cpus

* put device on underlying buffer

* cl_queue array
2023-05-03 12:15:28 -07:00
George Hotz
03b38864db fix batchnorm at training (#753)
* e2e testing

* min failure

* no affine on bn, still fails

* why did i think i could detach that?

* allow more kernels for bn

* some test issue i don't understand
2023-04-19 08:01:04 -07:00
George Hotz
8b7ecd63bb Remove Zeroview (#748)
* no zeroview start

* closer

* stride mask

* st tests pass, delete ZeroView

* byebye zv

* close to working

* not contiguous with mask

* subtract, don't add

* mask on view

* ugh, that shouldn't have been in there

* shape merge

* bugfixes

* fuzzer + 4 fuzzer failures

* fuzzer for symbolic

* more fuzzing and nothing

* that fuzzer doesn't hit either

* fixes padding...ugh

* no more offsets

* working

* rewrite load and store

* all checks

* fix idxs

* progress

* bugfix

* float4_axis

* works

* cleanups

* complex valids_okay
2023-04-17 08:21:46 -07:00
George Hotz
17e37157b6 fix backward convs (#746)
* fix backward convs

* no pushing in reduce

* late cout

* test_fold_4convs_sgd
2023-04-14 10:42:11 -07:00
George Hotz
f7f416d6f4 back to 6 for test_fold_conv_sgd 2023-04-14 07:34:00 -07:00
worldwalker2000
552a048a33 make maximum split the grad like torch when equal (#738)
* make maximum split grad

* added test for maximum split grad when equal

* minor expr simplification

* (2-eq)/2 only once

* update test bc one more sum output child stays
2023-04-14 00:17:46 -07:00
George Hotz
20894991ed good changes from the M1 Tensor Core project (#730)
* good changes

* working except llvm

* llvm types

* nice acc

* archprobe

* lang.float4

* use self.acc for late acc

* fix store bug
2023-03-29 05:11:02 +04:00
George Hotz
1cb5b2d015 test_enet_se 2023-03-24 10:04:30 -07:00
George Hotz
e88b9bfe1e print gflops avg with DEBUG=2 2023-03-23 16:07:08 -07:00
George Hotz
b12b60af20 fix binop, other tests failure (#723)
* fix binop, other tests failure

* that was a bad idea

* better layernorm

* inference kernel count tests

* new style reshape pushing

* fixup replacement

* 199 kernels is okay. fix flops

* push reshape through unaryops only

* GRAPH=2 draws the phantom ops

* found resnet issue

* non working test

* mul is cheaper than div

* OPT inflation

* SHUFFLE_PAD_OPS in OPT=2
2023-03-22 18:15:07 -07:00
George Hotz
902906f909 Fix constant folding (#713)
* fix

* codegen

* contiguous is real

* no bufs_to_delete

* don't assign rawconst

* remove neg and not

* need exec to fix custom function jit
2023-03-18 17:52:46 -07:00
George Hotz
f5467cfedc Devicebufferless (#708)
* runs one metal kernel

* conv2d works

* ops tests are passing

* const folding

* all ops work

* pre commit always passes

* torch works

* working still

* fix graph test

* tests passing

* image almost works

* image conv works

* most images

* fix custom

* fix assignment

* fix compile enet

* clean up comments

* fix realize return value

* include shapetracker in LB repr

* copy should make a copy

* reenable method cache

* fix lna

* dtypes in graph

* forward only for IMAGE=2

* simple realize

* getting close

* fixup new api, it's good except the kernel count

* back to 197 kernels

* tests should pass

* go to a real float

* no type_on_cpu

* fix the docs

* put shapetracker back in its proper place
2023-03-18 14:40:23 -07:00
George Hotz
37cf6fc4c0 err, external_test_opt.py broke...fusing will have to wait. correctness over speed 2023-03-11 17:54:47 -08:00
George Hotz
61071f881a fix bug, and add unit test to catch failure 2023-03-11 16:57:25 -08:00
George Hotz
3ec457248c failing llama test 2023-03-11 16:28:10 -08:00
George Hotz
01f39b19dc move to shapetracker.py 2023-03-11 07:50:07 -08:00
George Hotz
d7cb8e3e56 multithreaded fake_torch_load_zipped 2023-03-10 19:16:27 -08:00
George Hotz
00641aa45d add challenge tests 2023-03-07 19:39:04 -08:00
George Hotz
4eb880550f enable contract test 2023-03-07 17:32:28 -08:00
George Hotz
b561256a0e allow all reduces (#661)
* allow all reduces

* push permute tests

* explicit permute reshape push

* contractw1s
2023-03-07 15:36:01 -08:00
George Hotz
7dbcc26582 fix up external tests 2023-03-06 06:52:28 -08:00
Alex Wang
64ecbd91b5 Refactor contraction and add integration test cases for push permute (#650)
* Refactor contraction and add unit tests

* Fix typo; Fix TestConv.test_elu failure due to some ones in old_shape

* Add push permute test cases

* Fix mypy type annotation check error

* Add contraction unit test; Reshape to higher dimension is not contraction
2023-03-06 06:36:55 -08:00
George Hotz
b1ba78ac38 move applegpu disassembler 2023-03-05 11:21:12 -08:00
George Hotz
16b03f3c3b wow, can't believe that was broken (#642)
* wow, can't believe that was broken

* remove namedtuple comment
2023-03-04 22:28:28 -08:00
George Hotz
4a607f7d65 more ext gpu tests 2023-03-04 21:00:08 -08:00
George Hotz
69198a73d2 test_1x1_24_6 2023-03-04 20:37:46 -08:00
George Hotz
b02a392d69 Improve local (#635)
* local is improving

* local is finding bugs

* new local should work
2023-03-04 09:30:49 -08:00
George Hotz
528cb3b3b9 fix ast test 2023-03-04 07:49:25 -08:00
George Hotz
8919ca8163 test cleanups 2023-03-03 06:36:06 -08:00