Commit Graph

1207 Commits

Author SHA1 Message Date
George Hotz
6ec0a24706 imagenet eval in 1 min 28 sec 2023-06-28 04:23:26 +00:00
Rayan Hatout
65cbaa3429 no need to slice A and B twice in LLaMa complex multiplication (#1054) 2023-06-26 14:42:58 -07:00
Kunwar Raj Singh
5d3310ce56 MaskRCNN Inference (#884)
* MaskRCNN weights loading

* backbone maybe works

* backbone works, but resnet body atol 1e-3

* RPN call, but very wrong output

* fixed topk

* RPN maybe works, not sure about nms

* Fix cursed modules

* add back editorconfig

* Full call, wrong output

* Full call works

* fix mask

* use NMS from retinanet

* Removing extra funcs

* refactor

* readable

* Add example to run model

* remove filter

* Fix split, batched inference is worse

* Fix image sizes

* Matching reference

* merge master

* add filter on top detections

* cuda backend fixed

* add model eval and spec

* convert images to rgb

* fix eval

* simplify examples code

* remove extra code

* meshgrid using tinygrad

* removing numpy

* roi align, floor, ceil

* remove numpy from level_mapper

* remove numpy from pooler

* Revert "Merge branch 'master' of github.com:kunwar31/tinygrad into mrcnn-inference"

This reverts commit 4b95a3cb49, reversing
changes made to 98f2b1fa2e.

* roi align gather

* fix master merge

* revert to old floor/ceil since ints are present in the domain

* use log2 op

* fix indexes

* weird bug with ints and gpu

* weird bug with ints and gpu

* refactors, add env var for gather

* floor with contiguous, where

* refactor topk, sort

* remove staticmethod

* refactor stride

* remove log2 mlop

* realize -> contiguous

* refactor forward

* remove num_classes, stride_in_1x1 from state

* refactor forward

* refactoring

* flake8

* removing numpy in anchor gen, use numpy for gather, nonzero, optimize topk

* keep using tinygrad for smaller gathers

* fix empty tensors

* comments

* move from tensor.py

* resnet test passing

* add coco dataset back

* fix spaces

* add test for log2

* no need to create Tensors

* no need to create Tensors

---------

Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
2023-06-25 15:37:51 -07:00
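
A note on the "meshgrid using tinygrad" step above: it replaces numpy's meshgrid with pure tensor ops. A minimal sketch of the idea for 1-D inputs (not the PR's actual code):

```python
from tinygrad.tensor import Tensor

def meshgrid(xs: Tensor, ys: Tensor):
  # broadcast a row vector and a column vector against each other,
  # mirroring numpy.meshgrid(xs, ys) for 1-D inputs
  grid_x = xs.reshape(1, -1).expand(ys.shape[0], xs.shape[0])
  grid_y = ys.reshape(-1, 1).expand(ys.shape[0], xs.shape[0])
  return grid_x, grid_y
```
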
Yair Lifshitz
7f73d6a4da Fix input path in examples/compile_efficientnet.py, examples/efficientnet.py. (#1034) 2023-06-23 16:34:33 -07:00
Yann Huynh
ccb51ff5b0 "Fixed argument passing in example yolov8" (#1004)
"Fixed argument passing in example yolov8"
2023-06-18 14:29:39 -07:00
sehaj
775287ed91 Add yolov8 implementation (#806)
* added SPPF module from yolov8

* added conv_block, bottleneck modules

* cleaned modules

* c2f example

* SPPF changes

* C2f

* fixed and tested bottleneck

* improved detect class

* tested spf and conv

* checked c2f

* DFL structure

* fixed dfl

* added dist2bbox function

* added dist2bbox function

* added and tested make_anchors function for the head

* keeping functions above

* creating the detection head

* fixing head

* untested blocks a. scale_boxes b. clip_boxes c. xywh2xyxy d. box_iou

* head works

* structure fix

* added darknet (backbone)

* yolov8 neck, and bias-initialization function for detection

* fixed spacing

* yolov8 class, init bias, and fixed c2f

* forward pass almost working

* fixed net structure

* init bias not needed, forward pass working

* load weights boilerplate

* load weights done?

* all variants loading!

* post process: clip_boxes, scale_boxes, xywh2xyxy, and box_iou (untested)

* fix scale_boxes

* box_iou fixed and tested

* created the pre nms function

* fix nms

* fixed load weights; apparently the latest commit broke something, so exclude num_batches_tracked

* added letterbox and pre_transform for the pre_process function

* fixed letterbox, pre_transform and added preprocess function

* custom NMS done, integrated prepare_boxes and nms, improved box_iou

* added postprocess function till parsing

* added draw_bounding_boxes_and_save function

* testing full flow

* using fetch for class names

* fixed make_anchors + all tinygrad now

* added command line arguments, weight downloading

* single image for now only

* made draw boxes more efficient

* made NMS functions efficient

* made compute_transform better

* v8 working now, inference is done

* prints objects detected in console now

* fixed image loading (pre processing)

* batch post processing

* created initial tests

* fixed bounding box thickness and added get_detected_classes_with_frequency function

* cleaning for testing

* two tests

* added url option for image, removed need for specifying arguments

* tests complete, but lots of things are printed to the screen by ultralytics

* remove parse arguments

* fixed weight location

* fixed colours of classes, and black font when high brightness

* minor changes

* TODOs for later

* removed use of torch, using .npz weights

* fixed tests

* one path for fetch

* preprocess now in tinygrad, plus test fix for that

* updated tests

* fix tests

* no class labels needed

* Add files via upload

* Update showcase.md

* Update showcase.md

* added safe tensors as weights, and tests fix for that

* safe tensors test

* using safe_load

* using tinygrad functions now to load weights

* update tests

---------

Co-authored-by: r3sist-uniq <amanmatreja@gmail.com>
Co-authored-by: r3sist <72573738+r3sist-uniq@users.noreply.github.com>
2023-06-16 18:55:19 -07:00
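
Several steps above (dist2bbox, NMS, box_iou) lean on the standard xywh-to-xyxy box conversion. A hedged sketch in tinygrad, not necessarily the PR's exact implementation:

```python
from tinygrad.tensor import Tensor

def xywh2xyxy(boxes: Tensor) -> Tensor:
  # (N, 4) boxes as (center_x, center_y, width, height) ->
  # (N, 4) corner boxes as (x1, y1, x2, y2)
  cx, cy, w, h = [boxes[:, i:i+1] for i in range(4)]
  return (cx - w/2).cat(cy - h/2, cx + w/2, cy + h/2, dim=1)
```
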
Diogo
2d4370b487 Adds tril & triu support (#936)
* triu & tril support

* lint and kernel count error

* switched shape indices

* larger shape tests

* reverted numpy removal until #942 is resolved
2023-06-09 22:13:20 -07:00
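
Usage sketch for the new ops (output shown via numpy for readability):

```python
from tinygrad.tensor import Tensor

t = Tensor.ones(3, 3)
print(t.tril().numpy())  # lower triangle kept, entries above the diagonal zeroed
print(t.triu(1).numpy())  # strictly-above-diagonal entries only
```
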
cloud11665
e8a23d4331 there is a better way to do that! (#950) 2023-06-06 15:23:30 -07:00
Diogo
3bb38c3518 limit split to 1 due to Windows paths containing : (#944) 2023-06-06 10:27:54 -07:00
George Hotz
b78addf2f8 Whisper (#919)
* no whispering yet

* whispering

* live whisper

* small support
2023-06-03 18:55:14 -07:00
George Hotz
ed1963b899 Fast DiskTensor to other Tensor (#916)
* make disktensors fast

* loading

* loader for sd and llama
2023-06-03 12:25:41 -07:00
George Hotz
d58586bb17 safetensors! (#903)
* safetensors test

* safe_save

* load back with real safetensors

* bugfix in device name. add simple torch_load

* it works for llama, but it's slower...

* mmap

* no intermediate

* load mmaped

* readinto speed

* not ready yet

* revert that
2023-06-02 13:41:09 -07:00
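
A hedged sketch of the resulting round trip; the module path is an assumption, since it has lived at tinygrad.state and later tinygrad.nn.state depending on version:

```python
from tinygrad.tensor import Tensor
from tinygrad.nn.state import safe_save, safe_load  # path varies by version

state = {"weight": Tensor.randn(4, 4), "bias": Tensor.zeros(4)}
safe_save(state, "/tmp/model.safetensors")
loaded = safe_load("/tmp/model.safetensors")  # mmap-backed, loaded lazily
print(loaded["weight"].shape)  # (4, 4)
```
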
George Hotz
8a928ed2f3 nn init matches torch (#901) 2023-06-01 21:24:11 -07:00
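
A hedged way to spot-check the claim: torch's default Linear init is kaiming-uniform with bounds of about ±1/sqrt(fan_in), so a matching init should land in the same range (the check below is assumed, not from the commit):

```python
from tinygrad.nn import Linear

lin = Linear(16, 8)
w = lin.weight.numpy()
print(w.min(), w.max())  # within ±1/sqrt(16) = ±0.25 if init matches torch
```
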
Peter Ross
27845fd3a3 train_efficientnet: only import datasets.imagenet when IMAGENET is set (#899)
Make it work out of the box for new users.

The default configuration of train_efficientnet is to use the smaller CIFAR
dataset. Importing datasets.imagenet tries to open imagenet_class_index.json
and will fail unless the user has already downloaded it.
2023-06-01 19:19:52 -07:00
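
The guard described above makes the heavyweight import conditional. A sketch of the pattern; the helper names are hypothetical, only the module and env var come from the commit message:

```python
import os

if os.getenv("IMAGENET"):
  # only pulled in when explicitly requested; this import fails if
  # imagenet_class_index.json hasn't been downloaded yet
  from datasets.imagenet import iterate  # hypothetical helper name
else:
  from datasets import fetch_cifar  # hypothetical; the smaller default dataset
```
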
George Hotz
1b42b4e1b8 fix examples/hlb_cifar10.py 2023-06-01 19:03:17 -07:00
wozeparrot
0fc4cf72a2 feat: add train scaffolding (#859) 2023-05-30 07:10:40 -07:00
Jacky Lee
5d212864b5 Add MLPerf UNet3D model (#775)
* Add ResNet inference test and cannon

* Test with ResNet50

* test_car works with resnet fix

* Add KiTS19 dataset

* KiTS19: Implement iterate

* No batch load for this dataset

* Save results on iterate

* Implement dice score

* Add data prep and eval functions

* Resolve shape issue

* Conversion works but wrong values

* Segfaults when load_from_pretrained is called

* Fix segfault and assign properly

* Final result generated, though very slow

* Store and load final result to save time

* Fix typo in finalize

* Score computes

* More bug fixes, dice score is very low

* Working broken code

* Assign output values to result

* Getting a much higher score now

* Fix dataset preprocessing

* Mean DICE score of 88.5

* Ugh, typo

* Attempt to reimplement model

* Rename layers

* Tiny model works, kinda

* Accuracy? gone

* Implement InstanceNorm and match torch

* Test instance norm 2d and 3d

* Combined input block with downsample block

* Tiny model works, support strided convtranspose

* Commands to download dataset

* Clean up a bit

* unet3d_v2 -> unet3d

* Remove duplicated code

* Oops, put tests back
2023-05-28 20:38:19 -07:00
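
The "Implement dice score" step refers to the standard Sørensen-Dice overlap used for KiTS19 evaluation. A minimal sketch over binary masks (argument layout assumed):

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-6) -> float:
  # 2 * |A ∩ B| / (|A| + |B|); eps guards against empty masks
  intersection = (pred * target).sum()
  return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
```
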
Sohaib
65d09031f2 add retinanet with resnet backbone (#813)
* add retinanet with resnet backbone

* adds resnext to support loading retinanet pretrained on openimages
* object detection post processing with numpy
* data is downloaded and converted to coco format with fiftyone
* data loading and mAP evaluation with pycocotools

* remove fiftyone dep

* eval freq

* fix model timing

* del jit for last batch

* faster accumulate
2023-05-28 20:20:16 -07:00
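
The pycocotools evaluation mentioned above follows the library's standard flow. A sketch with hypothetical file names:

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("openimages-coco-format.json")  # hypothetical ground-truth file
coco_dt = coco_gt.loadRes("detections.json")   # model outputs in COCO result format
ev = COCOeval(coco_gt, coco_dt, iouType="bbox")
ev.evaluate(); ev.accumulate(); ev.summarize()  # prints the mAP table
```
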
wozeparrot
67de3aa1de Add mlperf bert model (#803)
* feat: add mlperf bert model

* feat: switch to nn.Embedding

* clean+fix: fix formatting

* feat: add simple downloader

* feat: metrics

* feat: don't actually need exact match

* feat: doing a run

* feat: set eps on the layernorms

* clean+fix: cleaner impl + hopefully fixed

* feat: move dataset initialization into iterate

* feat: move tokenizer out of iterate

* clean+fix: cleaner + working

* clean: cleanup

* fix: fix metrics

* feat: need to use original bert gelu + download vocab

* feat: make directory if it doesn't exist yet

* feat: jit go brrr
2023-05-27 14:53:32 -07:00
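
The "original bert gelu" note refers to the exact erf-based GELU, which differs slightly from the tanh approximation many frameworks default to. Both forms for reference:

```python
import math

def gelu_erf(x: float) -> float:
  # exact form used by the original BERT reference implementation
  return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
  # common tanh approximation; close, but not bit-identical
  return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))
```
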
wozeparrot
0dc333cfab Promote Embedding to nn (#798)
* feat: promote Embedding to nn

* fix: fix failing test

* feat: add test with jit

* feat: rewrite embedding to no longer need stacked for loops

* clean+fix: don't know how that happened
2023-05-25 18:39:45 -07:00
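
Once promoted, the layer is used like any other nn module. A usage sketch with assumed shapes:

```python
from tinygrad.tensor import Tensor
from tinygrad.nn import Embedding

emb = Embedding(vocab_size=1000, embed_size=64)
tokens = Tensor([[1, 2, 3]])  # (batch=1, seq_len=3) of token ids
out = emb(tokens)             # (1, 3, 64)
print(out.shape)
```
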
George Hotz
a968c4c3a4 Cleanup mlperf (#797)
* improve factorization

* cleanups
2023-05-25 11:36:43 -07:00
wozeparrot
01ae45a43c Add mlperf RNN-T model (#782)
* feat: initial rnn-t

* feat: working with BS>1

* feat: add lstm test

* feat: test passing hidden

* clean: cleanup

* feat: specify start

* feat: way faster lstm & model

* fix: default batch size

* feat: optimization

* fix: fix metrics

* fix: fix feature splicing

* feat: cleaner stacktime

* clean: remove unused import

* clean: remove extra prints

* fix: fix tests and happy llvm

* feat: have the librispeech dataset in its own dir

* clean: unused variable

* feat: no longer need numpy for the embedding + slightly more memory efficient lstm

* fix: forgot to remove something that broke tests

* feat: use relative paths

* feat: even faster

* feat: remove pointless transposes in StackTime

* fix: correct forward

* feat: switch to soundfile for loading and fix some leaks

* feat: add comment about initial dataset setup

* feat: jit more things

* feat: default batch size back to 1

larger than 1 is broken again :(
and even in the reference implementation it gives worse results
2023-05-25 00:41:21 -07:00
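
The StackTime block mentioned above downsamples the time axis by folding consecutive frames into the feature axis. A hedged numpy sketch of the idea (not the PR's code):

```python
import numpy as np

def stack_time(x: np.ndarray, factor: int = 2) -> np.ndarray:
  # (T, B, F) -> (ceil(T/factor), B, F*factor): pad T, then stack each
  # group of `factor` consecutive frames along the feature dimension
  t, b, f = x.shape
  pad = (-t) % factor
  x = np.pad(x, ((0, pad), (0, 0), (0, 0)))
  t2 = (t + pad) // factor
  return x.reshape(t2, factor, b, f).transpose(0, 2, 1, 3).reshape(t2, b, factor * f)
```
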
George Hotz
e0b2035023 fast imagenet eval, gets 76.14% across the set 2023-05-13 21:18:31 -07:00
George Hotz
b705510d5c getting 77% on imagenet eval 2023-05-13 07:46:27 -07:00
George Hotz
810f03dafa conv3d + unet3d (#772)
* conv3d, needs test

* test passes, padding wrong on unet

* unet3d

* no conv3d on images
2023-05-12 13:54:07 -07:00
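
A hedged usage sketch, assuming conv2d is the generalized entry point that now accepts a third spatial dimension (shapes illustrative):

```python
from tinygrad.tensor import Tensor

x = Tensor.randn(1, 4, 8, 8, 8)  # (batch, channels, depth, height, width)
w = Tensor.randn(8, 4, 3, 3, 3)  # (out_ch, in_ch, kd, kh, kw)
out = x.conv2d(w)                # same call handles 3 spatial dims
print(out.shape)                 # (1, 8, 6, 6, 6) with no padding
```
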
George Hotz
46d419060b start on mlperf models 2023-05-10 16:30:49 -07:00
Jacky Lee
d13629cb26 ResNet: match implementation with Nvidia and PyTorch (#770)
* Match ResNet implementation with pytorch and nvidia

* Reduce number of Epochs
2023-05-10 09:01:22 -07:00
George Hotz
e4db0c820f hlb_cifar10 init from torch weights 2023-04-18 19:09:13 -07:00
George Hotz
732884653c osx in hlb_cifar10_torch 2023-04-14 13:12:08 -07:00
George Hotz
584ee6f616 don't graph consts 2023-04-14 03:32:20 -07:00
George Hotz
9a39ebefde hlb_cifar10_torch gets 80% 2023-04-14 02:47:03 -07:00
Jacky Lee
06ed958abd Fix train_resnet example (#744)
* Fix ResNet example

* Scientific notation
2023-04-12 13:48:39 +05:30
Jacky Lee
7a45b989a1 Device: make GPU default and METAL/CUDA if possible (#732)
* Make GPU the default device

* Compile EfficientNet with CPU

* don't print device

* use METAL and CUDA if possible

* Revert some changes to workflow

* Fix import error when checking device availability

* device lookup is now optional

* hopefully fix linter and tests

* fix workflow

* Skip device if not available

* don't change default if CPU=1

* simplify device selection

* Default to CPU if no GPU

* don't print device name...

* No need to change default in llama

* run github workflow

* Fix logic to select default

* pass if an error occurs

* use separate function for try except
2023-04-04 09:41:52 +05:30
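
In practice the selection boils down to environment-variable overrides plus a probe for available backends. A hedged sketch of how a user drives it (variable names follow tinygrad's conventions):

```python
import os

os.environ["CPU"] = "1"  # force CPU; GPU=1, METAL=1 or CUDA=1 pick other backends
from tinygrad.tensor import Tensor  # the default device is resolved at import time

t = Tensor([1.0, 2.0, 3.0])
print(t.device)  # "CPU"
```
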
Jacky Lee
156640e90d Permute examples (#731)
* examples: use permute instead of transpose

* Use transpose but change args
2023-03-29 05:07:06 +04:00
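
The distinction the commit draws: permute reorders all axes at once, while transpose swaps a single pair. A quick sketch (transpose's argument form has varied between versions):

```python
from tinygrad.tensor import Tensor

x = Tensor.randn(2, 3, 4)
print(x.permute(2, 0, 1).shape)  # (4, 2, 3): full reordering of axes
print(x.transpose(0, 2).shape)   # (4, 3, 2): swap exactly two axes
```
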
George Hotz
b12b60af20 fix binop, other tests failure (#723)
* fix binop, other tests failure

* that was a bad idea

* better layernorm

* inference kernel count tests

* new style reshape pushing

* fixup replacement

* 199 kernels is okay. fix flops

* push reshape through unaryops only

* GRAPH=2 draws the phantom ops

* found resnet issue

* non working test

* mul is cheaper than div

* OPT inflation

* SHUFFLE_PAD_OPS in OPT=2
2023-03-22 18:15:07 -07:00
Fernando Vidal
73bd0b217b add int64 as supported dtype from numpy (#699)
* add int64 as supported dtype from numpy

Without this, examples/transformer.py didn't run. With this change it runs successfully.

* Update helpers.py

* Update transformer.py

* Update training.py
2023-03-18 17:15:04 -07:00
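
The fix extends the numpy-dtype mapping so int64 arrays are accepted when constructing Tensors. A minimal sketch:

```python
import numpy as np
from tinygrad.tensor import Tensor

idx = np.arange(5)  # numpy defaults to int64 on most platforms
t = Tensor(idx)     # previously raised on the unsupported dtype
print(t.numpy())
```
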
George Hotz
f5467cfedc Devicebufferless (#708)
* runs one metal kernel

* conv2d works

* ops tests are passing

* const folding

* all ops work

* pre commit always passes

* torch works

* working still

* fix graph test

* tests passing

* image almost works

* image conv works

* most images

* fix custom

* fix assignment

* fix compile enet

* clean up comments

* fix realize return value

* include shapetracker in LB repr

* copy should make a copy

* reenable method cache

* fix lna

* dtypes in graph

* forward only for IMAGE=2

* simple realize

* getting close

* fixup new api, it's good except the kernel count

* back to 197 kernels

* tests should pass

* go to a real float

* no type_on_cpu

* fix the docs

* put shapetracker back in its proper place
2023-03-18 14:40:23 -07:00
Kirill
26a3888ab8 Fix llama 13B RAM usage (#710) 2023-03-18 13:50:09 -07:00
Kirill
0fe5014b1f Use pathlib (#711)
* Use pathlib in llama

* Use pathlib in stablediffusion
2023-03-18 13:49:21 -07:00
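
The pathlib change swaps string concatenation for Path arithmetic, which also handles Windows separators. The pattern, with a hypothetical llama weights layout:

```python
from pathlib import Path

WEIGHTS = Path(__file__).parent / "weights"    # instead of __file__ + "/weights"
ckpt = WEIGHTS / "7B" / "consolidated.00.pth"  # hypothetical llama layout
print(ckpt.exists())
```
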
Kirill
0532025b04 Fix llama 13B weights loading (#700)
* Fix llama 13B weights loading

* refactor more

* add test

* test storage offset

* fix spacing

* fix strides

* llama 13B working?

* yolo?

* better test for seeks
2023-03-15 08:59:52 -07:00
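
A hedged sketch of the idea behind the fix: tensors in a torch checkpoint can be views into a shared storage, so the loader has to seek to each view's offset instead of reading from zero (names and element size assumed):

```python
def read_view(f, storage_base: int, storage_offset_elems: int,
              nbytes: int, elem_size: int = 4) -> bytes:
  # torch stores the offset in elements, not bytes
  f.seek(storage_base + storage_offset_elems * elem_size)
  buf = bytearray(nbytes)
  f.readinto(buf)  # readinto fills the buffer without an extra copy
  return bytes(buf)
```
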
Ayushman Kumar
e28bd11ff1 Cast Tensor data to float32 (#703)
* Cast Tensor data to float32

* astype('float32') --> Tensor.randn()
2023-03-14 23:09:41 -07:00
Jacky Lee
5e820818e9 Cast image to float32 (#702) 2023-03-14 08:13:19 -07:00
George Hotz
fe0e8a306f jittable llama 2023-03-12 14:15:04 -07:00
George Hotz
15e0b56e39 compile works (#688)
* compile works

* runtimes

* line count

* fix custom, to tg dtype

* meh, that's fine with lazy import
2023-03-12 11:01:25 -07:00
Kirill
af7745073f Add comments to SD (#686)
* Add explanation for empty lambdas

* Fix my_unpickle if pytorch_lightning is installed

* oops
2023-03-12 10:56:49 -07:00
George Hotz
046b3952c3 get_state_dict 2023-03-11 23:46:53 -08:00
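
A usage sketch for the new helper; the import path is an assumption since it has moved over time (tinygrad.nn.state in recent versions):

```python
from tinygrad.nn import Linear
from tinygrad.nn.state import get_state_dict  # path varies by version

model = Linear(4, 2)
sd = get_state_dict(model)  # {"weight": Tensor, "bias": Tensor}
print(list(sd.keys()))
```
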
George Hotz
803b0aef28 track memory for numpy/torch 2023-03-11 20:39:10 -08:00
George Hotz
61071f881a fix bug, and add unit test to catch failure 2023-03-11 16:57:25 -08:00
George Hotz
3ec457248c failing llama test 2023-03-11 16:28:10 -08:00
George Hotz
8aa63847c7 llama: up max tokens to 1000 2023-03-11 13:39:33 -08:00