Commit Graph

342 Commits

Author SHA1 Message Date
George Hotz
67781fcf5d fix fail fast in CI 2023-08-05 10:24:24 -07:00
Felix
97a6029cf7 Corrected a few misspelled words (#1435) 2023-08-04 16:51:08 -07:00
ian
c08ed1949f Fix plt output comment (#1428) 2023-08-03 23:35:52 -07:00
Paolo Gavazzi
9ffa1eb7e2 Removed dep of torch, torchaudio, kept librosa only (#1264) 2023-08-02 13:52:04 -04:00
Diogo
4dc8595069 simple exporting models (#1344)
* unified exporting

* json exporting

* ignore more

* simplified buffer export

* added dtypes

* added assert

* swift example

* fix tests

* linter

* remove whitespace

* fixed tests

* remove swift example

* remove unintended changes

* allow callable models to be used

* whitespace

* more readable json export

* name change

* whitespace

* whitespace
2023-08-01 09:35:48 -07:00
George Hotz
f27df835a6 delete dead stuff (#1382)
* delete bpe from repo

* remove yolo examples

* Revert "remove yolo examples"

This reverts commit cd1f49d466.

* no windows
2023-07-31 11:17:49 -07:00
George Hotz
37fa7e96fb Revert "update editorconfig, enforce via CI (#1343)" (#1380)
This reverts commit da2efecbe2.
2023-07-31 10:35:50 -07:00
Pavol Rusnak
da2efecbe2 update editorconfig, enforce via CI (#1343)
* update editorconfig to set unix-style newlines and trim whitespace

* add editorconfig github action to the CI

* fix whitespace
2023-07-30 18:44:30 -07:00
Francis Lam
9d142430cb Add option in llama.py to quantize weights to int8 at runtime (#1289)
* Add option in llama.py to quantize weights to int8 at runtime

Also added lm-eval to external

* Add support for llama-2 evaluation
2023-07-24 17:22:38 -07:00
Pavol Rusnak
cd60b8561c Add LLaMA-2 support (#1284)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2023-07-24 17:12:02 -04:00
Giles Bathgate
c4238b4ea0 Fix discriminator balancing in mnist_gan example (#1332) 2023-07-23 12:43:05 -07:00
Cole Sutyak
2d4e182294 change fetch to allow for local file selection (#1309) 2023-07-23 15:00:16 -04:00
Maxim Zakharov
48c4df1263 fix: prevent infinite "loading..." state (#1319)
* demo somewhy doesn't work on my device and throw eror "Error: GPUPipelineError: [Invalid ShaderModule] is invalid" inside setupNet func
* because of that, JS halts the execution of the rest of the code below and on the screen we see "loading..." forever
* added try catch here to communicate about the error in a proper way
2023-07-21 14:01:53 -07:00
Jacob Pradels
b112edd2c3 Add pylint trailing whitespace rule (#1314) 2023-07-21 13:37:55 -04:00
George Hotz
bfbb8d3d0f fix ones, BS=2 stable diffusion, caching optimizer (#1312)
* fix ones, BS=2 stable diffusion

* caching optimizer

* print search time

* minor bug fix
2023-07-21 09:55:49 -07:00
madt2709
d2c1e8409a Update arange to be (start, stop, step) (#1308) 2023-07-21 00:27:23 -04:00
George Hotz
f45013f0a3 stable diffusion: remove realizes we don't need 2023-07-20 19:53:07 -07:00
George Hotz
b58dd015e3 stable diffusion: remove import numpy as np 2023-07-20 19:35:44 -07:00
George Hotz
35bc46289c stable diffusion: use new tinygrad primitives 2023-07-20 19:25:49 -07:00
Stan
0a3d4f8103 Implementation of VITS TTS model (#1188)
* [WIP]: implementation of VITS TTS model

* Implemented VITS model, moved all code to examples/vits.py

* Added support for vctk model, auto download, and cleanups

* Invoke tensor.realize() before measuring inference time

* Added support for mmts-tts model, extracted TextMapper class, cleanups

* Removed IPY dep, added argument parser, cleanups

* Tiny fixes to wav writing

* Simplified the code in a few places, set diff log level for some prints

* Some refactoring, added support for uma_trilingual model (anime girls)

* Fixed bug where embeddings are loaded with same backing tensor, oops

* Added emotional embed support, added cjks + voistock models

- voistock is multilingual model with over 2k anime characters
- cjks is multilingual model with 24 speakers

both are kinda bad for english though :c

* Removed `Tensor.Training=False` (not needed and wrong oop)

* Changed default model and speaker to vctk with speaker 6

* Ported rational_quadratic_spline fun to fully use tinygrad ops, no numpy

* Removed accidentally pushed test/spline.py

* Some slight refactors

* Replaced masked_fill with tensor.where

* Added y_length estimating, plus installation instructions, plus some cleanups

* Fix overestimation log message.

* Changed default value of `--estimate_max_y_length` to False

This is only useful for larger inputs.

* Removed printing of the phonemes

* Changed default value of `--text_to_synthesize`
2023-07-20 17:37:14 -07:00
George Hotz
f7b0320d8b add cifar training regression test (#1287)
* add cifar training regression test

* clean up print
2023-07-19 14:17:09 -07:00
Francis Lam
3db57d3118 Fix llama.py to load and concatenate 13B, 30B, and 65B models (#1275) 2023-07-19 13:22:33 -04:00
Yixiang Gao
a8f2c16f8e add contiguous (#1246) 2023-07-15 08:36:34 -07:00
Diogo
a9a1df785f Webgpu support (#1077)
* initial commit

* 81 passing

* 105 passing tests

* 148 passing

* CI tests

* install dep on ci

* try opencl pkgs

* try using vulkan

* down to only 6 failing

* refactor

* cleaning up

* another test skipped due to buffer limit

* linter

* segfault

* indent fix

* another segfault found

* small touchups

* Fix max and maxpool tests

* Add constant folding

* Add javascript export script

* better asserts in codegen

* manual upcasting

* reverted token type change

* skip safetensor test due to unsupported type

* FIx efficientnet and all other model tests

* Remove np copy

* fixed indent and missing import

* manually destroy the buffer

* revert back to length

* linter errors

* removed extra val

* skip broken tests

* skipping more tests

* Make the page pretty

* Save model weights as safetensor

* Fix imagenet to c test

* Fix second imagenet to c bug

* Async and paralel kernel compilation

* workgroup support

* reversed local size

* fixed non local bug

* correct local groups

* ci experiment

* removed typo

* Fix define local by using shared memory

* Refactor

* try running on mac

* match metal tests

* add more workers

* scope down tests

* trying windows runner

* fixed windows env

* see how many it can do

* merged master

* refactor

* missed refactor

* increase test suite coverage

* missing import

* whitespace in test_efficientnet.py

* getting there

* fixed reset

* fixed bufs

* switched to cstyle

* cleanup

* min/max rename

* one more linter issue

* fixed demo

* linter

* testing ci chrome

* add unsafe webgpu arg

* add build step

* remove WEBGPU from cmd line

* use module

* try forcing directx

* trying forced metal backend

* temp disable conv2d for CI

* disable conv_trasnpose2d

---------

Co-authored-by: 0x4d - Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-07-12 12:52:06 -07:00
Hey
4f72eb823c Outdated repository URL (#1218)
* Update outdated repo url

* Update more outdated repo url's
2023-07-11 23:14:19 -07:00
AN Long
f75de602df fix typo in stable diffusion example (#1219) 2023-07-11 15:26:40 -07:00
Stan
f40f8cd055 Initialise numpy arrays as float32 in DDPG (#1171)
float64 is not supported by tinygrad
2023-07-07 12:05:31 -07:00
terafo
aa60feda48 Fix naming conflict with huggingface datasets (#1161)
* Rename in files

* Move files

* Moved to extra/datasets as suggested

* Changes to files

* Fixed stupid mistake

---------

Co-authored-by: terafo <terafo@protonmail.com>
2023-07-07 10:43:44 -07:00
Stan
9b6e57eccd helpers.py: improved test coverage + exception handling (#1165)
* Fixes + improved test coverage for helpers.py

- added exception handling in `proc`, if an exception was thrown, the thread would hang
- made `_early_exec_process` catch any Exception, before if an exception was thrown before the process was started, it would hand the thread

* Made `_early_exec_process` catch any Exception

 Otherwise, if an exception was thrown before the process was started, it would hang the thread. For example a type error for an argument passed to `subprocess.check_output`

* Fixed `from tinygrad.helpers import Timing` import

oops, for some reason my IDE cleaned that import from extra/helpers.

* Fixed import in llama.py

Another one that I skipped by accident, mybad

* Extracted a class for tests of early exec

* Normalize line endings, windows uses /r/n

* Made `cross_process` not a daemon
2023-07-07 10:26:05 -07:00
Kunwar Raj Singh
8391648822 Over 90% on CIFAR with examples/hlb_cifar10.py (#1073)
* fix eval, lr decay, best eval

* 82.27

* 82.64

* 82.79, reproducable

* add lr sched, 85.26

* 87.42

* 87.94

* 87.42

* tta with flip

* training flip aug

* refactor

* using Tensor for LR is faster

* 89.5

* refactor, flip only train set

* 90.01

* 90.64

* eval jit

* refactor

* only JIT model

* fix eval JIT

* fix eval JIT

* 90.82

* STEPS=900 reaches 90.22

* TTA envvar

* TTA default 0

* fully jit training

* refactor optim

* fix sched

* add label smoothing

* param changes

* patial gelu

* OneCycle with pause

* gelu maybe works

* 90.12

* remove pause lr

* maybe fix lr schedulers

* scheduler test passing

* comments

* try mixup

* shuffle!

* add back the missing last eval

* fix shuffle bugs

* add mixup prob

* fix mixup prob

* 90.19

* correct mixup

* correct mixup

* correct mixup

* 90.24

* 90.33

* refactor, add type hints

* add gradient clipping

* maybe fix test

* full JIT

* back to relu for now

* pass mixup prob as param

* add typehints

* maybe CI works

* try erf gelu

* CI, types

* remove useless import/

* refactor optim

* refactor optim

* try leakyrelu

* try celu

* gelu

* 90.67

* remove grad clip

* remove grad clip tests

* revert params

* add test for OneCycleLR

* 90.62

* fix eval timing

* fix eval timing again

* so where i calculate mixup_prob matters

---------

Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
2023-07-06 20:46:22 -07:00
nimlgen
d363d25ee2 fix imports for examples/transformer.py (#1136) 2023-07-05 08:15:13 -07:00
George Hotz
87d21ea979 examples: simple conv bn 2023-07-04 13:50:26 -07:00
Reza Rezvan
8ae9a054ae Refactor nn.optim (#1091)
* Refactor: nn.optim.py

* Refactor: nn.optim.py; Fix all tests

* Refactor: Replace all optim.get_parameters()

* Refactor: Revert list comp.

* Refactor: Replace optim.get_state_dict

* Refactor: Change quickstart.md
2023-07-02 15:07:30 -07:00
Eli Frigo
10f1aeb144 fixed broken link (#1097) 2023-07-02 15:06:59 -07:00
nmarwell26
12ce68c1ee Renamed examples/yolo to examples/vgg7_helpers because that directory contains no yolo-related code and only helper code for vgg7. This was confusing to a new user when trying to understand the examples. (#1086) 2023-07-01 12:04:28 -07:00
George Hotz
6ec0a24706 imagenet eval in 1 min 28 sec 2023-06-28 04:23:26 +00:00
Rayan Hatout
65cbaa3429 no need to slice A and B twice in LLaMa complex multiplication (#1054) 2023-06-26 14:42:58 -07:00
Kunwar Raj Singh
5d3310ce56 MaskRCNN Inference (#884)
* MaskRCNN weights loading

* backbone maybe works

* backbone works, but resnet body atol 1e-3

* RPN Call, but veryy wrong output

* fixed topk

* RPN maybe works, not sure about nms

* Fix cursed modules

* add back editorconfig

* Full call, wrong output

* Full call works

* fix mask

* use NMS from retinanet

* Removing extra funcs

* refactor

* readable

* Add example to run model

* remove filter

* Fix split, batched inference is worse

* Fix image sizes

* Matching reference

* merge master

* add filter on top detections

* cuda backend fixed

* add model eval and spec

* convert images to rgb

* fix eval

* simplify examples code

* remove extra code

* meshgrid using tinygrad

* removing numpy

* roi align, floor, ceil

* remove numpy from level_mapper

* remove numpy from pooler

* Revert "Merge branch 'master' of github.com:kunwar31/tinygrad into mrcnn-inference"

This reverts commit 4b95a3cb49, reversing
changes made to 98f2b1fa2e.

* roi align gather

* fix master merge

* revert to old floor, ceil as ints present in domain

* use log2 op

* fix indexes

* weird bug with ints and gpu

* weird bug with ints and gpu

* refactors, add env var for gather

* floor with contiguous, where

* refactor topk, sort

* remove staticmethod

* refactor stride

* remove log2 mlop

* realize -> contiguous

* refactor forward

* remove num_classes, stride_in_1x1 from state

* refactor forward

* refactoring

* flake8

* removing numpy in anchor gen, use numpy for gather, nonzero, optimize topk

* keep using tinygrad for smaller gathers

* fix empty tensors

* comms

* move from tensor.py

* resnet test passing

* add coco dataset back

* fix spaces

* add test for log2

* no need to create Tensors

* no need to create Tensors

---------

Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
2023-06-25 15:37:51 -07:00
Yair Lifshitz
7f73d6a4da Fix input path in examples/compile_efficientnet.py, examples/efficientnet.py. (#1034) 2023-06-23 16:34:33 -07:00
Yann Huynh
ccb51ff5b0 "Fixed argument passing in example yolov8" (#1004)
"Fixed argument passing in example yolov8"
2023-06-18 14:29:39 -07:00
sehaj
775287ed91 Add yolov8 implementation (#806)
* added SPPF module from yolov8

* added conv_block, bottleneck modules

* cleaned modules

* c2f example

* spf changes

* C2f

* fixed and tested bottleneck

* improved detect class

* tested spf and conv

* checked c2f

* DFL structure

* fixed dfl

* added dist2bbox function

* added dist2bbox function

* added and tested make_anchors function for the head

* keeping functions above

* creating the detection head

* fixing head

* untested blocks a. scale_boxes b. clip_boxes c. xywh2xyxy d. box_iou

* head works

* structure fixx

* added darknet (backbone)

* yolov8 neck, and intialize bias function while detection

* fixed spacing

* yolov8 class, init bias, and fixed c2f

* forward pass almost working

* fixed net structure

* init bias not needed, forward pass working

* load weights boilerplate

* load weights done?

* all variants loading!

* post process: clip_boxes, scale_boxes, xywh2xyxy, and box_iou(untested)

* fix scale_boxes

* box_iou fixed and tested

* created the pre nms function

* fix nms

* fixed load weights, apparently the latest commit broke something, excluding num_batches_tracked

* added letterbox and pre_tranform for pre_process function

* fixed letterbox, pre_transform and added preprocess function

* custom NMS done, integrated prepare_boxes and nms, improved box_iou

* added postprocess function till parsing

* added draw_bounding_boxes_and_save function

* testing full flow

* using fetch for class names

* fixed make_anchors + all tinygrad now

* added command line arguments, weight downloading

* single image for now only

* made draw boxes more efficient

* made NMS functions efficient

* made compute_transform better

* v8 working now, inference is done

* prints objects detected in console now

* fixed image loading (pre processing)

* batch post processing

* created initial tests

* fixes bounding box thickness AND added get_detected_classes_with_frequency function

* cleaning for testing

* two tests

* added url option for image, removed need for specifiying arguments

* tests complete, but lots on things are printed on screen by ultralytics

* remove parse arguments

* fixed weight location

* fixed colours of classes, and black font when high brightness

* minor changes

* TODOs for later

* removed use of torch, using .npz weights

* fixed tests

* one path for fetch

* preprocess now in tinygrad, plus test fix for that

* updated tests

* fix tests

* no class labels needed

* Add files via upload

* Update showcase.md

* Update showcase.md

* added safe tensors as weights, and tests fix for that

* safe tensors test

* using safe_load

* using tinygrad functions now to load weights

* update tests

---------

Co-authored-by: r3sist-uniq <amanmatreja@gmail.com>
Co-authored-by: r3sist <72573738+r3sist-uniq@users.noreply.github.com>
2023-06-16 18:55:19 -07:00
Diogo
2d4370b487 Adds tril & triu support (#936)
* triu & tril support

* lint and kernel count error

* switched shape indicies

* larger shape tests

* reverted numpy removal until #942 is resolved
2023-06-09 22:13:20 -07:00
cloud11665
e8a23d4331 there is a better way to do that! (#950) 2023-06-06 15:23:30 -07:00
Diogo
3bb38c3518 limit split to 1 due to windows path containing : (#944) 2023-06-06 10:27:54 -07:00
George Hotz
b78addf2f8 Whisper (#919)
* no whispering yet

* whispering

* live whisper

* small support
2023-06-03 18:55:14 -07:00
George Hotz
ed1963b899 Fast DiskTensor to other Tensor (#916)
* make disktensors fast

* loading

* loader for sd and llama
2023-06-03 12:25:41 -07:00
George Hotz
d58586bb17 safetensors! (#903)
* safetensors test

* safe_save

* load back with real safetensors

* bugfix in device name. add simple torch_load

* it works for llama, but it's slower...

* mmap

* no intermediate

* load mmaped

* readinto speed

* not ready yet

* revert that
2023-06-02 13:41:09 -07:00
George Hotz
8a928ed2f3 nn init matches torch (#901) 2023-06-01 21:24:11 -07:00
Peter Ross
27845fd3a3 train_efficientnet: only import datasets.imagenet when IMAGENET is set (#899)
make it work out of the box for new users.

the default configuration of train_efficientnet is to use the smaller cifar
dataset. import datasets.imagenet tries to open imagenet_class_index.json
and will fail, unless user has already downloaded it.
2023-06-01 19:19:52 -07:00
George Hotz
1b42b4e1b8 fix examples/hlb_cifar10.py 2023-06-01 19:03:17 -07:00