* [WIP]: implementation of VITS TTS model
* Implemented VITS model, moved all code to examples/vits.py
* Added support for vctk model, auto download, and cleanups
* Invoke tensor.realize() before measuring inference time
* Added support for mmts-tts model, extracted TextMapper class, cleanups
* Removed IPY dep, added argument parser, cleanups
* Tiny fixes to wav writing
* Simplified the code in a few places, set diff log level for some prints
* Some refactoring, added support for uma_trilingual model (anime girls)
* Fixed bug where embeddings are loaded with same backing tensor, oops
* Added emotional embed support, added cjks + voistock models
- voistock is multilingual model with over 2k anime characters
- cjks is multilingual model with 24 speakers
both are kinda bad for english though :c
* Removed `Tensor.Training=False` (not needed and wrong oop)
* Changed default model and speaker to vctk with speaker 6
* Ported rational_quadratic_spline fun to fully use tinygrad ops, no numpy
* Removed accidentally pushed test/spline.py
* Some slight refactors
* Replaced masked_fill with tensor.where
* Added y_length estimating, plus installation instructions, plus some cleanups
* Fix overestimation log message.
* Changed default value of `--estimate_max_y_length` to False
This is only useful for larger inputs.
* Removed printing of the phonemes
* Changed default value of `--text_to_synthesize`
* initial commit
* 81 passing
* 105 passing tests
* 148 passing
* CI tests
* install dep on ci
* try opencl pkgs
* try using vulkan
* down to only 6 failing
* refactor
* cleaning up
* another test skipped due to buffer limit
* linter
* segfault
* indent fix
* another segfault found
* small touchups
* Fix max and maxpool tests
* Add constant folding
* Add javascript export script
* better asserts in codegen
* manual upcasting
* reverted token type change
* skip safetensor test due to unsupported type
* FIx efficientnet and all other model tests
* Remove np copy
* fixed indent and missing import
* manually destroy the buffer
* revert back to length
* linter errors
* removed extra val
* skip broken tests
* skipping more tests
* Make the page pretty
* Save model weights as safetensor
* Fix imagenet to c test
* Fix second imagenet to c bug
* Async and paralel kernel compilation
* workgroup support
* reversed local size
* fixed non local bug
* correct local groups
* ci experiment
* removed typo
* Fix define local by using shared memory
* Refactor
* try running on mac
* match metal tests
* add more workers
* scope down tests
* trying windows runner
* fixed windows env
* see how many it can do
* merged master
* refactor
* missed refactor
* increase test suite coverage
* missing import
* whitespace in test_efficientnet.py
* getting there
* fixed reset
* fixed bufs
* switched to cstyle
* cleanup
* min/max rename
* one more linter issue
* fixed demo
* linter
* testing ci chrome
* add unsafe webgpu arg
* add build step
* remove WEBGPU from cmd line
* use module
* try forcing directx
* trying forced metal backend
* temp disable conv2d for CI
* disable conv_trasnpose2d
---------
Co-authored-by: 0x4d - Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* Rename in files
* Move files
* Moved to extra/datasets as suggested
* Changes to files
* Fixed stupid mistake
---------
Co-authored-by: terafo <terafo@protonmail.com>
* Fixes + improved test coverage for helpers.py
- added exception handling in `proc`, if an exception was thrown, the thread would hang
- made `_early_exec_process` catch any Exception, before if an exception was thrown before the process was started, it would hand the thread
* Made `_early_exec_process` catch any Exception
Otherwise, if an exception was thrown before the process was started, it would hang the thread. For example a type error for an argument passed to `subprocess.check_output`
* Fixed `from tinygrad.helpers import Timing` import
oops, for some reason my IDE cleaned that import from extra/helpers.
* Fixed import in llama.py
Another one that I skipped by accident, mybad
* Extracted a class for tests of early exec
* Normalize line endings, windows uses /r/n
* Made `cross_process` not a daemon
* MaskRCNN weights loading
* backbone maybe works
* backbone works, but resnet body atol 1e-3
* RPN Call, but veryy wrong output
* fixed topk
* RPN maybe works, not sure about nms
* Fix cursed modules
* add back editorconfig
* Full call, wrong output
* Full call works
* fix mask
* use NMS from retinanet
* Removing extra funcs
* refactor
* readable
* Add example to run model
* remove filter
* Fix split, batched inference is worse
* Fix image sizes
* Matching reference
* merge master
* add filter on top detections
* cuda backend fixed
* add model eval and spec
* convert images to rgb
* fix eval
* simplify examples code
* remove extra code
* meshgrid using tinygrad
* removing numpy
* roi align, floor, ceil
* remove numpy from level_mapper
* remove numpy from pooler
* Revert "Merge branch 'master' of github.com:kunwar31/tinygrad into mrcnn-inference"
This reverts commit 4b95a3cb49, reversing
changes made to 98f2b1fa2e.
* roi align gather
* fix master merge
* revert to old floor, ceil as ints present in domain
* use log2 op
* fix indexes
* weird bug with ints and gpu
* weird bug with ints and gpu
* refactors, add env var for gather
* floor with contiguous, where
* refactor topk, sort
* remove staticmethod
* refactor stride
* remove log2 mlop
* realize -> contiguous
* refactor forward
* remove num_classes, stride_in_1x1 from state
* refactor forward
* refactoring
* flake8
* removing numpy in anchor gen, use numpy for gather, nonzero, optimize topk
* keep using tinygrad for smaller gathers
* fix empty tensors
* comms
* move from tensor.py
* resnet test passing
* add coco dataset back
* fix spaces
* add test for log2
* no need to create Tensors
* no need to create Tensors
---------
Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
* added SPPF module from yolov8
* added conv_block, bottleneck modules
* cleaned modules
* c2f example
* spf changes
* C2f
* fixed and tested bottleneck
* improved detect class
* tested spf and conv
* checked c2f
* DFL structure
* fixed dfl
* added dist2bbox function
* added dist2bbox function
* added and tested make_anchors function for the head
* keeping functions above
* creating the detection head
* fixing head
* untested blocks a. scale_boxes b. clip_boxes c. xywh2xyxy d. box_iou
* head works
* structure fixx
* added darknet (backbone)
* yolov8 neck, and intialize bias function while detection
* fixed spacing
* yolov8 class, init bias, and fixed c2f
* forward pass almost working
* fixed net structure
* init bias not needed, forward pass working
* load weights boilerplate
* load weights done?
* all variants loading!
* post process: clip_boxes, scale_boxes, xywh2xyxy, and box_iou(untested)
* fix scale_boxes
* box_iou fixed and tested
* created the pre nms function
* fix nms
* fixed load weights, apparently the latest commit broke something, excluding num_batches_tracked
* added letterbox and pre_tranform for pre_process function
* fixed letterbox, pre_transform and added preprocess function
* custom NMS done, integrated prepare_boxes and nms, improved box_iou
* added postprocess function till parsing
* added draw_bounding_boxes_and_save function
* testing full flow
* using fetch for class names
* fixed make_anchors + all tinygrad now
* added command line arguments, weight downloading
* single image for now only
* made draw boxes more efficient
* made NMS functions efficient
* made compute_transform better
* v8 working now, inference is done
* prints objects detected in console now
* fixed image loading (pre processing)
* batch post processing
* created initial tests
* fixes bounding box thickness AND added get_detected_classes_with_frequency function
* cleaning for testing
* two tests
* added url option for image, removed need for specifiying arguments
* tests complete, but lots on things are printed on screen by ultralytics
* remove parse arguments
* fixed weight location
* fixed colours of classes, and black font when high brightness
* minor changes
* TODOs for later
* removed use of torch, using .npz weights
* fixed tests
* one path for fetch
* preprocess now in tinygrad, plus test fix for that
* updated tests
* fix tests
* no class labels needed
* Add files via upload
* Update showcase.md
* Update showcase.md
* added safe tensors as weights, and tests fix for that
* safe tensors test
* using safe_load
* using tinygrad functions now to load weights
* update tests
---------
Co-authored-by: r3sist-uniq <amanmatreja@gmail.com>
Co-authored-by: r3sist <72573738+r3sist-uniq@users.noreply.github.com>
* safetensors test
* safe_save
* load back with real safetensors
* bugfix in device name. add simple torch_load
* it works for llama, but it's slower...
* mmap
* no intermediate
* load mmaped
* readinto speed
* not ready yet
* revert that
make it work out of the box for new users.
the default configuration of train_efficientnet is to use the smaller cifar
dataset. import datasets.imagenet tries to open imagenet_class_index.json
and will fail, unless user has already downloaded it.
* Add ResNet inference test and cannon
* Test with ResNet50
* test_car works with resnet fix
* Add KiTS19 dataset
* KiTS19: Implement iterate
* No batch load for this dataset
* Save results on iterate
* Implement dice score
* Add data prep and eval functions
* Resolve shape issue
* Conversion works but wrong values
* Segfaults when load_from_pretrained is called
* Fix segfault and assign properly
* Final result generated, though very slow
* Store and load final result to save time
* Fix typo in finalize
* Score computes
* More bug fixes, dice score is very low
* Working broken code
* Assign output values to result
* Getting a much higher score now
* Fix dataset preprocessing
* Mean DICE score of 88.5
* Ugh, typo
* Attempt to reimplement model
* Rename layers
* Tiny model works, kinda
* Accuracy? gone
* Implement InstanceNorm and match torch
* Test instance norm 2d and 3d
* Combined input block with downsample block
* Tiny model works, support strided convtranspose
* Commands to download dataset
* Clean up a bit
* unet3d_v2 -> unet3d
* Remove duplicated code
* Oops, put tests back
* add retinanet with resnet backbone
* adds resnext to support loading retinanet pretrained on openimages
* object detection post processing with numpy
* data is downloaded and converted to coco format with fiftyone
* data loading and mAP evaluation with pycocotools
* remove fiftyone dep
* * eval freq
* fix model timing
* del jit for last batch
* faster accumulate
* feat: add mlperf bert model
* feat: switch to nn.Embedding
* clean+fix: fix formatting
* feat: add simple downloader
* feat: metrics
* feat: don't actually need exact match
* feat: doing a run
* feat: set eps on the layernorms
* clean+fix: cleaner impl + hopefully fixed
* feat: move dataset initialization into iterate
* feat: move tokenizer out of iterate
* clean+fix: cleaner + working
* clean: cleanup
* fix: fix metrics
* feat: need to use original bert gelu + download vocab
* feat: make directory if it doesn't exist yet
* feat: jit go brrr
* feat: promote Embedding to nn
* fix: fix failing test
* feat: add test with jit
* feat: rewrite embedding to no longer need stacked for loops
* clean+fix: don't know how that happened
* feat: initial rnn-t
* feat: working with BS>1
* feat: add lstm test
* feat: test passing hidden
* clean: cleanup
* feat: specify start
* feat: way faster lstm & model
* fix: default batch size
* feat: optimization
* fix: fix metrics
* fix: fix feature splicing
* feat: cleaner stacktime
* clean: remove unused import
* clean: remove extra prints
* fix: fix tests and happy llvm
* feat: have the librispeech dataset in its own dir
* clean: unused variable
* feat: no longer need numpy for the embedding + slightly more memory efficient lstm
* fix: forgot to remove something that broke tests
* feat: use relative paths
* feat: even faster
* feat: remove pointless transposes in StackTime
* fix: correct forward
* feat: switch to soundfile for loading and fix some leaks
* feat: add comment about initial dataset setup
* feat: jit more things
* feat: default batch size back to 1
larger than 1 is broken again :(
and even in the reference implementation it gives worse results