* demo doesn't work on my device for some reason and throws "Error: GPUPipelineError: [Invalid ShaderModule] is invalid" inside the setupNet func
* because of that, JS halts execution of the rest of the code and the screen shows "loading..." forever
* added try/catch here to report the error properly
* [WIP]: implementation of VITS TTS model
* Implemented VITS model, moved all code to examples/vits.py
* Added support for vctk model, auto download, and cleanups
* Invoke tensor.realize() before measuring inference time
* Added support for mmts-tts model, extracted TextMapper class, cleanups
* Removed IPY dep, added argument parser, cleanups
* Tiny fixes to wav writing
* Simplified the code in a few places, set different log levels for some prints
* Some refactoring, added support for uma_trilingual model (anime girls)
* Fixed bug where embeddings are loaded with the same backing tensor, oops
* Added emotional embed support, added cjks + voistock models
- voistock is a multilingual model with over 2k anime characters
- cjks is a multilingual model with 24 speakers
both are kinda bad for English though :c
* Removed `Tensor.Training=False` (not needed and wrong, oops)
* Changed default model and speaker to vctk with speaker 6
* Ported rational_quadratic_spline function to fully use tinygrad ops, no numpy
* Removed accidentally pushed test/spline.py
* Some slight refactors
* Replaced masked_fill with tensor.where
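For reference, the masked_fill → where swap relies on a simple equivalence; a minimal numpy sketch (names illustrative, not the actual tinygrad code):

```python
import numpy as np

def masked_fill(x, mask, value):
    # masked_fill(mask, value) is just where(mask, value, x):
    # take `value` at masked positions, keep x elsewhere
    return np.where(mask, value, x)

x = np.array([1.0, 2.0, 3.0, 4.0])
mask = np.array([True, False, True, False])
print(masked_fill(x, mask, 0.0))
```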
* Added y_length estimation, installation instructions, and some cleanups
* Fix overestimation log message.
* Changed default value of `--estimate_max_y_length` to False
This is only useful for larger inputs.
* Removed printing of the phonemes
* Changed default value of `--text_to_synthesize`
* initial commit
* 81 passing
* 105 passing tests
* 148 passing
* CI tests
* install dep on ci
* try opencl pkgs
* try using vulkan
* down to only 6 failing
* refactor
* cleaning up
* another test skipped due to buffer limit
* linter
* segfault
* indent fix
* another segfault found
* small touchups
* Fix max and maxpool tests
* Add constant folding
* Add javascript export script
* better asserts in codegen
* manual upcasting
* reverted token type change
* skip safetensor test due to unsupported type
* Fix efficientnet and all other model tests
* Remove np copy
* fixed indent and missing import
* manually destroy the buffer
* revert back to length
* linter errors
* removed extra val
* skip broken tests
* skipping more tests
* Make the page pretty
* Save model weights as safetensor
* Fix imagenet to c test
* Fix second imagenet to c bug
* Async and parallel kernel compilation
* workgroup support
* reversed local size
* fixed non local bug
* correct local groups
* ci experiment
* removed typo
* Fix define local by using shared memory
* Refactor
* try running on mac
* match metal tests
* add more workers
* scope down tests
* trying windows runner
* fixed windows env
* see how many it can do
* merged master
* refactor
* missed refactor
* increase test suite coverage
* missing import
* whitespace in test_efficientnet.py
* getting there
* fixed reset
* fixed bufs
* switched to cstyle
* cleanup
* min/max rename
* one more linter issue
* fixed demo
* linter
* testing ci chrome
* add unsafe webgpu arg
* add build step
* remove WEBGPU from cmd line
* use module
* try forcing directx
* trying forced metal backend
* temp disable conv2d for CI
* disable conv_transpose2d
---------
Co-authored-by: 0x4d - Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* Rename in files
* Move files
* Moved to extra/datasets as suggested
* Changes to files
* Fixed stupid mistake
---------
Co-authored-by: terafo <terafo@protonmail.com>
* Fixes + improved test coverage for helpers.py
- added exception handling in `proc`; without it, if an exception was thrown, the thread would hang
- made `_early_exec_process` catch any Exception; before, if an exception was thrown before the process was started, it would hang the thread
* Made `_early_exec_process` catch any Exception
Otherwise, if an exception was thrown before the process was started, it would hang the thread. For example, a TypeError for an argument passed to `subprocess.check_output`.
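The hang-avoidance pattern described above can be sketched with the stdlib: the worker catches any Exception and ships it back over the queue, so the consumer never blocks forever (a minimal illustration, not the actual `_early_exec_process` code):

```python
import threading, queue

def run_in_thread(fn, *args):
    # run fn in a worker thread; catch ANY exception and send it back,
    # otherwise the consumer blocking on q.get() would hang forever
    q = queue.Queue()
    def worker():
        try:
            q.put((True, fn(*args)))
        except Exception as e:
            q.put((False, e))
    threading.Thread(target=worker, daemon=True).start()
    ok, result = q.get()
    if not ok:
        raise result  # re-raise the worker's exception in the caller
    return result

print(run_in_thread(lambda a, b: a + b, 2, 3))  # 5
```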
* Fixed `from tinygrad.helpers import Timing` import
oops, for some reason my IDE cleaned that import from extra/helpers.
* Fixed import in llama.py
Another one that I skipped by accident, my bad
* Extracted a class for tests of early exec
* Normalize line endings, Windows uses \r\n
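The normalization itself is a one-liner; a sketch of the idea (function name is illustrative):

```python
def normalize_line_endings(data: bytes) -> bytes:
    # Windows writes \r\n; normalize to \n so comparisons match across platforms
    return data.replace(b"\r\n", b"\n")

assert normalize_line_endings(b"a\r\nb\r\n") == b"a\nb\n"
```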
* Made `cross_process` not a daemon
* MaskRCNN weights loading
* backbone maybe works
* backbone works, but resnet body atol 1e-3
* RPN call, but very wrong output
* fixed topk
* RPN maybe works, not sure about nms
* Fix cursed modules
* add back editorconfig
* Full call, wrong output
* Full call works
* fix mask
* use NMS from retinanet
* Removing extra funcs
* refactor
* readable
* Add example to run model
* remove filter
* Fix split, batched inference is worse
* Fix image sizes
* Matching reference
* merge master
* add filter on top detections
* cuda backend fixed
* add model eval and spec
* convert images to rgb
* fix eval
* simplify examples code
* remove extra code
* meshgrid using tinygrad
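A meshgrid needs no dedicated op: it falls out of reshape plus broadcasting, which is how it can be expressed with tinygrad-style tensor ops. A numpy sketch of the trick (illustrative, not the actual commit code):

```python
import numpy as np

def meshgrid(xs, ys):
    # row vector broadcast down, column vector broadcast across:
    # gives the same grids as np.meshgrid(xs, ys) with default 'xy' indexing
    grid_x = np.broadcast_to(xs.reshape(1, -1), (len(ys), len(xs)))
    grid_y = np.broadcast_to(ys.reshape(-1, 1), (len(ys), len(xs)))
    return grid_x, grid_y
```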
* removing numpy
* roi align, floor, ceil
* remove numpy from level_mapper
* remove numpy from pooler
* Revert "Merge branch 'master' of github.com:kunwar31/tinygrad into mrcnn-inference"
This reverts commit 4b95a3cb49, reversing
changes made to 98f2b1fa2e.
* roi align gather
* fix master merge
* revert to old floor, ceil as ints present in domain
* use log2 op
* fix indexes
* weird bug with ints and gpu
* weird bug with ints and gpu
* refactors, add env var for gather
* floor with contiguous, where
* refactor topk, sort
* remove staticmethod
* refactor stride
* remove log2 mlop
* realize -> contiguous
* refactor forward
* remove num_classes, stride_in_1x1 from state
* refactor forward
* refactoring
* flake8
* removing numpy in anchor gen, use numpy for gather, nonzero, optimize topk
* keep using tinygrad for smaller gathers
* fix empty tensors
* comms
* move from tensor.py
* resnet test passing
* add coco dataset back
* fix spaces
* add test for log2
* no need to create Tensors
* no need to create Tensors
---------
Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
* added SPPF module from yolov8
* added conv_block, bottleneck modules
* cleaned modules
* c2f example
* SPPF changes
* C2f
* fixed and tested bottleneck
* improved detect class
* tested SPPF and conv
* checked c2f
* DFL structure
* fixed dfl
* added dist2bbox function
* added dist2bbox function
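A dist2bbox-style function converts (left, top, right, bottom) distances from an anchor point into box coordinates. A hedged numpy sketch of the conversion (signature illustrative, not the exact commit code):

```python
import numpy as np

def dist2bbox(distance, anchor_points, xywh=True):
    # distance holds (left, top, right, bottom) offsets from each anchor point
    lt, rb = distance[..., :2], distance[..., 2:]
    x1y1 = anchor_points - lt   # top-left corner
    x2y2 = anchor_points + rb   # bottom-right corner
    if xywh:
        c = (x1y1 + x2y2) / 2   # box center
        wh = x2y2 - x1y1        # box width/height
        return np.concatenate([c, wh], axis=-1)
    return np.concatenate([x1y1, x2y2], axis=-1)
```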
* added and tested make_anchors function for the head
* keeping functions above
* creating the detection head
* fixing head
* untested blocks: scale_boxes, clip_boxes, xywh2xyxy, box_iou
* head works
* structure fix
* added darknet (backbone)
* yolov8 neck, and initialize bias function for detection
* fixed spacing
* yolov8 class, init bias, and fixed c2f
* forward pass almost working
* fixed net structure
* init bias not needed, forward pass working
* load weights boilerplate
* load weights done?
* all variants loading!
* post process: clip_boxes, scale_boxes, xywh2xyxy, and box_iou (untested)
* fix scale_boxes
* box_iou fixed and tested
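The pairwise IoU computed by a box_iou-style function can be sketched in a few lines of numpy (boxes as (x1, y1, x2, y2); an illustration of the formula, not the commit's implementation):

```python
import numpy as np

def box_iou(a, b):
    # a: (N, 4), b: (M, 4); returns the (N, M) pairwise IoU matrix
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    lt = np.maximum(a[:, None, :2], b[None, :, :2])   # intersection top-left
    rb = np.minimum(a[:, None, 2:], b[None, :, 2:])   # intersection bottom-right
    inter = np.clip(rb - lt, 0, None).prod(axis=-1)   # 0 if boxes don't overlap
    return inter / (area_a[:, None] + area_b[None, :] - inter)
```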
* created the pre nms function
* fix nms
* fixed load weights, apparently the latest commit broke something, excluding num_batches_tracked
* added letterbox and pre_transform for pre_process function
* fixed letterbox, pre_transform and added preprocess function
* custom NMS done, integrated prepare_boxes and nms, improved box_iou
* added postprocess function till parsing
* added draw_bounding_boxes_and_save function
* testing full flow
* using fetch for class names
* fixed make_anchors + all tinygrad now
* added command line arguments, weight downloading
* single image for now only
* made draw boxes more efficient
* made NMS functions efficient
* made compute_transform better
* v8 working now, inference is done
* prints objects detected in console now
* fixed image loading (pre processing)
* batch post processing
* created initial tests
* fixed bounding box thickness and added get_detected_classes_with_frequency function
* cleaning for testing
* two tests
* added url option for image, removed need for specifying arguments
* tests complete, but lots of things are printed to the screen by ultralytics
* remove parse arguments
* fixed weight location
* fixed colours of classes, and black font when high brightness
* minor changes
* TODOs for later
* removed use of torch, using .npz weights
* fixed tests
* one path for fetch
* preprocess now in tinygrad, plus test fix for that
* updated tests
* fix tests
* no class labels needed
* Add files via upload
* Update showcase.md
* Update showcase.md
* added safe tensors as weights, and tests fix for that
* safe tensors test
* using safe_load
* using tinygrad functions now to load weights
* update tests
---------
Co-authored-by: r3sist-uniq <amanmatreja@gmail.com>
Co-authored-by: r3sist <72573738+r3sist-uniq@users.noreply.github.com>
* safetensors test
* safe_save
* load back with real safetensors
* bugfix in device name. add simple torch_load
* it works for llama, but it's slower...
* mmap
* no intermediate
* load mmaped
* readinto speed
* not ready yet
* revert that
make it work out of the box for new users.
the default configuration of train_efficientnet is to use the smaller cifar
dataset. importing datasets.imagenet tries to open imagenet_class_index.json
and will fail unless the user has already downloaded it.