* fix syntax issues in imagenet_download.py
* use cloudpickle in cross_process to make it work in Python 3.9+
* add cross_process test
* prevent unpickling on every function call
* add cloudpickle to setup.py
* add support for args/kwargs
* Use generators in any(..) instead of lists for better best-case
* Use generators in all(...) instead of lists
* enable R1729 in .pylintrc
* revert import sorting
---------
Co-authored-by: Anselm Coogan <anselm@scandit.com>
* perf: remove extra function, include in cached getitem
* perf: only calculate hash once per node
---------
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
* new upcast works
* float4 try
* fix unaligned float4
* disallow unaligned access
* upcast dim
* maybe good now
* fix gpu half
* vstore_half4
* fix deep image bugs
* improve symbolic to fix issues
* fix symbolic
* cl test
* this maybe
* gcd of 1 is 1
* real fix for old python
* improve fuzzer
* realize hotspots
* no str check
* minor changes
* make this an assert
* faster and more readable
* nicer self.buffers
* tests for weak op + LAZYCACHE=0
* refactor: print formatting for llama timing, report median and individual runs
* feat: back to mean
* fix: whitespace
* fix: add mean to print
---------
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
* Dedup params in optimizer
* Passing the same tensor multiple times in the set of learnable params passed to optimizers can result in models completely failing to learn, but no errors are produced. This dedups tensors to avoid the problem.
* Fix types
* Use new variable to satisfy linter
* Use `helpers.dedup` instead of `set()` to dedup params
* Add test for duped params in optimizers
* MaskRCNN weights loading
* backbone maybe works
* backbone works, but resnet body atol 1e-3
* RPN Call, but veryy wrong output
* fixed topk
* RPN maybe works, not sure about nms
* Fix cursed modules
* add back editorconfig
* Full call, wrong output
* Full call works
* fix mask
* use NMS from retinanet
* Removing extra funcs
* refactor
* readable
* Add example to run model
* remove filter
* Fix split, batched inference is worse
* Fix image sizes
* Matching reference
* merge master
* add filter on top detections
* cuda backend fixed
* add model eval and spec
* convert images to rgb
* fix eval
* simplify examples code
* remove extra code
* meshgrid using tinygrad
* removing numpy
* roi align, floor, ceil
* remove numpy from level_mapper
* remove numpy from pooler
* Revert "Merge branch 'master' of github.com:kunwar31/tinygrad into mrcnn-inference"
This reverts commit 4b95a3cb49, reversing
changes made to 98f2b1fa2e.
* roi align gather
* fix master merge
* revert to old floor, ceil as ints present in domain
* use log2 op
* fix indexes
* weird bug with ints and gpu
* weird bug with ints and gpu
* refactors, add env var for gather
* floor with contiguous, where
* refactor topk, sort
* remove staticmethod
* refactor stride
* remove log2 mlop
* realize -> contiguous
* refactor forward
* remove num_classes, stride_in_1x1 from state
* refactor forward
* refactoring
* flake8
* removing numpy in anchor gen, use numpy for gather, nonzero, optimize topk
* keep using tinygrad for smaller gathers
* fix empty tensors
* comms
* move from tensor.py
* resnet test passing
* add coco dataset back
* fix spaces
* add test for log2
* no need to create Tensors
* no need to create Tensors
---------
Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
* test speed llama
* oops, put it back
* uses the real device codegen
* just do it on the mac
* pp
* is faster?
* Revert "is faster?"
This reverts commit 42db542010.
* disable docker again for less load on CI