* feat: initial tinyfs device
* feat: don't allow compute on tinyfs device
* feat: tensor helpers to load and store
* feat: bufferview for tinyfs
* fix: keep copy sizes correct
* fix: recv large
* clean: unneeded
* feat: comment
* clean: unneeded
* clean: remove
* clean: remove
* feat: get request tag
* feat: rename to cloud
* feat: send request_id
* feat: start computing tree
* feat: compute store tree on this side
* feat: jank chunked load
* feat: more debugging
* feat: rename to just load and store
* feat: correct chunk count
* fix: fix load for < 1mb
* feat: comments
* feat: don't truncate on block devices
* feat: better way of testing block device
* feat: don't need to pad that much
* feat: connect to nodes directly on load
* feat: cache connections
* feat: don't hard code chunk size
* feat: close mmap when closing file handle
* feat: don't overwrite stuff on disk if storing from disk
* clean: debug print
* fix: close mmap
* feat: await workers
* feat: fast copy from tinyfs to disk
* feat: don't copy to device on last
* feat: use single socket per device
* feat: raid in tinyfs
* clean: remove import
* clean: type
* feat: maintain single event loop
* feat: lower worker count
* feat: use connection pool
* feat: fetch mapping in its own process
* fix: release lock
* feat: don't fetch if exists
* feat: req id only on stores
* feat: always fetch
* fix: rangeify
* feat: allow specifying raid root
* fix: dealloc buffer
* feat: start supporting non-zero offset
* clean: use cleaner
* feat: don't pass to threadpool
* clean: typing
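
The block above sketches a remote tinyfs/cloud device: tensors are loaded and stored in chunks over sockets, with one cached connection per node and a configurable chunk size. Purely as an illustrative sketch (not tinygrad's actual implementation; `CHUNK_SIZE`, `_CONNECTIONS`, `store_file`, and `load_file` are made-up names), the pattern looks roughly like this:

```python
import socket, os, struct

CHUNK_SIZE = 1 << 20                           # hypothetical default; the real chunk size is configurable
_CONNECTIONS: dict[str, socket.socket] = {}    # cache one socket per node/device

def _get_conn(addr: str, port: int = 8080) -> socket.socket:
  """Return a cached connection to a node, creating it on first use."""
  key = f"{addr}:{port}"
  if key not in _CONNECTIONS:
    _CONNECTIONS[key] = socket.create_connection((addr, port))
  return _CONNECTIONS[key]

def store_file(addr: str, path: str):
  """Stream a local file to a node in fixed-size chunks, sending the total size first."""
  conn = _get_conn(addr)
  conn.sendall(struct.pack("<Q", os.path.getsize(path)))
  with open(path, "rb") as f:
    while chunk := f.read(CHUNK_SIZE):
      conn.sendall(chunk)

def load_file(addr: str, path: str, size: int):
  """Receive `size` bytes from a node and write them to disk, chunk by chunk."""
  conn = _get_conn(addr)
  with open(path, "wb") as f:
    remaining = size
    while remaining > 0:
      chunk = conn.recv(min(CHUNK_SIZE, remaining))
      if not chunk: raise ConnectionError("node closed connection early")
      f.write(chunk)
      remaining -= len(chunk)
```

Caching the socket per node is what lets repeated loads skip reconnect cost, and keeping the chunk size a parameter rather than a literal matches the "don't hard code chunk size" commit.
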
* Slice to unbind symbolic
* use vmax for now
* assert shape in reshape is valid
* update test_symbolic_ops to use shrink instead of reshape
* remove infer_with_bound_values for now
* symbolic output doesn't have symbolic strides
* symbolic jit tests use shrink to unregister symbolic
* update test
* update more tests
* wrap vmax in int()
* only create a new st if the store is not an assign
* unwrap st
* comments
* start cpu threading
* fix
* fix2
* fix
* hacks?
* threads
* minor
* no dsp
* dsp 2
* n
* more
* test
* xm
* cleaner
* readable
* f
* reorder
* when no threads
* rangeify
* typos
* not needed
* reapply
* remove this
* linter
* fixed cpu count in ci
* fix
* fixes
* rm
* typo
* sort based on speed
* test if test works in ci
* Revert "test if test works in ci"
This reverts commit 1f05edb531.
* do not pad thread
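
The threading commits above split CPU work across a fixed thread count (pinned in CI) without padding the last chunk. A generic sketch of that partitioning, using only standard-library names rather than anything from tinygrad:

```python
import os
from concurrent.futures import ThreadPoolExecutor

THREADS = int(os.getenv("THREADS", os.cpu_count() or 1))  # CI can pin this to a fixed count

def saxpy_chunk(a: float, x: list[float], y: list[float], start: int, end: int):
  # each worker owns a contiguous, non-overlapping slice of the output
  for i in range(start, end):
    y[i] += a * x[i]

def saxpy_threaded(a: float, x: list[float], y: list[float]):
  n = len(x)
  step = (n + THREADS - 1) // THREADS   # ceil-divide; the last chunk may be short (no padding)
  with ThreadPoolExecutor(max_workers=THREADS) as pool:
    for s in range(0, n, step):
      pool.submit(saxpy_chunk, a, x, y, s, min(s + step, n))

x, y = [1.0] * 10, [2.0] * 10
saxpy_threaded(0.5, x, y)
print(y)  # [2.5, 2.5, ...]
```
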
* var_vals is str,int
* remove imports
* remove print
* fix test
* change var_vals in hcq
* update test_hcq
* fix multitensor _device_num var
* fix syminfer test
* shorten line
* p.vars stays list[Variable]
* shorten line
* vars is back to tuple[Variable, ...]
* change var_vals in extra
* change var_vals from shapetracker
* var_vals is str:int
* fix signature
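
The var_vals commits change the binding map for symbolic variables from Variable-keyed to name-keyed (`dict[str, int]`), while `p.vars` stays a `tuple[Variable, ...]`. A minimal stand-alone sketch of that lookup, with a stand-in `Variable` dataclass rather than tinygrad's real one:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variable:
  # stand-in for a symbolic variable with a name and an inclusive [vmin, vmax] range
  expr: str
  vmin: int
  vmax: int

# program metadata keeps the full Variable objects...
p_vars: tuple[Variable, ...] = (Variable("i", 1, 16), Variable("j", 1, 8))

# ...but runtime bindings are keyed by name only: dict[str, int]
var_vals: dict[str, int] = {"i": 4, "j": 2}

def launch_args(vars: tuple[Variable, ...], vals: dict[str, int]) -> list[int]:
  """Resolve each Variable to its bound value by name, checking it stays in range."""
  out = []
  for v in vars:
    val = vals[v.expr]
    assert v.vmin <= val <= v.vmax, f"{v.expr}={val} out of range [{v.vmin}, {v.vmax}]"
    out.append(val)
  return out

print(launch_args(p_vars, var_vals))  # [4, 2]
```
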
* POSTOPT=2 work
* bugfixes
* add chain in one place
* tensor cores match
* better hcopt check
* match from old
* Change POSTOPT ContextVar value to 0
* we didn't need to check that
* ** simple kernel to replace Kernel for postopt
* support old
* fix beam
* beaming
* beam on old
* bring tensor cores back
* raise
* postbeam
* test ops passes on mac
* skip that
* postopt default
* gate that
* fix tensor cores
* a few test fixes
* dsp fix
* tc fix
* loop
* support swap
* test_gemv
* fix beam for variable
* test opts from high level stuff
* range annoying
* compile slow
* metal slow
* better beam
* no POSTBEAM
* fix nolocals
* hc opt mostly works
* put that back
* lil
* some work
* fix that
* POSTOPT 2
* fix tests
* no postopt 2
* work
* back
* padded tensor cores
* shift_to
* postopt 0 passes?
* write PADTO
* fix padded tensor cores
* compare hcopt
* 18000 lines
* should pass tests
* fix rangeify
* put types back
* Modify tests and start work towards removing symbolic reshape
* Refactor symbolic reshape
* fix small error
* much cleaner + fix more tests
* Can remove this now
* Update test_symbolic_ops and test_tiny
* Couple more tests
* Unused import
* More tests and add EXPAND to Tensor.empty
* Fix test beam search
* all int
* Fix rangeify by adding shrink
* Remove OOB check and so fix test_symbolic_jit
* test_symbolic_jit doesn't need OOB Context anymore either
* Should remove that test now
* Cleanups part 1
* fix linters
* Final cleanups
* Don't reassign inside for loop
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* BOOM
* cache extra/huggingface/models/
* why max buffer size is not 0
* override MAX_BUFFER_SIZE
* less models
* remove more models and change cache dir to already cached dir
* only metal
* less is more?
* remove check ops
* why is this not setting the ENVVAR
* ughhhhh just test in models
* only cpu and gpu
* only cpu actually
* just override it idk
* final
* move extra dependencies up top
* simplification
* fix print
* make README better
* revert ops_disk fix for now
* clean up test_onnx
* remove fashion clip model test because it is too slow
* actually let METAL run this
* fix comment mistake
* fix download path in run_models
* does this work?
* cleanup setup and teardown
* contextvar like this?
* prove model is cached
* do I need to increment DOWNLOAD_CACHE_VERSION?
* see if cached with incremented DOWNLOAD_CACHE_VERSION
* use warnings to see if the model exists
* revert DOWNLOAD_CACHE_VERSION stuff and clean up
* add retry to download
* nit
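
The final commits add a retry to the model download path. A generic sketch of such a wrapper, assuming nothing about tinygrad's actual fetch helper (`download_with_retry` and its parameters are made up):

```python
import time, urllib.request

def download_with_retry(url: str, dest: str, attempts: int = 3, backoff: float = 1.0) -> str:
  """Try the download a few times, sleeping with exponential backoff between failures."""
  for i in range(attempts):
    try:
      urllib.request.urlretrieve(url, dest)
      return dest
    except OSError:
      if i == attempts - 1: raise
      time.sleep(backoff * (2 ** i))  # 1s, 2s, 4s, ...
  return dest
```
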