* update the backend to fix torch deprecation warning
* use param_hook to avoid full backward hook needlessly firing on inputs which do not require gradients
* fix indentation
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* nak works
* TestOps::test_add works
* testop has no crashes
* fix bool casts
* fix typo
* add disassemble
* RANGE and locals/regs
* simplify NAKCompiler
* disass cleanup
* cleanup nir codegen
* almost all tests passing
* cleanup notes in extra/
* old notes
* only import nak if NIR=1
* fix new SPECIAL syntax
* fix local/shared memory
* more tests passing
* add DEFINE_VAR support
* llvmpipe kinda works
* diskcache
* some mypy stuff
* lvp passing test_ops.py
* fix imports
* actually fix imports
* remove 'stdout'
* fix llvm import
* fix mypy issues
* nicer errors
* simpler test_dtype skips
* test lvp in CI
* fix github action syntax
* fix more actions typos
* switch to mesa 25.1.0
* diskcache_put
* better generation for lvp nir_options
* b64encode shader blobs
* Revert diskcache changes
This reverts commits 930fa3de8a and 8428c694b3.
* general cleanup
* better error messages
* fix llvm import
* fix windows tests
* link with libm and libgcc_s
* fix some errors
* don't check for 'float4'
* NIR uses pointer arithmetic
* use tinymesa
* bump tinymesa
* bump tinymesa again
* update lvp nir_options
* print nir shader with DEBUG
* simplify LVPCompiler
* more tests
* "gated" STORE
* NAK is cacheable
* more tests
* all tests pass locally for NAK
* test autogen in CI
* autogen deps
* more deps
* fix uop_gc
* fix macos
* mypy
* save 2 lines
* save two more lines
* save 1 line
* save 4 lines
* save more lines
* Revert "save more lines"
This reverts commit dd3a720c5a.
* save more lines
* fix LVP on windows
* refactor
* reorganize some code
* refactor lib_gpu
* move LVP check
* out of order loads
* remove support.mesa
* bump tinymesa version
* simplify LVP jit
* macos
* macos ci
* shell: bash
* testing
* more testing
* compute brew prefix
* stupid typo
* actually fix
* lib
* stdout on macos
* inline gallivm_compile_module
* Revert "inline gallivm_compile_module"
This reverts commit b65983b151.
* elf macos
* semicolon
* inherit from CPULLVMCompiler
* ruff
* disas test
* fix libm linking
* default is fine actually
* arm works
* add elf loader link test
* fix NAK beam
* pylint is too smart by half
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* tbgpu
* works
* cleaner
* this
* zero size
* h
* fix
* simpler
* prio over usb
* c
* not needed
* linter
* this way
* mappings
* mypy
* mypy
* mypy 2
* nn
* feat: initial tinyfs device
* feat: don't allow compute on tinyfs device
* feat: tensor helpers to load and store
* feat: bufferview for tinyfs
* fix: keep copy sizes correct
* fix: recv large
* clean: unneeded
* feat: comment
* clean: unneeded
* clean: remove
* clean: remove
* feat: get request tag
* feat: rename to cloud
* feat: send request_id
* feat: start computing tree
* feat: compute store tree on this side
* feat: jank chunked load
* feat: more debugging
* feat: rename to just load and store
* feat: correct chunk count
* fix: fix load for < 1mb
* feat: comments
* feat: don't truncate on block devices
* feat: better way of testing block device
* feat: don't need to pad that much
* feat: connect to nodes directly on load
* feat: cache connections
* feat: don't hard code chunk size
* feat: close mmap when closing file handle
* feat: don't overwrite stuff on disk if storing from disk
* clean: debug print
* fix: close mmap
* feat: await workers
* feat: fast copy from tinyfs to disk
* feat: don't copy to device on last
* feat: use single socket per device
* feat: raid in tinyfs
* clean: remove import
* clean: type
* feat: maintain single event loop
* feat: lower worker count
* feat: use connection pool
* feat: fetch mapping in its own process
* fix: release lock
* feat: don't fetch if exists
* feat: req id only on stores
* feat: always fetch
* fix: rangeify
* feat: allow specifying raid root
* fix: dealloc buffer
* feat: start support non 0 offset
* clean: use cleaner
* feat: don't pass to threadpool
* clean: typing
* Slice to unbind symbolic
* use vmax for now
* assert shape in reshape is valid
* update test_symbolic_ops to use shrink instead of reshape
* remove infer_with_bound_values for now
* symbolic output doesn't have symbolic strides
* symbolic jit tests use shrink to unregister symbolic
* update test
* update more tests
* wrap vmax in int()
* only create a new st if the store is not an assign
* unwrap st
* comments
* start cpu threading
* fix
* fix2
* fix
* hacks?
* threads
* minor
* no dsp
* dsp 2
* n
* more
* test
* xm
* cleaner
* readable
* f
* reorder
* when no threads
* rangeify
* typos
* not needed
* reapply
* remove this
* linter
* fixed cpu count in ci
* fix
* fixes
* rm
* typo
* sort based on speed
* test if test works in ci
* Revert "test if test works in ci"
This reverts commit 1f05edb531.
* do not pad thread
* var_vals is str,int
* remove imports
* remove print
* fix test
* change var_vals in hcq
* update test_hcq
* fix multitensor _device_num var
* fix syminfer test
* shorten line
* p.vars stays list[Variable]
* shorten line
* vars is back to tuple[Variable, ...]
* change var_vals in extra
* change var_vals from shapetracker
* var_vals is str:int
* fix signature
* POSTOPT=2 work
* bugfixes
* add chain in one place
* tensor cores match
* better hcopt check
* match from old
* Change POSTOPT ContextVar value to 0
* we didn't need to check that
* ** simple kernel to replace Kernel for postopt
* support old
* fix beam
* beaming
* beam on old
* bring tensor cores back
* raise
* postbeam
* test ops passes on mac
* skip that
* postopt default
* gate that
* fix tensor cores
* a few test fixes
* dsp fix
* tc fix
* loop
* support swap
* test_gemv
* fix beam for variable
* test opts from high level stuff
* range annoying
* compile slow
* metal slow
* better beam
* no POSTBEAM
* fix nolocals
* hc opt mostly works
* put that back
* lil
* some work
* fix that
* POSTOPT 2
* fix tests
* no postopt 2
* work
* back
* padded tensor cores
* shift_to
* postopt 0 passes?
* write PADTO
* fix padded tensor cores
* compare hcopt
* 18000 lines
* should pass tests
* fix rangeify
* put types back
* Modify tests and start work towards removing symbolic reshape
* Refactor symbolic reshape
* fix small error
* much cleaner + fix more tests
* Can remove this now
* Update test_symbolic_ops and test_tiny
* Couple more tests
* Unused import
* More tests and add EXPAND to Tensor.empty
* Fix test beam search
* all int
* Fix rangeify by adding shrink
* Remove OOB check and so fix test_symbolic_jit
* test_symbolic_jit doesn't need OOB Context anymore either
* Should remove that test now
* Cleanups part 1
* fix linters
* Final cleanups
* Don't reassign inside for loop
---------
Co-authored-by: chenyu <chenyu@fastmail.com>