* start LLM app, tons of cleanup required. target is a 200-line ollama
* kind of works
* simpler
* add k/v cache
* with SYM=1, it loops
* no rope cache
* simpler
* more cleanups
* cleanups
* works
* argparse and comments
* from gguf
* generate is a function
* no copy from cpu
* fix max context pass in
* test
* improve test
* ai2_arc
* fix 8B, use less ram
* 136 lines
* merge view infinite loop test
* adjust condition in `x//d -> x//(-d)*-1`
* Fix division by zero in add views
* adjust offset end
* fix typo in comment
* add target to test_merge_views_variable
* fix view incorrectly being masked
* simplify strides and offset of the new view to canonicalize
* remove print in test
---------
Co-authored-by: qazal <qazal.software@gmail.com>
* print inputs to get_program in process replay [pr]
* colors
* keep dataclass default escapes
* Revert "keep dataclass default escapes"
This reverts commit c6db7e8a7a.
* note for ast_repr
* add that back
* initial commit
* add qr + expand svd to full matrix
* add odd number support
* add linalg tests
* qr supports dims of arbitrary size
* add qr tests
* svd supports dims of arbitrary size
* small cleanup
* improvements over svd batch handling
* improve linalg tests
* make u_pad match q shape
* add nonfull matrix tests
* little less verbose nonfull svd test
* added dtypes on svd + return vt instead of v
* lint
* more lint
* lint + set seed
* small fix
* small lint
* lint
* add int casting to indices and shapes
* remove int from shape tuple in svd
* small cleanup
* add return types
* reuse inverse_permute
* refactoring
* whitespace
* remove regularization term to prevent bad outputs on ill conditioned matrices
* remove seed
* refactor
* lint
* refactor
* spacing
* remove clone
* line reduction
* smarter heuristic for iterations_per_round
* add big test
* lint
* turns out no constant needed?
* wrap tests
* some small matrices need the constant
* remove realize
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* WebGPU on Windows
* Fix dawn-python install
* New test
* pydeps
* Minor fix
* Only install dawn-python on windows webgpu
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* fix process replay diff in PYTHON device [pr]
The PYTHON backend pickles and encodes UOps, the encoded binary can't be
directly diffed in process replay.
* note
* try
* ruff check --fix
* no skip test
* hmmmmmmm I don't get this D:
* run CI again
* why is PYTHON device faster than CPU?
* run CI again and fix lint
* actually doesn't PYTHON device make sense here?
* see cpu speed again
* Revert "see cpu speed again"
This reverts commit 1e366f2256.
* trigger CI
* pretty good
---------
Co-authored-by: chenyu <chenyu@fastmail.com>