* add mem_layout
* ui
* cleanup
* work
* debugLine work and expander
* tooltip style
* real expand device
* wheel does one thing
* diff
* shows llama oom
* add y axis
* mypy chill
* work
* unittests for the memory layout
* kernel.py no longer permutes reduce axis [pr]
* delete tests that handcode uops
* regen of sops is broken...
* put import back
* just remove that
* disable those tests
* fix extract_dataset + tests
* add CI
* sops.gz itself is same as master
* yml + gzip -c + ge
* don't commit that
* bump limit to 1000
* axis=7
* test_tiny
* refactor count_float4 to take uops as input instead of kernel
* remove some calls to linearize in test_linearizer
* remove some more calls
* remove one more call
* squash commits
* temp fix for const tensor
* actually realizing float16 can only happen in raw_data
* .float -> cast(float) to rerun CI
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* proposal: add option to override opts in the get_program API
* update test_linearizer_rewrite
* state in uops
* update process_replay and names
* empty isn't none
* fix process replay
* minor cleanup on test_tensor_core_opts tests
Tests now notify when skipped
Before, they were silently skipped if the backend didn't have half precision and accumulation
Also cleaned up atol and rtol setup
* refactor test_tensor_core_opts_group
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* change clang -march flag to -mcpu with fp16 disassembly test
* fix
* add capstone to macos dependencies
* just check no cast in test
* rm import
* whoops
* lets check
* move check
* llvm init before cpu check
* try this
* bump autogen llvm version
* also update libclang?
* revert
* add comment
* skip llvm test and add comment
* linter
* move index validation to load/stores
* add name
* add linearizer_failure
* add validate_store with implicit gates
* linearizer_failure_58 is fixed!
* add test_uop_graph test
* rename cond to gate
* test gated load/stores
* use or_casted()
* viz: non blocking UOp tracing
* u.arg
* no if Ops.KERNEL
* drop replace
* switch to weakref.WeakKeyDictionary
* back
* remove ram usage skips, viz works here
* cache on reconstruct
* fix
* add early verbose demo test
* is this how to write tests :s
* is definition drift even a thing? gemini says it is
* clean up
* better
* even better
* try add to CI
* doesn't work quite yet
* much more work to be done
* whoops
* partition the test heh
* skipif
* some nits for better names
* add webgpu test for onnxrunner
* fix reference links
* flush for now