* var_vals is str,int
* remove imports
* remove print
* fix test
* change var_vals in hcq
* update test_hcq
* fix multitensor _device_num var
* fix syminfer test
* shorten line
* p.vars stays list[Variable]
* shorten line
* vars is back to tuple[Variable, ...]
* change var_vals in extra
* change var_vals from shapetracker
* var_vals is str:int
* fix signature
* fix extract_dataset + tests
* add CI
* sops.gz itself is same as master
* yml + gzip -c + ge
* don't commit that
* bump limit to 1000
* axis=7
* test_tiny
* Kernel.apply_opts [pr]
updated all `for opt in`. also updated a few test_linearizer tests to not implicitly depend on hand_coded_optimization
* not you yet
* most of the work from the uops2 branch
* schedule
* realize
* kernel
* lowerer
* search
* green
* merge uops with ops
* Revert "merge uops with ops"
This reverts commit 1408a59f12.
* fix benchmark
* remove extra dedup
* test/external/fuzz_linearizer: fix for new AST changes
also add beautiful_mnist failures
* add CLANG and LLVM to test_failure_35 failed_platforms
* fix test_linearizer_failure names
* search: add BEAM_VERIFY option to validate search results
refactor fuzz_linearizer comparison to allow it to be used for
BEAM_VERIFY in device.py
* search: fix to verify the beam_search result and not the fastest
* search: fix typing and clean up
* device: remove imports from test and add LOGKERN options
LOGKERN output can be used with test/external/verify_kernel.py
to validate correctness
* fix example in verify_kernel.py
* cleanup fixes
* fix to use f-strings
* these asserts should pass
* fix that assert
* ALU dtypes
* acc dtype for group_for_reduce
* cast image ALUs to the base dtype
* remove all casts from linearizer
* fix argmax
* fix multinomial
* fix __getitem__
* Revert "fix __getitem__"
This reverts commit 62ad719bfa.
* fix MemBuffer outputs being wrong when there is an arange + ALU with a different dtype
e.g. fancy slicing (int, float), bert embeddings (int, long)
this should be fixed in lazy instead of having to break the kernel
* cleanup argmax fix
* fix matmul in ints
cast at the end
* fix llama
* skip wrong hardcoded asts in the worlds dataset
* fix llama p2
* cleanup missing parts of the diff
---------
Co-authored-by: George Hotz <geohot@gmail.com>
* stable diffusion < 324ms
* revert swap action
* fix tests due to more sum splitting
* REDUCEOP_SPLIT_THRESHOLD env var
* added from unaligned np test (#2134)
* align cpu buffer before copy into cl buffer (#2135)
* remove shelve from handcode_resnet50_opt.py (#2139)
* Add dictionary keys to reduce db size (#2131)
* work
* ignore beam cache
* dictionary keys are generic
* minor db cleanups
* fix baseline and extract dataset
* fix training
* log likelihood
* more lin to feats
* sts
* training policynet
* net sort of works
* dedup
* refactor, stupid new actions
* fix uops deduping
* BEAM_ESTIMATE
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
Co-authored-by: imaolo <56898718+imaolo@users.noreply.github.com>