* refine the gemm tuning scripts to reduce the tuning space and improve perf numbers
* add code to support tuning over the full tuning space
* add a function to get the best tuning config
* refine the matmul tutorial example to print out the best tuning config for each input
* add even_k to the gemm kernel heuristics for better performance
* address review comments
* Stop adding multiple architectures to the isa header
* Add a mask for GPU memory loads in the gemm tuning script 'scripts/amd/gemm/matmul.py'
* Move the scripts to a better place: 'scripts/amd/gemm/'
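The "get best tuning config" step above boils down to picking the candidate with the lowest measured time. A minimal sketch of such a helper, assuming a hypothetical `TuningConfig` shape and `(config, time_ms)` benchmark pairs (not the actual `matmul.py` API):

```python
from dataclasses import dataclass

@dataclass
class TuningConfig:
    # Hypothetical tuning parameters; the real script's fields may differ.
    block_m: int
    block_n: int
    block_k: int
    num_warps: int

def get_best_tuning_config(results):
    """Return the config with the lowest measured time.

    `results` is an iterable of (TuningConfig, time_ms) pairs
    produced by benchmarking each candidate in the tuning space.
    """
    best_config, _ = min(results, key=lambda pair: pair[1])
    return best_config

# Example: the 128x128x32 candidate is fastest here.
candidates = [
    (TuningConfig(64, 64, 32, 4), 1.9),
    (TuningConfig(128, 128, 32, 8), 1.2),
    (TuningConfig(256, 64, 64, 8), 1.5),
]
best = get_best_tuning_config(candidates)
print(best.block_m, best.block_n)  # → 128 128
```

Printing this per input shape is what the refined tutorial example does.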
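The even_k heuristic and the masked memory load are two sides of the same issue: when K is a multiple of BLOCK_K every K tile is full and the bounds check on the load can be skipped, otherwise the ragged tail tile must be masked. A pure-Python sketch of that idea (function and parameter names are illustrative, not the kernel's actual code):

```python
def gemm_k_loop(a, b, m, n, k, block_k, even_k):
    """Blocked K-loop matmul illustrating the even_k heuristic.

    a is m x k and b is k x n, both as lists of lists. even_k must
    only be True when k % block_k == 0: then every K tile is full,
    so the bounds check (the "mask" on the memory load) is skipped,
    which is the fast path the kernel heuristic enables.
    """
    acc = [[0.0] * n for _ in range(m)]
    for k0 in range(0, k, block_k):
        if even_k:
            width = block_k               # fast path: no mask needed
        else:
            width = min(block_k, k - k0)  # masked path: clamp the tail tile
        for i in range(m):
            for j in range(n):
                for kk in range(k0, k0 + width):
                    acc[i][j] += a[i][kk] * b[kk][j]
    return acc

a = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
b = [[1, 0],
     [0, 1],
     [1, 0],
     [0, 1]]
# k=4 is a multiple of block_k=2, so even_k can be True.
out = gemm_k_loop(a, b, 2, 2, 4, 2, True)
print(out)  # → [[4.0, 6.0], [12.0, 14.0]]
```

In the real Triton kernel the mask would be passed to the tile load rather than expressed as a clamped loop bound, but the fast/slow path split is the same.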
This is a combination of 7 commits.
use pytorch nightly with root
repro with pytorch unit test
hardcode isROCM to true
set is_cuda to False
ignore cc arg
clean up
match triton-mlir branch
This is a combination of 6 commits.
use local bitcode
This is a combination of 3 commits.
add bitcode to repo
update test
change bitcode path
move bitcode
update path
update scripts
update test
fix path issue