* Move regression tests to evaluation/
* use python instead of docker in the script
* add model parameter
* change python to python3
* bug fix
* add python path
* add readme
* a starting point for SWE-Bench evaluation with docker
* fix the swe-bench uid issue
* typo fixed
* fix conda missing issue
* move files based on new PR
* Update doc and gitignore using devin prediction file from #81
* fix typo
* add a sentence
* fix typo in path
* fix path
---------
Co-authored-by: Binyuan Hui <binyuan.hby@alibaba-inc.com>
* adding code to fetch and convert devin's output for evaluation
* update README.md
* update code for fetching and processing devin's outputs