* Time travel for evaluation
* Fix source script path
* Exit script if given version doesn't exist
* Exit on failure
* Update README
* Change scripts of all other benchmarks
* Modify README files
* Fix logic_reasoning README
* Add draft evaluation code for EDA, using ChatGPT as the temporary agent for now
* Update README.md
* Delete frontend/package.json
* Revert the irrelevant changes
* Revert package.json
* Use ChatGPT as the CodeActAgent
* Integrate with OpenDevin
* Update evaluation/EDA/README.md
* Update evaluation/EDA/README.md
* Use Poetry to manage packages
* Integrate with OpenDevin
* Minor update
* Minor update
* Update Poetry
* Update README
* Clean up infer scripts
* Add run_infer script and improve README
* Log final success, final message, and ground truth
---------
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
Co-authored-by: yufansong <yufan@risingwave-labs.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>