OpenHands

mirror of https://github.com/All-Hands-AI/OpenHands.git synced 2026-01-14 17:27:59 -05:00

Author	SHA1	Message	Date
Xingyao Wang	50c13aad98	[Eval] Improve SWE-Bench Eval harness: multi-run support & entry script simplification (#4396)	2024-10-15 21:34:52 +08:00
Xingyao Wang	0c2a35b256	[eval] update aider bench scripts (#4203)	2024-10-04 02:23:06 +00:00
tobitege	c875a5fb77	(feat) Add Aider bench output visualizer (#3643) * aider-bench: add visualization to summarize script and readme * added example cost and actions histogram images for readme * moved dependencies to evaluation section	2024-08-29 05:03:44 +00:00
tobitege	9c39f07430	(enh) Aider-Bench: make resumable with skip_num arg (#3626) * added optional START_ID env flag to resume from that instance id * prepare_dataset: fix comparisons by using instance id's as int * aider bench complete_runtime: close runtime to close container * added matrix display of instance id for logging * fix typo in summarize_results.py saying summarise_results * changed start_id to skip_num to skip rows from dataset (start_id wasn't supportable) * doc changes about huggingface spaces to temporarily point back to OD	2024-08-28 15:42:01 +00:00
Raj Maheshwari	0cdeb83b17	Enabling of unittests in aider benchmark should be optional. (#3620)	2024-08-27 17:25:55 +00:00
tobitege	8fcf0817d4	(eval) Aider_bench: add eval_ids arg to run specific instance id's (#3592) * add eval_ids arg to run specific instance id's; fix/extend README * fix description in parser for --eval-ids * fix test_arg_parser.py to account for added arg * fix typo in README to say "summarize" instead of "summarise" for script	2024-08-27 00:49:26 +08:00
Raj Maheshwari	80f88e14cd	[Feat] Aider Benchmark (#3507) * [Feat] Aider Benchmark * [Add] README.md	2024-08-21 18:05:41 +00:00

7 Commits