OpenHands

mirror of https://github.com/All-Hands-AI/OpenHands.git synced 2026-04-29 03:00:45 -04:00

Author	SHA1	Message	Date
Engel Nyst	eeb2342509	Refactor history/event stream (#3808)	2024-11-05 03:36:14 +01:00
Xingyao Wang	1f23dc89b6	fix(eval): add runtime.connect to all eval harness (#4565)	2024-10-26 00:41:30 +08:00
Xingyao Wang	2d5b360505	refactor: re-organize different runtime implementations into an impl folder (#4346) Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-10-23 10:10:03 +00:00
Xingyao Wang	da548d308c	[agent] LLM-based editing (#3985) Co-authored-by: Tim O'Farrell <tofarr@gmail.com> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Robert Brennan <accounts@rbren.io> Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-10-22 04:51:44 +08:00
Xingyao Wang	50c13aad98	[Eval] Improve SWE-Bench Eval harness: multi-run support & entry script simplification (#4396)	2024-10-15 21:34:52 +08:00
Xingyao Wang	b23c7aab5a	[eval] stop set sid in eval (#4311)	2024-10-10 11:47:27 +08:00
Aditya Bharat Soni	0809d26f4d	fix: Allow evaluation benchmarks to pass image urls in run_controller() instead of simply passing strings (#4100) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2024-10-07 15:37:08 -04:00
Xingyao Wang	80a631361b	eval: update aiderbench readme (#4209)	2024-10-04 09:26:12 -04:00
Xingyao Wang	0c2a35b256	[eval] update aider bench scripts (#4203)	2024-10-04 02:23:06 +00:00
tofarr	152f99c64f	Chore Bump python version (#3545)	2024-10-03 13:40:55 -04:00
tobitege	dbb671a8a5	logname fix; improve test calling instruction (#3666)	2024-08-30 17:15:31 +02:00
Xingyao Wang	090c911a50	(refactor) Make `Runtime` class synchronous (#3661) * change runtime to be synchronous * fix test runtime with the new interface * fix arg * fix eval * fix missing config attribute * fix plugins * fix on_event by revert it back to async * update upload_file endpoint * fix argument to upload file * remove unncessary async for eval; fix evaluation run in parallel * use asyncio to run controller for eval * revert file upload * truncate eval test result output	2024-08-30 01:37:03 +00:00
tobitege	c875a5fb77	(feat) Add Aider bench output visualizer (#3643) * aider-bench: add visualization to summarize script and readme * added example cost and actions histogram images for readme * moved dependencies to evaluation section	2024-08-29 05:03:44 +00:00
tobitege	9c39f07430	(enh) Aider-Bench: make resumable with skip_num arg (#3626) * added optional START_ID env flag to resume from that instance id * prepare_dataset: fix comparisons by using instance id's as int * aider bench complete_runtime: close runtime to close container * added matrix display of instance id for logging * fix typo in summarize_results.py saying summarise_results * changed start_id to skip_num to skip rows from dataset (start_id wasn't supportable) * doc changes about huggingface spaces to temporarily point back to OD	2024-08-28 15:42:01 +00:00
Raj Maheshwari	0cdeb83b17	Enabling of unittests in aider benchmark should be optional. (#3620)	2024-08-27 17:25:55 +00:00
Raj Maheshwari	789f15a5db	Allow the Agent to run uniittests for verification. (#3609) * Allow the Agent to run uniittests for verification. * minor bugfix - removed artifact	2024-08-27 06:22:01 +00:00
tobitege	8fcf0817d4	(eval) Aider_bench: add eval_ids arg to run specific instance id's (#3592) * add eval_ids arg to run specific instance id's; fix/extend README * fix description in parser for --eval-ids * fix test_arg_parser.py to account for added arg * fix typo in README to say "summarize" instead of "summarise" for script	2024-08-27 00:49:26 +08:00
Graham Neubig	f9088766e8	Allow setting of runtime container image (#3573) * Add runtime container image setting * Fix typo in test * Fix sandbox base container image * Update variables * Update to base_container_image * Update tests/unit/test_config.py Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> * Fixed eval * Fixed container_image * Fix typo --------- Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>	2024-08-25 23:05:41 +00:00
Raj Maheshwari	11d8d05b1a	[Fix] Metrics should be updated when agent reaches max iterations. (#3549)	2024-08-23 02:28:16 +00:00
Raj Maheshwari	80f88e14cd	[Feat] Aider Benchmark (#3507) * [Feat] Aider Benchmark * [Add] README.md	2024-08-21 18:05:41 +00:00

20 Commits