OpenHands

mirror of https://github.com/All-Hands-AI/OpenHands.git synced 2026-01-13 16:58:07 -05:00

Author	SHA1	Message	Date
Xingyao Wang	4ce3b9094a	Revert "(feat): Prompt engineering to remind o1 to generate a patch" (#4846 )	2024-11-08 16:12:57 +00:00
Alejandro Cuadron Lafuente	a6810fa6ad	(feat): Prompt engineering to remind o1 to generate a patch (#4807 ) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Xingyao Wang <xingyao@all-hands.dev> Co-authored-by: mamoodi <mamoodiha@gmail.com> Co-authored-by: tofarr <tofarr@gmail.com> Co-authored-by: Robert Brennan <contact@rbren.io>	2024-11-08 03:10:18 +00:00
Xingyao Wang	53390d9885	Fix issue #4583 : [Bug]: Unable to pull the full SWE-Bench test set (#4813 ) Co-authored-by: openhands <openhands@all-hands.dev>	2024-11-07 22:35:20 +08:00
OpenHands	025dac5d8f	Fix issue #4776 : [Bug]: Files are not uploaded to the environment (SWE-Bench) (#4795 )	2024-11-06 19:05:06 +00:00
Engel Nyst	eeb2342509	Refactor history/event stream (#3808 )	2024-11-05 03:36:14 +01:00
Xingyao Wang	1d2a616be7	Fix issue #4739 : '[Bug]: The agent doesn't know its name' (#4740 ) Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-11-04 21:24:35 +00:00
Xingyao Wang	966da7b7c8	feat(agent, CodeAct 2.2): native CodeAct support for Browsing (#4667 ) Co-authored-by: tofarr <tofarr@gmail.com>	2024-11-05 00:27:27 +08:00
Abhijeetsingh Meena	8857f02083	[Eval] DiscoveryBench OpenHands Integration (#4627 ) Signed-off-by: Abhijeetsingh Meena <abhijeet040403@gmail.com> Co-authored-by: Harshit Surana <surana.h@gmail.com>	2024-11-02 07:24:34 -04:00
Ziru "Ron" Chen	db4e1dbbec	[eval] Add ScienceAgentBench. (#4645 ) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2024-11-01 02:30:55 +08:00
Xingyao Wang	9c2b48ff5d	fix(eval): SWE-Bench instance with upper-case instance id (#4649 )	2024-10-30 21:24:18 +00:00
Xingyao Wang	6d19c93d19	[eval] add evaluation workflow (#4489 ) Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-10-29 13:52:25 +00:00
Xingyao Wang	ae13171194	feat(agent): CodeAct with function calling (#4537 ) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: tobitege <10787084+tobitege@users.noreply.github.com> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: tofarr <tofarr@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-10-29 11:06:33 +08:00
Xingyao Wang	1f23dc89b6	fix(eval): add runtime.connect to all eval harness (#4565 )	2024-10-26 00:41:30 +08:00
Xingyao Wang	7340b78962	feat(eval): rewrite log_completions to save completions to directory (#4566 )	2024-10-25 16:36:11 +00:00
tofarr	c4f5c07be1	Refactor: shorter syntax (#4558 )	2024-10-25 06:45:28 -06:00
Graham Neubig	ce2430180f	Update README.md to fix miniwob name (#4534 )	2024-10-23 18:24:43 +00:00
Xingyao Wang	2d5b360505	refactor: re-organize different runtime implementations into an impl folder (#4346 ) Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-10-23 10:10:03 +00:00
Graham Neubig	54250e3fe2	Update evaluation README.md structure (#4516 )	2024-10-22 14:42:22 +00:00
Xingyao Wang	da548d308c	[agent] LLM-based editing (#3985 ) Co-authored-by: Tim O'Farrell <tofarr@gmail.com> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Robert Brennan <accounts@rbren.io> Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-10-22 04:51:44 +08:00
Alejandro Cuadron Lafuente	a9a593bb21	[Fix] Added support to specify the platform on which the runtime image should be built. (#4402 ) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Xingyao Wang <xingyao@all-hands.dev> Co-authored-by: mamoodi <mamoodiha@gmail.com> Co-authored-by: tofarr <tofarr@gmail.com> Co-authored-by: Robert Brennan <contact@rbren.io>	2024-10-20 09:19:05 +08:00
Xingyao Wang	91308ba4dc	feat: clean-up retries RemoteRuntime & add FatalErrorObservation (#4485 )	2024-10-18 17:23:13 +00:00
Jiayi Pan	c1b323a076	Show actual dataset name in swebench log directory (#4417 )	2024-10-17 10:32:38 +08:00
Xingyao Wang	84a578ad20	[test] remove integration tests from CI & move them into evaluation (#4447 )	2024-10-17 05:38:23 +08:00
mamoodi	6f2e678028	Fix eval output path in case of @ char (#4416 )	2024-10-15 22:45:08 +00:00
Abhijeetsingh Meena	173018eb58	fix: Resolves HumanEval Inference by replacing task_id with instance_id (#4364 ) Co-authored-by: Harshit Surana <surana.h@gmail.com>	2024-10-15 15:18:38 +00:00
Xingyao Wang	50c13aad98	[Eval] Improve SWE-Bench Eval harness: multi-run support & entry script simplification (#4396 )	2024-10-15 21:34:52 +08:00
Xingyao Wang	25f9413965	[Eval] Fix eval stuck when `result` is too large for pbar (#4361 )	2024-10-14 22:08:34 +08:00
Xingyao Wang	4dfc7a7ef0	[Eval] Add a more lightweight / easier-to-use SWE-Bench output visualizer (#4360 ) Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-10-14 02:09:01 +00:00
Xingyao Wang	b23c7aab5a	[eval] stop set sid in eval (#4311 )	2024-10-10 11:47:27 +08:00
Robert Brennan	45fb4fb9bc	allow reconnecting to a runtime (#4223 )	2024-10-09 16:37:52 +00:00
Engel Nyst	e6847e9e61	Move agenthub within openhands (#4130 )	2024-10-08 00:34:18 +00:00
Alejandro Cuadron Lafuente	a3571ec510	[Fix] Error when trying to pull all docker evaluation containers (#4244 )	2024-10-08 05:03:36 +08:00
Aditya Bharat Soni	0809d26f4d	fix: Allow evaluation benchmarks to pass image urls in run_controller() instead of simply passing strings (#4100 ) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2024-10-07 15:37:08 -04:00
Xingyao Wang	01ae54a69d	fix swebench repo/version being string (#4241 )	2024-10-07 22:01:42 +08:00
Xingyao Wang	245334e89d	[eval] improve update output script for swe-bench (#4180 )	2024-10-04 15:10:03 +00:00
Xingyao Wang	80a631361b	eval: update aiderbench readme (#4209 )	2024-10-04 09:26:12 -04:00
Xingyao Wang	9cc9b19958	eval: improve swebench infer error handling and retry (#4205 )	2024-10-04 07:09:56 -05:00
Xingyao Wang	0c2a35b256	[eval] update aider bench scripts (#4203 )	2024-10-04 02:23:06 +00:00
tofarr	152f99c64f	Chore Bump python version (#3545 )	2024-10-03 13:40:55 -04:00
Xingyao Wang	53a015f718	fix: make llm_completions optional to fix `eval_infer.py` (#4148 )	2024-10-02 03:55:03 +08:00
mamoodi	0144caaf1f	Update eval doc for remote runtime (#4145 )	2024-10-01 13:14:36 -04:00
Xingyao Wang	1109637efb	Update instruction for new version of eval runtime-api (#4128 )	2024-09-30 23:48:38 +00:00
Xingyao Wang	8d6eda3623	fix eval_infer.sh to correctly copy SWE-Bench logs (#4111 )	2024-09-29 18:39:18 -05:00
tobitege	c3bbe604eb	(fix) Fix logging in shared eval file to prevent key disclosure (#4108 )	2024-09-28 19:33:16 +00:00
Xingyao Wang	81b3cd71b3	[eval] log evaluating warnings directly to console (#4026 )	2024-09-26 03:42:32 +08:00
Xingyao Wang	1b1d8f0b02	[eval] Use `imap_unorderd` for parallizing evaluation (#4040 )	2024-09-24 20:47:27 +00:00
Xingyao Wang	a66e738957	[eval] use mp Pool instead ProcessPoolExecutor (#4025 )	2024-09-24 23:59:06 +08:00
Ikko Eltociear Ashimine	c84495830e	[eval] update swe_bench/README.md (#3990 )	2024-09-23 11:03:09 +02:00
Xingyao Wang	714e46f29a	[eval] save eventstream & llm completions for SWE-Bench run_infer (#3923 )	2024-09-22 04:39:13 +00:00
Xingyao Wang	b13ed017d8	[eval] add git patch post-processing for SWE-Bench eval_infer (#3980 )	2024-09-20 15:33:53 +00:00

228 Commits