OpenHands

mirror of https://github.com/All-Hands-AI/OpenHands.git synced 2026-01-09 14:57:59 -05:00

Author	SHA1	Message	Date
Xingyao Wang	da548d308c	[agent] LLM-based editing (#3985 ) Co-authored-by: Tim O'Farrell <tofarr@gmail.com> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Robert Brennan <accounts@rbren.io> Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-10-22 04:51:44 +08:00
Alejandro Cuadron Lafuente	a9a593bb21	[Fix] Added support to specify the platform on which the runtime image should be built. (#4402 ) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Xingyao Wang <xingyao@all-hands.dev> Co-authored-by: mamoodi <mamoodiha@gmail.com> Co-authored-by: tofarr <tofarr@gmail.com> Co-authored-by: Robert Brennan <contact@rbren.io>	2024-10-20 09:19:05 +08:00
Xingyao Wang	91308ba4dc	feat: clean-up retries RemoteRuntime & add FatalErrorObservation (#4485 )	2024-10-18 17:23:13 +00:00
Jiayi Pan	c1b323a076	Show actual dataset name in swebench log directory (#4417 )	2024-10-17 10:32:38 +08:00
Xingyao Wang	84a578ad20	[test] remove integration tests from CI & move them into evaluation (#4447 )	2024-10-17 05:38:23 +08:00
mamoodi	6f2e678028	Fix eval output path in case of @ char (#4416 )	2024-10-15 22:45:08 +00:00
Abhijeetsingh Meena	173018eb58	fix: Resolves HumanEval Inference by replacing task_id with instance_id (#4364 ) Co-authored-by: Harshit Surana <surana.h@gmail.com>	2024-10-15 15:18:38 +00:00
Xingyao Wang	50c13aad98	[Eval] Improve SWE-Bench Eval harness: multi-run support & entry script simplification (#4396 )	2024-10-15 21:34:52 +08:00
Xingyao Wang	25f9413965	[Eval] Fix eval stuck when `result` is too large for pbar (#4361 )	2024-10-14 22:08:34 +08:00
Xingyao Wang	4dfc7a7ef0	[Eval] Add a more lightweight / easier-to-use SWE-Bench output visualizer (#4360 ) Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-10-14 02:09:01 +00:00
Xingyao Wang	b23c7aab5a	[eval] stop set sid in eval (#4311 )	2024-10-10 11:47:27 +08:00
Robert Brennan	45fb4fb9bc	allow reconnecting to a runtime (#4223 )	2024-10-09 16:37:52 +00:00
Engel Nyst	e6847e9e61	Move agenthub within openhands (#4130 )	2024-10-08 00:34:18 +00:00
Alejandro Cuadron Lafuente	a3571ec510	[Fix] Error when trying to pull all docker evaluation containers (#4244 )	2024-10-08 05:03:36 +08:00
Aditya Bharat Soni	0809d26f4d	fix: Allow evaluation benchmarks to pass image urls in run_controller() instead of simply passing strings (#4100 ) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2024-10-07 15:37:08 -04:00
Xingyao Wang	01ae54a69d	fix swebench repo/version being string (#4241 )	2024-10-07 22:01:42 +08:00
Xingyao Wang	245334e89d	[eval] improve update output script for swe-bench (#4180 )	2024-10-04 15:10:03 +00:00
Xingyao Wang	80a631361b	eval: update aiderbench readme (#4209 )	2024-10-04 09:26:12 -04:00
Xingyao Wang	9cc9b19958	eval: improve swebench infer error handling and retry (#4205 )	2024-10-04 07:09:56 -05:00
Xingyao Wang	0c2a35b256	[eval] update aider bench scripts (#4203 )	2024-10-04 02:23:06 +00:00
tofarr	152f99c64f	Chore Bump python version (#3545 )	2024-10-03 13:40:55 -04:00
Xingyao Wang	53a015f718	fix: make llm_completions optional to fix `eval_infer.py` (#4148 )	2024-10-02 03:55:03 +08:00
mamoodi	0144caaf1f	Update eval doc for remote runtime (#4145 )	2024-10-01 13:14:36 -04:00
Xingyao Wang	1109637efb	Update instruction for new version of eval runtime-api (#4128 )	2024-09-30 23:48:38 +00:00
Xingyao Wang	8d6eda3623	fix eval_infer.sh to correctly copy SWE-Bench logs (#4111 )	2024-09-29 18:39:18 -05:00
tobitege	c3bbe604eb	(fix) Fix logging in shared eval file to prevent key disclosure (#4108 )	2024-09-28 19:33:16 +00:00
Xingyao Wang	81b3cd71b3	[eval] log evaluating warnings directly to console (#4026 )	2024-09-26 03:42:32 +08:00
Xingyao Wang	1b1d8f0b02	[eval] Use `imap_unorderd` for parallizing evaluation (#4040 )	2024-09-24 20:47:27 +00:00
Xingyao Wang	a66e738957	[eval] use mp Pool instead ProcessPoolExecutor (#4025 )	2024-09-24 23:59:06 +08:00
Ikko Eltociear Ashimine	c84495830e	[eval] update swe_bench/README.md (#3990 )	2024-09-23 11:03:09 +02:00
Xingyao Wang	714e46f29a	[eval] save eventstream & llm completions for SWE-Bench run_infer (#3923 )	2024-09-22 04:39:13 +00:00
Xingyao Wang	b13ed017d8	[eval] add git patch post-processing for SWE-Bench eval_infer (#3980 )	2024-09-20 15:33:53 +00:00
Engel Nyst	8fdfece059	Refactor messages serialization (#3832 ) Co-authored-by: Robert Brennan <accounts@rbren.io>	2024-09-18 23:48:58 +02:00
tofarr	ad0b549d8b	Feat Tightening up Timeouts and interrupt conditions. (#3926 )	2024-09-18 20:50:42 +00:00
Xingyao Wang	5d7f2fd4ae	[eval] Allow evaluation of SWE-Bench patches on `RemoteRuntime` (#3927 ) Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-09-18 16:07:34 -04:00
Engel Nyst	ef09f0fe37	Small fix in readme (#3912 )	2024-09-17 14:33:25 +00:00
Xingyao Wang	f996b31d64	[eval] Fix multi-processing bug (again^3) & allow set EXP_NAME for each `run_infer` (#3907 ) Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-09-17 14:07:58 +00:00
tobitege	52c5abccbf	(enh) Dockerfile.j2: improve env vars for bash and activate in .bashrc (#3871 )	2024-09-17 08:49:04 +02:00
Graham Neubig	243cb492aa	Run pre-commit on all files (#3884 )	2024-09-16 11:07:08 -04:00
Xingyao Wang	2b3925278d	[eval] refactor process instance logic into `update_progress` (#3875 )	2024-09-15 18:47:15 -04:00
Engel Nyst	379f2b6f23	Fix queue length on Macs (#3867 )	2024-09-14 01:11:29 +00:00
Xingyao Wang	3a1b8c093b	[eval] yet another eval fixes on multi-processing (#3854 ) Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-09-13 15:51:22 +00:00
Xingyao Wang	78c5f58adc	refactor & improve retry for the reliability of `RemoteRuntime` & evaluation (#3846 )	2024-09-13 07:37:07 -04:00
Xingyao Wang	797f02ff6f	rename huggingface evaluation benchmark (#3845 )	2024-09-12 18:50:26 +00:00
Xingyao Wang	47d9621742	[eval] SWE-Bench eval usability fixes (#3836 ) * [eval] increase timeout for swebench eval init/complete * allow CmdRunAction to optionally block when .timeout is setted * fix unit test for serialization * fix unit tests for security analyzer * fix integration tests * add more timeout * only check P2P when instances are non-empty; convert P2P and F2P columns to string instead of list --------- Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-09-12 16:33:51 +00:00
Xingyao Wang	2fe2f4c530	[eval] increase timeout for SWEBench eval init/complete (#3829 ) * [eval] increase timeout for swebench eval init/complete * allow CmdRunAction to optionally block when .timeout is setted * fix unit test for serialization * fix unit tests for security analyzer * fix integration tests * add more timeout	2024-09-12 15:20:58 +00:00
Jiayi Pan	43c4a7fff4	Allow Generalized SWE-Bench format for evaluation (#3752 ) * allow generalized swe-bench format * Update run_infer.py * fix linter --------- Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>	2024-09-06 13:05:00 +00:00
Xingyao Wang	688068a44e	Fix issues for running `RemoteRuntime` in parallel on SWE-Bench (#3716 ) * feat: add SWE-bench fullset support * fix instance image list * update eval script and documentation * increase timeout for remote runtime * add push script * handle the case when ret push is an generator * update pbar * set SWE-Bench default to run SWE-Bench lite * add script to cleanup remote runtime * fix the cases when tag is too long * update README * update readme for cleanup * rename od to oh * Update evaluation/swe_bench/README.md Co-authored-by: Graham Neubig <neubig@gmail.com> * Update evaluation/swe_bench/README.md Co-authored-by: Graham Neubig <neubig@gmail.com> * Update evaluation/swe_bench/scripts/cleanup_remote_runtime.sh Co-authored-by: Graham Neubig <neubig@gmail.com> * Update evaluation/swe_bench/scripts/cleanup_remote_runtime.sh Co-authored-by: Graham Neubig <neubig@gmail.com> * Update evaluation/swe_bench/scripts/cleanup_remote_runtime.sh Co-authored-by: Graham Neubig <neubig@gmail.com> * gets API key and Runtime from env var --------- Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-09-05 10:34:31 +08:00
Xingyao Wang	d8a87d7ccb	[Eval] Make SWE-Bench run_infer.sh to default to run SWE-Bench Lite (#3704 ) * feat: add SWE-bench fullset support * fix instance image list * update eval script and documentation * increase timeout for remote runtime * add push script * handle the case when ret push is an generator * update pbar * set SWE-Bench default to run SWE-Bench lite	2024-09-04 00:58:14 +08:00
Xingyao Wang	d283420ac2	feat: add SWE-bench fullset support (#3477 ) * feat: add SWE-bench fullset support * fix instance image list * update eval script and documentation * add push script * handle the case when ret push is an generator * update pbar	2024-09-02 20:28:52 -04:00

360 Commits