OpenHands

mirror of https://github.com/All-Hands-AI/OpenHands.git synced 2026-04-29 03:00:45 -04:00

Author	SHA1	Message	Date
Xingyao Wang	4ce3b9094a	Revert "(feat): Prompt engineering to remind o1 to generate a patch" (#4846)	2024-11-08 16:12:57 +00:00
Alejandro Cuadron Lafuente	a6810fa6ad	(feat): Prompt engineering to remind o1 to generate a patch (#4807) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Xingyao Wang <xingyao@all-hands.dev> Co-authored-by: mamoodi <mamoodiha@gmail.com> Co-authored-by: tofarr <tofarr@gmail.com> Co-authored-by: Robert Brennan <contact@rbren.io>	2024-11-08 03:10:18 +00:00
Xingyao Wang	53390d9885	Fix issue #4583: [Bug]: Unable to pull the full SWE-Bench test set (#4813) Co-authored-by: openhands <openhands@all-hands.dev>	2024-11-07 22:35:20 +08:00
OpenHands	025dac5d8f	Fix issue #4776: [Bug]: Files are not uploaded to the environment (SWE-Bench) (#4795)	2024-11-06 19:05:06 +00:00
Engel Nyst	eeb2342509	Refactor history/event stream (#3808)	2024-11-05 03:36:14 +01:00
Xingyao Wang	966da7b7c8	feat(agent, CodeAct 2.2): native CodeAct support for Browsing (#4667) Co-authored-by: tofarr <tofarr@gmail.com>	2024-11-05 00:27:27 +08:00
Xingyao Wang	9c2b48ff5d	fix(eval): SWE-Bench instance with upper-case instance id (#4649)	2024-10-30 21:24:18 +00:00
Xingyao Wang	6d19c93d19	[eval] add evaluation workflow (#4489) Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-10-29 13:52:25 +00:00
Xingyao Wang	ae13171194	feat(agent): CodeAct with function calling (#4537) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: tobitege <10787084+tobitege@users.noreply.github.com> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: tofarr <tofarr@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-10-29 11:06:33 +08:00
Xingyao Wang	1f23dc89b6	fix(eval): add runtime.connect to all eval harness (#4565)	2024-10-26 00:41:30 +08:00
Xingyao Wang	7340b78962	feat(eval): rewrite log_completions to save completions to directory (#4566)	2024-10-25 16:36:11 +00:00
Xingyao Wang	2d5b360505	refactor: re-organize different runtime implementations into an impl folder (#4346) Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-10-23 10:10:03 +00:00
Xingyao Wang	da548d308c	[agent] LLM-based editing (#3985) Co-authored-by: Tim O'Farrell <tofarr@gmail.com> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Robert Brennan <accounts@rbren.io> Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-10-22 04:51:44 +08:00
Alejandro Cuadron Lafuente	a9a593bb21	[Fix] Added support to specify the platform on which the runtime image should be built. (#4402) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Xingyao Wang <xingyao@all-hands.dev> Co-authored-by: mamoodi <mamoodiha@gmail.com> Co-authored-by: tofarr <tofarr@gmail.com> Co-authored-by: Robert Brennan <contact@rbren.io>	2024-10-20 09:19:05 +08:00
Xingyao Wang	91308ba4dc	feat: clean-up retries RemoteRuntime & add FatalErrorObservation (#4485)	2024-10-18 17:23:13 +00:00
Jiayi Pan	c1b323a076	Show actual dataset name in swebench log directory (#4417)	2024-10-17 10:32:38 +08:00
Xingyao Wang	50c13aad98	[Eval] Improve SWE-Bench Eval harness: multi-run support & entry script simplification (#4396)	2024-10-15 21:34:52 +08:00
Xingyao Wang	4dfc7a7ef0	[Eval] Add a more lightweight / easier-to-use SWE-Bench output visualizer (#4360) Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-10-14 02:09:01 +00:00
Xingyao Wang	b23c7aab5a	[eval] stop set sid in eval (#4311)	2024-10-10 11:47:27 +08:00
Robert Brennan	45fb4fb9bc	allow reconnecting to a runtime (#4223)	2024-10-09 16:37:52 +00:00
Engel Nyst	e6847e9e61	Move agenthub within openhands (#4130)	2024-10-08 00:34:18 +00:00
Alejandro Cuadron Lafuente	a3571ec510	[Fix] Error when trying to pull all docker evaluation containers (#4244)	2024-10-08 05:03:36 +08:00
Aditya Bharat Soni	0809d26f4d	fix: Allow evaluation benchmarks to pass image urls in run_controller() instead of simply passing strings (#4100) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2024-10-07 15:37:08 -04:00
Xingyao Wang	01ae54a69d	fix swebench repo/version being string (#4241)	2024-10-07 22:01:42 +08:00
Xingyao Wang	245334e89d	[eval] improve update output script for swe-bench (#4180)	2024-10-04 15:10:03 +00:00
Xingyao Wang	9cc9b19958	eval: improve swebench infer error handling and retry (#4205)	2024-10-04 07:09:56 -05:00
tofarr	152f99c64f	Chore Bump python version (#3545)	2024-10-03 13:40:55 -04:00
mamoodi	0144caaf1f	Update eval doc for remote runtime (#4145)	2024-10-01 13:14:36 -04:00
Xingyao Wang	1109637efb	Update instruction for new version of eval runtime-api (#4128)	2024-09-30 23:48:38 +00:00
Xingyao Wang	8d6eda3623	fix eval_infer.sh to correctly copy SWE-Bench logs (#4111)	2024-09-29 18:39:18 -05:00
Ikko Eltociear Ashimine	c84495830e	[eval] update swe_bench/README.md (#3990)	2024-09-23 11:03:09 +02:00
Xingyao Wang	714e46f29a	[eval] save eventstream & llm completions for SWE-Bench run_infer (#3923)	2024-09-22 04:39:13 +00:00
Xingyao Wang	b13ed017d8	[eval] add git patch post-processing for SWE-Bench eval_infer (#3980)	2024-09-20 15:33:53 +00:00
tofarr	ad0b549d8b	Feat Tightening up Timeouts and interrupt conditions. (#3926)	2024-09-18 20:50:42 +00:00
Xingyao Wang	5d7f2fd4ae	[eval] Allow evaluation of SWE-Bench patches on `RemoteRuntime` (#3927) Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-09-18 16:07:34 -04:00
Engel Nyst	ef09f0fe37	Small fix in readme (#3912)	2024-09-17 14:33:25 +00:00
Xingyao Wang	f996b31d64	[eval] Fix multi-processing bug (again^3) & allow set EXP_NAME for each `run_infer` (#3907) Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-09-17 14:07:58 +00:00
tobitege	52c5abccbf	(enh) Dockerfile.j2: improve env vars for bash and activate in .bashrc (#3871)	2024-09-17 08:49:04 +02:00
Xingyao Wang	3a1b8c093b	[eval] yet another eval fixes on multi-processing (#3854) Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-09-13 15:51:22 +00:00
Xingyao Wang	78c5f58adc	refactor & improve retry for the reliability of `RemoteRuntime` & evaluation (#3846)	2024-09-13 07:37:07 -04:00
Xingyao Wang	797f02ff6f	rename huggingface evaluation benchmark (#3845)	2024-09-12 18:50:26 +00:00
Xingyao Wang	47d9621742	[eval] SWE-Bench eval usability fixes (#3836) * [eval] increase timeout for swebench eval init/complete * allow CmdRunAction to optionally block when .timeout is setted * fix unit test for serialization * fix unit tests for security analyzer * fix integration tests * add more timeout * only check P2P when instances are non-empty; convert P2P and F2P columns to string instead of list --------- Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-09-12 16:33:51 +00:00
Xingyao Wang	2fe2f4c530	[eval] increase timeout for SWEBench eval init/complete (#3829) * [eval] increase timeout for swebench eval init/complete * allow CmdRunAction to optionally block when .timeout is setted * fix unit test for serialization * fix unit tests for security analyzer * fix integration tests * add more timeout	2024-09-12 15:20:58 +00:00
Jiayi Pan	43c4a7fff4	Allow Generalized SWE-Bench format for evaluation (#3752) * allow generalized swe-bench format * Update run_infer.py * fix linter --------- Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>	2024-09-06 13:05:00 +00:00
Xingyao Wang	688068a44e	Fix issues for running `RemoteRuntime` in parallel on SWE-Bench (#3716) * feat: add SWE-bench fullset support * fix instance image list * update eval script and documentation * increase timeout for remote runtime * add push script * handle the case when ret push is an generator * update pbar * set SWE-Bench default to run SWE-Bench lite * add script to cleanup remote runtime * fix the cases when tag is too long * update README * update readme for cleanup * rename od to oh * Update evaluation/swe_bench/README.md Co-authored-by: Graham Neubig <neubig@gmail.com> * Update evaluation/swe_bench/README.md Co-authored-by: Graham Neubig <neubig@gmail.com> * Update evaluation/swe_bench/scripts/cleanup_remote_runtime.sh Co-authored-by: Graham Neubig <neubig@gmail.com> * Update evaluation/swe_bench/scripts/cleanup_remote_runtime.sh Co-authored-by: Graham Neubig <neubig@gmail.com> * Update evaluation/swe_bench/scripts/cleanup_remote_runtime.sh Co-authored-by: Graham Neubig <neubig@gmail.com> * gets API key and Runtime from env var --------- Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-09-05 10:34:31 +08:00
Xingyao Wang	d8a87d7ccb	[Eval] Make SWE-Bench run_infer.sh to default to run SWE-Bench Lite (#3704) * feat: add SWE-bench fullset support * fix instance image list * update eval script and documentation * increase timeout for remote runtime * add push script * handle the case when ret push is an generator * update pbar * set SWE-Bench default to run SWE-Bench lite	2024-09-04 00:58:14 +08:00
Xingyao Wang	d283420ac2	feat: add SWE-bench fullset support (#3477) * feat: add SWE-bench fullset support * fix instance image list * update eval script and documentation * add push script * handle the case when ret push is an generator * update pbar	2024-09-02 20:28:52 -04:00
Xingyao Wang	090c911a50	(refactor) Make `Runtime` class synchronous (#3661) * change runtime to be synchronous * fix test runtime with the new interface * fix arg * fix eval * fix missing config attribute * fix plugins * fix on_event by revert it back to async * update upload_file endpoint * fix argument to upload file * remove unncessary async for eval; fix evaluation run in parallel * use asyncio to run controller for eval * revert file upload * truncate eval test result output	2024-08-30 01:37:03 +00:00
Xingyao Wang	8b1f207d39	feat: support remote runtime (#3406) * feat: refactor building logic into runtime builder * return image name * fix testcases * use runtime builder for eventstream runtime * have runtime builder return str * add api_key to sandbox config * draft remote runtime * remove extra if clause * initialize runtime based on box class * add build logic * use base64 for file upload * get runtime image prefix from API * replace ___ with _s_ to make it a valid image name * use /build to start build and /build_status to check the build progress * update logging * fix exit code * always use port * add remote runtime * rename runtime * fix tests import * make dir first if work_dir does not exists; * update debug print to remote runtime * fix exit close_sync * update logging * add retry for stop * use all box class for test keep prompt * fix test browsing * add retry stop * merge init commands to save startup time * fix await * remove sandbox url * support execute through specific runtime url * fix file ops * simplify close * factor out runtime retry code * fix exception handling * fix content type error (e.g., bad gateway when runtime is not ready) * add retry for wait until alive; add retry for check image exists * Revert "add retry for wait until alive;" This reverts commit `dd013cd268`. * retry when wait until alive * clean up msg * directly save sdist to temp dir for _put_source_code_to_dir * support running testcases in parallel * tweak logging; try to close session * try to close session even on exception * update poetry lock * support remote to run integration tests * add warning for workspace base on remote runtime * set default runtime api * remove server runtime * update poetry lock * support running swe-bench (n=1) eval on remoteruntime * add a timeout of 30 min * add todo for docker namespace * update poetry loc	2024-08-29 15:53:37 +00:00
tobitege	9c39f07430	(enh) Aider-Bench: make resumable with skip_num arg (#3626) * added optional START_ID env flag to resume from that instance id * prepare_dataset: fix comparisons by using instance id's as int * aider bench complete_runtime: close runtime to close container * added matrix display of instance id for logging * fix typo in summarize_results.py saying summarise_results * changed start_id to skip_num to skip rows from dataset (start_id wasn't supportable) * doc changes about huggingface spaces to temporarily point back to OD	2024-08-28 15:42:01 +00:00


127 Commits