OpenHands

Mirror of https://github.com/All-Hands-AI/OpenHands.git, synced 2026-01-09 14:57:59 -05:00

Author	SHA1	Message	Date
Robert Brennan	b5e00f577c	Replace All-Hands-AI references with OpenHands (#11287) Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Engel Nyst <engel.nyst@gmail.com> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2025-10-26 01:52:45 +02:00
Tim O'Farrell	4b303ec9b4	Fixes to unblock frontend (#11488) Co-authored-by: Ray Myers <ray.myers@gmail.com>	2025-10-23 14:43:45 -06:00
Xingyao Wang	b082ccc0fb	feat(llm): add support for deepseek and gpt-5-mini, util for token count (#10626) Co-authored-by: openhands <openhands@all-hands.dev>	2025-08-27 11:03:35 +08:00
Xingyao Wang	c2f46200c0	chore(lint): Apply comprehensive linting and formatting fixes (#10287) Co-authored-by: openhands <openhands@all-hands.dev>	2025-08-13 21:13:19 +02:00
Ibragim Badertdinov	19a6b6b618	feat(eval): Support evaluation on SWE-rebench (#10251)	2025-08-12 14:05:43 +00:00
Xingyao Wang	c4f303a07b	chore(eval): Remove eval_infer_remote.sh script and related references (#10157) Co-authored-by: openhands <openhands@all-hands.dev>	2025-08-07 20:46:59 +00:00
xhguo7	9388fef0ef	feat(eval): loc acc evaluation (#8515) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev> Co-authored-by: mamoodi <mamoodiha@gmail.com>	2025-07-11 03:22:35 +08:00
Linghao Zhang	a93b0457c6	feat(eval): Support evaluation on SWE-bench-Live (#9137)	2025-06-15 12:30:47 +00:00
Xuhui Zhou	14498c5e25	Feature/swe run interact (#8714) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2025-05-27 19:35:21 +00:00
Zhaoling Chen	efe287ce34	integrate LocAgent into OpenHands (#7371) Co-authored-by: czlll <gangda@huaihe.usc.edu> Co-authored-by: Hoang Tran <descience.thh10@gmail.com>	2025-05-23 22:42:58 +07:00
Ryan H. Tran	3980ba53c9	Add option to run patch evaluation on Modal (#8607) Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2025-05-23 00:45:45 +07:00
Engel Nyst	637cb0726a	specify condenser config for evals (#8177) Co-authored-by: openhands <openhands@all-hands.dev>	2025-05-21 22:08:57 +02:00
Graham Neubig	f317c03b1b	Fix inconsistent max_iterations in SWE-bench evaluation (#8467) Co-authored-by: openhands <openhands@all-hands.dev>	2025-05-13 02:07:57 +00:00
Graham Neubig	689d3c9046	Update pre-commit hook versions to most recent versions (#8343) Co-authored-by: openhands <openhands@all-hands.dev>	2025-05-08 03:59:13 +00:00
Michael Panchenko	14564b25d6	Fix linting (#7965)	2025-04-21 06:34:40 +08:00
Engel Nyst	9b9b1291fc	[chore] Just linting on swe-bench files (#7918)	2025-04-18 22:12:01 +08:00
Niels Mündler	4b124d5906	Add inference for SWT-Bench (#7201) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev> Co-authored-by: Graham Neubig <neubig@gmail.com> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Calvin Smith <email@cjsmith.io>	2025-04-17 14:49:42 -06:00
Xingyao Wang	ddda30d9b7	fix(eval): iterative evaluation improvements; SWE-Bench multimodal fixes (#7739) Co-authored-by: Juan Michelini <juan@juan.com.uy> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: openhands <openhands@all-hands.dev>	2025-04-09 02:44:03 +08:00
Xingyao Wang	9b9e728cf6	Iterative evaluation with rule-based critic (#7293)	2025-03-17 18:37:35 +00:00
Xingyao Wang	a4d632498c	SWE-Gym rollout stability fix & using a validated SWE-Gym set (#7182) Co-authored-by: Robert Brennan <accounts@rbren.io> Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Graham Neubig <neubig@gmail.com>	2025-03-17 21:15:01 +08:00
Xingyao Wang	9f720a9d69	[eval] SWE-Gym Integration (#6651) Co-authored-by: Robert Brennan <accounts@rbren.io> Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Graham Neubig <neubig@gmail.com>	2025-03-05 20:15:02 +00:00
Xingyao Wang	bbf40c6576	docs: cleanup and update SWE-Bench documentation; and remove the support of non-instance-level image (#7118) Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2025-03-06 03:18:40 +08:00
Xingyao Wang	33780f97d0	[eval] Upgrade SWE-Bench to use official image and latest harness (#6838) Co-authored-by: Robert Brennan <accounts@rbren.io> Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Graham Neubig <neubig@gmail.com>	2025-02-27 08:15:05 -05:00
Mateusz Kwiatkowski	6562297615	Replace shebang with /usr/bin/env bash for improved portability (#6876) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2025-02-24 18:07:28 +00:00
Xingyao Wang	391200510c	fix: revert #5506 for SWE-Bench performance regression (#6491) Co-authored-by: Robert Brennan <accounts@rbren.io>	2025-01-28 22:52:57 +08:00
Xingyao Wang	72af7bbba2	feat(eval): misc SWE-Bench improvement - use different resources for different instances (#6313) Co-authored-by: openhands <openhands@all-hands.dev>	2025-01-17 02:48:41 +08:00
Xingyao Wang	ec70af9412	refactor: Replace pexpect with libtmux in BashSession (#4881) Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Robert Brennan <accounts@rbren.io>	2025-01-04 05:22:13 +08:00
Xingyao Wang	61ebec9ff7	feat(eval): better visualization for comparing two swe-bench runs (#5993)	2025-01-03 02:36:51 +00:00
OpenHands	8975fcd714	Fix issue #5748: Rename "Ran a Jupyter Command" to "Ran a Python Command" in UI (#5749) Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-12-26 23:30:19 +08:00
OpenHands	bfb191b5c7	Fix issue #5739: [Bug]: Move ./evaluation/swe_bench/scripts/cleanup_remote_runtime.sh to general eval utils (#5740)	2024-12-25 17:17:06 -05:00
Xingyao Wang	c333938384	feat(eval): add standard error to swebench summarize outputs (#5700) Co-authored-by: openhands <openhands@all-hands.dev>	2024-12-20 08:39:43 +08:00
Xingyao Wang	9cdb8d06c0	fix(eval): Use cp -r instead of mv for SWE-Bench Initialization (#5659)	2024-12-17 21:21:27 +00:00
Ryan H. Tran	8ae2fb636e	Remove symlink use for swebench setup (#5549)	2024-12-13 22:18:14 +08:00
Engel Nyst	b11e905988	Verify costs script (#5469)	2024-12-10 14:20:53 +01:00
Engel Nyst	455e667739	add cost to summary (#5473)	2024-12-10 03:14:03 +08:00
Xingyao Wang	9908e1b285	[Evaluation]: Log openhands version in eval output folder, instead of agent version (#5394)	2024-12-04 03:33:43 +00:00
Xingyao Wang	990f277132	misc: Support folder-level exp analysis for SWE-Bench `summarize_outputs.py`; Handle CrashLoopBackoff for RemoteRuntime (#5385)	2024-12-03 15:37:21 +00:00
OpenHands	678436da30	Fix issue #5222: [Refactor]: Refactor the evaluation directory (#5223) Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-11-25 08:35:52 -05:00

38 Commits