Xingyao Wang
|
cff5697456
|
eval: remove gemini-specific swebench template (#9623)
|
2025-07-08 18:34:23 +00:00 |
|
Linghao Zhang
|
a93b0457c6
|
feat(eval): Support evaluation on SWE-bench-Live (#9137)
|
2025-06-15 12:30:47 +00:00 |
|
Graham Neubig
|
0c307ea12e
|
Lint all files in the repo (#9131)
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
|
2025-06-14 16:25:59 +00:00 |
|
Engel Nyst
|
fd3b4ac8e6
|
Refactor SWE-bench instruction (#8010)
|
2025-06-13 23:27:52 +02:00 |
|
Leander Maben
|
d84befe28f
|
Adding LLM Based Editing capability (#8677)
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Engel Nyst <engel.nyst@gmail.com>
|
2025-06-09 21:57:20 +08:00 |
|
Robert Brennan
|
205f0234e8
|
Rename Conversation to ServerConversation and AppConfig to OpenHandsConfig (#8754)
Co-authored-by: openhands <openhands@all-hands.dev>
|
2025-05-28 21:48:34 +02:00 |
|
Engel Nyst
|
637cb0726a
|
specify condenser config for evals (#8177)
Co-authored-by: openhands <openhands@all-hands.dev>
|
2025-05-21 22:08:57 +02:00 |
|
Xingyao Wang
|
2ecc39ffcc
|
[eval]: disable MCP for SWE-Bench evaluation (#8574)
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Engel Nyst <engel.nyst@gmail.com>
|
2025-05-19 01:32:46 +00:00 |
|
Yueqi Song
|
3ca585b79f
|
Update run_infer.py to incorporate selection of task based on repo (#8509)
|
2025-05-15 12:27:28 +08:00 |
|
Graham Neubig
|
689d3c9046
|
Update pre-commit hook versions to most recent versions (#8343)
Co-authored-by: openhands <openhands@all-hands.dev>
|
2025-05-08 03:59:13 +00:00 |
|
Engel Nyst
|
9b9b1291fc
|
[chore] Just linting on swe-bench files (#7918)
|
2025-04-18 22:12:01 +08:00 |
|
Niels Mündler
|
4b124d5906
|
Add inference for SWT-Bench (#7201)
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Calvin Smith <email@cjsmith.io>
|
2025-04-17 14:49:42 -06:00 |
|
Engel Nyst
|
5e5bf23f9c
|
[Evaluation] Fix KeyError when the instance failed prematurely (#7864)
|
2025-04-15 15:19:31 +00:00 |
|
Engel Nyst
|
d05a6f30e1
|
[Refactor] Rename codeact_* agent options to simple name (#7853)
|
2025-04-15 00:14:13 +02:00 |
|
sp.wack
|
72b5e18898
|
fix(backend): Return 400 if trying to open a binary file (#7825)
|
2025-04-11 22:47:57 +00:00 |
|
Engel Nyst
|
bb98d94b35
|
[evaluation] fix missing metadata (#7819)
|
2025-04-11 16:58:59 +00:00 |
|
juanmichelini
|
53c0c5a07b
|
SWE-bench_verified instruction baseline improvements to 60% (#7546)
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
|
2025-04-10 16:08:27 +00:00 |
|
Xingyao Wang
|
0087082643
|
Improve binary file handling and patch generation in SWE-bench evaluation (#7762)
Co-authored-by: openhands <openhands@all-hands.dev>
|
2025-04-08 22:57:33 +00:00 |
|
Xingyao Wang
|
ddda30d9b7
|
fix(eval): iterative evaluation improvements; SWE-Bench multimodal fixes (#7739)
Co-authored-by: Juan Michelini <juan@juan.com.uy>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: openhands <openhands@all-hands.dev>
|
2025-04-09 02:44:03 +08:00 |
|
Xingyao Wang
|
648c8ffb21
|
(llm): Support OpenHands LM (#7598)
Co-authored-by: mamoodi <mamoodiha@gmail.com>
|
2025-03-31 17:29:31 +00:00 |
|
Xingyao Wang
|
54236f9617
|
[eval] Support SWE-Bench Multimodal (#7122)
Co-authored-by: openhands <openhands@all-hands.dev>
|
2025-03-31 07:42:44 -04:00 |
|
Xingyao Wang
|
9b9e728cf6
|
Iterative evaluation with rule-based critic (#7293)
|
2025-03-17 18:37:35 +00:00 |
|
Xingyao Wang
|
a4d632498c
|
SWE-Gym rollout stability fix & using a validated SWE-Gym set (#7182)
Co-authored-by: Robert Brennan <accounts@rbren.io>
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>
|
2025-03-17 21:15:01 +08:00 |
|
Elena Chistova
|
38e866cde4
|
Fix official SWE-Bench docker image prefix (#7214)
|
2025-03-12 18:23:19 +00:00 |
|
Xingyao Wang
|
a4908f9a75
|
[agent] system message + SWE-Bench instruction improvements (#7018)
|
2025-03-08 00:27:02 +08:00 |
|
Xingyao Wang
|
9f720a9d69
|
[eval] SWE-Gym Integration (#6651)
Co-authored-by: Robert Brennan <accounts@rbren.io>
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>
|
2025-03-05 20:15:02 +00:00 |
|
Xingyao Wang
|
bbf40c6576
|
docs: cleanup and update SWE-Bench documentation; and remove the support of non-instance-level image (#7118)
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
|
2025-03-06 03:18:40 +08:00 |
|
Engel Nyst
|
395c1ea9e3
|
[Refactor] split runtime initialization (create, connect, init) in cli scripts (#7036)
|
2025-03-03 00:19:25 +01:00 |
|
Magic Mai
|
8a58e724c6
|
fix: Remove nested git repositories before adding files in SWE-bench (#6536)
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
|
2025-02-28 01:19:33 +00:00 |
|
Xingyao Wang
|
33780f97d0
|
[eval] Upgrade SWE-Bench to use official image and latest harness (#6838)
Co-authored-by: Robert Brennan <accounts@rbren.io>
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>
|
2025-02-27 08:15:05 -05:00 |
|
Engel Nyst
|
4f98bce6df
|
Add selected_repo to command line (#6949)
|
2025-02-26 20:42:59 +01:00 |
|
Xingyao Wang
|
1a7003a705
|
Add sysbox support to remote runtime for eval; Add memory monitor, stress tests to help debug memory issue (#6684)
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>
|
2025-02-18 20:02:28 +00:00 |
|
tofarr
|
bbfdc62139
|
Fix for issue where retries continue on a closed runtime (#6564)
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
|
2025-02-03 08:44:09 -07:00 |
|
Xingyao Wang
|
1a9971b1bf
|
misc: make RemoteRuntime API timeout configurable (#6518)
Co-authored-by: Robert Brennan <accounts@rbren.io>
|
2025-01-30 06:30:18 +08:00 |
|
Xingyao Wang
|
391200510c
|
fix: revert #5506 for SWE-Bench performance regression (#6491)
Co-authored-by: Robert Brennan <accounts@rbren.io>
|
2025-01-28 22:52:57 +08:00 |
|
Engel Nyst
|
5b7fcfbe1a
|
Disable prompt extensions in SWE-bench (#6391)
|
2025-01-21 17:18:30 +00:00 |
|
Xingyao Wang
|
899c1f8360
|
fix(bash): also show timeout reminder when no_change_timeout is triggered (#6318)
Co-authored-by: Robert Brennan <accounts@rbren.io>
|
2025-01-18 03:31:23 +08:00 |
|
Xingyao Wang
|
72af7bbba2
|
feat(eval): misc SWE-Bench improvement - use different resources for different instances (#6313)
Co-authored-by: openhands <openhands@all-hands.dev>
|
2025-01-17 02:48:41 +08:00 |
|
Xingyao Wang
|
0bed17758f
|
fix: incorrect soft-timeout implementation & fix hard-timeout follow-up command (#6280)
|
2025-01-17 01:27:00 +08:00 |
|
Calvin Smith
|
6e4ff56934
|
feature: Condenser Interface and Defaults (#5306)
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Calvin Smith <calvin@all-hands.dev>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
|
2025-01-08 04:36:30 +08:00 |
|
Xingyao Wang
|
ec70af9412
|
refactor: Replace pexpect with libtmux in BashSession (#4881)
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Robert Brennan <accounts@rbren.io>
|
2025-01-04 05:22:13 +08:00 |
|
Robert Brennan
|
0e4e1b3316
|
Factor out ActionExecutionClient (#5796)
|
2024-12-30 15:32:13 +00:00 |
|
Xingyao Wang
|
581d5ec7a8
|
feat(eval): increase resource factor for remote runtime when previous run failed due to resource (#5709)
|
2024-12-21 01:47:06 +08:00 |
|
Xingyao Wang
|
e9cafb0372
|
chore: Cleanup runtime exception handling (#5696)
|
2024-12-19 17:28:29 +00:00 |
|
Engel Nyst
|
3297e4d5a8
|
Use litellm's modify params (#5636)
|
2024-12-17 21:32:49 +01:00 |
|
OpenHands
|
4998b5de32
|
Fix issue #5559: The turn limit should be measured from the last user interaction (#5560)
Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
|
2024-12-16 16:28:23 -05:00 |
|
Engel Nyst
|
b295f5775c
|
Revert "Fix issue #5609: Use litellm's modify_params with default True" (#5631)
|
2024-12-16 20:39:57 +00:00 |
|
OpenHands
|
09735c7869
|
Fix issue #5609: Use litellm's modify_params with default True (#5611)
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
|
2024-12-16 20:18:45 +01:00 |
|
Engel Nyst
|
4716955960
|
Remove unused codeact-SWE agent (#5600)
Co-authored-by: openhands <openhands@all-hands.dev>
|
2024-12-14 20:49:44 +01:00 |
|
OpenHands
|
678436da30
|
Fix issue #5222: [Refactor]: Refactor the evaluation directory (#5223)
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
|
2024-11-25 08:35:52 -05:00 |
|