chore: add claude 4 to verified mode & global replace 3.7 to claude 4 (#8665)

Co-authored-by: openhands <openhands@all-hands.dev>
Xingyao Wang authored 2025-05-24 01:35:30 +08:00, committed by GitHub
parent 5e43dbadcb
commit 31ad7fc175
22 changed files with 45 additions and 27 deletions
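Most of the 22 files change only a single model identifier, so the bulk of this commit could be reproduced with a mechanical replace. A minimal sketch of such a rename pass (the file walk and skip logic are assumptions, not the actual tooling used):

```python
import os

# Old and new model identifiers from this commit.
OLD_ID = "claude-3-7-sonnet-20250219"
NEW_ID = "claude-sonnet-4-20250514"

def replace_model_id(text: str) -> str:
    """Swap every occurrence of the old model id for the new one."""
    return text.replace(OLD_ID, NEW_ID)

def rewrite_tree(root: str) -> int:
    """Rewrite files under `root` in place; returns the number touched."""
    touched = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as f:
                    text = f.read()
            except (UnicodeDecodeError, OSError):
                continue  # skip binaries and unreadable files
            if OLD_ID in text:
                with open(path, "w", encoding="utf-8") as f:
                    f.write(replace_model_id(text))
                touched += 1
    return touched
```

Note that the verified-model lists below are appended to rather than replaced, so those hunks still need edits beyond a pure find-and-replace.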

View File

@@ -24,7 +24,7 @@ on:
       LLM_MODEL:
         required: false
         type: string
-        default: "anthropic/claude-3-7-sonnet-20250219"
+        default: "anthropic/claude-sonnet-4-20250514"
       LLM_API_VERSION:
         required: false
         type: string

View File

@@ -67,7 +67,7 @@ docker run -it --rm --pull=always \
 You'll find OpenHands running at [http://localhost:3000](http://localhost:3000)!
 When you open the application, you'll be asked to choose an LLM provider and add an API key.
-[Anthropic's Claude 3.7 Sonnet](https://www.anthropic.com/api) (`anthropic/claude-3-7-sonnet-20250219`)
+[Anthropic's Claude Sonnet 4](https://www.anthropic.com/api) (`anthropic/claude-sonnet-4-20250514`)
 works best, but you have [many options](https://docs.all-hands.dev/modules/usage/llms).

 ## 💡 Other ways to run OpenHands

View File

@@ -52,4 +52,4 @@ $ poetry run python docs/translation_updater.py
 # ...
 ```
-This process uses `claude-3-7-sonnet-20250219` as base model and each language consumes at least ~30k input tokens and ~35k output tokens.
+This process uses `claude-sonnet-4-20250514` as base model and each language consumes at least ~30k input tokens and ~35k output tokens.
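At roughly ~30k input and ~35k output tokens per language, the per-language cost of this translation pass is easy to estimate. A sketch, assuming Claude Sonnet 4's list pricing of $3 per million input tokens and $15 per million output tokens (the prices are an assumption, not stated in the doc above):

```python
# Assumed list prices in USD per million tokens -- not from the source.
INPUT_PRICE_PER_M = 3.0
OUTPUT_PRICE_PER_M = 15.0

def translation_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of translating one language's docs."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# ~30k input and ~35k output tokens per language, per the doc above.
cost = translation_cost(30_000, 35_000)
```

At those assumed rates this works out to roughly $0.62 per language, dominated by output tokens.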

View File

@@ -13,7 +13,7 @@ recommandations pour la sélection de modèles. Nos derniers résultats d'évalu
 Sur la base de ces résultats et des retours de la communauté, les modèles suivants ont été vérifiés comme fonctionnant raisonnablement bien avec OpenHands :

-- [anthropic/claude-3-7-sonnet-20250219](https://www.anthropic.com/api) (recommandé)
+- [anthropic/claude-sonnet-4-20250514](https://www.anthropic.com/api) (recommandé)
 - [gemini/gemini-2.5-pro](https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/)
 - [deepseek/deepseek-chat](https://api-docs.deepseek.com/)
 - [openai/o3-mini](https://openai.com/index/openai-o3-mini/)

View File

@@ -13,7 +13,7 @@ OpenHandsはLiteLLMでサポートされているあらゆるLLMに接続でき
 これらの調査結果とコミュニティからのフィードバックに基づき、以下のモデルはOpenHandsでうまく動作することが確認されています

-- [anthropic/claude-3-7-sonnet-20250219](https://www.anthropic.com/api) (推奨)
+- [anthropic/claude-sonnet-4-20250514](https://www.anthropic.com/api) (推奨)
 - [gemini/gemini-2.5-pro](https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/)
 - [deepseek/deepseek-chat](https://api-docs.deepseek.com/)
 - [openai/o3-mini](https://openai.com/index/openai-o3-mini/)

View File

@@ -13,7 +13,7 @@ recomendações para seleção de modelos. Nossos resultados de benchmarking mai
 Com base nessas descobertas e feedback da comunidade, os seguintes modelos foram verificados e funcionam razoavelmente bem com o OpenHands:

-- [anthropic/claude-3-7-sonnet-20250219](https://www.anthropic.com/api) (recomendado)
+- [anthropic/claude-sonnet-4-20250514](https://www.anthropic.com/api) (recomendado)
 - [gemini/gemini-2.5-pro](https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/)
 - [deepseek/deepseek-chat](https://api-docs.deepseek.com/)
 - [openai/o3-mini](https://openai.com/index/openai-o3-mini/)

View File

@@ -12,7 +12,7 @@ OpenHands 可以连接到任何 LiteLLM 支持的 LLM。但是它需要一个
 基于这些发现和社区反馈,以下模型已被验证可以与 OpenHands 合理地配合使用:

-- [anthropic/claude-3-7-sonnet-20250219](https://www.anthropic.com/api)(推荐)
+- [anthropic/claude-sonnet-4-20250514](https://www.anthropic.com/api)(推荐)
 - [gemini/gemini-2.5-pro](https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/)
 - [deepseek/deepseek-chat](https://api-docs.deepseek.com/)
 - [openai/o3-mini](https://openai.com/index/openai-o3-mini/)

View File

@@ -23,7 +23,7 @@ This command opens an interactive prompt where you can type tasks or commands an
 1. Set the following environment variables in your terminal:
    - `SANDBOX_VOLUMES` to specify the directory you want OpenHands to access ([See using SANDBOX_VOLUMES for more info](../runtimes/docker#using-sandbox_volumes))
-   - `LLM_MODEL` - the LLM model to use (e.g. `export LLM_MODEL="anthropic/claude-3-7-sonnet-20250219"`)
+   - `LLM_MODEL` - the LLM model to use (e.g. `export LLM_MODEL="anthropic/claude-sonnet-4-20250514"`)
    - `LLM_API_KEY` - your API key (e.g. `export LLM_API_KEY="sk_test_12345"`)
 2. Run the following command:

View File

@@ -23,7 +23,7 @@ To run OpenHands in Headless mode with Docker:
 1. Set the following environment variables in your terminal:
    - `SANDBOX_VOLUMES` to specify the directory you want OpenHands to access ([See using SANDBOX_VOLUMES for more info](../runtimes/docker#using-sandbox_volumes))
-   - `LLM_MODEL` - the LLM model to use (e.g. `export LLM_MODEL="anthropic/claude-3-7-sonnet-20250219"`)
+   - `LLM_MODEL` - the LLM model to use (e.g. `export LLM_MODEL="anthropic/claude-sonnet-4-20250514"`)
    - `LLM_API_KEY` - your API key (e.g. `export LLM_API_KEY="sk_test_12345"`)
 2. Run the following Docker command:

View File

@@ -13,7 +13,7 @@ recommendations for model selection. Our latest benchmarking results can be foun
 Based on these findings and community feedback, these are the latest models that have been verified to work reasonably well with OpenHands:

-- [anthropic/claude-3-7-sonnet-20250219](https://www.anthropic.com/api) (recommended)
+- [anthropic/claude-sonnet-4-20250514](https://www.anthropic.com/api) (recommended)
 - [openai/o4-mini](https://openai.com/index/introducing-o3-and-o4-mini/)
 - [gemini/gemini-2.5-pro](https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/)
 - [deepseek/deepseek-chat](https://api-docs.deepseek.com/)

View File

@@ -57,7 +57,7 @@ def translate_content(content, target_lang):
     system_prompt = f'You are a professional translator. Translate the following content into {target_lang}. Preserve all Markdown formatting, code blocks, and front matter. Keep any {{% jsx %}} tags and similar intact. Do not translate code examples, URLs, or technical terms.'
     message = client.messages.create(
-        model='claude-3-7-sonnet-20250219',
+        model='claude-sonnet-4-20250514',
         max_tokens=4096,
         temperature=0,
         system=system_prompt,

View File

@@ -48,7 +48,7 @@ describe("Content", () => {
       await waitFor(() => {
         expect(provider).toHaveValue("Anthropic");
-        expect(model).toHaveValue("claude-3-7-sonnet-20250219");
+        expect(model).toHaveValue("claude-sonnet-4-20250514");
         expect(apiKey).toHaveValue("");
         expect(apiKey).toHaveProperty("placeholder", "");
@@ -135,7 +135,7 @@ describe("Content", () => {
       );
       const condensor = screen.getByTestId("enable-memory-condenser-switch");
-      expect(model).toHaveValue("anthropic/claude-3-7-sonnet-20250219");
+      expect(model).toHaveValue("anthropic/claude-sonnet-4-20250514");
       expect(baseUrl).toHaveValue("");
       expect(apiKey).toHaveValue("");
       expect(apiKey).toHaveProperty("placeholder", "");
@@ -542,7 +542,7 @@ describe("Form submission", () => {
     // select model
     await userEvent.click(model);
-    const modelOption = screen.getByText("claude-3-7-sonnet-20250219");
+    const modelOption = screen.getByText("claude-sonnet-4-20250514");
     await userEvent.click(modelOption);

     const submitButton = screen.getByTestId("submit-button");
@@ -550,7 +550,7 @@ describe("Form submission", () => {
     expect(saveSettingsSpy).toHaveBeenCalledWith(
       expect.objectContaining({
-        llm_model: "anthropic/claude-3-7-sonnet-20250219",
+        llm_model: "anthropic/claude-sonnet-4-20250514",
         llm_base_url: "",
         confirmation_mode: false,
       }),

View File

@@ -71,6 +71,18 @@ describe("extractModelAndProvider", () => {
       separator: "/",
     });

+    expect(extractModelAndProvider("claude-sonnet-4-20250514")).toEqual({
+      provider: "anthropic",
+      model: "claude-sonnet-4-20250514",
+      separator: "/",
+    });
+
+    expect(extractModelAndProvider("claude-opus-4-20250514")).toEqual({
+      provider: "anthropic",
+      model: "claude-opus-4-20250514",
+      separator: "/",
+    });
+
     expect(extractModelAndProvider("claude-3-haiku-20240307")).toEqual({
       provider: "anthropic",
       model: "claude-3-haiku-20240307",

View File

@@ -100,7 +100,7 @@ const openHandsHandlers = [
         "gpt-4o",
         "gpt-4o-mini",
         "anthropic/claude-3.5",
-        "anthropic/claude-3-7-sonnet-20250219",
+        "anthropic/claude-sonnet-4-20250514",
       ]),
     ),

View File

@@ -279,7 +279,7 @@ function LlmSettingsScreen() {
           <ModelSelector
             models={modelsAndProviders}
             currentModel={
-              settings.LLM_MODEL || "anthropic/claude-3-5-sonnet-20241022"
+              settings.LLM_MODEL || "anthropic/claude-sonnet-4-20250514"
             }
             onChange={handleModelIsDirty}
           />
@@ -342,9 +342,9 @@ function LlmSettingsScreen() {
             name="llm-custom-model-input"
             label={t(I18nKey.SETTINGS$CUSTOM_MODEL)}
             defaultValue={
-              settings.LLM_MODEL || "anthropic/claude-3-7-sonnet-20250219"
+              settings.LLM_MODEL || "anthropic/claude-sonnet-4-20250514"
             }
-            placeholder="anthropic/claude-3-7-sonnet-20250219"
+            placeholder="anthropic/claude-sonnet-4-20250514"
             type="text"
             className="w-[680px]"
             onChange={handleCustomModelIsDirty}

View File

@@ -3,7 +3,7 @@ import { Settings } from "#/types/settings";
 export const LATEST_SETTINGS_VERSION = 5;

 export const DEFAULT_SETTINGS: Settings = {
-  LLM_MODEL: "anthropic/claude-3-7-sonnet-20250219",
+  LLM_MODEL: "anthropic/claude-sonnet-4-20250514",
   LLM_BASE_URL: "",
   AGENT: "CodeActAgent",
   LANGUAGE: "en",

View File

@@ -6,6 +6,8 @@ export const VERIFIED_MODELS = [
   "o4-mini-2025-04-16",
   "claude-3-5-sonnet-20241022",
   "claude-3-7-sonnet-20250219",
+  "claude-sonnet-4-20250514",
+  "claude-opus-4-20250514",
   "deepseek-chat",
 ];
@@ -39,4 +41,6 @@ export const VERIFIED_ANTHROPIC_MODELS = [
   "claude-3-opus-20240229",
   "claude-3-sonnet-20240229",
   "claude-3-7-sonnet-20250219",
+  "claude-sonnet-4-20250514",
+  "claude-opus-4-20250514",
 ];

View File

@@ -167,6 +167,8 @@ VERIFIED_ANTHROPIC_MODELS = [
     'claude-3-opus-20240229',
     'claude-3-sonnet-20240229',
     'claude-3-7-sonnet-20250219',
+    'claude-sonnet-4-20250514',
+    'claude-opus-4-20250514',
 ]
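Both the TypeScript and Python sides keep flat lists of verified model ids, so the "verified" check reduces to membership after stripping any provider prefix. A minimal sketch of that logic (the `is_verified` helper is mine, not the project's):

```python
# Mirrors the Python list extended in this commit.
VERIFIED_ANTHROPIC_MODELS = [
    'claude-3-opus-20240229',
    'claude-3-sonnet-20240229',
    'claude-3-7-sonnet-20250219',
    'claude-sonnet-4-20250514',
    'claude-opus-4-20250514',
]

def is_verified(model: str) -> bool:
    """True if the model (with or without a provider prefix) is verified."""
    bare = model.split('/', 1)[-1]  # drop 'anthropic/'-style prefixes
    return bare in VERIFIED_ANTHROPIC_MODELS
```

Because the check is plain membership, forgetting to append a new id here is exactly why both Claude 4 variants had to be added in two places in this commit.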

View File

@@ -47,7 +47,7 @@ class LLMConfig(BaseModel):
         seed: The seed to use for the LLM.
     """

-    model: str = Field(default='claude-3-7-sonnet-20250219')
+    model: str = Field(default='claude-sonnet-4-20250514')
     api_key: SecretStr | None = Field(default=None)
     base_url: str | None = Field(default=None)
     api_version: str | None = Field(default=None)

View File

@@ -109,7 +109,7 @@ export GIT_USERNAME="your-gitlab-username"  # Optional, defaults to token owner

 # LLM configuration
-export LLM_MODEL="anthropic/claude-3-7-sonnet-20250219"  # Recommended
+export LLM_MODEL="anthropic/claude-sonnet-4-20250514"  # Recommended
 export LLM_API_KEY="your-llm-api-key"
 export LLM_BASE_URL="your-api-url"  # Optional, for API proxies
 ```

View File

@@ -24,7 +24,7 @@ jobs:
       macro: ${{ vars.OPENHANDS_MACRO || '@openhands-agent' }}
       max_iterations: ${{ fromJson(vars.OPENHANDS_MAX_ITER || 50) }}
       base_container_image: ${{ vars.OPENHANDS_BASE_CONTAINER_IMAGE || '' }}
-      LLM_MODEL: ${{ vars.LLM_MODEL || 'anthropic/claude-3-7-sonnet-20250219' }}
+      LLM_MODEL: ${{ vars.LLM_MODEL || 'anthropic/claude-sonnet-4-20250514' }}
       target_branch: ${{ vars.TARGET_BRANCH || 'main' }}
       runner: ${{ vars.TARGET_RUNNER }}
     secrets:

View File

@@ -354,11 +354,11 @@ class TestModelAndProviderFunctions:
         assert result['separator'] == '/'

     def test_extract_model_and_provider_anthropic_implicit(self):
-        model = 'claude-3-7-sonnet-20250219'
+        model = 'claude-sonnet-4-20250514'
         result = extract_model_and_provider(model)
         assert result['provider'] == 'anthropic'
-        assert result['model'] == 'claude-3-7-sonnet-20250219'
+        assert result['model'] == 'claude-sonnet-4-20250514'
         assert result['separator'] == '/'

     def test_extract_model_and_provider_versioned(self):
@@ -380,7 +380,7 @@ class TestModelAndProviderFunctions:
     def test_organize_models_and_providers(self):
         models = [
             'openai/gpt-4o',
-            'anthropic/claude-3-7-sonnet-20250219',
+            'anthropic/claude-sonnet-4-20250514',
             'o3-mini',
             'anthropic.claude-3-5',  # Should be ignored as it uses dot separator for anthropic
             'unknown-model',
@@ -397,7 +397,7 @@ class TestModelAndProviderFunctions:
         assert 'o3-mini' in result['openai']['models']
         assert len(result['anthropic']['models']) == 1
-        assert 'claude-3-7-sonnet-20250219' in result['anthropic']['models']
+        assert 'claude-sonnet-4-20250514' in result['anthropic']['models']
         assert len(result['other']['models']) == 1
         assert 'unknown-model' in result['other']['models']
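The tests above pin down the behavior expected of `extract_model_and_provider`: explicit `provider/model` ids are split on the slash, and bare `claude-*` ids are attributed to Anthropic with an implicit `/` separator. A simplified sketch consistent with those assertions (the real implementation handles more providers and separators):

```python
def extract_model_and_provider(model: str) -> dict:
    """Split 'provider/model' ids; infer Anthropic for bare claude-* ids."""
    if '/' in model:
        provider, name = model.split('/', 1)
        return {'provider': provider, 'model': name, 'separator': '/'}
    if model.startswith('claude-'):
        # Bare Claude ids (e.g. 'claude-sonnet-4-20250514') are Anthropic's.
        return {'provider': 'anthropic', 'model': model, 'separator': '/'}
    # Unknown bare ids fall through to the 'other' bucket upstream.
    return {'provider': '', 'model': model, 'separator': ''}
```

Because Claude 4 keeps the `claude-` prefix, the implicit-provider path needed no structural change in this commit; only the verified-model lists and hard-coded defaults did.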