Compare commits

13 Commits

Author SHA1 Message Date
Engel Nyst
b460d64dfc gemini supports temp, top_p 2025-08-05 22:52:12 +02:00
Engel Nyst
7724df79bc Merge branch 'main' of github.com:All-Hands-AI/OpenHands into gemini-think 2025-08-05 22:40:32 +02:00
Engel Nyst
bac55154e0 Consolidate Gemini performance optimization tests into test_llm.py
- Move all 10 Gemini performance optimization tests from separate file to main test_llm.py
- Tests cover LLMConfig reasoning_effort defaults and LLM thinking budget optimization
- Remove duplicate test file to maintain single source of truth for LLM tests
- All tests passing and integrated with existing test structure
2025-07-30 01:15:23 +02:00
Engel Nyst
d34e29eb08 Apply pre-commit formatting fixes
- Fix trailing whitespace
- Add missing newline at end of file
- Apply ruff formatting for consistent code style
2025-07-30 00:43:07 +02:00
Engel Nyst
23221c11e6 Update openhands/llm/llm.py 2025-07-30 00:31:23 +02:00
Engel Nyst
b2afc984df Refactor and fix Gemini performance optimization tests
- Reduced test count from 18 to 10 by removing redundant tests
- Fixed failing tests that expected old behavior for medium/high reasoning_effort
- Updated tests to match PR #9913 behavior: medium/high reasoning_effort passes through to litellm
- Consolidated similar test cases and removed unnecessary fixtures
- All tests now pass and correctly validate the Gemini thinking budget optimization

Co-authored-by: openhands <openhands@all-hands.dev>
2025-07-29 17:00:39 +02:00
Engel Nyst
9293e2c452 Remove debug logging statements from PR #9913
- Remove tool calls debug logging from conversation_memory.py
- Remove LINE debug logging from cli_runtime.py
- These are cleanup changes from PR #9913

Co-authored-by: openhands <openhands@all-hands.dev>
2025-07-29 16:51:38 +02:00
Engel Nyst
10d7f10fcf Apply final changes from PR #9913
- Update llm.py to use allowed_openai_params and proper kwargs.pop() for reasoning_effort
- Replace test_completion_with_two_positional_args with test_llm_gemini_thinking_parameter
- Ensure Gemini thinking budget optimization works correctly with 128 token budget
2025-07-29 16:47:05 +02:00
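
The `kwargs.pop()` change mentioned in this commit can be illustrated with a small sketch. It is not the code from llm.py: the helper name `use_thinking_budget` is hypothetical, the `{'budget_tokens': 128}` shape of the `thinking` value is assumed from the 128-token budget in the message, and treating `allowed_openai_params` as a list of parameter names forwarded to litellm is also an assumption.

```python
from typing import Any


def use_thinking_budget(kwargs: dict[str, Any], budget_tokens: int = 128) -> dict[str, Any]:
    """Hypothetical helper: swap reasoning_effort for a fixed thinking budget."""
    out = dict(kwargs)  # copy so the caller's dict is left untouched
    # pop() with a default does not raise KeyError when the key is absent
    out.pop('reasoning_effort', None)
    out['thinking'] = {'budget_tokens': budget_tokens}  # assumed value shape
    out['allowed_openai_params'] = ['thinking']  # assumed: names to let through
    return out


params = use_thinking_budget({'model': 'gemini-2.5-pro', 'reasoning_effort': 'low'})
```

Passing a default to `pop()` keeps the call safe whether or not the caller supplied `reasoning_effort`.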
Engel Nyst
e860359440 Apply Gemini performance optimizations from PR #9913
- Update reasoning_effort documentation to apply to all reasoning models
- Add debug print statement for model tracking in LLM class

Changes picked from upstream PR #9913 for Gemini 2.5 Pro performance improvements.
2025-07-27 05:32:34 +02:00
Engel Nyst
0f703ab77a Add comprehensive unit tests for Gemini performance optimizations
- Created 18 test cases covering LLMConfig reasoning_effort defaults and LLM thinking budget optimization
- Tests verify that Gemini models use thinking budget (128 tokens) instead of reasoning_effort
- Tests verify that non-Gemini models use reasoning_effort parameter
- Tests cover various model variants and edge cases
- All tests use proper mocking to avoid real API calls
- Tests follow existing pytest patterns with unittest mocks
2025-07-27 05:29:38 +02:00
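
A test in the style this commit describes (pytest functions with `unittest.mock`, no real API calls) might look roughly like the sketch below. The function under test, `call_llm`, is a hypothetical stand-in defined inline so the example is self-contained; the actual tests exercise the project's `LLM` class, which is not reproduced here.

```python
from unittest.mock import MagicMock


def call_llm(model, reasoning_effort, completion_fn):
    """Hypothetical stand-in for the code under test."""
    kwargs = {'model': model}
    if 'gemini-2.5-pro' in model:
        kwargs['thinking'] = {'budget_tokens': 128}  # assumed value shape
    else:
        kwargs['reasoning_effort'] = reasoning_effort
    return completion_fn(**kwargs)


def test_gemini_uses_thinking_budget():
    mock_completion = MagicMock(return_value={'choices': []})
    call_llm('gemini-2.5-pro', 'low', mock_completion)
    _, call_kwargs = mock_completion.call_args  # (args, kwargs) of the last call
    assert call_kwargs['thinking'] == {'budget_tokens': 128}
    assert 'reasoning_effort' not in call_kwargs


def test_non_gemini_uses_reasoning_effort():
    mock_completion = MagicMock(return_value={'choices': []})
    call_llm('o3', 'high', mock_completion)
    _, call_kwargs = mock_completion.call_args
    assert call_kwargs['reasoning_effort'] == 'high'
    assert 'thinking' not in call_kwargs
```

Because the completion callable is a mock, the tests only inspect the keyword arguments that would have been sent.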
Engel Nyst
4eb442be3d Remove debug print statement
Clean up debugging code that was accidentally included in the PR.
2025-07-27 04:50:04 +02:00
Engel Nyst
0331106526 Update Gemini performance fixes to match latest PR 9913 changes
- Use 'gemini-2.5-pro' in model check instead of exact equality
- Add debug logging for reasoning_effort configuration
- Handle 'none' reasoning_effort value in addition to None and 'low'
- Force thinking budget for all Gemini reasoning_effort values (FIXME comment)
- Set reasoning_effort=None when using thinking budget to avoid conflicts

These changes match the latest implementation in PR 9913 which includes
additional debugging and more comprehensive handling of reasoning_effort values.
2025-07-27 04:44:11 +02:00
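
The switch from exact equality to a substring check matters when the configured model name carries a prefix or suffix. A minimal sketch follows; the helper names are hypothetical and the model strings are illustrative values, not taken from the repository.

```python
def is_gemini_25_pro(model: str) -> bool:
    """Substring check, as described in the commit message."""
    return 'gemini-2.5-pro' in model


def is_unset_or_low(reasoning_effort) -> bool:
    """True for None, the string 'none', and 'low'."""
    return reasoning_effort in (None, 'none', 'low')


# Illustrative model strings (assumptions, not from the source)
prefixed = 'gemini/gemini-2.5-pro'
exact_match = prefixed == 'gemini-2.5-pro'   # False: equality misses the prefix
substring_match = is_gemini_25_pro(prefixed)  # True
```

Note that the message also says the thinking budget is forced for all Gemini `reasoning_effort` values at this commit (marked FIXME), so `is_unset_or_low` only illustrates which values are treated as "unset or low", not a branch that is necessarily taken.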
Engel Nyst
9bc3663adf Apply Gemini 2.5 Pro performance fixes from PR 9913
- Change default reasoning_effort from 'high' to None in llm_config.py
- Add logic to set reasoning_effort to 'high' for non-Gemini models
- Implement optimized thinking budget for Gemini 2.5 Pro (128 tokens)
- Use thinking budget when reasoning_effort is None or 'low' for Gemini
- Pass through other reasoning_effort values to API for Gemini
- Add debug print for model name

These changes achieve ~2.4x speedup for Gemini 2.5 Pro by using
optimized thinking budget instead of full reasoning effort.

Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
2025-07-27 02:05:46 +02:00
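
The branching listed in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the code from llm.py: the function name `resolve_reasoning_kwargs` is hypothetical, the exact-equality model check is inferred from the later commit that replaces it, the `{'budget_tokens': 128}` shape is assumed from the 128-token budget, and applying 'high' to non-Gemini models only when nothing is configured is also an assumption.

```python
from typing import Any, Optional

GEMINI_THINKING_BUDGET = 128  # token budget named in the commit message


def resolve_reasoning_kwargs(model: str, reasoning_effort: Optional[str]) -> dict[str, Any]:
    """Hypothetical helper: choose reasoning-related kwargs for one call."""
    kwargs: dict[str, Any] = {}
    if model == 'gemini-2.5-pro':  # assumption: exact match at this commit
        if reasoning_effort in (None, 'low'):
            # Small fixed budget instead of full reasoning effort
            kwargs['thinking'] = {'budget_tokens': GEMINI_THINKING_BUDGET}
        else:
            # Other values are passed through to the API
            kwargs['reasoning_effort'] = reasoning_effort
    else:
        # Assumption: non-Gemini models fall back to 'high' when unset
        kwargs['reasoning_effort'] = reasoning_effort if reasoning_effort is not None else 'high'
    return kwargs
```

With the config default changed to `None`, a Gemini 2.5 Pro call with no explicit setting would take the thinking-budget path in this sketch, while other models keep the previous 'high' behavior.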

@@ -212,13 +212,15 @@ class LLM(RetryMixin, DebugMixin):
                 logger.debug(
                     f'Gemini model {self.config.model} with reasoning_effort {self.config.reasoning_effort} mapped to thinking {kwargs.get("thinking")}'
                 )
+                kwargs['top_p'] = 1
+                # kwargs['temperature'] = 0
             else:
                 kwargs['reasoning_effort'] = self.config.reasoning_effort
-            kwargs.pop(
-                'temperature'
-            )  # temperature is not supported for reasoning models
-            kwargs.pop('top_p')  # reasoning model like o3 doesn't support top_p
+                kwargs.pop(
+                    'temperature'
+                )  # temperature is not supported for reasoning models
+                kwargs.pop('top_p')  # reasoning model like o3 doesn't support top_p
         # Azure issue: https://github.com/All-Hands-AI/OpenHands/issues/6777
         if self.config.model.startswith('azure'):
             kwargs['max_tokens'] = self.config.max_output_tokens
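
Judging by the title of the top commit ("gemini supports temp, top_p") and the repeated `kwargs.pop` lines in this hunk, the change appears to move the removal of `temperature` and `top_p` into the non-Gemini branch and to set `top_p` to 1 for Gemini. The sketch below illustrates that reading only; the function name `adjust_sampling_params` is hypothetical and the substring model check is taken from an earlier commit message rather than from this hunk.

```python
from typing import Any, Optional


def adjust_sampling_params(
    model: str, kwargs: dict[str, Any], reasoning_effort: Optional[str]
) -> dict[str, Any]:
    """Hypothetical helper mirroring the apparent post-change behavior."""
    out = dict(kwargs)  # work on a copy
    if 'gemini-2.5-pro' in model:
        # Gemini keeps temperature; top_p is pinned to 1 as in the hunk
        out['top_p'] = 1
    else:
        out['reasoning_effort'] = reasoning_effort
        # Other reasoning models (e.g. o3) do not accept these parameters.
        # The hunk calls pop() without a default; a default is used here
        # so the sketch does not raise KeyError when a key is missing.
        out.pop('temperature', None)
        out.pop('top_p', None)
    return out
```

Under this reading, a Gemini call keeps its configured `temperature`, while a call to another reasoning model is sent without either sampling parameter.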