gemini supports temp, top_p

1 changed file with 6 additions and 4 deletions
--- a/openhands/llm/llm.py
+++ b/openhands/llm/llm.py
@@ -212,13 +212,15 @@ class LLM(RetryMixin, DebugMixin):
                logger.debug(
                    f'Gemini model {self.config.model} with reasoning_effort {self.config.reasoning_effort} mapped to thinking {kwargs.get("thinking")}'
                )
+                kwargs['top_p'] = 1
+                # kwargs['temperature'] = 0

            else:
                kwargs['reasoning_effort'] = self.config.reasoning_effort
-            kwargs.pop(
-                'temperature'
-            )  # temperature is not supported for reasoning models
-            kwargs.pop('top_p')  # reasoning model like o3 doesn't support top_p
+                kwargs.pop(
+                    'temperature'
+                )  # temperature is not supported for reasoning models
+                kwargs.pop('top_p')  # reasoning model like o3 doesn't support top_p
        # Azure issue: https://github.com/All-Hands-AI/OpenHands/issues/6777
        if self.config.model.startswith('azure'):
            kwargs['max_tokens'] = self.config.max_output_tokens
Author	SHA1	Message	Date
Engel Nyst	b460d64dfc	gemini supports temp, top_p	2025-08-05 22:52:12 +02:00
Engel Nyst	7724df79bc	Merge branch 'main' of github.com:All-Hands-AI/OpenHands into gemini-think	2025-08-05 22:40:32 +02:00
Engel Nyst	bac55154e0	Consolidate Gemini performance optimization tests into test_llm.py - Move all 10 Gemini performance optimization tests from separate file to main test_llm.py - Tests cover LLMConfig reasoning_effort defaults and LLM thinking budget optimization - Remove duplicate test file to maintain single source of truth for LLM tests - All tests passing and integrated with existing test structure	2025-07-30 01:15:23 +02:00
Engel Nyst	d34e29eb08	Apply pre-commit formatting fixes - Fix trailing whitespace - Add missing newline at end of file - Apply ruff formatting for consistent code style	2025-07-30 00:43:07 +02:00
Engel Nyst	23221c11e6	Update openhands/llm/llm.py	2025-07-30 00:31:23 +02:00
Engel Nyst	b2afc984df	Refactor and fix Gemini performance optimization tests - Reduced test count from 18 to 10 by removing redundant tests - Fixed failing tests that expected old behavior for medium/high reasoning_effort - Updated tests to match PR #9913 behavior: medium/high reasoning_effort passes through to litellm - Consolidated similar test cases and removed unnecessary fixtures - All tests now pass and correctly validate the Gemini thinking budget optimization Co-authored-by: openhands <openhands@all-hands.dev>	2025-07-29 17:00:39 +02:00
Engel Nyst	9293e2c452	Remove debug logging statements from PR #9913 - Remove tool calls debug logging from conversation_memory.py - Remove LINE debug logging from cli_runtime.py - These are cleanup changes from PR #9913 Co-authored-by: openhands <openhands@all-hands.dev>	2025-07-29 16:51:38 +02:00
Engel Nyst	10d7f10fcf	Apply final changes from PR #9913 - Update llm.py to use allowed_openai_params and proper kwargs.pop() for reasoning_effort - Replace test_completion_with_two_positional_args with test_llm_gemini_thinking_parameter - Ensure Gemini thinking budget optimization works correctly with 128 token budget	2025-07-29 16:47:05 +02:00
Engel Nyst	e860359440	Apply Gemini performance optimizations from PR #9913 - Update reasoning_effort documentation to apply to all reasoning models - Add debug print statement for model tracking in LLM class Changes picked from upstream PR #9913 for Gemini 2.5 Pro performance improvements.	2025-07-27 05:32:34 +02:00
Engel Nyst	0f703ab77a	Add comprehensive unit tests for Gemini performance optimizations - Created 18 test cases covering LLMConfig reasoning_effort defaults and LLM thinking budget optimization - Tests verify that Gemini models use thinking budget (128 tokens) instead of reasoning_effort - Tests verify that non-Gemini models use reasoning_effort parameter - Tests cover various model variants and edge cases - All tests use proper mocking to avoid real API calls - Tests follow existing pytest patterns with unittest mocks	2025-07-27 05:29:38 +02:00
Engel Nyst	4eb442be3d	Remove debug print statement Clean up debugging code that was accidentally included in the PR.	2025-07-27 04:50:04 +02:00
Engel Nyst	0331106526	Update Gemini performance fixes to match latest PR 9913 changes - Use 'gemini-2.5-pro' in model check instead of exact equality - Add debug logging for reasoning_effort configuration - Handle 'none' reasoning_effort value in addition to None and 'low' - Force thinking budget for all Gemini reasoning_effort values (FIXME comment) - Set reasoning_effort=None when using thinking budget to avoid conflicts These changes match the latest implementation in PR 9913 which includes additional debugging and more comprehensive handling of reasoning_effort values.	2025-07-27 04:44:11 +02:00
Engel Nyst	9bc3663adf	Apply Gemini 2.5 Pro performance fixes from PR 9913 - Change default reasoning_effort from 'high' to None in llm_config.py - Add logic to set reasoning_effort to 'high' for non-Gemini models - Implement optimized thinking budget for Gemini 2.5 Pro (128 tokens) - Use thinking budget when reasoning_effort is None or 'low' for Gemini - Pass through other reasoning_effort values to API for Gemini - Add debug print for model name These changes achieve ~2.4x speedup for Gemini 2.5 Pro by using optimized thinking budget instead of full reasoning effort. Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>	2025-07-27 02:05:46 +02:00
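
The parameter handling after commit b460d64dfc can be sketched as a standalone function. This is a minimal illustration, not the code in openhands/llm/llm.py: `apply_reasoning_params` and `GEMINI_THINKING_BUDGET` are hypothetical names, the shape of the `thinking` payload is an assumption (the diff only shows that a `thinking` key is set), and the function assumes the caller has already established that the model is a reasoning model. The 128-token budget and the None/'low'/'none' rule are taken from the commit messages above.

```python
from typing import Any, Optional

# 128-token thinking budget, as stated in the commit messages (hypothetical constant name).
GEMINI_THINKING_BUDGET = 128


def apply_reasoning_params(
    model: str, reasoning_effort: Optional[str], kwargs: dict[str, Any]
) -> dict[str, Any]:
    """Return a copy of kwargs adjusted for a reasoning model (hypothetical helper)."""
    out = dict(kwargs)
    if 'gemini-2.5-pro' in model:
        if reasoning_effort in (None, 'low', 'none'):
            # Assumed payload shape; only the presence of a 'thinking' key is confirmed.
            out['thinking'] = {'budget_tokens': GEMINI_THINKING_BUDGET}
            out.pop('reasoning_effort', None)
        else:
            # Other values (e.g. 'medium', 'high') pass through unchanged.
            out['reasoning_effort'] = reasoning_effort
        # After this commit Gemini keeps temperature, and top_p is set to 1.
        out['top_p'] = 1
    else:
        out['reasoning_effort'] = reasoning_effort
        # Other reasoning models (e.g. o3) do not accept these sampling parameters.
        out.pop('temperature', None)
        out.pop('top_p', None)
    return out


if __name__ == '__main__':
    sampling = {'temperature': 0.0, 'top_p': 0.9}
    print(apply_reasoning_params('gemini/gemini-2.5-pro', None, sampling))
    print(apply_reasoning_params('o3', 'high', sampling))
```

Unlike the `kwargs.pop('temperature')` call in the diff, the sketch passes a default to `pop`, so it does not raise `KeyError` when the caller never set the key. That is a choice made for the example, not a description of the repository's behavior.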