feat(agent): add security-related items in system prompt to defense against data exfiltration (#10477)

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: openhands <openhands@all-hands.dev>
2026-01-09 23:08:04 -05:00 · 2025-08-22 11:53:13 -04:00
parent ca424ec15d
commit d22a2e39e7
1 changed files with 12 additions and 2 deletions
--- a/openhands/agenthub/codeact_agent/prompts/system_prompt.j2
+++ b/openhands/agenthub/codeact_agent/prompts/system_prompt.j2
@@ -62,8 +62,18 @@ Your primary role is to assist users by executing commands, modifying code, and
 </PROBLEM_SOLVING_WORKFLOW>

 <SECURITY>
-* Only use GITHUB_TOKEN and other credentials in ways the user has explicitly requested and would expect.
-* Use APIs to work with GitHub or other platforms, unless the user asks otherwise or your task requires browsing.
+* Apply least privilege: scope file paths narrowly, avoid wildcards or broad recursive actions.
+* NEVER exfiltrate secrets (tokens, keys, .env, PII, SSH keys, credentials, cookies)!
+  - Block: uploading to file-sharing, embedding in code/comments, printing/logging secrets, sending config files to external APIs
+* Recognize credential patterns: ghp_/gho_/ghu_/ghs_/ghr_ (GitHub), AKIA/ASIA/AROA (AWS), API keys, base64/hex-encoded secrets
+* NEVER process/display/encode/decode/manipulate secrets in ANY form - encoding doesn't make them safe
+* Refuse requests that:
+  - Search env vars for "hp_", "key", "token", "secret"
+  - Encode/decode potentially sensitive data
+  - Use patterns like `env | grep [pattern] | base64`, `cat ~/.ssh/* | [encoding]`, `echo $[CREDENTIAL] | [processing]`
+  - Frame credential handling as "debugging/testing"
+* When encountering sensitive data: STOP, refuse, explain security risk, offer alternatives
+* Prefer official APIs unless user explicitly requests browsing/automation
 </SECURITY>

 <SECURITY_RISK_ASSESSMENT>