mirror of
https://github.com/Significant-Gravitas/AutoGPT.git
synced 2026-01-30 17:38:17 -05:00
This PR adds hybrid search functionality combining semantic embeddings with traditional text search for improved store listing discovery. ### Changes 🏗️ - Add `embeddings.py` - OpenAI-based embedding generation and similarity search - Add `hybrid_search.py` - Combines vector similarity with text matching for better search results - Add `backfill_embeddings.py` - Script to generate embeddings for existing store listings - Update `db.py` - Integrate hybrid search into store database queries - Update `schema.prisma` - Add embedding storage fields and indexes - Add migrations for embedding columns and HNSW index for vector search ### Architecture Decisions 🏛️ **Fail-Fast Approach (No Silent Fallbacks)** We explicitly chose NOT to implement graceful degradation when hybrid search fails. Here's why: ✅ **Benefits:** - Errors surface immediately → faster fixes - Tests verify hybrid search actually works (not just fallback) - Consistent search quality for all users - Forces proper infrastructure setup (API keys, database) ❌ **Why Not Fallback:** - Silent degradation hides production issues - Users get inconsistent results without knowing why - Tests can pass even when hybrid search is broken - Reduces operational visibility **How We Prevent Failures:** 1. Embedding generation in approval flow (db.py:1545) 2. Error logging with `logger.error` (not warning) 3. Clear error messages (ValueError explains what's wrong) 4. Comprehensive test coverage (9/9 tests passing) If embeddings fail, it indicates a real infrastructure issue (missing API key, OpenAI down, database issues) that needs immediate attention, not silent degradation. ### Test Coverage ✅ **All tests passing (1625 total):** - 9/9 hybrid_search tests (including fail-fast validation) - 3/3 db search integration tests - Full schema compatibility (public/platform schemas) - Error handling verification ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Test hybrid search returns relevant results - [x] Test embedding generation for new listings - [x] Test backfill script on existing data - [x] Verify search performance with embeddings - [x] Test fail-fast behavior when embeddings unavailable #### For configuration changes: - [x] `.env.default` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes - [x] Configuration: Requires `openai_internal_api_key` in secrets --------- Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>