mirror of
https://github.com/Significant-Gravitas/AutoGPT.git
synced 2026-01-28 16:38:17 -05:00
## Summary

This PR extends the embedding system to support **blocks** and **documentation** content types in addition to store agents, and introduces **unified hybrid search** across all content types using a single `UnifiedContentEmbedding` table.

### Key Changes

1. **Unified Hybrid Search Architecture**
   - Added `search` tsvector column to `UnifiedContentEmbedding` table
   - New `unified_hybrid_search()` function searches across all content types (agents, blocks, docs)
   - Updated `hybrid_search()` for store agents to use `UnifiedContentEmbedding.search`
   - Removed deprecated `search` column from `StoreListingVersion` table
2. **Pluggable Content Handler Architecture**
   - Created abstract `ContentHandler` base class for extensibility
   - Implemented handlers: `StoreAgentHandler`, `BlockHandler`, `DocumentationHandler`
   - Registry pattern for easy addition of new content types
3. **Block Embeddings**
   - Discovers all blocks using `get_blocks()`
   - Extracts searchable text from: name, description, categories, input/output schemas
4. **Documentation Embeddings**
   - Scans `/docs/` directory for `.md` and `.mdx` files
   - Extracts title from first `#` heading or uses filename as fallback
5. **Hybrid Search Graceful Degradation**
   - Falls back to lexical-only search if query embedding generation fails
   - Redistributes semantic weight proportionally to other components
   - Logs warning instead of throwing error
6. **Database Migrations**
   - `20260115200000_add_unified_search_tsvector`: Adds search column to UnifiedContentEmbedding with auto-update trigger
   - `20260115210000_remove_storelistingversion_search`: Removes deprecated search column and updates StoreAgent view
7. **Orphan Cleanup**
   - `cleanup_orphaned_embeddings()` removes embeddings for deleted content
   - Always runs after backfill, even at 100% coverage

### Review Comments Addressed

- ✅ SQL parameter index bug when user_id provided (embeddings.py)
- ✅ Early return skipping cleanup at 100% coverage (scheduler.py)
- ✅ Inconsistent return structure across code paths (scheduler.py)
- ✅ SQL UNION syntax error - added parentheses for ORDER BY/LIMIT (hybrid_search.py)
- ✅ Version numeric ordering in aggregations (migration)
- ✅ Embedding dimension uses EMBEDDING_DIM constant

### Files Changed

- `backend/api/features/store/content_handlers.py` (NEW): Handler architecture
- `backend/api/features/store/embeddings.py`: Refactored to use handlers
- `backend/api/features/store/hybrid_search.py`: Unified search + graceful degradation
- `backend/executor/scheduler.py`: Process all content types, consistent returns
- `migrations/20260115200000_add_unified_search_tsvector/`: Add tsvector to unified table
- `migrations/20260115210000_remove_storelistingversion_search/`: Remove old search column
- `schema.prisma`: Updated UnifiedContentEmbedding and StoreListingVersion models
- `*_test.py`: Added tests for unified_hybrid_search

## Test Plan

1. ✅ All tests passing on Python 3.11, 3.12, 3.13
2. ✅ Types check passing
3. ✅ CodeRabbit and Sentry reviews addressed
4. Deploy to staging and verify:
   - Backfill job processes all content types
   - Search results include blocks and docs
   - Search works without OpenAI API (graceful degradation)

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Swifty <craigswift13@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
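The graceful-degradation item in the commit message says the semantic weight is redistributed proportionally to the other components when no query embedding is available. A minimal sketch of what that could look like follows. The function name `redistribute_weights` and the component names and values in the example are hypothetical; the actual weights and code in `hybrid_search.py` are not shown in this commit message.

```python
def redistribute_weights(weights: dict[str, float], dropped: str) -> dict[str, float]:
    """Drop one scoring component and rescale the rest to keep their ratios.

    Hypothetical helper: assumes the weights are non-negative and that at
    least one remaining component has a positive weight.
    """
    remaining = {name: w for name, w in weights.items() if name != dropped}
    total = sum(remaining.values())
    if total <= 0:
        raise ValueError("no positive weight left to redistribute to")
    original_total = sum(weights.values())
    # Each remaining component keeps its share relative to the others,
    # scaled so the new weights sum to the original total.
    return {name: w * original_total / total for name, w in remaining.items()}


# Illustrative values only, not the project's real configuration.
example = {"semantic": 0.5, "lexical": 0.3, "popularity": 0.2}
lexical_only = redistribute_weights(example, dropped="semantic")
```

With these illustrative numbers, `lexical` and `popularity` keep their 3:2 ratio and together absorb the weight that `semantic` held.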
70 lines
1.9 KiB
Plaintext
# Ignore everything by default, selectively add things to context
*

# Documentation (for embeddings/search)
!docs/

# Platform - Libs
!autogpt_platform/autogpt_libs/autogpt_libs/
!autogpt_platform/autogpt_libs/pyproject.toml
!autogpt_platform/autogpt_libs/poetry.lock
!autogpt_platform/autogpt_libs/README.md

# Platform - Backend
!autogpt_platform/backend/backend/
!autogpt_platform/backend/test/e2e_test_data.py
!autogpt_platform/backend/migrations/
!autogpt_platform/backend/schema.prisma
!autogpt_platform/backend/pyproject.toml
!autogpt_platform/backend/poetry.lock
!autogpt_platform/backend/README.md
!autogpt_platform/backend/.env
!autogpt_platform/backend/gen_prisma_types_stub.py

# Platform - Market
!autogpt_platform/market/market/
!autogpt_platform/market/scripts.py
!autogpt_platform/market/schema.prisma
!autogpt_platform/market/pyproject.toml
!autogpt_platform/market/poetry.lock
!autogpt_platform/market/README.md

# Platform - Frontend
!autogpt_platform/frontend/src/
!autogpt_platform/frontend/public/
!autogpt_platform/frontend/scripts/
!autogpt_platform/frontend/package.json
!autogpt_platform/frontend/pnpm-lock.yaml
!autogpt_platform/frontend/tsconfig.json
!autogpt_platform/frontend/README.md
## config
!autogpt_platform/frontend/*.config.*
!autogpt_platform/frontend/.env.*
!autogpt_platform/frontend/.env

# Classic - AutoGPT
!classic/original_autogpt/autogpt/
!classic/original_autogpt/pyproject.toml
!classic/original_autogpt/poetry.lock
!classic/original_autogpt/README.md
!classic/original_autogpt/tests/

# Classic - Benchmark
!classic/benchmark/agbenchmark/
!classic/benchmark/pyproject.toml
!classic/benchmark/poetry.lock
!classic/benchmark/README.md

# Classic - Forge
!classic/forge/
!classic/forge/pyproject.toml
!classic/forge/poetry.lock
!classic/forge/README.md

# Classic - Frontend
!classic/frontend/build/web/

# Explicitly re-ignore some folders
.*
**/__pycache__
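The file above relies on an "ignore everything, then re-include with `!`" pattern, where later rules override earlier ones. The sketch below is a simplified model of that last-match-wins behavior, written under stated assumptions: it is not Docker's matcher, it does not support `**`, and the helper names `_matches` and `is_ignored` as well as the sample paths are hypothetical.

```python
from fnmatch import fnmatchcase


def _matches(candidate: str, pattern: str) -> bool:
    """Compare component by component so `*` never crosses a `/`."""
    cand_parts = candidate.split("/")
    pat_parts = pattern.split("/")
    if len(cand_parts) != len(pat_parts):
        return False
    return all(fnmatchcase(c, p) for c, p in zip(cand_parts, pat_parts))


def is_ignored(path: str, rules: list[str]) -> bool:
    """Return True if `path` ends up excluded after applying all rules in order."""
    parts = path.split("/")
    # A rule applies if it matches the path itself or any parent directory.
    candidates = ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]
    ignored = False
    for rule in rules:
        negated = rule.startswith("!")
        pattern = (rule[1:] if negated else rule).rstrip("/")
        if any(_matches(c, pattern) for c in candidates):
            # The last matching rule decides the outcome.
            ignored = not negated
    return ignored


# A small subset of the rules from the file above.
RULES = ["*", "!docs/", "!autogpt_platform/backend/backend/", ".*"]
```

In this model, `docs/index.md` and anything under `autogpt_platform/backend/backend/` stay in the build context, while a top-level `README.md` or `.git/config` is excluded. The trailing `.*` rule only matches dot-entries at the root, which is why a nested re-included path such as `autogpt_platform/backend/.env` would not be re-ignored by it.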