mirror of
https://github.com/Significant-Gravitas/AutoGPT.git
synced 2026-01-29 08:58:07 -05:00
## Summary

This PR extends the embedding system to support **blocks** and **documentation** content types in addition to store agents, and introduces **unified hybrid search** across all content types using a single `UnifiedContentEmbedding` table.

### Key Changes

1. **Unified Hybrid Search Architecture**
   - Added `search` tsvector column to `UnifiedContentEmbedding` table
   - New `unified_hybrid_search()` function searches across all content types (agents, blocks, docs)
   - Updated `hybrid_search()` for store agents to use `UnifiedContentEmbedding.search`
   - Removed deprecated `search` column from `StoreListingVersion` table
2. **Pluggable Content Handler Architecture**
   - Created abstract `ContentHandler` base class for extensibility
   - Implemented handlers: `StoreAgentHandler`, `BlockHandler`, `DocumentationHandler`
   - Registry pattern for easy addition of new content types
3. **Block Embeddings**
   - Discovers all blocks using `get_blocks()`
   - Extracts searchable text from: name, description, categories, input/output schemas
4. **Documentation Embeddings**
   - Scans `/docs/` directory for `.md` and `.mdx` files
   - Extracts title from first `#` heading or uses filename as fallback
5. **Hybrid Search Graceful Degradation**
   - Falls back to lexical-only search if query embedding generation fails
   - Redistributes semantic weight proportionally to other components
   - Logs warning instead of throwing error
6. **Database Migrations**
   - `20260115200000_add_unified_search_tsvector`: Adds search column to UnifiedContentEmbedding with auto-update trigger
   - `20260115210000_remove_storelistingversion_search`: Removes deprecated search column and updates StoreAgent view
7. **Orphan Cleanup**
   - `cleanup_orphaned_embeddings()` removes embeddings for deleted content
   - Always runs after backfill, even at 100% coverage

### Review Comments Addressed

- ✅ SQL parameter index bug when user_id provided (embeddings.py)
- ✅ Early return skipping cleanup at 100% coverage (scheduler.py)
- ✅ Inconsistent return structure across code paths (scheduler.py)
- ✅ SQL UNION syntax error - added parentheses for ORDER BY/LIMIT (hybrid_search.py)
- ✅ Version numeric ordering in aggregations (migration)
- ✅ Embedding dimension uses EMBEDDING_DIM constant

### Files Changed

- `backend/api/features/store/content_handlers.py` (NEW): Handler architecture
- `backend/api/features/store/embeddings.py`: Refactored to use handlers
- `backend/api/features/store/hybrid_search.py`: Unified search + graceful degradation
- `backend/executor/scheduler.py`: Process all content types, consistent returns
- `migrations/20260115200000_add_unified_search_tsvector/`: Add tsvector to unified table
- `migrations/20260115210000_remove_storelistingversion_search/`: Remove old search column
- `schema.prisma`: Updated UnifiedContentEmbedding and StoreListingVersion models
- `*_test.py`: Added tests for unified_hybrid_search

## Test Plan

1. ✅ All tests passing on Python 3.11, 3.12, 3.13
2. ✅ Types check passing
3. ✅ CodeRabbit and Sentry reviews addressed
4. Deploy to staging and verify:
   - Backfill job processes all content types
   - Search results include blocks and docs
   - Search works without OpenAI API (graceful degradation)

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Swifty <craigswift13@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
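
The "Hybrid Search Graceful Degradation" item says the semantic weight is redistributed proportionally when no query embedding is available. A minimal sketch of that idea follows. It is not the code in `hybrid_search.py`: the function name `redistribute_weights`, the component names, and the example weights are all hypothetical and chosen only for illustration.

```python
from typing import Dict


def redistribute_weights(
    weights: Dict[str, float], dropped: str = "semantic"
) -> Dict[str, float]:
    """Remove one scoring component and rescale the others proportionally.

    The remaining weights keep their relative ratios and sum to the same
    total as the original weights. (Hypothetical helper, for illustration.)
    """
    if dropped not in weights:
        # Nothing to drop: return an unchanged copy.
        return dict(weights)

    remaining = {name: w for name, w in weights.items() if name != dropped}
    remaining_total = sum(remaining.values())
    if remaining_total <= 0:
        raise ValueError("no remaining component can absorb the dropped weight")

    # Scale so the remaining components sum to the original total.
    scale = sum(weights.values()) / remaining_total
    return {name: w * scale for name, w in remaining.items()}


if __name__ == "__main__":
    # Hypothetical weights; the real values live in the search module.
    full = {"semantic": 0.5, "lexical": 0.3, "popularity": 0.2}
    print(redistribute_weights(full))
```

With these example weights, `lexical` and `popularity` keep their 3:2 ratio and together absorb the share that `semantic` held, so scores from the degraded path stay on the same scale as scores from the full path.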
109 lines
3.8 KiB
Docker
FROM debian:13-slim AS builder

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV DEBIAN_FRONTEND=noninteractive

WORKDIR /app

RUN echo 'Acquire::http::Pipeline-Depth 0;\nAcquire::http::No-Cache true;\nAcquire::BrokenProxy true;\n' > /etc/apt/apt.conf.d/99fixbadproxy

# Install Node.js repository key and setup
RUN apt-get update --allow-releaseinfo-change --fix-missing \
    && apt-get install -y curl ca-certificates gnupg \
    && mkdir -p /etc/apt/keyrings \
    && curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg \
    && echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_20.x nodistro main" | tee /etc/apt/sources.list.d/nodesource.list

# Update package list and install Python, Node.js, and build dependencies
RUN apt-get update \
    && apt-get install -y \
    python3.13 \
    python3.13-dev \
    python3.13-venv \
    python3-pip \
    build-essential \
    libpq5 \
    libz-dev \
    libssl-dev \
    postgresql-client \
    nodejs \
    && rm -rf /var/lib/apt/lists/*

ENV POETRY_HOME=/opt/poetry
ENV POETRY_NO_INTERACTION=1
ENV POETRY_VIRTUALENVS_CREATE=true
ENV POETRY_VIRTUALENVS_IN_PROJECT=true
ENV PATH=/opt/poetry/bin:$PATH

RUN pip3 install poetry --break-system-packages

# Copy and install dependencies
COPY autogpt_platform/autogpt_libs /app/autogpt_platform/autogpt_libs
COPY autogpt_platform/backend/poetry.lock autogpt_platform/backend/pyproject.toml /app/autogpt_platform/backend/
WORKDIR /app/autogpt_platform/backend
RUN poetry install --no-ansi --no-root

# Generate Prisma client
COPY autogpt_platform/backend/schema.prisma ./
COPY autogpt_platform/backend/backend/data/partial_types.py ./backend/data/partial_types.py
COPY autogpt_platform/backend/gen_prisma_types_stub.py ./
RUN poetry run prisma generate && poetry run gen-prisma-stub

FROM debian:13-slim AS server_dependencies

WORKDIR /app

ENV POETRY_HOME=/opt/poetry \
    POETRY_NO_INTERACTION=1 \
    POETRY_VIRTUALENVS_CREATE=true \
    POETRY_VIRTUALENVS_IN_PROJECT=true \
    DEBIAN_FRONTEND=noninteractive
ENV PATH=/opt/poetry/bin:$PATH

# Install Python without upgrading system-managed packages
RUN apt-get update && apt-get install -y \
    python3.13 \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Copy only necessary files from builder
COPY --from=builder /app /app
COPY --from=builder /usr/local/lib/python3* /usr/local/lib/python3*
COPY --from=builder /usr/local/bin/poetry /usr/local/bin/poetry
# Copy Node.js installation for Prisma
COPY --from=builder /usr/bin/node /usr/bin/node
COPY --from=builder /usr/lib/node_modules /usr/lib/node_modules
COPY --from=builder /usr/bin/npm /usr/bin/npm
COPY --from=builder /usr/bin/npx /usr/bin/npx
COPY --from=builder /root/.cache/prisma-python/binaries /root/.cache/prisma-python/binaries

ENV PATH="/app/autogpt_platform/backend/.venv/bin:$PATH"

RUN mkdir -p /app/autogpt_platform/autogpt_libs
RUN mkdir -p /app/autogpt_platform/backend

COPY autogpt_platform/autogpt_libs /app/autogpt_platform/autogpt_libs

COPY autogpt_platform/backend/poetry.lock autogpt_platform/backend/pyproject.toml /app/autogpt_platform/backend/

WORKDIR /app/autogpt_platform/backend

FROM server_dependencies AS migrate

# Migration stage only needs schema and migrations - much lighter than full backend
COPY autogpt_platform/backend/schema.prisma /app/autogpt_platform/backend/
COPY autogpt_platform/backend/backend/data/partial_types.py /app/autogpt_platform/backend/backend/data/partial_types.py
COPY autogpt_platform/backend/migrations /app/autogpt_platform/backend/migrations

FROM server_dependencies AS server

COPY autogpt_platform/backend /app/autogpt_platform/backend
COPY docs /app/docs
RUN poetry install --no-ansi --only-root

ENV PORT=8000

CMD ["poetry", "run", "rest"]