Files
AutoGPT/autogpt_platform/backend/backend/data/analytics.py
Zamil Majdy f31c160043 feat(platform): add endedAt field and fix execution analytics timestamps (#11759)
## Summary

This PR adds proper execution end time tracking and fixes timestamp
handling throughout the execution analytics system.

### Key Changes

1. **Added `endedAt` field to database schema** - Executions now have a
dedicated field for tracking when they finish
2. **Fixed timestamp nullable handling** - `started_at` and `ended_at`
are now properly nullable in types
3. **Fixed chart aggregation** - Reduced threshold from ≥3 to ≥1
executions per day
4. **Improved timestamp display** - Moved timestamps to expandable
details section in analytics table
5. **Fixed nullable timestamp bugs** - Updated all frontend code to
handle null timestamps correctly

## Problem Statement

### Issue 1: Missing Execution End Times
Previously, executions used `updatedAt` (last DB update) as a proxy for
"end time". This broke when adding correctness scores retroactively -
the end time would change to whenever the score was added, not when the
execution actually finished.

### Issue 2: Chart Shows Only One Data Point
The accuracy trends chart showed only one data point despite having
executions across multiple days. Root cause: aggregation required ≥3
executions per day.

### Issue 3: Incorrect Type Definitions
Manually maintained types defined `started_at` and `ended_at` as
non-nullable `Date`, contradicting reality where QUEUED executions
haven't started yet.

## Solution

### Database Schema (`schema.prisma`)
```prisma
model AgentGraphExecution {
  // ...
  startedAt DateTime?
  endedAt   DateTime?  // NEW FIELD
  // ...
}
```

### Execution Lifecycle
- **QUEUED**: `startedAt = null`, `endedAt = null` (not started)
- **RUNNING**: `startedAt = set`, `endedAt = null` (in progress)  
- **COMPLETED/FAILED/TERMINATED**: `startedAt = set`, `endedAt = set`
(finished)

### Migration Strategy
```sql
-- Add endedAt column
ALTER TABLE "AgentGraphExecution" ADD COLUMN "endedAt" TIMESTAMP(3);

-- Backfill ONLY terminal executions (prevents marking RUNNING executions as ended)
UPDATE "AgentGraphExecution"
SET "endedAt" = "updatedAt"
WHERE "endedAt" IS NULL
  AND "executionStatus" IN ('COMPLETED', 'FAILED', 'TERMINATED');
```

## Changes by Component

### Backend

**`schema.prisma`**
- Added `endedAt` field to `AgentGraphExecution`

**`execution.py`**
- Made `started_at` and `ended_at` optional with Field descriptions
- Updated `from_db()` to use `endedAt` instead of `updatedAt`
- `update_graph_execution_stats()` sets `endedAt` when status becomes
terminal

**`execution_analytics_routes.py`**
- Removed `created_at`/`updated_at` from `ExecutionAnalyticsResult` (DB
metadata, not execution data)
- Kept only `started_at`/`ended_at` (actual execution runtime)
- Made settings global (avoid recreation)
- Moved OpenAI key validation to `_process_batch` (only check when LLM
actually runs)

**`analytics.py`**
- Fixed aggregation: `COUNT(*) >= 1` (was 3) - include all days with ≥1
execution
- Uses `createdAt` for chart grouping (when execution was queued)

**`late_execution_monitor.py`**
- Handle optional `started_at` with fallback to `datetime.min` for
sorting
- Display "Not started" when `started_at` is null

### Frontend

**Type Definitions**
- Fixed manually maintained `types.ts`: `started_at: Date | null` (was
non-nullable)
- Generated types were already correct

**Analytics Components**
- `AnalyticsResultsTable.tsx`: Show only `started_at`/`ended_at` in
2-column expandable grid
- `ExecutionAnalyticsForm.tsx`: Added filter explanation UI

**Monitoring Components** - Fixed null handling bugs:
- `OldAgentLibraryView.tsx`: Handle null in reduce function
- `agent-runs-selector-list.tsx`: Safe sorting with `?.getTime() ?? 0`
- `AgentFlowList.tsx`: Filter/sort with null checks
- `FlowRunsStatus.tsx`: Filter null timestamps
- `FlowRunsTimeline.tsx`: Filter executions with null timestamps before
rendering
- `monitoring/page.tsx`: Safe sorting
- `ActivityItem.tsx`: Fallback to "recently" for null timestamps

## Benefits

 **Accurate End Times**: `endedAt` is frozen when execution finishes,
not updated later
 **Type Safety**: Nullable types match reality, exposing real bugs  
 **Better UX**: Chart shows all days with data (not just days with ≥3
executions)
 **Bug Fixes**: 7+ frontend components now handle null timestamps
correctly
 **Documentation**: Field descriptions explain when timestamps are null

## Testing

### Backend
```bash
cd autogpt_platform/backend
poetry run format  #  All checks passed
poetry run lint    #  All checks passed
```

### Frontend  
```bash
cd autogpt_platform/frontend
pnpm format        #  All checks passed
pnpm lint          #  All checks passed
pnpm types         #  All type errors fixed
```

### Test Data Generation
Created script to generate 35 test executions across 7 days with
correctness scores:
```bash
poetry run python scripts/generate_test_analytics_data.py
```

## Migration Notes

⚠️ **Important**: The migration only backfills `endedAt` for executions
with terminal status (COMPLETED, FAILED, TERMINATED). Active executions
(QUEUED, RUNNING) correctly keep `endedAt = null`.

## Breaking Changes

None - this is backward compatible:
- `endedAt` is nullable, existing code that doesn't use it is unaffected
- Frontend already used generated types which were correct
- Migration safely backfills historical data

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Introduces explicit execution end-time tracking and normalizes
timestamp handling across backend and frontend.
> 
> - Adds `endedAt` to `AgentGraphExecution` (schema + migration);
backfills terminal executions; sets `endedAt` on terminal status updates
> - Makes `GraphExecutionMeta.started_at/ended_at` optional; updates
`from_db()` to use DB `endedAt`; exposes timestamps in
`ExecutionAnalyticsResult`
> - Moves OpenAI key validation into batch processing; instantiates
`Settings` once
> - Accuracy trends: reduce daily aggregation threshold to `>= 1`;
optional historical series
> - Monitoring/analytics UI: results table shows/export
`started_at`/`ended_at`; adds chart filter explainer
> - Frontend null-safety: update types (`Date | null`) and fix
sorting/filtering/rendering for nullable timestamps across monitoring
and library views
> - Late execution monitor: safe sorting/display when `started_at` is
null
> - OpenAPI specs updated for new/nullable fields
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
1d987ca6e5. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
2026-01-16 21:44:24 +00:00

293 lines
8.9 KiB
Python

import logging
from datetime import datetime, timedelta, timezone
from typing import Optional
import prisma.types
from pydantic import BaseModel
from backend.data.db import query_raw_with_schema
from backend.util.json import SafeJson
logger = logging.getLogger(__name__)
class AccuracyAlertData(BaseModel):
"""Alert data when accuracy drops significantly."""
graph_id: str
user_id: Optional[str]
drop_percent: float
three_day_avg: float
seven_day_avg: float
detected_at: datetime
class AccuracyLatestData(BaseModel):
"""Latest execution accuracy data point."""
date: datetime
daily_score: Optional[float]
three_day_avg: Optional[float]
seven_day_avg: Optional[float]
fourteen_day_avg: Optional[float]
class AccuracyTrendsResponse(BaseModel):
"""Response model for accuracy trends and alerts."""
latest_data: AccuracyLatestData
alert: Optional[AccuracyAlertData]
historical_data: Optional[list[AccuracyLatestData]] = None
async def log_raw_analytics(
user_id: str,
type: str,
data: dict,
data_index: str,
):
details = await prisma.models.AnalyticsDetails.prisma().create(
data=prisma.types.AnalyticsDetailsCreateInput(
userId=user_id,
type=type,
data=SafeJson(data),
dataIndex=data_index,
)
)
return details
async def log_raw_metric(
user_id: str,
metric_name: str,
metric_value: float,
data_string: str,
):
if metric_value < 0:
raise ValueError("metric_value must be non-negative")
result = await prisma.models.AnalyticsMetrics.prisma().create(
data=prisma.types.AnalyticsMetricsCreateInput(
value=metric_value,
analyticMetric=metric_name,
userId=user_id,
dataString=data_string,
)
)
return result
async def get_accuracy_trends_and_alerts(
graph_id: str,
days_back: int = 30,
user_id: Optional[str] = None,
drop_threshold: float = 10.0,
include_historical: bool = False,
) -> AccuracyTrendsResponse:
"""Get accuracy trends and detect alerts for a specific graph."""
query_template = """
WITH daily_scores AS (
SELECT
DATE(e."createdAt") as execution_date,
AVG(CASE
WHEN e.stats IS NOT NULL
AND e.stats::json->>'correctness_score' IS NOT NULL
AND e.stats::json->>'correctness_score' != 'null'
THEN (e.stats::json->>'correctness_score')::float * 100
ELSE NULL
END) as daily_score
FROM {schema_prefix}"AgentGraphExecution" e
WHERE e."agentGraphId" = $1::text
AND e."isDeleted" = false
AND e."createdAt" >= $2::timestamp
AND e."executionStatus" IN ('COMPLETED', 'FAILED', 'TERMINATED')
{user_filter}
GROUP BY DATE(e."createdAt")
HAVING COUNT(*) >= 1 -- Include all days with at least 1 execution
),
trends AS (
SELECT
execution_date,
daily_score,
AVG(daily_score) OVER (
ORDER BY execution_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) as three_day_avg,
AVG(daily_score) OVER (
ORDER BY execution_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as seven_day_avg,
AVG(daily_score) OVER (
ORDER BY execution_date
ROWS BETWEEN 13 PRECEDING AND CURRENT ROW
) as fourteen_day_avg
FROM daily_scores
)
SELECT *,
CASE
WHEN three_day_avg IS NOT NULL AND seven_day_avg IS NOT NULL AND seven_day_avg > 0
THEN ((seven_day_avg - three_day_avg) / seven_day_avg * 100)
ELSE NULL
END as drop_percent
FROM trends
ORDER BY execution_date DESC
{limit_clause}
"""
start_date = datetime.now(timezone.utc) - timedelta(days=days_back)
params = [graph_id, start_date]
user_filter = ""
if user_id:
user_filter = 'AND e."userId" = $3::text'
params.append(user_id)
# Determine limit clause
limit_clause = "" if include_historical else "LIMIT 1"
final_query = query_template.format(
schema_prefix="{schema_prefix}",
user_filter=user_filter,
limit_clause=limit_clause,
)
result = await query_raw_with_schema(final_query, *params)
if not result:
return AccuracyTrendsResponse(
latest_data=AccuracyLatestData(
date=datetime.now(timezone.utc),
daily_score=None,
three_day_avg=None,
seven_day_avg=None,
fourteen_day_avg=None,
),
alert=None,
)
latest = result[0]
alert = None
if (
latest["drop_percent"] is not None
and latest["drop_percent"] >= drop_threshold
and latest["three_day_avg"] is not None
and latest["seven_day_avg"] is not None
):
alert = AccuracyAlertData(
graph_id=graph_id,
user_id=user_id,
drop_percent=float(latest["drop_percent"]),
three_day_avg=float(latest["three_day_avg"]),
seven_day_avg=float(latest["seven_day_avg"]),
detected_at=datetime.now(timezone.utc),
)
# Prepare historical data if requested
historical_data = None
if include_historical:
historical_data = []
for row in result:
historical_data.append(
AccuracyLatestData(
date=row["execution_date"],
daily_score=(
float(row["daily_score"])
if row["daily_score"] is not None
else None
),
three_day_avg=(
float(row["three_day_avg"])
if row["three_day_avg"] is not None
else None
),
seven_day_avg=(
float(row["seven_day_avg"])
if row["seven_day_avg"] is not None
else None
),
fourteen_day_avg=(
float(row["fourteen_day_avg"])
if row["fourteen_day_avg"] is not None
else None
),
)
)
return AccuracyTrendsResponse(
latest_data=AccuracyLatestData(
date=latest["execution_date"],
daily_score=(
float(latest["daily_score"])
if latest["daily_score"] is not None
else None
),
three_day_avg=(
float(latest["three_day_avg"])
if latest["three_day_avg"] is not None
else None
),
seven_day_avg=(
float(latest["seven_day_avg"])
if latest["seven_day_avg"] is not None
else None
),
fourteen_day_avg=(
float(latest["fourteen_day_avg"])
if latest["fourteen_day_avg"] is not None
else None
),
),
alert=alert,
historical_data=historical_data,
)
class MarketplaceGraphData(BaseModel):
"""Data structure for marketplace graph monitoring."""
graph_id: str
user_id: Optional[str]
execution_count: int
async def get_marketplace_graphs_for_monitoring(
days_back: int = 30,
min_executions: int = 10,
) -> list[MarketplaceGraphData]:
"""Get published marketplace graphs with recent executions for monitoring."""
query_template = """
WITH marketplace_graphs AS (
SELECT DISTINCT
slv."agentGraphId" as graph_id,
slv."agentGraphVersion" as graph_version
FROM {schema_prefix}"StoreListing" sl
JOIN {schema_prefix}"StoreListingVersion" slv ON sl."activeVersionId" = slv."id"
WHERE sl."hasApprovedVersion" = true
AND sl."isDeleted" = false
)
SELECT DISTINCT
mg.graph_id,
NULL as user_id, -- Marketplace graphs don't have a specific user_id for monitoring
COUNT(*) as execution_count
FROM marketplace_graphs mg
JOIN {schema_prefix}"AgentGraphExecution" e ON e."agentGraphId" = mg.graph_id
WHERE e."createdAt" >= $1::timestamp
AND e."isDeleted" = false
AND e."executionStatus" IN ('COMPLETED', 'FAILED', 'TERMINATED')
GROUP BY mg.graph_id
HAVING COUNT(*) >= $2
ORDER BY execution_count DESC
"""
start_date = datetime.now(timezone.utc) - timedelta(days=days_back)
result = await query_raw_with_schema(query_template, start_date, min_executions)
return [
MarketplaceGraphData(
graph_id=row["graph_id"],
user_id=row["user_id"],
execution_count=int(row["execution_count"]),
)
for row in result
]