fix: cast OnboardingStep enum to text in funnel view

The completedSteps column is a platform."OnboardingStep" enum array. UNNEST produces enum values that can't be compared directly to text from the VALUES clause. Adding ::text cast fixes the type mismatch.
fix(analytics): address second batch of PR review comments
2026-03-17 03:00:27 -04:00 · 2026-03-13 11:55:37 +00:00 · 2026-03-12 00:47:55 +07:00 · 2026-03-12 00:01:40 +07:00 · 2026-03-11 23:48:42 +07:00 · 2026-03-11 23:39:14 +07:00
16 changed files with 1448 additions and 0 deletions
--- a/autogpt_platform/analytics/queries/auth_activities.sql
+++ b/autogpt_platform/analytics/queries/auth_activities.sql
@@ -0,0 +1,40 @@
 -- =============================================================
 -- View: analytics.auth_activities
 -- Looker source alias: ds49  |  Charts: 1
 -- =============================================================
 -- DESCRIPTION
 --   Tracks authentication events (login, logout, SSO, password
 --   reset, etc.) from Supabase's internal audit log.
 --   Useful for monitoring sign-in patterns and detecting anomalies.
 --
 -- SOURCE TABLES
 --   auth.audit_log_entries  — Supabase internal auth event log
 --
 -- OUTPUT COLUMNS
 --   created_at      TIMESTAMPTZ  When the auth event occurred
 --   actor_id        TEXT         User ID who triggered the event
 --   actor_via_sso   TEXT         Whether the action was via SSO ('true'/'false')
 --   action          TEXT         Event type (e.g. 'login', 'logout', 'token_refreshed')
 --
 -- WINDOW
 --   Rolling 90 days from current date
 --
 -- EXAMPLE QUERIES
 --   -- Daily login counts
 --   SELECT DATE_TRUNC('day', created_at) AS day, COUNT(*) AS logins
 --   FROM analytics.auth_activities
 --   WHERE action = 'login'
 --   GROUP BY 1 ORDER BY 1;
 --
 --   -- SSO vs password login breakdown
 --   SELECT actor_via_sso, COUNT(*) FROM analytics.auth_activities
 --   WHERE action = 'login' GROUP BY 1;
 -- =============================================================
 SELECT
    created_at,
    payload->>'actor_id'      AS actor_id,
    payload->>'actor_via_sso' AS actor_via_sso,
    payload->>'action'        AS action
 FROM auth.audit_log_entries
 WHERE created_at >= NOW() - INTERVAL '90 days'
--- a/autogpt_platform/analytics/queries/graph_execution.sql
+++ b/autogpt_platform/analytics/queries/graph_execution.sql
@@ -0,0 +1,105 @@
 -- =============================================================
 -- View: analytics.graph_execution
 -- Looker source alias: ds16  |  Charts: 21
 -- =============================================================
 -- DESCRIPTION
 --   One row per agent graph execution (last 90 days).
 --   Unpacks the JSONB stats column into individual numeric columns
 --   and normalises the executionStatus — runs that failed due to
 --   insufficient credits are reclassified as 'NO_CREDITS' for
 --   easier filtering.  Error messages are scrubbed of IDs and URLs
 --   to allow safe grouping.
 --
 -- SOURCE TABLES
 --   platform.AgentGraphExecution  — Execution records
 --   platform.AgentGraph           — Agent graph metadata (for name)
 --   platform.LibraryAgent         — To flag possibly-AI (safe-mode) agents
 --
 -- OUTPUT COLUMNS
 --   id                TEXT         Execution UUID
 --   agentGraphId      TEXT         Agent graph UUID
 --   agentGraphVersion INT          Graph version number
 --   executionStatus   TEXT         COMPLETED | FAILED | NO_CREDITS | RUNNING | QUEUED | TERMINATED
 --   createdAt         TIMESTAMPTZ  When the execution was queued
 --   updatedAt         TIMESTAMPTZ  Last status update time
 --   userId            TEXT         Owner user UUID
 --   agentGraphName    TEXT         Human-readable agent name
 --   cputime           DECIMAL      Total CPU seconds consumed
 --   walltime          DECIMAL      Total wall-clock seconds
 --   node_count        DECIMAL      Number of nodes in the graph
 --   nodes_cputime     DECIMAL      CPU time across all nodes
 --   nodes_walltime    DECIMAL      Wall time across all nodes
 --   execution_cost    DECIMAL      Credit cost of this execution
 --   correctness_score FLOAT        AI correctness score (if available)
 --   possibly_ai       BOOLEAN      True if agent has sensitive_action_safe_mode enabled
 --   groupedErrorMessage TEXT       Scrubbed error string (IDs/URLs replaced with wildcards)
 --
 -- WINDOW
 --   Rolling 90 days (createdAt > CURRENT_DATE - 90 days)
 --
 -- EXAMPLE QUERIES
 --   -- Daily execution counts by status
 --   SELECT DATE_TRUNC('day', "createdAt") AS day, "executionStatus", COUNT(*)
 --   FROM analytics.graph_execution
 --   GROUP BY 1, 2 ORDER BY 1;
 --
 --   -- Average cost per execution by agent
 --   SELECT "agentGraphName", AVG("execution_cost") AS avg_cost, COUNT(*) AS runs
 --   FROM analytics.graph_execution
 --   WHERE "executionStatus" = 'COMPLETED'
 --   GROUP BY 1 ORDER BY avg_cost DESC;
 --
 --   -- Top error messages
 --   SELECT "groupedErrorMessage", COUNT(*) AS occurrences
 --   FROM analytics.graph_execution
 --   WHERE "executionStatus" = 'FAILED'
 --   GROUP BY 1 ORDER BY 2 DESC LIMIT 20;
 -- =============================================================
 SELECT
    ge."id"                                                        AS id,
    ge."agentGraphId"                                              AS agentGraphId,
    ge."agentGraphVersion"                                         AS agentGraphVersion,
    CASE
        WHEN jsonb_exists(ge."stats"::jsonb, 'error')
         AND (
               (ge."stats"::jsonb->>'error') ILIKE '%insufficient balance%'
            OR (ge."stats"::jsonb->>'error') ILIKE '%you have no credits left%'
             )
        THEN 'NO_CREDITS'
        ELSE CAST(ge."executionStatus" AS TEXT)
    END                                                            AS executionStatus,
    ge."createdAt"                                                 AS createdAt,
    ge."updatedAt"                                                 AS updatedAt,
    ge."userId"                                                    AS userId,
    g."name"                                                       AS agentGraphName,
    (ge."stats"::jsonb->>'cputime')::decimal                       AS cputime,
    (ge."stats"::jsonb->>'walltime')::decimal                      AS walltime,
    (ge."stats"::jsonb->>'node_count')::decimal                    AS node_count,
    (ge."stats"::jsonb->>'nodes_cputime')::decimal                 AS nodes_cputime,
    (ge."stats"::jsonb->>'nodes_walltime')::decimal                AS nodes_walltime,
    (ge."stats"::jsonb->>'cost')::decimal                          AS execution_cost,
    (ge."stats"::jsonb->>'correctness_score')::float               AS correctness_score,
    COALESCE(la.possibly_ai, FALSE)                                AS possibly_ai,
    REGEXP_REPLACE(
        REGEXP_REPLACE(
            TRIM(BOTH '"' FROM ge."stats"::jsonb->>'error'),
            '(https?://)([A-Za-z0-9.-]+)(:[0-9]+)?(/[^\s]*)?',
            '\1\2/...', 'gi'
        ),
        '[a-zA-Z0-9_:-]*\d[a-zA-Z0-9_:-]*', '*', 'g'
    )                                                              AS groupedErrorMessage
 FROM platform."AgentGraphExecution" ge
 LEFT JOIN platform."AgentGraph" g
       ON ge."agentGraphId" = g."id"
      AND ge."agentGraphVersion" = g."version"
 LEFT JOIN (
    SELECT DISTINCT ON ("userId", "agentGraphId")
           "userId", "agentGraphId",
           ("settings"::jsonb->>'sensitive_action_safe_mode')::boolean AS possibly_ai
    FROM platform."LibraryAgent"
    WHERE "isDeleted"  = FALSE
      AND "isArchived" = FALSE
    ORDER BY "userId", "agentGraphId", "agentGraphVersion" DESC
 ) la ON la."userId" = ge."userId" AND la."agentGraphId" = ge."agentGraphId"
 WHERE ge."createdAt" > CURRENT_DATE - INTERVAL '90 days'
--- a/autogpt_platform/analytics/queries/node_block_execution.sql
+++ b/autogpt_platform/analytics/queries/node_block_execution.sql
@@ -0,0 +1,101 @@
 -- =============================================================
 -- View: analytics.node_block_execution
 -- Looker source alias: ds14  |  Charts: 11
 -- =============================================================
 -- DESCRIPTION
 --   One row per node (block) execution (last 90 days).
 --   Unpacks stats JSONB and joins to identify which block type
 --   was run.  For failed nodes, joins the error output and
 --   scrubs it for safe grouping.
 --
 -- SOURCE TABLES
 --   platform.AgentNodeExecution              — Node execution records
 --   platform.AgentNode                       — Node → block mapping
 --   platform.AgentBlock                      — Block name/ID
 --   platform.AgentNodeExecutionInputOutput   — Error output values
 --
 -- OUTPUT COLUMNS
 --   id                    TEXT         Node execution UUID
 --   agentGraphExecutionId TEXT         Parent graph execution UUID
 --   agentNodeId           TEXT         Node UUID within the graph
 --   executionStatus       TEXT         COMPLETED | FAILED | QUEUED | RUNNING | TERMINATED
 --   addedTime             TIMESTAMPTZ  When the node was queued
 --   queuedTime            TIMESTAMPTZ  When it entered the queue
 --   startedTime           TIMESTAMPTZ  When execution started
 --   endedTime             TIMESTAMPTZ  When execution finished
 --   inputSize             BIGINT       Input payload size in bytes
 --   outputSize            BIGINT       Output payload size in bytes
 --   walltime              NUMERIC      Wall-clock seconds for this node
 --   cputime               NUMERIC      CPU seconds for this node
 --   llmRetryCount         INT          Number of LLM retries
 --   llmCallCount          INT          Number of LLM API calls made
 --   inputTokenCount       BIGINT       LLM input tokens consumed
 --   outputTokenCount      BIGINT       LLM output tokens produced
 --   blockName             TEXT         Human-readable block name (e.g. 'OpenAIBlock')
 --   blockId               TEXT         Block UUID
 --   groupedErrorMessage   TEXT         Scrubbed error (IDs/URLs wildcarded)
 --   errorMessage          TEXT         Raw error output (only set when FAILED)
 --
 -- WINDOW
 --   Rolling 90 days (addedTime > CURRENT_DATE - 90 days)
 --
 -- EXAMPLE QUERIES
 --   -- Most-used blocks by execution count
 --   SELECT "blockName", COUNT(*) AS executions,
 --          COUNT(*) FILTER (WHERE "executionStatus"='FAILED') AS failures
 --   FROM analytics.node_block_execution
 --   GROUP BY 1 ORDER BY executions DESC LIMIT 20;
 --
 --   -- Average LLM token usage per block
 --   SELECT "blockName",
 --          AVG("inputTokenCount") AS avg_input_tokens,
 --          AVG("outputTokenCount") AS avg_output_tokens
 --   FROM analytics.node_block_execution
 --   WHERE "llmCallCount" > 0
 --   GROUP BY 1 ORDER BY avg_input_tokens DESC;
 --
 --   -- Top failure reasons
 --   SELECT "blockName", "groupedErrorMessage", COUNT(*) AS count
 --   FROM analytics.node_block_execution
 --   WHERE "executionStatus" = 'FAILED'
 --   GROUP BY 1, 2 ORDER BY count DESC LIMIT 20;
 -- =============================================================
 SELECT
    ne."id"                                                            AS id,
    ne."agentGraphExecutionId"                                         AS agentGraphExecutionId,
    ne."agentNodeId"                                                   AS agentNodeId,
    CAST(ne."executionStatus" AS TEXT)                                 AS executionStatus,
    ne."addedTime"                                                     AS addedTime,
    ne."queuedTime"                                                    AS queuedTime,
    ne."startedTime"                                                   AS startedTime,
    ne."endedTime"                                                     AS endedTime,
    (ne."stats"::jsonb->>'input_size')::bigint                         AS inputSize,
    (ne."stats"::jsonb->>'output_size')::bigint                        AS outputSize,
    (ne."stats"::jsonb->>'walltime')::numeric                          AS walltime,
    (ne."stats"::jsonb->>'cputime')::numeric                           AS cputime,
    (ne."stats"::jsonb->>'llm_retry_count')::int                       AS llmRetryCount,
    (ne."stats"::jsonb->>'llm_call_count')::int                        AS llmCallCount,
    (ne."stats"::jsonb->>'input_token_count')::bigint                  AS inputTokenCount,
    (ne."stats"::jsonb->>'output_token_count')::bigint                 AS outputTokenCount,
    b."name"                                                           AS blockName,
    b."id"                                                             AS blockId,
    REGEXP_REPLACE(
        REGEXP_REPLACE(
            TRIM(BOTH '"' FROM eio."data"::text),
            '(https?://)([A-Za-z0-9.-]+)(:[0-9]+)?(/[^\s]*)?',
            '\1\2/...', 'gi'
        ),
        '[a-zA-Z0-9_:-]*\d[a-zA-Z0-9_:-]*', '*', 'g'
    )                                                                  AS groupedErrorMessage,
    eio."data"                                                         AS errorMessage
 FROM platform."AgentNodeExecution" ne
 LEFT JOIN platform."AgentNode" nd
       ON ne."agentNodeId" = nd."id"
 LEFT JOIN platform."AgentBlock" b
       ON nd."agentBlockId" = b."id"
 LEFT JOIN platform."AgentNodeExecutionInputOutput" eio
       ON eio."referencedByOutputExecId" = ne."id"
      AND eio."name" = 'error'
      AND ne."executionStatus" = 'FAILED'
 WHERE ne."addedTime" > CURRENT_DATE - INTERVAL '90 days'
--- a/autogpt_platform/analytics/queries/retention_agent.sql
+++ b/autogpt_platform/analytics/queries/retention_agent.sql
@@ -0,0 +1,97 @@
 -- =============================================================
 -- View: analytics.retention_agent
 -- Looker source alias: ds35  |  Charts: 2
 -- =============================================================
 -- DESCRIPTION
 --   Weekly cohort retention broken down per individual agent.
 --   Cohort = week of a user's first use of THAT specific agent.
 --   Tells you which agents keep users coming back vs. one-shot
 --   use. Only includes cohorts from the last 180 days.
 --
 -- SOURCE TABLES
 --   platform.AgentGraphExecution  — Execution records (user × agent × time)
 --   platform.AgentGraph           — Agent names
 --
 -- OUTPUT COLUMNS
 --   agent_id            TEXT   Agent graph UUID
 --   agent_label         TEXT   'AgentName [first8chars]'
 --   agent_label_n       TEXT   'AgentName [first8chars] (n=total_users)'
 --   cohort_week_start   DATE   Week users first ran this agent
 --   cohort_label        TEXT   ISO week label
 --   cohort_label_n      TEXT   ISO week label with cohort size
 --   user_lifetime_week  INT    Weeks since first use of this agent
 --   cohort_users        BIGINT Users in this cohort for this agent
 --   active_users        BIGINT Users who ran the agent again in week k
 --   retention_rate      FLOAT  active_users / cohort_users
 --   cohort_users_w0     BIGINT cohort_users only at week 0 (safe to SUM)
 --   agent_total_users   BIGINT Total users across all cohorts for this agent
 --
 -- EXAMPLE QUERIES
 --   -- Best-retained agents at week 2
 --   SELECT agent_label, AVG(retention_rate) AS w2_retention
 --   FROM analytics.retention_agent
 --   WHERE user_lifetime_week = 2 AND cohort_users >= 10
 --   GROUP BY 1 ORDER BY w2_retention DESC LIMIT 10;
 --
 --   -- Agents with most unique users
 --   SELECT DISTINCT agent_label, agent_total_users
 --   FROM analytics.retention_agent
 --   ORDER BY agent_total_users DESC LIMIT 20;
 -- =============================================================
 WITH params AS (SELECT 12::int AS max_weeks, (CURRENT_DATE - INTERVAL '180 days') AS cohort_start),
 events AS (
  SELECT e."userId"::text AS user_id, e."agentGraphId" AS agent_id,
         e."createdAt"::timestamptz AS created_at,
         DATE_TRUNC('week', e."createdAt")::date AS week_start
  FROM platform."AgentGraphExecution" e
 ),
 first_use AS (
  SELECT user_id, agent_id, MIN(created_at) AS first_use_at,
         DATE_TRUNC('week', MIN(created_at))::date AS cohort_week_start
  FROM events GROUP BY 1,2
  HAVING MIN(created_at) >= (SELECT cohort_start FROM params)
 ),
 activity_weeks AS (SELECT DISTINCT user_id, agent_id, week_start FROM events),
 user_week_age AS (
  SELECT aw.user_id, aw.agent_id, fu.cohort_week_start,
         ((aw.week_start - DATE_TRUNC('week',fu.first_use_at)::date)/7)::int AS user_lifetime_week
  FROM activity_weeks aw JOIN first_use fu USING (user_id, agent_id)
  WHERE aw.week_start >= DATE_TRUNC('week',fu.first_use_at)::date
 ),
 active_counts AS (
  SELECT agent_id, cohort_week_start, user_lifetime_week, COUNT(DISTINCT user_id) AS active_users
  FROM user_week_age WHERE user_lifetime_week >= 0 GROUP BY 1,2,3
 ),
 cohort_sizes AS (
  SELECT agent_id, cohort_week_start, COUNT(DISTINCT user_id) AS cohort_users FROM first_use GROUP BY 1,2
 ),
 cohort_caps AS (
  SELECT cs.agent_id, cs.cohort_week_start, cs.cohort_users,
         LEAST((SELECT max_weeks FROM params),
               GREATEST(0,((DATE_TRUNC('week',CURRENT_DATE)::date-cs.cohort_week_start)/7)::int)) AS cap_weeks
  FROM cohort_sizes cs
 ),
 grid AS (
  SELECT cc.agent_id, cc.cohort_week_start, gs AS user_lifetime_week, cc.cohort_users
  FROM cohort_caps cc CROSS JOIN LATERAL generate_series(0, cc.cap_weeks) gs
 ),
 agent_names AS (SELECT DISTINCT ON (g."id") g."id" AS agent_id, g."name" AS agent_name FROM platform."AgentGraph" g ORDER BY g."id", g."version" DESC),
 agent_total_users AS (SELECT agent_id, SUM(cohort_users) AS agent_total_users FROM cohort_sizes GROUP BY 1)
 SELECT
  g.agent_id,
  COALESCE(an.agent_name,'(unnamed)')||' ['||LEFT(g.agent_id::text,8)||']'  AS agent_label,
  COALESCE(an.agent_name,'(unnamed)')||' ['||LEFT(g.agent_id::text,8)||'] (n='||COALESCE(atu.agent_total_users,0)||')' AS agent_label_n,
  g.cohort_week_start,
  TO_CHAR(g.cohort_week_start,'IYYY-"W"IW')                               AS cohort_label,
  TO_CHAR(g.cohort_week_start,'IYYY-"W"IW')||' (n='||g.cohort_users||')'  AS cohort_label_n,
  g.user_lifetime_week, g.cohort_users,
  COALESCE(ac.active_users,0)                                              AS active_users,
  COALESCE(ac.active_users,0)::float / NULLIF(g.cohort_users,0)           AS retention_rate,
  CASE WHEN g.user_lifetime_week=0 THEN g.cohort_users ELSE 0 END         AS cohort_users_w0,
  COALESCE(atu.agent_total_users,0)                                        AS agent_total_users
 FROM grid g
 LEFT JOIN active_counts     ac  ON ac.agent_id=g.agent_id AND ac.cohort_week_start=g.cohort_week_start AND ac.user_lifetime_week=g.user_lifetime_week
 LEFT JOIN agent_names       an  ON an.agent_id=g.agent_id
 LEFT JOIN agent_total_users atu ON atu.agent_id=g.agent_id
 ORDER BY agent_label, g.cohort_week_start, g.user_lifetime_week;
--- a/autogpt_platform/analytics/queries/retention_execution_daily.sql
+++ b/autogpt_platform/analytics/queries/retention_execution_daily.sql
@@ -0,0 +1,81 @@
 -- =============================================================
 -- View: analytics.retention_execution_daily
 -- Looker source alias: ds111  |  Charts: 1
 -- =============================================================
 -- DESCRIPTION
 --   Daily cohort retention based on agent executions.
 --   Cohort anchor = day of user's FIRST ever execution.
 --   Only includes cohorts from the last 90 days, up to day 30.
 --   Great for early engagement analysis (did users run another
 --   agent the next day?).
 --
 -- SOURCE TABLES
 --   platform.AgentGraphExecution  — Execution records
 --
 -- OUTPUT COLUMNS
 --   Same pattern as retention_login_daily.
 --   cohort_day_start = day of first execution (not first login)
 --
 -- EXAMPLE QUERIES
 --   -- Day-3 execution retention
 --   SELECT cohort_label, retention_rate_bounded AS d3_retention
 --   FROM analytics.retention_execution_daily
 --   WHERE user_lifetime_day = 3 ORDER BY cohort_day_start;
 -- =============================================================
 WITH params AS (SELECT 30::int AS max_days, (CURRENT_DATE - INTERVAL '90 days') AS cohort_start),
 events AS (
  SELECT e."userId"::text AS user_id, e."createdAt"::timestamptz AS created_at,
         DATE_TRUNC('day', e."createdAt")::date AS day_start
  FROM platform."AgentGraphExecution" e WHERE e."userId" IS NOT NULL
 ),
 first_exec AS (
  SELECT user_id, MIN(created_at) AS first_exec_at,
         DATE_TRUNC('day', MIN(created_at))::date AS cohort_day_start
  FROM events GROUP BY 1
  HAVING MIN(created_at) >= (SELECT cohort_start FROM params)
 ),
 activity_days AS (SELECT DISTINCT user_id, day_start FROM events),
 user_day_age AS (
  SELECT ad.user_id, fe.cohort_day_start,
         (ad.day_start - DATE_TRUNC('day',fe.first_exec_at)::date)::int AS user_lifetime_day
  FROM activity_days ad JOIN first_exec fe USING (user_id)
  WHERE ad.day_start >= DATE_TRUNC('day',fe.first_exec_at)::date
 ),
 bounded_counts AS (
  SELECT cohort_day_start, user_lifetime_day, COUNT(DISTINCT user_id) AS active_users_bounded
  FROM user_day_age WHERE user_lifetime_day >= 0 GROUP BY 1,2
 ),
 last_active AS (
  SELECT cohort_day_start, user_id, MAX(user_lifetime_day) AS last_active_day FROM user_day_age GROUP BY 1,2
 ),
 unbounded_counts AS (
  SELECT la.cohort_day_start, gs AS user_lifetime_day, COUNT(*) AS retained_users_unbounded
  FROM last_active la
  CROSS JOIN LATERAL generate_series(0, LEAST(la.last_active_day,(SELECT max_days FROM params))) gs
  GROUP BY 1,2
 ),
 cohort_sizes AS (SELECT cohort_day_start, COUNT(DISTINCT user_id) AS cohort_users FROM first_exec GROUP BY 1),
 cohort_caps AS (
  SELECT cs.cohort_day_start, cs.cohort_users,
         LEAST((SELECT max_days FROM params), GREATEST(0,(CURRENT_DATE-cs.cohort_day_start)::int)) AS cap_days
  FROM cohort_sizes cs
 ),
 grid AS (
  SELECT cc.cohort_day_start, gs AS user_lifetime_day, cc.cohort_users
  FROM cohort_caps cc CROSS JOIN LATERAL generate_series(0, cc.cap_days) gs
 )
 SELECT
  g.cohort_day_start,
  TO_CHAR(g.cohort_day_start,'YYYY-MM-DD')                                AS cohort_label,
  TO_CHAR(g.cohort_day_start,'YYYY-MM-DD')||' (n='||g.cohort_users||')'   AS cohort_label_n,
  g.user_lifetime_day, g.cohort_users,
  COALESCE(b.active_users_bounded,0)     AS active_users_bounded,
  COALESCE(u.retained_users_unbounded,0) AS retained_users_unbounded,
  CASE WHEN g.cohort_users>0 THEN COALESCE(b.active_users_bounded,0)::float/g.cohort_users END    AS retention_rate_bounded,
  CASE WHEN g.cohort_users>0 THEN COALESCE(u.retained_users_unbounded,0)::float/g.cohort_users END AS retention_rate_unbounded,
  CASE WHEN g.user_lifetime_day=0 THEN g.cohort_users ELSE 0 END          AS cohort_users_d0
 FROM grid g
 LEFT JOIN bounded_counts   b ON b.cohort_day_start=g.cohort_day_start AND b.user_lifetime_day=g.user_lifetime_day
 LEFT JOIN unbounded_counts u ON u.cohort_day_start=g.cohort_day_start AND u.user_lifetime_day=g.user_lifetime_day
 ORDER BY g.cohort_day_start, g.user_lifetime_day;
--- a/autogpt_platform/analytics/queries/retention_execution_weekly.sql
+++ b/autogpt_platform/analytics/queries/retention_execution_weekly.sql
@@ -0,0 +1,81 @@
 -- =============================================================
 -- View: analytics.retention_execution_weekly
 -- Looker source alias: ds92  |  Charts: 2
 -- =============================================================
 -- DESCRIPTION
 --   Weekly cohort retention based on agent executions.
 --   Cohort anchor = week of user's FIRST ever agent execution
 --   (not first login). Only includes cohorts from the last 180 days.
 --   Useful when you care about product engagement, not just visits.
 --
 -- SOURCE TABLES
 --   platform.AgentGraphExecution  — Execution records
 --
 -- OUTPUT COLUMNS
 --   Same pattern as retention_login_weekly.
 --   cohort_week_start = week of first execution (not first login)
 --
 -- EXAMPLE QUERIES
 --   -- Week-2 execution retention
 --   SELECT cohort_label, retention_rate_bounded
 --   FROM analytics.retention_execution_weekly
 --   WHERE user_lifetime_week = 2 ORDER BY cohort_week_start;
 -- =============================================================
 WITH params AS (SELECT 12::int AS max_weeks, (CURRENT_DATE - INTERVAL '180 days') AS cohort_start),
 events AS (
  SELECT e."userId"::text AS user_id, e."createdAt"::timestamptz AS created_at,
         DATE_TRUNC('week', e."createdAt")::date AS week_start
  FROM platform."AgentGraphExecution" e WHERE e."userId" IS NOT NULL
 ),
 first_exec AS (
  SELECT user_id, MIN(created_at) AS first_exec_at,
         DATE_TRUNC('week', MIN(created_at))::date AS cohort_week_start
  FROM events GROUP BY 1
  HAVING MIN(created_at) >= (SELECT cohort_start FROM params)
 ),
 activity_weeks AS (SELECT DISTINCT user_id, week_start FROM events),
 user_week_age AS (
  SELECT aw.user_id, fe.cohort_week_start,
         ((aw.week_start - DATE_TRUNC('week',fe.first_exec_at)::date)/7)::int AS user_lifetime_week
  FROM activity_weeks aw JOIN first_exec fe USING (user_id)
  WHERE aw.week_start >= DATE_TRUNC('week',fe.first_exec_at)::date
 ),
 bounded_counts AS (
  SELECT cohort_week_start, user_lifetime_week, COUNT(DISTINCT user_id) AS active_users_bounded
  FROM user_week_age WHERE user_lifetime_week >= 0 GROUP BY 1,2
 ),
 last_active AS (
  SELECT cohort_week_start, user_id, MAX(user_lifetime_week) AS last_active_week FROM user_week_age GROUP BY 1,2
 ),
 unbounded_counts AS (
  SELECT la.cohort_week_start, gs AS user_lifetime_week, COUNT(*) AS retained_users_unbounded
  FROM last_active la
  CROSS JOIN LATERAL generate_series(0, LEAST(la.last_active_week,(SELECT max_weeks FROM params))) gs
  GROUP BY 1,2
 ),
 cohort_sizes AS (SELECT cohort_week_start, COUNT(DISTINCT user_id) AS cohort_users FROM first_exec GROUP BY 1),
 cohort_caps AS (
  SELECT cs.cohort_week_start, cs.cohort_users,
         LEAST((SELECT max_weeks FROM params),
               GREATEST(0,((DATE_TRUNC('week',CURRENT_DATE)::date-cs.cohort_week_start)/7)::int)) AS cap_weeks
  FROM cohort_sizes cs
 ),
 grid AS (
  SELECT cc.cohort_week_start, gs AS user_lifetime_week, cc.cohort_users
  FROM cohort_caps cc CROSS JOIN LATERAL generate_series(0, cc.cap_weeks) gs
 )
 SELECT
  g.cohort_week_start,
  TO_CHAR(g.cohort_week_start,'IYYY-"W"IW')                               AS cohort_label,
  TO_CHAR(g.cohort_week_start,'IYYY-"W"IW')||' (n='||g.cohort_users||')'  AS cohort_label_n,
  g.user_lifetime_week, g.cohort_users,
  COALESCE(b.active_users_bounded,0)     AS active_users_bounded,
  COALESCE(u.retained_users_unbounded,0) AS retained_users_unbounded,
  CASE WHEN g.cohort_users>0 THEN COALESCE(b.active_users_bounded,0)::float/g.cohort_users END    AS retention_rate_bounded,
  CASE WHEN g.cohort_users>0 THEN COALESCE(u.retained_users_unbounded,0)::float/g.cohort_users END AS retention_rate_unbounded,
  CASE WHEN g.user_lifetime_week=0 THEN g.cohort_users ELSE 0 END         AS cohort_users_w0
 FROM grid g
 LEFT JOIN bounded_counts   b ON b.cohort_week_start=g.cohort_week_start AND b.user_lifetime_week=g.user_lifetime_week
 LEFT JOIN unbounded_counts u ON u.cohort_week_start=g.cohort_week_start AND u.user_lifetime_week=g.user_lifetime_week
 ORDER BY g.cohort_week_start, g.user_lifetime_week;
--- a/autogpt_platform/analytics/queries/retention_login_daily.sql
+++ b/autogpt_platform/analytics/queries/retention_login_daily.sql
@@ -0,0 +1,94 @@
 -- =============================================================
 -- View: analytics.retention_login_daily
 -- Looker source alias: ds112  |  Charts: 1
 -- =============================================================
 -- DESCRIPTION
 --   Daily cohort retention based on login sessions.
 --   Same logic as retention_login_weekly but at day granularity,
 --   showing up to day 30 for cohorts from the last 90 days.
 --   Useful for analysing early activation (days 1-7) in detail.
 --
 -- SOURCE TABLES
 --   auth.sessions  — Login session records
 --
 -- OUTPUT COLUMNS (same pattern as retention_login_weekly)
 --   cohort_day_start          DATE     First day the cohort logged in
 --   cohort_label              TEXT     Date string (e.g. '2025-03-01')
 --   cohort_label_n            TEXT     Date + cohort size (e.g. '2025-03-01 (n=12)')
 --   user_lifetime_day         INT      Days since first login (0 = signup day)
 --   cohort_users              BIGINT   Total users in cohort
 --   active_users_bounded      BIGINT   Users active on exactly day k
 --   retained_users_unbounded  BIGINT   Users active any time on/after day k
 --   retention_rate_bounded    FLOAT    bounded / cohort_users
 --   retention_rate_unbounded  FLOAT    unbounded / cohort_users
 --   cohort_users_d0           BIGINT   cohort_users only at day 0, else 0 (safe to SUM)
 --
 -- EXAMPLE QUERIES
 --   -- Day-1 retention rate (came back next day)
 --   SELECT cohort_label, retention_rate_bounded AS d1_retention
 --   FROM analytics.retention_login_daily
 --   WHERE user_lifetime_day = 1 ORDER BY cohort_day_start;
 --
 --   -- Average retention curve across all cohorts
 --   SELECT user_lifetime_day,
 --          SUM(active_users_bounded)::float / NULLIF(SUM(cohort_users_d0), 0) AS avg_retention
 --   FROM analytics.retention_login_daily
 --   GROUP BY 1 ORDER BY 1;
 -- =============================================================
 WITH params AS (SELECT 30::int AS max_days, (CURRENT_DATE - INTERVAL '90 days')::date AS cohort_start),
 events AS (
  SELECT s.user_id::text AS user_id, s.created_at::timestamptz AS created_at,
         DATE_TRUNC('day', s.created_at)::date AS day_start
  FROM auth.sessions s WHERE s.user_id IS NOT NULL
 ),
 first_login AS (
  SELECT user_id, MIN(created_at) AS first_login_time,
         DATE_TRUNC('day', MIN(created_at))::date AS cohort_day_start
  FROM events GROUP BY 1
  HAVING MIN(created_at) >= (SELECT cohort_start FROM params)
 ),
 activity_days AS (SELECT DISTINCT user_id, day_start FROM events),
 user_day_age AS (
  SELECT ad.user_id, fl.cohort_day_start,
         (ad.day_start - DATE_TRUNC('day', fl.first_login_time)::date)::int AS user_lifetime_day
  FROM activity_days ad JOIN first_login fl USING (user_id)
  WHERE ad.day_start >= DATE_TRUNC('day', fl.first_login_time)::date
 ),
 bounded_counts AS (
  SELECT cohort_day_start, user_lifetime_day, COUNT(DISTINCT user_id) AS active_users_bounded
  FROM user_day_age WHERE user_lifetime_day >= 0 GROUP BY 1,2
 ),
 last_active AS (
  SELECT cohort_day_start, user_id, MAX(user_lifetime_day) AS last_active_day FROM user_day_age GROUP BY 1,2
 ),
 unbounded_counts AS (
  SELECT la.cohort_day_start, gs AS user_lifetime_day, COUNT(*) AS retained_users_unbounded
  FROM last_active la
  CROSS JOIN LATERAL generate_series(0, LEAST(la.last_active_day,(SELECT max_days FROM params))) gs
  GROUP BY 1,2
 ),
 cohort_sizes AS (SELECT cohort_day_start, COUNT(DISTINCT user_id) AS cohort_users FROM first_login GROUP BY 1),
 cohort_caps AS (
  SELECT cs.cohort_day_start, cs.cohort_users,
         LEAST((SELECT max_days FROM params), GREATEST(0,(CURRENT_DATE-cs.cohort_day_start)::int)) AS cap_days
  FROM cohort_sizes cs
 ),
 grid AS (
  SELECT cc.cohort_day_start, gs AS user_lifetime_day, cc.cohort_users
  FROM cohort_caps cc CROSS JOIN LATERAL generate_series(0, cc.cap_days) gs
 )
 SELECT
  g.cohort_day_start,
  TO_CHAR(g.cohort_day_start,'YYYY-MM-DD')                                  AS cohort_label,
  TO_CHAR(g.cohort_day_start,'YYYY-MM-DD')||' (n='||g.cohort_users||')'     AS cohort_label_n,
  g.user_lifetime_day, g.cohort_users,
  COALESCE(b.active_users_bounded,0)     AS active_users_bounded,
  COALESCE(u.retained_users_unbounded,0) AS retained_users_unbounded,
  CASE WHEN g.cohort_users>0 THEN COALESCE(b.active_users_bounded,0)::float/g.cohort_users END    AS retention_rate_bounded,
  CASE WHEN g.cohort_users>0 THEN COALESCE(u.retained_users_unbounded,0)::float/g.cohort_users END AS retention_rate_unbounded,
  CASE WHEN g.user_lifetime_day=0 THEN g.cohort_users ELSE 0 END            AS cohort_users_d0
 FROM grid g
 LEFT JOIN bounded_counts   b ON b.cohort_day_start=g.cohort_day_start AND b.user_lifetime_day=g.user_lifetime_day
 LEFT JOIN unbounded_counts u ON u.cohort_day_start=g.cohort_day_start AND u.user_lifetime_day=g.user_lifetime_day
 ORDER BY g.cohort_day_start, g.user_lifetime_day;
--- a/autogpt_platform/analytics/queries/retention_login_onboarded_weekly.sql
+++ b/autogpt_platform/analytics/queries/retention_login_onboarded_weekly.sql
@@ -0,0 +1,96 @@
 -- =============================================================
 -- View: analytics.retention_login_onboarded_weekly
 -- Looker source alias: ds101  |  Charts: 2
 -- =============================================================
 -- DESCRIPTION
 --   Weekly cohort retention from login sessions, restricted to
 --   users who "onboarded" — defined as running at least one
 --   agent within 365 days of their first login.
 --   Filters out users who signed up but never activated,
 --   giving a cleaner view of engaged-user retention.
 --
 -- SOURCE TABLES
 --   auth.sessions                  — Login session records
 --   platform.AgentGraphExecution   — Used to identify onboarders
 --
 -- OUTPUT COLUMNS
 --   Same as retention_login_weekly (cohort_week_start, user_lifetime_week,
 --   retention_rate_bounded, retention_rate_unbounded, etc.)
 --   Only difference: cohort is filtered to onboarded users only.
 --
 -- EXAMPLE QUERIES
 --   -- Compare week-4 retention: all users vs onboarded only
 --   SELECT 'all_users' AS segment, AVG(retention_rate_bounded) AS w4_retention
 --   FROM analytics.retention_login_weekly WHERE user_lifetime_week = 4
 --   UNION ALL
 --   SELECT 'onboarded', AVG(retention_rate_bounded)
 --   FROM analytics.retention_login_onboarded_weekly WHERE user_lifetime_week = 4;
 -- =============================================================
 WITH params AS (SELECT 12::int AS max_weeks, 365::int AS onboarding_window_days),
 events AS (
  SELECT s.user_id::text AS user_id, s.created_at::timestamptz AS created_at,
         DATE_TRUNC('week', s.created_at)::date AS week_start
  FROM auth.sessions s WHERE s.user_id IS NOT NULL
 ),
 first_login_all AS (
  SELECT user_id, MIN(created_at) AS first_login_time,
         DATE_TRUNC('week', MIN(created_at))::date AS cohort_week_start
  FROM events GROUP BY 1
 ),
 onboarders AS (
  SELECT fl.user_id FROM first_login_all fl
  WHERE EXISTS (
    SELECT 1 FROM platform."AgentGraphExecution" e
    WHERE e."userId"::text = fl.user_id
      AND e."createdAt" >= fl.first_login_time
      AND e."createdAt" < fl.first_login_time
          + make_interval(days => (SELECT onboarding_window_days FROM params))
  )
 ),
 first_login AS (SELECT * FROM first_login_all WHERE user_id IN (SELECT user_id FROM onboarders)),
 activity_weeks AS (SELECT DISTINCT user_id, week_start FROM events),
 user_week_age AS (
  SELECT aw.user_id, fl.cohort_week_start,
         ((aw.week_start - DATE_TRUNC('week',fl.first_login_time)::date)/7)::int AS user_lifetime_week
  FROM activity_weeks aw JOIN first_login fl USING (user_id)
  WHERE aw.week_start >= DATE_TRUNC('week',fl.first_login_time)::date
 ),
 bounded_counts AS (
  SELECT cohort_week_start, user_lifetime_week, COUNT(DISTINCT user_id) AS active_users_bounded
  FROM user_week_age WHERE user_lifetime_week >= 0 GROUP BY 1,2
 ),
 last_active AS (
  SELECT cohort_week_start, user_id, MAX(user_lifetime_week) AS last_active_week FROM user_week_age GROUP BY 1,2
 ),
 unbounded_counts AS (
  SELECT la.cohort_week_start, gs AS user_lifetime_week, COUNT(*) AS retained_users_unbounded
  FROM last_active la
  CROSS JOIN LATERAL generate_series(0, LEAST(la.last_active_week,(SELECT max_weeks FROM params))) gs
  GROUP BY 1,2
 ),
 cohort_sizes AS (SELECT cohort_week_start, COUNT(DISTINCT user_id) AS cohort_users FROM first_login GROUP BY 1),
 cohort_caps AS (
  SELECT cs.cohort_week_start, cs.cohort_users,
         LEAST((SELECT max_weeks FROM params),
               GREATEST(0,((DATE_TRUNC('week',CURRENT_DATE)::date-cs.cohort_week_start)/7)::int)) AS cap_weeks
  FROM cohort_sizes cs
 ),
 grid AS (
  SELECT cc.cohort_week_start, gs AS user_lifetime_week, cc.cohort_users
  FROM cohort_caps cc CROSS JOIN LATERAL generate_series(0, cc.cap_weeks) gs
 )
 SELECT
  g.cohort_week_start,
  TO_CHAR(g.cohort_week_start,'IYYY-"W"IW')                               AS cohort_label,
  TO_CHAR(g.cohort_week_start,'IYYY-"W"IW')||' (n='||g.cohort_users||')'  AS cohort_label_n,
  g.user_lifetime_week, g.cohort_users,
  COALESCE(b.active_users_bounded,0)     AS active_users_bounded,
  COALESCE(u.retained_users_unbounded,0) AS retained_users_unbounded,
  CASE WHEN g.cohort_users>0 THEN COALESCE(b.active_users_bounded,0)::float/g.cohort_users END    AS retention_rate_bounded,
  CASE WHEN g.cohort_users>0 THEN COALESCE(u.retained_users_unbounded,0)::float/g.cohort_users END AS retention_rate_unbounded,
  CASE WHEN g.user_lifetime_week=0 THEN g.cohort_users ELSE 0 END         AS cohort_users_w0
 FROM grid g
 LEFT JOIN bounded_counts   b ON b.cohort_week_start=g.cohort_week_start AND b.user_lifetime_week=g.user_lifetime_week
 LEFT JOIN unbounded_counts u ON u.cohort_week_start=g.cohort_week_start AND u.user_lifetime_week=g.user_lifetime_week
 ORDER BY g.cohort_week_start, g.user_lifetime_week;
--- a/autogpt_platform/analytics/queries/retention_login_weekly.sql
+++ b/autogpt_platform/analytics/queries/retention_login_weekly.sql
@@ -0,0 +1,103 @@
 -- =============================================================
 -- View: analytics.retention_login_weekly
 -- Looker source alias: ds83  |  Charts: 2
 -- =============================================================
 -- DESCRIPTION
 --   Weekly cohort retention based on login sessions.
 --   Users are grouped by the ISO week of their first ever login.
 --   For each cohort × lifetime-week combination, outputs both:
 --     - bounded rate: % active in exactly that week
 --     - unbounded rate: % who were ever active on or after that week
 --   Weeks are capped to the cohort's actual age (no future data points).
 --
 -- SOURCE TABLES
 --   auth.sessions  — Login session records
 --
 -- HOW TO READ THE OUTPUT
 --   cohort_week_start   The Monday of the week users first logged in
 --   user_lifetime_week  0 = signup week, 1 = one week later, etc.
 --   retention_rate_bounded   = active_users_bounded / cohort_users
 --   retention_rate_unbounded = retained_users_unbounded / cohort_users
 --
 -- OUTPUT COLUMNS
 --   cohort_week_start         DATE     First day of the cohort's signup week
 --   cohort_label              TEXT     ISO week label (e.g. '2025-W01')
 --   cohort_label_n            TEXT     ISO week label with cohort size (e.g. '2025-W01 (n=42)')
 --   user_lifetime_week        INT      Weeks since first login (0 = signup week)
 --   cohort_users              BIGINT   Total users in this cohort (denominator)
 --   active_users_bounded      BIGINT   Users active in exactly week k
 --   retained_users_unbounded  BIGINT   Users active any time on/after week k
 --   retention_rate_bounded    FLOAT    bounded active / cohort_users
 --   retention_rate_unbounded  FLOAT    unbounded retained / cohort_users
 --   cohort_users_w0           BIGINT   cohort_users only at week 0, else 0 (safe to SUM in pivot tables)
 --
 -- EXAMPLE QUERIES
 --   -- Week-1 retention rate per cohort
 --   SELECT cohort_label, retention_rate_bounded AS w1_retention
 --   FROM analytics.retention_login_weekly
 --   WHERE user_lifetime_week = 1
 --   ORDER BY cohort_week_start;
 --
 --   -- Overall average retention curve (all cohorts combined)
 --   SELECT user_lifetime_week,
 --          SUM(active_users_bounded)::float / NULLIF(SUM(cohort_users_w0), 0) AS avg_retention
 --   FROM analytics.retention_login_weekly
 --   GROUP BY 1 ORDER BY 1;
 -- =============================================================
 WITH params AS (SELECT 12::int AS max_weeks),
 events AS (
  SELECT s.user_id::text AS user_id, s.created_at::timestamptz AS created_at,
         DATE_TRUNC('week', s.created_at)::date AS week_start
  FROM auth.sessions s WHERE s.user_id IS NOT NULL
 ),
 first_login AS (
  SELECT user_id, MIN(created_at) AS first_login_time,
         DATE_TRUNC('week', MIN(created_at))::date AS cohort_week_start
  FROM events GROUP BY 1
 ),
 activity_weeks AS (SELECT DISTINCT user_id, week_start FROM events),
 user_week_age AS (
  SELECT aw.user_id, fl.cohort_week_start,
         ((aw.week_start - DATE_TRUNC('week', fl.first_login_time)::date) / 7)::int AS user_lifetime_week
  FROM activity_weeks aw JOIN first_login fl USING (user_id)
  WHERE aw.week_start >= DATE_TRUNC('week', fl.first_login_time)::date
 ),
 bounded_counts AS (
  SELECT cohort_week_start, user_lifetime_week, COUNT(DISTINCT user_id) AS active_users_bounded
  FROM user_week_age WHERE user_lifetime_week >= 0 GROUP BY 1,2
 ),
 last_active AS (
  SELECT cohort_week_start, user_id, MAX(user_lifetime_week) AS last_active_week FROM user_week_age GROUP BY 1,2
 ),
 unbounded_counts AS (
  SELECT la.cohort_week_start, gs AS user_lifetime_week, COUNT(*) AS retained_users_unbounded
  FROM last_active la
  CROSS JOIN LATERAL generate_series(0, LEAST(la.last_active_week,(SELECT max_weeks FROM params))) gs
  GROUP BY 1,2
 ),
 cohort_sizes AS (SELECT cohort_week_start, COUNT(DISTINCT user_id) AS cohort_users FROM first_login GROUP BY 1),
 cohort_caps AS (
  SELECT cs.cohort_week_start, cs.cohort_users,
         LEAST((SELECT max_weeks FROM params),
               GREATEST(0,((DATE_TRUNC('week',CURRENT_DATE)::date - cs.cohort_week_start)/7)::int)) AS cap_weeks
  FROM cohort_sizes cs
 ),
 grid AS (
  SELECT cc.cohort_week_start, gs AS user_lifetime_week, cc.cohort_users
  FROM cohort_caps cc CROSS JOIN LATERAL generate_series(0, cc.cap_weeks) gs
 )
 SELECT
  g.cohort_week_start,
  TO_CHAR(g.cohort_week_start,'IYYY-"W"IW')                                    AS cohort_label,
  TO_CHAR(g.cohort_week_start,'IYYY-"W"IW')||' (n='||g.cohort_users||')'       AS cohort_label_n,
  g.user_lifetime_week, g.cohort_users,
  COALESCE(b.active_users_bounded,0)     AS active_users_bounded,
  COALESCE(u.retained_users_unbounded,0) AS retained_users_unbounded,
  CASE WHEN g.cohort_users>0 THEN COALESCE(b.active_users_bounded,0)::float/g.cohort_users END    AS retention_rate_bounded,
  CASE WHEN g.cohort_users>0 THEN COALESCE(u.retained_users_unbounded,0)::float/g.cohort_users END AS retention_rate_unbounded,
  CASE WHEN g.user_lifetime_week=0 THEN g.cohort_users ELSE 0 END               AS cohort_users_w0
 FROM grid g
 LEFT JOIN bounded_counts   b ON b.cohort_week_start=g.cohort_week_start AND b.user_lifetime_week=g.user_lifetime_week
 LEFT JOIN unbounded_counts u ON u.cohort_week_start=g.cohort_week_start AND u.user_lifetime_week=g.user_lifetime_week
 ORDER BY g.cohort_week_start, g.user_lifetime_week
--- a/autogpt_platform/analytics/queries/user_block_spending.sql
+++ b/autogpt_platform/analytics/queries/user_block_spending.sql
@@ -0,0 +1,71 @@
 -- =============================================================
 -- View: analytics.user_block_spending
 -- Looker source alias: ds6  |  Charts: 5
 -- =============================================================
 -- DESCRIPTION
 --   One row per credit transaction (last 90 days).
 --   Shows how users spend credits broken down by block type,
 --   LLM provider and model.  Joins node execution stats for
 --   token-level detail.
 --
 -- SOURCE TABLES
 --   platform.CreditTransaction   — Credit debit/credit records
 --   platform.AgentNodeExecution  — Node execution stats (for token counts)
 --
 -- OUTPUT COLUMNS
 --   transactionKey        TEXT         Unique transaction identifier
 --   userId                TEXT         User who was charged
 --   amount                DECIMAL      Credit amount (positive = credit, negative = debit)
 --   negativeAmount        DECIMAL      amount * -1 (convenience for spend charts)
 --   transactionType       TEXT         Transaction type (e.g. 'USAGE', 'REFUND', 'TOP_UP')
 --   transactionTime       TIMESTAMPTZ  When the transaction was recorded
 --   blockId               TEXT         Block UUID that triggered the spend
 --   blockName             TEXT         Human-readable block name
 --   llm_provider          TEXT         LLM provider (e.g. 'openai', 'anthropic')
 --   llm_model             TEXT         Model name (e.g. 'gpt-4o', 'claude-3-5-sonnet')
 --   node_exec_id          TEXT         Linked node execution UUID
 --   llm_call_count        INT          LLM API calls made in that execution
 --   llm_retry_count       INT          LLM retries in that execution
 --   llm_input_token_count INT          Input tokens consumed
 --   llm_output_token_count INT         Output tokens produced
 --
 -- WINDOW
 --   Rolling 90 days (createdAt > CURRENT_DATE - 90 days)
 --
 -- EXAMPLE QUERIES
 --   -- Total spend per user (last 90 days)
 --   SELECT "userId", SUM("negativeAmount") AS total_spent
 --   FROM analytics.user_block_spending
 --   WHERE "transactionType" = 'USAGE'
 --   GROUP BY 1 ORDER BY total_spent DESC;
 --
 --   -- Spend by LLM provider + model
 --   SELECT "llm_provider", "llm_model",
 --          SUM("negativeAmount") AS total_cost,
 --          SUM("llm_input_token_count") AS input_tokens,
 --          SUM("llm_output_token_count") AS output_tokens
 --   FROM analytics.user_block_spending
 --   WHERE "llm_provider" IS NOT NULL
 --   GROUP BY 1, 2 ORDER BY total_cost DESC;
 -- =============================================================
 SELECT
    c."transactionKey"                                        AS transactionKey,
    c."userId"                                                AS userId,
    c."amount"                                                AS amount,
    c."amount" * -1                                           AS negativeAmount,
    c."type"                                                  AS transactionType,
    c."createdAt"                                             AS transactionTime,
    c.metadata->>'block_id'                                   AS blockId,
    c.metadata->>'block'                                      AS blockName,
    c.metadata->'input'->'credentials'->>'provider'           AS llm_provider,
    c.metadata->'input'->>'model'                             AS llm_model,
    c.metadata->>'node_exec_id'                               AS node_exec_id,
    (ne."stats"->>'llm_call_count')::int                       AS llm_call_count,
    (ne."stats"->>'llm_retry_count')::int                      AS llm_retry_count,
    (ne."stats"->>'input_token_count')::int                    AS llm_input_token_count,
    (ne."stats"->>'output_token_count')::int                   AS llm_output_token_count
 FROM platform."CreditTransaction" c
 LEFT JOIN platform."AgentNodeExecution" ne
       ON (c.metadata->>'node_exec_id') = ne."id"::text
 WHERE c."createdAt" > CURRENT_DATE - INTERVAL '90 days'
--- a/autogpt_platform/analytics/queries/user_onboarding.sql
+++ b/autogpt_platform/analytics/queries/user_onboarding.sql
@@ -0,0 +1,45 @@
 -- =============================================================
 -- View: analytics.user_onboarding
 -- Looker source alias: ds68  |  Charts: 3
 -- =============================================================
 -- DESCRIPTION
 --   One row per user onboarding record.  Contains the user's
 --   stated usage reason, selected integrations, completed
 --   onboarding steps and optional first agent selection.
 --   Full history (no date filter) since onboarding happens
 --   once per user.
 --
 -- SOURCE TABLES
 --   platform.UserOnboarding  — Onboarding state per user
 --
 -- OUTPUT COLUMNS
 --   id                            TEXT         Onboarding record UUID
 --   createdAt                     TIMESTAMPTZ  When onboarding started
 --   updatedAt                     TIMESTAMPTZ  Last update to onboarding state
 --   usageReason                   TEXT         Why user signed up (e.g. 'work', 'personal')
 --   integrations                  TEXT[]       Array of integration names the user selected
 --   userId                        TEXT         User UUID
 --   completedSteps                TEXT[]       Array of onboarding step enums completed
 --   selectedStoreListingVersionId TEXT         First marketplace agent the user chose (if any)
 --
 -- EXAMPLE QUERIES
 --   -- Usage reason breakdown
 --   SELECT "usageReason", COUNT(*) FROM analytics.user_onboarding GROUP BY 1;
 --
 --   -- Completion rate per step
 --   SELECT step, COUNT(*) AS users_completed
 --   FROM analytics.user_onboarding
 --   CROSS JOIN LATERAL UNNEST("completedSteps") AS step
 --   GROUP BY 1 ORDER BY users_completed DESC;
 -- =============================================================
 SELECT
    id,
    "createdAt",
    "updatedAt",
    "usageReason",
    integrations,
    "userId",
    "completedSteps",
    "selectedStoreListingVersionId"
 FROM platform."UserOnboarding"
--- a/autogpt_platform/analytics/queries/user_onboarding_funnel.sql
+++ b/autogpt_platform/analytics/queries/user_onboarding_funnel.sql
@@ -0,0 +1,100 @@
 -- =============================================================
 -- View: analytics.user_onboarding_funnel
 -- Looker source alias: ds74  |  Charts: 1
 -- =============================================================
 -- DESCRIPTION
 --   Pre-aggregated onboarding funnel showing how many users
 --   completed each step and the drop-off percentage from the
 --   previous step.  One row per onboarding step (all 22 steps
 --   always present, even with 0 completions — prevents sparse
 --   gaps from making LAG compare the wrong predecessors).
 --
 -- SOURCE TABLES
 --   platform.UserOnboarding  — Onboarding records with completedSteps array
 --
 -- OUTPUT COLUMNS
 --   step             TEXT     Onboarding step enum name (e.g. 'WELCOME', 'CONGRATS')
 --   step_order       INT      Numeric position in the funnel (1=first, 22=last)
 --   users_completed  BIGINT   Distinct users who completed this step
 --   pct_from_prev    NUMERIC  % of users from the previous step who reached this one
 --
 -- STEP ORDER
 --   1  WELCOME               9  MARKETPLACE_VISIT     17  SCHEDULE_AGENT
 --   2  USAGE_REASON         10  MARKETPLACE_ADD_AGENT  18  RUN_AGENTS
 --   3  INTEGRATIONS         11  MARKETPLACE_RUN_AGENT  19  RUN_3_DAYS
 --   4  AGENT_CHOICE         12  BUILDER_OPEN           20  TRIGGER_WEBHOOK
 --   5  AGENT_NEW_RUN        13  BUILDER_SAVE_AGENT     21  RUN_14_DAYS
 --   6  AGENT_INPUT          14  BUILDER_RUN_AGENT      22  RUN_AGENTS_100
 --   7  CONGRATS             15  VISIT_COPILOT
 --   8  GET_RESULTS          16  RE_RUN_AGENT
 --
 -- WINDOW
 --   Users who started onboarding in the last 90 days
 --
 -- EXAMPLE QUERIES
 --   -- Full funnel
 --   SELECT * FROM analytics.user_onboarding_funnel ORDER BY step_order;
 --
 --   -- Biggest drop-off point
 --   SELECT step, pct_from_prev FROM analytics.user_onboarding_funnel
 --   ORDER BY pct_from_prev ASC LIMIT 3;
 -- =============================================================
 WITH all_steps AS (
  -- Complete ordered grid of all 22 steps so zero-completion steps
  -- are always present, keeping LAG comparisons correct.
  SELECT step_name, step_order
  FROM (VALUES
    ('WELCOME',               1),
    ('USAGE_REASON',          2),
    ('INTEGRATIONS',          3),
    ('AGENT_CHOICE',          4),
    ('AGENT_NEW_RUN',         5),
    ('AGENT_INPUT',           6),
    ('CONGRATS',              7),
    ('GET_RESULTS',           8),
    ('MARKETPLACE_VISIT',     9),
    ('MARKETPLACE_ADD_AGENT', 10),
    ('MARKETPLACE_RUN_AGENT', 11),
    ('BUILDER_OPEN',          12),
    ('BUILDER_SAVE_AGENT',    13),
    ('BUILDER_RUN_AGENT',     14),
    ('VISIT_COPILOT',         15),
    ('RE_RUN_AGENT',          16),
    ('SCHEDULE_AGENT',        17),
    ('RUN_AGENTS',            18),
    ('RUN_3_DAYS',            19),
    ('TRIGGER_WEBHOOK',       20),
    ('RUN_14_DAYS',           21),
    ('RUN_AGENTS_100',        22)
  ) AS t(step_name, step_order)
 ),
 raw AS (
  SELECT
      u."userId",
      step_txt::text AS step
  FROM platform."UserOnboarding" u
  CROSS JOIN LATERAL UNNEST(u."completedSteps") AS step_txt
  WHERE u."createdAt" >= CURRENT_DATE - INTERVAL '90 days'
 ),
 step_counts AS (
  SELECT step, COUNT(DISTINCT "userId") AS users_completed
  FROM raw GROUP BY step
 ),
 funnel AS (
  SELECT
      a.step_name                          AS step,
      a.step_order,
      COALESCE(sc.users_completed, 0)      AS users_completed,
      ROUND(
        100.0 * COALESCE(sc.users_completed, 0)
        / NULLIF(
            LAG(COALESCE(sc.users_completed, 0)) OVER (ORDER BY a.step_order),
            0
          ),
        2
      )                                    AS pct_from_prev
  FROM all_steps a
  LEFT JOIN step_counts sc ON sc.step = a.step_name
 )
 SELECT * FROM funnel ORDER BY step_order
--- a/autogpt_platform/analytics/queries/user_onboarding_integration.sql
+++ b/autogpt_platform/analytics/queries/user_onboarding_integration.sql
@@ -0,0 +1,41 @@
 -- =============================================================
 -- View: analytics.user_onboarding_integration
 -- Looker source alias: ds75  |  Charts: 1
 -- =============================================================
 -- DESCRIPTION
 --   Pre-aggregated count of users who selected each integration
 --   during onboarding.  One row per integration type, sorted
 --   by popularity.
 --
 -- SOURCE TABLES
 --   platform.UserOnboarding  — integrations array column
 --
 -- OUTPUT COLUMNS
 --   integration            TEXT    Integration name (e.g. 'github', 'slack', 'notion')
 --   users_with_integration BIGINT  Distinct users who selected this integration
 --
 -- WINDOW
 --   Users who started onboarding in the last 90 days
 --
 -- EXAMPLE QUERIES
 --   -- Full integration popularity ranking
 --   SELECT * FROM analytics.user_onboarding_integration;
 --
 --   -- Top 5 integrations
 --   SELECT * FROM analytics.user_onboarding_integration LIMIT 5;
 -- =============================================================
 WITH exploded AS (
  SELECT
      u."userId" AS user_id,
      UNNEST(u."integrations") AS integration
  FROM platform."UserOnboarding" u
  WHERE u."createdAt" >= CURRENT_DATE - INTERVAL '90 days'
 )
 SELECT
    integration,
    COUNT(DISTINCT user_id) AS users_with_integration
 FROM exploded
 WHERE integration IS NOT NULL AND integration <> ''
 GROUP BY integration
 ORDER BY users_with_integration DESC
--- a/autogpt_platform/analytics/queries/users_activities.sql
+++ b/autogpt_platform/analytics/queries/users_activities.sql
@@ -0,0 +1,145 @@
 -- =============================================================
 -- View: analytics.users_activities
 -- Looker source alias: ds56  |  Charts: 5
 -- =============================================================
 -- DESCRIPTION
 --   One row per user with lifetime activity summary.
 --   Joins login sessions with agent graphs, executions and
 --   node-level runs to give a full picture of how engaged
 --   each user is.  Includes a convenience flag for 7-day
 --   activation (did the user return at least 7 days after
 --   their first login?).
 --
 -- SOURCE TABLES
 --   auth.sessions                    — Login/session records
 --   platform.AgentGraph              — Graphs (agents) built by the user
 --   platform.AgentGraphExecution     — Agent run history
 --   platform.AgentNodeExecution      — Individual block execution history
 --
 -- PERFORMANCE NOTE
 --   Each CTE aggregates its own table independently by userId.
 --   This avoids the fan-out that occurs when driving every join
 --   from user_logins across the two largest tables
 --   (AgentGraphExecution and AgentNodeExecution).
 --
 -- OUTPUT COLUMNS
 --   user_id                   TEXT         Supabase user UUID
 --   first_login_time          TIMESTAMPTZ  First ever session created_at
 --   last_login_time           TIMESTAMPTZ  Most recent session created_at
 --   last_visit_time           TIMESTAMPTZ  Max of last refresh or login
 --   last_agent_save_time      TIMESTAMPTZ  Last time user saved an agent graph
 --   agent_count               BIGINT       Number of distinct active graphs built (0 if none)
 --   first_agent_run_time      TIMESTAMPTZ  First ever graph execution
 --   last_agent_run_time       TIMESTAMPTZ  Most recent graph execution
 --   unique_agent_runs         BIGINT       Distinct agent graphs ever run (0 if none)
 --   agent_runs                BIGINT       Total graph execution count (0 if none)
 --   node_execution_count      BIGINT       Total node executions across all runs
 --   node_execution_failed     BIGINT       Node executions with FAILED status
 --   node_execution_completed  BIGINT       Node executions with COMPLETED status
 --   node_execution_terminated BIGINT       Node executions with TERMINATED status
 --   node_execution_queued     BIGINT       Node executions with QUEUED status
 --   node_execution_running    BIGINT       Node executions with RUNNING status
 --   is_active_after_7d        INT          1=returned after day 7, 0=did not, NULL=too early to tell
 --   node_execution_incomplete BIGINT       Node executions with INCOMPLETE status
 --   node_execution_review     BIGINT       Node executions with REVIEW status
 --
 -- EXAMPLE QUERIES
 --   -- Users who ran at least one agent and returned after 7 days
 --   SELECT COUNT(*) FROM analytics.users_activities
 --   WHERE agent_runs > 0 AND is_active_after_7d = 1;
 --
 --   -- Top 10 most active users by agent runs
 --   SELECT user_id, agent_runs, node_execution_count
 --   FROM analytics.users_activities
 --   ORDER BY agent_runs DESC LIMIT 10;
 --
 --   -- 7-day activation rate
 --   SELECT
 --     SUM(CASE WHEN is_active_after_7d = 1 THEN 1 ELSE 0 END)::float
 --     / NULLIF(COUNT(CASE WHEN is_active_after_7d IS NOT NULL THEN 1 END), 0)
 --     AS activation_rate
 --   FROM analytics.users_activities;
 -- =============================================================
 WITH user_logins AS (
  SELECT
    user_id::text                                    AS user_id,
    MIN(created_at)                                  AS first_login_time,
    MAX(created_at)                                  AS last_login_time,
    GREATEST(
      MAX(refreshed_at)::timestamptz,
      MAX(created_at)::timestamptz
    )                                                AS last_visit_time
  FROM auth.sessions
  GROUP BY user_id
 ),
 user_agents AS (
  -- Aggregate AgentGraph directly by userId (no fan-out from user_logins)
  SELECT
    "userId"::text                AS user_id,
    MAX("updatedAt")              AS last_agent_save_time,
    COUNT(DISTINCT "id")          AS agent_count
  FROM platform."AgentGraph"
  WHERE "isActive"
  GROUP BY "userId"
 ),
 user_graph_runs AS (
  -- Aggregate AgentGraphExecution directly by userId
  SELECT
    "userId"::text                        AS user_id,
    MIN("createdAt")                      AS first_agent_run_time,
    MAX("createdAt")                      AS last_agent_run_time,
    COUNT(DISTINCT "agentGraphId")        AS unique_agent_runs,
    COUNT("id")                           AS agent_runs
  FROM platform."AgentGraphExecution"
  GROUP BY "userId"
 ),
 user_node_runs AS (
  -- Aggregate AgentNodeExecution directly; resolve userId via a
  -- single join to AgentGraphExecution instead of fanning out from
  -- user_logins through both large tables.
  SELECT
    g."userId"::text                                                   AS user_id,
    COUNT(*)                                                           AS node_execution_count,
    COUNT(*) FILTER (WHERE n."executionStatus" = 'FAILED')             AS node_execution_failed,
    COUNT(*) FILTER (WHERE n."executionStatus" = 'COMPLETED')          AS node_execution_completed,
    COUNT(*) FILTER (WHERE n."executionStatus" = 'TERMINATED')         AS node_execution_terminated,
    COUNT(*) FILTER (WHERE n."executionStatus" = 'QUEUED')             AS node_execution_queued,
    COUNT(*) FILTER (WHERE n."executionStatus" = 'RUNNING')            AS node_execution_running,
    COUNT(*) FILTER (WHERE n."executionStatus" = 'INCOMPLETE')         AS node_execution_incomplete,
    COUNT(*) FILTER (WHERE n."executionStatus" = 'REVIEW')             AS node_execution_review
  FROM platform."AgentNodeExecution" n
  JOIN platform."AgentGraphExecution" g
    ON g."id" = n."agentGraphExecutionId"
  GROUP BY g."userId"
 )
 SELECT
  ul.user_id,
  ul.first_login_time,
  ul.last_login_time,
  ul.last_visit_time,
  ua.last_agent_save_time,
  COALESCE(ua.agent_count, 0)             AS agent_count,
  gr.first_agent_run_time,
  gr.last_agent_run_time,
  COALESCE(gr.unique_agent_runs, 0)       AS unique_agent_runs,
  COALESCE(gr.agent_runs, 0)              AS agent_runs,
  COALESCE(nr.node_execution_count, 0)      AS node_execution_count,
  COALESCE(nr.node_execution_failed, 0)     AS node_execution_failed,
  COALESCE(nr.node_execution_completed, 0)  AS node_execution_completed,
  COALESCE(nr.node_execution_terminated, 0) AS node_execution_terminated,
  COALESCE(nr.node_execution_queued, 0)     AS node_execution_queued,
  COALESCE(nr.node_execution_running, 0)    AS node_execution_running,
  CASE
    WHEN ul.first_login_time < NOW() - INTERVAL '7 days'
     AND ul.last_visit_time  >= ul.first_login_time + INTERVAL '7 days' THEN 1
    WHEN ul.first_login_time < NOW() - INTERVAL '7 days'
     AND ul.last_visit_time  <  ul.first_login_time + INTERVAL '7 days' THEN 0
    ELSE NULL
  END AS is_active_after_7d,
  COALESCE(nr.node_execution_incomplete, 0) AS node_execution_incomplete,
  COALESCE(nr.node_execution_review, 0)     AS node_execution_review
 FROM user_logins ul
 LEFT JOIN user_agents     ua ON ul.user_id = ua.user_id
 LEFT JOIN user_graph_runs gr ON ul.user_id = gr.user_id
 LEFT JOIN user_node_runs  nr ON ul.user_id = nr.user_id
--- a/autogpt_platform/backend/generate_views.py
+++ b/autogpt_platform/backend/generate_views.py
@@ -0,0 +1,246 @@
 #!/usr/bin/env python3
 """
 AutoGPT Analytics — View Generator
 ====================================
 Reads every .sql file in analytics/queries/ and registers it as a
 CREATE OR REPLACE VIEW in the analytics schema.
 Quick start (from autogpt_platform/backend/):
 Step 1 — one-time setup (creates schema, role, grants):
  poetry run analytics-setup
 Step 2 — create / refresh all 14 analytics views:
  poetry run analytics-views
 Both commands auto-detect credentials from .env (DB_* vars).
 Use --db-url to override.
 Step 3 (optional) — enable login and set a password for the read-only
 role so external tools (Supabase MCP, PostHog Data Warehouse) can connect.
 The role is created as NOLOGIN, so you must grant LOGIN at the same time.
 Run in Supabase SQL Editor:
  ALTER ROLE analytics_readonly WITH LOGIN PASSWORD 'your-password';
 Usage
 -----
  poetry run analytics-setup              # apply setup to DB
  poetry run analytics-setup --dry-run   # print setup SQL only
  poetry run analytics-views             # apply all views to DB
  poetry run analytics-views --dry-run   # print all view SQL only
  poetry run analytics-views --only graph_execution,retention_login_weekly
 Environment variables
 ---------------------
  DATABASE_URL   Postgres connection string (checked before .env)
 Notes
 -----
 - .env DB_* vars are read automatically as a fallback.
 - Safe to re-run: uses CREATE OR REPLACE VIEW.
 - Looker, PostHog Data Warehouse, and Supabase MCP all read from the
  same analytics.* views — no raw tables exposed.
 """
 import argparse
 import os
 import sys
 from pathlib import Path
 from urllib.parse import quote
 QUERIES_DIR = Path(__file__).parent.parent / "analytics" / "queries"
 ENV_FILE = Path(__file__).parent / ".env"
 SCHEMA = "analytics"
 SETUP_SQL = """\
 -- =============================================================
 -- AutoGPT Analytics Schema Setup
 -- Run ONCE as the postgres superuser (e.g. via Supabase SQL Editor).
 -- After this, run: poetry run analytics-views
 -- =============================================================
 -- 1. Create the analytics schema
 CREATE SCHEMA IF NOT EXISTS analytics;
 -- 2. Create the read-only role (skip if already exists)
 DO $$
 BEGIN
  IF NOT EXISTS (SELECT FROM pg_roles WHERE rolname = 'analytics_readonly') THEN
    CREATE ROLE analytics_readonly NOLOGIN;
  END IF;
 END
 $$;
 -- 3. Analytics schema grants only.
 --    Views use security_invoker = false so they execute as their
 --    owner (postgres). analytics_readonly never needs direct access
 --    to the platform or auth schemas.
 GRANT USAGE ON SCHEMA analytics TO analytics_readonly;
 GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO analytics_readonly;
 ALTER DEFAULT PRIVILEGES IN SCHEMA analytics
  GRANT SELECT ON TABLES TO analytics_readonly;
 """
 def load_db_url_from_env() -> str | None:
    """Read DB_* vars from .env and build a psycopg2 connection string."""
    if not ENV_FILE.exists():
        return None
    env: dict[str, str] = {}
    for line in ENV_FILE.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "5432")
    user = env.get("DB_USER", "postgres")
    password = env.get("DB_PASS", "")
    dbname = env.get("DB_NAME", "postgres")
    if not password:
        return None
    return (
        "postgresql://"
        f"{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(dbname, safe='')}"
    )
 def get_db_url(args: argparse.Namespace) -> str | None:
    return args.db_url or os.environ.get("DATABASE_URL") or load_db_url_from_env()
 def connect(db_url: str):
    try:
        import psycopg2
    except ImportError:
        print("psycopg2 not found. Run: poetry install", file=sys.stderr)
        sys.exit(1)
    return psycopg2.connect(db_url)
 def run_sql(db_url: str, statements: list[tuple[str, str]]) -> None:
    """Execute a list of (label, sql) pairs in a single transaction."""
    conn = connect(db_url)
    conn.autocommit = False
    cur = conn.cursor()
    try:
        for label, sql in statements:
            print(f"  {label} ...", end=" ")
            cur.execute(sql)
            print("OK")
        conn.commit()
        print(f"\n✓ {len(statements)} statement(s) applied.")
    except Exception as e:
        conn.rollback()
        print(f"\n✗ Error: {e}", file=sys.stderr)
        sys.exit(1)
    finally:
        cur.close()
        conn.close()
 def build_view_sql(name: str, query_body: str) -> str:
    body = query_body.strip().rstrip(";")
    # security_invoker = false → view runs as its owner (postgres), not the
    # caller, so analytics_readonly only needs analytics schema access.
    return f"CREATE OR REPLACE VIEW {SCHEMA}.{name} WITH (security_invoker = false) AS\n{body};\n"
 def load_views(only: list[str] | None = None) -> list[tuple[str, str]]:
    """Return [(label, sql)] for all views, in alphabetical order."""
    files = sorted(QUERIES_DIR.glob("*.sql"))
    if not files:
        print(f"No .sql files found in {QUERIES_DIR}", file=sys.stderr)
        sys.exit(1)
    known = {f.stem for f in files}
    if only:
        unknown = [n for n in only if n not in known]
        if unknown:
            print(
                f"Unknown view name(s): {', '.join(unknown)}\n"
                f"Available: {', '.join(sorted(known))}",
                file=sys.stderr,
            )
            sys.exit(1)
    result = []
    for f in files:
        name = f.stem
        if only and name not in only:
            continue
        result.append((f"view analytics.{name}", build_view_sql(name, f.read_text())))
    return result
 def no_db_url_error() -> None:
    print(
        "No database URL found.\n"
        "Tried: --db-url, DATABASE_URL env var, and .env (DB_* vars).\n"
        "Use --dry-run to just print the SQL.",
        file=sys.stderr,
    )
    sys.exit(1)
 def cmd_setup(args: argparse.Namespace) -> None:
    if args.dry_run:
        print(SETUP_SQL)
        return
    db_url = get_db_url(args)
    if not db_url:
        no_db_url_error()
    assert db_url
    print("Applying analytics setup...")
    run_sql(db_url, [("schema / role / grants", SETUP_SQL)])
 def cmd_views(args: argparse.Namespace) -> None:
    only = [v.strip() for v in args.only.split(",")] if args.only else None
    views = load_views(only=only)
    if not views:
        print("No matching views found.")
        sys.exit(0)
    if args.dry_run:
        print(f"-- {len(views)} views\n")
        for label, sql in views:
            print(f"-- {label}")
            print(sql)
        return
    db_url = get_db_url(args)
    if not db_url:
        no_db_url_error()
    assert db_url
    print(f"Applying {len(views)} view(s)...")
    # Append grant refresh so the readonly role sees any new views
    grant = f"GRANT SELECT ON ALL TABLES IN SCHEMA {SCHEMA} TO analytics_readonly;"
    run_sql(db_url, views + [("grant analytics_readonly", grant)])
 def main_setup() -> None:
    parser = argparse.ArgumentParser(description="Apply analytics schema setup to DB")
    parser.add_argument(
        "--dry-run", action="store_true", help="Print SQL, don't execute"
    )
    parser.add_argument("--db-url", help="Postgres connection string")
    cmd_setup(parser.parse_args())
 def main_views() -> None:
    parser = argparse.ArgumentParser(description="Apply analytics views to DB")
    parser.add_argument(
        "--dry-run", action="store_true", help="Print SQL, don't execute"
    )
    parser.add_argument("--db-url", help="Postgres connection string")
    parser.add_argument("--only", help="Comma-separated view names to update")
    cmd_views(parser.parse_args())
 if __name__ == "__main__":
    # Default: apply views (backwards-compatible with direct python invocation)
    main_views()
--- a/autogpt_platform/backend/pyproject.toml
+++ b/autogpt_platform/backend/pyproject.toml
@@ -120,6 +120,8 @@ ws = "backend.ws:main"
 scheduler = "backend.scheduler:main"
 notification = "backend.notification:main"
 executor = "backend.exec:main"
 analytics-setup = "generate_views:main_setup"
 analytics-views = "generate_views:main_views"
 copilot-executor = "backend.copilot.executor.__main__:main"
 cli = "backend.cli:main"
 format = "linter:format"
Author	SHA1	Message	Date
Otto (AGPT)	e9afd9fa01	fix: cast OnboardingStep enum to text in funnel view The completedSteps column is a platform."OnboardingStep" enum array. UNNEST produces enum values that can't be compared directly to text from the VALUES clause. Adding ::text cast fixes the type mismatch.	2026-03-13 11:55:37 +00:00
Zamil Majdy	ddb4f6e9de	fix(analytics): address second batch of PR review comments - user_onboarding_funnel: build complete 22-step grid with VALUES CTE so zero-completion steps are always present, fixing LAG comparisons against wrong predecessors; update docs to reflect all 22 steps - users_activities: use COUNT(DISTINCT "id") for agent_count to avoid counting multiple version rows per graph; add COALESCE(..., 0) for agent_count, unique_agent_runs, agent_runs; update docs column list to include node_execution_incomplete and node_execution_review - generate_views: update Step 3 comment to clarify NOLOGIN role needs WITH LOGIN PASSWORD not just WITH PASSWORD; add fail-fast validation for unknown --only view names with helpful error message	2026-03-12 00:47:55 +07:00
Zamil Majdy	f585d97928	fix(analytics): move new status columns to end of users_activities SELECT CREATE OR REPLACE VIEW requires existing columns to stay in position. Moving node_execution_incomplete and node_execution_review after is_active_after_7d so the replacement doesn't shift existing columns.	2026-03-12 00:01:40 +07:00
Zamil Majdy	7d39234fdd	fix(analytics): address PR review comments - user_block_spending: use ->> instead of -> for JSONB field extraction before casting to int (avoids runtime cast errors) - generate_views: create analytics_readonly as NOLOGIN to avoid a usable role with a known default password - generate_views: percent-encode DB credentials in the URI builder so passwords with reserved chars (@, :, /) connect correctly - graph_execution: remove WHERE filter on sensitive_action_safe_mode before DISTINCT ON so the latest LibraryAgent version always wins (fixes possibly_ai being sticky once any version had the flag set) - retention_agent: use DISTINCT ON ordered by version DESC instead of MAX(name) so renamed agents resolve to their latest name - retention_login_daily: add 90-day cohort_start filter to first_login CTE so the view matches its documented window - user_onboarding_funnel: map the 8 missing OnboardingStep enum values (VISIT_COPILOT, RE_RUN_AGENT, SCHEDULE_AGENT, RUN_AGENTS, RUN_3_DAYS, TRIGGER_WEBHOOK, RUN_14_DAYS, RUN_AGENTS_100) to step_order 15-22 - users_activities: use updatedAt instead of createdAt for last_agent_save_time; add node_execution_incomplete and node_execution_review status columns	2026-03-11 23:48:42 +07:00
Zamil Majdy	6e9d4c4333	perf(analytics): fix fan-out in users_activities view The original CTEs drove all joins from user_logins, causing a O(users × executions × node_executions) fan-out that made the view too heavy for Supabase to serve. Rewrote each CTE to aggregate its own source table directly by userId, then LEFT JOIN the aggregates in the final SELECT.	2026-03-11 23:39:14 +07:00
Zamil Majdy	8aad333a45	refactor(analytics): move generate_views.py to backend, add poetry run analytics-setup/analytics-views scripts	2026-03-11 16:23:29 +07:00
Zamil Majdy	856f0d980d	fix(analytics): restrict analytics_readonly to analytics schema only via security_invoker=false views	2026-03-11 16:16:03 +07:00
Zamil Majdy	3c3aadd361	docs(analytics): add step-by-step quick start to generate_views.py docstring	2026-03-11 16:12:22 +07:00
Zamil Majdy	e87a693fdd	feat(analytics): auto-load DB creds from backend/.env as fallback	2026-03-11 16:10:31 +07:00
Zamil Majdy	fe265c10d4	refactor(analytics): generate setup.sql via --setup flag, gitignore it	2026-03-11 16:01:52 +07:00
Zamil Majdy	5d00a94693	chore(analytics): remove auto-generated files, gitignore views.sql	2026-03-11 16:00:48 +07:00
Zamil Majdy	6e1605994d	feat(analytics): add documented SQL views with generation script Introduces an analytics/ layer that wraps production Postgres data in safe, read-only views exposed under the analytics schema. - 14 documented query files in queries/ (one per Looker data source) covering auth activities, user activity, execution metrics, onboarding funnel, and cohort retention (login + execution, weekly + daily) - setup.sql — one-time schema creation and role/grant setup for the analytics_readonly role (auth, platform, analytics schemas) - generate_views.py — reads queries/.sql and applies CREATE OR REPLACE VIEW analytics.<name> to the database; supports --dry-run, --only, and --db-url flags - views.sql — pre-generated combined reference output - README.md — full setup, deployment, and integration guide Looker, PostHog Data Warehouse, and Supabase MCP (for Otto) all connect to the same analytics. views instead of raw tables.	2026-03-11 15:36:27 +07:00