feat(blocks): Add video editing blocks (#11796)

This PR adds general-purpose video editing blocks for the AutoGPT Platform, enabling automated video production workflows like documentary creation, marketing videos, tutorial assembly, and content repurposing.

### Changes 🏗️

**New blocks added in `backend/blocks/video/`:**
- `VideoDownloadBlock` - Download videos from URLs (YouTube, Vimeo, news sites, direct links) using yt-dlp
- `VideoClipBlock` - Extract time segments from videos with start/end time validation
- `VideoConcatBlock` - Merge multiple video clips with optional transitions (none, crossfade, fade_black)
- `VideoTextOverlayBlock` - Add text overlays/captions with positioning and timing options
- `VideoNarrationBlock` - Generate AI narration via ElevenLabs and mix with video audio (replace, mix, or ducking modes)

**Dependencies required:**
- `yt-dlp` - For video downloading
- `moviepy` - For video editing operations

**Implementation details:**
- All blocks follow the SDK pattern with proper error handling and exception chaining
- Proper resource cleanup in `finally` blocks to prevent memory leaks
- Input validation (e.g., end_time > start_time)
- Test mocks included for CI

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Blocks follow the SDK pattern with `BlockSchemaInput`/`BlockSchemaOutput`
  - [x] Resource cleanup is implemented in `finally` blocks
  - [x] Exception chaining is properly implemented
  - [x] Input validation is in place
  - [x] Test mocks are provided for CI environments

#### For configuration changes:
- [ ] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my changes
- [ ] I have included a list of my configuration changes in the PR description (under **Changes**)

N/A - No configuration changes required.

---

> [!NOTE]
> **Medium Risk**
> Adds new multimedia blocks that invoke ffmpeg/MoviePy and introduces new external dependencies (plus container packages), which can impact runtime stability and resource usage; download/overlay blocks are present but disabled due to sandbox/policy concerns.
>
> **Overview**
> Adds a new `backend.blocks.video` module with general-purpose video workflow blocks (download, clip, concat w/ transitions, loop, add-audio, text overlay, and ElevenLabs-powered narration), including shared utilities for codec selection, filename cleanup, and an ffmpeg-based chapter-strip workaround for MoviePy.
>
> Extends credentials/config to support ElevenLabs (`ELEVENLABS_API_KEY`, provider enum, system credentials, and cost config) and adds new dependencies (`elevenlabs`, `yt-dlp`) plus Docker runtime packages (`ffmpeg`, `imagemagick`).
>
> Improves file/reference handling end-to-end by embedding MIME types in `workspace://...#mime` outputs and updating frontend rendering to detect video vs image from MIME fragments (and broaden supported audio/video extensions), with optional enhanced output rendering behind a feature flag in the legacy builder UI.
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit da7a44d794. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <ntindle@users.noreply.github.com>
Co-authored-by: Otto <otto@agpt.co>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-30 03:00:41 -04:00 · 2026-02-05 16:22:33 -06:00
parent bfa942e032
commit 85b6520710
37 changed files with 2288 additions and 337 deletions
--- a/docs/integrations/README.md
+++ b/docs/integrations/README.md
@@ -233,6 +233,7 @@ Below is a comprehensive list of all available blocks, categorized by their prim
 | [Stagehand Extract](block-integrations/stagehand/blocks.md#stagehand-extract) | Extract structured data from a webpage |
 | [Stagehand Observe](block-integrations/stagehand/blocks.md#stagehand-observe) | Find suggested actions for your workflows |
 | [Unreal Text To Speech](block-integrations/llm.md#unreal-text-to-speech) | Converts text to speech using the Unreal Speech API |
+| [Video Narration](block-integrations/video/narration.md#video-narration) | Generate AI narration and add to video |

 ## Search and Information Retrieval

@@ -472,9 +473,13 @@ Below is a comprehensive list of all available blocks, categorized by their prim

 | Block Name | Description |
 |------------|-------------|
-| [Add Audio To Video](block-integrations/multimedia.md#add-audio-to-video) | Block to attach an audio file to a video file using moviepy |
-| [Loop Video](block-integrations/multimedia.md#loop-video) | Block to loop a video to a given duration or number of repeats |
-| [Media Duration](block-integrations/multimedia.md#media-duration) | Block to get the duration of a media file |
+| [Add Audio To Video](block-integrations/video/add_audio.md#add-audio-to-video) | Block to attach an audio file to a video file using moviepy |
+| [Loop Video](block-integrations/video/loop.md#loop-video) | Block to loop a video to a given duration or number of repeats |
+| [Media Duration](block-integrations/video/duration.md#media-duration) | Block to get the duration of a media file |
+| [Video Clip](block-integrations/video/clip.md#video-clip) | Extract a time segment from a video |
+| [Video Concat](block-integrations/video/concat.md#video-concat) | Merge multiple video clips into one continuous video |
+| [Video Download](block-integrations/video/download.md#video-download) | Download video from URL (YouTube, Vimeo, news sites, direct links) |
+| [Video Text Overlay](block-integrations/video/text_overlay.md#video-text-overlay) | Add text overlay/caption to video |

 ## Productivity

--- a/docs/integrations/SUMMARY.md
+++ b/docs/integrations/SUMMARY.md
@@ -85,7 +85,6 @@
 * [LLM](block-integrations/llm.md)
 * [Logic](block-integrations/logic.md)
 * [Misc](block-integrations/misc.md)
-* [Multimedia](block-integrations/multimedia.md)
 * [Notion Create Page](block-integrations/notion/create_page.md)
 * [Notion Read Database](block-integrations/notion/read_database.md)
 * [Notion Read Page](block-integrations/notion/read_page.md)
@@ -129,5 +128,13 @@
 * [Twitter Timeline](block-integrations/twitter/timeline.md)
 * [Twitter Tweet Lookup](block-integrations/twitter/tweet_lookup.md)
 * [Twitter User Lookup](block-integrations/twitter/user_lookup.md)
+* [Video Add Audio](block-integrations/video/add_audio.md)
+* [Video Clip](block-integrations/video/clip.md)
+* [Video Concat](block-integrations/video/concat.md)
+* [Video Download](block-integrations/video/download.md)
+* [Video Duration](block-integrations/video/duration.md)
+* [Video Loop](block-integrations/video/loop.md)
+* [Video Narration](block-integrations/video/narration.md)
+* [Video Text Overlay](block-integrations/video/text_overlay.md)
 * [Wolfram LLM API](block-integrations/wolfram/llm_api.md)
 * [Zerobounce Validate Emails](block-integrations/zerobounce/validate_emails.md)
--- a/docs/integrations/block-integrations/video/add_audio.md
+++ b/docs/integrations/block-integrations/video/add_audio.md
@@ -0,0 +1,39 @@
+# Video Add Audio
+<!-- MANUAL: file_description -->
+This block allows you to attach a separate audio track to a video file, replacing or combining with the original audio.
+<!-- END MANUAL -->
+
+## Add Audio To Video
+
+### What it is
+Block to attach an audio file to a video file using moviepy.
+
+### How it works
+<!-- MANUAL: how_it_works -->
+The block uses MoviePy to combine video and audio files. It loads the video and audio inputs (which can be URLs, data URIs, or local paths), optionally scales the audio volume, then writes the combined result to a new video file using H.264 video codec and AAC audio codec.
+<!-- END MANUAL -->
+
+### Inputs
+
+| Input | Description | Type | Required |
+|-------|-------------|------|----------|
+| video_in | Video input (URL, data URI, or local path). | str (file) | Yes |
+| audio_in | Audio input (URL, data URI, or local path). | str (file) | Yes |
+| volume | Volume scale for the newly attached audio track (1.0 = original). | float | No |
+
+### Outputs
+
+| Output | Description | Type |
+|--------|-------------|------|
+| error | Error message if the operation failed | str |
+| video_out | Final video (with attached audio), as a path or data URI. | str (file) |
+
+### Possible use case
+<!-- MANUAL: use_case -->
+- Adding background music to a silent screen recording
+- Replacing original audio with a voiceover or translated audio track
+- Combining AI-generated speech with stock footage
+- Adding sound effects to video content
+<!-- END MANUAL -->
+
+---
--- a/docs/integrations/block-integrations/video/clip.md
+++ b/docs/integrations/block-integrations/video/clip.md
@@ -0,0 +1,41 @@
+# Video Clip
+<!-- MANUAL: file_description -->
+This block extracts a specific time segment from a video file, allowing you to trim videos to precise start and end times.
+<!-- END MANUAL -->
+
+## Video Clip
+
+### What it is
+Extract a time segment from a video
+
+### How it works
+<!-- MANUAL: how_it_works -->
+The block uses MoviePy's `subclipped` function to extract a portion of the video between specified start and end times. It validates that end time is greater than start time, then creates a new video file containing only the selected segment. The output is encoded with H.264 video codec and AAC audio codec, preserving both video and audio from the original clip.
+<!-- END MANUAL -->
+
+### Inputs
+
+| Input | Description | Type | Required |
+|-------|-------------|------|----------|
+| video_in | Input video (URL, data URI, or local path) | str (file) | Yes |
+| start_time | Start time in seconds | float | Yes |
+| end_time | End time in seconds | float | Yes |
+| output_format | Output format | "mp4" \| "webm" \| "mkv" \| "mov" | No |
+
+### Outputs
+
+| Output | Description | Type |
+|--------|-------------|------|
+| error | Error message if the operation failed | str |
+| video_out | Clipped video file (path or data URI) | str (file) |
+| duration | Clip duration in seconds | float |
+
+### Possible use case
+<!-- MANUAL: use_case -->
+- Extracting highlights from a longer video
+- Trimming intro/outro from recorded content
+- Creating short clips for social media from longer videos
+- Isolating specific segments for further processing in a workflow
+<!-- END MANUAL -->
+
+---
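The time-window check described in `clip.md` (end time must be greater than start time) can be illustrated with a small helper. This is a minimal sketch and not the block's actual code: the function name `clip_duration` is hypothetical, and rejecting a negative `start_time` is an extra assumption that the documentation does not state.

```python
def clip_duration(start_time: float, end_time: float) -> float:
    """Validate a clip window and return its length in seconds."""
    if start_time < 0:
        # Assumption: negative offsets are rejected (not stated in clip.md).
        raise ValueError(f"start_time must be >= 0, got {start_time}")
    if end_time <= start_time:
        # Documented rule: end time must be greater than start time.
        raise ValueError(
            f"end_time ({end_time}) must be greater than start_time ({start_time})"
        )
    return end_time - start_time


print(clip_duration(12.5, 42.5))  # 30.0
```

Running the check before any decoding starts means an invalid window fails fast, without opening the video file.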
--- a/docs/integrations/block-integrations/video/concat.md
+++ b/docs/integrations/block-integrations/video/concat.md
@@ -0,0 +1,41 @@
+# Video Concat
+<!-- MANUAL: file_description -->
+This block merges multiple video clips into a single continuous video, with optional transitions between clips.
+<!-- END MANUAL -->
+
+## Video Concat
+
+### What it is
+Merge multiple video clips into one continuous video
+
+### How it works
+<!-- MANUAL: how_it_works -->
+The block uses MoviePy's `concatenate_videoclips` function to join multiple videos in sequence. It supports three transition modes: **none** (direct concatenation), **crossfade** (smooth blending where clips overlap), and **fade_black** (each clip fades out to black and the next fades in). At least 2 videos are required. The output is encoded with H.264 video codec and AAC audio codec.
+<!-- END MANUAL -->
+
+### Inputs
+
+| Input | Description | Type | Required |
+|-------|-------------|------|----------|
+| videos | List of video files to concatenate (in order) | List[str (file)] | Yes |
+| transition | Transition between clips | "none" \| "crossfade" \| "fade_black" | No |
+| transition_duration | Transition duration in seconds | int | No |
+| output_format | Output format | "mp4" \| "webm" \| "mkv" \| "mov" | No |
+
+### Outputs
+
+| Output | Description | Type |
+|--------|-------------|------|
+| error | Error message if the operation failed | str |
+| video_out | Concatenated video file (path or data URI) | str (file) |
+| total_duration | Total duration in seconds | float |
+
+### Possible use case
+<!-- MANUAL: use_case -->
+- Combining multiple clips into a compilation video
+- Assembling intro, main content, and outro segments
+- Creating montages from multiple source videos
+- Building video playlists or slideshows with transitions
+<!-- END MANUAL -->
+
+---
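Because `concat.md` describes crossfade as clips overlapping, the combined length is plausibly shorter than the sum of the inputs in that mode. The arithmetic below is a sketch under that assumption only: `estimate_total_duration` is a hypothetical helper, and whether the block's `total_duration` output follows exactly this formula is not confirmed by the documentation.

```python
from typing import Sequence

VALID_TRANSITIONS = {"none", "crossfade", "fade_black"}


def estimate_total_duration(
    durations: Sequence[float], transition: str, transition_duration: int
) -> float:
    """Estimate the length of the concatenated video in seconds."""
    if len(durations) < 2:
        # Documented rule: at least 2 videos are required.
        raise ValueError("At least 2 videos are required")
    if transition not in VALID_TRANSITIONS:
        raise ValueError(f"Unknown transition: {transition!r}")

    total = float(sum(durations))
    if transition == "crossfade":
        # Assumption: each of the (n - 1) joins overlaps by transition_duration.
        total -= transition_duration * (len(durations) - 1)
    # "none" and "fade_black" are assumed to play clips back to back.
    return total


print(estimate_total_duration([10.0, 20.0, 5.0], "crossfade", 1))  # 33.0
```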
--- a/docs/integrations/block-integrations/video/download.md
+++ b/docs/integrations/block-integrations/video/download.md
@@ -0,0 +1,42 @@
+# Video Download
+<!-- MANUAL: file_description -->
+This block downloads videos from URLs, supporting a wide range of video platforms and direct links.
+<!-- END MANUAL -->
+
+## Video Download
+
+### What it is
+Download video from URL (YouTube, Vimeo, news sites, direct links)
+
+### How it works
+<!-- MANUAL: how_it_works -->
+The block uses yt-dlp, a powerful video downloading library that supports over 1000 websites. It accepts a URL, quality preference, and output format, then downloads the video while merging the best available video and audio streams for the selected quality. Quality options: **best** (highest available), **1080p/720p/480p** (maximum resolution at that height), **audio_only** (extracts just the audio track).
+<!-- END MANUAL -->
+
+### Inputs
+
+| Input | Description | Type | Required |
+|-------|-------------|------|----------|
+| url | URL of the video to download (YouTube, Vimeo, direct link, etc.) | str | Yes |
+| quality | Video quality preference | "best" \| "1080p" \| "720p" \| "480p" \| "audio_only" | No |
+| output_format | Output video format | "mp4" \| "webm" \| "mkv" | No |
+
+### Outputs
+
+| Output | Description | Type |
+|--------|-------------|------|
+| error | Error message if the operation failed | str |
+| video_file | Downloaded video (path or data URI) | str (file) |
+| duration | Video duration in seconds | float |
+| title | Video title from source | str |
+| source_url | Original source URL | str |
+
+### Possible use case
+<!-- MANUAL: use_case -->
+- Downloading source videos for editing or remixing
+- Archiving video content for offline processing
+- Extracting audio from videos for transcription or podcast creation
+- Gathering video content for automated content pipelines
+<!-- END MANUAL -->
+
+---
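The quality options listed in `download.md` map naturally onto yt-dlp format selector strings. The sketch below shows one way such a mapping could look; it is not taken from the block's source. The helper `build_ydl_options`, the output template, and the `noplaylist` choice are assumptions made for illustration, while `format`, `outtmpl`, `noplaylist`, and `merge_output_format` are existing yt-dlp options.

```python
from typing import Any, Dict


def _height_selector(height: int) -> str:
    # Best video at or below the height, merged with best audio,
    # falling back to a single pre-merged stream under the same cap.
    return f"bestvideo[height<={height}]+bestaudio/best[height<={height}]"


QUALITY_TO_FORMAT = {
    "best": "bestvideo+bestaudio/best",
    "1080p": _height_selector(1080),
    "720p": _height_selector(720),
    "480p": _height_selector(480),
    "audio_only": "bestaudio/best",
}


def build_ydl_options(quality: str, output_format: str, output_dir: str) -> Dict[str, Any]:
    """Build an options dict suitable for yt_dlp.YoutubeDL (hypothetical helper)."""
    if quality not in QUALITY_TO_FORMAT:
        raise ValueError(f"Unsupported quality: {quality!r}")
    options: Dict[str, Any] = {
        "format": QUALITY_TO_FORMAT[quality],
        "outtmpl": f"{output_dir}/%(id)s.%(ext)s",  # assumed naming scheme
        "noplaylist": True,  # assumption: download a single video only
    }
    if quality != "audio_only":
        # Container used when separate video and audio streams are merged.
        options["merge_output_format"] = output_format
    return options


print(build_ydl_options("720p", "mp4", "/tmp/videos")["format"])
# bestvideo[height<=720]+bestaudio/best[height<=720]
```

A dict like this would typically be passed to `yt_dlp.YoutubeDL(options)` and followed by `extract_info(url, download=True)`; that step is left out here so the sketch runs without network access or the `yt-dlp` package.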
--- a/docs/integrations/block-integrations/video/duration.md
+++ b/docs/integrations/block-integrations/video/duration.md
@@ -0,0 +1,38 @@
+# Video Duration
+<!-- MANUAL: file_description -->
+This block retrieves the duration of video or audio files, useful for planning and conditional logic in media workflows.
+<!-- END MANUAL -->
+
+## Media Duration
+
+### What it is
+Block to get the duration of a media file.
+
+### How it works
+<!-- MANUAL: how_it_works -->
+The block uses MoviePy to load the media file and extract its duration property. It supports both video files (using VideoFileClip) and audio files (using AudioFileClip), determined by the `is_video` flag. The media can be provided as a URL, data URI, or local file path. The duration is returned in seconds as a floating-point number.
+<!-- END MANUAL -->
+
+### Inputs
+
+| Input | Description | Type | Required |
+|-------|-------------|------|----------|
+| media_in | Media input (URL, data URI, or local path). | str (file) | Yes |
+| is_video | Whether the media is a video (True) or audio (False). | bool | No |
+
+### Outputs
+
+| Output | Description | Type |
+|--------|-------------|------|
+| error | Error message if the operation failed | str |
+| duration | Duration of the media file (in seconds). | float |
+
+### Possible use case
+<!-- MANUAL: use_case -->
+- Checking video length before processing to avoid timeout issues
+- Calculating how many times to loop a video to reach a target duration
+- Validating that uploaded content meets length requirements
+- Building conditional workflows based on media duration
+<!-- END MANUAL -->
+
+---
--- a/docs/integrations/block-integrations/video/loop.md
+++ b/docs/integrations/block-integrations/video/loop.md
@@ -0,0 +1,39 @@
+# Video Loop
+<!-- MANUAL: file_description -->
+This block repeats a video to extend its duration, either to a specific length or a set number of repetitions.
+<!-- END MANUAL -->
+
+## Loop Video
+
+### What it is
+Block to loop a video to a given duration or number of repeats.
+
+### How it works
+<!-- MANUAL: how_it_works -->
+The block uses MoviePy's Loop effect to repeat a video clip. You can specify either a target duration (the video will repeat until reaching that length) or a number of loops (the video will repeat that many times). The Loop effect handles both video and audio looping automatically, maintaining sync. Either `duration` or `n_loops` must be provided. The output is encoded with H.264 video codec and AAC audio codec.
+<!-- END MANUAL -->
+
+### Inputs
+
+| Input | Description | Type | Required |
+|-------|-------------|------|----------|
+| video_in | The input video (can be a URL, data URI, or local path). | str (file) | Yes |
+| duration | Target duration (in seconds) to loop the video to. Either duration or n_loops must be provided. | float | No |
+| n_loops | Number of times to repeat the video. Either n_loops or duration must be provided. | int | No |
+
+### Outputs
+
+| Output | Description | Type |
+|--------|-------------|------|
+| error | Error message if the operation failed | str |
+| video_out | Looped video returned either as a relative path or a data URI. | str (file) |
+
+### Possible use case
+<!-- MANUAL: use_case -->
+- Extending a short background video to match the length of narration audio
+- Creating seamless looping content for digital signage
+- Repeating a product demo video multiple times for emphasis
+- Extending short clips to meet minimum duration requirements for platforms
+<!-- END MANUAL -->
+
+---
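The "either `duration` or `n_loops`" rule in `loop.md` can be sketched as a small planning function that also answers the question raised in `duration.md`: how many plays are needed to reach a target length. This is illustrative only. `plan_loop` is a hypothetical name, and letting `duration` take precedence when both inputs are set is an assumption, not documented behavior.

```python
import math
from typing import Optional, Tuple


def plan_loop(
    clip_duration: float,
    duration: Optional[float] = None,
    n_loops: Optional[int] = None,
) -> Tuple[float, int]:
    """Return (final_duration_seconds, plays_needed) for a looped clip."""
    if clip_duration <= 0:
        raise ValueError("clip_duration must be positive")
    if duration is not None:
        # Assumption: duration wins if both inputs are provided.
        # The last play may be cut short to hit the target exactly.
        return duration, math.ceil(duration / clip_duration)
    if n_loops is not None:
        return clip_duration * n_loops, n_loops
    # Documented rule: one of the two inputs must be provided.
    raise ValueError("Either 'duration' or 'n_loops' must be provided")


print(plan_loop(8.0, duration=30.0))  # (30.0, 4)
print(plan_loop(8.0, n_loops=3))      # (24.0, 3)
```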
--- a/docs/integrations/block-integrations/video/narration.md
+++ b/docs/integrations/block-integrations/video/narration.md
@@ -0,0 +1,44 @@
+# Video Narration
+<!-- MANUAL: file_description -->
+This block generates AI voiceover narration using ElevenLabs and adds it to a video, with flexible audio mixing options.
+<!-- END MANUAL -->
+
+## Video Narration
+
+### What it is
+Generate AI narration and add to video
+
+### How it works
+<!-- MANUAL: how_it_works -->
+The block uses ElevenLabs text-to-speech API to generate natural-sounding narration from your script. It then combines the narration with the video using MoviePy. Three audio mixing modes are available: **replace** (completely replaces original audio), **mix** (blends narration with original audio at configurable volumes), and **ducking** (similar to mix but applies stronger attenuation to original audio, making narration more prominent). The block outputs both the final video and the generated audio file separately.
+<!-- END MANUAL -->
+
+### Inputs
+
+| Input | Description | Type | Required |
+|-------|-------------|------|----------|
+| video_in | Input video (URL, data URI, or local path) | str (file) | Yes |
+| script | Narration script text | str | Yes |
+| voice_id | ElevenLabs voice ID | str | No |
+| model_id | ElevenLabs TTS model | "eleven_multilingual_v2" \| "eleven_flash_v2_5" \| "eleven_turbo_v2_5" \| "eleven_turbo_v2" | No |
+| mix_mode | How to combine with original audio. 'ducking' applies stronger attenuation than 'mix'. | "replace" \| "mix" \| "ducking" | No |
+| narration_volume | Narration volume (0.0 to 2.0) | float | No |
+| original_volume | Original audio volume when mixing (0.0 to 1.0) | float | No |
+
+### Outputs
+
+| Output | Description | Type |
+|--------|-------------|------|
+| error | Error message if the operation failed | str |
+| video_out | Video with narration (path or data URI) | str (file) |
+| audio_file | Generated audio file (path or data URI) | str (file) |
+
+### Possible use case
+<!-- MANUAL: use_case -->
+- Adding professional voiceover to product demos or tutorials
+- Creating narrated explainer videos from screen recordings
+- Generating multi-language versions of video content
+- Adding commentary to gameplay or walkthrough videos
+<!-- END MANUAL -->
+
+---
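The three mixing modes in `narration.md` differ only in how much of the original audio survives. The sketch below expresses that as a pair of gain factors. It is not the block's implementation: `resolve_gains` is a hypothetical helper, the `DUCKING_FACTOR` value is invented purely for illustration (the documentation only says ducking attenuates more strongly than mix), and clamping out-of-range volumes rather than rejecting them is also an assumption.

```python
from typing import Tuple

# Hypothetical extra attenuation for "ducking"; the real value is not documented.
DUCKING_FACTOR = 0.25


def _clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def resolve_gains(
    mix_mode: str, narration_volume: float, original_volume: float
) -> Tuple[float, float]:
    """Return (narration_gain, original_gain) for the chosen mix mode."""
    # Documented ranges: narration 0.0 to 2.0, original 0.0 to 1.0.
    narration_gain = _clamp(narration_volume, 0.0, 2.0)
    original_gain = _clamp(original_volume, 0.0, 1.0)

    if mix_mode == "replace":
        return narration_gain, 0.0  # original audio is dropped entirely
    if mix_mode == "mix":
        return narration_gain, original_gain
    if mix_mode == "ducking":
        # Same as mix, but the original track is pushed further down.
        return narration_gain, original_gain * DUCKING_FACTOR
    raise ValueError(f"Unknown mix_mode: {mix_mode!r}")


print(resolve_gains("ducking", 1.0, 0.5))  # (1.0, 0.125)
```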
--- a/docs/integrations/block-integrations/video/text_overlay.md
+++ b/docs/integrations/block-integrations/video/text_overlay.md
@@ -0,0 +1,44 @@
+# Video Text Overlay
+<!-- MANUAL: file_description -->
+This block adds customizable text captions or titles to videos, with control over positioning, timing, and styling.
+<!-- END MANUAL -->
+
+## Video Text Overlay
+
+### What it is
+Add text overlay/caption to video
+
+### How it works
+<!-- MANUAL: how_it_works -->
+The block uses MoviePy's TextClip and CompositeVideoClip to render text onto video frames. The text is created as a separate clip with configurable font size, color, and optional background color, then composited over the video at the specified position. Timing can be controlled to show text only during specific portions of the video. Position options include center alignments (top, center, bottom) and corner positions (top-left, top-right, bottom-left, bottom-right). The output is encoded with H.264 video codec and AAC audio codec.
+<!-- END MANUAL -->
+
+### Inputs
+
+| Input | Description | Type | Required |
+|-------|-------------|------|----------|
+| video_in | Input video (URL, data URI, or local path) | str (file) | Yes |
+| text | Text to overlay on video | str | Yes |
+| position | Position of text on screen | "top" \| "center" \| "bottom" \| "top-left" \| "top-right" \| "bottom-left" \| "bottom-right" | No |
+| start_time | When to show text (seconds). None = entire video | float | No |
+| end_time | When to hide text (seconds). None = until end | float | No |
+| font_size | Font size | int | No |
+| font_color | Font color (hex or name) | str | No |
+| bg_color | Background color behind text (None for transparent) | str | No |
+
+### Outputs
+
+| Output | Description | Type |
+|--------|-------------|------|
+| error | Error message if the operation failed | str |
+| video_out | Video with text overlay (path or data URI) | str (file) |
+
+### Possible use case
+<!-- MANUAL: use_case -->
+- Adding titles or chapter headings to video content
+- Creating lower-thirds with speaker names or captions
+- Watermarking videos with branding text
+- Adding call-to-action text at specific moments in a video
+<!-- END MANUAL -->
+
+---
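The position names and optional timing inputs in `text_overlay.md` can be translated into the values a compositing step needs. The sketch below is an illustration, not the block's code: `POSITION_MAP` and `overlay_window` are hypothetical names, the (horizontal, vertical) tuples follow the keyword positions MoviePy accepts for clip placement, and clamping `end_time` to the video length is an assumption.

```python
from typing import Dict, Optional, Tuple

# Documented position names mapped to (horizontal, vertical) keywords.
POSITION_MAP: Dict[str, Tuple[str, str]] = {
    "top": ("center", "top"),
    "center": ("center", "center"),
    "bottom": ("center", "bottom"),
    "top-left": ("left", "top"),
    "top-right": ("right", "top"),
    "bottom-left": ("left", "bottom"),
    "bottom-right": ("right", "bottom"),
}


def overlay_window(
    video_duration: float,
    start_time: Optional[float] = None,
    end_time: Optional[float] = None,
) -> Tuple[float, float]:
    """Return (start_seconds, visible_duration_seconds) for the text clip."""
    # Documented defaults: no start means from the beginning, no end means until the end.
    start = 0.0 if start_time is None else start_time
    # Assumption: an end_time past the video length is clamped to it.
    end = video_duration if end_time is None else min(end_time, video_duration)
    if end <= start:
        raise ValueError("Text overlay window is empty")
    return start, end - start


print(POSITION_MAP["bottom-right"])      # ('right', 'bottom')
print(overlay_window(60.0, 5.0, None))   # (5.0, 55.0)
```

With MoviePy 2.x these values would typically feed `with_position`, `with_start`, and `with_duration` on the text clip before compositing.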