mirror of https://github.com/Significant-Gravitas/AutoGPT.git synced 2026-02-07 05:15:09 -05:00

Files

Nicholas Tindle 85b6520710 feat(blocks): Add video editing blocks (#11796 )

<!-- Clearly explain the need for these changes: -->
This PR adds general-purpose video editing blocks for the AutoGPT
Platform, enabling automated video production workflows like documentary
creation, marketing videos, tutorial assembly, and content repurposing.

### Changes 🏗️

<!-- Concisely describe all of the changes made in this pull request:
-->

**New blocks added in `backend/blocks/video/`:**
- `VideoDownloadBlock` - Download videos from URLs (YouTube, Vimeo, news
sites, direct links) using yt-dlp
- `VideoClipBlock` - Extract time segments from videos with start/end
time validation
- `VideoConcatBlock` - Merge multiple video clips with optional
transitions (none, crossfade, fade_black)
- `VideoTextOverlayBlock` - Add text overlays/captions with positioning
and timing options
- `VideoNarrationBlock` - Generate AI narration via ElevenLabs and mix
with video audio (replace, mix, or ducking modes)

**Dependencies required:**
- `yt-dlp` - For video downloading
- `moviepy` - For video editing operations

**Implementation details:**
- All blocks follow the SDK pattern with proper error handling and
exception chaining
- Proper resource cleanup in `finally` blocks to prevent memory leaks
- Input validation (e.g., end_time > start_time)
- Test mocks included for CI

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Blocks follow the SDK pattern with
`BlockSchemaInput`/`BlockSchemaOutput`
  - [x] Resource cleanup is implemented in `finally` blocks
  - [x] Exception chaining is properly implemented
  - [x] Input validation is in place
  - [x] Test mocks are provided for CI environments

#### For configuration changes:
- [ ] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [ ] I have included a list of my configuration changes in the PR
description (under **Changes**)

N/A - No configuration changes required.


<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Adds new multimedia blocks that invoke ffmpeg/MoviePy and introduces
new external dependencies (plus container packages), which can impact
runtime stability and resource usage; download/overlay blocks are
present but disabled due to sandbox/policy concerns.
> 
> **Overview**
> Adds a new `backend.blocks.video` module with general-purpose video
workflow blocks (download, clip, concat w/ transitions, loop, add-audio,
text overlay, and ElevenLabs-powered narration), including shared
utilities for codec selection, filename cleanup, and an ffmpeg-based
chapter-strip workaround for MoviePy.
> 
> Extends credentials/config to support ElevenLabs
(`ELEVENLABS_API_KEY`, provider enum, system credentials, and cost
config) and adds new dependencies (`elevenlabs`, `yt-dlp`) plus Docker
runtime packages (`ffmpeg`, `imagemagick`).
> 
> Improves file/reference handling end-to-end by embedding MIME types in
`workspace://...#mime` outputs and updating frontend rendering to detect
video vs image from MIME fragments (and broaden supported audio/video
extensions), with optional enhanced output rendering behind a feature
flag in the legacy builder UI.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
da7a44d794. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <ntindle@users.noreply.github.com>
Co-authored-by: Otto <otto@agpt.co>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-05 22:22:33 +00:00

2.1 KiB

Raw Blame History

Video Narration

This block generates AI voiceover narration using ElevenLabs and adds it to a video, with flexible audio mixing options.

Video Narration

What it is

Generate AI narration and add to video

How it works

The block uses ElevenLabs text-to-speech API to generate natural-sounding narration from your script. It then combines the narration with the video using MoviePy. Three audio mixing modes are available: replace (completely replaces original audio), mix (blends narration with original audio at configurable volumes), and ducking (similar to mix but applies stronger attenuation to original audio, making narration more prominent). The block outputs both the final video and the generated audio file separately.

Inputs

Input	Description	Type	Required
video_in	Input video (URL, data URI, or local path)	str (file)	Yes
script	Narration script text	str	Yes
voice_id	ElevenLabs voice ID	str	No
model_id	ElevenLabs TTS model	"eleven_multilingual_v2" \| "eleven_flash_v2_5" \| "eleven_turbo_v2_5" \| "eleven_turbo_v2"	No
mix_mode	How to combine with original audio. 'ducking' applies stronger attenuation than 'mix'.	"replace" \| "mix" \| "ducking"	No
narration_volume	Narration volume (0.0 to 2.0)	float	No
original_volume	Original audio volume when mixing (0.0 to 1.0)	float	No

Outputs

Output	Description	Type
error	Error message if the operation failed	str
video_out	Video with narration (path or data URI)	str (file)
audio_file	Generated audio file (path or data URI)	str (file)

Possible use case

Adding professional voiceover to product demos or tutorials
Creating narrated explainer videos from screen recordings
Generating multi-language versions of video content
Adding commentary to gameplay or walkthrough videos

2.1 KiB Raw Blame History

Video Narration

Video Narration

What it is

How it works

Inputs

Outputs

Possible use case

2.1 KiB

Raw Blame History