Commit Graph

3 Commits

Author SHA1 Message Date
lif
266e0d79d4 fix(blocks): add YouTube Shorts URL support (#11659)
## Summary
Added support for parsing YouTube Shorts URLs (`youtube.com/shorts/...`)
in the TranscribeYoutubeVideoBlock to extract video IDs correctly.

## Changes
- Modified `_extract_video_id` method in `youtube.py` to handle Shorts
URL format
- Added test cases for YouTube Shorts URL extraction

## Related Issue
Fixes #11500

## Test Plan
- [x] Added unit tests for YouTube Shorts URL extraction
- [x] Verified existing YouTube URL formats still work
- [x] CI should pass all existing tests

---------

Co-authored-by: Ubbe <hi@ubbe.dev>
2026-01-05 16:11:45 +00:00
Swifty
7d53c0de27 fix(backend): Fix Youtube blocking our cloud ips (#11456)
Youtube can blocks cloud ips causing the youtube transcribe blocks to
not work. This PR adds webshare proxy to get around this issue

### Changes 🏗️

- add webshare proxy to youtube transcribe block 

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  <!-- Put your test plan here: -->
  - [x] I have tested this works locally using the proxy

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Routes YouTube transcript fetching through Webshare proxy using
user/password credentials, wiring in provider enum, settings, default
credentials, and updated tests.
> 
> - **Blocks** (`backend/blocks/youtube.py`):
> - Use `WebshareProxyConfig` with `YouTubeTranscriptApi` to fetch
transcripts via proxy.
> - Add `credentials` input (`user_password` for `webshare_proxy`);
include test credentials and mocks.
> - Update method signatures: `get_transcript(video_id, credentials)`
and `run(..., *, credentials, ...)`.
>   - Change description to indicate proxy usage; add logging.
> - **Integrations**:
> - Providers (`backend/integrations/providers.py`): add
`ProviderName.WEBSHARE_PROXY`.
> - Credentials store (`backend/integrations/credentials_store.py`): add
`webshare_proxy` `UserPasswordCredentials`; include in
`DEFAULT_CREDENTIALS` and conditionally in `get_all_creds`.
> - **Settings** (`backend/util/settings.py`): add secrets
`webshare_proxy_username` and `webshare_proxy_password`.
> - **Tests** (`test/blocks/test_youtube.py`): update to pass
credentials and assert proxy config; add custom-credentials test; adjust
fallback/priority tests.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
d060898488. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
2025-12-01 20:54:52 +01:00
Copilot
90af8f8e1a feat(backend): Add language fallback for YouTube transcription block (#11057)
## Problem

The YouTube transcription block would fail when attempting to transcribe
videos that only had transcripts available in non-English languages.
Even when usable transcripts existed in other languages, the block would
raise a `NoTranscriptFound` error because it only requested English
transcripts.

**Example video that would fail:**
https://www.youtube.com/watch?v=3AMl5d2NKpQ (only has Hungarian
transcripts)

**Error message:**
```
Could not retrieve a transcript for the video https://www.youtube.com/watch?v=3AMl5d2NKpQ! 
No transcripts were found for any of the requested language codes: ('en',)

For this video (3AMl5d2NKpQ) transcripts are available in the following languages:
(GENERATED) - hu ("Hungarian (auto-generated)")
```

## Solution

Implemented intelligent language fallback in the
`TranscribeYoutubeVideoBlock.get_transcript()` method:

1. **First**, tries to fetch English transcript (maintains backward
compatibility)
2. **If English unavailable**, lists all available transcripts and
selects the first one using this priority:
   - Manually created transcripts (any language)
   - Auto-generated transcripts (any language)
3. **Only fails** if no transcripts exist at all

**Example behavior:**
```python
# Before: Video with only Hungarian transcript
get_transcript("3AMl5d2NKpQ")  #  Raises NoTranscriptFound

# After: Video with only Hungarian transcript  
get_transcript("3AMl5d2NKpQ")  #  Returns Hungarian transcript
```

## Changes

- **Modified** `backend/blocks/youtube.py`: Added try-catch logic to
fallback to any available language when English is not found
- **Added** `test/blocks/test_youtube.py`: Comprehensive test suite
covering URL extraction, language fallback, transcript preferences, and
error handling (7 tests)
- **Updated** `docs/content/platform/blocks/youtube.md`: Documented the
language fallback behavior and transcript priority order

## Testing

-  All 7 new unit tests pass
-  Block integration test passes
-  Full test suite: 621 passed, 0 failed (no regressions)
-  Code formatting and linting pass

## Impact

This fix enables the YouTube transcription block to work with
international content while maintaining full backward compatibility:

-  Videos in any language can now be transcribed
-  English is still preferred when available
-  No breaking changes to existing functionality
-  Graceful degradation to available languages

Fixes #10637
Fixes https://linear.app/autogpt/issue/OPEN-2626

> [!WARNING]
>
> <details>
> <summary>Firewall rules blocked me from connecting to one or more
addresses (expand for details)</summary>
>
> #### I tried to connect to the following addresses, but was blocked by
firewall rules:
>
> - `www.youtube.com`
> - Triggering command:
`/home/REDACTED/.cache/pypoetry/virtualenvs/autogpt-platform-backend-Ajv4iu2i-py3.11/bin/python3`
(dns block)
>
> If you need me to access, download, or install something from one of
these locations, you can either:
>
> - Configure [Actions setup
steps](https://gh.io/copilot/actions-setup-steps) to set up my
environment, which run before the firewall is enabled
> - Add the appropriate URLs or hosts to the custom allowlist in this
repository's [Copilot coding agent
settings](https://github.com/Significant-Gravitas/AutoGPT/settings/copilot/coding_agent)
(admins only)
>
> </details>

<!-- START COPILOT CODING AGENT SUFFIX -->



<details>

<summary>Original prompt</summary>

> Issue Title: if theres only one lanague available for transcribe
youtube return that langage not an error
> Issue Description: `Could not retrieve a transcript for the video
https://www.youtube.com/watch?v=3AMl5d2NKpQ! This is most likely caused
by: No transcripts were found for any of the requested language codes:
('en',) For this video (3AMl5d2NKpQ) transcripts are available in the
following languages: (MANUALLY CREATED) None (GENERATED) - hu
("Hungarian (auto-generated)") (TRANSLATION LANGUAGES) None If you are
sure that the described cause is not responsible for this error and that
a transcript should be retrievable, please create an issue at
https://github.com/jdepoix/youtube-transcript-api/issues. Please add
which version of youtube_transcript_api you are using and provide the
information needed to replicate the error. Also make sure that there are
no open issues which already describe your problem!` you can use this
video to test:
[https://www.youtube.com/watch?v=3AMl5d2NKpQ\`](https://www.youtube.com/watch?v=3AMl5d2NKpQ%60)
> Fixes
https://linear.app/autogpt/issue/OPEN-2626/if-theres-only-one-lanague-available-for-transcribe-youtube-return
> 
> 
> Comment by User :
> This thread is for an agent session with githubcopilotcodingagent.
> 
> Comment by User :
> This thread is for an agent session with githubcopilotcodingagent.
> 
> Comment by User :
> This comment thread is synced to a corresponding [GitHub
issue](https://github.com/Significant-Gravitas/AutoGPT/issues/10637).
All replies are displayed in both locations.
> 
> 


</details>


<!-- START COPILOT CODING AGENT TIPS -->
---

 Let Copilot coding agent [set things up for
you](https://github.com/Significant-Gravitas/AutoGPT/issues/new?title=+Set+up+Copilot+instructions&body=Configure%20instructions%20for%20this%20repository%20as%20documented%20in%20%5BBest%20practices%20for%20Copilot%20coding%20agent%20in%20your%20repository%5D%28https://gh.io/copilot-coding-agent-tips%29%2E%0A%0A%3COnboard%20this%20repo%3E&assignees=copilot)
— coding agent works faster and does higher quality work when set up for
your repo.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ntindle <8845353+ntindle@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
2025-10-21 02:31:33 +00:00