17 Commits

Author SHA1 Message Date
Kayvan Sylvan
da28d91d65 refactor: extract common yt-dlp logic to reduce code duplication in YouTube plugin
## CHANGES

- Extract shared yt-dlp logic into tryMethodYtDlpInternal helper
- Add processVTTFileFunc parameter for flexible VTT processing
- Implement language matching for 2-char language codes
- Refactor tryMethodYtDlp to use new helper function
- Refactor tryMethodYtDlpWithTimestamps to use helper
- Reduce code duplication between transcript methods
- Maintain existing functionality with cleaner structure
2025-06-17 00:32:33 -07:00
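The commit above names a shared helper that takes a `processVTTFileFunc` parameter and matches 2-character language codes. The following is a minimal sketch of how that combination could look, not the plugin's actual code: the function signature, the `langOf`, `pickVTT`, and `transcriptFrom` names, and the `name.<lang>.vtt` file naming are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// processVTTFileFunc borrows the parameter name from the commit message.
// The signature is an assumption.
type processVTTFileFunc func(path string) (string, error)

// langOf extracts the language tag from a name such as "clip.en-US.vtt".
// It returns "" when there is no tag between the base name and ".vtt".
func langOf(name string) string {
	base := strings.TrimSuffix(filepath.Base(name), ".vtt")
	i := strings.LastIndex(base, ".")
	if i < 0 {
		return ""
	}
	return base[i+1:]
}

// pickVTT (hypothetical) returns the first file whose tag equals lang or,
// for a 2-character lang, whose tag starts with lang + "-" (en matches en-US).
func pickVTT(files []string, lang string) (string, bool) {
	want := strings.ToLower(lang)
	for _, f := range files {
		tag := strings.ToLower(langOf(f))
		if tag == want || (len(want) == 2 && strings.HasPrefix(tag, want+"-")) {
			return f, true
		}
	}
	return "", false
}

// transcriptFrom (hypothetical) holds the shared logic; callers differ only
// in the processing function they pass in.
func transcriptFrom(files []string, lang string, process processVTTFileFunc) (string, error) {
	f, ok := pickVTT(files, lang)
	if !ok {
		return "", fmt.Errorf("no VTT file for language %q", lang)
	}
	return process(f)
}

func main() {
	files := []string{"clip.de.vtt", "clip.en-US.vtt"}
	plain := func(path string) (string, error) { return "plain:" + path, nil }
	out, err := transcriptFrom(files, "en", plain)
	fmt.Println(out, err)
}
```

Passing the processing step as a function lets the plain and timestamped transcript paths share the file selection and error handling while keeping their own VTT handling.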
Kayvan Sylvan
680febbe66 fix: replace Unix-specific file operations with cross-platform alternatives
## CHANGES

- Replace hardcoded `/tmp` with `os.TempDir()` for paths
- Use `filepath.Join()` instead of string concatenation
- Remove Unix `find` command dependency completely
- Add new `findVTTFiles()` method using `filepath.Walk()`
- Make VTT file discovery work on Windows
- Improve error handling for file operations
- Maintain backward compatibility with existing functionality
2025-06-11 22:24:48 -07:00
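The portable file discovery described in this commit can be sketched roughly as follows. The `findVTTFiles` name comes from the commit message, but its signature and body here are assumptions, and `runDemo` is a hypothetical helper that only sets up a scratch directory under `os.TempDir()`.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// findVTTFiles walks dir recursively and collects every *.vtt file,
// replacing a call to the Unix `find` command.
func findVTTFiles(dir string) ([]string, error) {
	var files []string
	err := filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err // propagate errors such as unreadable directories
		}
		if !info.IsDir() && strings.HasSuffix(strings.ToLower(info.Name()), ".vtt") {
			files = append(files, path)
		}
		return nil
	})
	return files, err
}

// runDemo (hypothetical) creates a scratch directory, adds one subtitle file
// and one unrelated file, and returns the base names that findVTTFiles reports.
func runDemo() ([]string, error) {
	dir, err := os.MkdirTemp(os.TempDir(), "vtt-demo-")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	sub := filepath.Join(dir, "subs")
	if err := os.Mkdir(sub, 0o755); err != nil {
		return nil, err
	}
	paths := []string{filepath.Join(sub, "clip.en.vtt"), filepath.Join(dir, "notes.txt")}
	for _, p := range paths {
		if err := os.WriteFile(p, []byte("WEBVTT\n"), 0o600); err != nil {
			return nil, err
		}
	}

	files, err := findVTTFiles(dir)
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(files))
	for _, f := range files {
		names = append(names, filepath.Base(f))
	}
	return names, nil
}

func main() {
	names, err := runDemo()
	fmt.Println(names, err)
}
```

Because every path is built with `filepath.Join()` and rooted at `os.TempDir()`, the same code runs unchanged on Windows, macOS, and Linux.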
Kayvan Sylvan
2dfd78ef0b feat: cleanup after yt-dlp addition
## CHANGES

- Update README with yt-dlp requirement for transcripts
- Ensure errors are clear and actionable
2025-06-11 17:27:11 -07:00
Kayvan Sylvan
704ad3067a refactor: replace web scraping with yt-dlp for YouTube transcript extraction
## CHANGES

- Remove unreliable YouTube API scraping methods
- Add yt-dlp integration for transcript extraction
- Implement VTT subtitle parsing functionality
- Add timestamp preservation for transcripts
- Remove soup HTML parsing dependency
- Add error handling for missing yt-dlp
- Create temporary directory management
- Support multiple subtitle format fallbacks
2025-06-11 14:24:40 -07:00
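To illustrate the VTT parsing and timestamp preservation listed in this commit, here is a simplified sketch. It is not the plugin's parser: `parseVTT`, its bracketed timestamp prefix, and the rule of dropping consecutive duplicate lines are assumptions, and it expects cue timings that include the hours field.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// tagRe matches inline markup such as <c> or <00:00:01.500> inside cue text.
var tagRe = regexp.MustCompile(`<[^>]+>`)

// parseVTT (hypothetical) returns the cue text of a WebVTT document.
// With withTimestamps set, each line is prefixed by its cue start time.
func parseVTT(vtt string, withTimestamps bool) []string {
	var out []string
	var start, last string
	inHeader := true

	sc := bufio.NewScanner(strings.NewReader(vtt))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if inHeader {
			// The header block (WEBVTT plus metadata) ends at the first blank line.
			if line == "" {
				inHeader = false
			}
			continue
		}
		if line == "" {
			continue
		}
		if strings.Contains(line, "-->") {
			// Timing line: keep the start time without the milliseconds.
			start, _, _ = strings.Cut(strings.Fields(line)[0], ".")
			continue
		}
		text := strings.TrimSpace(tagRe.ReplaceAllString(line, ""))
		if text == "" || text == last {
			continue // skip empty lines and consecutive repeats
		}
		last = text
		if withTimestamps {
			text = "[" + start + "] " + text
		}
		out = append(out, text)
	}
	return out
}

func main() {
	vtt := "WEBVTT\nKind: captions\nLanguage: en\n\n" +
		"00:00:01.000 --> 00:00:03.000\nhello <c>world</c>\n\n" +
		"00:00:03.000 --> 00:00:05.000\nhello world\nsecond line\n"
	for _, l := range parseVTT(vtt, true) {
		fmt.Println(l)
	}
}
```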
Kayvan Sylvan
1dafb09e07 remove spurious newline 2025-03-04 21:20:24 -08:00
Kayvan Sylvan
e8caf9fc10 feat: update YouTube regex to support live URLs 2025-03-04 21:18:51 -08:00
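As a rough illustration of a video-ID pattern that accepts live URLs, the sketch below also covers the watch, `youtu.be`, and shorts forms mentioned elsewhere in this log. The pattern and the `videoID` helper are assumptions for illustration and are not the regex used by the plugin; the 11-character ID in the example is a placeholder.

```go
package main

import (
	"fmt"
	"regexp"
)

// videoRe is an illustrative pattern, not the plugin's regex. It captures the
// 11-character video ID from watch, shorts, live, and youtu.be URLs.
var videoRe = regexp.MustCompile(
	`(?:youtube\.com/(?:watch\?(?:.*&)?v=|shorts/|live/)|youtu\.be/)([A-Za-z0-9_-]{11})`)

// videoID (hypothetical) returns the video ID found in url, if any.
func videoID(url string) (string, bool) {
	m := videoRe.FindStringSubmatch(url)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	urls := []string{
		"https://www.youtube.com/watch?v=abcdefghijk",
		"https://www.youtube.com/live/abcdefghijk?feature=share",
		"https://www.youtube.com/shorts/abcdefghijk",
		"https://youtu.be/abcdefghijk",
	}
	for _, u := range urls {
		id, ok := videoID(u)
		fmt.Println(id, ok)
	}
}
```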
Krzysztof Łuczak
f3a1982e30 Add the ability to grab YouTube video transcript with timestamps
This commit adds the ability to grab the transcript
of a YouTube video with timestamps. The timestamps
are formatted as HH:MM:SS and are prepended to
each line of the transcript. The feature is enabled
by the new `--transcript-with-timestamps` flag,
so it's similar to the existing `--transcript` flag.

Example future use-case:

Providing summary of a video that includes timestamps
for quick navigation to specific parts of the video.
2025-02-07 15:25:22 +01:00
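The HH:MM:SS prefixing described in this commit message could be sketched as below. The `cue` type, the function names, and the single-space separator between timestamp and text are assumptions; the commit only states the format and that the timestamp is prepended to each line.

```go
package main

import "fmt"

// cue is a hypothetical transcript line with its start offset in whole seconds.
type cue struct {
	StartSeconds int
	Text         string
}

// formatTimestamp renders an offset in seconds as HH:MM:SS.
func formatTimestamp(totalSeconds int) string {
	h := totalSeconds / 3600
	m := (totalSeconds % 3600) / 60
	s := totalSeconds % 60
	return fmt.Sprintf("%02d:%02d:%02d", h, m, s)
}

// withTimestamps prepends the formatted start time to each transcript line.
func withTimestamps(cues []cue) []string {
	lines := make([]string, 0, len(cues))
	for _, c := range cues {
		lines = append(lines, formatTimestamp(c.StartSeconds)+" "+c.Text)
	}
	return lines
}

func main() {
	cues := []cue{{5, "intro"}, {3725, "closing remarks"}}
	for _, l := range withTimestamps(cues) {
		fmt.Println(l)
	}
}
```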
Cory Sougstad
1629f36c59 Better metadata 2025-01-02 09:41:15 -07:00
Cory Sougstad
b4b8b96260 Added metadata lookup to youtube helper 2024-12-31 15:31:17 -07:00
Eugen Eisler
e01a84b21d Merge branch 'main' into fix/yt-shorts 2024-11-06 23:22:52 +01:00
Eugen Eisler
b5b45c8474 fix: short YouTube URL pattern 2024-11-06 21:54:36 +01:00
butterflyx
203add15e5 [add] VideoID for YT shorts 2024-11-05 19:55:45 +01:00
Eugen Eisler
0bb4f58222 fix: bufio.Scanner message too long 2024-11-05 11:25:32 +01:00
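The commit message does not say how the error was fixed. One common remedy when `bufio.Scanner` hits a line longer than its default 64 KiB token limit is to raise the limit with `Scanner.Buffer`, sketched below. The `countLines` and `longLine` helpers and the 1 MiB limit are assumptions chosen for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countLines scans input line by line. maxToken <= 0 keeps the default limit
// (bufio.MaxScanTokenSize); a larger value raises it via Scanner.Buffer.
func countLines(input string, maxToken int) (int, error) {
	sc := bufio.NewScanner(strings.NewReader(input))
	if maxToken > 0 {
		sc.Buffer(make([]byte, 0, 64*1024), maxToken)
	}
	n := 0
	for sc.Scan() {
		n++
	}
	return n, sc.Err()
}

// longLine returns a single line of n bytes followed by a newline.
func longLine(n int) string {
	return strings.Repeat("x", n) + "\n"
}

func main() {
	input := "short\n" + longLine(100000)

	_, err := countLines(input, 0)
	fmt.Println("default buffer:", err)

	n, err := countLines(input, 1024*1024)
	fmt.Println("1 MiB buffer:", n, err)
}
```

With the default limit the scan stops at the long line and `Err()` reports `bufio.ErrTooLong`; with the larger buffer both lines are read.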
Eugen Eisler
b750171593 feat: YouTube PlayList support 2024-11-04 10:48:22 +01:00
Eugen Eisler
e17b96d864 feat: write tools output also to output file if defined; fix YouTube transcript ' character 2024-10-30 13:50:52 +01:00
Eugen Eisler
61f66f88e3 feat: plugins arch., new setup procedure 2024-10-19 13:09:37 +02:00
Eugen Eisler
17bde814cc feat: restructure for better reuse 2024-10-12 22:25:17 +03:00