7 Commits

Author SHA1 Message Date
Kayvan Sylvan
dc722f9724 feat: improve timestamp parsing to handle fractional seconds in YouTube tool
## CHANGES

- Move timestamp regex initialization to init function
- Add parseSeconds helper function for fractional seconds
- Replace direct strconv.Atoi calls with parseSeconds function
- Support decimal seconds in timestamp format parsing
- Extract seconds parsing logic into reusable function
2025-07-18 11:53:28 -07:00
Kayvan Sylvan
1a35f32a48 fix: Youtube VTT parsing gap test 2025-07-18 11:27:23 -07:00
Kayvan Sylvan
65bd2753c2 feat: enhance VTT duplicate filtering to allow legitimate repeated content
## CHANGES

- Fix regex escape sequence for timestamp parsing
- Add configurable time gap constant for repeat detection
- Track content with timestamps instead of simple deduplication
- Implement time-based repeat inclusion logic for choruses
- Add timestamp parsing helper functions for calculations
- Allow repeated content after significant time gaps
- Preserve legitimate recurring phrases while filtering duplicates
2025-07-18 11:08:37 -07:00
Kayvan Sylvan
570c9a9404 chore: refactor timestamp regex and seenSegments logic
### CHANGES

- Update `timestampRegex` to support optional seconds/milliseconds
- Change `seenSegments` to use `struct{}` for memory efficiency
- Refactor duplicate check using `struct{}` pattern
- Improve readability by restructuring timestamp logic
2025-07-18 09:44:31 -07:00
Kayvan Sylvan
15151fe9ee chore: refactor timestamp regex to global scope and add spell check words
## CHANGES

- Move timestamp regex to global package scope
- Remove duplicate regex compilation from isTimeStamp function
- Add "horts", "mbed", "WEBVTT", "youtu" to spell checker
- Improve regex performance by avoiding repeated compilation
- Clean up code organization in YouTube module
2025-07-18 09:33:46 -07:00
Kayvan Sylvan
2aad4caf9b fix: prevent duplicate segments in VTT file processing
- Add deduplication map to track seen segments
- Skip duplicate text segments in plain VTT processing
- Skip duplicate segments in timestamped VTT processing
- Improve timestamp regex to handle more formats
- Use clean text as deduplication key consistently
2025-07-18 09:17:03 -07:00
Kayvan Sylvan
4004c51b9e refactor: restructure project to align with standard Go layout
### CHANGES

- Introduce `cmd` directory for all main application binaries.
- Move all Go packages into the `internal` directory.
- Rename the `restapi` package to `server` for clarity.
- Consolidate patterns and strategies into a new `data` directory.
- Group all auxiliary scripts into a new `scripts` directory.
- Move all documentation and images into a `docs` directory.
- Update all Go import paths to reflect the new structure.
- Adjust CI/CD workflows and build commands for new layout.
2025-07-08 22:47:17 -07:00