
Description
This repo performs various operations on video and audio files, including:
- Extracting short video clips from longer ones.
- Enhancing audio by adjusting pitch and volume, e.g. for a deeper voice.
- Compressing and converting video files to WebM format.
- Extracting audio from a video and saving it as an MP3 file.
- Amplifying audio if necessary.
- Transcribing audio using Whisper.
- Correcting raw audio transcripts using ChatGPT.
- Embedding subtitles into the WebM video files.
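As a rough illustration of the video and audio steps listed above, the sketch below builds FFmpeg argument lists for clip extraction, WebM conversion, and MP3 extraction with optional amplification. The function names (`clip_cmd`, `webm_cmd`, `mp3_cmd`), the codec choices (VP9/Opus), and the default quality values are assumptions made for this example; they are not taken from this repo's code, which may do these steps differently. Pitch adjustment is not covered here.

```python
from pathlib import Path


def clip_cmd(src: Path, dst: Path, start: str, duration: str) -> list[str]:
    """Cut a clip using stream copy (no re-encoding).

    With stream copy the cut may start on the nearest keyframe
    rather than on the exact timestamp.
    """
    return ["ffmpeg", "-y", "-ss", start, "-i", str(src),
            "-t", duration, "-c", "copy", str(dst)]


def webm_cmd(src: Path, dst: Path, crf: int = 32) -> list[str]:
    """Re-encode to VP9 video and Opus audio in constant-quality mode."""
    return ["ffmpeg", "-y", "-i", str(src),
            "-c:v", "libvpx-vp9", "-crf", str(crf), "-b:v", "0",
            "-c:a", "libopus", str(dst)]


def mp3_cmd(src: Path, dst: Path, gain: float = 1.0) -> list[str]:
    """Drop the video stream and encode the audio as MP3.

    A gain other than 1.0 adds FFmpeg's volume filter to amplify the audio.
    """
    cmd = ["ffmpeg", "-y", "-i", str(src), "-vn"]
    if gain != 1.0:
        cmd += ["-filter:a", f"volume={gain}"]
    return cmd + ["-c:a", "libmp3lame", "-q:a", "2", str(dst)]
```

Each list can be passed to `subprocess.run(cmd, check=True)`, for example `subprocess.run(webm_cmd(Path("talk.mp4"), Path("talk.webm")), check=True)`. Building the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.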
Main Functions
- Extract video clips.
- Enhance audio in a video file.
- Convert video to WebM format for web optimization.
- Convert audio to MP3 and amplify it.
- Transcribe audio using Whisper.
- Correct transcripts using AI (ChatGPT).
- Add subtitles to videos.
The main file of this repo is runtools.py. In this file, comment or uncomment the functions you want to execute.
Single Video Transcription
For quick, standalone transcription of individual videos, use transcribe_single_video.py. This script provides:
- Zero-cost local processing: Uses FFmpeg + Whisper locally (no API calls required)
- High-quality output: Employs the Whisper large-v2 model for accurate transcription
- Simple workflow: Extract audio → Transcribe → Save as plain text
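The extract, transcribe, and save workflow could look roughly like the sketch below. It is not the contents of transcribe_single_video.py: the function names, the intermediate 16 kHz mono WAV file, and the parameter names are assumptions for illustration. The Whisper calls (`whisper.load_model` with `download_root`, and `model.transcribe` with `language`) come from the openai-whisper package, which must be installed separately.

```python
import subprocess
from pathlib import Path
from typing import Optional


def audio_extract_cmd(ffmpeg: str, video: Path, wav: Path) -> list[str]:
    """Build the FFmpeg call that writes 16 kHz mono WAV audio."""
    return [ffmpeg, "-y", "-i", str(video), "-vn",
            "-ac", "1", "-ar", "16000", str(wav)]


def transcribe_video(video: Path, out_dir: Path, ffmpeg: str = "ffmpeg",
                     model_dir: Optional[str] = None, lang: str = "en") -> Path:
    """Extract audio, transcribe it locally, and save the plain text."""
    import whisper  # lazy import: requires the openai-whisper package

    out_dir.mkdir(parents=True, exist_ok=True)
    wav = out_dir / (video.stem + ".wav")
    subprocess.run(audio_extract_cmd(ffmpeg, video, wav), check=True)

    # download_root controls where the model weights are cached.
    model = whisper.load_model("large-v2", download_root=model_dir)
    # Passing language skips auto-detection, which can help with accents.
    result = model.transcribe(str(wav), language=lang)

    txt = out_dir / (video.stem + ".txt")
    txt.write_text(result["text"].strip(), encoding="utf-8")
    return txt
```

A call such as `transcribe_video(Path("input/example.mp4"), Path("output"), lang="nl")` would then write `output/example.txt`. The first run downloads the large-v2 weights, which takes a while and needs several gigabytes of disk space.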
When to use this script
- Quick transcription of a single video without running the full pipeline
- Testing transcription quality before processing larger batches
- Transcribing videos where the speaker has an accent (supports language forcing)
Configuration
Before running, modify these variables at the top of the script:
- FFMPEG_PATH: Path to your FFmpeg executable
- WHISPER_MODEL_PATH: Directory for Whisper model cache
- INPUT_DIR / OUTPUT_DIR: Source and destination folders
- TEST_VIDEO: Filename of the video to transcribe
- LANG: Language code (e.g., “en” for English, “nl” for Dutch)
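For orientation, a filled-in configuration block might look like the following. The variable names are the ones listed above; every value is a placeholder, and the two derived names at the bottom (`video_path`, `transcript_path`) are hypothetical additions that only show how the settings could combine.

```python
from pathlib import Path

# All values are placeholders for illustration; replace them with your own.
FFMPEG_PATH = "ffmpeg"                       # or an absolute path to the executable
WHISPER_MODEL_PATH = Path("models/whisper")  # where downloaded Whisper models are cached
INPUT_DIR = Path("input")                    # folder containing the source video
OUTPUT_DIR = Path("output")                  # folder that receives the transcript
TEST_VIDEO = "example.mp4"                   # file name inside INPUT_DIR
LANG = "en"                                  # "nl" would force Dutch

# Hypothetical derived paths, not part of the documented configuration.
video_path = INPUT_DIR / TEST_VIDEO
transcript_path = OUTPUT_DIR / Path(TEST_VIDEO).with_suffix(".txt").name
```

Using `pathlib.Path` instead of plain strings keeps the paths portable between Windows and Unix-like systems.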
Requirements
- FFmpeg for video/audio processing. It must be installed on your machine and added to the PATH environment variable.
- OpenAI models (Whisper and ChatGPT) for transcription and transcript correction.
- An OpenAI API key for ChatGPT, set in the .env file. Whisper runs locally and does not need an API key.
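A quick way to verify these requirements before running anything is sketched below. It checks that FFmpeg is reachable on PATH and reads simple `KEY=VALUE` lines from a .env file. The helper names are made up for this example, and `OPENAI_API_KEY` is an assumed variable name (it is the one the official OpenAI client reads by default); check the repo's code for the name it actually expects.

```python
import shutil
from pathlib import Path


def ffmpeg_available(executable: str = "ffmpeg") -> bool:
    """Return True if the executable can be found on PATH."""
    return shutil.which(executable) is not None


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


if __name__ == "__main__":
    print("FFmpeg on PATH:", ffmpeg_available())
    env_file = Path(".env")
    env = parse_env_text(env_file.read_text(encoding="utf-8")) if env_file.exists() else {}
    # OPENAI_API_KEY is an assumed variable name, see the note above.
    print("API key set:", bool(env.get("OPENAI_API_KEY")))
```

The parser here is deliberately minimal; a package such as python-dotenv handles more of the .env syntax if that is needed.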
Demo
Using this toolkit, an MP4 video has been converted into the following products:
- A WebM video. In this video, the sound volume has been amplified and the speaker's voice has been made deeper. The WebM file is also about 10 times smaller than the original MP4.
- A full text audio transcript (.txt) has been generated. It has been embedded in the video description. This was done using Whisper with ChatGPT post-corrections.
- Closed captions / subtitles in English were also generated. This was done using Whisper with ChatGPT post-corrections.
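To give an idea of how timed captions can be produced from a Whisper result, the sketch below turns a list of segments into SRT text. Whisper's `transcribe` result contains a `"segments"` list whose items carry `start`, `end`, and `text`; the helper names and the choice of SRT as the caption format are assumptions for this example, and the repo may generate its subtitles differently.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments (dicts with start, end, text) as numbered SRT blocks."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive caption blocks.
    return "\n".join(blocks)
```

For example, `segments_to_srt(result["segments"])` on a Whisper result yields text that can be saved as a `.srt` file. Any ChatGPT corrections would need to be applied per segment so that the timing information is preserved.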
Articles
Info