Here's what actually happens
You upload a 47-minute podcast episode. Two minutes later, you get back:
Clip 1 (58 seconds)
"The moment where your guest explains why they quit their 6-figure job. Starts with the setup question, ends on the punchline."
Clip 2 (34 seconds)
"That tangent about morning routines that got a big laugh. Clean cut at the joke landing."
Clip 3 (72 seconds)
"The controversial take on hiring that'll get comments. Includes the reaction shot."
...and 8 more suggestions, ranked by engagement potential
You preview each one. Keep 4. Trash the rest. Those 4 get captions, thumbnails, and social posts generated automatically.
Under the hood
Not magic — pattern recognition at scale.
Transcribes everything
Whisper converts your audio to text with word-level timestamps. We know exactly when each phrase starts and ends.
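To make the timestamp idea concrete, here is a minimal sketch of looking up when a phrase starts and ends. It assumes a transcript shaped like the output of the open-source `openai-whisper` package called with `word_timestamps=True`: segments that each carry a `words` list of `word`/`start`/`end` entries. The sample data and the `find_phrase` helper are hypothetical illustrations, not this product's code.

```python
import string

# Hypothetical sample shaped like openai-whisper output, e.g. from
#   whisper.load_model("base").transcribe("episode.mp3", word_timestamps=True)["segments"]
# Whisper tokens usually carry a leading space and attached punctuation.
segments = [
    {"start": 0.0, "end": 1.5, "text": " So why did you quit?",
     "words": [
         {"word": " So", "start": 0.0, "end": 0.3},
         {"word": " why", "start": 0.3, "end": 0.6},
         {"word": " did", "start": 0.6, "end": 0.8},
         {"word": " you", "start": 0.8, "end": 1.0},
         {"word": " quit?", "start": 1.0, "end": 1.5},
     ]},
    {"start": 1.9, "end": 3.4, "text": " Honestly, I was burned out.",
     "words": [
         {"word": " Honestly,", "start": 1.9, "end": 2.4},
         {"word": " I", "start": 2.4, "end": 2.5},
         {"word": " was", "start": 2.5, "end": 2.7},
         {"word": " burned", "start": 2.7, "end": 3.0},
         {"word": " out.", "start": 3.0, "end": 3.4},
     ]},
]


def normalize(token: str) -> str:
    """Lowercase and strip whitespace and surrounding punctuation."""
    return token.strip().lower().strip(string.punctuation)


def flatten_words(segments):
    """Flatten segments into one time-ordered list of (token, start, end)."""
    return [
        (normalize(w["word"]), w["start"], w["end"])
        for seg in segments
        for w in seg.get("words", [])
    ]


def find_phrase(segments, phrase):
    """Return (start, end) in seconds of the first exact match of phrase, or None."""
    words = flatten_words(segments)
    target = [normalize(t) for t in phrase.split()]
    if not target:
        return None
    n = len(target)
    for i in range(len(words) - n + 1):
        if [tok for tok, _, _ in words[i:i + n]] == target:
            return words[i][1], words[i + n - 1][2]
    return None


print(find_phrase(segments, "why did you quit"))  # (0.3, 1.5)
print(find_phrase(segments, "honestly I was"))    # (1.9, 2.7)
```

Flattening the segments first means a phrase can be found even when it straddles a segment boundary.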
Finds natural boundaries
Complete thoughts, finished stories, resolved questions. We don't cut mid-sentence or leave ideas hanging.
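One simple way to avoid cutting mid-sentence is to group the timed words into sentences by terminal punctuation, then widen any rough window so it covers whole sentences. This is a sketch under that assumption only; the page doesn't say how the real boundary logic works (it may also weigh pauses or speaker turns), and the `split_sentences` and `snap_to_sentences` names are hypothetical.

```python
from typing import List, Optional, Tuple

Word = Tuple[str, float, float]      # (text, start_sec, end_sec)
Sentence = Tuple[float, float, str]  # (start_sec, end_sec, text)

TERMINALS = (".", "?", "!")


def split_sentences(words: List[Word]) -> List[Sentence]:
    """Group time-ordered words into sentences ending in ., ? or !."""
    sentences: List[Sentence] = []
    current: List[Word] = []
    for word in words:
        current.append(word)
        if word[0].rstrip().endswith(TERMINALS):
            sentences.append((current[0][1], current[-1][2], " ".join(w[0] for w in current)))
            current = []
    if current:  # trailing words with no closing punctuation still form a unit
        sentences.append((current[0][1], current[-1][2], " ".join(w[0] for w in current)))
    return sentences


def snap_to_sentences(sentences: List[Sentence], t0: float, t1: float) -> Optional[Tuple[float, float]]:
    """Widen [t0, t1] outward so it begins and ends on whole-sentence boundaries."""
    overlapping = [s for s in sentences if s[1] > t0 and s[0] < t1]
    if not overlapping:
        return None
    return overlapping[0][0], overlapping[-1][1]


words = [
    ("So", 0.0, 0.3), ("why", 0.3, 0.6), ("quit?", 0.6, 1.1),
    ("Honestly,", 1.4, 1.9), ("I", 1.9, 2.0), ("was", 2.0, 2.2),
    ("burned", 2.2, 2.6), ("out.", 2.6, 3.0),
    ("Big", 3.5, 3.8), ("mistake?", 3.8, 4.4),
]
sentences = split_sentences(words)
# A rough 0.5s-2.1s window starts mid-question and ends mid-answer;
# snapping widens it to the full question and the full answer.
print(snap_to_sentences(sentences, 0.5, 2.1))  # (0.0, 3.0)
```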
Spots engaging moments
Strong opinions. Emotional shifts. Surprising reveals. Clear advice. The stuff that holds attention in short-form.
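The page calls this step pattern recognition and doesn't say how it is done; a trained model is the more likely real approach. As a much simpler stand-in, the sketch below counts hand-picked phrases per category. The categories mirror the ones listed above, but the cue phrases in `CUES` and the `tag_moment` name are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical cue phrases for each category named above. A production system
# would more likely learn these signals than hard-code them.
CUES: Dict[str, List[str]] = {
    "strong_opinion": [r"\bhonestly\b", r"\bi (?:really )?think\b", r"\boverrated\b"],
    "surprising_reveal": [r"\bturns out\b", r"\bnobody (?:knows|tells you)\b", r"\bthe truth is\b"],
    "clear_advice": [r"\byou should\b", r"\bmy advice\b", r"\bthe trick is\b"],
    "emotional_shift": [r"\bi was (?:so )?(?:scared|terrified|angry|thrilled)\b", r"\bi cried\b"],
}


def tag_moment(text: str) -> Dict[str, int]:
    """Count cue matches per category in a transcript snippet (case-insensitive)."""
    lowered = text.lower()
    return {
        label: sum(len(re.findall(pattern, lowered)) for pattern in patterns)
        for label, patterns in CUES.items()
    }


snippet = "Honestly, I think culture fit is overrated. My advice: hire for range."
print(tag_moment(snippet))
# {'strong_opinion': 3, 'surprising_reveal': 0, 'clear_advice': 1, 'emotional_shift': 0}
```

Keyword counts are brittle (sarcasm, paraphrase), which is exactly why a learned model is the more plausible choice at scale.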
Ranks by potential
Not every moment is clip-worthy. We surface the best candidates first, but you can still see everything if you want.
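Ranking can be as simple as sorting by a score and splitting the list into a shortlist and the remainder, so nothing is hidden. A minimal sketch, assuming each candidate already carries a numeric score; the `Candidate` fields, the example timings and scores, and the `top_n` cut-off are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    start: float  # seconds into the episode
    end: float
    label: str
    score: float  # higher = more clip-worthy, however it was computed


def rank_candidates(candidates: List[Candidate], top_n: int = 3) -> Tuple[List[Candidate], List[Candidate]]:
    """Sort by score (ties: earlier clip first) and split into shortlist and the rest."""
    ranked = sorted(candidates, key=lambda c: (-c.score, c.start))
    return ranked[:top_n], ranked[top_n:]


# Illustrative values only: durations echo the 58 s, 34 s and 72 s clips above.
clips = [
    Candidate(312.0, 370.0, "quit the 6-figure job", 0.91),
    Candidate(1210.0, 1244.0, "morning routine tangent", 0.78),
    Candidate(1905.0, 1977.0, "hiring hot take", 0.88),
    Candidate(40.0, 95.0, "intro small talk", 0.22),
]
shortlist, rest = rank_candidates(clips, top_n=3)
print([c.label for c in shortlist])  # ['quit the 6-figure job', 'hiring hot take', 'morning routine tangent']
print([c.label for c in rest])       # ['intro small talk']
```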
Best results with
- Podcasts and interviews
- Educational talking-head content
- Webinars and presentations
- Commentary and reaction videos
- Anything with clear spoken dialogue
Won't work well for
- Music videos (duh)
- Mostly silent footage
- Heavy background noise
- Non-English content (improving)
- Visual-first content like cooking demos
Clips are just the start
Once you've picked your clips, the same transcript powers everything else:
You're the editor, not the AI
Set target lengths
30 sec for TikTok, 60 for Reels, 90 for YouTube Shorts
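Hitting a target length without cutting mid-sentence can be done by keeping whole sentences from the start of a clip until the next one would push it past the limit. A sketch, assuming the clip is already split into timed sentences; the preset values come from the line above, while `fit_to_target` and the platform keys are hypothetical.

```python
from typing import List, Optional, Tuple

Sentence = Tuple[float, float, str]  # (start_sec, end_sec, text)

# Presets from the line above; the dictionary keys are hypothetical.
TARGET_SECONDS = {"tiktok": 30, "reels": 60, "shorts": 90}


def fit_to_target(sentences: List[Sentence], platform: str) -> Optional[Tuple[float, float]]:
    """Keep whole sentences from the clip start until the next one would exceed the target."""
    if not sentences:
        return None
    limit = TARGET_SECONDS[platform]
    start = sentences[0][0]
    end = None
    for _s_start, s_end, _text in sentences:
        if s_end - start > limit:
            break
        end = s_end
    if end is None:
        return None  # even the first sentence is longer than the target
    return start, end


clip = [
    (10.0, 22.0, "Setup question."),
    (22.5, 38.0, "The answer."),
    (38.5, 55.0, "The follow-up."),
    (56.0, 80.0, "The punchline."),
]
print(fit_to_target(clip, "tiktok"))  # (10.0, 38.0)
print(fit_to_target(clip, "reels"))   # (10.0, 55.0)
print(fit_to_target(clip, "shorts"))  # (10.0, 80.0)
```

Trimming from the start keeps the setup but can drop the punchline; anchoring on the strongest sentence and growing outward is an equally valid design.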
Adjust any boundary
Extend 3 seconds here, trim the intro there
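Manual boundary tweaks come down to shifting a start or end time while keeping the result inside the source media. A minimal sketch; the `nudge` helper, its parameter names, and the one-second minimum length are hypothetical.

```python
EPISODE_SECONDS = 47 * 60.0  # the 47-minute episode from the example above


def nudge(start, end, media_duration, start_delta=0.0, end_delta=0.0, min_length=1.0):
    """Shift clip boundaries by the given deltas (seconds), clamped to the media.

    Positive start_delta trims the intro; positive end_delta extends the ending.
    """
    duration = float(media_duration)
    new_start = min(max(0.0, start + start_delta), duration)
    new_end = min(max(0.0, end + end_delta), duration)
    if new_end - new_start < min_length:
        raise ValueError("adjusted clip would be shorter than min_length")
    return new_start, new_end


print(nudge(312.0, 370.0, EPISODE_SECONDS, end_delta=3.0))     # (312.0, 373.0)
print(nudge(312.0, 370.0, EPISODE_SECONDS, start_delta=4.5))   # (316.5, 370.0)
print(nudge(2800.0, 2815.0, EPISODE_SECONDS, end_delta=10.0))  # (2800.0, 2820.0)
```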
Save preferences
Prioritize certain topics or formats next time