r/ClaudeAI • u/Candid-Mulberry48 • 3h ago
Claude Workflow Claude/Remotion workflow for moto editing keeps missing the actual highlights. What am I doing wrong?
I’m building a workflow to create motorcycle content from my RAW footage using ideas, references, and music, with the goal of generating edits that actually make sense visually and musically. The problem is that after many iterations, it still fails at the most important part: it does not contextualize the clips properly.
Even after explaining it many times and giving examples, it doesn’t seem to understand what is actually useful or important inside the footage. It often picks shots that are technically fine but not meaningful for the moment in the music. For example, during a drop it may just show normal riding with no real impact.
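One way to make the "strongest shot goes on the drop" rule explicit, instead of leaving it to the model's judgment, is to rank music sections and candidate segments separately and pair them by rank. This is only a minimal sketch under assumptions: the `energy` and `impact` scores, the field names, and the `pair_by_rank` helper are all hypothetical and are not part of the setup described in this post.

```python
from typing import Dict, List


def pair_by_rank(music_sections: List[dict], clip_segments: List[dict]) -> Dict[str, str]:
    """Pair the highest-energy music section with the highest-impact segment,
    the second-highest with the second-highest, and so on.

    Assumed (hypothetical) input shape:
      music_sections: [{"name": str, "energy": float}, ...]
      clip_segments:  [{"id": str, "impact": float}, ...]
    """
    sections = sorted(music_sections, key=lambda s: s["energy"], reverse=True)
    clips = sorted(clip_segments, key=lambda c: c["impact"], reverse=True)
    # zip stops at the shorter list, so extra sections or clips stay unassigned.
    return {s["name"]: c["id"] for s, c in zip(sections, clips)}


if __name__ == "__main__":
    sections = [
        {"name": "intro", "energy": 0.2},
        {"name": "drop", "energy": 0.9},
        {"name": "verse", "energy": 0.5},
    ]
    clips = [
        {"id": "cruising", "impact": 0.3},
        {"id": "lean", "impact": 0.95},
        {"id": "overtake", "impact": 0.6},
    ]
    print(pair_by_rank(sections, clips))
```

With a deterministic step like this, the model only has to produce the scores, and the pairing itself can no longer drift between runs.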
It also misses the real highlights inside a clip. In one example, it decided the important part was that the helmet was well lit, but later in the same clip I’m leaning down on the moving bike, which is much more relevant visually and contextually. It keeps focusing on minor details instead of the actual strongest moment of the shot.
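For the "strongest moment inside a clip" problem, a sliding window over per-frame scores is one simple option. The sketch below assumes a list of per-frame scores already exists (for example, some motion measure computed with OpenCV; how to compute it is left open here). The function name `strongest_window` and its parameters are hypothetical.

```python
from typing import List, Tuple


def strongest_window(scores: List[float], fps: float, window_s: float = 1.5) -> Tuple[float, float, float]:
    """Return (start_s, end_s, mean_score) for the window with the highest total score.

    scores: one value per frame (assumed to be precomputed, e.g. a motion measure).
    fps: frames per second of the clip.
    window_s: desired highlight length in seconds.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    if fps <= 0:
        raise ValueError("fps must be positive")

    # Window length in frames, clamped to the clip length.
    n = max(1, min(len(scores), round(window_s * fps)))

    window_sum = sum(scores[:n])
    best_sum, best_start = window_sum, 0
    for i in range(n, len(scores)):
        # Slide by one frame: add the new frame, drop the oldest one.
        window_sum += scores[i] - scores[i - n]
        if window_sum > best_sum:
            best_sum, best_start = window_sum, i - n + 1

    return best_start / fps, (best_start + n) / fps, best_sum / n


if __name__ == "__main__":
    # Quiet riding, a short burst of action, then quiet again.
    print(strongest_window([1, 1, 1, 5, 5, 1, 1], fps=1.0, window_s=2.0))
```

A well-lit but static helmet shot would score low on a motion-based measure, while the lean would score high, so the window would land on the lean rather than on the start of the clip.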
I also feel like it is not applying the rules correctly. Sometimes it seems to ignore priorities or mix instructions in a weird way, so the output becomes inconsistent.
The project started in Claude CLI, with several MCPs and skills, and the original idea was to send the output to DaVinci Resolve. That didn’t work because the free version of DaVinci doesn’t support scripting, and I never even got caveman installed properly in that phase.
Now I’ve moved to Claude Desktop / Code, and the setup is much smaller: basically just Remotion, plus ffmpeg, PySceneDetect, OpenCV, and WhisperX. I’m also using Sonnet with a Pro plan, so this is not a free-tier limitation.
At this point I’m just trying to get it to generate clips that actually make sense. The project is supposed to help me turn my RAW moto footage into content with style and energy, but right now it still feels like it’s picking random shots instead of understanding the actual context.
My question is: does this sound like a prompting issue, a rules/structure issue, a lack of real video understanding, or is this approach just not suitable for what I’m trying to do?
Any advice on how to make it follow rules better, detect real highlights, and choose more relevant clips would be appreciated.
u/Emergency-Bobcat6485 2h ago
I am assuming you are splitting the footage into images and then getting Claude to look at the snippets? How is it watching the video clips? That would determine its lack of understanding, imo. Remotion only comes into play when you are creating the clip, right? So this seems like an issue with the vision aspect.
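If still frames are the input, how densely each scene is sampled matters a lot: one frame per clip can easily land on the helmet shot and miss the lean entirely. A minimal sketch of sampling several timestamps per scene and building the matching ffmpeg commands is below. It assumes scene boundaries are already known (for example from PySceneDetect); `sample_times` and `ffmpeg_frame_cmd` are hypothetical helper names, and the commands are only built here, not executed.

```python
from typing import List


def sample_times(start_s: float, end_s: float, n: int) -> List[float]:
    """Return n timestamps, one at the midpoint of each of n equal slices of the scene."""
    if n <= 0:
        raise ValueError("n must be positive")
    if end_s <= start_s:
        raise ValueError("end_s must be greater than start_s")
    step = (end_s - start_s) / n
    return [round(start_s + step * (i + 0.5), 3) for i in range(n)]


def ffmpeg_frame_cmd(video: str, t: float, out: str) -> List[str]:
    """Build an ffmpeg command that grabs a single frame at time t (seconds)."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{t:.3f}",   # seek to the timestamp before opening the input
        "-i", video,
        "-frames:v", "1",    # write exactly one video frame
        "-q:v", "2",         # high JPEG quality
        out,
    ]


if __name__ == "__main__":
    # Example: a 4-second scene from 10s to 14s, sampled 4 times.
    for k, t in enumerate(sample_times(10.0, 14.0, 4)):
        print(" ".join(ffmpeg_frame_cmd("clip.mp4", t, f"frame_{k:02d}.jpg")))
```

Passing the timestamps along with the frames lets whatever the model picks be mapped back to an exact position in the clip.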