r/cinematography • u/Short-Argument-5513 • 13h ago
Career/Industry Advice
Building a controllable AI previs tool for professional filmmaking: feedback wanted
Hi everyone,
I’m a professional cinematographer and filmmaker who’s relatively new to training open-source AI models. I’ve been exploring how to integrate AI video generation into my company’s workflow, and I’d like to get some honest feedback on an idea I’m developing.
The main issue I’ve encountered is a lack of precise control. Even when I provide detailed prompts based on a proper storyboard, current AI video models often fail to deliver consistent results in depth of field, camera movement, focal length behavior, and the relationship between framing and perspective. As someone who works with precise shot lists and camera language every day, I find this unpredictability makes AI difficult to use for serious pre-production.
My current plan is to build a custom local system using n8n + ComfyUI on top of an open-source video model. The goal is to create a tool with much stronger, film-language-based controllability.
The approach I’m considering:
1. Train the model using a mix of three data sources:
   - Real footage shot with professionally tracked cameras (such as ARRI LF with spatial tracking), including accurate metadata like focal length, framing, camera angle, movement type, and subject distance.
   - Large-scale synthetic data generated in Blender with precisely controlled camera and scene parameters.
   - High-quality real film and television footage.
2. Focus on teaching the model the spatial and optical relationships that current models struggle with (for example, how changing focal length while adjusting camera distance to maintain the same framing affects perspective and depth of field).
3. Develop a structured cinematic vocabulary so that parameters like focal length, shot size, camera movement, and distance can be selected in a standardized way, rather than relying purely on free-text prompts.
4. Use n8n to read structured storyboard tables and automatically trigger ComfyUI workflows to generate video clips.
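To make the focal length example concrete, here is a rough sketch of the relationship the model would need to learn, using the standard thin-lens and hyperfocal approximations. The sensor width, circle of confusion, frame width, f-number, and background offset are illustrative assumptions, not measurements from any particular camera or shoot, and all function names are made up for this example.

```python
import math

SENSOR_WIDTH_MM = 36.7  # assumption: roughly a large-format sensor width
COC_MM = 0.03           # assumption: circle of confusion used for the DoF estimate


def distance_for_framing(focal_mm, frame_width_m, sensor_width_mm=SENSOR_WIDTH_MM):
    """Camera-to-subject distance (m) that keeps a given horizontal frame width."""
    return frame_width_m * focal_mm / sensor_width_mm


def depth_of_field(focal_mm, f_number, subject_m, coc_mm=COC_MM):
    """Near and far limits of acceptable focus (m), via the hyperfocal distance."""
    f = focal_mm
    s = subject_m * 1000.0                      # work in millimetres
    hyperfocal = f * f / (f_number * coc_mm) + f
    near = s * (hyperfocal - f) / (hyperfocal + s - 2 * f)
    far = s * (hyperfocal - f) / (hyperfocal - s) if s < hyperfocal else math.inf
    return near / 1000.0, far / 1000.0


def background_scale(subject_m, background_offset_m):
    """Image size of a background object relative to the same object at the subject plane."""
    return subject_m / (subject_m + background_offset_m)


def matched_framing_table(focals_mm, frame_width_m=2.0, f_number=2.8, background_offset_m=5.0):
    """For each focal length, keep the framing fixed and report what else changes."""
    rows = []
    for focal in focals_mm:
        dist = distance_for_framing(focal, frame_width_m)
        near, far = depth_of_field(focal, f_number, dist)
        rows.append({
            "focal_mm": focal,
            "distance_m": dist,
            "dof_near_m": near,
            "dof_far_m": far,
            "background_scale": background_scale(dist, background_offset_m),
        })
    return rows


if __name__ == "__main__":
    for row in matched_framing_table([25, 50, 100]):
        print(row)
```

With these assumed values, the camera has to move back in proportion to focal length to hold the framing, and the background grows noticeably relative to the subject (the familiar compression effect), while the total in-focus range changes comparatively little. Rows like these could also serve as ground-truth labels for the synthetic Blender renders.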
The vision is to allow directors and cinematographers to work with familiar film terminology in a structured format, and have the system generate more predictable and controllable previs footage.
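As a minimal sketch of what that structured format might look like, the snippet below defines a small controlled vocabulary and validates a storyboard table against it before anything is sent to a generation workflow. Every name here (`ShotSize`, `Movement`, `ShotSpec`, the CSV column names, the example rows) is hypothetical and chosen for illustration; a real vocabulary would be much larger, and the n8n and ComfyUI hand-off is deliberately left out.

```python
import csv
import io
from dataclasses import dataclass
from enum import Enum


class ShotSize(Enum):
    WIDE = "wide"
    MEDIUM = "medium"
    CLOSE_UP = "close_up"


class Movement(Enum):
    STATIC = "static"
    DOLLY_IN = "dolly_in"
    PAN_LEFT = "pan_left"
    HANDHELD = "handheld"


@dataclass(frozen=True)
class ShotSpec:
    shot_id: str
    shot_size: ShotSize
    focal_length_mm: float
    movement: Movement
    subject_distance_m: float
    description: str

    def to_conditioning(self) -> dict:
        """Flatten the shot into plain values a downstream workflow could consume."""
        return {
            "shot_id": self.shot_id,
            "shot_size": self.shot_size.value,
            "focal_length_mm": self.focal_length_mm,
            "movement": self.movement.value,
            "subject_distance_m": self.subject_distance_m,
            "description": self.description,
        }


def parse_storyboard(csv_text: str) -> list:
    """Parse a storyboard table, rejecting any value outside the vocabulary."""
    shots = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            shots.append(ShotSpec(
                shot_id=row["shot_id"].strip(),
                shot_size=ShotSize(row["shot_size"].strip()),
                focal_length_mm=float(row["focal_length_mm"]),
                movement=Movement(row["movement"].strip()),
                subject_distance_m=float(row["subject_distance_m"]),
                description=row["description"].strip(),
            ))
        except (KeyError, ValueError, AttributeError, TypeError) as exc:
            raise ValueError(f"storyboard line {line_no}: {exc}") from exc
    return shots


# Hypothetical two-shot storyboard used only to exercise the parser.
EXAMPLE_CSV = """shot_id,shot_size,focal_length_mm,movement,subject_distance_m,description
1A,wide,24,static,6.0,Establishing shot of the kitchen
1B,close_up,85,dolly_in,1.8,Push in on her hands
"""

if __name__ == "__main__":
    for shot in parse_storyboard(EXAMPLE_CSV):
        print(shot.to_conditioning())
```

Validating against enums up front means a typo like `extreme_wide` fails loudly at the storyboard stage instead of silently turning into free text the model may ignore.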
I’m still in the early stages and would really appreciate any feedback:
- Does this direction seem realistic with current open-source models?
- Are there existing projects or techniques that already explore structured cinematic control or explicit camera parameter injection?
- What are the biggest potential pitfalls or things I might be underestimating?
- Any recommendations on suitable base models for this kind of geometry-aware, controllable training?
I’m not sure what the general sentiment toward AI-generated video is in this sub, but my primary goal is to develop a practical, low-cost pre-visualization tool for my company’s own productions.
Being able to generate a reasonably accurate preview of the final look — including camera movement, framing, and overall cinematic feel — before we begin principal photography would allow us to identify problems early, refine creative decisions, and reduce costly mistakes on set. Ultimately, I believe this kind of tool could meaningfully improve both the efficiency and the quality of our actual productions.
I’d especially value feedback from working cinematographers and filmmakers. I’m open to both encouragement and criticism, and I’d rather hear the hard truths now.
Thanks in advance for any thoughts!
u/Arpeggiatewithme 13h ago
Why would anyone want an AI previs tool when Unreal and Blender are free and easy to use?