Exploring Machine Vision through Generative AI Video
YUSHIEN, 2025
Abstract
This paper examines recent advances in generative text-to-video AI models, focusing on Meta's Make-A-Video and Google's Video Diffusion Models (VDM). Both approaches address key challenges in generating coherent video from text prompts, including the scarcity of paired text-video data and the difficulty of maintaining temporal consistency. The paper also surveys newer platforms such as OpenAI's Sora and MiniMax's Hailuo AI, which have made significant strides toward realistic, high-resolution video generation. Despite these advances, persistent challenges remain, including semantic accuracy, nuanced temporal understanding, and ethical concerns.
Keywords: text-to-video models, video diffusion models, machine vision, video synthesis