Exploring Machine Vision through Generative AI Video

YUSHIEN, 2025


Abstract

This paper examines recent advances in generative text-to-video AI models, focusing on Meta’s Make-A-Video and Google’s Video Diffusion Models (VDM). Both approaches tackle key challenges in generating coherent video from text prompts, notably the scarcity of paired text-video data and the difficulty of maintaining temporal consistency. The paper also explores cutting-edge platforms such as OpenAI’s Sora and MiniMax’s Hailuo AI, which have made significant strides in producing realistic, high-resolution videos. Despite these advances, persistent challenges remain, including achieving semantic accuracy, modeling nuanced temporal relationships, and addressing ethical concerns.


Keywords: text-to-video models, video diffusion models, machine vision, video synthesis


