Meta AI has released Movie Gen, a suite of foundation models that generate high-quality videos with synchronized audio from text prompts. Trained on a massive dataset of images, videos, and audio, the models can produce realistic videos up to 16 seconds long. The research paper details the architecture, training objectives, and evaluation metrics behind Movie Gen, highlighting its use of transformers, flow matching, and temporal autoencoders. It also covers personalized video generation, which lets users create videos featuring a specific person, and video editing, which enables precise changes to both real and generated videos. Finally, Movie Gen Audio, a dedicated audio-generation model, produces cinematic soundtracks, including diegetic sound effects and non-diegetic music, that align with the visual content of a video.
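To give a feel for the flow-matching objective mentioned above, here is a minimal 1-D sketch. It is purely illustrative, not Meta's Movie Gen code: the toy data distribution, the helper names, and the closed-form Gaussian-to-Gaussian velocity field are all assumptions made for this example. The core idea is shared, though: define a straight-line path from noise to data, regress a velocity field onto the path's constant velocity, then sample by integrating an ODE.

```python
import numpy as np

# Illustrative 1-D flow-matching sketch (a toy, NOT Meta's Movie Gen model).
# Flow matching learns a velocity field v(x, t) so that integrating
#   dx/dt = v(x, t)  from t=0 (noise) to t=1 transports noise into data.
# Along the straight-line path x_t = (1 - t) * x0 + t * x1, the regression
# target for v is the constant velocity x1 - x0.

rng = np.random.default_rng(0)
MU, SIGMA = 3.0, 0.1  # hypothetical toy "data" distribution N(MU, SIGMA^2)

def fm_training_example(x0, x1, t):
    """Return (network input x_t, regression target) for the FM loss."""
    xt = (1.0 - t) * x0 + t * x1
    return xt, x1 - x0

# For Gaussian noise -> Gaussian data the optimal velocity field is known in
# closed form, so we can demonstrate sampling without training a network.
def v_star(x, t):
    num = t * SIGMA**2 - (1.0 - t)
    den = (1.0 - t) ** 2 + (t * SIGMA) ** 2
    return (num / den) * (x - t * MU) + MU

# Sampling: Euler-integrate the ODE from noise at t=0 to samples at t=1.
n_steps = 200
x = rng.standard_normal(2000)  # x0 ~ N(0, 1)
for i in range(n_steps):
    t = i / n_steps
    x = x + v_star(x, t) / n_steps

# The final samples should land near the data distribution:
# mean close to MU, standard deviation close to SIGMA.
print(float(x.mean()), float(x.std()))
```

In Movie Gen the same objective is applied in the latent space of a temporal autoencoder, with a transformer predicting the velocity, but the loss and ODE-based sampler follow this pattern.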