Philipp Schmid, a technical lead at open-source AI community platform Hugging Face, took to X to announce a new AI video model. Dubbed Pyramid Flow SD3, the 2-billion-parameter Diffusion Transformer (DiT) can generate videos of up to 10 seconds from text or image inputs.
The open-source model is MIT-licensed, as the developers aim to foster wide adoption among creative professionals and developers. Schmid described it in his post as the first “real good open-source text-to-video model.”
Let’s go! First real good open-source text-to-video model with MIT license! Pyramid Flow SD3 is a 2B Diffusion Transformer (DiT) that can generate 10-second videos at 768p with 24fps! 🤯 🎥✨
TL;DR;
🎬 Can Generate 10-second videos at 768p/24FPS
🍹 2B parameter single unified…
— Philipp Schmid (@_philschmid) October 10, 2024
Pyramid Flow SD3 uses a technique called Flow Matching to improve training efficiency, which is said to consume significantly fewer resources than conventional video generation models. Additionally, a simple two-step integration process should make the model more accessible to developers in Hugging Face’s community.
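For readers unfamiliar with Flow Matching, the general idea can be sketched in a few lines of Python. This is only a toy illustration of a generic flow-matching objective with a straight-line path between noise and data. It is not Pyramid Flow SD3's actual training code, and the function names (`interpolate`, `target_velocity`, `flow_matching_loss`) are hypothetical names chosen for this sketch.

```python
import random

def interpolate(x0, x1, t):
    """Point on the straight-line path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity of the straight-line path; it does not depend on t."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(predict_velocity, x1, rng):
    """One-sample loss: mean squared error between predicted and target velocity.

    predict_velocity(xt, t) stands in for the neural network (an assumption
    of this sketch); x1 is a data sample given as a list of floats.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]   # Gaussian noise sample
    t = rng.random()                          # random time in [0, 1)
    xt = interpolate(x0, x1, t)               # noisy point shown to the model
    v_pred = predict_velocity(xt, t)
    v_true = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(x1)

# Usage: a placeholder "model" that always predicts zero velocity.
rng = random.Random(0)
data = [0.5, -1.0, 2.0]
loss = flow_matching_loss(lambda xt, t: [0.0] * len(xt), data, rng)
```

In a real system the placeholder predictor would be a trained network, and the loss would be averaged over many samples and time steps. The sketch only shows why the objective is simple: the model regresses directly onto a known velocity rather than simulating a long noising chain during training.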
The model, trained purely on open-source datasets, is offered in three variants:
- 384p resolution for 5-second videos at 24 FPS
- 768p resolution for 10-second videos at 24 FPS
- Natural image-to-video generation
While the team released the technical report, project page and open-source model checkpoint today, the training code and new model checkpoints will be made available soon. Users and developers are already excited, with the founder of artul.ai commenting:
Interesting development! Pyramid Flow SD3 adapts multi-scale temporal fusion from optical flow models to maintain coherence across frames. Leveraging this technique in a 2B parameter model is groundbreaking for open-source video generation.
— Akram Artul (50% human, 50% ai) (@bate5a55) October 10, 2024