Hangzhou, China – 18th October 2024 – DeepSeek, a Chinese startup focused on artificial general intelligence (AGI), has launched Janus, a novel autoregressive framework designed for multimodal understanding and generation tasks. Janus stands out from earlier models by decoupling visual encoding into separate pathways.
DeepSeek’s first multimodal LLM is available on Hugging Face, as announced by Philipp Schmid, Tech Lead for LLMs at Hugging Face, in a post on X:
> First Multimodal Model from @deepseek_ai is now on @huggingface!
> Janus is a 1.3B unified MLLM, which decouples visual encoding for multimodal understanding and generation.
> It's based on DeepSeek-LLM-1.3b-base and SigLIP-L as the vision encoder. https://t.co/jaOHLdG5Lh
— Philipp Schmid (@_philschmid) October 18, 2024
While visual encoding is separated per task, Janus processes both with a single, unified transformer architecture. Decoupling the visual pathways resolves the conflict a shared encoder faces when serving two goals at once: understanding favors high-level semantic features, while generation requires fine-grained detail.
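The design above can be sketched in a few lines of plain Python. This is a conceptual illustration only, not DeepSeek's actual code: every function name here is a hypothetical placeholder, and the "encoders" and "transformer" are stand-ins that just tag tokens so the routing is visible.

```python
# Conceptual sketch (not DeepSeek's implementation): two independent visual
# encoders feed one shared autoregressive transformer. All names below are
# hypothetical placeholders chosen for illustration.

def understanding_encoder(image):
    """Stand-in for a semantic encoder such as SigLIP-L: maps an image
    to feature tokens suited to comprehension tasks."""
    return [f"sem<{px}>" for px in image]

def generation_encoder(image):
    """Stand-in for a generation-side tokenizer: maps an image to
    discrete codes the model can predict autoregressively."""
    return [f"code<{px}>" for px in image]

def shared_transformer(tokens):
    """Stand-in for the single unified transformer, which processes
    tokens from either pathway identically."""
    return [f"h({t})" for t in tokens]

def janus_forward(image, task):
    # The only branch is at the encoder; the transformer itself is shared.
    encode = understanding_encoder if task == "understand" else generation_encoder
    return shared_transformer(encode(image))

print(janus_forward([0, 1], "understand"))  # -> ['h(sem<0>)', 'h(sem<1>)']
print(janus_forward([0, 1], "generate"))    # -> ['h(code<0>)', 'h(code<1>)']
```

The point of the sketch is the shape of the architecture: the task-specific choice happens only at the encoding step, while everything downstream runs through one shared model.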
The launch of Janus demonstrates that a single MLLM can be applied seamlessly across varied tasks, a significant improvement over its predecessors, and the design delivers this flexibility without sacrificing performance.
According to DeepSeek, Janus not only surpasses previous unified models but also matches or exceeds the performance of task-specific models. It handles multimodal inputs more effectively than older frameworks, making Janus a frontrunner among the next generation of unified multimodal models.
Janus is built on DeepSeek-LLM-1.3b-base, which was trained on approximately 500B text tokens, and uses SigLIP-L as its vision encoder, supporting image inputs at a resolution of 384 × 384. These foundations make Janus a strong contender for driving innovation in AI-powered content creation, multimedia analysis, and more.
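A fixed 384 × 384 input resolution implies that images must be resized before they reach the vision encoder. The dependency-free sketch below shows the geometry of one common approach, resize-the-shorter-side-then-center-crop; this illustrates the arithmetic only, and Janus's actual preprocessing pipeline may differ.

```python
# Sketch of resize-then-center-crop geometry for a fixed-resolution vision
# encoder such as SigLIP-L at 384 x 384. Illustrative assumption: the actual
# Janus preprocessing may use a different strategy (e.g. padding).

TARGET = 384

def preprocess_geometry(width, height, target=TARGET):
    """Return (resized_w, resized_h, crop_box): scale the shorter side to
    `target`, then center-crop to a target x target square."""
    scale = target / min(width, height)
    resized_w = round(width * scale)
    resized_h = round(height * scale)
    left = (resized_w - target) // 2
    top = (resized_h - target) // 2
    crop_box = (left, top, left + target, top + target)
    return resized_w, resized_h, crop_box

# A 768x512 landscape image: shorter side 512 scales by 0.75 to 576x384,
# then the width is cropped from x=96 to x=480.
print(preprocess_geometry(768, 512))  # (576, 384, (96, 0, 480, 384))
```

In practice this geometry would be applied with an image library (e.g. Pillow's `Image.resize` and `Image.crop`) before normalizing pixel values for the encoder.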
DeepSeek’s Janus positions itself as a leading solution in the evolving multimodal LLM landscape by offering decoupled visual pathways while retaining a unified transformer framework. That flexibility, achieved without compromising performance, is likely to make it a popular tool among the Hugging Face community and other AI developers.