Elon Musk's artificial intelligence company, xAI, recently unveiled Grok 4, its latest large language model and a sign of how quickly the company is moving in a crowded field. At the launch, Musk confirmed several upcoming capabilities for the model: substantially better image and video understanding, and a dedicated video generation model. These features are expected in future iterations, possibly under a name such as "Grok 4.20," as some observers have speculated.
Grok 4, which is built on version 6 of xAI's foundation model, has posted strong results on a range of benchmarks. Musk described it as "PhD-level in everything" and claimed it could achieve near-perfect scores on graduate-level exams such as the GRE. He also said it solves a significant share of the problems on Humanity's Last Exam, a deliberately difficult benchmark. The model comes in two versions: a single-agent version (Grok 4) and a multi-agent version (Grok 4 Heavy), which runs several agents in parallel to tackle complex tasks.
A key focus for xAI is multimodality, which Musk acknowledged is currently a "weakness" of the model. To address it, xAI is training version 7 of its foundation model, with training expected to finish within weeks. Musk said this next-generation model should deliver a "step function improvement" in Grok's understanding of images, video, and audio, allowing it to "hear and see the world just like any of you."
Beyond better understanding, xAI has confirmed plans for a video generation model. The project aims for "pixel in, pixel out" capabilities and envisions an "infinite scroll of content" on the X platform, where users could not only watch generated videos but also interact with and intervene in them. Training for the video model is slated to begin within the next three to four weeks on more than 100,000 GB200 GPUs.
The rapid development cycle, from Grok 1 in November 2023 to Grok 4 in July 2025, underscores xAI's aggressive push into AI and its direct challenge to established players such as OpenAI and Google. As the company continues to integrate Grok into real-world applications and explores new areas such as AI-generated video games, the confirmed multimodal and video generation features would extend Grok from mainly reading and writing text to understanding and producing images, audio, and video.