OpenBMB and TsinghuaNLP have launched MiniCPM-V 4.5, an 8-billion-parameter multimodal large language model (MLLM) positioned as a top performer for on-device applications. Officially released on August 26, 2025, the model aims to bring efficient yet powerful AI processing directly to consumer devices. A dedicated MiniCPM-V app was also highlighted by user "AK" on social media, who shared a direct link for users to access the new application.
MiniCPM-V 4.5 reportedly surpasses leading proprietary and open-source models, including GPT-4o-latest, Gemini-2.0 Pro, and Qwen2.5-VL 72B, in vision-language capabilities. This compact 8B parameter model achieved an average score of 77.0 on OpenCompass, a comprehensive evaluation across eight popular benchmarks, positioning it as the most performant MLLM under 30 billion parameters in the open-source community. Its strengths are particularly evident in single-image, multi-image, and video understanding tasks.
A significant technical advance in MiniCPM-V 4.5 is its Unified 3D-Resampler, which achieves a 96x compression rate for video tokens: the model encodes up to six 448x448 video frames into just 64 tokens, enabling efficient high-frame-rate (up to 10 FPS) and long-video understanding without increasing LLM inference cost. The model also offers a controllable hybrid fast/deep thinking mode, balancing everyday efficiency against complex problem-solving for diverse user scenarios.
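The 96x figure is consistent with simple token arithmetic. The sketch below assumes a ViT-style encoder with 14-pixel patches; the patch size is our assumption for illustration, not a detail from the release.

```python
# Back-of-envelope arithmetic for the Unified 3D-Resampler's reported
# 96x video token compression. PATCH_SIZE is an assumed ViT patch size.

FRAME_SIZE = 448        # each video frame is 448x448 pixels
PATCH_SIZE = 14         # assumed ViT patch size (illustrative)
FRAMES_PER_GROUP = 6    # frames jointly compressed by the resampler
OUTPUT_TOKENS = 64      # tokens the resampler emits per frame group

patches_per_frame = (FRAME_SIZE // PATCH_SIZE) ** 2    # 32 * 32 = 1024
raw_tokens = FRAMES_PER_GROUP * patches_per_frame      # 6144 patch tokens
compression = raw_tokens / OUTPUT_TOKENS               # 6144 / 64 = 96.0

print(f"{raw_tokens} raw patch tokens -> {OUTPUT_TOKENS} tokens "
      f"({compression:.0f}x compression)")
# -> 6144 raw patch tokens -> 64 tokens (96x compression)
```

Under these assumptions, six frames that would otherwise cost 6,144 visual tokens occupy only 64, which is where the 96x ratio comes from.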
The model also boasts strong handwritten OCR and complex document parsing, built on the LLaVA-UHD architecture, which lets it process high-resolution images of up to 1.8 million pixels with significantly fewer visual tokens. Furthermore, MiniCPM-V 4.5 incorporates RLAIF-V (Reinforcement Learning from AI Feedback) to promote trustworthy behavior, reporting a 10.3% hallucination rate on Object HalBench, reportedly lower than that of GPT-4o-latest. The model also supports over 30 languages, enhancing its global utility.
Designed for maximal accessibility, MiniCPM-V 4.5 is optimized for end-side deployment, including a dedicated iOS app for iPhone and iPad users. It offers broad compatibility with local inference frameworks such as llama.cpp, Ollama, and vLLM, and ships in 16 quantized variants spanning int4, GGUF, and AWQ formats, underscoring a commitment to delivering advanced multimodal AI on a wide range of consumer hardware while promoting both privacy and low-latency interaction.
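As a rough sketch of what local deployment could look like, the commands below assume a published Ollama model tag and locally downloaded GGUF weights; the exact tag and filenames are hypothetical and may differ from the official release artifacts.

```shell
# Ollama (hypothetical "minicpm-v4.5" tag; Ollama picks up image paths
# mentioned in the prompt for multimodal models):
ollama run minicpm-v4.5 "Describe this image: ./photo.jpg"

# llama.cpp multimodal CLI (assumes the main GGUF weights and the
# vision projector file have already been downloaded):
./llama-mtmd-cli -m MiniCPM-V-4_5-Q4_K_M.gguf \
    --mmproj mmproj-MiniCPM-V-4_5.gguf \
    --image photo.jpg -p "What is in this image?"
```

Both paths run fully on-device, which is what enables the privacy and low-latency benefits described above.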