RWKV Community Member Calls for Significant Pretraining Sponsorship


A prominent member of the RWKV language model community, Yam Peleg, has publicly called for substantial sponsorship to fund a large-scale pretraining run for the innovative AI architecture. Peleg's plea, made on social media, highlights the ongoing need for significant computational resources to advance the development of RWKV models, which offer a distinct alternative to the dominant Transformer architecture.

"Someone please sponsor a huge-ass RWKV pretraining run already," Peleg stated in the tweet, underscoring the critical demand for resources.

RWKV (Receptance Weighted Key Value) is an open-source large language model architecture that combines the parallel training capabilities of Transformers with the constant memory and constant per-token inference cost of Recurrent Neural Networks (RNNs). This hybrid approach aims to address the quadratic scaling issues of Transformers, particularly memory usage and inference speed at long context lengths. The project, initially proposed by Bo Peng (BlinkDL), joined the Linux Foundation in September 2023 and has seen rapid development, with its latest iteration being RWKV-7 "Goose."
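For intuition about that trade-off, here is a deliberately simplified sketch contrasting the two inference regimes: standard attention, whose cache (and per-token cost) grows with context length, versus an RNN-style linear recurrence whose state stays a fixed size. The dimensions, decay value, and lack of normalization or gating are illustrative assumptions; this is not RWKV's actual time-mixing formulation.

```python
import numpy as np

d = 8  # hypothetical channel dimension, for illustration only

def attention_step(q, k, v, kv_cache):
    """One decoding step of standard attention: the KV cache grows by one
    entry per token, so per-token work and memory scale with context length."""
    kv_cache.append((k, v))
    ks = np.stack([kk for kk, _ in kv_cache])   # (t, d)
    vs = np.stack([vv for _, vv in kv_cache])   # (t, d)
    scores = ks @ q / np.sqrt(d)                # (t,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ vs, kv_cache               # output is (d,)

def recurrent_step(q, k, v, decay, state):
    """One decoding step of an RNN-style linear recurrence: the state is a
    fixed (d, d) matrix no matter how many tokens came before, so per-token
    work and memory are constant in sequence length."""
    state = decay * state + np.outer(k, v)      # fixed-size state update
    return q @ state, state                     # readout from the state

rng = np.random.default_rng(0)
kv_cache, state = [], np.zeros((d, d))
for _ in range(1000):
    q, k, v = rng.normal(size=(3, d))
    _, kv_cache = attention_step(q, k, v, kv_cache)
    _, state = recurrent_step(q, k, v, decay=0.99, state=state)

print(f"attention cache entries after 1000 tokens: {len(kv_cache)}")
print(f"recurrent state shape stays fixed at: {state.shape}")
```

The fixed-size state in the second path is what lets RNN-style architectures like RWKV serve long contexts without the memory growth imposed by attention's KV cache.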

Pretraining large language models like RWKV requires immense computational power, often involving thousands of high-performance GPUs and compute bills that run into the millions of dollars. Industry estimates suggest that pretraining a 70-billion-parameter model can cost between $1 million and $5 million, depending on factors like data size, training duration, and hardware efficiency. A "huge-ass" run, as requested by Peleg, would likely land at or beyond the higher end of that range, potentially pushing toward tens of millions of dollars for even larger models or extended training.
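As a rough sanity check on those figures, the sketch below applies the commonly cited 6 * N * D FLOPs approximation for training a dense model with N parameters on D tokens. The token budget, sustained per-GPU throughput, and GPU-hour price are all assumptions chosen for illustration, not figures from the article or the RWKV project.

```python
# Back-of-envelope pretraining cost estimate using the 6 * N * D FLOPs rule.
# Every input below is an assumption for illustration purposes.
params = 70e9            # 70B-parameter model
tokens = 1.4e12          # assumed ~20 tokens per parameter (Chinchilla-style budget)
flops = 6 * params * tokens                      # ~5.9e23 training FLOPs

sustained_flops_per_gpu = 4e14                   # assumed ~400 TFLOP/s sustained per GPU
gpu_hours = flops / sustained_flops_per_gpu / 3600

price_per_gpu_hour = 2.50                        # assumed cloud rate in USD
cost = gpu_hours * price_per_gpu_hour

print(f"total training compute: {flops:.2e} FLOPs")
print(f"GPU-hours at assumed throughput: {gpu_hours:,.0f}")
print(f"estimated compute cost: ${cost:,.0f}")
```

Under these particular assumptions the estimate comes out near the low end of the $1 million to $5 million range cited above; a larger token budget, lower hardware utilization, or a bigger model pushes the figure up quickly.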

YuanShi Intelligence, the commercial entity behind the open-source RWKV project, recently secured an angel funding round worth tens of millions of RMB from Tianji Capital in December. The investment is earmarked for team expansion, architecture iteration, and commercialization efforts, indicating growing confidence in RWKV's potential. However, Peleg's tweet suggests that the demand for raw compute for foundational pretraining still outstrips current resources, pointing to the high capital intensity of advanced AI research and development.

The RWKV community has successfully released models up to 14 billion parameters, with a 32 billion parameter preview model available and plans for 70 billion parameter models in 2025. These advancements demonstrate RWKV's ability to scale, but each step up in model size significantly increases the financial and computational burden of pretraining. Securing a major sponsorship for a large pretraining run could accelerate the development of even more capable RWKV models, potentially enabling them to compete more directly with state-of-the-art Transformer-based models while maintaining their efficiency advantages.