FinDPO Model Achieves 67% Annual Return and 2.0 Sharpe Ratio in Algorithmic Trading Backtests

A new research paper introduces FinDPO, a financial sentiment analysis model based on Direct Preference Optimization (DPO) that demonstrates significant profitability in trading backtests. The model, detailed in the paper "FinDPO: Financial Sentiment Analysis for Algorithmic Trading through Preference Optimization of LLMs" by Giorgos Iacovides and co-authors, addresses limitations of previous supervised fine-tuned (SFT) language models in financial applications. As highlighted in a recent summary by Rohan Paul, existing sentiment models often overfit and emit overly coarse labels, making it difficult for traders to rank stocks by sentiment strength.

Traditional sentiment models that rely on supervised fine-tuning frequently memorize their training data and fail to generalize to novel market events, a critical flaw in the dynamic financial domain. FinDPO instead leverages human preference signals to refine its judgments. The authors started from Llama-3-8B, froze a reference copy of the model, and applied DPO to 32,970 labeled financial headlines sourced from three public finance datasets. Each training step pushed the policy model toward the human-selected correct label and away from incorrect ones, while LoRA adapters kept the process computationally efficient: training required a single A100 GPU for 4.5 hours.
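
To make that recipe concrete, here is a minimal sketch of DPO fine-tuning with LoRA adapters using the Hugging Face TRL and PEFT libraries. The dataset file, prompt format, and hyperparameters are illustrative assumptions rather than the authors' exact configuration, and argument names vary slightly across TRL versions.

```python
# Minimal DPO + LoRA sketch (assumptions: dataset path, prompt format,
# and hyperparameters are illustrative, not the paper's exact setup).
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Meta-Llama-3-8B"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# DPO trains on (prompt, chosen, rejected) triples. For sentiment, the
# human-assigned label serves as "chosen" and a wrong label as "rejected",
# e.g. {"prompt": "Headline: Profits surge 40%. Sentiment:",
#       "chosen": " positive", "rejected": " negative"}.
# "finance_headlines_dpo.jsonl" is a hypothetical preprocessed file.
train_dataset = load_dataset(
    "json", data_files="finance_headlines_dpo.jsonl", split="train"
)

peft_config = LoraConfig(  # LoRA adapters keep fine-tuning lightweight
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"
)

training_args = DPOConfig(
    output_dir="findpo-sketch",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    learning_rate=5e-6,
    beta=0.1,  # weight of the KL penalty tying the policy to the reference
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with a PEFT config, TRL treats the frozen base
                     # weights as the reference model
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```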

The model's performance was rigorously evaluated through backtests on 417 S&P 500 firms spanning 2015 to 2021. The results show a 747% cumulative growth in the model's long-short portfolio before trading costs. Even after accounting for 5 basis points of trading costs per transaction, the portfolio still yielded an impressive 67% annualized return and a Sharpe ratio of 2.0. A Sharpe ratio of 2.0 is widely considered very good in algorithmic trading, meaning the portfolio earns roughly twice as much excess return as its volatility takes on; earlier sentiment-driven systems evaluated under the same cost assumptions often resulted in losses.
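
To see how cost-adjusted returns and a Sharpe ratio of this kind are typically computed, the following is a minimal sketch of a daily-rebalanced long-short backtest with a flat turnover cost in basis points. The decile portfolio construction and rebalancing frequency are simplifying assumptions, not the paper's exact methodology.

```python
import numpy as np
import pandas as pd

def long_short_backtest(scores: pd.DataFrame,
                        returns: pd.DataFrame,
                        cost_bps: float = 5.0,
                        frac: float = 0.1) -> pd.Series:
    """Daily-rebalanced long-short portfolio from sentiment scores.

    scores / returns: date-indexed DataFrames sharing the same index,
    one column per ticker. Long the top decile of scores, short the
    bottom decile, equal-weighted, charging cost_bps on turnover.
    """
    pnl = []
    prev_w = pd.Series(0.0, index=scores.columns)
    for i, date in enumerate(scores.index[:-1]):
        ranked = scores.loc[date].dropna().sort_values()
        n = max(1, int(len(ranked) * frac))
        w = pd.Series(0.0, index=scores.columns)
        w[ranked.index[-n:]] = 1.0 / n    # long leg: most positive
        w[ranked.index[:n]] = -1.0 / n    # short leg: most negative
        turnover = (w - prev_w).abs().sum()
        gross = (w * returns.iloc[i + 1]).fillna(0.0).sum()
        pnl.append(gross - turnover * cost_bps / 1e4)  # net of costs
        prev_w = w
    return pd.Series(pnl, index=scores.index[:-1])

def annualized_sharpe(daily: pd.Series) -> float:
    # Mean over std of daily P&L, scaled by sqrt(252 trading days).
    return float(np.sqrt(252) * daily.mean() / daily.std())
```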

Beyond its trading profitability, FinDPO also demonstrated superior accuracy on standard sentiment analysis benchmarks. It surpassed established systems on the Financial PhraseBank, Twitter, and Newswire test sets, achieving an average weighted F1 score of 0.846, an 11% improvement over the previous leader. The authors attribute this performance to DPO's mechanism, which actively penalizes incorrect responses rather than solely rewarding correct ones, enabling the model to generalize to unforeseen market shocks and nuanced financial language. The low training cost further suggests that this kind of financial sentiment model could be replicated by smaller research teams.
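
For reference, the weighted F1 score cited above is the per-class F1 averaged with weights equal to each class's frequency, which matters because financial sentiment datasets tend to be dominated by neutral headlines. A minimal sketch with scikit-learn, using invented labels purely for illustration:

```python
from sklearn.metrics import f1_score

# Hypothetical gold labels and model predictions over financial headlines:
# 0 = negative, 1 = neutral, 2 = positive.
y_true = [1, 1, 2, 0, 1, 2, 1, 0, 1, 2]
y_pred = [1, 1, 2, 1, 1, 2, 1, 0, 0, 2]

# "weighted" averages per-class F1 by each class's support, so the
# dominant neutral class influences the score proportionally.
print(f1_score(y_true, y_pred, average="weighted"))
```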