Renowned programmer and AI pioneer John Carmack has highlighted a critical insight into hyperparameter tuning in machine learning, suggesting that many parameters are more effectively expressed and explored using logarithmic scales. In a recent social media post, Carmack observed that "Many hyperparameters are better expressed in negative integral log2."
Carmack, known for his foundational work in gaming and virtual reality, and now leading Artificial General Intelligence (AGI) efforts at Keen Technologies, pointed out that while small values such as learning rates can be represented directly as negative powers of two, parameters close to 1, such as Exponential Moving Average (EMA) factors and Temporal Difference (TD) lambda/gamma, are often better managed with expressions like 1 - 2**val. This approach allows a more fine-grained exploration of values extremely close to 1, where small linear changes would have disproportionately large effects.
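As a rough illustration of that representation, a small Python sketch might look like the following; the helper names and the example exponent of -10 are illustrative, not taken from Carmack's post.

    # A minimal sketch of the representation described above; the helper names
    # and the example exponent of -10 are illustrative assumptions.

    def from_neg_log2(val: int) -> float:
        # Small hyperparameters such as learning rates: a plain power of two.
        return 2.0 ** val

    def near_one_from_neg_log2(val: int) -> float:
        # Parameters close to 1 (EMA factors, TD lambda/gamma): 1 - 2**val.
        return 1.0 - 2.0 ** val

    learning_rate = from_neg_log2(-10)        # 0.0009765625
    ema_decay = near_one_from_neg_log2(-10)   # 0.9990234375

The same integer exponent thus controls how close a value sits to 0 in the first case and how close it sits to 1 in the second, keeping both on a log scale.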
The use of logarithmic scales in hyperparameter tuning is a widely recognized best practice in machine learning. Experts note that many hyperparameters, particularly those influencing learning rates or regularization strengths, operate multiplicatively. Sampling these parameters uniformly on a linear scale often leads to inefficient exploration, as a significant portion of the search space might be concentrated in ranges where the model's performance is relatively insensitive to changes. Logarithmic sampling, conversely, ensures that changes across different orders of magnitude are explored with equal emphasis, allowing for efficient discovery of optimal values.
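The contrast can be sketched in a few lines of Python; the learning-rate search range of 1e-5 to 1e-1 below is an assumption chosen purely for illustration.

    # A sketch of linear versus logarithmic sampling of a learning rate.
    # The search range of 1e-5 to 1e-1 is an assumption for illustration only.
    import numpy as np

    rng = np.random.default_rng(0)
    low, high = 1e-5, 1e-1

    # Uniform on a linear scale: roughly 90% of draws land above 1e-2,
    # so the smaller orders of magnitude are barely explored.
    linear_samples = rng.uniform(low, high, size=1000)

    # Uniform in log space: each order of magnitude between 1e-5 and 1e-1
    # receives roughly the same number of draws.
    log_samples = 10.0 ** rng.uniform(np.log10(low), np.log10(high), size=1000)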
Carmack further emphasized that many machine learning parameters are "relatively insensitive to doubling or halving, and need bigger changes to reliably move the results." This observation reinforces the utility of non-linear search spaces, such as those provided by logarithmic transformations, for efficiently identifying impactful adjustments. His insights underscore the importance of thoughtful hyperparameter representation for optimizing model performance and training efficiency, a crucial aspect in the complex landscape of modern AI development.
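One concrete reading of that remark is a coarse sweep that steps the log2 exponent by more than one, so that successive candidates differ by a factor of four rather than two; the specific ranges below are assumptions for illustration.

    # A coarse sweep stepping the exponent by 2, so successive candidates
    # differ by a factor of four; the specific ranges are assumptions.
    learning_rates = [2.0 ** v for v in range(-14, -5, 2)]    # ~6.1e-5 up to ~1.6e-2
    ema_decays = [1.0 - 2.0 ** v for v in range(-12, -3, 2)]  # ~0.99976 down to 0.9375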