LMArena, a popular online AI leaderboard, is a broken system for evaluating AI models. The platform rewards superficial qualities such as verbosity, aggressive formatting, and emojis over factual accuracy, because users vote quickly on aesthetics rather than evaluating responses carefully. These perverse incentives push models to optimize for "hallucination-plus-formatting": responses that look polished and authoritative whether or not they are correct.
