An approachable guide to reasoning models: their roots, construction, strengths, limitations, and broader impact on the AI ecosystem
## What exactly are "reasoning" and "thinking" models?

Last September, OpenAI launched its first "reasoning" model, o1. Unlike previous models, o1 "thinks" before delivering a final answer, chewing on a problem with step-by-step notes. OpenAI explained:

> Similar to how a human may think for a long time before responding to a difficult question, o1 uses a chain of thought when attempting to solve a problem.

This tactic delivered incredible results, at least for programming and math applications. OpenAI's o1 model scored 6 times higher than the non-reasoning GPT-4o when answering competitive math questions and 8 times higher when solving coding challenges.

Suddenly, every AI lab was building and releasing reasoning models. Four months later, DeepSeek's arrival made these models a household name. Today, Claude has "thinking modes" and ChatGPT has a "reason" button.

But how exactly did these models learn to "reason"? What do we mean when we say "think"? Why do reasoning models excel at coding and math, but struggle to show improvements when venturing beyond these domains?

Today, let's look at how we arrived at reasoning models, review how they're built, and examine their impact on the ecosystem. Like everything else in AI, reasoning models aren't magic. They just haven't been clearly explained.

## Longer Prompts Are Better Prompts

Following the arrival of ChatGPT, many began experimenting with best practices for prompting. Often these were little hacks that resulted in better responses. A few were somewhat absurd, like promising to tip the LLM, or promising not to punish it, if it produced a good answer (I'm not joking).

One of the most important discoveries was that short prompts are bad prompts. When we provide LLMs with plenty of details, context, and examples, we get better answers. There are a few ways to do this:

- **Provide more detailed instructions:** Be explicit, detailed, and exhaustive about the task at hand. State the core task at the beginning of the prompt and reiterate…