Sam Altman, CEO of OpenAI, made a confession recently. Despite being the CEO, despite having the best access to the most capable AI tools on the planet, despite his own internal data showing that AI now beats human experts on three quarters of well-scoped knowledge tasks, he still hasn't really changed how he works. Altman admitted at a recent town hall that he still runs his workflow much the same way, even though, quote, "I know that I could be using AI much more than I am." That's Sam Altman.

This is the strange paradox at the center of AI right now. Something fundamental shifted in December 2025. The people closest to the technology are calling it a phase transition, a threshold crossing, a break in the timeline. Andrej Karpathy, who helped build OpenAI and has been writing code professionally for decades, says his workflow inverted in a matter of a couple of weeks: from 80% manual coding to 80% AI agents. Ethan Mollick, the Wharton professor who tracks AI adoption, put it bluntly: projects from six weeks ago may already be obsolete. And yet most people, including the CEO of OpenAI, haven't caught up. The capability is there. The adoption is not. It's just moving too fast. Understanding this gap, and what to do about it, is the real story of January 2026.

So what actually happened in December? The shift was not just one thing, and that by itself is part of the story, because previously I could point to a single model release and say: this was the change. Not anymore. This was a convergence of model releases, orchestration patterns, and proof points that together crossed their respective thresholds in the same compressed window. This is exactly what AI accelerationists have been telling us is coming: change happens slowly, and then all at once. This is one of those all-at-once moments.

Start with the models. In the space of just six days late last year, three frontier releases landed: Google's Gemini 3 Pro, OpenAI's GPT-5.1 Codex Max (with 5.2 arriving soon after), and Anthropic's Claude Opus 4.5. All of these models are explicitly optimized for something previous models could not do well: sustained autonomous work over hours or days rather than minutes. The GPT-5.1 and now 5.2 class models are designed for continuous operation, more than a day of autonomous work. Claude Opus 4.5 introduced an effort parameter that lets developers dial reasoning up or down, and Anthropic priced it two-thirds cheaper than the previous version. And we now have techniques like context compaction, from both OpenAI and Anthropic, that let the model summarize its own work as a session extends so it can maintain coherence over longer time frames. (I'll sketch the compaction idea in code in a moment.) Are you getting the theme?

Look, the Cursor team has tested these models. Other teams have tested these models. We're seeing reports of models doing a week of work autonomously and writing up to three million lines of code before coming back for more. This is not the same category of work we were seeing even in September and October of 2025. It's a new category. Things have changed all at once.

And you know what? Better models, as much as I like them, were necessary but not sufficient. The real unlock came from orchestration patterns that went viral in late December.
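Here's that context-compaction sketch. The rough idea: when the transcript gets too long, have the model summarize its own progress and continue from the summary instead of the full history. This is a toy illustration in Python against the Anthropic SDK, not what either lab actually ships; the model id and the character budget are my placeholders.

```python
# Toy sketch of context compaction: when the transcript gets long, have the
# model summarize its own progress, then continue from the summary instead of
# the full history. Model id and character budget are placeholders.
import anthropic

client = anthropic.Anthropic()   # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-opus-4-5"        # placeholder model id
CHAR_BUDGET = 200_000            # crude stand-in for a real token budget

def compact(history: list[dict]) -> str:
    """Ask the model to summarize the session so far."""
    reply = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        messages=history + [{
            "role": "user",
            "content": "Summarize the work so far: decisions made, files "
                       "touched, open questions, and what remains. Be terse.",
        }],
    )
    return reply.content[0].text

def send(history: list[dict], user_turn: str) -> list[dict]:
    # Compaction kicks in as the session extends: wipe the transcript and
    # carry forward only the summary, so coherence survives long sessions.
    if sum(len(m["content"]) for m in history) > CHAR_BUDGET:
        summary = compact(history)
        history = []
        user_turn = f"Context from earlier work:\n{summary}\n\nNow: {user_turn}"
    history.append({"role": "user", "content": user_turn})
    reply = client.messages.create(model=MODEL, max_tokens=4096, messages=history)
    history.append({"role": "assistant", "content": reply.content[0].text})
    return history
```

The point of the sketch is just the shape of the trick: the session's memory periodically collapses into a small, dense summary, which is why these models can stay coherent over a day of work instead of an hour.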
The first was Ralph, named after the Simpsons character known for cheerful obliviousness. Geoffrey Huntley, an open-source developer out in rural Australia, grew frustrated with agentic coding's central limitation: models keep stopping to ask permission, or they report progress and are wrong or overoptimistic. Every pause requires human attention, and often you're frustrated because you're telling the model the same thing again. So all Huntley did was write a bash script that runs Claude Code in a loop, using git commits and files as memory between iterations. When the context window fills up, a fresh agent picks up where the last one left off: you just wipe the previous context and keep going at the same task. The technique is embarrassingly simple for an engineer. While the AI industry was building elaborate multi-agent frameworks, Huntley discovered that you can just be really persistent. Repeat the goal, wipe the context window, and you will get somewhere. A loop that keeps running until tests pass is more reliable than carefully choreographed agent handoffs. VentureBeat called it the biggest name in AI right now, and they weren't wrong. The pattern spread because it let you do far more autonomous work over far longer stretches. (I'll show a minimal sketch of the loop at the end of this section.)

The second viral piece was Gas Town, released by Steve Yegge on January 1st. Where Ralph is minimalist, Gas Town is unabashedly maximalist: a completely insane workspace manager that spawns and coordinates dozens of AI agents working in parallel. Honestly, Gas Town reflects Steve Yegge's brain more than it reflects a coherent enterprise agentic pattern. But it's still relevant, because both patterns share the same core insight: the bottleneck has shifted. You are now the manager of however many agents you can productively keep track of. Your capacity is limited only by your attention span and your ability to scope tasks well.

And then things kept changing, because in late January Anthropic shipped Claude Code's new task system, and suddenly even Ralph looked like a clever workaround to a problem that now has native infrastructure. You don't have to run Ralph anymore. CJ Hess, a developer who stress-tests new AI tooling, was in the middle of a large refactor when Claude Code's task system landed. He pushed it to its limits: he created a massive task list and had it orchestrate sub-agents to execute the entire thing. He reports that it completely nailed it. And that's weird. We're used to agents fumbling, to work not getting done. In this case, a task system that just looks like a to-do list was what it took to coordinate agents across a complex multi-agent problem.

Now, to be fair, the task list Anthropic released is more than a simple tick box. Under the surface, each task can spawn its own sub-agent, and each sub-agent gets a fresh 200,000-token context window that's completely isolated from the main conversation, so each one has a clean, focused job. Say agent one is digging through authentication code, agent two is refactoring database queries, and agent three is working through tests. None of them pollutes the others' context or gets confused by what the others are doing, because none of them knows the others exist. That's the same insight Yegge had in Gas Town.
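Here's that minimal Ralph-style loop I promised. Huntley's original is a bash script; this is just my reconstruction of the pattern in Python, assuming the `claude` CLI's non-interactive `-p` mode and a pytest suite as the success signal. The prompt file name and iteration cap are my placeholders. Treat it as an illustration of the loop, not his actual code.

```python
# A minimal reconstruction of the Ralph pattern: same goal, fresh context,
# loop until the tests pass. Assumes the `claude` CLI and a pytest suite.
import subprocess

PROMPT = open("PROMPT.md").read()  # the goal, restated verbatim every iteration
MAX_ITERATIONS = 50

for i in range(MAX_ITERATIONS):
    # Each invocation is a brand-new context window. Memory lives in the
    # repo itself: the files on disk and the git history, not the transcript.
    subprocess.run(["claude", "-p", PROMPT], check=False)
    subprocess.run(["git", "add", "-A"], check=False)
    subprocess.run(["git", "commit", "-m", f"ralph iteration {i}"], check=False)

    # The only judge is the test suite. If it's green, stop; if not,
    # wipe the context by simply starting over, and persist.
    if subprocess.run(["pytest", "-q"]).returncode == 0:
        print(f"tests green after {i + 1} iterations")
        break
else:
    print("hit the iteration cap with tests still failing")
```

That's the whole trick: persistence plus externalized memory. The loop never gets tired, and every iteration starts clean.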
The old approach was Claude trying to hold everything in one long threaded conversation, remembering decisions from earlier while implementing new things, and it just got complicated and Claude lost the plot. That still works for small stuff, but for complex work, context management becomes the bottleneck and things fall through the cracks. The task system changes that architecture. Each agent focuses on just one thing. When a task completes, anything blocked by it automatically unblocks, and the next wave of agents kicks off. You can have between seven and ten sub-agents running simultaneously, and the system picks the right model for each job: Haiku for quick searches, Sonnet for implementation, Opus for reasoning. All you do is define your dependencies, and the system handles the orchestration.

Look, the key innovation here is the realization that dependencies are structural, not cognitive. Without that, Claude has to hold the entire plan in working memory, and the plan degrades the moment the context window fills up. You end up re-explaining over and over: this is done, this is left, this depends on that. But when you externalize the dependencies, the graph doesn't forget and doesn't drift. You never need to re-explain, because the plan was never stored in the model's memory to begin with. It's just a task sheet. (I'll sketch the idea in code shortly.) Going back to Ralph: Ralph was a bash-loop workaround to this same problem. The task system is Anthropic's answer, native platform infrastructure for the same capability, and it illustrates how fast things are moving. Patterns go viral, and a couple of weeks later they're obsolete because they've been absorbed into the platform.

Cursor is carrying the flag for very large, long-running autonomous projects. I've talked about their project to build a browser and how it took three million lines of code; they've written about it extensively. And they're not done with the browser. Cursor is running similar experiments using AI agents to build a Windows emulator, an Excel clone, and a Java language server. These are big codebases, ranging from half a million to one and a half million lines, all generated autonomously. The point is not that Cursor is about to start shipping Excel and competing with Windows. The point is that they are proving autonomous AI agents can build complex software.

At Davos in late January, Dario Amodei described what he called the most important dynamic in AI today: the self-acceleration loop. It's important that we understand it. He said, "I have engineers at Anthropic who tell me, I don't write code anymore. I let the model write the code." Now, we've heard that on Twitter a lot, and the mechanism is simple. But the fact that Anthropic is doing it matters, because fundamentally they are accelerating the production of the next AI systems using AI. AI has entered a self-acceleration loop.
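Here's that task-sheet sketch I promised. It's a toy, not Anthropic's implementation: the only point is that once the dependencies live in a structure outside the model, waves of work unblock each other without any agent holding the plan in its head. Task names and the worker function are placeholders.

```python
# Toy sketch of "dependencies are structural, not cognitive": the plan lives
# in a graph, not in any model's context window.
from concurrent.futures import ThreadPoolExecutor

tasks = {
    "audit_auth":  [],                            # no prerequisites
    "refactor_db": [],
    "write_tests": ["audit_auth", "refactor_db"],
    "integration": ["write_tests"],
}

def run_agent(name: str) -> None:
    # In the real system this would spawn a sub-agent with a fresh,
    # isolated context window scoped to this one task.
    print(f"agent working on {name}")

done: set[str] = set()
with ThreadPoolExecutor(max_workers=8) as pool:
    while len(done) < len(tasks):
        # A task is ready when everything it depends on has completed.
        ready = [t for t, deps in tasks.items()
                 if t not in done and all(d in done for d in deps)]
        # Each wave runs in parallel; finishing a wave unblocks the next.
        list(pool.map(run_agent, ready))
        done.update(ready)
```

Nothing here forgets and nothing drifts, because there's nothing to forget: the graph is the plan.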
and he said, "We're asking them to do something that would normally take weeks using AI tools in 10 or 20 minutes." That's a reasonable request. I've shared earlier how you can use Claude in Excel to do weeks worth of work in 10 to 15 minutes. This is the reality of work in 2026. And what Sam is choosing to do is responsible because as he said, he doesn't want to have awkward conversations and overhire. He would rather hire the right people, keep them around and expand their span with AI tooling. The numbers behind this decision come from OpenAI's own benchmark GDP val. It measures how often AI output is preferred over human expert output on a well scope knowledge work. And we see the tipping point hitting around this same time that the last few weeks of the year of 2025 because GPT thinking tied or beat humans only 38% of the time. That was the model in the fall. GPT 5.2 Pro which was released much more recently at the very end of the year early this year reached 74%. It doubled. So on three quarters of scope knowledge tasks the AI is now preferred. And that is you can read that as a general pattern for cutting edge models. Now it's not just chat GPT. And as Sam put it, right, if you can assign your co-workers something that takes an hour and you get something that's better than what a human would do 74% of the time and it's taking vastly less time, it's pretty extraordinary feeling. And this brings us back to the paradox. If models are beating human experts like this on scope tasks and doing it faster, why hasn't work transformed more? Why is the CEO of OpenAI, Sam himself, still running his workflow, as he says, in much the same way? This is a capability overhang because capability has jumped way ahead and humans don't change that fast. Adoption hasn't. Most knowledge workers are still using AI at I would say a chat GPT 3.5 chat GPT4 level. Ask a question, get an answer, move on. Summarize this document for me. Please draft this email. They're not running AI agent loops overnight. They're not assigning hour-long tasks to their AI co-workers. They're not managing fleets of parallel workers across their backlog. The overhang explains why the discourse feels so disconnected. Why it feels like you have constant jet lag if you are living at the edge of the capability and you're going back to look at how work looks today. Someone running task loops in Anthropic or Ralph is living in a different technical reality than someone who queries chat GPT four or five times a day even though they have daily access to the exact same underlying tools. One person is seeing the acceleration, everything happening all at once, the other is seeing incremental improvement and wondering why AI is such a big deal. This creates a very temporary arbitrage. If you figure out how to use these models before your competitors do, if you can get your teams to do that, you have a massive edge. And if you're waiting for AI to get smart enough before changing the workflow, you are already behind and you're showing that you're not using AI well. So what does closing this overhang that's developed especially in the last few weeks look like? What are specific skills that power users describe? Well, a few patterns emerge. Number one, power users that are really on the edge are assigning tasks. They are not asking questions. When you treat AI as an oracle, you are in the wrong mental model. The shift is very much toward what I would call declarative spec. Describe the end state you want. 
provide the success criteria, and let the system figure out how to get there. This is a sort of post-prompting world. It's still prompting, but it looks a lot more like a specification.

Number two, accept imperfection and iterate. Ralph works because it embraces failure. The AI will produce broken code, so you just make it retry until it fixes it. It never gets tired, it keeps retrying, and you go make coffee or lunch and come back and it's done. This requires abandoning the expectation that AI should get things right the first time. It often won't, and it doesn't matter, because it doesn't get tired.

Third, invest in specification and in review; invest less in implementation. The work is shifting: less time writing code, much more time defining what you want and evaluating whether you got there. This is a big skill change. Most engineers have spent years developing intuitions around implementation, and those are now not super useful. The new skill is describing the system precisely enough that AI can build it, writing tests that capture the real success criteria, and reviewing AI-generated code for subtle conceptual errors rather than simple syntax mistakes. The errors get very interesting here. Maggie Appleton, a designer who has been analyzing these tools for a while, puts it really well: when agents write the code, design becomes the bottleneck. The questions that slow you down are less and less about the details of code syntax. They're more and more about architecture, about user experience, about composability. What should this feel like? Do we have the right abstraction here? These are decisions agents cannot make for you, and they require your context, your taste, and your vision.

I will say the speed is dangerous in and of itself. Watch out for the foot gun. You can move really, really fast with AI agents, and you can forget how much trash you're putting out there. To be honest, if you're not thinking through what you want done, the speed can very quickly leave you with a giant pile of code that isn't very useful. That's a superpower everyone has been handed, for better or worse, and we're about to see who can actually think well.

Yes, it is time to use multiple agents in parallel. That's another lesson, and it's transformative, because every additional agent stacks your capability. Some developers are going from a few PRs per day to dozens. The constraint moves from coding to coordination: how you scope your tasks, how you review outputs, and so on. Even if it's tricky, and you have to figure out what review looks like in this new world, this is where we're all going, because agents pointed in the right direction have a multiplicative effect, stacking on top of each other and solving multiple tasks at once. (There's a sketch of this fan-out pattern just below.) And this includes letting agents run all the time. Ralph was designed for overnight sessions. Define the work, start the loop, and go to bed: that's the new engineer's day. Of course, this only works with proper guardrails, but when it works, you're getting productive hours around the clock from time that was previously idle.

And look, the last thing from power users, which I think is true: you have to actually try it. This sounds incredibly obvious, but it is the main barrier.
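If you want something concrete to try, here's the fan-out pattern I promised, sketched in Python. It assumes the `claude` CLI's `-p` mode and uses one git worktree per agent so they can't step on each other's files. The task descriptions are placeholders; the point is the shape: scoped specs out, reviewable branches back.

```python
# Toy sketch of fanning scoped tasks out to parallel agents. Each task gets
# its own git worktree: isolated files, isolated branch, isolated context.
import subprocess
from concurrent.futures import ThreadPoolExecutor

TASKS = {
    "fix-flaky-tests": "Find and fix the flaky tests in tests/. "
                       "Done when pytest passes 5 runs in a row.",
    "add-rate-limit":  "Add per-user rate limiting to the API. "
                       "Done when the new tests in the spec pass.",
    "update-docs":     "Update README and docs/ to match the current CLI flags.",
}

def run_agent(item: tuple[str, str]) -> str:
    branch, spec = item
    workdir = f"../agents/{branch}"
    subprocess.run(["git", "worktree", "add", "-b", branch, workdir], check=True)
    # Each spec reads like a declarative end state with success criteria,
    # not a question: that's the mental-model shift in miniature.
    subprocess.run(["claude", "-p", spec], cwd=workdir, check=False)
    return branch

# Your job shifts from writing the code to scoping the specs and then
# reviewing the branches these agents leave behind.
with ThreadPoolExecutor(max_workers=len(TASKS)) as pool:
    for branch in pool.map(run_agent, TASKS.items()):
        print(f"agent finished on branch {branch}, ready for review")
```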
Most people haven't run an agent loop for more than a couple of minutes, and the models improved a lot in December. If you have not revisited your AI workflow since, you're probably operating on stale assumptions about what is actually possible.

To be honest with you, the shape of work itself is changing. Karpathy noted something really important about the errors current models make. They're not simple syntax errors. He thinks, and I think he's correct, that a hasty junior developer would make conceptual errors very similar to the ones the models make now. And that's a good thing. It means the models are getting stronger, reaching the level of a junior developer: they make wrong assumptions, they run without checking, they sometimes fail to surface trade-offs. Those are things junior developers do. These are supervision problems, not capability problems. And the solution isn't to do the work yourself. It's to get better at management. You do have to watch the agents, but if you do, you can catch the moment when they've implemented a thousand lines to solve a problem that could have taken a hundred. This is somewhere, to be quite frank, our technical teams need to level up: managing agents, and writing evals that test the right things. There are evals you can write that test whether the agent is producing a simple enough solution for the problem. Those are the kinds of evals we need to think about, not just traditional functional tests. (I'll sketch one at the end of this section.)

This is what Sam means when he talks about being an engineer changing so quickly. You're not spending time typing. You're not debugging. You're spending most of your time, frankly, as a manager. And yes, we should be honest: the ability to code manually is going to start to atrophy as a skill, because you're just not using it as much. Generation and discrimination are very different skill sets, and you're now exercising the latter every day. This is not a failure, and it's not something to be embarrassed about. It's a reallocation of very scarce human cognitive resources toward a skill with higher leverage.

Now, this obviously leads to a debate: how close should developers stay to the code? Senior developers differ widely here, and I would argue the right answer is a function of what you're building. If your risk tolerance for a mistake is very low, you're going to have to watch the agent coding in an IDE, and write your evals super carefully if you want to leave it alone. If you're trying to write really good front-end code, that's more complicated right now than backend code, because defining what something should look like remains a challenge. But if you're willing to experiment and iterate, if it's a greenfield project or a prototype, you really can step back. So I think this calls for another level of abstraction in engineering. As technical leaders, we need to decide where engineers should stand in relation to the code based on the risk profile of the codebase itself. That becomes something we can intentionally set as policy for teams: hey, this is production, this is not something we can mess up, and this is our expectation as leadership for how you code with agents against this codebase.
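Here's the kind of simplicity eval I mentioned, as a toy sketch: alongside your functional tests, fail the agent's work automatically when the diff is wildly out of proportion to the task. The budget number is a placeholder you'd tune per task; the mechanism is just git plus pytest.

```python
# Toy "simplicity eval": alongside functional tests, fail the agent's work
# if the diff is far larger than the task's budget allows.
import subprocess

def diff_size(base: str = "main") -> int:
    """Lines added + removed relative to the base branch."""
    out = subprocess.run(
        ["git", "diff", "--numstat", base],
        capture_output=True, text=True, check=True,
    ).stdout
    total = 0
    for line in out.splitlines():
        added, removed, _path = line.split("\t")
        if added != "-":  # binary files report "-" in numstat
            total += int(added) + int(removed)
    return total

def test_solution_is_proportionate():
    # A thousand-line diff for a hundred-line problem fails review
    # automatically, before a human ever reads it.
    assert diff_size() <= 300, "solution is too large for this task's budget"
```

It's a crude proxy for "simple enough," but that's the category: evals that check the shape of the solution, not just its behavior.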
Setting that kind of per-codebase policy is something we're going to have to start doing, because otherwise it's a free-for-all: everyone makes their own rules, and you get all sorts of issues in production.

So where does all of this leave us? The December convergence of models, orchestration patterns, and tools like Ralph established a new baseline. Models can now maintain coherence for days. Orchestration patterns exist that manage fleets of agents. And the economics absolutely work. This doesn't mean you have to use Ralph specifically. The point is that the problems these tools wrestle with are fundamentally different, and they point to a very rapid change in how we work, particularly in technical domains. If you're wrestling with context persistence and parallel coordination, and those problems suddenly get an order of magnitude easier, which they did, because of exactly what I've described around tasks, workflow, and more capable models designed for long-running work, then suddenly the ceiling lifts. Everything gets an order of magnitude easier when you're building big stuff.

And the overhang this generates, when it happens all at once, is real. If Amodei is right and AI can handle end-to-end software engineering tasks within 6 to 12 months, then the gap between what we're doing today and full automation has never felt larger. If the overhang feels big after the last few weeks, as you listen to what I'm describing here, it's only going to get bigger, because AI is continuing to accelerate. Look at how quickly Anthropic was able to turn around and ship Cowork: just 10 days. Look at how quickly they turned around and shipped their version of Ralph, natively integrated.

Yes, the people building this moment sometimes aren't fully into it yet. They're still moving their furniture into the new AI way of working, to use a metaphor. Sam Altman admitted that about himself. But the future is here now. And if you can get through the overhang and start to accelerate into a world where you're asking the AI to do big tasks for you, where you're moving from prompting with questions to defining specifications, where you're running multi-agent patterns, it's going to fundamentally change your day. On a personal note: if you have not felt the power of having five or six Claude Code windows up on your screen at once, it's hard to describe. There's nothing like how fast you feel you can go. And the future belongs to the people who know how to handle that speed responsibly and thoughtfully.

The overhang is going to continue, and the benefits to those who can get over it are only going to get greater and greater, because these are exponential gains we're looking at. Every agent you can run in parallel multiplies your productivity. So this is the future we're looking at: a future made not by one model maker or one breakthrough, but by a collective phase transition, where model capabilities as a whole over the last five or six weeks have moved us from a world where it was kind of irrational to run a dozen agents to a world where, if you're not running a dozen agents on autonomous tasks for days at a time, you're behind. And things are only going to go faster from here. Good luck. I have a full write-up on this on Substack. Let me know where you have questions, and we'll all get through it.