OpenAI Is Slowing Hiring. Anthropic's Engineers Stopped Writing Code. Here's Why You Should Care.
Abstract
In December 2025, AI capabilities crossed a critical threshold through converging model releases (GPT-5.1/5.2, Claude Opus 4.5, Gemini 3 Pro) and viral orchestration patterns (Ralph, Gas Town, Claude Code's task system). AI output is now preferred over human experts' on roughly three-quarters of well-scoped knowledge tasks, and models can code autonomously for days, yet a massive 'capability overhang' exists: even Sam Altman admits he hasn't changed his workflow. The lesson: those who learn to manage fleets of parallel AI agents as specification-writers and reviewers rather than manual coders will gain exponential productivity advantages, while the gap between early adopters and laggards widens dramatically.
Summary
0:00 The Paradox: Sam Altman's Confession and the Capability Overhang
Sam Altman revealed that despite leading OpenAI and having access to the most advanced AI tools, he hasn't fundamentally changed his workflow, even knowing he should be using AI much more. This confession illustrates the strange paradox of January 2026: AI capabilities have undergone what experts call a 'phase transition'—a fundamental threshold crossing—yet adoption lags dramatically behind. Andrej Karpathy reports his workflow inverted from 80% manual coding to 80% AI agents in just weeks. Ethan Mollick warns that projects from six weeks ago may already be obsolete. This gap between capability and adoption is the defining story of early 2026, representing what the speaker calls a 'capability overhang' where the technology has leaped far ahead of human behavioral adaptation.
1:32 The December Convergence: Three Frontier Models in Six Days
In late December 2025, three major AI releases landed within six days: Google's Gemini 3 Pro, OpenAI's GPT-5.1 (later 5.2), and Anthropic's Claude Opus 4.5. Unlike previous generations, these models are explicitly optimized for sustained autonomous work over hours or days rather than minutes. GPT-5.1/5.2 can operate continuously for more than 24 hours. Claude Opus 4.5 introduced an 'effort parameter' allowing developers to dial reasoning intensity up or down, priced two-thirds cheaper than predecessors. New techniques like context compaction let models summarize their own work as sessions extend, maintaining coherence over longer timeframes. The Cursor team reports models autonomously handling a week's worth of work across up to three million lines of code. This represents a categorical shift, not incremental improvement—the convergence of model releases, orchestration patterns, and proof points crossing multiple thresholds simultaneously, exactly as AI accelerationists predicted: change happens slowly, then all at once.
3:02 Ralph and Gas Town: Viral Orchestration Patterns That Changed Everything
Better models were necessary but not sufficient; the real unlock came from orchestration patterns. Geoffrey Huntley, an open-source developer in rural Australia, created 'Ralph' (named after the Simpsons character), a simple bash script addressing AI agents' key limitation: they stop to ask permission or give unreliable progress reports. Ralph runs Claude Code in a loop, using git commits and files as memory between iterations; when context fills up, a fresh agent picks up where the last left off. This embarrassingly simple technique, just being persistent and repeating the goal while wiping context, proved more reliable than elaborate multi-agent frameworks. VentureBeat called it 'the biggest name in AI right now.' Steve Yegge then released 'Gas Town' on January 1st, a maximalist workspace manager spawning dozens of parallel AI agents. Both patterns share the core insight: the bottleneck has shifted from AI capability to human attention span and task-scoping ability. Your productive capacity is now limited only by how many agents you can manage effectively. By late January, Anthropic absorbed these lessons into Claude Code's native task system, making even Ralph look like a workaround to a problem that now has platform infrastructure.
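For readers who want the shape of the pattern in code, here is a minimal sketch of a Ralph-style loop. It is not Huntley's actual script: it assumes the Claude Code CLI's headless mode (`claude -p`, flag names may differ across versions), a pytest suite as the completion signal, and placeholder goal text and iteration cap you would adapt to your own project.

```python
"""Minimal Ralph-style loop: re-run a coding agent against the same goal
until the test suite passes, using git commits as durable memory.
Sketch only: the agent CLI, test command, goal, and cap are assumptions."""
import subprocess

GOAL = "Implement the /export endpoint described in SPEC.md until all tests pass."
MAX_ITERATIONS = 50  # guardrail so an overnight run cannot loop forever

def tests_pass() -> bool:
    return subprocess.run(["pytest", "-q"]).returncode == 0

for i in range(MAX_ITERATIONS):
    if tests_pass():
        print(f"Done after {i} iterations.")
        break
    # Fresh agent invocation each time: the prompt restates the goal,
    # and the repo (code plus git history) is the only memory carried over.
    # Skipping permission prompts trades safety for autonomy; prefer a sandbox.
    subprocess.run(["claude", "-p", GOAL, "--dangerously-skip-permissions"])
    subprocess.run(["git", "add", "-A"])
    subprocess.run(["git", "commit", "-m", f"ralph iteration {i}", "--allow-empty"])
else:
    print("Hit the iteration cap without green tests; review the last commits.")
```

The whole trick described above is visible here: persistence plus a fresh context each pass, with git as the memory that survives between iterations.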
6:03 Claude Code's Task System: Native Infrastructure for Multi-Agent Orchestration
Anthropic's Claude Code task system represents the platform-level absorption of grassroots orchestration patterns. Unlike simple tick boxes, each task can spawn its own sub-agent with a fresh 200,000-token context window completely isolated from the main conversation. Agent one might dig through authentication code while agent two refactors database queries and agent three works through tests—none polluting each other's context or getting confused because they don't know the others exist. This architectural shift solves the fundamental problem: the old approach had Claude holding everything in one threaded conversation, remembering earlier decisions while implementing new things, inevitably losing the plot on complex projects. The task system externalizes dependencies as structural rather than cognitive—the dependency graph doesn't forget or drift, eliminating constant re-explanations. Seven to ten sub-agents can run simultaneously with the system automatically selecting the right model (Haiku for searches, Sonnet for implementation, Opus for reasoning). Developer CJ Hess stress-tested this on a massive refactoring project and reports it 'completely nailed it.' The key innovation: when you define dependencies upfront, they never degrade because they're never stored in working memory to begin with.
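To make 'dependencies are structural, not cognitive' concrete, here is a toy sketch of the idea: a dependency graph decides which tasks are unblocked, and each ready task is handed to an isolated worker. This is not Anthropic's implementation; `run_agent` is a stub standing in for spawning a sub-agent with its own fresh context window, and the task names are invented.

```python
"""Toy dependency-graph scheduler: the graph, not anyone's working memory,
decides what is unblocked next. run_agent() is a stand-in for a real sub-agent."""
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

TASKS = {
    "audit_auth_code":     [],
    "refactor_db_queries": [],
    "write_tests":         ["audit_auth_code", "refactor_db_queries"],
    "update_docs":         ["write_tests"],
}

def run_agent(task: str) -> str:
    # Stand-in for launching a sub-agent with its own isolated context,
    # scoped only to this task's instructions and files.
    return f"{task}: done"

done, running = set(), {}
with ThreadPoolExecutor(max_workers=7) as pool:  # e.g. 7-10 parallel sub-agents
    while len(done) < len(TASKS):
        # A task is ready when all of its dependencies are finished.
        ready = [t for t, deps in TASKS.items()
                 if t not in done and t not in running
                 and all(d in done for d in deps)]
        for t in ready:
            running[t] = pool.submit(run_agent, t)
        finished, _ = wait(running.values(), return_when=FIRST_COMPLETED)
        for t, fut in list(running.items()):
            if fut in finished:
                print(fut.result())   # completing a task unblocks its dependents
                done.add(t)
                del running[t]
```

Swap the stub for a real agent invocation and the same structure gives the behavior described above: when a task completes, the next wave kicks off automatically, and nothing about the plan has to live in any agent's context.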
9:03 Cursor's Experiments and the Self-Acceleration Loop
Cursor is proving autonomous agents can build genuinely complex software: beyond its 3-million-line browser experiment, it is running experiments to build a Windows emulator, an Excel clone, and a Java language server, codebases ranging from 500,000 to 1.5 million lines, all generated autonomously. The point isn't competing with Microsoft but demonstrating capability at scale. At Davos, Dario Amodei described 'the most important dynamic in AI today: the self-acceleration loop.' Engineers at Anthropic tell him 'I don't write code anymore; I let the model write it.' This matters profoundly because Anthropic is using AI to accelerate production of the next AI systems. AI has entered self-reinforcing acceleration. This dynamic explains OpenAI's hiring slowdown: Sam Altman announced dramatically reduced hiring because existing engineers' span has expanded so much with AI tooling. New-hire expectations have skyrocketed; candidates are asked to complete, in 10-20 minutes using AI tools, work that would normally take weeks. The numbers behind this decision come from OpenAI's GDPval benchmark, which measures how often AI output is preferred over human expert output on well-scoped knowledge work: GPT-5.2 Pro reached 74%, double the fall 2025 model's 38%. On three-quarters of scoped knowledge tasks, AI is now preferred.
12:03 The Capability Overhang: Why Work Hasn't Transformed Yet
Despite models beating human experts 74% of the time on scoped tasks while working faster, work hasn't transformed, because capability jumped far ahead while humans don't change that quickly. Most knowledge workers still use AI at a ChatGPT-3.5/4 level: ask a question, get an answer, move on. They're not running overnight agent loops, assigning hour-long tasks to AI coworkers, or managing parallel worker fleets. This overhang explains why the discourse feels disconnected; people experience constant jet lag between the capability frontier and current practice. Someone running task loops in Anthropic's Claude Code or Ralph lives in a different technical reality than someone querying ChatGPT five times daily, even with access to identical underlying tools. One sees everything happening at once; the other sees incremental improvement and wonders about the hype. This creates a temporary arbitrage: figure out how to use these models before competitors do and gain a massive edge. Waiting for AI to get smart enough before changing your workflow means you're already behind and demonstrating poor AI usage. The overhang will only grow as AI continues accelerating.
13:33 Power User Patterns: From Questions to Specifications
Specific skills distinguish power users at the capability edge. First, they assign tasks rather than ask questions; treating AI as an oracle is the wrong mental model. The shift is toward declarative specifications: describe the end state, provide success criteria, and let the system figure out how to get there. This is a post-prompting world: still prompting, but reading more like a specification. Second, accept imperfections and iterate. Ralph works because it embraces failure: the AI produces broken code, retries until it's fixed, and never gets tired. This requires abandoning expectations of first-time correctness. Third, invest in specification and review, less in implementation. Work is shifting: less time writing code, much more time defining what you want and evaluating whether you got there. This represents a profound skill change; most engineers spent years developing implementation intuitions that are now less useful. The new skills: describing systems precisely enough for AI to build them, writing tests that capture real success criteria, and reviewing AI-generated code for subtle conceptual errors rather than syntax mistakes. Designer Maggie Appleton observes that when agents write code, design becomes the bottleneck; questions shift from code syntax details to architecture, user experience, and composability. What should this feel like? Do we have the right abstraction? These decisions require human context, taste, and vision.
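As one concrete picture of 'tests that capture the real success criteria,' here is what a specification-as-tests file might look like, written before any implementation exists. The module `reports`, the function `build_csv_export`, and the account fields are all hypothetical; the point is that the agent's goal is simply to make these assertions pass.

```python
"""Specification-as-tests sketch: the end state is expressed as acceptance
criteria. Every name here (reports.build_csv_export, the Account fields) is
hypothetical; the agent's job is to write the code that makes this file pass."""
from dataclasses import dataclass
import pytest

from reports import build_csv_export  # does not exist yet; the agent will create it

@dataclass
class Account:
    name: str
    balance: float
    active: bool

@pytest.fixture
def sample_accounts():
    return [
        Account("alpha", 10.005, True),
        Account("beta", 3.20, True),
        Account("gamma", 1.00, False),   # inactive accounts must be excluded
    ]

def test_only_active_accounts_are_exported(sample_accounts):
    rows = build_csv_export(sample_accounts)
    assert len(rows) == 2

def test_export_is_deterministic(sample_accounts):
    # Identical reruns make the human review step much cheaper.
    assert build_csv_export(sample_accounts) == build_csv_export(sample_accounts)

def test_balances_are_rounded_to_cents(sample_accounts):
    rows = build_csv_export(sample_accounts)
    assert all(round(row["balance"], 2) == row["balance"] for row in rows)
```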
15:03 The Dangers of Speed and the Importance of Thought
Speed is dangerous in itself, a foot-gun everyone's been handed. You can move incredibly fast with AI agents while forgetting how much trash you're producing. Without thinking through what you want done, speed leads to quickly building giant piles of useless code. This is a superpower handed to everyone, for better or worse, and we're about to see who can actually think well. It's also time to use multiple agents in parallel, which is transformative because each one stacks your capability multiplicatively. Some developers go from a few PRs daily to dozens. The constraint moves from coding to coordination: how to scope tasks and review outputs. Even if review is tricky in this new world, the multiplicative effect of well-directed agents solving multiple tasks simultaneously is where everyone's heading. This includes letting agents run constantly; Ralph was designed for overnight sessions. Define the work, start the loop, go to bed: that's the new engineer's day. Of course, this only works with proper guardrails, but when it works, you get productive hours around the clock from previously idle time.
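One way to picture the fleet workflow is sketched below: one scoped task per agent, each in its own git worktree so parallel runs cannot trample each other, with results collected for morning review. The task list, branch names, and the `claude -p` invocation are illustrative, and in practice you would add the guardrails mentioned above (iteration caps, restricted tools, protected branches).

```python
"""Fleet sketch: one well-scoped task per agent, each isolated in its own
git worktree, run in parallel, results queued for human review."""
import subprocess
from concurrent.futures import ThreadPoolExecutor

TASKS = {
    "fix-flaky-uploads": "Make the upload integration tests deterministic; see ISSUE-412.",
    "add-audit-log":     "Add an audit log entry for every settings change, with tests.",
    "trim-bundle-size":  "Reduce the frontend bundle below 1.5 MB without removing features.",
}

def run_task(branch: str, goal: str) -> str:
    workdir = f"../worktrees/{branch}"
    # Each agent gets an isolated checkout and a single, well-scoped goal.
    subprocess.run(["git", "worktree", "add", "-b", branch, workdir], check=True)
    subprocess.run(["claude", "-p", goal], cwd=workdir)
    return f"{branch}: ready for review in {workdir}"

with ThreadPoolExecutor(max_workers=len(TASKS)) as pool:
    for summary in pool.map(run_task, TASKS.keys(), TASKS.values()):
        print(summary)   # the human's job starts here: review each result
```

The design choice worth noting is the isolation: scoping each agent to its own branch and directory is what makes the multiplicative effect reviewable rather than chaotic.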
16:33 The Nature of Errors and the Management Challenge
The shape of work itself is changing fundamentally. Andrej Karpathy noted something crucial about current model errors: they're not simple syntax errors but conceptual mistakes a hasty junior developer would make—wrong assumptions, running without checking, failing to surface trade-offs. This is actually positive because it indicates models reaching junior developer capability level. These are supervision problems, not capability problems. The solution isn't doing work yourself but getting better at management skills. You must watch agents, but doing so lets you catch moments when they implement a thousand lines to solve problems requiring a hundred. Technical teams need to level up in managing agents and writing evals that test correctly—including evals testing whether agents write simple enough solutions, not just traditional functional tests. This is what Sam Altman means about engineering changing so quickly: you're not spending time typing or debugging but mostly managing. The ability to code manually will atrophy as a skill because you're not using it as much. Generation and discrimination are very different skill sets you're now using daily. This isn't failure or embarrassment but reallocation of scarce human cognitive resources toward higher-leverage skills.
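Here is a small sketch of what an eval for 'is the solution simple enough?' could look like, run alongside the functional test suite: it fails the run when the agent's diff blows past a per-task budget. The base branch and the 300-line budget are assumptions you would tune per task.

```python
"""Simplicity eval sketch: fail the agent's work if the change is far larger
than the task should need. Base branch and line budget are per-task assumptions."""
import subprocess
import sys

BASE_BRANCH = "main"      # assumption: the agent worked on a branch off main
MAX_CHANGED_LINES = 300   # a 1,000-line diff for a 100-line problem is the failure mode

def changed_lines(base: str) -> int:
    out = subprocess.run(
        ["git", "diff", "--numstat", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    total = 0
    for line in out.splitlines():
        added, deleted, _path = line.split("\t", 2)
        if "-" not in (added, deleted):      # binary files report '-'
            total += int(added) + int(deleted)
    return total

if __name__ == "__main__":
    n = changed_lines(BASE_BRANCH)
    if n > MAX_CHANGED_LINES:
        print(f"FAIL: {n} changed lines exceeds the budget of {MAX_CHANGED_LINES}.")
        sys.exit(1)
    print(f"OK: {n} changed lines within budget.")
```

Crude as it is, a check like this is the kind of non-functional eval the section argues for: it supervises the shape of the solution, not just whether the tests go green.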
18:04 Context-Dependent Proximity to Code
How close should developers stay to code? Widely differing opinions exist among senior developers, but the right answer is a function of what you're building. If risk tolerance for mistakes is very low, watch agents coding in IDEs and write evals super carefully. Frontend code remains more complicated than backend because defining appearance is still challenging. But if you're willing to experiment and iterate on greenfield prototypes, you can step back substantially. This calls for another abstraction level from engineering: technical leaders must think about where engineers should stand relative to code based on the codebase's risk profile. This becomes something intentionally set as team policy—'this is production, we can't mess up, here's our expectation for how you code with agents against this codebase.' Without such policies, it'll be a free-for-all with everyone making their own rules, causing all sorts of production issues.
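To make 'intentionally set as team policy' concrete, here is one hedged sketch of encoding that policy as data rather than tribal knowledge. The repository names, tiers, and fields are invented; the point is that risk profile, supervision level, and required checks are written down per codebase.

```python
"""Sketch of agent-proximity policy as explicit, per-codebase configuration.
Tiers, fields, and thresholds are illustrative, not a standard."""
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentPolicy:
    supervision: str               # "watch_in_ide" | "review_every_pr" | "review_on_merge"
    autonomous_runs: bool          # may agents run unattended or overnight?
    max_parallel_agents: int
    required_checks: tuple[str, ...]   # evals that must pass before merge

POLICIES = {
    "payments-service": AgentPolicy(        # production, low risk tolerance
        supervision="watch_in_ide",
        autonomous_runs=False,
        max_parallel_agents=2,
        required_checks=("unit_tests", "simplicity_eval", "security_scan"),
    ),
    "internal-prototype": AgentPolicy(      # greenfield experiment, iterate freely
        supervision="review_on_merge",
        autonomous_runs=True,
        max_parallel_agents=10,
        required_checks=("unit_tests",),
    ),
}

def allowed_to_run_overnight(repo: str) -> bool:
    return POLICIES[repo].autonomous_runs
```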
19:34 The New Baseline and Future Acceleration
The December convergence of models, orchestration patterns, and tools like Ralph established a new baseline: models maintain coherence for days, orchestration patterns manage agent fleets, and the economics work. You don't have to use Ralph specifically—the point is that problems these tools wrestle with fundamentally differ and point to rapid change in how we work, particularly in technical domains. When context persistence and parallel coordination suddenly get an order of magnitude easier, the ceiling lifts and everything becomes dramatically easier for building big systems. The resulting overhang is real. If Amodei is right that AI can handle end-to-end software engineering within 6-12 months, the gap between today's practices and full automation has never felt larger. If the overhang feels big now, it's only getting bigger as AI continues accelerating—witness how quickly Anthropic shipped Cowork in 10 days or natively integrated their Ralph version. The future is here now. Those moving from prompting questions to defining specifications and running multi-agent patterns will see fundamentally changed days. The experience of having five or six Claude Code windows running simultaneously is transformative—there's nothing like that speed. The future belongs to those handling that speed responsibly and thoughtfully. The overhang will continue, and benefits to those overcoming it grow exponentially because every parallel agent multiplies productivity. This represents not one model or breakthrough but collective phase transition making it rational—even necessary—to run dozens of agents on autonomous multi-day tasks. Things are only accelerating from here.
Transcript
Sam Altman, CEO of OpenAI, made a confession recently. He shared that despite being the CEO, despite having the best access to the most capable AI tools on the planet, despite his own internal data showing that AI now beats human experts on three-quarters of well-scoped knowledge tasks, guess what? He still hasn't really changed how he works. Altman admitted at a recent town hall that he still runs his workflow in the same way, even though, quote, "I know that I could be using AI much more than I am." That's Sam Altman. This is the strange paradox at the center of AI right now. Something fundamental shifted in December 2025. The people closest to the technology are calling it a phase transition, a threshold crossing, a break in the timeline. Andrej Karpathy, who helped build OpenAI and has been writing code professionally for decades, says his workflow has inverted in just a couple of weeks, from 80% manual coding to 80% AI agents. Ethan Mollick, the Wharton professor who tracks AI adoption, has put it bluntly: projects from six weeks ago may already be obsolete. And yet most people, including the CEO of OpenAI, haven't caught up. The capability is there. The adoption is not. It's just moving too fast. Understanding this gap and what to do about it is the real story of January 2026.

So what actually happened in December? The shift was not just one thing, and I think that by itself is part of the story, because previously I could point to a single model release and say, this was the change. Not anymore. This is a convergence of model releases, orchestration patterns, and proof points that all crossed their respective thresholds in the same compressed window. This is exactly what AI accelerationists have been telling us is coming: change will happen slowly and then all at once. This is one of those all-at-once moments.

Start with the models. In the space of just six days late last year, three frontier releases landed: Google's Gemini 3 Pro, OpenAI's GPT-5.1 Codex Max (with 5.2 coming soon after), and Anthropic's Claude Opus 4.5. All of these models are explicitly optimized for something previous models could not do well: sustained autonomous work over hours or days rather than minutes. The GPT-5.1 and now 5.2 class models are designed for continuous operation, more than a day of autonomous work. Claude Opus 4.5 introduced an effort parameter that lets developers dial reasoning up or down, and Anthropic priced it two-thirds cheaper than the previous version. And we now have techniques like context compaction from both OpenAI and Anthropic that let the model summarize its own work as sessions extend, so it can more easily maintain coherence over longer time frames. Are you getting the theme? The Cursor team has tested these models. Other teams have tested these models. We're seeing reports of models doing a week of work autonomously and coding up to three million lines before coming back for more. This is not the same category of work we were seeing even in September and October of 2025. It's a new category. Things have changed all at once.

And you know what? Better models, as much as I like them, were necessary but not sufficient. The real unlock came from orchestration patterns that went viral in late December. The first was Ralph, named after the Simpsons character known for cheerful obliviousness.
Geoffrey Huntley, an open-source developer way out in rural Australia, grew frustrated with agentic coding's central limitation: models keep stopping to ask permission, or they report progress and they're wrong or overoptimistic. Every pause requires human attention, and often you're frustrated because you're telling the model the same thing over and over. So all Geoffrey did was write a bash script that runs Claude Code in a loop, using git commits and files as memory between iterations. When the context window fills up, a fresh agent picks up where the last one left off; you just wipe the previous context window and keep going against the same task. The technique is embarrassingly simple for an engineer. While the AI industry was building elaborate multi-agent frameworks, all Geoffrey did was discover that you can just be really persistent. You can repeat the goal. You can wipe the context window, and you're going to get somewhere. A loop that keeps running until tests pass is more reliable than very carefully choreographed agent handoffs. VentureBeat called it the biggest name in AI right now, and they weren't wrong. The pattern spread because it let you do much more autonomous work over a long period of time.

The second viral piece was Gas Town, released by Steve Yegge on January 1st. While Ralph is minimalist, Gas Town is unabashedly maximalist: a completely insane workspace manager that spawns and coordinates dozens of AI agents working in parallel. Honestly, Gas Town reflects Steve Yegge's brain more than it reflects a coherent enterprise agentic pattern. But it's still relevant, because both patterns share the same core insight: the bottleneck has shifted. You are now the manager of however many agents you can keep track of productively. Your productive capacity is limited only by your attention span and your ability to scope tasks well.

And then things kept changing, because in late January, Anthropic shipped Claude Code's new task system, and suddenly even Ralph looked like a clever workaround to a problem that now has native infrastructure. You don't have to use Ralph anymore. CJ Hess, a developer who stress-tests new AI tooling, was in the middle of a large refactor when Claude Code's task system shipped. He pushed it to its limits: he created a massive task list and had it orchestrate sub-agents to execute the entire thing, and he reports that it completely nailed it. And that's weird. We're used to agents fumbling, to them not getting it done. In this case, a simple task system that just looks like a to-do list was what it took to coordinate agents across a complex multi-agent problem.

Now, to be fair, the task list Anthropic released is more than just a simple tick box. Under the surface, each task can spawn its own sub-agent, and each sub-agent gets a fresh 200,000-token context window that's completely isolated from the main conversation, so you can give that sub-agent a clean, focused job. Say agent one is digging through authentication code, agent two is refactoring database queries, and agent three is working through tests. None of them is polluting the others' context or getting confused by what the others are doing, because they don't know the others exist, which is the same insight Yegge had in Gas Town.
The old approach was Claude trying to hold everything in one long threaded conversation, remembering decisions from earlier while implementing new things, and it just got complicated and Claude lost the plot. That still works for small stuff, but for anything complex, context management becomes the bottleneck and things fall through the cracks. The task system changes that architecture. Each agent focuses on just one thing. When a task completes, anything blocked by it automatically unblocks, and the next wave of agents just kicks off. You can have between seven and ten sub-agents running simultaneously, and the system picks the right model for the job: Haiku for quick searches, Sonnet for implementation, Opus for reasoning. All you do is define your dependencies, and the system handles all of that orchestration for you.

Look, the key innovation here is the realization that dependencies are structural, not cognitive. Without that, Claude has to hold the entire plan in working memory, and the plan will degrade the moment the context window fills up. You end up re-explaining over and over to the agent: this is what's done, this is what's left, this is what depends on what. But when you externalize the dependencies, the graph doesn't forget and doesn't drift. You never need to re-explain to the agent, because the plan never got stored in memory to begin with. It's just a task sheet. Going back to Ralph: it was a bash-loop workaround to this same problem. The task system is Anthropic's answer, native platform infrastructure for the same capability, and it illustrates how fast things are moving. Patterns can go viral and just a couple of weeks later they're obsolete, because they've been absorbed into the platform.

Cursor is carrying the flag for very large, long-running autonomous projects. I've talked about their project to build a browser and how it took 3 million lines of code. They've written about it extensively, but they're not done with the browser. Cursor is running similar experiments using AI agents to build a Windows emulator. They're building an Excel clone. They're building a Java language server. These are big codebases; they range from half a million to one and a half million lines, and they're all being generated autonomously. Now, the point here is not that Cursor is immediately going to start shipping Excel and competing with Windows. The point is that they are proving that autonomous AI agents can build complex software.

At Davos in late January, Dario Amodei described what he called the most important dynamic in AI today: the self-acceleration loop. It's important that we understand it. He said, I have engineers at Anthropic who tell me, I don't write code anymore, I let the model write the code. Now, we've heard that on Twitter a lot, and the mechanism is simple. But the fact that Anthropic is doing it is really important to understand, because fundamentally they are accelerating the production of the next AI systems using AI. AI has entered a self-acceleration loop.

This is also why OpenAI is starting to slow hiring. Just this past week, Altman announced that OpenAI plans to dramatically slow down hiring, and he said he did it because of the capabilities and the span he sees from existing engineers. They're not stopping hiring altogether, but one thing he shared is that the expectation he has for new hires is now sky-high because of what AI tooling can give. If you're in the interview loop, he said, they're literally having new hires sit down
and asking them, quote, to do something that would normally take weeks using AI tools in 10 or 20 minutes. That's a reasonable request. I've shared earlier how you can use Claude in Excel to do weeks' worth of work in 10 to 15 minutes. This is the reality of work in 2026. And what Sam is choosing to do is responsible, because as he said, he doesn't want to have awkward conversations and overhire. He would rather hire the right people, keep them around, and expand their span with AI tooling.

The numbers behind this decision come from OpenAI's own benchmark, GDPval. It measures how often AI output is preferred over human expert output on well-scoped knowledge work. And we see the tipping point hitting around this same time, in the last few weeks of 2025. The GPT thinking model from the fall tied or beat humans only 38% of the time. GPT-5.2 Pro, released at the very end of the year and into early this year, reached 74%. It doubled. So on three-quarters of scoped knowledge tasks, the AI is now preferred. And you can read that as a general pattern for cutting-edge models; it's not just ChatGPT. As Sam put it, if you can assign your co-worker something that takes an hour and get back something that's better than what a human would do 74% of the time, in vastly less time, it's a pretty extraordinary feeling.

And this brings us back to the paradox. If models are beating human experts like this on scoped tasks and doing it faster, why hasn't work transformed more? Why is the CEO of OpenAI, Sam himself, still running his workflow, as he says, in much the same way? This is a capability overhang: capability has jumped way ahead, and humans don't change that fast. Adoption hasn't. Most knowledge workers are still using AI at, I would say, a ChatGPT-3.5 or ChatGPT-4 level. Ask a question, get an answer, move on. Summarize this document for me. Please draft this email. They're not running AI agent loops overnight. They're not assigning hour-long tasks to their AI co-workers. They're not managing fleets of parallel workers across their backlog.

The overhang explains why the discourse feels so disconnected, why it feels like you have constant jet lag if you are living at the edge of the capability and then go back to look at how work looks today. Someone running task loops in Anthropic's Claude Code or Ralph is living in a different technical reality than someone who queries ChatGPT four or five times a day, even though they have daily access to the exact same underlying tools. One person is seeing the acceleration, everything happening all at once; the other is seeing incremental improvement and wondering why AI is such a big deal. This creates a very temporary arbitrage. If you figure out how to use these models before your competitors do, if you can get your teams to do that, you have a massive edge. And if you're waiting for AI to get smart enough before changing your workflow, you are already behind, and you're showing that you're not using AI well.

So what does closing this overhang, which has developed especially in the last few weeks, look like? What are the specific skills that power users describe? A few patterns emerge. Number one, power users who are really on the edge are assigning tasks, not asking questions. When you treat AI as an oracle, you are in the wrong mental model. The shift is very much toward what I would call declarative spec: describe the end state you want,
provide the success criteria, and let the system figure out how to get there. This is sort of a post-prompting world. It's still prompting, but it looks a lot more like a specification. Number two, accept imperfections and start to iterate. Ralph works because it embraces failure. The AI will produce broken code, so we're just going to make it retry until it fixes it. It never gets tired, and it keeps retrying. You go make coffee or lunch, you come back, and it's done. This requires abandoning the expectation that AI should get things right the first time. It often won't, and it doesn't matter, because it doesn't get tired. Third, invest in specification, invest in review, and invest less in implementation. The work is shifting: less time writing code, much more time defining what you want, much more time evaluating whether you got there. This is a really big skill change. Most engineers have spent years developing their intuitions around implementation, and those are now not super useful. The new skill is describing the system precisely enough that AI can build it, then writing tests that capture the real success criteria, then reviewing AI-generated code for subtle conceptual errors rather than simple syntax mistakes.

The errors get very interesting here. Maggie Appleton is a designer who's been analyzing these tools for a bit, and I think she puts it really well: when agents write the code, design becomes the bottleneck. The questions that slow you down are less and less about the details of code syntax. They're more and more about architecture, about user experience, about composability. What should this feel like? Do we have the right abstraction here? These are the decisions that agents cannot make for you, and they require your context and your taste and your vision.

I will say the speed is dangerous in and of itself. Watch out for the foot-gun. You can move really, really fast with AI agents, and you can forget how much trash you are putting out there. To be honest, if you are not thinking through what you want done, the speed can lead you to very quickly build a giant pile of code that is not very useful. That is a superpower everyone has been handed, for better or worse, and we are about to see who is actually able to think well.

Yes, it is time to use multiple agents in parallel. That's another lesson. It's transformative because every single one stacks your capability. Some developers are going from a few PRs per day to dozens. The constraint moves from coding to coordination: how do you scope your tasks, how do you review outputs, and so on. Fundamentally, even if it's tricky and you have to figure out what review looks like in this new world, this is where we're all going, because of the multiplicative effect of well-directed agents stacking on top of each other and solving multiple tasks at once. And this includes letting agents run all the time. Ralph was designed for overnight sessions. Define the work, start the loop, and go to bed: that's the new engineer's day. Of course, this only works with proper guardrails, but when it works, you're getting productive hours around the clock from time that was previously idle.

And look, the last thing from power users, which I think is true, is that you have to actually try it. This sounds incredibly obvious, but it is the main barrier.
Most people haven't run an agent loop for more than a couple of minutes, and the models improved a lot in December. If you have not revisited your AI workflow since, you're probably operating on stale assumptions about what is actually possible.

To be honest with you, the shape of work itself is changing. Andrej Karpathy noted something really important about the errors that current models make. They're not simple syntax errors. He thinks, and I think he's correct, that a hasty junior developer would make very similar conceptual errors to the ones the models are making now. And that's a good thing. It means the models are getting stronger, getting to the level of a junior developer: they're making wrong assumptions, they're running without checking, they're sometimes failing to surface trade-offs. Those are things that junior developers do. These are supervision problems, not capability problems. And the solution isn't to do the work yourself. It's to get better at your management skills. You do have to watch the agents, but if you do, you can catch the moments when they've implemented a thousand lines to solve a problem that could have taken a hundred. And this is somewhere, to be quite frank, our technical teams need to level up, so they're able to do this kind of management of agents and able to write evals that test the right things. There are evals you can write that test whether the agent is writing a simple enough solution for the problem. Those are the kinds of evals we need to think about, not just traditional functional tests. This is what Sam means when he talks about being an engineer changing so quickly. You're not spending time typing. You're not debugging. You're spending most of your time, frankly, as a manager. And yes, we should be honest: the ability to code manually is going to start to atrophy as a skill set, because you're just not using it as much. Generation and discrimination are very different skill sets, and you're using those every day. This is not a failure, and it's not something to be embarrassed about. It's a reallocation of very scarce human cognitive resources toward a skill that has higher leverage.

Now, this obviously leads to a debate: how close should developers stay to the code? There are widely differing opinions among senior developers here, and I would argue that the right answer is a function of what you are building. If your risk tolerance for a mistake is very low, you are going to have to watch the agent coding in an IDE and write your evals super carefully if you want to leave it alone. If you are trying to write really good frontend code, that is more complicated right now than backend code, because defining what something looks like remains a challenge. But if you're willing to experiment, if you're willing to iterate, if it's a greenfield project and it's a prototype, you really can step back. So I think what this calls for is another level of abstraction from engineering. We need to think as technical leaders about where engineers should stand in relation to the code based on the risk profile of the codebase itself. That becomes something we can intentionally set as a policy for teams: hey, this is production, this is not something we can mess up, and this is our expectation as leadership for how you code with agents against this codebase.
That is something we're going to have to start doing, because otherwise it's just going to be a free-for-all, everyone will make their own rules, and you're going to get all sorts of issues in production.

So where does all of this leave us? The December convergence of models, of orchestration patterns, of tools like Ralph established a new baseline. Models can now maintain coherence for days. Orchestration patterns exist that manage fleets of agents, and the economics absolutely work. This doesn't mean you have to use Ralph specifically. The point is that the problems these tools wrestle with are fundamentally different and point to a very rapid change in how we work, particularly in technical domains. If you're wrestling with context persistence and parallel coordination, and those problems suddenly get an order of magnitude easier, which they did, because of exactly what I've described around how we handle tasks and workflows and more capable models designed for long-running work, well, suddenly it's like the ceiling lifts. Everything gets an order of magnitude easier when you're building big stuff. And the overhang that generates, when it all happens at once, is real. If Amodei is right and AI can handle end-to-end software engineering tasks within 6 to 12 months, then the gap between what we are doing today and full automation has never felt larger. If the overhang feels big after the last few weeks, as you listen to what I'm describing here, it is only going to get bigger, because AI is continuing to accelerate. Look at how quickly Anthropic was able to turn around and ship Cowork, in just 10 days. Look at how quickly they turned around and shipped their version of Ralph that was more natively integrated.

Yes, the people who are building this moment sometimes aren't fully into it yet. They're still moving their furniture into the new AI way of working, to use a metaphor. Sam Altman admitted that about himself. But the future is here now. And if you can get through the overhang and start to accelerate into a world where you are asking the AI to do big tasks for you, you're moving from prompting with questions to defining specifications, you're running multi-agent patterns, and that is going to fundamentally change your day. On a personal note, if you have not felt the power of having five or six Claude Code windows up on your screen at once, it's hard to get past it. There's nothing like how fast you feel you can go. The future belongs to people who know how to handle that speed responsibly and be thoughtful with it. The overhang is going to continue, and the benefits to those who can get over it are just going to get greater and greater, because these are exponential gains we're looking at. Every single agent you can run in parallel multiplies your productivity.

And so this is the future we're looking at: a future made not by one model maker or one breakthrough, but by a collective phase transition, where model capabilities as a whole over the last five or six weeks have moved us from a world where it was kind of irrational to run a dozen agents to a world where, if you're not running a dozen agents doing autonomous tasks for days at a time, you're behind. And things are only going to go faster from here. Good luck. I have a full write-up on this on Substack. Let me know where you have questions, and we'll all get through it.