One fiscal quarter is the equivalent of a full year of AI agent advancement. An 18-month roadmap equates to about a decade of capability shifts. And systems designed for today's AI may already be outdated by the time they go live.

These are some of the insights extrapolated from a new research paper from METR, "Measuring AI Ability to Complete Long Tasks," which may provide the clearest explanation yet of how rapidly AI agents are transforming the nature of knowledge work. https://lnkd.in/eB8KPNtb

Instead of narrow benchmarks, the study asks a broader and more useful question: how long can an AI system work on a real-world task before it breaks down? The answer: the "task completion horizon" has been doubling every 3 to 7 months since 2019. If your planning frameworks are built around human time scales, it's worth recognizing that AI is evolving in dog years, or faster.

This presents a strategic challenge most enterprise leaders haven't encountered before: we are being asked to plan for something that is unknowable in its specifics but inevitable in its trajectory. There's no steady state to optimize around. No predictable plateau. There's just an exponential curve that is already reshaping what's possible in software development, cybersecurity, reasoning, and long-horizon task automation.

The temptation to wait for maturity is understandable, but at this rate of change, waiting creates risk and inaction becomes a liability. So what's the alternative? Enterprises that thrive in this environment will embrace adaptive strategy, grounded in action today and built for flexibility tomorrow:

• Design workflows and systems that can scale with rising agent capabilities
• Rethink governance as a dynamic, living framework
• Embed feedback loops, experimentation, and modularity
• Focus on readiness, not perfection

The METR team is careful not to overstate the trend's longevity, but the current data is clear: AI agents are now reliably completing hour-long tasks. Full-day or week-long task automation is no longer speculative; it's within reach.

We may not know the precise timeline. But we know where we're headed, and we know the speed is unfamiliar. AI's future is unknowable. Its impact is inevitable. And it's unfolding on a clock most organizations weren't built to manage. The question isn't whether to act. It's whether your organization can learn, adapt, and lead at the speed of change.

#AI #GenAI #EnterpriseAI #AIAgents #DigitalTransformation #Leadership #Strategy #AIReadiness #FutureOfWork #MooresLaw #UnknowableButInevitable
Trends in AI Task Completion
Summary
Trends in AI task completion highlight how quickly artificial intelligence is improving its ability to autonomously handle longer and more complex tasks, with the measured task horizon doubling every few months. This means AI systems are moving from short assignments to managing multi-hour or even multi-day projects, reshaping how businesses approach productivity and planning.
- Update planning cycles: Build your project timelines with the understanding that AI capabilities may double in a matter of months, so regular reassessment is key.
- Redesign workflows: Start restructuring team roles and processes to take advantage of AI’s growing ability to handle tasks independently.
- Embrace experimentation: Encourage ongoing feedback and trial runs with AI agents to learn how they perform with longer tasks and adjust your strategies as their capabilities evolve.
-
TL;DR: According to McKinsey, the success of AI depends primarily on CEO-level sponsorship and the ability to rewire an organization's workflows (versus just deploying intelligent chatbots). Interestingly, according to METR, AI performance in terms of the length of tasks AI agents can complete has been increasing consistently and exponentially over the past 6 years, with a doubling time of around 7 months. This will have a huge impact on business rewiring and faster time to outcomes.

Some key points from the McKinsey & Company State of AI report (https://mck.co/4hMale0):
• 78% of organizations now use AI in at least one business function, up from 55% last year.
• Large companies lead AI adoption through workflow redesigns and dedicated implementation teams.
• CEO oversight of AI governance shows the strongest correlation with positive financial impact.
• Organizations increasingly mitigate AI risks around accuracy, security, and IP infringement.
• Companies are both hiring AI specialists and reskilling existing employees.
• Over 80% of organizations still see no material enterprise-level EBIT impact from AI.

On a related topic to workflow redesign, METR did some great work (https://bit.ly/4hCk2LQ) showing that AI's ability to complete tasks (measured by the equivalent human time required) has been doubling approximately every 7 months for the past 6 years, which means that within 2-4 years, AI agents could autonomously complete week-long projects done by humans! (Hat tip to Ethan Mollick for the METR link.)

Organizations that strategically reimagine their operations around increasingly capable AI agents, centralizing risk and data governance while distributing tech talent in hybrid models as the McKinsey survey suggests, will capture greater value.

Action for CEOs and CAIOs: Rather than waiting for AI to demonstrate enterprise-wide EBIT impact, forward-thinking companies should be mapping out which increasingly complex tasks AI will handle in the coming months and years, allowing them to proactively restructure roles, retrain employees, and redesign processes to leverage this exponential growth in AI task completion capabilities.
-
Are we underestimating exponential growth again, just like during COVID? An Anthropic researcher thinks we are.

The study from METR reveals AI task completion is doubling every 7 months. Two years ago, GPT-3.5 managed 15-second tasks. Today's models handle 1-hour assignments at a 50% success rate.

Julian Schrittwieser argues that underestimating exponential AI progress mirrors our early COVID-19 blindness. Despite clear exponential curves, leaders dismissed pandemic risks until disruption hit. The same pattern emerges with AI capability debates: we see models struggle with complex code and conclude the field is plateauing. Yet the trajectory remains exponential.

I made this mistake myself when evaluating early LLMs. Watching GPT-4 fail at basic tasks, I assumed "fundamental" limits. Looking back, that assessment aged poorly within 18 months.

The METR index tracks autonomous software engineering specifically. Not demos or cherry-picked examples, but measurable task completion across difficulty levels. Current frontier models approach hour-long engineering problems. By mid-2026, the data suggests 4-hour task mastery. What happens when AI handles full-day engineering sprints? The productivity implications for tech organizations are staggering.

The challenge for people leaders: exponential change breaks linear planning models. Traditional hiring cycles assume stable skill requirements over 12-24 months. But doubling every 7 months means an 8x capability improvement in 21 months. Organizations treating AI adoption as optional might face the same rude awakening as February 2020 pandemic skeptics...
-
December 2025 - January 2026 will be remembered as the period when AI-assisted software development crossed an irreversible threshold. Not because of one launch, but because three independent trendlines converged, and the compounding effect broke our mental models.

1. Reasoning displaced inference as the dominant mode.
For years, LLMs operated on a single forward pass: pattern-match the input, produce the most likely output. Useful, but brittle. That changed when reasoning models moved from curiosity to default. Per OpenRouter's 100-trillion-token study, reasoning-optimized models went from negligible usage in early Q1 2025 to over 50% of all tokens processed by Q4. When a model spends tokens exploring solution paths before committing, the failure modes change. It doesn't just guess better; it plans. And planning is the prerequisite for autonomy.

2. The token economics inverted.
Reasoning models use 10-20x more tokens per task. Sounds expensive. But developers voted with their wallets: they chose slower, costlier models that actually complete complex tasks over fast, cheap ones requiring constant human correction. If a model independently resolves a multi-file bug that would take a senior engineer 4 hours, $15 in reasoning tokens is a rounding error on a $200/hr loaded cost (see the break-even sketch after this post).

3. Task autonomy underwent a step-function change.
METR has tracked the "task-completion time horizon": the duration of tasks AI agents complete with 50% reliability. It had been doubling every ~7 months for 6 years. In 2024-2025, that accelerated to every ~4 months. Concretely: Opus 4.5 hit a 50% time horizon of ~4h49m. Sonnet 4.5 handles 30+ hour autonomous sessions. CTOs I talk to report multi-day runs on specialized codebases. SWE-bench scores went from ~50% to 80%+ in one year. Scale AI's harder SWE-Bench Pro still stumps the best models at under 25%. That gap is the roadmap.

Why this matters for every founder:
We've entered the era of AI-delegated development. The question isn't "can AI help my team code faster?" It's "what work can I delegate entirely, and what's the supervision model?" If METR's trend holds, agents will complete week-long tasks within 2 years. The orgs that thrive will build the muscle memory now: what to delegate, how to decompose work for agents, how to validate autonomous output, and how to restructure teams around human-agent collaboration.

What's the longest autonomous task you've successfully delegated to an AI agent?
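A minimal sketch of the break-even arithmetic in point 2, using the post's own illustrative numbers (4 engineer-hours, $200/hr loaded cost, $15 in tokens); the function name and figures are assumptions for illustration, not measured data:

```python
# Back-of-the-envelope break-even for delegating one task to a
# reasoning model. Figures are the post's illustrative numbers.

def net_savings(engineer_hours_saved: float,
                loaded_rate_per_hour: float,
                token_cost: float) -> float:
    """Net dollar savings from delegating one task to an agent."""
    return engineer_hours_saved * loaded_rate_per_hour - token_cost

# Multi-file bug: 4 engineer-hours at $200/hr vs $15 of tokens.
print(f"Net savings: ${net_savings(4, 200, 15):,.0f}")   # $785

# Break-even token budget: any spend below hours * rate still pays off.
print(f"Break-even token budget: ${4 * 200:,.0f}")       # $800
```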
-
What might be the impact of AI on overall productivity?

This is an important but challenging question to answer. Important because it has material implications for fiscal and monetary policy, for financial markets, and for the labor market. Challenging because the impact of AI will be broad-based, uneven, and pervasive throughout the economy, making it difficult to extrapolate precise productivity gains from narrow domains to the rest of the economy. We need a scalable framework for measuring the complexity of tasks that AI is actually used to tackle and the associated efficiency gains from its use.

Alex Tamkin and I tackle this challenging problem in a research brief published this morning: "Estimating AI productivity gains from Claude conversations." Using privacy-preserving tools, we sample 100k conversations on claude.ai and ask Claude to evaluate how long it would take a skilled professional to complete the tasks that Claude is asked to handle, both with and without AI assistance.

We have four key findings:

1) Claude can distinguish between short-horizon and long-horizon tasks. For a set of tasks where we know actual task completion times, Claude systematically produces longer estimates for tasks that actually take humans longer to complete, though in this benchmark forecast exercise Claude is less capable than humans.

2) Across our sample of real-world conversations, Claude estimates that AI reduces task completion time by around 80% on average. Some tasks, like evaluating diagnostic images, show smaller time savings of around 20%. Others, like compiling information from reports, show savings of 95%.

3) The tasks that show up in our sample for higher-wage occupations tend to be more complex, longer-duration tasks. Management-related tasks that Claude is asked to handle have the longest estimated human-only duration, with business and financial operations following closely behind.

4) Aggregating Claude's estimates of task-level efficiency gains, we can assess what current usage of current-generation models might mean for the aggregate economy. Our task-level estimates would imply an increase in U.S. labor productivity growth of 1.8 percentage points annually if it takes a decade for AI gains to diffuse throughout the economy, roughly doubling the post-2005 pace of U.S. labor productivity growth.

There are limitations to this work. Perhaps most notably, Claude's time estimates are imperfect and we lack real-world validation across all the tasks in our sample. Another limitation is that we analyze current-generation models, but capabilities are improving rapidly, which could mean larger productivity impacts ahead; slower diffusion or bottlenecks could mean smaller. But we think it's important to work in the open and to generate useful signals about how AI is already reshaping the economy. We will continue to track this over time to provide another measure of how capabilities are improving, not just in principle but in practice.
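A stylized sketch of the aggregation logic behind finding 4. This is not the brief's actual methodology: the 80% time saving and decade-long diffusion come from the post, while the 15% share of economy-wide work hours touched by AI is a made-up assumption for illustration:

```python
# Stylized roll-up of task-level time savings into an aggregate
# productivity figure. NOT the brief's actual methodology.

avg_time_saving = 0.80         # from the post: ~80% average reduction
share_of_work_affected = 0.15  # hypothetical share of all labor hours
diffusion_years = 10           # the post's decade-long diffusion scenario

# Effective labor input after adoption: affected work takes
# (1 - saving) of its former time, the rest is unchanged.
labor_after = (1 - share_of_work_affected) \
    + share_of_work_affected * (1 - avg_time_saving)

# Same output from less labor input implies a productivity level gain,
# spread here evenly over the diffusion period.
productivity_gain = 1 / labor_after
annual_boost = productivity_gain ** (1 / diffusion_years) - 1
print(f"Level gain: {productivity_gain:.2f}x, "
      f"annual boost over {diffusion_years}y: {annual_boost:.1%}")
```

With these toy inputs the sketch yields roughly a 1.3 percentage-point annual boost, the same order of magnitude as the brief's 1.8; the gap comes entirely from the assumed share of work affected.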
-
AI models are now reaching the ability to complete tasks that take skilled humans nearly an hour, and this capability is doubling every 7 months.

A new paper, "Measuring AI Ability to Complete Long Tasks," measures the "time horizon": the human-time equivalent of tasks AI systems can complete with 50% reliability. Using this approach, researchers found that...

▪️ Current frontier models like Claude 3.7 Sonnet have a 50% time horizon of ~50 minutes
▪️ Time horizons have been doubling approximately every 7 months since 2019
▪️ This progress is driven by better reasoning, tool use, and adaptation to mistakes
▪️ At 80% reliability, horizons are much shorter (15 minutes vs. 59 minutes for Claude 3.7)

What makes this fascinating is how predictable this growth appears to be. And while these benchmarks aren't perfect representations of real-world tasks, the trend holds even when controlling for task "messiness" factors. If this trajectory continues, we may see AI agents capable of handling month-long software development tasks before 2031.
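A minimal sketch of how a 50% time horizon can be estimated: fit a logistic curve of task success against log task length and read off where predicted success crosses 0.5. The data below is synthetic, and METR's actual estimation details differ:

```python
# Estimate a 50% time horizon from (task length, success) outcomes.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic (task_minutes, success) results for one hypothetical model,
# generated so that harder (longer) tasks fail more often.
minutes = np.array([1, 2, 4, 8, 15, 30, 60, 120, 240, 480] * 5, float)
rng = np.random.default_rng(0)
p_true = 1 / (1 + (minutes / 50) ** 1.2)
success = rng.binomial(1, p_true)

X = np.log2(minutes).reshape(-1, 1)
clf = LogisticRegression().fit(X, success)

# P(success) = 0.5 where intercept + coef * log2(t) = 0.
horizon_minutes = 2 ** (-clf.intercept_[0] / clf.coef_[0, 0])
print(f"Estimated 50% time horizon: {horizon_minutes:.0f} minutes")
```

The same fitted curve explains the post's last bullet: reading it off at 0.8 instead of 0.5 gives a much shorter horizon, because reliability demands compound quickly with task length.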
-
The ability of AI models to complete tasks, measured against human effort, is now doubling roughly every 4 months.

METR has continued their research measuring the success rate of frontier models completing tasks of increasing complexity. The results measure the task duration at which a model has a 50% likelihood of success, for tasks that would take an experienced engineer a given amount of time. Until recently, the doubling was occurring every 7 months. This year, now ending with Claude Opus 4.5 leading the pack, the rate of improvement has been accelerating, doubling every 4 months. The improvements are exponential. If the accelerated trend continues, we could see AI completing month-long tasks by the end of 2027 (see the extrapolation sketch below).

This rate of change is amazing. With the competition and investment focus moving from consumer to enterprise productivity, I'm sure 2026 will bring more improvement. But even if progress slows, the fact is that we've already seen massive improvements in the ability of frontier models to accomplish real-world tasks. I think this data shows we should keep our eyes open to the innovation in the space and take advantage of the latest improvements with each new model release.

The real win here is that we can see measurable improvement at no added cost to us. When new models become available, they're getting faster AND less expensive on average. It's often at most just a checkbox for an admin to enable access to the new models.

How have you or your teams kept up with the pace of change so far? Are you seeing these improvements show up in real-world value?

(Links to the AI Digest article on the METR research in the comments.)
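A quick check of the post's extrapolation; a minimal sketch assuming the horizon starts at Opus 4.5's reported ~4h49m in late 2025, a 4-month doubling time holds, and a "month-long task" means 160 working hours:

```python
# Extrapolate when a 4-month doubling reaches month-long tasks.
import math
from datetime import date, timedelta

current_horizon_hours = 4 + 49 / 60   # ~4.8 h, late-2025 frontier
target_hours = 4 * 40                 # "month-long" = 4 work weeks
doubling_months = 4

doublings = math.log2(target_hours / current_horizon_hours)
months_needed = doublings * doubling_months
eta = date(2025, 12, 1) + timedelta(days=30.44 * months_needed)

print(f"{doublings:.1f} doublings -> ~{months_needed:.0f} months -> {eta}")
# ~5.1 doublings -> ~20 months -> mid-to-late 2027
```

Under these assumptions the arithmetic lands in 2027, consistent with the post's claim; a slower 7-month doubling would push it out past 2028.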
-
Researchers at METR just published a new paper showing that the length of tasks AI agents can complete autonomously has been doubling every 7 months since 2019, essentially revealing a "Moore's Law" of sorts that can help us better understand the trajectory of AI capabilities.

Key Takeaways:
- To measure AI progress in a way that compares to humans, the study introduces a new metric: the 50%-task-completion time horizon. This represents the longest task an AI agent can complete correctly half the time, based on how long it usually takes a human expert to finish the same task.
- AI's ability to complete long, complex tasks has been doubling every 7 months since 2019 (a sketch of fitting this trend follows below).
- If this trend continues, AI agents could independently handle tasks that take humans a month by 2028-2031.
- The biggest drivers of improvement: better reasoning, tool use, and adaptability, not just bigger models. It will be interesting to see how approaches like OpenAI's and Google's Deep Research impact this.
- AI still struggles with messy, real-world tasks that require intuition, judgment, and seeking out missing information.

Paper: https://lnkd.in/d7bW6RTV
METR thread: https://lnkd.in/dp4-Y64v
Great thread on the background of the paper from Elizabeth (Beth) Barnes: https://lnkd.in/dfuvi6ZY
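A minimal sketch of how a doubling time is estimated from a series of model measurements: regress log2(time horizon) on release date and invert the slope. The data points are illustrative stand-ins laid along a 7-month-doubling curve, not METR's published measurements:

```python
# Estimate a doubling time from (release date, time horizon) points.
import numpy as np

# (years since 2019, 50% time horizon in minutes) -- synthetic points.
years = np.array([0.0, 1.5, 3.0, 4.5, 6.0])
horizon_min = np.array([0.5, 3.0, 18.0, 110.0, 650.0])

# Slope of log2(horizon) vs time is doublings per year.
slope, intercept = np.polyfit(years, np.log2(horizon_min), 1)
doubling_months = 12 / slope
print(f"Estimated doubling time: {doubling_months:.1f} months")  # ~7.0
```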
-
AI Is Evolving From Task Execution to Intelligent Delegation

Google DeepMind's Intelligent AI Delegation framework marks a significant shift in how future AI systems will operate. Delegation is no longer framed as simply handing off tasks; it becomes a structured, adaptive decision process that defines when to delegate, how to specify intent and boundaries, how to verify outputs, and how to transfer authority across human and AI agents.

The shift is clearest in the delegation cycle itself: monitor, decide, evaluate, escalate, verify, mirroring how real operational teams function. This is the architecture behind reliable agentic systems (a toy sketch of such a loop follows this post).

For data leaders, this solves three widespread challenges: over-trusting models, under-utilizing them, and lacking clear accountability across human-AI workflows. DeepMind's approach introduces dynamic task decomposition, capability-based trust scoring, confidence-aware verification, and transparent authority propagation across multi-agent chains.

The implications for India are substantial. Sectors like healthcare, insurance, and financial services run on high-volume, high-complexity workflows constrained by limited skilled capacity. Intelligent delegation enables verifiable diagnostic pipelines, reliable underwriting decisions, and auditable fraud-detection workflows, building both scalability and trust.

This paper is a signal: enterprise AI is shifting from "automation" to coordinated, accountable decision orchestration. Teams that adopt a delegation-native mindset will be best positioned to operationalize agentic AI safely and at scale.

Link to the paper in the comments below.

#AbhiWritesAI #Agents
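A toy sketch of the delegation cycle described above, combining capability-based trust scoring with confidence-aware verification. All class names, thresholds, and scoring rules here are invented for illustration and are not DeepMind's actual framework or API:

```python
# Toy delegation loop: decide whether to delegate, then verify output
# confidence and escalate to a human when it falls short.
from dataclasses import dataclass

@dataclass
class Task:
    description: str
    complexity: float    # 0..1, higher = harder

@dataclass
class AgentProfile:
    name: str
    trust_score: float   # capability-based trust, 0..1

def decide(task: Task, agent: AgentProfile, margin: float = 0.4) -> bool:
    """Delegate only when trust exceeds task complexity by a margin."""
    return agent.trust_score - task.complexity >= margin

def verify(output_confidence: float, threshold: float = 0.8) -> str:
    """Confidence-aware verification: accept, or escalate to a human."""
    return "accept" if output_confidence >= threshold else "escalate"

agent = AgentProfile(name="underwriting-agent", trust_score=0.9)
for task in [Task("standard policy renewal", 0.3),
             Task("novel multi-party claim", 0.8)]:
    if decide(task, agent):
        # Stand-in heuristic for the agent's reported confidence.
        confidence = 1 - 0.5 * task.complexity
        print(task.description, "->", verify(confidence))
    else:
        print(task.description, "-> retained by human")
```

Run as-is, the routine renewal is delegated and accepted, while the novel claim never leaves human hands; tightening the margin or the verification threshold shifts work back toward people, which is exactly the accountability dial the post describes.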