The privacy and security of our customers and contractors is foundational to everything we do at Mercor. We recently identified that we were one of thousands of companies impacted by a supply chain attack involving LiteLLM. Our security team moved promptly to contain and remediate the incident. We are conducting a thorough investigation supported by leading third-party forensics experts. We will continue to communicate with our customers and contractors directly as appropriate and devote the resources necessary to resolving the matter as soon as possible.
Mercor
Software Development
San Francisco, California 676,548 followers
Defining the future of work
About us
Mercor is defining the future of work. We connect human expertise with leading AI labs and enterprises to train frontier models.
- Website
- mercor.com
- Industry
- Software Development
- Company size
- 51-200 employees
- Headquarters
- San Francisco, California
- Type
- Privately Held
- Founded
- 2023
Locations
- Primary: San Francisco, California 94105, US
Updates
-
Does Training on APEX-Agents Dev Set Generalize Beyond the Benchmark? Applied Compute post-trained GLM-4.7 on ~2,000 expert Mercor tasks and achieved state-of-the-art legal performance on APEX-Agents. We then evaluated that model, AC-Small, on benchmarks outside its training distribution. On GDPVal, AC-Small's win+tie rate rose from 55.0% to 62.7% (+7.7pp), placing it 5th overall and ahead of Opus 4.5. To understand where the gain came from, we ran two ablations: On Toolathalon, AC-Small improved by +8.0pp, from 26.5% to 34.6%. On APEX, which removes tool use and agent loops, AC-Small moved up seven spots, beating Opus 4.5, Sonnet 4.5, and Grok 4. The biggest surprise was medicine. AC-Small placed 4th at 64.8%, ahead of GPT 5.4, Gemini 3.1 Pro, and o3, despite zero medical tasks in training. The gains appear to come from stronger procedural discipline: preserving sub-details, checking intermediate outputs, and catching logical errors. Read more at the links in the comment.
-
"The most important problem in the world is what we do all day for work and how the knowledge work economy operates." - Brendan Foody, at Upfront Ventures Summit. Brendan sat down with Sundeep Peechu of Felicis to talk about the future of work, what's blocking enterprise AI, and why humans become more valuable as AI advances. Watch the full video at the link in the comments.
-
Mercor reposted this
Traditional coding benchmarks do not reflect how software is actually built and maintained. That's why we built a new benchmark, APEX-SWE, in partnership with Cognition. It measures whether AI models can perform complex, real-world software engineering work to ship systems that work and debug them when they don't.
APEX-SWE Leaderboard | Pass@1
🥇 OpenAI GPT-5.3 Codex (High) at 41.5%
🥈 Anthropic Opus 4.6 (High) at 40.5%
🥉 Anthropic Opus 4.5 (High) at 38.7%
Every frontier model fails on nearly 60% of real production tasks.
-
Introducing APEX-SWE, in collaboration with Cognition. They see firsthand that real software engineering is not just writing code anymore. On APEX-SWE, every model fails to reliably solve the real production software engineering tasks. OpenAI GPT-5.3 Codex (High) tops the leaderboard at 41.5% on Pass@1, followed by Anthropic Opus 4.6 (High) at 40.5%. APEX-SWE tests two things legacy benchmarks ignore: building and deploying end-to-end systems across cloud services and databases, and diagnosing real production failures from logs and unstructured context. Read more about our new benchmark at the link in the comments.
-
Colin built his career inside UK institutions, including Cambridge University, the Bank of England, and the Financial Conduct Authority, where he helped reshape national consumer protection rules. When he moved to the U.S., he realized how much professional networks mattered and he didn't want to start over. He found Mercor instead. Now, Colin works alongside investment bankers, PhD economists, and law professors on some of the most complex legal and economic problems. "I'm teaching AI how to do legal reasoning, how to distinguish a case subtly from another one, and I'm asking it to reproduce some of the hardest work that I've done in my own professional life. It's felt like something that's given me my dignity back as a professional." Find your next opportunity: www.mercor.com
-
Amresh Subramaniam spent nearly a decade at McKinsey before joining Mercor. He was ready to stop just advising on AI and start building it. In his first year, he worked with top AI labs and built a team of more than 20 people, learning what kinds of data actually move the needle on model performance. For Amresh, it's the experts who've made the work meaningful. People who joined for flexibility, who hit a rough patch, or who simply wanted to put their expertise to work on their own terms. Read more of Amresh's story at the link in the comments.
-
Jay spent decades advising companies from Nordic countries, Australia, and Japan on how to enter the U.S. market. He now brings those experiences across very different industries and geographies to every project he works on at Mercor. "I think that I'm part of something which is growing, which is evolving, which is breaking new frontiers." Find opportunities with Mercor at the link in the comments.
-
Mercor reposted this
We shipped GPT-5.4 last week and startups immediately put it to work. Early teams pushed the model inside real production workflows across recruiting, GTM, legal analysis, product analytics, and testing. Their feedback helped surface signal fast. Here’s some early feedback from Basis, Clay, Hex, Harvey, HockeyStack, Legora, Mainstay, Mercor, Momentic, Pace, Rogo, and Shortcut.ai (Fundamental Research Labs), who helped stress-test GPT-5.4.
-
Anish Bathwal spent years in consulting getting dropped into rooms where the problem wasn't even defined yet. Software companies, universities, enterprises across industries. By the time he left, every client was asking the same question: how do we use AI? "The bottleneck was never buy-in. It was performance. The models weren't good enough on the problems that actually mattered, the domain-specific work where expertise couldn't be faked." That is why he joined Mercor. Human knowledge is what will accelerate progress. Read Anish's story at the link in the comments.