After spending the past 2 years at Cleanlab, I’m excited to share that I’m starting a new role as a Senior Machine Learning Researcher in the DataLab team at Protege.
Working alongside Curtis Northcutt, Jonas Mueller, and Anish Athalye at Cleanlab was one of the most formative experiences of my career. I’m deeply grateful for the chance to learn from them, to work on data-centric AI with such thoughtful researchers and builders, and to contribute during a period that ultimately led to Cleanlab being acquired into Handshake AI. I learned a tremendous amount about how data quality, evaluation, and trustworthiness make modern AI systems more accurate and reliable.
Throughout my time there, my conviction only grew that the next major advances in AI will come not just from better models or more compute, but from better data.
I’m excited to now be joining Bobby Samuels, Engy Ziedan, and the rest of the awesome team at Protege, where I’ll be working in the DataLab on the research and systems needed to help close the “data gap”.
At DataLab, our goal is to treat the data layer of AI with the same scientific rigor that model labs apply to algorithms. That means building a dedicated research institution for AI data: designing high-fidelity datasets and multimodal benchmarks grounded in real-world scenarios, working closely with frontier labs on their hardest data challenges, and developing standardized ways (including “FICO scores for AI data”) to measure dataset quality, contamination, and benchmark reliability.
Another important piece of this work is understanding how different kinds of data support different parts of the AI training stack. Reinforcement learning (RL) environments are a powerful form of training data that generate structured training tuples like (state, action, reward, next state) and are extremely useful for post-training optimization when the world can be simulated. But many of the highest-value domains for AI, including healthcare, enterprise workflows, and complex multimodal reasoning, cannot be faithfully simulated. Advancing models in these areas requires real-world datasets, carefully designed benchmarks, and domain-specific data for pre-training and mid-training adaptation.
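To make the idea of an RL environment as training data concrete, here is a minimal sketch of a toy simulated environment whose rollouts yield exactly those (state, action, reward, next state) tuples. All names (`GridWorld`, `collect_tuples`, the reward scheme) are illustrative assumptions, not anything specific to DataLab or Protege.

```python
import random


class GridWorld:
    """A toy simulated environment: an agent on a 1-D line of `size`
    cells earns a reward of 1.0 for reaching the rightmost cell.
    Purely illustrative, not a real training environment."""

    def __init__(self, size: int = 5):
        self.size = size
        self.state = 0

    def step(self, action: int):
        # action is -1 (move left) or +1 (move right), clipped to the grid
        next_state = min(max(self.state + action, 0), self.size - 1)
        reward = 1.0 if next_state == self.size - 1 else 0.0
        transition = (self.state, action, reward, next_state)
        self.state = next_state
        return transition


def collect_tuples(env: GridWorld, policy, n_steps: int = 10):
    """Roll out a policy and collect the structured
    (state, action, reward, next_state) tuples an RL environment
    generates as training data."""
    return [env.step(policy(env.state)) for _ in range(n_steps)]


random.seed(0)
data = collect_tuples(GridWorld(), policy=lambda s: random.choice([-1, 1]))
```

Each element of `data` is one training tuple; a post-training pipeline would consume batches of such tuples, which is only possible when the domain, unlike healthcare or enterprise workflows, can actually be simulated.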
The idea behind DataLab is simple but important: every major leap in AI capability has historically followed a breakthrough in data, from ImageNet to large-scale web corpora. As models and compute continue to advance rapidly, closing the data gap (the gap between the data that AI systems need and the data that actually exists in usable form) may be one of the most important challenges for the field.
You can read more about the vision behind DataLab here:
https://lnkd.in/e_pzVaq5
Excited for what’s ahead!