If you're at HumanX and building anything that touches customer conversations—support, CX, or voice—this one’s worth your time. 👇 Our CEO, Dylan Fox, is joining ASAPP, NiCE, and TLDR for a panel on: Turning Customer Conversations into Action. Expect a practical look at how teams are going from raw conversations → structured insight → real decisions and workflows. 📍 The Grove Theater 🗓 April 7 | 3:15–3:40 PM If you’re thinking about how conversation data plugs into your stack (or your business), come by!
AssemblyAI
Software Development
San Francisco, California 41,780 followers
Industry-leading Speech AI models to automatically recognize and understand speech.
About us
AssemblyAI is the best way to build Voice AI apps. We build the industry’s best speech-to-text and speech understanding models, including promptable speech recognition, that serve as critical infrastructure for top Voice AI products like Granola, Dovetail, Ashby, and Cluely. Our speech-to-text models lead the industry in accuracy and quality, so you can build reliable product experiences on top of voice data. And our Speech Understanding models help you go beyond transcription to uncover insights, identify speakers, and highlight key information. We make it simple to get started, with a developer-first API and usage-based pricing that scales effortlessly to millions of hours.
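For a sense of what "simple to get started" means in practice, here is a minimal sketch using the official `assemblyai` Python SDK; the API key and audio URL are placeholders.

```python
# pip install assemblyai
import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"  # placeholder

# Transcribe a file by URL (local file paths work too).
transcriber = aai.Transcriber()
transcript = transcriber.transcribe("https://example.com/meeting.mp3")

print(transcript.text)
```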
- Website: http://www.assemblyai.com
- Industry: Software Development
- Company size: 51-200 employees
- Headquarters: San Francisco, California
- Type: Privately Held
- Founded: 2017
Products
AssemblyAI
Speech Recognition Software
At AssemblyAI, we build AI models and systems that developers and product teams use to ship transformational AI-powered audio products. As an applied AI company, our mission is to empower app builders to build 10x faster, focus on their specific use cases and user needs, and win market share with a true technology partner. We've raised over $63M in funding from leading investors, including Insight Partners, Accel, and Y Combinator. Learn more at AssemblyAI.com.
Locations
- Primary: 320 Judah St, San Francisco, California 94122, US
Updates
We talk about ambient scribes a lot. So when S2 E2 of The Pitt opened with one going wrong, it felt close to home. Well-timed, too: right before the launch of Medical Mode. A doctor walks in. Pulls up the AI-generated chart. Reads: "It says she takes Risperdal... and she takes Restoril when needed for sleep." Accuracy was the butt of the joke: "AI. Almost intelligent." That's the problem with general-purpose ASR in clinical settings. It can hit 95%+ accuracy on a consult and still get "hydrochlorothiazide" wrong every time. Our new Medical Mode is optimized for medical entity recognition: one parameter to enable, and it works on both pre-recorded and streaming audio. We had a little too much fun making this. Watch our (corrected) clip 👇
Clinical-grade accuracy on every drug name, dose, and diagnosis
Heading to HumanX next week? Don't forget to say hi! 👋 Stop by our booth to share what you're working on, check out our swag, or ask questions about how we can help you build with voice. Our AAI crew will be there waiting for you! 👀 Ryan Seams, Michael Miller, Tim Higgins, Zackary Klebanoff. 🗓️ April 6–9 📍 San Francisco Moscone Center Booth 315
AssemblyAI reposted this
We looked through 22 popular Voice AI datasets, including VoxPopuli, Earnings-22, and AfriSpeech: datasets with 25K+ downloads/month. They're full of errors. Wrong company names. Wrong people's names. Entire languages dropped from multilingual audio. Hundreds of sections marked <inaudible>. When we shipped Universal-3 Pro a few weeks ago, WER went up on some benchmarks. So we dug in. Listened to the audio. Read the transcripts side by side. What we found was very surprising... Our model was beating the human ground truth. And getting penalized for it! I wrote up everything we found, with real audio examples you can listen to yourself. Full post: https://lnkd.in/ePfY-ZXw
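To see why a flawed reference penalizes a correct transcript, here is a small illustration with the open-source `jiwer` library; the sentences are invented for the example, not taken from the audited datasets.

```python
# pip install jiwer
from jiwer import wer

# Invented example: the human "ground truth" misheard a company name,
# while the model transcribed it correctly.
reference  = "the CEO of in video announced record earnings"  # flawed label
hypothesis = "the CEO of NVIDIA announced record earnings"    # model output

# WER trusts the reference, so the correct transcript is charged
# one substitution and one deletion it never actually made.
print(f"WER: {wer(reference, hypothesis):.2f}")  # 2 edits / 8 words = 0.25
```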
Earlier this week our team joined founders, engineers, and PMs at Granola's London office to talk about what it actually takes to build with voice AI in production. Great city, great conversations, even better people. 📍 London, we'll be back.
AssemblyAI reposted this
When we launched Universal-3 Pro, we knew the model was good. We didn't expect it to consistently outperform human transcribers. Right after launch, customers told us: "the model is failing our evals." So we dug in. Turns out the problem wasn't always the model. Human labels are far from perfect. And now? In many cases, the model is actually more accurate than the ground truth. Which leads to a bigger issue: WER-based evals are breaking. Join AssemblyAI next week for a workshop where we cover:
- common errors in human-labeled files affecting WER
- how to quickly spot/fix issues with your labeled files
- going beyond WER to run a proper eval
- what a vibe eval is and how to scale one up
Voice AI models are evolving fast. Your eval strategy needs to keep up.
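Not the workshop itself, but a flavor of the spot/fix step: a sketch of normalizing references and hypotheses with `jiwer` before scoring, so formatting mismatches in human-labeled files (case, punctuation, fillers) stop inflating WER. Assumes jiwer 3.x; the filler-word list is purely illustrative.

```python
import jiwer  # pip install jiwer

# Normalize both sides before scoring so only real recognition
# errors count, not formatting differences in the labels.
normalize = jiwer.Compose([
    jiwer.ToLowerCase(),
    jiwer.RemoveSpecificWords(["um", "uh"]),  # illustrative filler list
    jiwer.RemovePunctuation(),
    jiwer.RemoveMultipleSpaces(),
    jiwer.Strip(),
    jiwer.ReduceToListOfListOfWords(),
])

reference  = "Um, we'll ship on Monday."
hypothesis = "we'll ship on monday"

raw = jiwer.wer(reference, hypothesis)
clean = jiwer.wer(reference, hypothesis,
                  reference_transform=normalize,
                  hypothesis_transform=normalize)
print(raw, clean)  # 0.40 vs 0.00: the "errors" were label formatting
```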
Three companies. Three different voice AI use cases. One thing they all agreed on: Transcription quality is key 🔑. Last night they got together at Granola's London office to talk about it... A few themes that kept surfacing:
🔹 Speaker diarisation isn't a nice-to-have, it's foundational
🔹 Domain-specific terminology accuracy makes or breaks real-world deployments
🔹 The real-time vs. post-call trade-off looks different depending on your product
🔹 Multilingual support and voice tonality detection are the next frontier
Hearing how differently each company has shaped its pipelines was a reminder of how much surface area "voice AI" actually covers. Special shoutout to our speakers: Jonathan Kim (Granola), Adrien Wald (CoLoop), and Shane Lynn (EdgeTier), moderated by Ryan Seams. 🌟 Thanks to everyone who came out. 📍 London, we'll be back.
General-purpose ASR: 95%+ accuracy on a clinical consult. Also general-purpose ASR: gets "hydrochlorothiazide" wrong every time. Introducing Medical Mode, a correction pass on top of Universal-3 Pro optimized for medical entity recognition. Enable it with one parameter. The real failure mode isn't the transcript. It's what comes next. Most healthcare AI pipelines feed transcripts into an LLM → SOAP notes, discharge summaries, referral letters. Wrong drug name in. Wrong drug name out. Errors don't attenuate. They propagate. Medical Mode catches the error before it gets that far. Works on both pre-recorded and streaming audio. No commitments or up-charges for BAAs to meet HIPAA compliance. $0.15/hr. See our benchmarks here → https://lnkd.in/gjvknmCA Test with your own audio → https://lnkd.in/gA97USAW
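"Enable it with one parameter" maps to a single field on the transcript request. A minimal sketch against the REST API follows; note that `medical_mode` is a hypothetical field name used for illustration, so check the linked docs for the exact parameter.

```python
import time
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
headers = {"authorization": API_KEY}

# "medical_mode" is a hypothetical field name for illustration;
# the real parameter name is in the linked docs.
payload = {
    "audio_url": "https://example.com/consult.mp3",
    "medical_mode": True,
}
resp = requests.post("https://api.assemblyai.com/v2/transcript",
                     json=payload, headers=headers)
transcript_id = resp.json()["id"]

# Poll until the transcript finishes processing.
while True:
    poll = requests.get(
        f"https://api.assemblyai.com/v2/transcript/{transcript_id}",
        headers=headers,
    ).json()
    if poll["status"] in ("completed", "error"):
        break
    time.sleep(3)

print(poll.get("text"))
```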
Medical Mode is now available for clinical workflows. We built Medical Mode because a transcript that's 95% accurate can still be unusable in a clinical setting. Errors in general-purpose ASR are often concentrated on exactly the tokens clinicians care about most: drug names, dosages, and clinical terminology. "Lisprohumalog" is a phonetically reasonable guess. It's also not a real medication. Most healthcare AI products feed a transcript into an LLM to produce structured output. A wrong drug name in the transcript becomes a wrong drug name in the SOAP note, the discharge summary, the referral letter. Errors don't attenuate through the pipeline. They propagate. Medical Mode runs a correction pass optimized specifically for medical entity recognition: drug names, procedures, clinical terminology. The base model's noise handling and latency characteristics stay the same. Medical Mode just refines the output on the tokens that actually matter. Works on both Universal-3 Pro pre-recorded and Universal-3 Pro Streaming. No commitments or up-charges for BAAs to meet HIPAA compliance.
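For the streaming side, here is a sketch using the Python SDK's realtime interface to transcribe live microphone audio. Which model the realtime interface targets, and where the Medical Mode toggle plugs into a streaming session, are assumptions not covered by the post, so this sketch leaves the toggle out; see the docs for the streaming parameter.

```python
import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"  # placeholder

def on_data(transcript: aai.RealtimeTranscript):
    # Print partial and final results as they arrive.
    if transcript.text:
        print(transcript.text)

def on_error(error: aai.RealtimeError):
    print("error:", error)

transcriber = aai.RealtimeTranscriber(
    sample_rate=16_000,
    on_data=on_data,
    on_error=on_error,
)
transcriber.connect()

# Stream microphone audio (requires `pip install "assemblyai[extras]"`).
transcriber.stream(aai.extras.MicrophoneStream(sample_rate=16_000))
transcriber.close()
```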