AssemblyAI

Software Development

San Francisco, California 41,780 followers

Industry-leading Speech AI models to automatically recognize and understand speech.

About us

AssemblyAI is the best way to build Voice AI apps. We build the industry’s best speech-to-text and speech understanding models, including promptable speech recognition, that serve as critical infrastructure for top Voice AI products like Granola, Dovetail, Ashby, and Cluely. Our speech-to-text models lead the industry in accuracy and quality, so you can build reliable product experiences on top of voice data. And our Speech Understanding models help you go beyond transcription to uncover insights, identify speakers, and highlight key information. We make it simple to get started, with a developer-first API and usage-based pricing that scales effortlessly to millions of hours.

Website
http://www.assemblyai.com
Industry
Software Development
Company size
51-200 employees
Headquarters
San Francisco, California
Type
Privately Held
Founded
2017


Updates

  • If you're at HumanX and building anything that touches customer conversations (support, CX, or voice), this one's worth your time. 👇 Our CEO, Dylan Fox, is joining ASAPP, NiCE, and TLDR for a panel on Turning Customer Conversations into Action. Expect a practical look at how teams are going from raw conversations → structured insight → real decisions and workflows. 📍 The Grove Theater 🗓 April 7 | 3:15–3:40 PM. If you're thinking about how conversation data plugs into your stack (or your business), come by!

  • We talk about ambient scribes a lot. So when S2 E2 of The Pitt opened with one going wrong, it felt close to home. Well-timed, right before the launch of Medical Mode. A doctor walks in, pulls up the AI-generated chart, and reads: "It says she takes Risperdal... and she takes Restoril when needed for sleep." Accuracy was the butt of the joke: "AI. Almost intelligent." That's the problem with general-purpose ASR in clinical settings: it hits 95%+ accuracy on a consult, yet might get "hydrochlorothiazide" wrong every time. Our new Medical Mode is optimized for medical entity recognition: one parameter to enable, and it works on both pre-recorded and streaming audio. We had a little too much fun making this. Watch our (corrected) clip 👇

  • AssemblyAI reposted this

    Dylan Fox

    We looked through 22 popular Voice AI datasets, including VoxPopuli, Earnings-22, and AfriSpeech: datasets with 25K+ downloads/month. They're full of errors. Wrong company names. Wrong people names. Entire languages dropped from multilingual audio. Hundreds of sections marked <inaudible>. When we shipped Universal-3 Pro a few weeks ago, WER went up on some benchmarks. So we dug in. Listened to the audio. Read the transcripts side by side. What we found was very surprising: our model was beating the human ground truth, and getting penalized for it! I wrote up everything we found, with real audio examples you can listen to yourself. Full post: https://lnkd.in/ePfY-ZXw
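The penalty Dylan describes falls directly out of how WER is computed. A minimal sketch in plain Python (our illustration, not AssemblyAI's evaluation code): WER is word-level edit distance against the reference, so when the reference transcript itself contains an error, a model that transcribes the audio correctly still gets charged an edit.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Hypothetical example: the human labeler gave up on a proper noun,
# but the model transcribed the audio correctly.
flawed_reference = "the CEO of <inaudible> reported strong earnings"
model_output = "the CEO of AssemblyAI reported strong earnings"

# The model is penalized one substitution despite being right: 1/7 ≈ 0.14
print(wer(flawed_reference, model_output))
```

Note the asymmetry: the metric has no notion of which side is actually correct; it only trusts whichever string is designated as the reference.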

  • AssemblyAI reposted this

    Ryan Seams

    When we launched Universal-3 Pro, we knew the model was good. We didn't expect it to consistently outperform human transcribers. Right after launch, customers told us: "the model is failing our evals." So we dug in. Turns out the problem wasn't always the model: human labels are far from perfect. In many cases, the model is now actually more accurate than the ground truth. Which leads to a bigger issue: WER-based evals are breaking. Join AssemblyAI next week for a workshop where we cover:
    - common errors in human-labeled files that affect WER
    - how to quickly spot and fix issues with your labeled files
    - going beyond WER to run a proper eval
    - what a vibe eval is and how to scale one up
    Voice AI models are evolving fast. Your eval strategy needs to keep up.
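One illustrative way to "go beyond WER", sketched under our own assumptions rather than taken from the workshop material: score only the entity tokens you actually care about (drug names, customer names), so a single critical miss isn't diluted by hundreds of correctly transcribed filler words.

```python
def entity_error_rate(reference: str, hypothesis: str, entities: set[str]) -> float:
    """Fraction of entity mentions in the reference that are missing
    from the hypothesis. A deliberately simple, order-insensitive sketch."""
    ref_terms = [w for w in reference.lower().split() if w in entities]
    if not ref_terms:
        return 0.0
    hyp_terms = set(hypothesis.lower().split())
    missed = [t for t in ref_terms if t not in hyp_terms]
    return len(missed) / len(ref_terms)

# Hypothetical clinical example: aggregate word accuracy looks fine
# (1 error in 6 words), but half the drug mentions are wrong.
drugs = {"hydrochlorothiazide", "risperdal", "restoril"}
ref = "she takes risperdal and hydrochlorothiazide daily"
hyp = "she takes risperdal and hydrochloride daily"

print(entity_error_rate(ref, hyp, drugs))  # 0.5
```

A production eval would align words positionally and handle multi-word entities; the point here is only that weighting every token equally, as WER does, hides exactly the errors that matter most.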

  • Three companies. Three different voice AI use cases. One thing they all agreed on: transcription quality is key 🔑. Last night they got together at Granola's London office to talk about it. A few themes kept surfacing:
    🔹 Speaker diarisation isn't a nice-to-have; it's foundational
    🔹 Domain-specific terminology accuracy makes or breaks real-world deployments
    🔹 The real-time vs. post-call trade-off looks different depending on your product
    🔹 Multilingual support and voice tonality detection are the next frontier
    Hearing how differently each company has shaped its pipeline was a reminder of how much surface area "voice AI" actually covers. Special shoutout to our speakers: Jonathan Kim (Granola), Adrien Wald (CoLoop), and Shane Lynn (EdgeTier), moderated by Ryan Seams. 🌟 Thanks to everyone who came out. 📍 London, we'll be back.

  • General-purpose ASR: 95%+ accuracy on a clinical consult. Also general-purpose ASR: gets "hydrochlorothiazide" wrong every time. Introducing Medical Mode, a correction pass on top of Universal-3 Pro optimized for medical entity recognition. Enable it with one parameter. The real failure mode isn't the transcript; it's what comes next. Most healthcare AI pipelines feed transcripts into an LLM to produce SOAP notes, discharge summaries, and referral letters. Wrong drug name in, wrong drug name out. Errors don't attenuate. They propagate. Medical Mode catches them before they get that far. Works on both pre-recorded and streaming audio. No commitments or up-charges for BAAs to meet HIPAA compliance. $0.15/hr. See our benchmarks here → https://lnkd.in/gjvknmCA Test with your own audio → https://lnkd.in/gA97USAW

  • Medical Mode is now available for clinical workflows. We built Medical Mode because a transcript that's 95% accurate can still be unusable in a clinical setting. Errors in general-purpose ASR are often concentrated on exactly the tokens clinicians care about most: drug names, dosages, and clinical terminology. "Lisprohumalog" is a phonetically reasonable guess. It's also not a real medication. Most healthcare AI products feed a transcript into an LLM to produce structured output. A wrong drug name in the transcript becomes a wrong drug name in the SOAP note, the discharge summary, the referral letter. Errors don't attenuate through the pipeline. They propagate. Medical Mode runs a correction pass optimized specifically for medical entity recognition: drug names, procedures, clinical terminology. The base model's noise handling and latency characteristics stay the same. Medical Mode just refines the output on the tokens that actually matter. Works on both Universal-3 Pro pre-recorded and Universal-3 Pro Streaming. No commitments or up-charges for BAAs to meet HIPAA compliance.
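Since the posts describe Medical Mode as a single parameter on top of the existing API, enabling it might look roughly like the sketch below. The endpoint and header shape follow AssemblyAI's public v2 transcript API, but the field name `medical_mode` is a placeholder assumption, not the documented parameter; check the AssemblyAI docs for the actual name.

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder; set your real AssemblyAI key

def build_transcript_request(audio_url: str, medical: bool = True) -> urllib.request.Request:
    """Build (but do not send) a transcript request with the
    hypothetical single-parameter medical toggle enabled."""
    payload = {
        "audio_url": audio_url,
        "medical_mode": medical,  # hypothetical parameter name
    }
    return urllib.request.Request(
        "https://api.assemblyai.com/v2/transcript",
        data=json.dumps(payload).encode(),
        headers={
            "authorization": API_KEY,
            "content-type": "application/json",
        },
        method="POST",
    )

req = build_transcript_request("https://example.com/consult.mp3")
print(json.loads(req.data)["medical_mode"])  # True
```

Sending the request would be `urllib.request.urlopen(req)` followed by polling the returned transcript ID; the sketch stops at request construction so it stays runnable offline.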


Funding

AssemblyAI: 6 total rounds

Last round: Series C, US$ 50.0M
