Gladia

Software Development

Audio AI infrastructure for voice-first platforms

About us

Gladia is the AI audio infrastructure that transcribes and enriches every conversation through a single API, so developers can turn audio into structured, actionable data for their products. Available in 100+ languages.

Website
https://gladia.io
Industry
Software Development
Company size
11-50 employees
Headquarters
New York
Type
Privately Held
Founded
2022
Specialties
artificial intelligence, API, and speech-to-text


Updates

  • If you’re assessing STT performance and your model looks accurate but the transcripts are off, the issue is often your ground truth, specifically how it’s defined and normalized. Without proper normalization, a model can score well on WER and still produce outputs that aren’t usable. At Gladia, normalization is built directly into the transcription pipeline, so outputs are aligned with real-world usage from the start. Learn more about this topic in the comments.

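The effect described above is easy to reproduce. Below is a minimal sketch with a from-scratch WER and two illustrative normalization rules (lowercasing and punctuation stripping); Gladia's actual normalization pipeline is not shown here and will differ.

```python
import re

def normalize(text: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace (illustrative rules only)."""
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return " ".join(text.split())

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

truth = "Okay, so... the total is $40, right?"
model_out = "okay so the total is forty dollars right"

print(wer(truth, model_out))                        # ~0.71: casing and punctuation count as errors
print(wer(normalize(truth), normalize(model_out)))  # ~0.29: only the "$40" formatting mismatch remains
```

The model output is identical in both cases; only the definition of the ground truth changed.
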
  • Exciting news on the latest performance benchmarks of our model Solaria 1! On Switchboard, Gladia shows on average ~29% lower WER than other providers. Why this matters: if performance only looks good on clean speech, it won’t hold up in conversation. Switchboard is among the most challenging datasets out there: background noise, overlapping speech, true conversational audio. It goes to show that our models are designed for real, messy customer audio, not academic datasets. You can test it on your own audio in our playground. (Link to our open-source benchmarks in the comments)

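For clarity, "~29% lower WER" reads as a relative error reduction. A worked example with hypothetical numbers (the real per-provider figures are in the open-source benchmarks the post links to):

```python
# Hypothetical WERs on the same Switchboard test set; not published numbers.
wer_gladia = 0.12
wer_other = 0.17

# Relative reduction: the share of the other provider's errors that disappear.
relative_reduction = (wer_other - wer_gladia) / wer_other
print(f"{relative_reduction:.0%} lower WER")  # -> "29% lower WER" with these inputs
```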

  • 👉 You may be familiar with common ASR datasets such as Common Voice or Earnings22. But do you know the kind of audio they include? Datasets like Common Voice are a good starting point, but they are often short, scripted, and relatively controlled. Look beyond that, and we get very different datasets:
    Switchboard → phone calls with interruptions and background noise
    VoxPopuli → long, formal parliamentary speeches with varied pacing
    Earnings22 → multi-speaker conference calls with financial jargon
    MLS (Multilingual LibriSpeech) → clean audiobooks, multilingual
    Knowing these differences can help you make an informed decision about which dataset best reflects the real-world performance you can expect from a model. And using your own audio, with clean ground truth and normalization, is ultimately the best WER test. Learn more about our results across datasets and top STT providers in the comments.
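
Putting the post's closing advice into practice might look like the sketch below: score each audio domain separately, on your own data, with both sides normalized. The domains, reference transcripts, and model outputs are invented stand-ins, and WER comes from the third-party jiwer package.

```python
import re
from collections import defaultdict
from statistics import mean

import jiwer  # pip install jiwer

def normalize(text: str) -> str:
    """Illustrative normalization: lowercase, strip punctuation, collapse spaces."""
    return " ".join(re.sub(r"[^\w\s']", " ", text.lower()).split())

# (domain, reference transcript, model output) -- invented evaluation triples
samples = [
    ("scripted_read", "The quick brown fox jumps over the lazy dog.",
                      "the quick brown fox jumps over the lazy dog"),
    ("phone_call",    "Yeah, no, I meant the other one.",
                      "yeah no I met the other one"),
    ("earnings_call", "EBITDA margin was up two percent quarter over quarter.",
                      "a bit margin was up 2% quarter over quarter"),
]

by_domain = defaultdict(list)
for domain, ref, hyp in samples:
    by_domain[domain].append(jiwer.wer(normalize(ref), normalize(hyp)))

# A model that aces the scripted read can still fall apart on phone calls.
for domain, scores in sorted(by_domain.items()):
    print(f"{domain:14s} mean WER = {mean(scores):.2f}")
```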

  • 🔊 Reminder: “Bring your own audio” is happening tomorrow. Over the past few days, developers have been sending us the recordings that usually break speech-to-text systems. Some of them are… brutal. Noisy calls. People talking over each other. Mics that sound like they came from 2007. Tomorrow we’ll run those files through multiple STT APIs live and see what actually happens. If you submitted audio → this is where you find out how your file performs. If not, it should still be a fun one to watch. (Link in the comments)

  • Gladia reposted this

    This is crazy. We just opened up the leaderboard of our blind STT comparison tool… and Gladia is now #1. Almost 200 people submitted their audio and voted for the better of the two transcripts, based purely on quality. And the majority chose us, the smallest team in the batch, the underdog. I'll admit, I was scared to have this tool go live. Being truly transparent in the industry means holding ourselves to the same scrutiny as everyone else, and I stand by that decision. But if living in the US has taught me anything, it's to celebrate our wins more. So here I am trying to lean into that (my French roots are fighting me on this, but I'm trying). If you want to keep challenging the leaderboard, I'll link the tool in the comments.


  • The ASR leaderboard is live! 🇫🇷 ✊ Last week, we released a blind tool for you to compare STT models without bias. Now it’s time we lay our cards on the table. See how every provider performs in the aggregate leaderboard: Elo scores, rankings, and all the metrics that matter. Is it what you expected? For a proper evaluation, we still recommend reproducing the results on your own audio using our open-source methodology. (Link in the comments)

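For readers unfamiliar with Elo in this context: each blind vote is treated like a game between two providers, and ratings update after every result. A minimal sketch with the standard formula and K=32; the providers and votes are invented, and Gladia's exact scoring setup may differ.

```python
from collections import defaultdict

K = 32  # rating update step size

def expected(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the Elo model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

ratings = defaultdict(lambda: 1000.0)  # every provider starts at 1000

# Each blind vote is one (winner, loser) result -- invented data.
votes = [("provider_x", "provider_y"), ("provider_x", "provider_z"),
         ("provider_z", "provider_y"), ("provider_x", "provider_y")]

for winner, loser in votes:
    e_w = expected(ratings[winner], ratings[loser])
    delta = K * (1 - e_w)      # small gain for expected wins, large for upsets
    ratings[winner] += delta
    ratings[loser] -= delta

for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.0f}")
```
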
  • Our CTO, Maxime Gaudin, joined Davit Baghdasaryan, CEO of Krisp, for a conversation on what it actually takes to scale STT in production. They talked about the things you don’t always see: balancing latency, cost, and quality; what actually breaks at scale; and why running models in the real world is very different from training them. Thanks again, Davit and Krisp, for having us! Krisp is doing some seriously impressive work in voice AI 👏 Go check out their newsletter so you don’t miss it: https://lnkd.in/dVVx-vGQ

    Scaling STT systems | Maxime Gaudin (CTO at Gladia)

    voice-ai-newsletter.krisp.ai

  • Gladia reposted this

    Scaling STT systems isn't just a model problem. It's a scale, cost, and latency problem. In this episode with Maxime Gaudin, CTO at Gladia, we get into what breaks in production: not just models, but infrastructure, GPUs, and economics. Here's what stood out 👇
    - Winning isn't just about model quality; it's surviving brutal tradeoffs between latency, cost, and scale.
    - The real challenge is not training one great model; it's running it cheaply enough to meet market pricing without breaking performance.
    - STT is getting commoditized so fast that providers have to chase better accuracy while selling at ever-shrinking margins.
    - Big models don't matter if they are too expensive to run at scale.
    - Real-time voice AI lives or dies under a hard latency budget, and staying under 300 milliseconds leaves little room for mistakes.
    - The industry's obsession with one model that does everything may be the wrong path if smaller specialist models can outperform it in the moments that matter.
    - Every model upgrade is risky, because improving one language or task can make another worse.
    - Testing speech systems is harder than people admit: teams know something broke, but don't know what.
    - General transcription errors can be patched by an LLM, but once a name, phone number, email, or address is lost, it is gone.
    - The next edge in voice AI may come from tiny models trained for high-value details like PII, not from one giant model trying to handle everything.
    - Email addresses sound simple until real accents, pauses, corrections, and spelling cues expose how messy spoken language really is.
    - The companies that win enterprise voice AI will be the ones that orchestrate many narrow models well, not the ones chasing a single universal model.
    - Infrastructure strategy is becoming a product decision, because legal rules, traffic spikes, and customer use cases all change what “best” deployment looks like.
    - Cloud scaling breaks under real-time spikes, like emergency calls.
    - Using managed infra and a large DevOps team at the same time wastes money.
    - Customers want one vendor for everything, even if quality drops.
    - The market will reward depth over breadth if a vendor can become truly exceptional in one painful, business-critical part of the voice stack.
    If STT is becoming commoditized, does the real advantage shift to specialized models that win on PII?
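
The "hard latency budget" point is easiest to see as arithmetic. A back-of-the-envelope sketch; every stage and number below is an illustrative assumption, not a measured figure from Gladia or Krisp.

```python
BUDGET_MS = 300  # the hard real-time budget cited in the episode

# Illustrative stage costs for one streaming turn (all numbers assumed).
stages = {
    "audio capture + chunking": 40,
    "network round trip":       60,
    "feature extraction":       20,
    "acoustic model inference": 120,
    "decoding + formatting":    30,
}

used = sum(stages.values())
for stage, ms in stages.items():
    print(f"{stage:26s} {ms:4d} ms")
print(f"{'total':26s} {used:4d} ms  (headroom: {BUDGET_MS - used} ms)")
```

With these assumptions only 30 ms of headroom remains, which is why a single slow stage, a GPU queue, or a network hiccup blows the budget.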


  • How do you compare speech models without brand bias? Proper testing is the way to go. But sometimes it can also help to simply see the difference between two transcripts with your own eyes. So we built a small experiment 👀 One input audio clip, two output transcripts. You pick the winner. A blind test for speech-to-text, featuring Gladia, Deepgram, AssemblyAI, ElevenLabs, Mistral AI, and Speechmatics. Test it with any audio, in different languages. See how far it can go. (Link in the comments)
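
The mechanics behind such a blind test are simple but important: the voter sees labels, never brands, and the label-to-provider mapping is revealed only after the vote. A minimal sketch; the provider names and transcripts below are placeholders.

```python
import random

# Placeholder outputs; in the real tool these come from live STT API calls.
results = {
    "provider_x": "transcript from provider x",
    "provider_y": "transcript from provider y",
    "provider_z": "transcript from provider z",
}

# Pick two providers in random order; keep the label -> provider map hidden.
first, second = random.sample(list(results), 2)
key = {"A": first, "B": second}

print("Transcript A:", results[key["A"]])
print("Transcript B:", results[key["B"]])

vote = "A"  # the voter picks a label, never a brand
print("Winner:", key[vote])  # de-anonymized only after the vote
```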


Funding

Gladia: 2 total rounds
Last round: Series A, US$ 15.9M
See more info on Crunchbase