Together AI

Software Development

San Francisco, California 82,631 followers

Accelerate inference, model shaping, and pre-training on a research-optimized platform.

About us

Together AI is the AI Native Cloud, purpose-built for AI engineers and researchers with a full suite of tooling across inference, model shaping, and pre-training. AI natives can use Together AI as a full-stack AI platform — from a high-performance inference engine built for reliable and fast scaling to on-demand GPU clusters and massive-scale AI factories. Together AI continuously pushes the frontier forward by productizing cutting-edge research from our world-leading AI systems research team. By combining research velocity with production-grade infrastructure, we enable companies to reliably scale AI-native applications as fast as the field evolves. Trusted by leading AI natives like Cursor, Decagon, Eleven Labs, AI21, Hedra, and Cartesia, as well as SaaS innovators such as Salesforce, Zoom, and Zomato, Together AI powers the next generation of AI-native applications.

Website
https://together.ai
Industry
Software Development
Company size
201-500 employees
Headquarters
San Francisco, California
Type
Privately Held
Founded
2022
Specialties
Artificial Intelligence, Cloud Computing, LLM, Open Source, and Decentralized Computing

Locations

  • Primary

    251 Rhode Island St

    Suite 205

    San Francisco, California 94103, US



Updates

  • New from Together Research: LLMs can fix the query plans your database optimizer gets wrong, resulting in up to 4.78x faster execution.

    Cost-based optimizers fail when they miss semantic correlations. A filter that prunes 15M rows to 2.9M gets applied after a join instead of before, because the optimizer assumed independence where none existed.

    DBPlanBench exposes DataFusion's physical operator graph to an LLM, which applies targeted JSON patch edits to fix join ordering without regenerating the full plan.

    On TPC-H and TPC-DS workloads:
    → Up to 4.78x speedup on complex multi-join queries
    → 60.8% of queries improved by more than 5%
    → Build memory reduced from 3.3 GB to 411 MB on a single benchmark query
    → Plans optimized at small scale transfer directly to larger databases

    Paper and code are open-source. Link in comments.

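The targeted-patch idea described in the post can be sketched in a few lines. Everything below is an illustrative assumption: the plan schema, the field names (`children`, `filter_after_join`), and the restriction to RFC 6902 "replace" operations are invented for the sketch; DataFusion's actual physical-plan serialization and DBPlanBench's patch vocabulary will differ.

```python
# Minimal sketch of LLM-guided plan patching (hypothetical plan schema;
# DataFusion's real physical-plan representation differs).

def get_node(plan, path):
    """Resolve a JSON Pointer-style path against a nested dict/list."""
    node = plan
    for part in path.strip("/").split("/"):
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node

def apply_patch(plan, patch):
    """Apply a list of 'replace' operations in place (tiny RFC 6902 subset)."""
    for op in patch:
        assert op["op"] == "replace"
        *parent_path, key = op["path"].strip("/").split("/")
        parent = get_node(plan, "/".join(parent_path)) if parent_path else plan
        if isinstance(parent, list):
            parent[int(key)] = op["value"]
        else:
            parent[key] = op["value"]
    return plan

# Toy physical plan: a selective filter was placed *after* the join.
plan = {
    "op": "HashJoin",
    "children": [
        {"op": "Scan", "table": "lineitem"},   # 15M rows, unfiltered
        {"op": "Scan", "table": "orders"},
    ],
    "filter_after_join": "l_shipdate > '1995-03-15'",
}

# An LLM can emit a small patch pushing the filter below the join,
# instead of regenerating the whole plan:
patch = [
    {"op": "replace", "path": "/children/0",
     "value": {"op": "Filter", "expr": "l_shipdate > '1995-03-15'",
               "child": {"op": "Scan", "table": "lineitem"}}},
    {"op": "replace", "path": "/filter_after_join", "value": None},
]

apply_patch(plan, patch)
print(plan["children"][0]["op"])  # Filter now runs before the join
```

The edit is local: only two nodes change, which is what makes small, targeted patches cheaper and safer than having the model regenerate an entire operator graph.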
  • We’re excited to announce Deepgram's speech and voice models on Together AI. AI natives can now deploy Deepgram’s STT and TTS models natively on Together AI Dedicated Model Inference and run the full real-time voice stack, from transcription to reasoning to synthesis, on the AI Native Cloud.

    Deepgram’s lineup includes Flux, Nova-3, Nova-3 Multilingual, and Aura-2, built for low-latency voice agents operating in real-world production environments.

    Highlights:
    → Conversational STT built for turn-taking: Flux delivers 250ms end-of-turn detection for faster, more natural exchanges
    → Production-ready transcription and synthesis: Nova-3, Nova-3 Multilingual, and Aura-2 support noisy audio, multilingual interactions, and clear enterprise voice output
    → Enterprise infrastructure by default: dedicated workloads, 99.9% uptime SLA, zero data retention, SOC 2 Type II, HIPAA-ready support, and data residency options

    Read the announcement: https://lnkd.in/gVHwraYT

  • New from Together Research: Aurora, speculative decoding that adapts to shifting traffic in real time.

    Static draft models degrade as domains change, and offline retraining can't keep pace. Aurora fixes this: an open-source, RL-based framework that learns continuously from live inference traces without interrupting serving.

    Key results:
    → 1.25x speedup over a well-trained static speculator through online adaptation
    → Online training from scratch surpasses a carefully pretrained baseline
    → Under abrupt domain shifts, Aurora recovers quickly

    You don't need extensive offline pretraining; Aurora learns from the first requests it serves.

    Code is open-sourced. Read the blog and paper (links in comments).

  • Together AI reposted this

    Since 2023, we’ve known that Vipul Ved Prakash and his team at Together AI are building something special. As more companies across verticals adopt AI, it’s clear that generic, off-the-shelf models won’t cut it. That’s what makes Together AI so critical. They’re building a platform that enables enterprises of all shapes and sizes to pre-train their own proprietary models tailored to their individual workflows, providing the essential infrastructure that makes verticalized AI safer and more affordable.

    Today, Together AI returns to the Enterprise Tech 30 list as a late-stage company, proving its longevity and quality in a crowded, noisy AI market. The ET30, by Wing Venture Capital and Eric Newcomer, is voted on by 90+ leading investors and corporate development leaders. It recognizes the private companies with the most potential to shape the future of enterprise technology. They see what we see: the companies who leverage AI in the right way will be the ones defining the next generation of business.

    Congratulations to the entire team at Together AI on this well-deserved milestone, and to all companies honored this year. #ET30 https://lnkd.in/g--BxZP2

  • New from Together Research: small models can beat GPT-4o on long context with the right system design.

    The instinct when context windows hit 128K or 1M tokens is to throw everything into one prompt. In practice, performance degrades as length grows. Our new paper, accepted at #ICLR2026, introduces a framework to study when and why "Divide & Conquer" works, and how to design it effectively.

    The core insight: long-context failures come from three distinct noise sources:
    1/ Model noise: confusion grows superlinearly with input length
    2/ Task noise: chunks lose cross-document context
    3/ Aggregator noise: the Manager fails to stitch partial answers correctly

    Naive "MapReduce" approaches collapse on that third point. The fix is a Planner agent that rewrites the task prompt so Workers return exactly what the Manager needs.

    Results: Llama-3-70B and Qwen-72B using this framework consistently outperform GPT-4o single-shot on retrieval, QA, and summarization as context length scales. The smaller models win, and they're cheaper and faster.

    The limit: tasks with high cross-chunk dependency, where a clue on page 1 connects to page 100, still favor the single-shot approach.

    Blog, paper, and code in the comments.

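The Planner → Workers → Manager pattern can be illustrated with a toy retrieval task. All names and the chunking scheme below are invented for the sketch; in the real framework each worker and the manager would be LLM calls, and the Planner would rewrite a natural-language prompt rather than a dict.

```python
# Toy sketch of the Divide & Conquer pattern from the post. The Planner's
# effect is simulated by giving Workers a *structured* output contract
# (needle, local position) that the Manager can merge mechanically,
# which is what reduces "aggregator noise".

def chunk(text, size):
    return [text[i:i + size] for i in range(0, len(text), size)]

def worker(task, piece):
    # Planner-rewritten task: return (needle, local offset) hits,
    # not free-form prose the Manager would have to reinterpret.
    needle = task["needle"]
    return [(needle, i) for i in range(len(piece)) if piece.startswith(needle, i)]

def manager(partials, offsets):
    # Merge structured partials; chunk offsets restore global positions.
    hits = []
    for off, part in zip(offsets, partials):
        hits += [(n, off + i) for n, i in part]
    return hits

doc = "noise " * 50 + "SECRET=42 " + "noise " * 50
task = {"needle": "SECRET=42"}

pieces = chunk(doc, 64)
offsets = [i * 64 for i in range(len(pieces))]
answer = manager([worker(task, p) for p in pieces], offsets)
print(answer)  # [('SECRET=42', 300)]
```

The toy also exposes the limitation the post names: a needle straddling a chunk boundary (or an answer requiring facts from two distant chunks) is invisible to any single worker, which is exactly the high cross-chunk-dependency regime where single-shot prompting still wins.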
  • We had a big week at #NVIDIAGTC 2026. Here’s what we shipped together with NVIDIA:

    • NVIDIA Dynamo 1.0 — Dynamo is already baked into our full-stack AI platform, and this release pushes inference performance even further for teams running production workloads.
    • NVIDIA OpenShell via NemoClaw — We’re hosting the OpenShell runtime, so developers can tap into 150+ optimized models and build autonomous agents without sacrificing safety or scale.
    • NVIDIA Nemotron 3 Super — This 120B parameter hybrid MoE model (just 12B active per token) brings long-horizon reasoning and multi-agent collaboration to production, and is now available via Together Dedicated Model Inference.
    • NVIDIA Parakeet TDT 0.6B V3 — Fast, reliable transcription meets Together’s inference infrastructure. Building real-time voice agents just got a lot more straightforward.

    Read it all here: https://lnkd.in/g-kxFKR8

    Open innovation is what drives Together AI, and this is what it looks like in practice.

    #TogetherAI #NVIDIAGTC #OpenSource #AIInfrastructure #Inference

  • Together AI is heading to HumanX, and we're bringing the AI Native Cloud experience to booth #819. Stop by for live demos with our Solution Architects, customer story activations, research meet & greets, and a build-your-own hat bar and custom comic book station.

    We're also taking the stage twice:
    📅 April 7 | Dan Fu, VP of Kernels — Building for compute that doesn't exist yet (but almost does)
    📅 April 9 | Vipul Ved Prakash, CEO — Open or closed models: What AI natives & enterprises need to know

    Come find us, see what we've been building, and hear from real AI-native teams getting real results with Together AI. 👉 Booth #819. See you there: https://lnkd.in/gXZvTzky

  • #NVIDIAGTC 2026 is a wrap. Here’s what we were up to 👇

    • Sessions with Together AI Senior Director Yineng Zhang and co-founder Percy Liang
    • Deep dives with customers like Cursor & Decagon
    • A packed agenda of lightning talks at our booth
    • Announcements including availability of NVIDIA Dynamo 1.0 and NVIDIA Parakeet TDT 0.6B V3 on Together AI
    • Trivia, a hat bar, custom comic books and, of course, a meeting with Jensen

    We can't wait to see you next year!
