We just open sourced a set of agent skills for Together AI. The idea is simple: instead of copy-pasting from docs every time you need to call an API, your coding agent already knows how.

The skills cover chat completions, image/video/audio generation, fine-tuning, embeddings, batch inference, evaluations, dedicated endpoints, GPU clusters, and more. They work with Claude Code, Cursor, Codex, and Gemini CLI.

One command to install:

npx skills add togethercomputer/skills

Each skill includes reference docs, runnable Python/TypeScript scripts, and routing logic so the agent picks the right one automatically.
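The kind of call a skill wraps can be sketched directly. This is a minimal example against Together's OpenAI-compatible chat completions endpoint; the model name is illustrative, so check the model catalog before relying on it:

```python
# Sketch of the request a chat-completions skill would make for you.
# The model name below is illustrative -- check Together's model catalog.
import json
import os
import urllib.request

API_URL = "https://api.together.xyz/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> dict:
    """Assemble the JSON body for a chat completion call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

body = build_chat_request("meta-llama/Llama-3.3-70B-Instruct-Turbo",
                          "Say hello in one word.")
print(json.dumps(body, indent=2))

# With TOGETHER_API_KEY set, the same body can be POSTed directly:
api_key = os.environ.get("TOGETHER_API_KEY")
if api_key:
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

The point of the skills is that the agent already carries this request shape, so you never paste it from the docs yourself.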
Together AI
Software Development
San Francisco, California 82,621 followers
Accelerate inference, model shaping, and pre-training on a research-optimized platform.
About us
Together AI is the AI Native Cloud, purpose-built for AI engineers and researchers with a full suite of tooling across inference, model shaping, and pre-training. AI natives can use Together AI as a full-stack AI platform — from a high-performance inference engine built for reliable and fast scaling to on-demand GPU clusters and massive-scale AI factories. Together AI continuously pushes the frontier forward by productizing cutting-edge research from our world-leading AI systems research team. By combining research velocity with production-grade infrastructure, we enable companies to reliably scale AI-native applications as fast as the field evolves. Trusted by leading AI natives like Cursor, Decagon, Eleven Labs, AI21, Hedra, and Cartesia, as well as SaaS innovators such as Salesforce, Zoom, and Zomato, Together AI powers the next generation of AI-native applications.
- Website
- https://together.ai
- Industry
- Software Development
- Company size
- 201-500 employees
- Headquarters
- San Francisco, California
- Type
- Privately Held
- Founded
- 2022
- Specialties
- Artificial Intelligence, Cloud Computing, LLM, Open Source, and Decentralized Computing
Locations
- Primary
251 Rhode Island St
Suite 205
San Francisco, California 94103, US
Updates
-
New from Together Research: LLMs can fix the query plans your database optimizer gets wrong, resulting in up to 4.78x faster execution.

Cost-based optimizers fail when they miss semantic correlations. A filter that prunes 15M rows to 2.9M gets applied after a join instead of before, because the optimizer assumed independence where none existed.

DBPlanBench exposes DataFusion's physical operator graph to an LLM, which applies targeted JSON patch edits to fix join ordering without regenerating the full plan.

On TPC-H and TPC-DS workloads:
→ Up to 4.78x speedup on complex multi-join queries
→ 60.8% of queries improved by more than 5%
→ Build memory: 3.3 GB → 411 MB on a single benchmark query
→ Plans optimized at small scale transfer directly to larger databases

Paper and code are open source. Link in comments.
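The patch-based edit style can be pictured with a toy plan. Everything below is illustrative, not DataFusion's actual plan format or paths: the idea is that the LLM emits a small RFC 6902-style "replace" operation that pushes a filter below a join, instead of regenerating the whole plan.

```python
# Toy sketch of patch-style plan editing (plan shape and JSON-Pointer
# paths are hypothetical, not DataFusion's real serialization).
import copy

plan = {
    "op": "HashJoin",
    "children": [
        {"op": "Scan", "table": "orders"},      # 15M rows
        {"op": "Scan", "table": "lineitem"},
    ],
    # The filter the optimizer wrongly scheduled after the join:
    "post": [{"op": "Filter", "pred": "o_status = 'F'"}],
}

def apply_replace(doc, path, value):
    """Apply one JSON-Pointer 'replace' op (minimal subset of RFC 6902)."""
    doc = copy.deepcopy(doc)
    parts = [p for p in path.split("/") if p]
    node = doc
    for p in parts[:-1]:
        node = node[int(p)] if isinstance(node, list) else node[p]
    last = parts[-1]
    if isinstance(node, list):
        node[int(last)] = value
    else:
        node[last] = value
    return doc

# Targeted edit: wrap the orders scan in the filter so it runs first...
patched = apply_replace(
    plan, "/children/0",
    {"op": "Filter", "pred": "o_status = 'F'",
     "children": [{"op": "Scan", "table": "orders"}]},
)
# ...and clear the post-join filter slot.
patched = apply_replace(patched, "/post", [])
print(patched["children"][0]["op"])  # Filter now feeds the join
```

Two small edits versus one full plan regeneration is the whole trick: the rest of the operator graph stays byte-for-byte intact.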
-
We’re excited to announce Deepgram's speech and voice models on Together AI. AI natives can now deploy Deepgram’s STT and TTS models natively on Together AI Dedicated Model Inference and run the full real-time voice stack, from transcription to reasoning to synthesis, on the AI Native Cloud.

Deepgram’s lineup includes Flux, Nova-3, Nova-3 Multilingual, and Aura-2, built for low-latency voice agents operating in real-world production environments.

Highlights:
→ Conversational STT built for turn-taking: Flux delivers 250ms end-of-turn detection for faster, more natural exchanges
→ Production-ready transcription and synthesis: Nova-3, Nova-3 Multilingual, and Aura-2 support noisy audio, multilingual interactions, and clear enterprise voice output
→ Enterprise infrastructure by default: dedicated workloads, 99.9% uptime SLA, zero data retention, SOC 2 Type II, HIPAA-ready support, and data residency options

Read the announcement: https://lnkd.in/gVHwraYT
-
The Together AI kernels team pushes performance to the next level. The investigation left more questions than answers, but VP of Kernels Dan Fu seemed proud. If you want the full picture, read on: https://lnkd.in/gwtkFbU9
-
New from Together Research: Aurora, speculative decoding that adapts to shifting traffic in real time.

Static draft models degrade as domains change, and offline retraining can't keep pace. Aurora fixes this: an open-source, RL-based framework that learns continuously from live inference traces without interrupting serving.

Key results:
→ 1.25x speedup over a well-trained static speculator through online adaptation
→ Online training from scratch surpasses a carefully pretrained baseline
→ Under abrupt domain shifts, Aurora recovers quickly

You don't need extensive offline pretraining: Aurora learns from the first requests it serves.

Code is open source. Read the blog and paper (links in comments).
-
Together AI reposted this
Since 2023, we’ve known that Vipul Ved Prakash and his team at Together AI are building something special. As more companies across verticals adopt AI, it’s clear that generic, off-the-shelf models won’t cut it. That’s what makes Together AI so critical. They’re building a platform that enables enterprises of all shapes and sizes to pre-train their own proprietary models tailored to their individual workflows, providing the essential infrastructure that makes verticalized AI safer and more affordable. Today, Together AI returns to the Enterprise Tech 30 list as a late stage company – proving its longevity and quality in a crowded, noisy AI market. The ET30, by Wing Venture Capital and Eric Newcomer, is voted on by 90+ leading investors and corporate development leaders. It recognizes the private companies with the most potential to shape the future of enterprise technology. They see what we see – the companies who leverage AI in the right way will be the ones defining the next generation of business. Congratulations to the entire team at Together AI on this well-deserved milestone, and all companies honored this year. #ET30 https://lnkd.in/g--BxZP2
-
Open or closed — how do you choose the right AI model? For AI-native companies and enterprises, it's one of the most consequential decisions you'll make. Together CEO Vipul Ved Prakash takes the stage at HumanX with Mozilla's Mark Surman and The Wall Street Journal's Rolfe Winkler to work through what actually matters, because the model you choose now shapes what you build next. 🗓️ Thursday, April 9, 11:40 a.m.
-
New from Together Research: small models can beat GPT-4o on long context with the right system design.

The instinct when context windows hit 128K or 1M tokens is to throw everything into one prompt. In practice, performance degrades as length grows. Our new paper, accepted at #ICLR2026, introduces a framework to study when and why "Divide & Conquer" works, and how to design it effectively.

The core insight: long-context failures come from three distinct noise sources:
1/ Model noise: confusion grows superlinearly with input length
2/ Task noise: chunks lose cross-document context
3/ Aggregator noise: the Manager fails to stitch partial answers correctly

Naive "MapReduce" approaches collapse on that third point. The fix is a Planner agent that rewrites the task prompt so Workers return exactly what the Manager needs.

Results: Llama-3-70B and Qwen-72B using this framework consistently outperform GPT-4o single-shot on retrieval, QA, and summarization as context length scales. The smaller models win, and they're cheaper and faster.

The limit: tasks with high cross-chunk dependency, where a clue on page 1 connects to page 100, still favor the single-shot approach.

Blog, paper, and code in the comments.
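The Planner/Worker/Manager pattern described above can be sketched in a few lines. Structure only: `llm` stands in for any chat-completion call, and the prompts are illustrative, not the paper's:

```python
# Minimal sketch of the Divide & Conquer pattern: Planner rewrites the
# task, Workers map over chunks, Manager reduces the partial answers.
# `llm` is any callable str -> str; prompts here are illustrative.
def divide_and_conquer(document, task, llm, chunk_size=8000):
    # Planner: rewrite the task so each Worker returns exactly the
    # fields the Manager needs (this is what cuts aggregator noise).
    worker_task = llm(
        "Rewrite this task as instructions for a worker that sees only "
        f"part of the document, with an exact output format: {task}"
    )
    # Workers: map over fixed-size chunks independently.
    chunks = [document[i:i + chunk_size]
              for i in range(0, len(document), chunk_size)]
    partials = [llm(f"{worker_task}\n\nCHUNK:\n{chunk}") for chunk in chunks]
    # Manager: reduce the partial answers into one final answer.
    return llm(f"Task: {task}\nCombine these partial answers:\n"
               + "\n".join(partials))

# Demo with a stub model that just reports what it was asked to handle:
stub = lambda prompt: f"[handled {len(prompt)} chars]"
print(divide_and_conquer("lorem " * 5000, "Summarize the document.", stub))
```

The Planner call is the part naive MapReduce skips, and per the post it is the difference between partial answers the Manager can stitch and ones it cannot.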
-
We had a big week at #NVIDIAGTC 2026. Here’s what we shipped together with NVIDIA:

• NVIDIA Dynamo 1.0: Dynamo is already baked into our full-stack AI platform, and this release pushes inference performance even further for teams running production workloads.
• NVIDIA OpenShell via NemoClaw: We’re hosting the OpenShell runtime, so developers can tap into 150+ optimized models and build autonomous agents without sacrificing safety or scale.
• NVIDIA Nemotron 3 Super: This 120B parameter hybrid MoE model (just 12B active per token) brings long-horizon reasoning and multi-agent collaboration to production, and is now available via Together Dedicated Model Inference.
• NVIDIA Parakeet TDT 0.6B V3: Fast, reliable transcription meets Together’s inference infrastructure. Building real-time voice agents just got a lot more straightforward.

Read it all here: https://lnkd.in/g-kxFKR8

Open innovation is what drives Together AI, and this is what it looks like in practice.

#TogetherAI #NVIDIAGTC #OpenSource #AIInfrastructure #Inference
-
Together AI is heading to HumanX, and we're bringing the AI Native Cloud experience to booth #819. Stop by for live demos with our Solution Architects, customer story activations, research meet & greets, and a build-your-own hat bar and custom comic book station.

We're also taking the stage twice:
📅 April 7 | Dan Fu, VP of Kernels: Building for compute that doesn't exist yet (but almost does)
📅 April 9 | Vipul Ved Prakash, CEO: Open or closed models: What AI natives & enterprises need to know

Come find us, see what we've been building, and hear from real AI-native teams getting real results with Together AI. 👉 Booth #819. See you there: https://lnkd.in/gXZvTzky