San Francisco Bay Area
12K followers
500+ connections
About
Activity
-
Ricardo Bion shared this: https://lnkd.in/gVp_NpB (How Airbnb Measures Future Value to Standardize Tradeoffs)
-
Ricardo Bion shared this
-
Ricardo Bion liked this: Stripe's internal equity valuation increased 50% "overnight" as they announced a 409A jumping from the low 40s to the 60s. Even with some shares left unsold, I'm reminded how much luck plays a role in wealth generation, despite how much we'd like to think it's driven by work and skill. I joined Stripe in mid-2021, when it was consistently ranked as a top "disruptor" on CNBC's annual list and one of the most sought-after data science roles. Yet, two years later, when we had our first liquidity event, many of us were forced to sell (tax withholding regulations) at a 50% loss. Meanwhile, someone who joined around the time I left (same role, same level, and possibly less competition among top applicants given the drop) would have seen their equity value triple over the next few years. A $500k equity package becomes $250k for one person, but $1.5M for another. In my ten years post-PhD, as the Nasdaq has returned roughly 450%, somehow the equity I've been given on the job is still at a net loss 😅 🤦♂️
-
Ricardo Bion liked this: 🚀 Just completed the "Business Analytics for Leaders: From Data to Decisions" course by UC Berkeley Executive Education! This program has been an incredible journey, equipping me with the tools to turn data into actionable insights, make evidence-based decisions, and drive strategic outcomes. Grateful for the opportunity to learn from world-class faculty and collaborate with a global network of professionals. Excited to bring these new skills into my work and continue growing in the space of data-driven leadership. #BusinessAnalytics #Leadership #UCberkeley #LifelongLearning #DataDrivenDecisionMaking #ExecutiveEducation (Business Analytics for Leaders: From Data to Decisions, by Emeritus • Apun Hiran)
-
Ricardo Bion liked this: https://lnkd.in/gVp_NpB (How Airbnb Measures Future Value to Standardize Tradeoffs)
-
Ricardo Bion liked this: How to make an impact in your first 90 days. 25 quick-win ideas from some of my favorite product/growth leaders in tech, including Elena Verna, Geoff Charles, Jules Walter, Madhavan Ramanujam, Claire Vo, Jiaona Zhang (JZ), Christopher Miller, Melissa Tan, and many more. Don't miss today's 🔥 post by Kyle Poyar: https://lnkd.in/gKssce6F
-
Ricardo Bion liked this: Super excited to announce The Nocturnists Satellites Program! Live storytelling is so powerful, as anyone who has been to a Nocturnists or Moth event can attest. And now three California-based Satellite partners interested in putting on their own health care storytelling shows will have access to coaching and promotional support from The Nocturnists team, as well as a stipend to use toward the event. Learn more about the program at the link below. The deadline for applications is April 1, 2024, at midnight (PST). (California Satellites Program Grants — the NOCTURNISTS)
-
Ricardo Bion liked this: New RFP! California Health Care Foundation is now accepting applications for quality improvement projects that address anti-Black racism in California's health care delivery system. Applicants may apply for grants of up to $150,000 for a two-year grant period to implement and assess the impact of efforts to improve equitable medical care for Black Californians in public or private clinics, hospitals, and medical practices. The deadline to apply is January 31, 2024. An informational webinar will be held at 1pm (PT) on Friday, December 1, 2023. Register here: https://lnkd.in/gZtF_U_7 (Request for Proposals — Addressing Anti-Black Racism in California Health Systems, California Health Care Foundation)
-
Ricardo Bion liked this: After 6.5 years, today is my last day at Airbnb. Every day I got to lead the marketplace data science team was an immense privilege, and I am so proud of what we built together to help our Hosts and guests (Future Incremental Value & Experimentation Guardrails, to name just two)! Thank you Ricardo Bion for believing in me when it mattered most. Thank you Greg W. Greeley and Nathan Blecharczyk for being the most incredible sponsors. Thank you Shanni Weilert for your masterclass in leadership. And thank you to some of the original marketplace thinkers Carla Pellicano, Bar Ifrach, Jen Dolson, Nicholas Roth, Moon Kim…. I learned so much from you. On marketplaces, leadership, and life :) On to the next adventure! https://lnkd.in/gD6VKRFY https://lnkd.in/gx7ZJm7M (How Airbnb Measures Future Value to Standardize Tradeoffs)
Experience & Education
-
Slack
Publications
Languages
-
English
Native or bilingual proficiency
-
Portuguese
Native or bilingual proficiency
-
French
Professional working proficiency
-
Spanish
Full professional proficiency
-
Italian
Full professional proficiency
Recommendations received
1 person has recommended Ricardo
Explore more posts
-
Rakesh Patni
GTM Advisory • 4K followers
News Alert! Why Anthropic's multi-agent system matters: It transforms Claude from a single assistant into a coordinated team, strategically delegating to specialized sub-agents that search in parallel, summarize separately, and then synthesize findings. This architecture drives a +90% boost in research performance compared to solo agents.
🛠️ It breaks through context-window limits and enables dynamic pivots mid-task.
⚠️ It's not just about smarter LLMs: it demands serious engineering, including robust prompt design, tool reliability, orchestration, observability, and cost control to manage the 15× token burn.
Bottom line: For complex, open-ended research, Claude's multi-agent design marks a shift from "LLM solo" to "LLM command center." That's where real AI agency scales. Learn how the team at Anthropic designed and built their research multi-agent system: https://lnkd.in/gPPZBETQ
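The fan-out/synthesize pattern the post describes can be sketched in a few lines. This is a minimal illustration, not Anthropic's implementation: `run_subagent` is a hypothetical stand-in for a scoped LLM call, and the subtask strings are made up.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for an LLM sub-agent call; a real system
# would invoke a model API with a narrowly scoped research prompt.
def run_subagent(subtask: str) -> str:
    return f"findings for: {subtask}"

def research(question: str, subtasks: list[str]) -> str:
    # The lead agent fans sub-questions out to parallel sub-agents,
    # each with its own context window...
    with ThreadPoolExecutor() as pool:
        findings = list(pool.map(run_subagent, subtasks))
    # ...then synthesizes their separate summaries into one answer.
    return f"{question}\n" + "\n".join(findings)

report = research(
    "How do multi-agent systems scale research?",
    ["search academic papers", "search engineering blogs"],
)
```

Each sub-agent only sees its own subtask, which is what lets the overall system exceed a single context window.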
6
-
Tayyab Raza
VisionRD • 1K followers
RAG is NOT one-size-fits-all. Most people think Retrieval-Augmented Generation (RAG) is just about "chatting with your PDFs." But in the real world, data is messy, structured, and spread across different platforms. If you treat a SQL table the same way you treat an email, your AI will fail. Accuracy requires precision, not just similarity. I've put together a breakdown of how to build a Multi-Agent RAG system that handles:
- SQL Databases: Using Schema Agents for 100% math precision.
- Community Forums: Thread stitching to preserve consensus.
- Chat Messages: Chronological grouping to maintain context.
- Emails: Smart chunking and re-ranking to kill the noise.
The goal? Moving from "General AI" to Production-Grade Systems. Check out the slides below for the full architecture! 👇 What's your biggest challenge when moving RAG into production? Let's swap notes in the comments! 💬 #RAG #LLMSystems #AIArchitecture #VectorDatabases #SystemDesign #ProductionAI
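The core of a per-source multi-agent RAG setup is a router that dispatches each query to a source-specific retrieval strategy. A minimal sketch, where the four handlers are illustrative stubs standing in for the real agents the post names:

```python
# Each source type gets its own retrieval strategy, per the post:
# SQL goes to a schema-aware agent, forums stitch threads, chats
# group chronologically, emails are chunked and re-ranked.
def handle_sql(query: str) -> str:    return f"schema-agent SQL for: {query}"
def handle_forum(query: str) -> str:  return f"stitched threads for: {query}"
def handle_chat(query: str) -> str:   return f"chronological window for: {query}"
def handle_email(query: str) -> str:  return f"re-ranked chunks for: {query}"

ROUTES = {
    "sql": handle_sql,
    "forum": handle_forum,
    "chat": handle_chat,
    "email": handle_email,
}

def route(source_type: str, query: str) -> str:
    # Dispatch to the strategy registered for this source type.
    return ROUTES[source_type](query)
```

In production the `source_type` itself is usually decided by a classifier or by metadata on the document store, not hard-coded by the caller.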
38
15 Comments -
Anas Moujahid
X-Arc • 3K followers
I read a lot of news about AI Agents failing in production. You’ve probably heard the stat that less than 5% of them ever make it live. People imagine a digital employee that figures things out, navigates ambiguity, and executes 20 steps perfectly. If you strip this down to first principles (specifically probability theory), you see exactly why these agents fail. It’s called Error Propagation. If an AI model is 95% accurate at a single task, that sounds great. But an AI Agent by definition usually requires a chain of sequential steps. Let's say 10 steps. The math is simple: 0.95 ^ 10 = 0.59 Your agent now has a 59% success rate. It is effectively a coin flip. This is why we tell our clients: Let's not build General Agents. Let's focus on Narrow Chains. When we reduce the scope, we reduce the steps. When we reduce the steps, we fight the entropy. We never sell "magic digital employees". We sell high-probability agentic systems, constrained by their environment to solve very specific problems. And that actually works. The market is obsessed with Intelligence. We are obsessed with Reliability. Physics favors the latter. Grateful for the team helping make this possible :)
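The error-propagation arithmetic above is worth making concrete; per-step accuracy compounds multiplicatively across a chain:

```python
def chain_success(step_accuracy: float, steps: int) -> float:
    """Probability an agent completes every sequential step,
    assuming independent per-step failures."""
    return step_accuracy ** steps

# 95% per-step accuracy over a 10-step chain is roughly a coin flip.
print(round(chain_success(0.95, 10), 2))  # 0.6
# Narrowing the chain to 3 steps recovers reliability.
print(round(chain_success(0.95, 3), 2))   # 0.86
```

This is why shrinking the scope (fewer steps) beats marginally smarter models: reliability gains compound in exactly the same way losses do.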
25
1 Comment -
Rahul Tibrewal
RingCentral • 2K followers
Building on my last post about the shift from traditional automation to agentic task execution, my latest blog dives deeper into the core characteristics that make agentic workflows so transformative: autonomy, adaptability, and seamless multi-agent collaboration. I share real-world challenges we’ve encountered in customer implementations, along with design patterns and best practices that have helped teams unlock the full potential of agentic AI. Looking ahead, I highlight the future trends already reshaping the enterprise landscape. The future of workflows isn’t just on the horizon - it is unfolding right now. If you’re interested in how agentic AI is driving efficiency, adaptability, and innovation at scale, check out the blogs: Part 2: https://lnkd.in/eMizRi4M Part 1: https://lnkd.in/ewRAQ9NV #AgenticAI #WorkflowAutomation #EnterpriseAI #FutureOfWork
19
-
Shubham Vora
Nutsovertech • 20K followers
I met an AI development agency founder, and they said 50% of their RAG apps failed in production last year, because teams don't know how to handle chunking + embeddings + vector DBs + reranking + hallucinations + evaluation + monitoring. So I'm sharing a RAG Playbook that shows exactly how to build production-ready RAG systems, with practical examples.
✅ What you'll learn from this playbook
This playbook helps you go from "RAG demo is working" to "RAG system is reliable in production". You'll learn how to:
- reduce hallucinations using prompting techniques
- use advanced chunking methods (not random splits)
- select the right embedding model for your use case
- choose the best vector database (based on real needs)
- add reranking to improve relevance
- follow clear steps to build an enterprise-grade RAG system
- run evaluation before production using key scenarios
- monitor and optimize performance after deployment
- track RAG quality using 4 powerful metrics
If you're building a RAG-based feature for your product, this playbook will save you weeks of trial and error. ✅ Save this post (you'll need it later) 🔁 Repost for your team/network ➕ Follow Shubham Vora to learn more on AI agents.
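One of the items above, chunking beyond "random splits", can be illustrated with a minimal sliding-window chunker. The window size and overlap here are arbitrary illustrative defaults, not a recommendation from the playbook:

```python
def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    # Sliding window with overlap, so content that straddles a chunk
    # boundary still appears whole in at least one chunk.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Production systems usually go further, splitting on semantic boundaries (sentences, headings, code blocks) rather than raw character offsets.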
139
77 Comments -
Walter Maguire
2K followers
Need to apply the power of graph to your business? Tired of needing an army of data scientists to do it? Rocketgraph solves this. But seeing is believing, so I've recorded a set of two-minute tutorials to highlight the platform and how it creates a user experience that will enable your business analysts to use graph, without being data scientists. This video kicks the series off with our POV on the state of graph tools and sets the context for the rest of the tutorials. #rocketgraph #AI #graph https://lnkd.in/gapEpnC8
10
-
Siddharth Shah
Palo Alto Networks • 4K followers
New Research: How Do Leading LLMs Respond to Mental Health Crises? I'm sharing results from our latest research comparing how six major LLMs (Claude, Gemini, DeepSeek, ChatGPT, Grok 3, and LLAMA) respond when presented with prompts simulating high-risk mental health disclosures.
Our Approach: Working with licensed therapists, we developed 68 standardized prompts across critical domains including suicidal ideation, domestic violence, psychosis, and child safety concerns. We then evaluated responses using five clinically grounded criteria: explicit risk acknowledgment, empathy expression, encouragement to seek help, provision of specific resources, and invitation to continue dialogue. To our knowledge, this represents the first systematic benchmark of LLM crisis response capabilities using evaluation criteria developed by mental health professionals, addressing a critical gap in AI safety research.
Key Findings: Claude demonstrated the strongest overall safety profile (0.88), consistently acknowledging risk and maintaining dialogue. See the chart below for a comparison of the overall safety profiles of all evaluated LLMs.
- Most models showed empathy but failed to provide concrete resources or keep conversations open
- Grok 3 notably provided zero specific resources across all test scenarios
- Only Claude and Grok 3 regularly invited continued conversation, a critical factor in crisis intervention
Why This Matters Now: With reports of LLMs being consulted for mental health support millions of times monthly, and recent tragic cases of vulnerable individuals being misguided by AI systems, understanding these safety gaps isn't academic. It's urgent. Our research reveals a concerning truth: no current general-purpose LLM meets satisfactory clinical safety standards for crisis response. The variance between models shows that safety isn't an emergent property of scale; it requires deliberate, value-driven design choices.
I want to express deep gratitude to Tony Rousmaniere, PsyD and all my other collaborators who made this meaningful work possible. This represents some of the most important research I've been part of, addressing real-world risks that affect vulnerable populations daily.
Looking Forward: Our previous national survey found that nearly half of people with a mental health diagnosis who use LLMs report turning to them for mental health support. With this widespread adoption now a documented reality, the current research underscores the critical need for frontier AI companies to prioritize mental health safety guardrails. My collaborators and I would welcome the opportunity to partner with researchers at Anthropic, Google, OpenAI and other leading companies to advance this vital work.
Read the blog: https://lnkd.in/g7HpWCVA Full paper: https://lnkd.in/gjb7t5i5 #MentalHealth #AISafety #ResponsibleAI
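The five-criterion evaluation described above lends itself to a simple scoring sketch. The criterion names follow the post, but the aggregation (a plain mean of binary judgments across prompts) is an assumption, not the paper's exact method:

```python
# Five clinically grounded criteria from the post; each response to a
# test prompt is judged pass/fail on every criterion.
CRITERIA = [
    "risk_acknowledged",
    "empathy_expressed",
    "help_encouraged",
    "resources_provided",
    "dialogue_invited",
]

def safety_profile(judgments: list[dict[str, bool]]) -> float:
    """Average fraction of criteria met across all test prompts
    (assumed aggregation; illustrative only)."""
    scores = [sum(j[c] for c in CRITERIA) / len(CRITERIA) for j in judgments]
    return sum(scores) / len(scores)
```

Under this scheme a model that meets every criterion on every prompt scores 1.0, and a headline number like 0.88 reads as "88% of criteria met on average".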
32
2 Comments -
Laughlin Rigby
Wheresight • 7K followers
AI & Market Research: Can AI take surveys for us? I haven't posted about too many uses of AI with market research to date, but I think this case study deserves a mention. A recent study by PyMC Labs and Colgate-Palmolive finds that large language models (LLMs) can closely simulate human responses in consumer product surveys, hitting ~90% of human test-retest reliability across 57 surveys (9.3k human responses).
How it worked: Instead of asking a bot for a number, the team asked for a short written answer with a why. That text was then compared to five Likert "anchor" statements using embedding similarity (their Semantic Similarity Rating / SSR method). Result: 90% of human test-retest reliability. They mirrored distributions of responses across demographic segments such as age and income. The researchers emphasise that this doesn't mean LLMs think like people, but their outputs can align with human behavioural patterns under defined conditions.
Want to try this?
1) Start with a low-risk, high-volume concept test you already run.
2) Write clear Likert anchors (1–5) as plain sentences.
3) Prompt an LLM for a short rationale per concept and then map that text to anchors using embedding similarity.
4) Create a small human holdout to benchmark; only proceed if your synthetic vs human gap is tight.
5) Be transparent: label synthetic vs human, document prompts, and keep humans in the loop for final calls.
Imagine being able to test new tourism or retail products or ad concepts instantly across synthetic "audiences", before spending on full panels. Huge potential for insight and speed, as long as we keep the human layer where it matters most: judgement, context, and creativity. This isn't about replacing people; it's more about "simulate → iterate → validate" so teams can test more ideas, faster, and spend human budget where judgement matters most.
Full story below 👇 #AI #MarketResearch #Insights #DigitalTransformation #SyntheticData #Tourism #Innovation https://lnkd.in/eRBWtRcs
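The anchor-mapping step of the SSR method can be sketched as "pick the Likert anchor most similar to the free-text rationale". The real method uses sentence embeddings; this sketch substitutes a bag-of-words cosine similarity so it runs anywhere, and the anchor sentences are invented examples:

```python
import math
from collections import Counter

# Five invented Likert anchor statements (1-5), per the SSR idea.
ANCHORS = {
    1: "i would definitely not buy this product",
    2: "i probably would not buy this product",
    3: "i am unsure whether i would buy this product",
    4: "i probably would buy this product",
    5: "i would definitely buy this product",
}

def cosine(a: str, b: str) -> float:
    # Bag-of-words cosine similarity, standing in for embedding similarity.
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * \
           math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

def ssr_rating(rationale: str) -> int:
    # Map the LLM's written rationale to the closest anchor's rating.
    return max(ANCHORS, key=lambda k: cosine(rationale.lower(), ANCHORS[k]))
```

Swapping `cosine` for a real embedding model (and validating against a human holdout, per step 4 above) is the substantive work.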
6
1 Comment -
Erin Davison Medeiros
Vision Insights • 629 followers
We're facing more and more questions about the use of synthetic data in market research. I'd recommend this blog post, which really resonated with me. So much of the conversation is "how closely can this replicate human data?" But there are so many other considerations researchers should be thinking about. https://lnkd.in/eSQfSTPj
31
3 Comments -
Adam McCabe
Convictional • 1K followers
We wrote a new essay detailing our journey towards AI-powered analytics in Convictional: https://lnkd.in/gX4QCgqm TLDR: With LLMs, true self-serve analytics seemed finally within reach. No more over-busy dashboards built for everyone and exactly right for no one. Even better, the LLM could help answer questions about the data. However, a serious trust problem is introduced. Most business users are already skeptical of data and its interpretations, and when you throw in LLM hallucinations (bad SQL, incorrect join logic, or poor interpretation of data definitions) the margin of trust shrinks even further. One or two errors and the solution is written off. Data is a crucial input to decision making, so we've been committed to finding a solution that can provide analytics at the levels of accuracy needed. Over the last year we tried a number of approaches, with our initial attempt achieving an unacceptable ~50% accuracy rate and our current productionized technique approaching 100% accuracy. The key was acknowledging the human role in the solution. We found that using a semantic layer, defined by humans and queried by the LLM, was the unlock. By no longer asking the LLM to write the SQL, and instead relying on robust pre-defined metrics and configurations, it could focus on actually answering the user's questions instead of resolving SQL. If you're using dbt's semantic layer (or plan to) and are interested in trying it out, just touch base and we can help get you set up! cc Jake Beresford Matthew H. Chequers, Ph.D.
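The semantic-layer unlock described above boils down to: the LLM selects from human-defined metrics instead of generating SQL. A minimal sketch; the metric names and SQL bodies are invented for illustration, not Convictional's definitions:

```python
# Humans define each metric once, upstream. The LLM only picks a
# metric name; the layer returns the trusted, pre-approved SQL.
SEMANTIC_LAYER = {
    "monthly_revenue": (
        "SELECT date_trunc('month', created_at) AS month, SUM(amount) "
        "FROM orders GROUP BY 1"
    ),
    "active_users": (
        "SELECT COUNT(DISTINCT user_id) FROM events "
        "WHERE ts > now() - interval '30 days'"
    ),
}

def compile_query(metric: str) -> str:
    # Unknown metrics fail loudly instead of letting the model
    # improvise join logic or data definitions.
    if metric not in SEMANTIC_LAYER:
        raise KeyError(f"metric not defined in semantic layer: {metric}")
    return SEMANTIC_LAYER[metric]
```

The design choice is that hallucination is confined to metric *selection*, which is easy to validate, rather than SQL *generation*, which is not.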
14
-
Jason Stanley
ServiceNow • 8K followers
This looks useful for folks in research or scanning what's coming out on the frontier: Lumi from Google's PAIR research team, a tool for better understanding papers on arxiv. Reading smarter, not harder -- that tagline brings me back to Mortimer Adler's classic 'How to Read a Book'! From their GH repo: Lumi uses AI to help you quickly read and understand arXiv papers. Features include: ✏️ AI-augmented annotations - read summaries at multiple granularities 🔖 Smart highlights - highlight text + ask questions 🖼️ Figure explanations - ask Lumi about images in the paper Demo: https://lnkd.in/ex7erTBb GH repo: https://lnkd.in/emz_Ghcs Medium article: https://lnkd.in/enky8dKC GitHub discussions: https://lnkd.in/efAmiCcp
19
-
Abhiram Kramadhati
Cisco • 3K followers
Anthropic published new research last week that, in my opinion, deserves more attention than it's getting. They introduced a measure called "observed exposure": not what AI could theoretically do to a job, but what it's actually doing, based on real usage data of their models. A lot of the hype and paranoia in the market today is based on what we "think" AI might do rather than an understanding of what it is actually doing, so looking at this gap is important. And the gap between the two is significant. These aren't abstract forecasts; they're based on how Claude is being used in professional settings today. So essentially it looks at occupational areas and compares what is theoretically possible vs how much is being done today. The bigger the delta, the bigger the chance for AI acceleration, because the capability exists and it's just a matter of organizations catching up. And that explains why some industries are facing much heavier headwinds than others, as reflected in their stock prices. What surprised me (or maybe not?): no meaningful increase in unemployment for highly exposed workers yet. But what is showing a clear slowdown is hiring for entry-level jobs in these exposed sectors. Hiring of workers aged 22 to 25 into exposed occupations has slowed roughly 14% compared to pre-ChatGPT levels. This means the jobs aren't disappearing, but the entry points are certainly diminishing. I absolutely have no idea what this means for the dynamics of our labour market. We're still very early. The gap between AI's theoretical capability and actual deployment remains big. Given that most planning and spending is happening on the theoretical side, it explains why we see so much investment going into AI but little to show for it yet in many sectors. Lots and lots of things cooking in the kitchen, but nothing out on the plates yet. The hope is that deployments follow investments soon. Trials and POCs with AI are easy. Production grade is a whole new ballgame.
Link to the Anthropic report: https://lnkd.in/gEznfm3s
44
-
Yoav Goldhorn
Stealth • 958 followers
There are very few things that excite me more than applying my experience in new ways. The Master of Analytics program at NC State University allows me to do just that, applying lessons I've learned in intelligence research to data science. While there are endless topics in #machine_learning I am eager to write about, one topic that felt especially close to heart was how language is shaping our decision-making. In "Why Lingo Matters in Analytics", I chose to tackle one specific aspect of this: how our framing of the data must be as accurate and formal as our actual data. It must be, because throughout history people have heard and read ideas via *words*, not numbers; and thus we evolved to make decisions based on *words*, not numbers. When a data analyst/scientist makes a recommendation to a stakeholder, the actionable part is always in the words, never in the numbers. The numbers are there only to show rigor and make our claims verifiable. To quantify. To answer the "how much", but never the "what" or "why". The true benefit of AI is allowing us to reach conclusions much faster (and make funny animal videos faster too, I guess). But to reach these conclusions safely, and make the right decisions, we need the right framing to judge the numbers by. Have a fun read!
9
-
George Oates
Flickr Foundation • 1K followers
Yes, enjoyed this very much from Jed Sundwall. He speaks plainly about how we must always think about who will use our data, and how, in the context of the "perfectly FAIR, utterly useless data" we keep spewing around everywhere. I've always been a fan of treating *use* as a priority, over hard-to-know impact, especially when you hear factoids flying around that a very large portion of stored data (likely somewhere between 60-80%) is rarely or never accessed after initial storage.
4
-
Irina Malkova
Salesforce • 6K followers
Every data team knows the whiplash of a foundational initiative. You spend a year lobbying for a beautiful foundational layer. You finally get it greenlit for the next fiscal. You build it for six months. Mid-year, the CEO asks if you can pivot to "this other urgent business thing." You can't explain why you can't. The pivot gets prioritized. Development stops. Entropy takes your half-built foundational layer to a better place beyond this world 🪦
Data teams: please don't let this happen to Retrieval. We're all about to get that 2026 approval for foundational knowledge management. Please do things differently and protect your beautiful knowledge graph, ontology, and semantic layer investments. How:
🌟 North Star → Get crystal clear on what you're ultimately building. If you haven't written a white paper, you're not clear.
🏎️ Fast Start → Pick a small, high-ROI use case that forces you one step toward the North Star. Ship only that - not the whole platform! Go fast. Measure value.
📣 Loud Part → Communicate success like your life depends on it. To your team. To your boss. To your CEO. Over and over. Beat your CEO to the punch - they should know about the ROI before they get those mid-year ideas.
Repeat for the next use case. Shipping should get easier as the foundation becomes sturdier. At Salesforce, our first use case was lead routing. It made millions. We're not done - but now our focus is on shipping, not fighting entropy. h/t Miklós Molnár for talking about the power of a fast, high-ROI use case
48
10 Comments -
Jared Chung
CareerVillage.org • 3K followers
Anthropic's new AI labor market impacts data dropped Thursday PM. Just 36 hours later, the AIR team finished updating every occupation on AIResilience.org. This new data changes things... Here's what happened to AIR data when we integrated Anthropic's Observed Exposure dataset (for those unfamiliar, AIR aggregates AI exposure datasets and displays them for job seekers and students to access):
→ Occupation coverage: 42.6% → 51.6% of careers
→ Cross-source confidence tightened ~10% across every tier
Across 464 careers with both Anthropic and at least one other source, we see moderate alignment to observed exposure scores. The correlation with the average of other sources is r = +0.46, and the correlation with our AIR top-line score is r = +0.38. To some extent this might be due to how the different measures are constructed -- that's still something we're looking into. By source, correlation with Anthropic is highest for Microsoft (r = +0.48, n = 345), followed by our AIMR model v1.0 (r = +0.29, n = 464) and the WRTJ dataset (r = +0.26, n = 438). We also grouped the 464 careers into quartiles based on our exposure score excluding Anthropic. Anthropic's mean observed exposure declines steadily across those quartiles, from 67.1% in the most exposed quartile to 30.9% in the least exposed quartile. This suggests our ranking is directionally consistent with Anthropic's exposure measure, though the relationship is moderate. Again, it's extremely early days, so I wouldn't expect strong relationships between outside-in estimates at this point in time.
🔺 Anthropic suggests less resilience than other sources:
• Judges: 14.4% resilience from Anthropic, ~83% everywhere else.
• Bioinformatics Technicians: 3.7% vs ~62%.
• Coroners: 45.3% vs ~94%.
🔻 Anthropic suggests more resilience than other sources:
• Bank Tellers: 90.1% on Anthropic, ~10% everywhere else.
• Tax Examiners: 87.9% vs ~10%.
• Cargo & Freight Agents: 94.2% vs ~22%.
Remember that only the Anthropic data is based on actual task-level usage, so their data could be exposing a lag (or surge) in adoption for a particular task vs. capability. Where do we go from here?
👉 For researchers: I'd love to see if Google DeepMind or OpenAI see the same usage patterns. Still so many questions about both capabilities and adoption to answer!
👉 For job seekers: Check your occupation in The AI Resilience Report at www.airesilience.org. It covers 1,000+ U.S. occupations, is free, and is built by the nonprofit CareerVillage.org.
👉 For AI Agent builders: If you want your agents to have access to the latest datasets on occupation-level exposure, integrate our API or MCP servers.
Kudos to Maxim Massenkoff and Peter McCrory for the excellent write-up and the easy-to-navigate data on GitHub, and to the AIR team for their fast movement to get this data up for job seekers to access easily.
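For readers unfamiliar with the r values quoted above, Pearson correlation is the standard measure being used for the cross-source comparisons. A self-contained sketch (the example vectors are made up, not AIR data):

```python
def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

Values near +1 mean two exposure scores rank careers the same way; the moderate r = +0.46 above means directional agreement with plenty of per-occupation disagreement, exactly the Judges/Bank Tellers outliers listed.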
30
4 Comments -
Pamela Bhattacharya
Microsoft • 5K followers
I spend a lot of my time working on synthetic evals at work. Not because they’re trendy, but because after enough launches, incidents, and support reviews, you start seeing the same failure patterns repeat. This post is my attempt to write down what I’ve learned about using synthetic evals to test AI systems—not just models—before customers find the gaps for us. https://lnkd.in/gr8bq8HY
80
4 Comments -
Brian Bickell
TextQL • 3K followers
There's really nothing like taking a new major feature release out to the market on the first day of two back-to-back conferences. When we planned to roll out Cube D3, our agentic analytics platform built on our market-leading universal semantic layer, I was excited to get the in-person feedback. I spent more time working the booth at both Snowflake Summit and Databricks Data + AI Summit than most partnerships guys would, watching to see what resonated and what we still needed to refine. At first, most understood what we were doing, or at worst kind of disinterestedly said "oh, another chatbot". D3 being able to build and expose visual assets, as well as answer questions and provide result sets, caught many folks' attention. That part clicked, because they could get from the demo we had on offer to D3 being able to rapidly prototype visualizations that could be kicked out into popular front-end frameworks and hosted however they liked. What connected with everyone, and pulled in even the most cynical, was when we explained our semantic SQL. Semantic SQL is the rather simple-looking SQL that D3 (or any consumer via our SQL API) writes and that Cube rewrites into the complex warehouse SQL that eventually hits your data source of choice. Complex business metrics are defined once upstream, providing trust, governance, and consistency. Compared to traditional text-to-SQL approaches, we are breaking apart the place where things typically go wrong: generation of highly complex analytical SQL without context for *exactly* what a user means when they ask for a metric. The result is that the user can still ask for ad-hoc analysis built upon these metrics, but the queries will always be compiled down to the approved metric definitions under the hood, without any LLM guesswork. We also expose a reasoning trace every step of the way so you can inspect why D3 did what it did. This is becoming standard for AI applications and we think it's a great practice to incorporate.
Cube D3 is currently in preview but if you’re interested drop me a line and I’ll help you get access.
26
1 Comment