Lambda is cheap… until it runs 20 million times per hour.

We analyzed 400+ serverless workloads. 60% had NO cost guardrails.

Serverless is supposed to be cheap. But in reality, it's the fastest way to burn cash invisibly if you don't set guardrails from day one.

Here's what I saw:
🚩 No concurrency limits. One bad loop → 3,000 concurrent Lambdas → instant spike → ops team wakes up confused.
🚩 Event storms with zero throttling. SNS, SQS, DynamoDB Streams triggering like crazy, with no retry logic and no backpressure.
🚩 No budget alerts. No anomaly detection. People trust the cloud bill after the fire, not before.
🚩 Over-reliance on default memory configs. 2GB Lambdas running a 150MB function, 1000x a minute. Multiply that by 30 regions.
🚩 Huge cold start waste. Dev teams chasing speed, but paying for idle spin-ups in traffic patterns they never profiled.

I think the worst part is that most teams had no visibility into which invocation patterns were driving 80% of cost. They just assumed "it's serverless, it must be optimized."

Here's the framework we now apply to every client before scaling serverless to prod.

𝗧𝗵𝗲 𝗦𝗲𝗿𝘃𝗲𝗿𝗹𝗲𝘀𝘀 𝗖𝗼𝘀𝘁 𝗚𝘂𝗮𝗿𝗱𝗿𝗮𝗶𝗹𝘀 𝗖𝗵𝗲𝗰𝗸𝗹𝗶𝘀𝘁:
✅ Set concurrency caps for all Lambdas
✅ Define sane retry/backoff policies on event sources
✅ Profile and right-size memory + duration (don't guess)
✅ Use real-time cost anomaly detection (per function)
✅ Tag all workloads with ownership + purpose (for chargeback clarity)

Serverless doesn't need to be expensive. But it will be if you treat it like a free lunch.
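The "profile and right-size" item is where most of the money hides, and a back-of-the-envelope cost model makes the checklist concrete. A small Python sketch: the pricing constants are illustrative approximations of public AWS rates (check the current pricing page before relying on them), and the workload numbers are hypothetical:

```python
# Back-of-the-envelope Lambda cost model. Pricing constants are illustrative
# approximations of public AWS rates; check the current pricing page.
PRICE_PER_GB_SECOND = 0.0000166667   # assumed compute price (x86)
PRICE_PER_MILLION_REQUESTS = 0.20    # assumed request price

def monthly_lambda_cost(invocations: int, avg_duration_ms: float,
                        memory_mb: int) -> float:
    """Estimate one function's monthly cost: compute (GB-seconds) + requests."""
    gb_seconds = invocations * (avg_duration_ms / 1000.0) * (memory_mb / 1024.0)
    return (gb_seconds * PRICE_PER_GB_SECOND
            + invocations / 1_000_000 * PRICE_PER_MILLION_REQUESTS)

# Hypothetical workload: 1000 invocations/minute, 150 ms average duration.
invocations = 1000 * 60 * 24 * 30    # ~43.2M invocations per month
oversized = monthly_lambda_cost(invocations, 150, 2048)   # default-ish 2 GB
right_sized = monthly_lambda_cost(invocations, 150, 256)  # profiled setting
print(f"2 GB: ${oversized:,.2f}/mo   256 MB: ${right_sized:,.2f}/mo")
```

Running the same traffic through two memory settings shows how a default-sized config can cost several times what a profiled one does, before any anomaly detection even fires.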
Challenges in Serverless Computing
Explore top LinkedIn content from expert professionals.
Summary
Serverless computing lets developers build and run applications without managing the underlying servers, but it comes with several unique challenges, such as managing costs, complexity, and security. Understanding these hurdles is essential for anyone using serverless platforms like AWS Lambda or Google Cloud Run.
- Control costs: Set up cost guardrails and regularly review function configurations to avoid unexpected spikes in your cloud bill.
- Manage complexity: Carefully plan how your application handles state, retries, and orchestration to prevent hidden reliability problems and operational headaches.
- Prioritize security: Audit permissions, track event triggers, and run security exercises after each deployment to protect sensitive data and stay prepared for incidents.
-
Serverless has one fundamental problem: state.

Platforms like Cloud Run and Lambda are fantastic at compute. But they don't manage execution state. Here's how companies waste huge $$$:

𝐒𝐭𝐚𝐫𝐭 𝐬𝐢𝐦𝐩𝐥𝐞
Lambdas, Cloud Run, etc. Stateless handlers. HTTP in, HTTP out. Life is simple.

𝐀𝐝𝐝 𝐪𝐮𝐞𝐮𝐞𝐬
"Some requests are getting lost." You add a queue (e.g., SQS) to buffer work. Now you have at-least-once delivery and basic flow control. But you also now have two systems to reason about: compute and messaging.
-> You've introduced distributed coordination.

𝐀𝐝𝐝 𝐫𝐞𝐭𝐫𝐲 𝐥𝐨𝐠𝐢𝐜
"Sometimes the container disappears mid-request." Cold starts, OOM kills, rolling deployments, provider preemption. So you add retries at the application level, usually retrying the whole request.
-> You are now duplicating work and entering the space of consistency anomalies.

𝐀𝐝𝐝 𝐚 𝐃𝐋𝐐
"Some retries never succeed." Flaky dependencies, bad inputs, partial failures. You can't drop the request, so you route it to a Dead Letter Queue and build tooling to monitor and reprocess it.
-> More custom infrastructure, more operational surface area.

𝐀𝐝𝐝 𝐢𝐝𝐞𝐦𝐩𝐨𝐭𝐞𝐧𝐜𝐲 𝐤𝐞𝐲𝐬
"Some customers were charged twice." A queue lease expired. A function timed out. A client retried. So you implement idempotency keys, deduplication tables, and cleanup jobs.
-> State now leaks into every boundary of the system.

What started as "just put SQS in front" becomes weeks of careful distributed systems engineering. You realize you are re-implementing parts of reliability theory in application code.

𝐀𝐝𝐝 𝐚𝐧 𝐨𝐫𝐜𝐡𝐞𝐬𝐭𝐫𝐚𝐭𝐨𝐫
"This is getting hard to reason about." You introduce Step Functions, Airflow, Temporal, or a custom DAG engine. Now you have multiple sources of truth: the application, the workflow engine, and a very upset Head of Finance. Keeping them consistent becomes a permanent concern.

The result? You didn't build "serverless." You built a message broker, a retry framework, a dedup system, a saga engine, a workflow scheduler, and a recovery pipeline.
All glued together :)

Durable execution makes your program itself the state machine: every step is checkpointed transactionally in the database, so crashes, restarts, and redeploys resume execution exactly where it left off. Instead of rebuilding reliability with queues, retries, and deduplication, the runtime guarantees exactly-once progress and deterministic replay by construction.

Don't take my word for it; try out this guide on deploying a DBOS app on Google Cloud Run.
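The checkpointing idea is easy to see in miniature. This is a toy sketch, not the DBOS API: a plain dict stands in for the transactional checkpoint store, and "replay" simply means calling the workflow again after a simulated crash:

```python
# Toy durable-execution sketch (NOT a real runtime's API): each step's result
# is checkpointed before the next step runs, so re-running the workflow after
# a crash or redeploy skips work that already happened. A dict stands in for
# the transactional database a real runtime would use.
checkpoints: dict[str, object] = {}
calls: list[str] = []   # records which steps actually executed

def step(name: str, fn):
    """Run fn at most once; replays return the checkpointed result."""
    if name in checkpoints:
        return checkpoints[name]
    result = fn()
    checkpoints[name] = result   # persist before advancing
    return result

def workflow():
    # append records the execution; `or` returns the step's value
    a = step("reserve", lambda: calls.append("reserve") or "inv-1")
    b = step("charge", lambda: calls.append("charge") or "pay-1")
    return a, b

first = workflow()
replay = workflow()   # simulated restart: no step re-executes
print(first == replay, calls)
```

The replayed run returns identical results without touching either step again, which is exactly the property the queues-retries-dedup stack above tries to approximate by hand.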
-
Serverless doesn't shrink your attack surface. It relocates it.

𝘓𝘦𝘢𝘳𝘯𝘪𝘯𝘨 𝘚𝘦𝘳𝘷𝘦𝘳𝘭𝘦𝘴𝘴 𝘚𝘦𝘤𝘶𝘳𝘪𝘵𝘺 by Joshua Arvin Lat makes the core argument clearly: teams that move fast with serverless architectures often move fast past the security model those architectures actually require.

𝗧𝗵𝗲 𝗲𝘅𝗽𝗼𝘀𝘂𝗿𝗲 𝗹𝗮𝘆𝗲𝗿
When critical systems and sensitive data migrate to serverless infrastructure without a corresponding shift in security posture, the gap becomes an incident waiting to happen.

𝗪𝗵𝗮𝘁 𝗶𝘀 𝘄𝗼𝗿𝘁𝗵 𝗮𝗽𝗽𝗹𝘆𝗶𝗻𝗴
🔹 Audit your serverless functions for privilege escalation paths before attackers do. IAM misconfiguration in serverless is not a configuration smell; it is an exploitable vector.
🔹 Run offensive and defensive exercises against your own infrastructure. Understanding how attacks unfold against vulnerable serverless apps is the gap between theoretical compliance and operational security.
🔹 Track regression on function permissions and event trigger scope after every deployment. Silent scope creep in serverless is one of the least monitored failure modes in cloud security.

𝗥𝗶𝘀𝗸 𝘁𝗼 𝘄𝗮𝘁𝗰𝗵
🔹 Teams with strong cloud fluency often underestimate serverless-specific attack surfaces precisely because the infrastructure feels abstracted. That confidence is the vulnerability.

𝗦𝗲𝗰𝘂𝗿𝗶𝘁𝘆 𝗴𝗮𝘁𝗲 𝗳𝗼𝗿 𝘀𝗲𝗿𝘃𝗲𝗿𝗹𝗲𝘀𝘀 𝗿𝗲𝗮𝗹𝗶𝘁𝘆
𝗔𝘂𝗱𝗶𝘁𝗮𝗯𝗶𝗹𝗶𝘁𝘆: Can you reconstruct exactly what a function accessed, when, and under what permissions?
⚙️ 𝗣𝗿𝗶𝘃𝗶𝗹𝗲𝗴𝗲 𝘀𝗰𝗼𝗽𝗲: Are function roles scoped to minimum required access, and is that verified post-deploy?
𝗗𝗲𝗳𝗲𝗻𝘀𝗲 𝘃𝗮𝗹𝗶𝗱𝗮𝘁𝗶𝗼𝗻: Have you tested your defenses using the same offensive techniques an attacker would use?
🔍 𝗖𝗿𝗼𝘀𝘀-𝗰𝗹𝗼𝘂𝗱 𝗰𝗼𝗻𝘀𝗶𝘀𝘁𝗲𝗻𝗰𝘆: Does your security posture hold equally across AWS, Azure, and GCP, or are there platform-specific gaps?
𝗜𝗻𝗰𝗶𝗱𝗲𝗻𝘁 𝗿𝗲𝗮𝗱𝗶𝗻𝗲𝘀𝘀: If a serverless function is compromised tonight, do you have detection and containment playbooks ready to execute?
⚠️ If your team runs serverless workloads on two or more cloud providers, where are the security assumptions you've carried over from one platform that don't actually apply to another? #ServerlessSecurity #CloudSecurity #AWSecurity #AzureSecurity #GoogleCloudSecurity #PenetrationTesting #CloudArchitecture #PrivilegeEscalation #SecurityEngineering #ResponsibleAI
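The privilege-escalation audit in the first point can start very simply: statically flag wildcard grants in a function's execution-role policy before deploy. A sketch over a made-up policy document (a real audit would also examine `NotAction`, conditions, and resource-level permissions):

```python
# Static least-privilege check: flag '*' grants in an execution-role policy.
# The policy document below is a made-up example.
def find_wildcards(policy: dict) -> list[str]:
    """Return findings for wildcard Actions or Resources in each statement."""
    findings = []
    for i, stmt in enumerate(policy.get("Statement", [])):
        actions = stmt.get("Action", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = stmt.get("Resource", [])
        resources = [resources] if isinstance(resources, str) else resources
        for action in actions:
            if action == "*" or action.endswith(":*"):
                findings.append(f"statement {i}: broad action {action!r}")
        if "*" in resources:
            findings.append(f"statement {i}: wildcard resource")
    return findings

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "dynamodb:GetItem",
         "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/orders"},
        {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},  # scope creep
    ],
}
for finding in find_wildcards(policy):
    print(finding)
```

Run as a CI gate after every deployment, a check like this catches the silent scope creep the post describes before an attacker gets to enumerate it.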
-
A client spent over $5,000 a month on a single Lambda function, and most of it was wasted. Here's what happened.

The function handled a CPU-heavy task, so the team cranked up the memory to 10GB, hoping to speed things up. The team knew that, with Lambda, more memory = more CPU. What they didn't realise was that the CPU scaling was not vertical: you unlock another vCPU for every ~1.8GB of memory. At 10GB, the function had 6 vCPUs.

But this was a Node.js function... Since Node.js is single-threaded, it'd only use one CPU unless you spawn child processes. That meant most of the CPU power they paid for was not utilized by their code. The function didn't need that much memory either, so most of their spending was wasted.

It was an honest mistake, but it had a big impact. The fix was simple: right-size the memory setting, and cost went back to an acceptable level. But it was a bitter lesson for the team... I hope sharing their story saves you from making the same mistake!

Lambda gives you a single lever to control performance and cost. I like that simplicity, but it also makes it easy to get things wrong by a huge margin! And sometimes, you just need extra CPU without all the extra memory as well.

And before you blame serverless for high costs, remember that container workloads can run into the same issue. A misconfigured instance size or auto-scaling setting would lead to just as much waste. At least in this case, the cost was tied to actual usage (the function was fairly busy) and not idle servers doing nothing. I've worked in places that spent tens of thousands of dollars a month on EC2 clusters running at 5% CPU! (but no one bats an eyelid...)

The good news is that there are tools to help with this kind of problem. You can use the Lambda Power Tuning tool, although it requires a bit of effort to fine-tune each function. There's also the AWS Compute Optimizer: it just needs to be enabled, and it will give you recommendations over time.
The main downside is that it only suggests changes when it's very confident in them, so it might take a while. And if you're running a Lambdalith (i.e. a monolithic function that handles all the routes in an API), then these tools won't be very effective. A function doing multiple things with very different performance needs is much harder to optimize. That's something to keep in mind if you decide to go down this path.
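The single-threaded trap in this story is easy to model. The sketch below assumes a CPU-bound handler that can use at most one vCPU, with the full-vCPU threshold at roughly 1,769 MB; the price constant and workload numbers are illustrative, not real billing data:

```python
# Model of the single-threaded trap: a CPU-bound handler can use at most one
# vCPU, which Lambda grants in full at roughly 1,769 MB. Above that point,
# duration stops improving while cost keeps climbing. Numbers are illustrative.
PRICE_PER_GB_SECOND = 0.0000166667   # assumed compute price
FULL_VCPU_MB = 1769                  # approximate memory for one full vCPU
WORK_SECONDS_AT_1_VCPU = 2.0         # hypothetical CPU time needed per run

def duration_s(memory_mb: int) -> float:
    """Runtime shrinks with CPU share, but only up to one full vCPU."""
    vcpu_share = min(memory_mb / FULL_VCPU_MB, 1.0)
    return WORK_SECONDS_AT_1_VCPU / vcpu_share

def cost_per_invocation(memory_mb: int) -> float:
    return duration_s(memory_mb) * (memory_mb / 1024.0) * PRICE_PER_GB_SECOND

for mb in (1024, 1769, 10240):
    print(f"{mb:>5} MB: {duration_s(mb):.2f}s  ${cost_per_invocation(mb):.8f}")
```

Below one full vCPU, extra memory shortens the run, so cost per invocation stays roughly flat; above it, duration stops improving and cost grows linearly with memory, which is how 10GB ends up several times the price of 1.8GB for the same work.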
-
My thoughts around #serverless are ever evolving. After 8 years of writing code hosted in serverless compute and integrating with various serverless components, I've boiled things down to this.

1) I don't love Lambda hosting APIs if I have API complexity, including east/west traffic.
2) Serverless "solutions" are too easy to get started with and too hard to manage at scale. Not traffic scale, but code-change scale. I tend not to use them anymore.
3) Serverless "components" are the sweet spot. I wish there were more fundamental serverless building blocks. Queues, Storage, Streams, Databases, Caches. The things we build solutions on top of. Best of breed would be amazing.
4) People still equate serverless to functions far too often. I hope #3's growth eases that.
5) Observability is still too hard. But it's a lot better.

Bottom line is this. I think we pushed serverless always, and even serverless first, too far. I 100% know (not believe) that serverless fits in pretty much any scenario, but I think we do ourselves no favors as developers by not reaching for EC2 or containers when it's appropriate. Lambda is too often a starting point even when it would be simpler to run a system process hosting a web framework. We weigh the pain of setting up and running EC2 vs Lambda in IaC, but not the pain of managing Lambda sprawl when an Express server would have been just fine. We modify code way more than we set up IaC.

And lastly, things like SQS and DynamoDB are blocks that remain regardless of the compute. You can even mix and match. Take the EC2 web server and pair it with DynamoDB. Serverless can be the backbone and a key part, but the compute doesn't always have to be Lambda.

Love to hear your thoughts!
-
Just published: "Serverless MCP: Stateless Execution for Enterprise AI Tools"

Most teams build MCP servers with persistent connections and session state. For enterprise workflows, where tools orchestrate across Salesforce, Stripe, and other systems of record, there's a better way.

What serverless architecture eliminates:
- Server affinity and connection limits
- Session state synchronization
- Cache staleness and stale reads
- Complex failure recovery (no connection state to reconstruct)

What stateless execution forces:
- Backend systems as source of truth (your CRM, ERP, payments, not cached copies)
- Idempotent operations by design (no duplicate charges, no duplicate records)
- Self-contained requests (any worker handles any call)
- Cleaner separation between protocol and execution layers

The article explains:
- The three architectural choices that define serverless MCP
- When stateless execution matters (and when it doesn't)
- Server architecture comparison (side-by-side)
- How to decide which pattern fits your system

Includes a complete open-source reference implementation (Dewy Resort sample app) demonstrating the patterns.

Read it here: https://lnkd.in/gTKSDg6d

Understanding the tradeoffs matters more than following trends.
-
Serverless platforms like Cloud Functions and Cloud Run are game-changers for speed and scalability, but their ephemeral nature creates a massive Governance Gap for organizations! The short execution cycles and fragmentation make traditional operations impossible.

In my latest piece, I dive into the core challenges facing every serverless team:
1. Tracing: How do you track a request across dozens of short-lived functions?
2. Cost Allocation: How do you accurately attribute GB-second costs to specific business units?
3. Security: How do you enforce Least Privilege when permissions are the new perimeter?

https://lnkd.in/gHJT6Wvz

I explore practical, serverless-native strategies to close these gaps, moving beyond old-school governance models. If your organization is scaling its serverless footprint, this is essential reading.

#Serverless #CloudComputing #FinOps #DevOps #CloudSecurity #Governance #CloudRun #CloudFunctions
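Cost allocation (challenge 2) is mostly bookkeeping once every function carries an ownership tag. A sketch of rolling per-invocation GB-seconds up to business units; the function names, owners, records, and price constant are all hypothetical, and real inputs would come from billing/monitoring data and resource tags:

```python
# Tag-based cost allocation sketch: roll per-invocation GB-seconds up to the
# owning business unit. All records and constants below are hypothetical.
from collections import defaultdict

PRICE_PER_GB_SECOND = 0.0000166667   # assumed compute price
function_owner = {"checkout-fn": "payments", "thumbnail-fn": "media"}

invocations = [              # (function, duration_ms, memory_mb)
    ("checkout-fn", 120, 512),
    ("checkout-fn", 95, 512),
    ("thumbnail-fn", 800, 1536),
]

cost_by_unit: dict[str, float] = defaultdict(float)
for fn, ms, mb in invocations:
    gb_seconds = (ms / 1000.0) * (mb / 1024.0)
    cost_by_unit[function_owner[fn]] += gb_seconds * PRICE_PER_GB_SECOND

for unit, cost in sorted(cost_by_unit.items()):
    print(f"{unit}: ${cost:.8f}")
```

The hard part in practice is not the arithmetic but enforcing that the ownership tags exist and stay accurate; untagged functions should fail the rollup loudly rather than disappear into a shared bucket.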
-
In my first 3 months building serverless apps, I failed to use caching. (Here's why + a breakdown of what I missed.)

First I tried this setup:
- Lambda triggered by API Gateway
- Reads + writes to DynamoDB
- Static assets pushed via S3
- That's it. No caching anywhere.

It worked… until it didn't.
- I hit latency spikes
- Costs shot up from repeat queries
- Users started seeing delays
- Performance wasn't predictable

I thought "serverless should scale," but I missed one thing: serverless still needs caching.

Here's what I learned and implemented:
- CloudFront for static + dynamic edge delivery
- API Gateway caching for common GETs
- Lambda in-memory cache (warm container tricks)
- ElastiCache (Redis) for session storage
- DynamoDB DAX for sub-millisecond lookups

After adding those layers, everything changed:
- Latency dropped from 400ms → 60ms
- Costs dropped by 32%
- My app felt fast and stable

My worst mistake was skipping caching early on. Why? Because it felt like "extra work" instead of essential planning. Now I teach clients to design with caching from the start. The stack might be serverless, but the performance still needs architecture.

Which caching layer are you learning to implement right now?

𝐃𝐌 me "roadmap" if you're serious about your cloud career and ready to fast-track your results.
👉 Join our Growth Circle for more free resources - https://nfcgo.to/start
Follow Riyaz Sayyad for more tips and insights into AWS Cloud
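Of those layers, the Lambda in-memory cache is the cheapest to add, because module-level state survives between invocations of the same warm container. A sketch with a TTL guard; `fetch_config` is a made-up stand-in for a real DynamoDB or SSM read:

```python
# Warm-container cache sketch: module-level state survives across invocations
# of the same Lambda container, so a TTL cache avoids repeat backend reads.
# fetch_config is a made-up stand-in for a DynamoDB/SSM lookup.
import time

TTL_SECONDS = 60
_cache: dict[str, tuple[float, str]] = {}   # key -> (expires_at, value)
calls = 0                                   # counts real backend reads

def fetch_config(key: str) -> str:
    global calls
    calls += 1
    return f"value-for-{key}"

def cached_get(key: str) -> str:
    now = time.monotonic()
    hit = _cache.get(key)
    if hit and hit[0] > now:     # fresh entry: skip the backend
        return hit[1]
    value = fetch_config(key)
    _cache[key] = (now + TTL_SECONDS, value)
    return value

a = cached_get("feature-flags")
b = cached_get("feature-flags")  # second call served from the warm container
print(a == b, calls)
```

The trade-off: each container has its own copy, and cold starts begin empty, so this layer complements rather than replaces shared caches like ElastiCache or DAX.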
-
📣 Happy to share a recap of my SF Systems talk (Title: "Databases and Serverless are Made for Each Other")! I've put together a summary of the session along with some great Q&A moments. It was based on the CACM paper that Peter Kraft and I recently published. Huge thanks to the organizers and everyone who joined; it was an amazing experience!

Abstract: Supporting stateful applications is one of the biggest challenges in serverless computing. Serverless functions are inherently ephemeral and stateless, while stateful applications require long-running processes and complex interactions. Developers typically address this gap by using external orchestrators to manage state and dispatch functions, but this approach introduces complexity, limits scalability, and impacts performance. In this talk, I explored a new approach that leverages databases and transactions to natively manage execution state within a fully serverless architecture. By integrating durable execution and messaging into a lightweight library, this method enables efficient execution and management of long-running, stateful serverless applications.

I'd love to hear your feedback and thoughts!