⭐ 𝗗𝗮𝘁𝗮 & 𝗘𝗻𝗴 𝗜𝗻𝗻𝗼𝘃𝗮𝘁𝗼𝗿𝘀 𝗦𝗽𝗼𝘁𝗹𝗶𝗴𝗵𝘁: 𝗩𝗲𝗱 𝗣𝗿𝗮𝗸𝗮𝘀𝗵, 𝗣𝗿𝗶𝗻𝗰𝗶𝗽𝗮𝗹 𝗗𝗮𝘁𝗮 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿 𝗮𝘁 𝗚𝗶𝘁𝗹𝗮𝗯 ⭐
We’re excited to highlight Ved Prakash, Principal Data Engineer at GitLab and one of 17 data leaders globally on Snowflake’s Data Superhero Council. Ved is redefining how enterprises operationalize real-time data and govern AI at scale.
At GitLab, Ved built Project Siphon, an enterprise-scale CDC backbone delivering real-time streaming, sub-hour latency, and disciplined cost control. He shares those lessons on next-gen, real-time, AI-integrated data platforms through talks (Berlin Buzzwords, DSC Europe) and DataAI Chronicles.
𝗛𝗼𝘄 𝗵𝗮𝘀 𝘁𝗵𝗲 𝗿𝗼𝗹𝗲 𝗼𝗳 𝗮 𝗱𝗮𝘁𝗮 𝗲𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴 𝗹𝗲𝗮𝗱𝗲𝗿 𝗲𝘃𝗼𝗹𝘃𝗲𝗱 𝘀𝗶𝗻𝗰𝗲 𝘆𝗼𝘂 𝘀𝘁𝗮𝗿𝘁𝗲𝗱 𝘆𝗼𝘂𝗿 𝗰𝗮𝗿𝗲𝗲𝗿?
Early on, leadership meant keeping pipelines running and reports accurate across ETL, queries, and infrastructure. Today it’s strategic: data leaders make multi-million-dollar build-vs-buy decisions, shape company-wide tech direction, and tie architecture to ROI and cost efficiency. The biggest shift is owning platform economics, from Snowflake spend and AI governance to real-time systems that change how the business operates.
𝗪𝗵𝗮𝘁’𝘀 𝗮 𝗿𝗲𝗰𝗲𝗻𝘁 𝗮𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗮𝗹 𝗱𝗲𝗰𝗶𝘀𝗶𝗼𝗻 𝘆𝗼𝘂’𝗿𝗲 𝗽𝗿𝗼𝘂𝗱 𝗼𝗳 — 𝗮𝗻𝗱 𝘄𝗵𝘆?
I’m proud of the AI Cost Control Framework we built at GitLab once Snowflake Cortex credit usage started scaling without visibility. It monitors consumption in real time, applies configurable thresholds, alerts before budgets are breached, and attributes spend to specific teams and services. As AI embeds itself into data platforms, proactive cost governance becomes table stakes for scaling innovation with financial discipline.
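To picture the pattern, here’s a minimal sketch of threshold-based AI cost alerting, assuming Snowflake’s SNOWFLAKE.ACCOUNT_USAGE.METERING_DAILY_HISTORY view (where Cortex usage appears under SERVICE_TYPE = 'AI_SERVICES') and the snowflake-connector-python client; the budgets and alert webhook are hypothetical placeholders, not GitLab’s actual framework.

```python
# Minimal sketch of proactive AI cost governance (illustrative only).
# Assumes snowflake-connector-python and read access to the
# SNOWFLAKE.ACCOUNT_USAGE.METERING_DAILY_HISTORY view; TEAM_BUDGETS and
# ALERT_WEBHOOK are hypothetical placeholders.
import snowflake.connector
import requests

TEAM_BUDGETS = {"analytics": 500.0, "ml-platform": 1200.0}  # credits/month
ALERT_WEBHOOK = "https://example.com/alerts"  # hypothetical endpoint
WARN_RATIO = 0.8  # alert *before* the budget is actually breached

def month_to_date_ai_credits(conn) -> float:
    """Sum month-to-date credits for Snowflake AI services (e.g. Cortex)."""
    sql = """
        SELECT COALESCE(SUM(credits_used), 0)
        FROM snowflake.account_usage.metering_daily_history
        WHERE service_type = 'AI_SERVICES'
          AND usage_date >= DATE_TRUNC('month', CURRENT_DATE())
    """
    with conn.cursor() as cur:
        cur.execute(sql)
        return float(cur.fetchone()[0])

def check_budgets(conn) -> None:
    used = month_to_date_ai_credits(conn)
    # A real system would attribute spend per team (query tags, warehouse
    # naming); here we compare against the total budget only.
    total_budget = sum(TEAM_BUDGETS.values())
    if used >= WARN_RATIO * total_budget:
        requests.post(ALERT_WEBHOOK, json={
            "message": f"AI credits at {used:.1f}/{total_budget:.1f} "
                       f"({used / total_budget:.0%} of monthly budget)",
        }, timeout=10)

if __name__ == "__main__":
    conn = snowflake.connector.connect()  # credentials from env/config
    try:
        check_budgets(conn)
    finally:
        conn.close()
```

A production version would also attribute spend per team via query tags or warehouse naming conventions, which is what makes the team-level attribution Ved describes possible.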
𝗪𝗵𝗮𝘁’𝘀 𝘁𝗵𝗲 𝗯𝗶𝗴𝗴𝗲𝘀𝘁 𝗺𝗶𝘀𝗰𝗼𝗻𝗰𝗲𝗽𝘁𝗶𝗼𝗻 𝘆𝗼𝘂 𝘀𝗲𝗲 𝗮𝗯𝗼𝘂𝘁 𝗖𝗗𝗖 𝗼𝗿 𝘀𝘁𝗿𝗲𝗮𝗺𝗶𝗻𝗴 𝘀𝘆𝘀𝘁𝗲𝗺𝘀?
CDC isn’t “just faster data.” It’s an architectural transformation involving data movement, consistency guarantees, schema evolution, and how deletes, updates, and backfills coexist with streaming changes. Downstream systems, from dbt models to BI tools to ML workloads, also need to adapt to incremental semantics. CDC requires monitoring, alerting, deep database knowledge, and tight coordination across data, platform, and application teams.
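To make the shift to incremental semantics concrete, here’s a small, hypothetical sketch of what a downstream consumer has to do with a CDC stream; the event shape and field names are illustrative, not any particular connector’s format.

```python
# Illustrative sketch of applying CDC events to downstream state (the
# event shape is hypothetical, not a specific tool's format). The key
# point: consumers must handle updates and deletes explicitly, rather
# than appending rows as in batch ETL.
from typing import Any

def apply_cdc_event(table: dict[int, dict[str, Any]],
                    event: dict[str, Any]) -> None:
    """Merge one change event into an in-memory keyed table."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        table[key] = event["row"]   # upsert: last write wins
    elif op == "delete":
        table.pop(key, None)        # deletes must propagate downstream too
    else:
        raise ValueError(f"unknown op: {op}")

if __name__ == "__main__":
    users: dict[int, dict[str, Any]] = {}
    stream = [
        {"op": "insert", "key": 1, "row": {"id": 1, "plan": "free"}},
        {"op": "update", "key": 1, "row": {"id": 1, "plan": "paid"}},
        {"op": "delete", "key": 1},  # a blind append would miss this
    ]
    for ev in stream:
        apply_cdc_event(users, ev)
    assert users == {}  # final state reflects the delete
```

Real systems additionally have to handle out-of-order events, schema changes, and backfills replayed alongside live changes, which is where most of the complexity Ved mentions lives.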
𝗪𝗵𝗲𝗿𝗲 𝗱𝗼 𝘆𝗼𝘂 𝘁𝗵𝗶𝗻𝗸 𝗔𝗜 𝘄𝗶𝗹𝗹 𝗵𝗮𝘃𝗲 𝘁𝗵𝗲 𝗯𝗶𝗴𝗴𝗲𝘀𝘁 𝗶𝗺𝗽𝗮𝗰𝘁 𝗼𝗻 𝗱𝗮𝘁𝗮 𝗲𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴 𝗼𝘃𝗲𝗿 𝘁𝗵𝗲 𝗻𝗲𝘅𝘁 𝗳𝗶𝘃𝗲 𝘆𝗲𝗮𝗿𝘀?
1. Intelligent orchestration and self-healing pipelines (fewer 2 a.m. pages; a toy sketch of the idea follows this list)
2. Code generation and platform acceleration (less boilerplate, more time for architecture)
3. Intelligent cost optimization (dynamic optimization for Snowflake/Databricks, especially where streaming and CDC costs can spike)
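As a toy illustration of point 1 (a hypothetical pattern, not any specific orchestrator’s API), a self-healing wrapper classifies a failure and attempts a known remediation before paging anyone:

```python
# Toy sketch of a "self-healing" pipeline wrapper (hypothetical pattern):
# classify the failure, attempt a known remediation, and only escalate to
# a human when remediation fails.
import time

def run_with_self_healing(task, max_attempts: int = 3) -> None:
    for attempt in range(1, max_attempts + 1):
        try:
            task()
            return
        except TimeoutError:
            # Transient failure: back off and retry instead of paging.
            time.sleep(2 ** attempt)
        except MemoryError:
            # Known failure class: a real system might resize the
            # warehouse/cluster here before retrying.
            pass
    raise RuntimeError("remediation failed; escalating to on-call")

if __name__ == "__main__":
    attempts = {"n": 0}
    def flaky_task():
        attempts["n"] += 1
        if attempts["n"] < 2:
            raise TimeoutError("source briefly unavailable")
    run_with_self_healing(flaky_task)  # succeeds on the second attempt
```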
At Artie, we’re excited to feature leaders like Ved, who are pushing the boundaries of what modern data engineering looks like.