Apache Data Lakehouse Weekly: June 4 to June 11, 2026
The lakehouse community spent this week arguing about versions, and the arguments mattered. Parquet contributors produced the single largest thread ac…
Latest Team Management news from Tech News
Hello everyone! Following up on my previous post, Day 1 of my Modern Data Stack migration was an absolute rollercoaster of refactoring and deep data …
When we started working on Krenalis, we spent a lot of time reviewing how customer data typically flows through a modern data stack. One pattern kept…
The Term Everyone Uses and Nobody Defines: Your CTO came back from a conference and said the team needs to "go agentic." A vendor pitched you an "agent…
Enterprise data platforms are really good at one thing: creating copies of the same data everywhere. Different teams copy the same curated folders int…
In neighbourhood retail markets, local Kirana stores, and hyper-local fulfilment centres, inventory management isn’t an administrative task—it’s a hig…
Most candidates treat the take-home assessment as a coding test. It is not. It is a professional communication test that happens to include coding. Th…
I graduated in November 2025. My only formal work experience is a 6-month IT internship. I have never worked at a tech company, never contributed to a…
I published a public data engineering project that demonstrates a cloud-based ETL pipeline for analyzing web analytics search keyword revenue. The pro…
What Changed in Data Engineer Job Descriptions Around 2023? For years, a Data Engineer job description was a known quantity: Python for pipeline code,…
Over the past decade, the core evolution of data engineering has been the deconstruction and reconstruction of traditional data warehouse architecture…
Dipankar Mazumdar is the Director of Developer Relations at Cloudera, leading global developer initiatives across lakehouse architecture and AI. He pr…
Introduction: Good forecasts help with capacity planning and quieter alerts. But one traffic spike or memory leak can make any forecast useless. The go…
Originally published at https://shai-kr.github.io/data-ninja-ai-lab/blog/2026-05-24-fabric-ai-functions-data-workflows.html Most enterprise GenAI demo…
The Problem We Were Actually Solving: As a data engineer, I've spent years building data infrastructure to support high-growth businesses. But my lates…
The Problem We Were Actually Solving: In late 2024 our small creator platform had 8,400 monthly active users and 1,400 paying creators. We were based i…
We wanted to sell software licenses to customers in over 120 countries without any payment restrictions. The platform stores we had been using didn't …
Part 3 of 5 in the series: When Your AI Pipeline Grows Up. In the previous post, we worked through the pipeline architecture that gets features from ra…
A Metaplane alternative is a data observability tool that monitors warehouse tables for freshness, volume, schema, and distribution issues, the same j…
If you've ever inherited a healthcare database with columns named DOB, PatientID, or CLAIM_NUMBER, this guide is for you. Healthcare data engineeri…
One sentence shows up in internal-system projects again and again: “leadership wants real-time data.” I usually slow the discussion down when I hear t…
Modern software systems are expected to run consistently across multiple environments such as development laptops, testing servers, cloud platforms an…
There is no shortage of frameworks for thinking about data quality. There is, however, a significant shortage of practical guidance for actually buil…
The release wave that defined late April carried straight into early May, with Arrow shipping two more votes in seven days, Polaris settling into post…
A case study in identity resolution, multi-source data stitching, and why vertical infrastructure beats horizontal platforms for specialized industrie…