The Data Engineer Roadmap for 2026 (in an AI-Native World)
This is the narrated version of our free, interactive Data Engineer Roadmap. Same areas, same order, with a focus on the one thing each layer asks of…
Germany has no Companies House. Unlike the UK's free official API, German company data is fragmented across regional courts and published through the H…
When working with ClickHouse®, writing a query is usually straightforward. Writing an efficient query, however, requires understanding how ClickHouse …
Traditional databases just can't keep up with high concurrency and low latency at the same time. The term "real-time" has become kind of meaningless. …
If you're building a modern data stack that requires either high-throughput transaction processing or large-scale analytical workloads, you've likely …
Recently, I completed my first full Data Engineering project: building an end-to-end ETL pipeline using real-world Australian weather data spanning 10…
If you work anywhere near payments, banking, crypto, or fintech in Europe, a new acronym is about to land in your backlog: AMLA, the EU's Authority f…
One thing that confused me when I first started learning ClickHouse was the word FINAL, because eventually you'll come across both: SELECT * FROM eve…
Over the years, I've seen many data platforms start with good intentions. A few scripts are created to move data from one system to another, and every…
Introduction: As a data engineer, most of your work will happen on Linux servers. Whether you are managing databases, running data pipelines, or proces…
The lakehouse community spent this week arguing about versions, and the arguments mattered. Parquet contributors produced the single largest thread ac…
Hello everyone! Following up on my previous post, Day 1 of my Modern Data Stack migration was an absolute rollercoaster of refactoring and deep data …
Most lineage tools produce beautiful diagrams that don't answer the one question that matters: 'What breaks if this data is wrong?' Here's how to move…
A lakehouse has two storage areas: Files and Tables. Files: store structured data queryable by SQL; supports schema definitions and ACID transactions. Tab…
If you're learning data engineering, you'll probably meet Apache Kafka very early. You'll see it in job descriptions, system design diagrams, real-tim…
The era of passive data analytics is over. Today, the most forward-thinking data teams aren't just building dashboards to show what happened yesterday…
When you are starting out in Data Engineering, it is easy to focus entirely on writing pristine Python code, designing SQL schemas, or learning comple…
Introduction: In the evolving landscape of data engineering, DuckLake is emerging as a powerful solution for building data lakes with ACID transactions…
CSV files are one of the most common formats for storing and exchanging data. Whether you’re working with logs, analytics data, application exports, o…
Original Japanese article: AWS Lake Formationの使い方について整理してみる ("Organizing How to Use AWS Lake Formation"). Introduction: I'm Aki, an AWS Community Builder (@jitepengin). Previously, I wrote an ar…
The best way to actually understand data engineering is to build something that breaks, fix it, and watch it successfully run. In this article, we bui…
A scraper can pass every check you wrote and still be wrong about the one thing you actually care about: how much it collected. No exception. No 500. …
When you build an AI system for marketing performance monitoring, one tempting idea is to let the LLM decide everything. Campaign pacing is off. Creat…
It's important to monitor Dataform: jobs executed by Dataform can be the primary source of BigQuery costs in a modern data platform. Forgetting to in…
A few months ago I spent the better part of a day chasing a bug that turned out not to be a bug at all. A downstream dashboard showed revenue had jump…
If you're exploring a career in data, you've probably seen both titles everywhere: job boards, LinkedIn, bootcamp brochures. They both work with data…
When we started working on Krenalis, we spent a lot of time reviewing how customer data typically flows through a modern data stack. One pattern kept…
The Term Everyone Uses and Nobody Defines: Your CTO came back from a conference and said the team needs to "go agentic." A vendor pitched you an "agent…
Enterprise data platforms are really good at one thing: creating copies of the same data everywhere. Different teams copy the same curated folders int…
Most data teams have already made two decisions, even if they haven't written them down yet. The first is that Apache Iceberg will be the table format…