Why Audit Trails Matter in ClickHouse®: Building Accountability, Compliance, and Security
When teams evaluate database platforms, the conversation usually revolves around performance, scalability, query optimization, and storage efficiency.…
Latest Testing & QA news from Tech News
Last year I was on a team that pushed 40 million events per day through Kafka. We had consumer lag alerts, rebalancing incidents, and a whole runbook …
This is the narrated version of our free, interactive Data Engineer Roadmap. Same areas, same order, with a focus on the one thing each layer asks of…
I Built a Consistent Hashing Ring in Pure Python and Finally Understood How Cassandra Distributes Data
I've been using Cassandra and Redis Cluster for…
Most data engineering teams do not struggle because they lack smart people. They struggle because too much of the delivery process is still repetitive…
Traditional databases just can't keep up with high concurrency and low latency at the same time. The term "real-time" has become kind of meaningless. …
If you work anywhere near payments, banking, crypto, or fintech in Europe, a new acronym is about to land in your backlog: AMLA — the EU's Authority f…
One thing that confused me when I first started learning ClickHouse was the word FINAL. Because eventually you'll come across both: SELECT * FROM eve…
The lakehouse community spent this week arguing about versions, and the arguments mattered. Parquet contributors produced the single largest thread ac…
Most lineage tools produce beautiful diagrams that don't answer the one question that matters: 'What breaks if this data is wrong?' Here's how to move…
CSV files are one of the most common formats for storing and exchanging data. Whether you’re working with logs, analytics data, application exports, o…
A scraper can pass every check you wrote and still be wrong about the one thing you actually care about: how much it collected. No exception. No 500. …
A few months ago I spent the better part of a day chasing a bug that turned out not to be a bug at all. A downstream dashboard showed revenue had jump…
My scraper died at row 12,000 of 50,000, three hours in. The crash itself was cheap. A process gets OOM-killed, a quota trips, a machine reboots, it h…
The Term Everyone Uses and Nobody Defines
Your CTO came back from a conference and said the team needs to "go agentic." A vendor pitched you an "agent…
Enterprise data platforms are really good at one thing: creating copies of the same data everywhere. Different teams copy the same curated folders int…
Most data teams have already made two decisions, even if they haven't written them down yet. The first is that Apache Iceberg will be the table format…
Handling Time Zone Differences in Forex APIs: A Practical Developer’s Guide
When I started building a multi-source forex data pipeline for a brokerage…
In neighbourhood retail markets, local Kirana stores, and hyper-local fulfilment centres, inventory management isn’t an administrative task—it’s a hig…
The JobSense project needed a FastAPI backend that served 604 job embeddings via semantic search, a Pydantic validation layer that stopped bad data be…
Most candidates treat the take-home assessment as a coding test. It is not. It is a professional communication test that happens to include coding. Th…
I graduated in November 2025. My only formal work experience is a 6-month IT internship. I have never worked at a tech company, never contributed to a…
If your Kafka Docker Compose still has a ZooKeeper service in it, your setup is already legacy. As of Kafka 4.0 (released March 2025), ZooKeeper is go…
I run every data pipeline I build on Linux. PostgreSQL, Airflow, dbt, Docker, FastAPI — all of it runs on Linux, even when my laptop is Windows. Under…
AI doesn't begin with algorithms. It begins with data, decisions, documentation, and governance. If you can't explain where your data came from, how i…
You want Claude — or Cursor, or ChatGPT, or any MCP-aware agent — to answer questions about your Snowflake data. You also do not want the agent to rea…
What Changed in Data Engineer Job Descriptions Around 2023? For years, a Data Engineer job description was a known quantity: Python for pipeline code,…
Over the past decade, the core evolution of data engineering has been the deconstruction and reconstruction of traditional data warehouse architecture…
Dipankar Mazumdar is the Director of Developer Relations at Cloudera, leading global developer initiatives across lakehouse architecture and AI. He pr…
Every data engineer knows Apache Airflow. But how many have built a workflow orchestrator from scratch? Understanding the internals — topological sort…