Three Ways to Set Up CDC from Postgres to ClickHouse
You cannot run analytical queries on the same Postgres primary that serves your application without paying for it in CPU and connections. A read repli…
One UPDATE statement. One trigger. One automatic audit record — no extra code required. Triggers are one of those SQL features that can seem a little …
What Changed in Data Engineer Job Descriptions Around 2023? For years, a Data Engineer job description was a known quantity: Python for pipeline code,…
Over the past decade, the core evolution of data engineering has been the deconstruction and reconstruction of traditional data warehouse architecture…
Dipankar Mazumdar is the Director of Developer Relations at Cloudera, leading global developer initiatives across lakehouse architecture and AI. He pr…
Data Normalization Across Dublin Rental Portals: How to Make Listings Comparable Dublin rental listings are fragmented even across the main portals. D…
Every data engineer knows Apache Airflow. But how many have built a workflow orchestrator from scratch? Understanding the internals — topological sort…
Introduction Good forecasts help with capacity planning and quieter alerts. But one traffic spike or memory leak can make any forecast useless. The go…
Originally published at https://shai-kr.github.io/data-ninja-ai-lab/blog/2026-05-26-copy-job-cdc-sql-estate-ga.html . Copy Job CDC with SQL estate is …
Introduction ClickHouse is a columnar OLAP database. It runs aggregate queries across billions of rows in seconds. MySQL is what most apps run on for …
Originally published at https://shai-kr.github.io/data-ninja-ai-lab/blog/2026-05-24-fabric-ai-functions-data-workflows.html Most enterprise GenAI demo…
AI x Crypto Systems disclosure: this article was prepared with AI assistance as an editorial helper. The ideas, facts, code, sources, and conclusions …
Introduction Apache Kafka and Apache Cassandra pair effectively because they complement each other's strengths: Kafka handles high throughput, real-ti…
Introduction Ever wondered how banks are able to detect and stop fraud in real-time? This is how they do it. Banks process thousands of transactions e…
Parsing the Unparsable: Building a Layout-Aware Computer Vision Pipeline for 50,000+ Stone SKUs Executive Summary The stone and marble industry operat…
Every data engineer knows the struggle: finding a project that's both technically impressive and genuinely useful. Today I'll walk you through AfriDat…
This is Part 14 of a 15-part Apache Iceberg Masterclass. Part 13 covered streaming approaches. This article is a practical walkthrough of working wit…
This is Part 12 of a 15-part Apache Iceberg Masterclass. Part 11 covered metadata tables. This article covers the two main ways to access Iceberg dat…
This is Part 10 of a 15-part Apache Iceberg Masterclass. Part 9 covered how tables degrade. This article covers the four maintenance operations that …
Original Japanese article : AWS Glue Workflowの使い方について整理してみる Introduction I'm Aki, an AWS Community Builder ( @jitepengin ). Previously, I wrote an art…
The Problem We Were Actually Solving As a data engineer, I've spent years building data infrastructure to support high-growth businesses. But my lates…
Kafka compression waste is usually a batch depth problem, not a codec problem. Better batching improves producer compression, which reduces consumer C…
Apache Kafka is widely recognized as the go-to system for real-time event streaming. Modern systems across banking, e-commerce, healthcare, gaming…
The Problem We Were Actually Solving We were building a platform for digital creators across Africa, a region with a diverse array of economic conditi…
The Problem We Were Actually Solving Our primary goal was to create a seamless purchasing experience for customers worldwide. With Stripe's reputation…
The Problem We Were Actually Solving Our team focused on creating an inclusive marketplace for creators and buyers alike. We ensured our digital produ…
The Problem We Were Actually Solving My goal was to build a seamless ebook purchase experience for users in various regions, where PayPal, Stripe, Gum…
The Problem We Were Actually Solving In late 2024 our small creator platform had 8,400 monthly active users and 1,400 paying creators. We were based i…
The Problem We Were Actually Solving Our goal was to enable seamless, borderless transactions for digital products using Bitcoin. Sounds simple, but i…
We wanted to sell software licenses to customers in over 120 countries without any payment restrictions. The platform stores we had been using didn't …