Cross-Site Agent Intelligence: Why We Built the ARP Profile
There is a quiet pattern hiding inside every production engineering team that runs AI agents in 2026. Site A’s content agent learns that Claude 3.5 oc…
Latest AI & ML news from Tech News
Why OpenTelemetry Won. Three years ago, the observability landscape was fragmented. Jaeger for tracing, Prometheus for metrics, Fluentd for logs, each …
TL;DR: Ingero Fleet v0.10 FOSS is live. We validated the full pipeline end-to-end on two 3-node Lambda Cloud clusters: 3x A100 SXM4 (x86_64) and 3x GH2…
AI agents are distributed systems. They fan out across LLM calls, tool invocations, memory lookups, and multi-step reasoning loops — often asynchronou…
Book: RAG Pocket Guide Also by me: LLM Observability Pocket Guide My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code an…
You Don't Need Datadog (Yet). I see startups spending $5,000/month on Datadog with 8 engineers. That's $625 per engineer per month for monitoring. At t…
In an earlier post I argued that event-driven agents reduce scope, cost, and decision dispersion because they narrow the decision space before the mod…
A single slow GPU – a straggler – in a 1,000-node training cluster idles 999 healthy GPUs at every AllReduce barrier. The job does not crash. There is…
OpenTelemetry eBPF Instrumentation (OBI) — The Complete Guide: KubeCon EU 2026 Beta Launch, Zero-Code Observability, and the 1.0 GA Roadmap Published …
The Day Prometheus Fell Over. Prometheus memory usage spiked from 8GB to 32GB overnight. OOM-killed. Monitoring was down for 20 minutes while we scramb…
A practical guide. In the first part, I covered the two initial signals to diagnose that something is wrong: latency and traffic. Those two alone explain …
Originally published on cubeapm.com. As organizations adopt cloud-native architectures, Kubernetes, and microservices, systems have become more distri…
TL;DR: APM = metrics + traces + logs: use all three together. Auto-instrument first: agents cover HTTP, DB, queues. Add custom tags (order_id, cust…
Book: Observability for LLM Applications — paperback and hardcover on Amazon · Ebook from Apr 22 Also by me: Thinking in Go (2-book series) — Complete…
Book: Observability for LLM Applications — paperback and hardcover on Amazon · Ebook from Apr 22 My project: Hermes IDE | GitHub — an IDE for develope…
A large logistics company. A multi-year archive: contracts with carriers, procedures, incident correspondence. A manager wants to know: did we have…
Enterprise buyers treat a public status surface as a signal of operational maturity—not marketing polish. This guide covers what to publish, how to st…
Evaluation is price-setting. Observation is reading. Get the entry point wrong and wherever you arrive, you end up back at evaluation. Why Start With …
Why event-driven agents reduce scope, cost, and decision dispersion. Most agent systems do not control their costs because they spend tokens letting th…
The State of Observability 2026 report is out — here's what 407 DevOps engineers and SREs actually told us. Let's be honest. Most of us are juggling m…
Part 3 of a series on building a metrics pipeline into ClickHouse. Read Part 2: Understanding Vector Pipelines. Where Things Got Real. By this point, the…
The Kubernetes Monitoring Maze. Kubernetes gives you a thousand metrics out of the box. Most teams monitor all of them and understand none of them. Aft…
In the original Eval Gap post, we laid out the problem: the distance between "works in demo" and "works in production" kills AI products. Four mechan…
I'm Dmitry Shevkoplyas, technical lead of Swapno, a service for key-for-key car swaps with no dealers. The mechanics work like Tinder: swipe…
TL;DR: The EU AI Act Article 12 deadline for high-risk AI logging is August 2, 2026. Singapore's IMDA Agentic AI Framework is already in force (Jan…
In this article we show the contracts of a future OLTP DBMS: how the kernel layers are separated, why a per-tablespace page size is needed, and why configuration moves into adaptive …
How Observability Engineering Cut Incident Response Time by 85% in Production. Part 1 of 3: Structured Logs and Correlation IDs. Part of a three-part se…