Cross-Site Agent Intelligence: Why We Built the ARP Profile
There is a quiet pattern hiding inside every production engineering team that runs AI agents in 2026. Site A’s content agent learns that Claude 3.5 oc…
Latest Open Source news from Tech News
Why OpenTelemetry Won Three years ago, the observability landscape was fragmented. Jaeger for tracing, Prometheus for metrics, Fluentd for logs, each …
TL;DR Ingero Fleet v0.10 FOSS is live. We validated the full pipeline end-to-end on two 3-node Lambda Cloud clusters: 3x A100 SXM4 (x86_64) and 3x GH2…
AI agents are distributed systems. They fan out across LLM calls, tool invocations, memory lookups, and multi-step reasoning loops — often asynchronou…
Book: RAG Pocket Guide Also by me: LLM Observability Pocket Guide My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code an…
You Don't Need Datadog (Yet) I see startups spending $5,000/month on Datadog with 8 engineers. That's $625 per engineer per month for monitoring. At t…
In an earlier post I argued that event-driven agents reduce scope, cost, and decision dispersion because they narrow the decision space before the mod…
A single slow GPU – a straggler – in a 1,000-node training cluster idles 999 healthy GPUs at every AllReduce barrier. The job does not crash. There is…
OpenTelemetry eBPF Instrumentation (OBI) — The Complete Guide: KubeCon EU 2026 Beta Launch, Zero-Code Observability, and the 1.0 GA Roadmap Published …
The Day Prometheus Fell Over Prometheus memory usage spiked from 8GB to 32GB overnight. OOM-killed. Monitoring was down for 20 minutes while we scramb…
Originally published on cubeapm.com. As organizations adopt cloud-native architectures, Kubernetes, and microservices, systems have become more distri…
Morning: 3.8 TB of memory across the Prometheus clusters. Evening: 0.6 TB. In between: the move to Deckhouse Prom++. We spent months on careful ana…
TL;DR APM = metrics + traces + logs: use all three together. Auto-instrument first: agents cover HTTP, DB, queues. Add custom tags (order_id, cust…
Book: Observability for LLM Applications — paperback and hardcover on Amazon · Ebook from Apr 22 Also by me: Thinking in Go (2-book series) — Complete…
Book: Observability for LLM Applications — paperback and hardcover on Amazon · Ebook from Apr 22 My project: Hermes IDE | GitHub — an IDE for develope…
A large logistics company. A multi-year archive: contracts with carriers, regulations, incident correspondence. A manager wants to know: did we have…
Enterprise buyers treat a public status surface as a signal of operational maturity—not marketing polish. This guide covers what to publish, how to st…
Evaluation is price-setting. Observation is reading. Get the entry point wrong and wherever you arrive, you end up back at evaluation. Why Start With …
Why event-driven agents reduce scope, cost, and decision dispersion Most agent systems do not control their costs because they spend tokens letting th…
The State of Observability 2026 report is out — here's what 407 DevOps engineers and SREs actually told us. Let's be honest. Most of us are juggling m…
Part 3 of a series on building a metrics pipeline into ClickHouse Read Part 2: Understanding Vector Pipelines Where Things Got Real By this point, the…
In the original Eval Gap post, we laid out the problem: the distance between "works in demo" and "works in production" kills AI products. Four mechan…
TL;DR: The EU AI Act Article 12 deadline for high-risk AI logging is August 2, 2026. Singapore's IMDA Agentic AI Framework is already in force (Jan…
How Observability Engineering Cut Incident Response Time by 85% in Production Part 1 of 3: Structured Logs and Correlation IDs Part of a three-part se…
For many teams, Let’s Encrypt expiry reminder emails were a quiet but important safety net. When those reminders stopped, something subtle changed: Ce…
Text Generation Inference (TGI) has a very specific energy. It is not the newest kid in the inference street, but it is the one that already learned h…
When something goes wrong in my applications, logging is almost always the first tool I reach for. I'll throw a few log statements at the start and en…
Implementing Visual Audit Trails for LLM Agents in Production — A Step-by-Step Guide Your LLM agent is live in production. It's handling 500+ customer…