AI Observability: Logs, Prompts, Tool Calls, And Cost
Here's a five-line function. It calls an LLM, logs the answer, returns it. async function ask(question: string) { const res = await openai.respo…
Your agent demo took an afternoon. The reason it isn't in production nine months later has nothing to do with the model. I've watched this play out at…
A production-focused redesign of a Stage 6 LGTM observability platform, moving from a single-service Anvila monitoring setup to a reusable, secure, hi…
Disclosure: This article may later include affiliate links or service CTAs. Recommendations are based on workflow fit, not commissions. LLM Spend Audi…
When should I reach for a log, a trace, or a metric? I hit that question constantly when I instrument code, and I watch coding agents hit it too. It s…
TL;DR 3am page: GPU training pipeline missed its SLA. Datadog shows 95% GPU utilization. nvidia-smi agrees. Everything looks green, but the job is 3x …
Harness Base Definition: The Control System Outside the Model. Previously, we split Agent into several minimal parts: Model: judge the next step; Loop: …
I have been asking agent builders what they want in a run receipt after an AI agent finishes a task. The answers were better than my original schema. …
TL;DR We built a serverless Lambda pipeline that ships FSx for ONTAP audit logs to Sumo Logic's JP (Tokyo) region deployment. For Japanese enterprises…
TL;DR We built a serverless Lambda pipeline that ships FSx for ONTAP audit logs to Dynatrace via the Log Ingest API v2. The real value: Dynatrace's Da…
TL;DR We built a serverless pipeline that ships FSx for ONTAP audit logs to Honeycomb, where its high-cardinality query engine turns file access data …
In the previous article we covered kubectl describe pod: how to read output in which Kubernetes has often already written the cause of the problem itself, in Events, Conditi…
TLDR Monitoring AI agents in production requires distributed tracing: a single user request fans out into 10 or more internal operations, and logs alo…
Introduction Good forecasts help with capacity planning and quieter alerts. But one traffic spike or memory leak can make any forecast useless. The go…
Book: LLM Observability Pocket Guide: Picking the Right Tracing & Evals Tools for Your Team Also by me: Thinking in Go (2-book series): Complete …
Authored by Marco Aquilanti Today we're introducing BrontoScope , one of the Bronto AI Labs initiatives aimed at reducing user toil, increasing team e…
Running large language model inference servers in production exposes gaps that neither stock Prometheus dashboards nor the official documentation of v…
Istio 1.30 Deep Dive — Agentgateway, Ambient Multicluster, TrafficExtension API, and 4 CVE Patches (JWKS RSA Leak, XDS Debug Auth) On May 18, 2026, th…
Authored by Benoit Gaudin Every second, your CDN is generating thousands of logs that tell a critical story about your application's performance, secu…
Yandex's infrastructure runs thousands of microservices that generate millions of time series (metrics) every second. These can be counts of…
When something goes wrong in our systems, the first thing that usually comes to mind is the question "why?" To find the answer, we turn to two main to…
TL;DR: I open-sourced rock288/go-mongo-boilerplate — a Go 1.25 service template that ships the boring production stuff (observability, retry, DLQ, SSR…
TL;DR FSx for ONTAP file access audit logs are usually consumed through EC2-based patterns — mounted audit volumes and agent-based forwarders such as …
Authored by Feargal Karney & Mati Remi The Bronto REST API now exposes everything our own UI is built on. That means you can build a custom interf…
Coding agents produce causal DAGs, not logs I've been building tracing hooks for coding agents — Claude Code, Codex CLI, Copilot, and others. The goal…
This article is not a technical analysis at all, but an engaging story of how a small but very promising startup became a top app, and also…
In Part 3, we separated signals on purpose: metrics tell you where to look; logs and traces tell you what happened; audit tells you what can be proven; l…
In the MLOps world, people have long used DAGs/graphs, or at least the consensus has been that using them is best practice. With AI and agents, the t…
KubeCon + CloudNativeCon EU 2026 · Amsterdam · March 23–26 More than 13,000 engineers gathering around infrastructure might sound excessive until you …
Three numbers before we start: Average detection time with traditional monitoring: 4.2 hours; Average detection time with predictive observability: 11 …