AI & ML — Tech News

EN

Auditing What Your Email Agent Actually Did

Debugging a misbehaving email agent at 2am is a special kind of miserable. Your application logs say the LLM "decided to follow up." Cool — with whom?…

security ai email observability

EN

Hosted Coding Agents Make Observability a Product Feature

The laptop was never the interesting part of coding agents. It was just the first convenient runtime. Your laptop has the repository, the shell, the s…

ai agents observability aws

EN

AI For Debugging Production Issues

It's 2:47am. The pager has just gone off for the third time in twenty minutes. Checkout latency is spiking. The error rate on /api/orders is climbing.…

ai observability devops debugging

EN

Boosting Observability in NestJS with RedisX Metrics

Observability isn't just a buzzword; it's a necessity, especially when diving into distributed systems. If you're using NestJS, you might want to take…

observability metrics nestjs redisx

EN

My AI-agent waste detector scored zero false positives. Then I ran it on a real trace.

My detector passed every synthetic test with zero false positives. Then I pointed it at one real trace and found a crack. This is the honest version o…

ai buildinpublic observability llm

EN

nvidia-smi Reports 97% Utilization While the GPU Sits Idle

TL;DR A GPU shows 97% utilization in nvidia-smi, but training throughput is a fraction of what benchmarks promise. The GPU is not computing; it is wa…

gpu ebpf observability mlops

EN

AI Observability: Logs, Prompts, Tool Calls, And Cost

Here's a five-line function. It calls an LLM, logs the answer, returns it. async function ask(question: string) { const res = await openai.respo…

observability ai llm opentelemetry

RU

The Server Is Running. The Product No Longer Is

Most teams monitor servers, databases, and application availability. But the most expensive incidents often happen somewhere else entirely. …

monitoring saas devops observability stripe openai webhook api integrations reliability

EN

The Reason Your Agent Demo Isn't in Production Has Nothing to Do With the Model

Your agent demo took an afternoon. The reason it isn't in production nine months later has nothing to do with the model. I've watched this play out at…

agents ai observability testing

EN

Deploying Vector High-Performance Observability Data Pipeline on Ubuntu 24.04

Vector is a high-performance observability data pipeline from Datadog that collects, transforms, and routes logs, metrics, and traces across heterogen…

observability docker devops logging

EN

Engineering Design Document: Reusable Observability Platform V2

A production-focused redesign of a Stage 6 LGTM observability platform, moving from a single-service Anvila monitoring setup to a reusable, secure, hi…

devops observability architecture sre

EN

The Invisible Tax Every Oracle-Database Enterprise Pays

What it really costs when nobody can say exactly what happened inside your critical systems, and why that cost never appears as a line item. Series · …

saas observability end2end oracledatabase

EN

The 4-layer voice-agent latency stack, traced with OTel spans

How I instrument ASR, LLM, TTS, and the client with OpenTelemetry, and which number in each layer I actually look at. TL;DR. A voice agent is fou…

ai observability voice rust

EN

LLM Spend Audit: The 45-Minute Diagnostic for Startups

Disclosure: This article may later include affiliate links or service CTAs. Recommendations are based on workflow fit, not commissions. LLM Spend Audi…

ai startup llm observability

EN

Errors, traces, logs, metrics: when to reach for what

When should I reach for a log, a trace, or a metric? I hit that question constantly when I instrument code, and I watch coding agents hit it too. It s…

monitoring devops logging observability

EN

klag just got a bunch better — here’s what’s new.

Over half a year ago I open-sourced klag, a lightweight Kafka consumer lag exporter. I've been working in data streaming systems and data infrastruct…

kafka observability mcp

EN

GPU Incident at 3am: eBPF Tracing from Page to Root Cause in 60 Seconds

TL;DR 3am page: GPU training pipeline missed its SLA. Datadog shows 95% GPU utilization. nvidia-smi agrees. Everything looks green, but the job is 3x …

gpu ebpf observability sre

RU

[Translation] What kubectl debug Doesn't Show You: A Subtle Gap in the Data

The VK Cloud team translated an article for those who investigate Kubernetes incidents with kubectl debug. The author examines a subtle gap in the data: …

vk cloud kubernetes kubectl ephemeral containers debugging observability SRE DevOps translation vk tech

EN

Per-project LLM cost attribution with OTel spans: the wiring

TL;DR. If your LLM bill is one line item on a cloud invoice, you cannot answer "which team spent that." We fixed this by tagging every gateway span wi…

devops observability opentelemetry mlops

RU

OTel Collector in Bitrix24 Customizations: Hooking Up Observability

We describe a tool for observing Bitrix24 customizations: a telemetry infrastructure built on OpenTelemetry Collector. For projects …

open telemetry monitoring observability logging tracing grafana clickhouse docker devops bitrix24

EN

Harness Base Definition: The Control System Outside the Model

Previously, we split Agent into several minimal parts: Model: judge the next step Loop: …

agents harness agentruntime observability

RU

How a Bay Area Frontend Developer Who Barely Writes Code by Hand Built a Spending Tracker for AI Agents in Rust, and Why

I'm a frontend developer. I work in the Bay Area, at a company that gives every engineer corporate subscriptions to Claude Code and Cursor. In other words, personally …

rust claude code cursor ai open source observability monitoring vibe coding opentelemetry sqlite

EN

Five Fields AI Agent Run Receipts Probably Need

I have been asking agent builders what they want in a run receipt after an AI agent finishes a task. The answers were better than my original schema. …

observability

EN

FSx for ONTAP Audit Logs with Data Residency in your region with Sumo Logic

TL;DR We built a serverless Lambda pipeline that ships FSx for ONTAP audit logs to Sumo Logic's JP (Tokyo) region deployment. For Japanese enterprises…

aws sumologic observability amazonfsxfornetappontap

EN

AI-Powered Root Cause: Correlating File Access with APM via Dynatrace

TL;DR We built a serverless Lambda pipeline that ships FSx for ONTAP audit logs to Dynatrace via the Log Ingest API v2. The real value: Dynatrace's Da…

aws dynatrace observability amazonfsxfornetappontap

EN

High-Cardinality File Access Analysis with Honeycomb + OTel

TL;DR We built a serverless pipeline that ships FSx for ONTAP audit logs to Honeycomb, where its high-cardinality query engine turns file access data …

aws honeycomb observability amazonfsxfornetappontap

EN

Sidemark: Active Telemetry Comments for C#

OpenTelemetry has quietly become table stakes. That's a good thing, but if you've instrumented a real codebase, you know the tax. A method that does o…

dotnet opensource opentelemetry observability

RU

Slow Queries in Impala: How to Analyze the Profile Without Taking SQL Outside

When an Impala query starts running noticeably longer than usual, the first place people usually look is the query profile. It contains the execution plan, …

impala apache impala cloudera cloudera manager hadoop sql query optimization data engineering observability bigdata

EN

Decoding the Observability Pipeline: A Java Architect's Guide to Metrics, Logs, and Traces

If you’ve spent any time modernizing a Java-based microservices architecture recently, you’ve likely hit the "Observability Wall." The ecosystem is dr…

java observability architecture

RU

How to Debug a Distroless Container in Kubernetes Without a Shell: Ephemeral Containers in Practice

In the previous article we covered kubectl describe pod: how to read the output, in which Kubernetes has often already stated the cause of the problem itself, in Events, Conditi…

devops kubernetes observability debug distroless ephemeral kube docker dockerfile