Boosting Observability in NestJS with RedisX Metrics
Observability isn't just a buzzword; it's a necessity, especially when diving into distributed systems. If you're using NestJS, you might want to take…
Latest DevOps news from Tech News
Observability isn't just a buzzword; it's a necessity, especially when diving into distributed systems. If you're using NestJS, you might want to take…
My detector passed every synthetic test with zero false positives. Then I pointed it at one real trace and found a crack. This is the honest version o…
TL;DR A GPU shows 97% utilization in nvidia-smi , but training throughput is a fraction of what benchmarks promise. The GPU is not computing; it is wa…
Here's a five-line function. It calls an LLM, logs the answer, returns it. async function ask ( question : string ) { const res = await openai . respo…
Большинство команд следят за серверами, базами данных и доступностью приложений. Но самые дорогие инциденты часто происходят совсем в другом месте. Ис…
Your agent demo took an afternoon. The reason it isn't in production nine months later has nothing to do with the model. I've watched this play out at…
Vector is a high-performance observability data pipeline from Datadog that collects, transforms, and routes logs, metrics, and traces across heterogen…
A production-focused redesign of a Stage 6 LGTM observability platform, moving from a single-service Anvila monitoring setup to a reusable, secure, hi…
What it really costs when nobody can say exactly what happened inside your critical systems, and why that cost never appears as a line item. Series · …
** How I instrument ASR, LLM, TTS, and the client with OpenTelemetry, and which number in each layer I actually look at ** TL;DR. A voice agent is fou…
Disclosure: This article may later include affiliate links or service CTAs. Recommendations are based on workflow fit, not commissions. LLM Spend Audi…
When should I reach for a log, a trace, or a metric? I hit that question constantly when I instrument code, and I watch coding agents hit it too. It s…
Over half year ago I’ve open sourced klag, a lightweight Kafka consumer lag exporter. I’ve been working in data streaming systems and data infrastruct…
TL;DR 3am page: GPU training pipeline missed its SLA. Datadog shows 95% GPU utilization. nvidia-smi agrees. Everything looks green, but the job is 3x …
Команда VK Cloud перевела статью для тех, кто разбирает инциденты в Kubernetes с помощью kubectl debug. Автор разбирает незаметный пробел в данных: по…
Эта статья о проблемах, с которыми сталкивается инженер при попытке объединить зоопарк старого оборудования с современным подходом к его мониторингу. …
TL;DR. If your LLM bill is one line item on a cloud invoice, you cannot answer "which team spent that." We fixed this by tagging every gateway span wi…
1. Введение Всем привет! Меня зовут Яблоков Олег, я — ведущий инженер ИТ-отдела Navio и отвечаю за систему мониторинга основной инфраструктуры …
Рассказываем про инструмент для наблюдения за кастомизациями Битрикс24 — телеметрическую инфраструктуру на базе OpenTelemetry Collector. Для проектов …
Harness Base Definition: The Control System Outside the Model Previously, we split Agent into several minimal parts: Model: judge the next step Loop: …
Я фронтенд-разработчик. Работаю в Bay Area, в компании, которая выдаёт всем инженерам корпоративные подписки на Claude Code и Cursor. То есть лично из…
I have been asking agent builders what they want in a run receipt after an AI agent finishes a task. The answers were better than my original schema. …
TL;DR We built a serverless Lambda pipeline that ships FSx for ONTAP audit logs to Sumo Logic's JP (Tokyo) region deployment. For Japanese enterprises…
TL;DR We built a serverless Lambda pipeline that ships FSx for ONTAP audit logs to Dynatrace via the Log Ingest API v2. The real value: Dynatrace's Da…
TL;DR We built a serverless pipeline that ships FSx for ONTAP audit logs to Honeycomb, where its high-cardinality query engine turns file access data …
OpenTelemetry has quietly become table stakes. That's a good thing, but if you've instrumented a real codebase, you know the tax. A method that does o…
После инцидента команда почти всегда хочет видеть больше: добавить поле в лог, сохранить еще одну метку, оставить дашборд «на всякий случай». В момент…
Когда Impala-запрос начинает выполняться заметно дольше обычного, первое место, куда обычно идут смотреть, — query profile. Там есть план выполнения, …
If you’ve spent any time modernizing a Java-based microservices architecture recently, you’ve likely hit the "Observability Wall." The ecosystem is dr…
В прошлой статье мы разбирали kubectl describe pod : как читать вывод, в котором Kubernetes уже часто сам написал причину проблемы — в Events, Conditi…