Boosting Observability in NestJS with RedisX Metrics
Observability isn't just a buzzword; it's a necessity, especially when diving into distributed systems. If you're using NestJS, you might want to take…
Latest Architecture news from Tech News
TL;DR A GPU shows 97% utilization in nvidia-smi, but training throughput is a fraction of what benchmarks promise. The GPU is not computing; it is wa…
Here's a five-line function. It calls an LLM, logs the answer, returns it. async function ask(question: string) { const res = await openai.respo…
A production-focused redesign of a Stage 6 LGTM observability platform, moving from a single-service Anvila monitoring setup to a reusable, secure, hi…
When should I reach for a log, a trace, or a metric? I hit that question constantly when I instrument code, and I watch coding agents hit it too. It s…
TL;DR 3am page: GPU training pipeline missed its SLA. Datadog shows 95% GPU utilization. nvidia-smi agrees. Everything looks green, but the job is 3x …
This article is about the problems an engineer faces when trying to reconcile a zoo of legacy hardware with a modern approach to monitoring it. …
Harness Base Definition: The Control System Outside the Model Previously, we split Agent into several minimal parts: Model: judge the next step Loop: …
TL;DR We built a serverless Lambda pipeline that ships FSx for ONTAP audit logs to Sumo Logic's JP (Tokyo) region deployment. For Japanese enterprises…
TL;DR We built a serverless Lambda pipeline that ships FSx for ONTAP audit logs to Dynatrace via the Log Ingest API v2. The real value: Dynatrace's Da…
TL;DR We built a serverless pipeline that ships FSx for ONTAP audit logs to Honeycomb, where its high-cardinality query engine turns file access data …
If you’ve spent any time modernizing a Java-based microservices architecture recently, you’ve likely hit the "Observability Wall." The ecosystem is dr…
Unlocking Insights with Observability: My Journey with OpenTelemetry As a Full Stack Engineer specializing in DevOps, AI Infrastructure, and Cloud, I'…
Introduction Good forecasts help with capacity planning and quieter alerts. But one traffic spike or memory leak can make any forecast useless. The go…
The microservice works but sometimes slows down, and you don't know where to dig. The logs are clean, the metrics look normal, yet users are complaining.…
Book: LLM Observability Pocket Guide: Picking the Right Tracing & Evals Tools for Your Team Also by me: Thinking in Go (2-book series) — Complete …
Running large language model inference servers in production exposes gaps that neither stock Prometheus dashboards nor the official documentation of v…
Istio 1.30 Deep Dive — Agentgateway, Ambient Multicluster, TrafficExtension API, and 4 CVE Patches (JWKS RSA Leak, XDS Debug Auth) On May 18, 2026, th…
Authored by Mike Neville-O'Neill Let's face it — logging is broken. Not just a little broken, but fundamentally misaligned with the needs of modern en…
Authored by Benoit Gaudin Every second, your CDN is generating thousands of logs that tell a critical story about your application's performance, secu…
Authored by Benoit Gaudin In Part I (Ingestion) and Part II (Storage) of this series, I explored the challenges of designing, running, and managing a …
Yandex's infrastructure runs thousands of microservices that generate millions of time series (metrics) every second. These can be counts of…
When something goes wrong in our systems, the first thing that usually comes to mind is the question "why?" To find the answer, we turn to two main to…
TL;DR: I open-sourced rock288/go-mongo-boilerplate — a Go 1.25 service template that ships the boring production stuff (observability, retry, DLQ, SSR…
TL;DR FSx for ONTAP file access audit logs are usually consumed through EC2-based patterns — mounted audit volumes and agent-based forwarders such as …
Authored by Feargal Karney & Mati Remi The Bronto REST API now exposes everything our own UI is built on. That means you can build a custom interf…
The Problem It's 2 AM. An alert fires. Cart service is throwing errors. You've got five minutes before someone escalates. The runbook says: "Check the…