Tech News — Latest News

RU

FinOps in Practice. Part 2: From Pilot to Standing Procedures, or How to Keep Your Cloud Savings

This is the second article in the series. In the first part we covered how to start FinOps from point zero: get finance and engineering at the same table, agree on a shared…

finops cloud cost cloud billing cost allocation showback resource tagging cloud infrastructure devops sre kubernetes

EN

Fix Docker Exit Code 137 (OOMKilled): Why It Happens and How to Stop It

Your container died and docker ps -a shows something like Exited (137) 4 minutes ago. Nine times out of ten that's the kernel's OOM killer, not your …

docker devops sre tutorial

EN

Your uptime SLA means nothing when the physical process can't wait for your rollback

There’s a conversation that happens when IT developers first encounter operational technology. It usually goes something like this: “What’s your uptim…

devops sre programming iot

EN

AWS puts gray zone failures into the EKS control loop

What AWS is calling out. The New Stack, on July 10, published a walk-through of what running Kubernetes across a very large EKS fleet has taught AWS ab…

aws eks kubernetes sre

EN

🚀 Calling all DevOps, SRE, and Platform Engineers! Let’s build the future of AI for DevOps together.

Over the last few years, I've been exploring AI agents, and one thing became obvious. There are hundreds of AI agents available today, but almost all …

devops ai programming sre

EN

Service Level Objectives for Complex Microservices

Why SLOs Break in Microservices. An SLO that works for a monolith often collapses when you distribute the same logic across 30 services. The math of ava…

sre slo microservices reliability

EN

Building a Culture of Reliability: Beyond the SRE Handbook

You Can't Hire Your Way to Reliability. I've seen companies hire 5 SREs and expect reliability to magically improve. It doesn't. Reliability is a cultu…

sre culture reliability engineering

EN

The expensive half of your incident bot is the half you didn't build

An incident bot caught the CrashLoopBackOff at 3:12 a.m., proposed delete_pod, and the on-call approved it half asleep at 3:14. The new pod went Runni…

devops sre observability kubernetes

EN

SRE AI Agent Safe Failure Implementation

Building Trustworthy AI Agents in Site Reliability Engineering. Site Reliability Engineering is entering a new phase where agentic AI can assist with a…

ai sre

EN

End of week. Here's the thing I kept coming back to:

LinkedIn Draft, Insight (2026-07-10). End of week. Here's the thing I kept coming back to: SLOs work when they create conversations, not when they cre…

observability sre devops platformengineering

EN

Incident Communication: The Status Page That Builds Trust

Silence Destroys Trust. During our worst outage, we went 35 minutes without updating the status page. Twitter filled the void. Theories ranged from dat…

incidents communication sre devops

RU

A System Grows Where Mistakes Are Allowed. A Story from the Minsk IT Hub

Hi, Habr! I'm Artem, a group lead at T-Bank. I joined T almost five years ago and since then have, one way or another, always worked in the "Cashbacks" domain, one of the…

sre career in it office

EN

How to Configure Apache as a Reverse Proxy with mod_proxy

The shape of a typical modern enterprise deployment involves Apache serving as a TLS-terminating reverse proxy sitting in front of upstream applicatio…

apache devops sre sysadmin

RU

Book: "Fundamentals of DevOps and Software Delivery. The Practice of Deploying and Maintaining Software in Production"

Hi, Habr readers! While most books on DevOps only skim the surface of theory and culture, this hands-on guide…

devops software delivery sre dns vpc fullstack-development

RU

KEDA as a Financial Guardrail: Scale-to-Zero, Replica Limits, and Event-Driven Autoscaling in Kubernetes

We'll look at KEDA specifically as a practical FinOps guardrail for Kubernetes: where HPA falls short, how ScaledObject is structured, how to safely approach scal…

keda kubernetes autoscaling finops hpa devops sre cloud-cost prometheus rabbitmq

EN

Docker Containerization Habits That Keep Production Calm

Most of the container incidents I've helped clean up didn't come from anything exotic. They came from small shortcuts that felt reasonable on a Tuesda…

docker devops containers sre

EN

Debugging Containers From the Terminal: A Practical Docker CLI Workflow

A container that's misbehaving is one of those problems where your instinct works against you. The pressure pushes you toward the dramatic move — rest…

docker devops cli sre

EN

Why Your Microservices Need Circuit Breakers (And How to Add Them)

The Cascading Failure That Took Down Everything. Our payment service went down for 3 minutes. No big deal, right? Except every service that called paym…

microservices reliability sre devops

EN

How We Built an AI That Never Forgets Production Incidents

Can AI become your smartest Site Reliability Engineer? We decided to find out. Every softwa…

ai automation showdev sre

EN

I let an AI handle an outage. It invented a hack that never happened, then spiraled

One evening, a monitoring alert went off: a server behind a web service was down. I handed the incident to an AI coding agent. Half experiment, half l…

ai llm sre incident

EN

SLOs That Product Managers Actually Understand

The SLO Translation Problem. You define an SLO: 99.95% availability with p99 latency under 200 ms. Engineering loves it. Product managers glaze over. Th…

sre slo product reliability

EN

Something I wish someone had told me five years earlier:

LinkedIn Draft, Insight (2026-07-03). Something I wish someone had told me five years earlier: Distributed tracing: the gap between having it and usin…

observability sre devops platformengineering

EN

I built a production risk scanner in one day, here's what it caught

If you're an SRE or DevOps engineer, try blastradar.vercel.app and tell me what you actually think. The tool: BlastRadar scores any code diff for prod…

devops sre programming ai

EN

Self healing and secure. Good combo.

Build software that heals itself in the agentic era (Gabe LG, Jul 1) # ai # ag…

agents ai security sre

EN

Google SRE Review - Cheat Sheet

If you're a software engineer, architect, engineering manager, or platform engineer, I consider the Google SRE Book to be one of the handful of books …

google sre devops

EN

Planning network checks before running them: a local-first workflow pattern

Many operations tasks do not begin as tickets, dashboards, or scripts. They begin as intent. Someone says: Check whether this subnet looks normal. Or:…

sre devops automation aiops

EN

Kubernetes resource requests and limits explained: scheduling, throttling, and OOMKill

This is part of the Platform engineering with Go series: a growing collection of posts on Kubernetes, Go tooling, and infrastructure automation. View …

devops k8s kubernetes sre

EN

Log Management at Scale: How We Cut Costs 70% Without Losing Signal

$12,000/Month for Logs Nobody Reads. Our logging bill was $12,000/month. We were ingesting 2 TB/day. When I asked the team what percentage of logs they …

logging observability devops sre

EN

The Ultimate Guide to Production-Grade AI Agents

Production-grade AI agents are systems that execute multi-step workflows autonomously while maintaining reliability, security, and observability guara…

agents ai production sre

EN

Blameless Postmortems in Practice

Most teams claim they do blameless postmortems. Then the incident happens. "Jane didn't validate the input." "The on-call missed the alert." "We shoul…

devops management sre