Architecture — Tech News

EN

You're Not Paying for Compute. You're Paying for Memory Bandwidth

TL;DR: Inference cost conversations obsess over FLOPs and token prices, but the real constraint on LLM serving is memory bandwidth, specifically the c…

ai llm inference mlops
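
The bandwidth claim can be made concrete with a common back-of-envelope bound. At batch size 1, each decoded token has to stream every weight from device memory once, so throughput cannot exceed bandwidth divided by weight bytes. This is a minimal sketch under that assumption. The model size and bandwidth figures are illustrative placeholders, not numbers from the article, and the bound ignores KV-cache reads, activations and kernel overhead.

```python
def decode_tokens_per_second_bound(params_billions: float,
                                   bytes_per_param: float,
                                   bandwidth_gb_per_s: float) -> float:
    """Upper bound on batch-1 decode throughput for a memory-bound model.

    Assumption: every generated token reads all weights once from device
    memory; KV-cache traffic, activations and launch overhead are ignored.
    """
    # 1e9 params * bytes/param = GB of weights read per token
    weight_gb = params_billions * bytes_per_param
    return bandwidth_gb_per_s / weight_gb


if __name__ == "__main__":
    # Hypothetical numbers: a 7B-parameter model on an accelerator
    # with roughly 2,000 GB/s of memory bandwidth.
    fp16 = decode_tokens_per_second_bound(7, 2, 2000)    # 2 bytes/param
    int4 = decode_tokens_per_second_bound(7, 0.5, 2000)  # 0.5 bytes/param
    print(f"fp16 bound: {fp16:.0f} tok/s, int4 bound: {int4:.0f} tok/s")
```

Cutting bytes per parameter by 4x raises the bound by 4x without changing the FLOPs per token, which is why a bandwidth framing and a FLOPs framing give different answers for batch-1 serving.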

RU

From legacy to a production platform: the engineering evolution of OSA at Magnit

How we took the project through four "eras": from manual runs on the Windows scheduler to Spark + k8s at the scale of the retail chain …

data engineer bigdata mlops data science osa pyspark cloud computing system architecture datalake

EN

Can your AI agent actually manage ML infrastructure?

I’ve spent enough time in production environments to know that 'chatting with an AI' is a useless metric if the AI can't touch the actual hardware or …

ai mcp mlops devops

EN

Your Guardrails Are a Firewall. Your Failures Are a Cascade

TL;DR: Most production AI teams build safety layers using the content-moderation mental model: classify input, classify output, block or pass. But the…

ai llm mlops reliability

EN

Three weeks before the enterprise contract, the voice agent wasn't operator-ready.

Look. We had 99.2% uptime in staging. We had eval coverage on 1,400 …

voiceagents llmproduction mlops ai

EN

Azure Databricks for MLOps and Feature Engineering at Scale with Apache Spark, Delta Lake, and MLflow

Raw data doesn't win model competitions. Features do. And when your raw data is tens of billions of rows sitting across multiple sources, you can't af…

azure databricks spark mlops

EN

What Is an Agent Registry? (And What We Broke Before We Had One)

TL;DR: An AI agent registry is a centralized catalog of every agent in your organization: what each agent does, what tools it can access, what version…

ai agents mlops devops

EN

MLOps for LLM: A Case Study on Dresscode

I recently participated in the Gemma 4 challenge here on DEV.to but fell short of many amazing projects. I really liked LIKAS. I encoura…

mlops llm gemma ai

EN

Channels-last memory format cut our conv backbone latency 22%

TL;DR: Switching our convolutional segmentation backbone to PyTorch's channels-last memory format cut inference latency by about 22% on A100s, with no…

pytorch computervision machinelearning mlops
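
Channels-last is a memory layout change, not a shape change: the tensor is still indexed as (N, C, H, W), but the channel values of each pixel sit next to each other in memory. A minimal stdlib sketch of the stride arithmetic follows; the helper names `nchw_strides` and `offset` are hypothetical. For a (2, 3, 4, 5) tensor, the channels-last strides it computes, (60, 1, 15, 3), are the ones PyTorch reports for a tensor converted with `.to(memory_format=torch.channels_last)`.

```python
from typing import Sequence, Tuple


def nchw_strides(n: int, c: int, h: int, w: int,
                 channels_last: bool = False) -> Tuple[int, int, int, int]:
    """Element strides for a logical (N, C, H, W) tensor.

    Contiguous (NCHW) layout stores W fastest, then H, then C.
    Channels-last stores C fastest, so the physical order is N, H, W, C
    while the logical dimension order stays (N, C, H, W).
    """
    if channels_last:
        return (h * w * c, 1, w * c, c)
    return (c * h * w, h * w, w, 1)


def offset(index: Sequence[int], strides: Sequence[int]) -> int:
    """Flat memory offset of a logical (n, c, h, w) index."""
    return sum(i * s for i, s in zip(index, strides))


if __name__ == "__main__":
    shape = (2, 3, 4, 5)
    print("contiguous   :", nchw_strides(*shape))
    print("channels_last:", nchw_strides(*shape, channels_last=True))
    # In channels-last, the 3 channel values of pixel (h=1, w=2) are adjacent.
    cl = nchw_strides(*shape, channels_last=True)
    print([offset((0, ch, 1, 2), cl) for ch in range(3)])
```

In PyTorch the switch is usually applied to both the model and its inputs with `.to(memory_format=torch.channels_last)`; the latency gain itself is what the article measures.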

EN

The SDXL VAE overflow that decoded black images in fp16

TL;DR: The SDXL VAE decoder pushes activations past 65504, the max value fp16 can hold, so the last decode step overflows to inf and you get a fully b…

pytorch computervision machinelearning mlops
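
The overflow is easy to reproduce without a GPU. This is a minimal sketch using Python's standard `struct` module, whose `e` format packs IEEE 754 half precision. Where a GPU cast would produce inf, `struct` raises `OverflowError` instead, so the hypothetical `to_fp16` helper maps that error to a signed infinity. It is not the SDXL decoder, only the numeric edge the TL;DR describes.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x: float) -> float:
    """Round x to half precision (round-to-nearest-even).

    struct raises OverflowError when the rounded value exceeds the fp16
    range; we return a signed inf instead, mimicking a hardware cast.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


if __name__ == "__main__":
    print(to_fp16(FP16_MAX))   # still finite
    print(to_fp16(65519.0))    # rounds down to the max finite value
    print(to_fp16(70000.0))    # past the range: becomes inf
    big = to_fp16(70000.0)
    print(big - big)           # inf - inf is NaN
```

Once one activation is inf, ordinary follow-up arithmetic such as inf - inf or inf * 0 yields NaN, which is one plausible route from a single overflow to an unusable decoded image.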

EN

AI Workloads Are Reshaping Kubernetes in 2026: GPU Scheduling, MLOps, and the Platform Engineering Reckoning

How GPU scheduling complexity and MLOps integration are forcing platform teams to rearchitect Kubernetes clusters before operational debt becomes insu…

kubernetes gpuscheduling mlops platformengineering

EN

Why multi-agent orchestration is harder than it looks

One AI agent answering a question is useful. Five agents that divide a complex task, pass state to each other, and act on live enterprise systems is a…

ai agents mlops llm

RU

Why don't AI pilots scale? They lack a governance system

An AI pilot can perform well in a demo and still not be ready for production use. In this article we look at which governance elements …

artificial intelligence systems engineering project management risk management digital transformation enterprise architecture ai governance mlops

EN

RLAIF Is Eating RLHF — Here Are the Four Places Human Feedback Still Wins

RLAIF is having a moment. Walk through any alignment paper or vendor pitch from the last six months and you'll see the same claim: replace your human …

ai machinelearning llm mlops

EN

nvidia-smi Reports 97% Utilization While the GPU Sits Idle

TL;DR: A GPU shows 97% utilization in nvidia-smi, but training throughput is a fraction of what benchmarks promise. The GPU is not computing; it is wa…

gpu ebpf observability mlops

EN

I Processed 2.4 Billion Tokens Across 52 AI Models for $0.52. Here's the Full Breakdown.

I run a production multi-agent AI system on a single M1 Mac in Jamaica. 6 autonomous agents. 26 cron workflows. 5-layer persistent memory. All contain…

agenticai openrouter mlops costoptimization

EN

I Built a Production RAG System on My M1 Mac for $0

Most RAG tutorials stop at "it answers questions." But answering questions is table stakes. The re…

rag mlops ai python

RU

The "equal weights" myth: what is really inside small models

In recent years, LLM development followed a path of extensive scaling: the prevailing belief was that more weights and more data make a smarter model. The industry even …

mlops selectel llm-models qwen phi-4 mistral gpt-oss deepseek ai ai-agents

EN

I Built a Complete AI Infrastructure Stack from Scratch — Here's What I Learned

Most AI projects start at the top of the stack. You grab an LLM API, w…

distributedsystems mlops cpp go

EN

QAT vs PTQ on our edge vision model: 6 months of A/B data

TL;DR: We ran post-training quantisation (PTQ) and quantisation-aware training (QAT) side by side on the same defect-classification model deployed on …

machinelearning computervision mlops pytorch

EN

Part 2: Enterprise Decision Intelligence Architecture: AI Governance, Threshold Policy Engines, and Operational AI Systems

Part 1 showed how to evaluate binary classification thresholds in Python. This part asks the harder enterprise question: What happens when that thresh…

ai architecture governance mlops

EN

Why 91% of AI Agents Fail in Production (And What the 9% Do Differently)

Everyone is building AI agents right now. Autonomous systems that reason, plan, and act without humans in the loop. Agents that write code, manage wor…

ai mlops systemdesign productionai

EN

Why your diffusion model is slow at batch size 1 (and what actually helps)

TL;DR: Single-image diffusion inference is bottlenecked by kernel launch overhead and attention memory traffic, not raw FLOPs. torch.compile with mode…

machinelearning pytorch computervision mlops

EN

When AI Meets Reality: Why “Hello World” Isn’t Enough for LLM Systems

Most AI tutorials stop at “Hello World.” You wire up a model, send a prompt, get a response, and feel like you’ve built something. But the moment you …

ai architecture llm mlops

EN

What GenAI Actually Costs in Production

The first number anyone quotes when asked what generative AI costs is a per-token figure. It is a comfortable number — small, unambiguous, available o…

llm mlops aiengineering cost

EN

The Missing Engineering Stack for Production AI Agents

The "build an agent in 5 minutes" tutorials get you to a demo. They don't get you to production. Here's the field guide for the four primitives that d…

agentskills promptengineering mcp mlops

RU

An engineering approach to MLOps: how the principles of computational mechanics map onto AutoML architecture

"If something can go wrong, it will." We don't try to prevent failure; we design the system so that the failure of a single compon…

system mlops system architecture solid

EN

Beyond Monitoring: Building AI-Powered Predictive Observability for Retail Data Pipelines

Three numbers before we start: average detection time with traditional monitoring: 4.2 hours; average detection time with predictive observability: 11 …

dataengineering observability mlops dataquality