DevOps — Tech News

EN

nvidia-smi Reports 97% Utilization While the GPU Sits Idle

TL;DR A GPU shows 97% utilization in nvidia-smi, but training throughput is a fraction of what benchmarks promise. The GPU is not computing; it is wa…

gpu ebpf observability mlops

RU

[Translation] GPU Autoscaling on Kubernetes with KEDA: Building an External Scaler

If you run GPU workloads (graphics accelerators) on Kubernetes, such as vLLM, Triton, training jobs, or newer agentic inference stacks, …

kubernetes keda gpu autoscaling external scaler nvml vllm triton helm greenops

EN

CUDA for AMD Lemonade, Intel Arc Pro Linux Gains, XPU Manager 2.0

Today's Highlights: Today's top GPU news highlights include AMD's Lemonade SDK gainin…

gpu nvidia hardware

EN

G4 Fractional VMs are now available on Google Cloud!

In 2025, Google Cloud added G4, powered by NVIDIA's RTX PRO 6000 Blackwell Server Edition GPUs, to its offering, allowing it to offer hardware not …

gpu googlecloud nvidia infrastructure

EN

Vortex 3.0 RISC-V GPGPU, Pragtical SDL GPU Backend, NVIDIA RTX Spark Launch

Today's Highlights: Today's top stories highlight significant advancements …

gpu nvidia hardware

EN

GPU_WORKLOAD_MISMATCH: A Novel Security Finding Category for AI Container Workloads

Defensive Publication: GPU_WORKLOAD_MISMATCH, a novel security finding category for AI container workloads. Author: Carnell Smith, Champtron Systems LLC…

cybersecurity docker ai gpu

EN

Linux 7.1 Boosts Intel Arc, Flatpak Integrates ROCm, Vintage AMD Driver Refined

Today's Highlights: Recent developments enhance GPU performance and acc…

gpu nvidia hardware

EN

I Tested 9 Serverless GPU Providers for AI Inference in 2026. Here's What I'd Actually Use

TL;DR If you're shipping AI inference and tired of babysitting GPUs, serverless is the way out. You deploy the model, the platform scales it from zero…

ai machinelearning serverless gpu

EN

GPU Incident at 3am: eBPF Tracing from Page to Root Cause in 60 Seconds

TL;DR 3am page: GPU training pipeline missed its SLA. Datadog shows 95% GPU utilization. nvidia-smi agrees. Everything looks green, but the job is 3x …

gpu ebpf observability sre

EN

An AMD GPU Beat My Mac on Llama 8B. The Same GPU Lost on Phi-3.

I wrote a post yesterday about why GPUs barely help small text embeddings at batch=1. Different workload, same machines. This time I ran a local LLM i…

performance benchmarks machinelearning gpu

EN

Your GPU Probably Isn't Helping Your Retrieval System

Most "just use a GPU" advice is wrong for how anyone actually runs small models. I spent yesterday benchmarking a 33M parameter embedding model across…

performance benchmarks machinelearning gpu

EN

Best Local AI Models for Each VRAM Tier (4 GB to 80 GB) in 2026

This article was originally published on runaihome.com Every "best local AI model" article skips the question that actually matters: best for what VRA…

localai vram hardware gpu

RU

What a GPU Cluster Is Made Of: An Overview of Servers with L40S, A16, and AMD EPYC on the mClouds Platform

Hello, Habr! We are the cloud provider mClouds, and we run a GPU platform with NVIDIA L40S, A16, and other graphics cards. It is used for tasks in AI …

server data center nvidia gpu graphics cards dell amd epyc cloud server virtualization data processing

EN

How to not Lose $500M via API Bills: Run Private AI for 100 Engineers Under $1 Million

Last week a company nobody can name spent $500 million in a single month on Anthropic's Claude API. Not $500K. Not $5M. Half a billion dollars. In one…

ai gpu startup privacy

EN

5090 vs 4090 for AI Workloads: Buy, Rent, or Validate in the Cloud?

Originally published at https://blog.runc.ai/5090-vs-4090/. Key Takeaways: RTX 5090 is the stronger flagship on paper, especially when your AI workflo…

gpu ai cloud hardware

RU

[Translation] Disaggregated LLM Inference on Kubernetes: Prefill, Decode, and Pod Scheduling

As large language model (LLM) inference workloads grow more complex, a single monolithic serving process hits its limits. Prefil…

vk cloud llm kubernetes inference gpu nvidia disaggregated inference orchestration autoscaling pod scheduling

RU

DRAivers for GPUs: How Kubernetes Learned to Allocate Devices Through a Standard API

The Device Plugin in Kubernetes reduces a GPU to a counter on the node: the scheduler sees only the number of devices, not their profile, memory size, or mode of shari…

gpu kubernetes deckhouse kubernetes platform ai ml dra machine learning

EN

How to Detect GPU Waste in a Kubernetes Cluster

GPU waste in Kubernetes does not announce itself. Your cluster shows healthy utilization. Your dashboards are green. But 20–40% of your GPU capacity i…

kubernetes gpu mlops devops

EN

Why Your PyTorch Training Crawls on a Beefy GPU (And How to Fix It)

Last month I was helping a friend debug a training loop that was running at maybe 15% GPU utilization on an A100. Fifteen percent. On a card that cost…

pytorch performance machinelearning gpu

EN

RTX 5080 Undervolt Benchmarks, CGO-Free CUDA API Binding, & AMD GPU Compatibility Fix

Today's Highlights: Today's top GPU news features detailed un…

gpu nvidia hardware

EN

RTX 5090 Cooling, BeeLlama VRAM Opts, Resizable BAR Performance Gains

Today's Highlights: NVIDIA's upcoming RTX 5090 cooling solutions are detailed, wh…

gpu nvidia hardware

EN

Five Years Later, I Finally Have 96GB VRAM — What It Actually Unlocks for Agent Loops

I bought an RTX PRO 6000 Blackwell Max-Q. 96GB VRAM, Blackwell architecture, pro workstation GPU. Even as a Max-Q variant, this is an absurdly large p…

gpu ai machinelearning python

EN

Turning a 1-Line Idea Into a 40-Second Short with a 10-Beat Local Video Pipeline

TL;DR Gemma 4 31B expands a single-line idea into a 10-beat structure. HiDream generates 11 images at 2048², LTX-2 A2V/I2V renders 11 clips, Irodori-T…

python ai machinelearning gpu

EN

Running LTX-2.3 Alongside TTS on a Single 96GB GPU with a Cold-Start Architecture

When integrating LTX-2.3 (a 22B audio-to-video model) into a voice roleplay product, I ran straight into a VRAM wall. The classic dead-end: running it…

gpu python machinelearning ai

EN

GPU Bottleneck Analyzer, NVIDIA Rubin VRAM Demands, and Qwen VRAM Optimization

Today's Highlights: This week's top GPU news features a new open-source …

gpu nvidia hardware

EN

Best GPU for Llama 70B in 2026 (48GB+ VRAM Required)

This article was originally published on Best GPU for LLM. The full version with interactive tools, FAQ, and live pricing is on the original site. Qu…

gpu llama 70b vram

EN

What Inference-Platform Benchmark Posts Leave Out

DCGM stops at host-level GPU counters. Kernel-side eBPF adds the per-rank, per-tenant signals platform writeups never publish. TL;DR Cloudflare’s rece…

machinelearning ai gpu performance

EN

RTX 5080 Launched, Rust for CUDA, & LLM GPU Scheduling Deep Dive

Today's Highlights: This week's top GPU news highlights a new GeForce RTX 5080 var…

gpu nvidia hardware

EN

Which serverless GPU platforms actually have fast cold starts for AI inference — p99, not p50

been testing this properly for a few months because i kept seeing wildly different claims and couldn’t find real data anywhere. specifically for infer…

gpu machinelearning infrastructure devops

EN

DeepSeek-V4-Flash Benchmarks, FlashRT CUDA Runtime, & V100 LLM Performance

Today's Highlights: This week highlights significant advancements in GPU…

gpu nvidia hardware