Architecture — Tech News

EN

The Same RTX 5090, but the GPU Sat Idle — a CPU-Bound Go Solver and the Case for L2 Cache

This is the second in a short series that benchmarks a single RTX 5090 by re-running published Go solvers — programs that don't just play Go but prove…

cpu gpu benchmark hardware

EN

WebGPU Explained: The Browser’s New Graphics and Compute Engine

A practical introduction to WebGPU, WGSL, render pipelines, compute shaders, and the future of high-performance graphics on the web. Your browser can …

webgpu webdev javascript gpu

EN

GPUs for AI in 2026: NVIDIA, AMD, Intel Compared

The AI hardware landscape has shifted significantly in 2026, with NVIDIA, AMD, and Intel all competing for developers who need GPUs capable of running…

gpu ai nvidia hardware

EN

Linux 7.2 Improves Multi-GPU Displays, M3 Support, Mesa Rusticl Defaults Arm Mali

Today's Highlights: This week's hardware and driver news highlights i…

gpu nvidia hardware

EN

Does a Second GPU Increase Ollama's Context Window? (Quadro P2000 + RTX 3090 Tested)

TL;DR: Short version: no. I dropped a much older GPU (Quadro P2000, 5GB, Pascal, 2016) next to an RTX 3090 (24GB, Ampere) on the same box, ran the sa…

llm ollama vllm gpu

RU

From Triton Inference Server to NVIDIA Dynamo: How Inference for Agents Changed in 2026

Hi, Habr! My name is Alexandra, and I'm a Data Scientist at Raft. In this article I'll walk through NVIDIA Dynamo, a new open-source framew…

llm ai-agent nvidia dynamo gpu ai it-infrastructure

EN

Resurrecting Kepler: Getting Modern LLMs Running on a GTX 770 (Kernel 7.x)

⚠️ Experimental hack: Use on non-critical systems. Ensure you have backups. This patches a proprietary binary at the instruction level — no warranty,…

cuda linux llm gpu

EN

96% of cuBLAS, no `unsafe`: what cuTile Rust proves

GPU programming usually asks Rust developers to surrender the borrow checker at the launch boundary: references collapse into raw pointers, and aliasi…

cutile rust gpu inference

EN

Blackwell MLPerf Dominance, Intel Nova Lake Compute Runtime, & Weston 16 Vulkan HDR

Today's Highlights: NVIDIA's Blackwell architecture showcased u…

gpu nvidia hardware

EN

Why You Need to Become a Neuro-Punk Right Now

A short essay on why the developer community should invest as much effort as possible into LLMs that are free from corporations and states. ML researc…

ai llm gpu

EN

nvidia-smi Reports 97% Utilization While the GPU Sits Idle

TL;DR: A GPU shows 97% utilization in nvidia-smi, but training throughput is a fraction of what benchmarks promise. The GPU is not computing; it is wa…

gpu ebpf observability mlops

EN

CUDA for AMD Lemonade, Intel Arc Pro Linux Gains, XPU Manager 2.0

Today's Highlights: Today's top GPU news highlights include AMD's Lemonade SDK gainin…

gpu nvidia hardware

EN

Flash Attention: what it does and why it matters

Your training job is paying for an A100 at $3/hour. The loss is going down, gradients are flowing, an…

llm ai deeplearning gpu

EN

Vortex 3.0 RISC-V GPGPU, Pragtical SDL GPU Backend, NVIDIA RTX Spark Launch

Today's Highlights: Today's top stories highlight significant advancements …

gpu nvidia hardware

EN

Linux 7.1 Boosts Intel Arc, Flatpak Integrates ROCm, Vintage AMD Driver Refined

Today's Highlights: Recent developments enhance GPU performance and acc…

gpu nvidia hardware

EN

I Tested 9 Serverless GPU Providers for AI Inference in 2026. Here's What I'd Actually Use

TL;DR: If you're shipping AI inference and tired of babysitting GPUs, serverless is the way out. You deploy the model, the platform scales it from zero…

ai machinelearning serverless gpu

EN

GPU Incident at 3am: eBPF Tracing from Page to Root Cause in 60 Seconds

TL;DR: 3am page: GPU training pipeline missed its SLA. Datadog shows 95% GPU utilization. nvidia-smi agrees. Everything looks green, but the job is 3x …

gpu ebpf observability sre

EN

An AMD GPU Beat My Mac on Llama 8B. The Same GPU Lost on Phi-3.

I wrote a post yesterday about why GPUs barely help small text embeddings at batch=1. Different workload, same machines. This time I ran a local LLM i…

performance benchmarks machinelearning gpu

RU

64 Rectangles Ought to Be Enough for Everyone

"A programming student built a full-fledged game console on an FPGA from scratch in a month and a half, with no prior digital design experience." For me, the very…

fpga game console pld Брус-16 microarchitecture hardware implementation cpu gpu verilog tang nano 9k

EN

Where Tensor-Parallel Inference Hits the NVLink Wall

2026-05-31 · GPU / distributed systems. Tensor parallelism splits each layer across GPUs, so every…

cuda gpu machinelearning performance

EN

AMD Linux 7.2 Graphics & SteamOS VRR Drivers, NVIDIA Vera CPU Benchmarks

Today's Highlights: This week's top stories feature significant driver upd…

gpu nvidia hardware

EN

How to not Lose $500M via API Bills: Run Private AI for 100 Engineers Under $1 Million

Last week a company nobody can name spent $500 million in a single month on Anthropic's Claude API. Not $500K. Not $5M. Half a billion dollars. In one…

ai gpu startup privacy

EN

5090 vs 4090 for AI Workloads: Buy, Rent, or Validate in the Cloud?

Originally published at https://blog.runc.ai/5090-vs-4090/. Key Takeaways: RTX 5090 is the stronger flagship on paper, especially when your AI workflo…

gpu ai cloud hardware

EN

CUDA 13.3 Lands, AI Writes Blackwell Kernels, & FP4 VRAM Optimization for LLMs

Today's Highlights: NVIDIA releases CUDA Toolkit 13.3, bringing new …

gpu nvidia hardware

RU

[Translation] Scaling LLMs: From a Single Chip to the Data Center. Chapter 3. Transformers

This is a continuation of the series of articles on scaling LLM training and inference. Previous article. Now let's move on to something more practical, namely…

ai ml gpu gpu computing transformers systems analysis and design

EN

20 Years of GPUs in Numbers: How FLOPS & TDP Grew, and Who Led the NVIDIA vs AMD Race (open dataset, 13.5k GPUs)

We run a GPU spec catalog, and over a couple of years it grew into a database of 13,566 GPUs — from the GeForce 256 (1999) all the way to Blackwell an…

gpu machinelearning hardware datascience

RU

[Translation] Scaling LLMs: From a Single Chip to the Data Center. Chapter 2. Sharding

This is a continuation of the series of articles on scaling LLM training and inference. The previous chapter is available at this link. So, with the basics out of the way, …

ai ml gpu gpu computing systems analysis and design

EN

RTX 5090 Cooling, BeeLlama VRAM Opts, Resizable BAR Performance Gains

Today's Highlights: NVIDIA's upcoming RTX 5090 cooling solutions are detailed, wh…

gpu nvidia hardware

EN

Five Years Later, I Finally Have 96GB VRAM — What It Actually Unlocks for Agent Loops

I bought an RTX PRO 6000 Blackwell Max-Q. 96GB VRAM, Blackwell architecture, pro workstation GPU. Even as a Max-Q variant, this is an absurdly large p…

gpu ai machinelearning python

EN

Turning a 1-Line Idea Into a 40-Second Short with a 10-Beat Local Video Pipeline

TL;DR: Gemma 4 31B expands a single-line idea into a 10-beat structure. HiDream generates 11 images at 2048², LTX-2 A2V/I2V renders 11 clips, Irodori-T…

python ai machinelearning gpu