Web — Tech News

I built an open-source alternative to Microsoft's KAITO that works on ANY Kubernetes cluster

Six months ago, my team needed to deploy DeepSeek-R1 for internal use. We have a Kubernetes cluster — like everyone does in 2026 — so I started lookin…

kubernetes vllm devops opensource

Prefix caching at scale: when it saves you 80% of prefill cost, and the eviction policies that quietly turn it into 5%

Your chatbot deploys 70B Llama …

llm ai infrastructure vllm

KV cache quantization: what FP8/INT8 K and V actually buy you, and where they break

You just deployed a 70B Llama fine-tune on 8x H100s, and your serv…

llm ai vllm performance

How I pushed Qwen3.6-27B to 73 tokens/s in llama.cpp: the parameters that actually work

Local LLMs are now a genuinely powerful tool. They have already come very close to proprietary models like Claude, especially on code…

LLM llama.cpp javascript AI vllm

How we built a corporate LLM platform: architecture, pitfalls, and lessons learned

AI adoption at companies usually follows the same script: build one assistant, show it to management, get applause. Then a second, …

ai llm openwebui langflow langfuse litellm vllm openai

Ollama vs llama.cpp vs vLLM: Which Should You Use in 2026?

From the Best GPU for LLM archive. The canonical version has interactive calculators, an up-to-date GPU comparison table, and live pricing. Three tool…

ollama llamacpp vllm comparison

DGX Spark at 256K context: testing vLLM configurations, real measurements, and why NVFP4 in mainline is broken

NVIDIA sells the Spark under the slogan "one petaflop at FP4". I bought the box, installed vLLM, ran inference, and got 40 tokens per …

vllm dgx spark gb10 blackwell nvfp4 llm inference local ai

How I built a private AI server on DGX Spark, and what went into it

On my desk sits a small golden box slightly larger than a Mac mini. Inside is a private AI server: chat with a local 26B model, search…

dgx spark gb10 arm64 vllm dify ragflow rag llm