Tech News — Latest News

Running a 1.5B-Parameter LLM Entirely On-Device for Mental Health — The NilaMind Architecture

How I built a mental health companion that never connects to the internet, and why the most important safety decisions have nothing to do with the AI.…

llamacpp opensource android

llama-bench skipped FA on capable GPUs — b9437 corrects it

What flipped in b9437: Build b9437, published on May 30, 2026 at 20:56 UTC, ships two targeted default-value corrections to llama-bench. Flash atten…

llamacpp llm gguf flashattention

How to Tune llama.cpp --n-gpu-layers: A Practical VRAM Guide (2026)

You already know what --n-gpu-layers does. It moves transformer layers onto your GPU. This post is the next step: how to actually pick the number. If …

localllm llamacpp gpu vram

How fast is LlamaStash? Overhead, throughput, and a fair comparison with Ollama and LM Studio

Originally published at deepu.tech. In my release post for LlamaStash I made a claim I need to back up. The wrapper adds zero overhead vs running lla…

ai llamacpp benchmark llm

Benchmarking the Claude Agent SDK on a local LLM: Haiku and Sonnet tier performance

The Claude Agent SDK exposes three budget tiers (haiku, sonnet, opus) and reads its routing target from environment variables on every call. That …

llm claude llamacpp benchmark

Qwen 3.6 27B and 35B MTP vs Standard on 16GB GPU

I tested speculative decoding (Multi-Token Prediction, MTP) performance in Qwen 3.6 27B and 35B on an RTX 4080 with 16 GB VRAM. For a broader view of …

selfhosting llm ai llamacpp

Ollama vs llama.cpp vs vLLM: Which Should You Use in 2026?

From the Best GPU for LLM archive. The canonical version has interactive calculators, an up-to-date GPU comparison table, and live pricing. Three tool…

ollama llamacpp vllm comparison

Discontinued Optane Local LLM Powers a Kimi K2.5 Desktop Run

A user on r/LocalLLaMA reported on May 12 that an Optane local LLM desktop build ran Moonshot’s Kimi K2.5 at about 4 tokens per second using discontin…

intel optane kimik25 llamacpp