Architecture — Tech News

Running a 1.5B-Parameter LLM Entirely On-Device for Mental Health — The NilaMind Architecture

How I built a mental health companion that never connects to the internet, and why the most important safety decisions have nothing to do with the AI.…

llamacpp opensource android

How fast is LlamaStash? Overhead, throughput, and a fair comparison with Ollama and LM Studio

Originally published at deepu.tech. In my release post for LlamaStash I made a claim I need to back up. The wrapper adds zero overhead vs running lla…

ai llamacpp benchmark llm

Qwen 3.6 27B and 35B MTP vs Standard on 16GB GPU

I tested speculative decoding (Multi-Token Prediction, MTP) performance in Qwen 3.6 27B and 35B on an RTX 4080 with 16 GB VRAM. For a broader view of …

selfhosting llm ai llamacpp