Local-first: a Model on Your Own Machine, Zero Cloud
This is the concrete, runnable walkthrough for Post 1 of the Portway series. The goal: stand up a single model behind an OpenAI-compatible endpoint o…
Tech news from the best sources
Problem: I had aider running on Lubuntu, three API keys configured, a detailed architecture diagram, and a clear goal — build a modular forensic data …
The Tesla P40 is a seductive piece of hardware: 24GB of VRAM for a fraction of the cost of a modern RTX card. But after three weeks of fighting with i…
Most AI apps quietly send your data to the cloud. DiaryGPT does the opposite — and this is the full technical story. The Problem With AI + Private Dat…
"What were our top 10 customers last quarter by revenue, as a bar chart?" DB-GPT translates that to SQL, runs it against your database, and renders th…
Most RAG tools make you choose between simplicity and power. MaxKB doesn't try to be powerful — it tries to be simple, and it nails it. 20K+ GitHub st…
The short version, in case the title was being coy: at num_ctx=2048, Gemma 4 E2B produces three sequential outputs in a single response: a mostly-ha…
From the Best GPU for LLM archive. The canonical version has interactive calculators, an up-to-date GPU comparison table, and live pricing. Three tool…
Today we are shipping CrawlForge v4.2.2, our biggest release since launch. It brings three new tools, a standalone command-line interface, and a quie…
Originally published at hafiz.dev. API costs add up fast during AI development. Prompt an agent 50 times while debugging a tool, and that's 50 API calls. You…
This is a submission for the Gemma 4 Challenge: Build with Gemma 4. Two months ago I shipped local-LLM features in TextStack, an open-source reader fo…
If you're running local LLMs through Ollama, finding the right model is annoying. The official model page scrolls forever, capability tags are inconsi…
TL;DR: I built a full-stack knowledge pipeline around a corpus of 2,514 academic PDFs focused on urban art. The system combines ChromaDB vector search…
Local LLMs in 2026 work on three hardware lanes: 32-core CPU with 64GB+ RAM hits 10-25 tokens per second on Qwen 3 14B, an RTX 4090 hits 30-80 tokens …
RAG Without the Chatbot: pgvector + Ollama for Operational Data. Most RAG tutorials start with "upload a PDF and ask questions about it." That's fine f…
[Day 3] I Had a Local LLM Analyze a Year of My Credit Card Statements. Intro. Day 3: I'm going to hand a year of credit card statements over to a local …
I started where a lot of us do: a LangChain RAG walkthrough. You chunk some text, embed it, retrieve top‑k chunks, and wire an LLM to answer questions…