Architecture — Tech News

EN

Function Calling With a Local LLM to Drive Foundry: Fuzz, Read, Repeat

The first time I let a local model drive Foundry unsupervised, it spent eleven turns trying to fix a fuzz test by renaming the test function. Not chan…

ai solidity testing ollama

EN

Qwen2.5-Coder vs DeepSeek-Coder for Solidity Review: What I Actually See Locally

I keep a folder of ten small Solidity contracts with bugs I planted myself: a classic reentrancy in a withdraw function, a missing access modifier on …

ai solidity security ollama

EN

Ollama Model Loading RCE: Three Years of the Same Bug Class, One Self-Hosted LLM Runtime

TL;DR Ollama has shipped three remote-code-execution vulnerabilities in three years, all at the model-loading boundary. CVE-2024-37032 ("Probllama", W…

rce ollama llmsecurity modelloading

EN

Building an Autonomous Agent on an M1 Mac, by Choice

For about 3 months I've been running an autonomous agent — one that thinks up and writes its own social media posts and comments — unattended, 4 sessi…

discuss ollama llm agents

EN

Does a Second GPU Increase Ollama's Context Window? (Quadro P2000 + RTX 3090 Tested)

TL;DR Short version: no. I dropped a much older GPU (Quadro P2000, 5GB, Pascal, 2016) next to an RTX 3090 (24GB, Ampere) on the same box, ran the sa…

llm ollama vllm gpu

EN

LLM Quantization Levels Compared: Q4_K_M vs Q8_0 vs FP16 [2026]

Originally published at kunalganglani.com; read it there for inline code, hero image, and live links. …

localllm quantization gguf ollama

EN

Running a Whole RAG Agent Offline: LangGraph + Ollama + Embedded Qdrant (Zero API Keys)

Most RAG tutorials open with "set your OPENAI_API_KEY." This one doesn't need it. In Part 1 I claimed the LLM and embeddings are behind a swappable b…

langchain llm rag ollama

EN

I Built an AI Content Team That Posts to My Blog While I Sleep

I used to write blog posts the old way. Open a blank page. Stare at it. Write something. Rewrite it three times. Publish. Repeat every two weeks when …

ai automation productivity ollama

EN

Hermes-Crew Hybrid: A Hybrid Architecture for Secure Multi-Agent AI Workflows

I built a hybrid system that combines a central orchestrator (Hermes) wi…

ai security crewai ollama

EN

Open Notebook Review: Self-Hosted NotebookLM Alternative

Originally published on andrew.ooo — visit the original for any updates, code snippets that aged out, or follow-up posts. TL;DR Open Notebook (by Luis…

opennotebook notebooklmalternative selfhosted ollama

EN

Doubling Qwen3.6-27B on One RTX 3090: Ollama → llama.cpp + MTP, Lever by Lever (35.7 → 80.2 tok/s)

A reader on my last post said Ollama was leaving a lot on the table — that a tuned backend with multi-token prediction (MTP) could roughly double my 3…

ollama llm performance machinelearning

EN

Fitting WhisperX large-v3 + a 24B LLM on one 3090: a reproducible context-capping recipe

This is the technical, reproducible version of a fix I shipped on my own homelab. If you want the narrative version, that's on Medium. This one is the…

homelab ollama localllm devops

EN

Do You Have a Homelab? Secure Your Local LLM Artifacts

We used to build homelabs around Linux servers, Docker containers, and NAS drives. It was about uptime, RAID levels, and monitoring CPU temps. Now, th…

homelab llmsecurity sbom ollama

EN

Local-first: a Model on Your Own Machine, Zero Cloud

This is the concrete, runnable walkthrough for Post 1 of the Portway series. The goal: stand up a single model behind an OpenAI-compatible endpoint o…

ollama python ai llm

EN

I Tried Building a Complex Security Tool with a 1.5B Local Model — Here's What Broke

Problem: I had aider running on Lubuntu, three API keys configured, a detailed architecture diagram, and a clear goal — build a modular forensic data …

ollama aider localai cybersecurity

EN

Tesla P40 in a Homelab: 24GB of Inference on a Budget

The Tesla P40 is a seductive piece of hardware: 24GB of VRAM for a fraction of the cost of a modern RTX card. But after three weeks of fighting with i…

teslap40 nvidia proxmox ollama

EN

Building a Private RAG System: Lessons from a Local-First AI Journal

Most AI apps quietly send your data to the cloud. DiaryGPT does the opposite — and this is the full technical story. The Problem With AI + Private Dat…

ai privacy ollama llm

EN

Using Ollama with the Laravel AI SDK: Run Local LLMs for Free

Originally published at hafiz.dev API costs add up fast during AI development. You prompt an agent 50 times debugging a tool, that's 50 API calls. You…

laravel aisdk aidevelopment ollama

EN

I shipped local LLM features two months ago. Production never ran them once.

This is a submission for the Gemma 4 Challenge: Build with Gemma 4 Two months ago I shipped local-LLM features in TextStack — an open-source reader fo…

devchallenge gemmachallenge gemma ollama

EN

No More Hallucinated Citations: A Domain-Specific RAG System with Ollama, ChromaDB and AI Agents

TL;DR: I built a full-stack knowledge pipeline around a corpus of 2,514 academic PDFs focused on urban art. The system combines ChromaDB vector search…

rag ollama chromadb aiagents

EN

Local LLMs in 2026: What Actually Works on Consumer Hardware

Local LLMs in 2026 work on three hardware lanes: a 32-core CPU with 64GB+ RAM hits 10-25 tokens per second on Qwen 3 14B, an RTX 4090 hits 30-80 tokens …

ai localllm ollama qwen

EN

pgvector + Ollama Setup

RAG Without the Chatbot: pgvector + Ollama for Operational Data. Most RAG tutorials start with "upload a PDF and ask questions about it." That's fine f…

java langchain4j ollama postgres

EN

[Day 3] I Had a Local LLM Analyze a Year of My Credit Card Statements

Day 3: I'm going to hand a year of credit card statements over to a local …

localllm ai dgxspark ollama

EN

Build a RAG agent with LangChain and Ollama

I started where a lot of us do: a LangChain RAG walkthrough. You chunk some text, embed it, retrieve top‑k chunks, and wire an LLM to answer questions…

python rag langchain ollama

RU

Гефестыч (Gefestych): Our Experience Automating Code Review with an LLM. Pitfalls, Solutions, Code

Hello, Habr! My name is Danil Chechkov, and I'm the Team Lead of the High End Meta Backend team at Lesta Games. We handle the entire web side of Mir Korabley. …

llm pydantic-ai openwebui llama.cpp ollama rag code review self-hosted atlassian