Fitting WhisperX large-v3 + a 24B LLM on one 3090: a reproducible context-capping recipe
This is the technical, reproducible version of a fix I shipped on my own homelab. If you want the narrative version, that's on Medium. This one is the…
Latest Architecture news from Tech News
[Day 7] Does Giving an AI More "Thinking Time" Really Make It Smarter? Training an OpenMythos-Style Mini Model on DGX Intro Day 7! Reddit kept surfaci…
I run a team of AI agents on a Mac I bought in 2022. They handle my Slack, run research, draft content, monitor infrastructure, and spawn sub-agents f…
Choosing the Right Local AI Stack for SOC Alert Triage: Model, Engine, and Harness Practical guidance for cybersecurity engineers who want local AI to…
Tom Tunguz wrote a post this week called Localmaxxing. His thesis: open-weight models on prosumer hardware now match cloud-tier quality for a sliver …
Local LLMs in 2026 work on three hardware lanes: 32-core CPU with 64GB+ RAM hits 10-25 tokens per second on Qwen 3 14B, an RTX 4090 hits 30-80 tokens …
[Day 3] I Had a Local LLM Analyze a Year of My Credit Card Statements Intro Day 3: I'm going to hand a year of credit card statements over to a local …
[Day 1] DGX Spark Came Home — I Made It Draw a Cat So... what is "local LLM" again? Honestly, I'm still figuring out what "local LLM" even means. But …