Reduce LLM Token Waste in RAG with Markdown
TL;DR Feeding raw HTML to Large Language Models wastes tokens on markup, scripts, and styling. By rendering dynamic web pages in a headless browser an…
Latest Architecture news from Tech News
TL;DR — Every .NET RAG project quietly ships a Python sidecar to do one job: chunk documents. I got rid of mine. DocNest .NET is an idiomatic C# / .NE…
Claude LLM Execution Harnesses, RAG Rerank, & Browser-based Edge AI Today's Highlights This week's top stories delve into advanced LLM orchestrati…
Meta: Learn how to eliminate LLM hallucinations in career coaching apps using Agentic Workflows and RAG, as seen in the architecture of CVChatly. The …
Book: RAG Pocket Guide: Retrieval, Chunking, and Reranking Patterns for Production Also by me: Thinking in Go (2-book series) — Complete Guide to Go P…
AI Agents Level Up Workflows: Terraform MCP, WebMCP, Pinecone Integrations Today's Highlights This week showcases significant advancements in AI agent…
Your LLM has 128K tokens. Your document has 150K words. Something has to give. What do you do? A) Chunk the document into fixed-size pieces and embed …
When people say they are "adding RAG" to a workflow, the conversation often jumps too quickly to infrastructure choices. Should this use a vector data…
Local AI Coding Agents, Secure Production Deployment, and Angular-Specific AI Skills Today's Highlights This week's top stories highlight practical wa…
There is a version of token cost optimization that I do not recommend: cutting token counts by reducing the quality of your system prompt, your retrie…
Introduction Large Language Models (LLMs) such as ChatGPT, Gemini, and Claude are incredibly powerful. They can answer questions, generate code, summa…
AI Agent Security, Open-Source Code Generation, and Frontier Models on Bedrock Today's Highlights This week highlights a new security scanner for AI a…
An AI answer can look clean, confident, and helpful while hiding the exact detail your team will need later: where did this claim come from? For AI Sa…
There is a design assumption baked into almost every vector database and AI memory implementation that sounds reasonable until you watch it grow nodes…
Naive RAG passes the demo and fails the audit. The citation-guard pattern keeps fintech AI honest: retrieve with citations, quote numbers, abstain whe…
Project Documentation: TradeMemory Exploring Memory-Augmented AI for Trading Journaling Tech Stack: MERN + Groq (Qwen-3) + Hindsight Cloud Vector SDK …
I Built a Production RAG System on My M1 Mac for $0 Most RAG tutorials stop at "it answers questions." But answering questions is table stakes. The re…
FreshContext in agent workflows: judgment at the context handoff After sharing FreshContext publicly, a few comments helped me sharpen where it fits. …
We’ve all been there: every year, you get a physical, receive a thick PDF full of blood markers, glance at the "normal range" checkmarks, and toss it …
Benchmarking AI Agents, Gemma 4 On-Device Workflows & AI System Security Today's Highlights This week, we dive into critical aspects of applied AI…
Part 6 of a series on building reliable AI systems In the previous parts of this series, we explored: Testing AI systems Evaluation pipelines RAG eval…
LLM-powered Learning, Handwritten Digit Recognition, and AI Career Guidance Today's Highlights This week's top stories showcase practical AI applicati…
Technical Note #01: Why I Built RAG From Scratch Before Using LangChain Part of the Agentic Finance Beast Technical Notes series Published: June 7, 20…
At 1 a.m., the customer group chat exploded: “Does your customer service bot have only a 7-second memory? I just gave it the order number, and the nex…
Introduction: The Place of Large Models in RAG and Lingering Questions Retrieval-Augmented Generation (RAG) systems extend the information retrieval c…
Dropbox Nova for AI Coding Agents, OpenAI's Codex Sandbox, & Puppeteer MCP Server Today's Highlights This week, we dive into Dropbox's Nova platfo…
A support agent tells a customer their plan is still Enterprise, even though finance downgraded it last week. A coding copilot forgets a repo conventi…
Pattern Defined Precise Definition: Context Compression is an inference pattern that utilizes a specialized "selector" model or a ranker to distill la…