Claude Code with Local LLMs and ANTHROPIC_BASE_URL: Ollama, LM Studio, llama.cpp, vLLM
Native Anthropic endpoints, tool-call compatibility, and context-window sizing for local Claude Code. Last tested: April 2026. See Changelog at the bo…
Latest Testing & QA news from Tech News
A comprehensive, actionable guide to the principles, techniques, and architecture behind sipeed/picoclaw — written so you can build a similar system f…
Over the last 10 years I got faster at my work, but essentially nothing changed: I still wrote code and tests by hand. When AI arrived, I started looking for ways…
LLM-as-Judge is a pattern where one language model evaluates another model's outputs against defined criteria. An automatic quality gate: every respon…
Artificial Intelligence has progressed far beyond its early rule-based origins. What once depended on predefined logic has evolved into systems that c…
TL;DR: UCLA Tauric Research released TradingAgents v0.2.4 (2026-04-25), a LangGraph-based multi-agent LLM framework that mimics a real trading firm wi…
In one month, working alone, I built a production system for analyzing competitors' prices and products with Claude Code. Before that, I spent fifteen years managing teams and business…
Book: LLM Observability Pocket Guide. Also by me: AI Agents Pocket Guide. My project: Hermes IDE | GitHub, an IDE for developers who ship with Claude C…
I recently asked myself: is it possible to run a full agent dev loop (that is, an agent development cycle) using only local model…
What I am currently reading: these are the recent online posts that I enjoyed reading and that made me think. AI: If you are not the model, you are the harn…
While choosing an LLM for my first pet project, I accidentally created an LLM benchmark, "The Dali Trial" (Испытание Дали), based on three parameters: quality, speed, and cost. Th…
Citation patterns in ChatGPT are not backlink signals in new packaging. Pages that get cited consistently have dense entity co-occurrence…
An Agent shouldn't be locked to a single LLM provider. Different tasks suit different models — simple questions use cheap models, complex reasoning us…
The Catalyst: One Language, Many Attack Surfaces. The comfortable fiction is: “We wrote English rules, so the model is safe.” The truth: LLMs are multi…
GPT-5.5 landed April 23, 2026. I've been in the benchmark data since the moment it dropped — and I need to tell you the number OpenAI didn't put in an…
If coding agents aren't your primary battlefield, "harness engineering" probably feels like a distant concept. Scrolling through a timeline full of ar…
Book: Prompt Engineering Pocket Guide. My project: Hermes IDE | GitHub, an IDE for developers who ship with Claude Code and other AI coding tools. Me: …
Book: RAG Pocket Guide. Also by me: LLM Observability Pocket Guide. My project: Hermes IDE | GitHub, an IDE for developers who ship with Claude Code an…
I'm going to be honest with you. Most engineers using AI assistants today are shipping at the same speed as before. They have Cursor. They have Claude…
Book: RAG Pocket Guide: Retrieval, Chunking, and Reranking Patterns for Production. Also by me: Database Playbook: Choosing the Right Store for Every S…
Fixing your robots.txt and disabling Cloudflare Bot Fight Mode is step one. Most developers stop there and wonder why they still don't appear in ChatG…
A Sunday-morning postmortem on teaching a 3B model to do enterprise IT triage with GRPO. It's 1 AM on a Sunday. The Meta × PyTorch OpenEnv Hackathon s…
When deploying large language models to production, measuring performance accurately is critical. Whether you're using vLLM, SGLang, TensorRT-LLM, or …
Here is a story six months in the making, during which I got to fully experience all the delights of modern tools and ways of running LLMs. The idea is ridiculously …
Introduction: "This may be the story of how it all began," said Andrej Karpathy. This is article No. 48 in the "One Open Source Project a Day" series. …
How to Test LLM-Powered Applications Effectively. Testing a CRUD app is deterministic. You input X, you expect Y, you assert equality. Testing an LLM-p…
DeepSeek V4: Million-Token Context That Actually Works. Most long-context models are benchmarks in search of a use case. DeepSeek V4 flips the script: i…
Most early LLM apps start the same way: “Let’s just put everything into one prompt and let the model handle it.” So we write a prompt that tries to…
LLM Planning, AI Arguments, and Building Persistent Worlds. LLM planning is gaining focus, while new tools are emerging to address agent identity and t…
A reference on why long-running agents fail at depth, the math behind why errors compound, and the architectural patterns that respond to it. …