AI & ML — Tech News

Claude Fable 5 Scores 95% on SWE-bench, Then Hands Off to Opus 4.8

The headline number is 95% on SWE-bench Verified. That's the score attached to Claude Fable 5, Anthropic's new general-access model in the Mythos clas…

anthropic claude benchmarks safety

Cross-Machine Memory Query: About 20 Milliseconds, Most Days

I wrote about hardware benchmarks twice this week. Different problem this time. Same machines. I have a Mac for daily work, a Linux box that runs a fe…

performance benchmarks machinelearning wireguard

An AMD GPU Beat My Mac on Llama 8B. The Same GPU Lost on Phi-3.

I wrote a post yesterday about why GPUs barely help small text embeddings at batch=1. Different workload, same machines. This time I ran a local LLM i…

performance benchmarks machinelearning gpu

Your GPU Probably Isn't Helping Your Retrieval System

Most "just use a GPU" advice is wrong for how anyone actually runs small models. I spent yesterday benchmarking a 33M-parameter embedding model across…

performance benchmarks machinelearning gpu

pypdf vs PdfPig: Text Extraction at Scale

Overview: PDF text extraction is a common pre-processing step in data pipelines — ingesting research papers, legal documents, or reports before embeddi…

dotnet csharp performance benchmarks

SurrealDB 3.x by the numbers

Author: Tobie Morgan Hitchcock. One engine, multi-workloads, full durability. You can explore the full results, methodology, and per-database breakdown…

surrealdb database benchmarks news

What ground truth caught that unit tests missed: 3 real bugs in 9 flagship lint rules

We added an npm run ilb:flagship:smoke gate to the quality script. It's small: for each flagship rule with a labeled corpus, run the rule against vulne…

staticanalysis eslint testing benchmarks

When Generic Benchmarks Fail: Building a Sales-Domain Evaluation Bench from Scratch

By Natnael Alemseged. The gap that τ²-Bench retail cannot measure: Tenacious is a B2B sales automation company. Its agent produces outreach emails for c…

machinelearning llm benchmarks ai