Web — Tech News

Claude Fable 5 Scores 95% on SWE-bench, Then Hands Off to Opus 4.8

The headline number is 95% on SWE-bench Verified. That's the score attached to Claude Fable 5, Anthropic's new general-access model in the Mythos clas…

anthropic claude benchmarks safety

Cross-Machine Memory Query: About 20 Milliseconds, Most Days

I wrote about hardware benchmarks twice this week. Different problem this time. Same machines. I have a Mac for daily work, a Linux box that runs a fe…

performance benchmarks machinelearning wireguard

An AMD GPU Beat My Mac on Llama 8B. The Same GPU Lost on Phi-3.

I wrote a post yesterday about why GPUs barely help small text embeddings at batch=1. Different workload, same machines. This time I ran a local LLM i…

performance benchmarks machinelearning gpu

Your GPU Probably Isn't Helping Your Retrieval System

Most "just use a GPU" advice is wrong for how anyone actually runs small models. I spent yesterday benchmarking a 33M-parameter embedding model across…

performance benchmarks machinelearning gpu

pypdf vs PdfPig: Text Extraction at Scale

Overview: PDF text extraction is a common pre-processing step in data pipelines: ingesting research papers, legal documents, or reports before embeddi…

dotnet csharp performance benchmarks

NetworkX vs CSR + TensorPrimitives: PageRank on 28M Edges

Overview: PageRank is the canonical graph algorithm. NetworkX implements it in pure Python, and its dict-of-dict adjacency representation means every powe…

dotnet csharp performance benchmarks

SurrealDB 3.x by the numbers

Author: Tobie Morgan Hitchcock. One engine, multi-workloads, full durability. You can explore the full results, methodology, and per-database breakdown…

surrealdb database benchmarks news

What ground truth caught that unit tests missed: 3 real bugs in 9 flagship lint rules

We added an npm run ilb:flagship:smoke gate to the quality script. It's small: for each flagship rule with a labeled corpus, run the rule against vulne…

staticanalysis eslint testing benchmarks

When Generic Benchmarks Fail: Building a Sales-Domain Evaluation Bench from Scratch

By Natnael Alemseged. The gap that τ²-Bench retail cannot measure: Tenacious is a B2B sales automation company. Its agent produces outreach emails for c…

machinelearning llm benchmarks ai