Tech News — Latest News

Your model speed benchmark is measuring the wrong thing

Model speed is not a property of the model. It is a property of the model plus your payload size plus your output format plus whether you're constrain…

ai llm discuss benchmarking

Google Said It Had Native Function Calling. I Tested It.

Google released Gemma 4 E4B with a specific claim: native function calling. "Enhanced coding and agentic capabilities," the model card said. "Native f…

ai agents localai benchmarking

We Tested 10 Untested LLMs on Agent Coding — The Results Are In

Yesterday I promised to benchmark 10 LLMs that have never been tested on real agent co…

ai llm programming benchmarking

Why I spun my benchmark into its own repo (and why every dev tool with a benchmark should)

This week I shipped a benchmark for code-intelligence MCP servers and posted the results — including the cases where my own tool lost. Within 36 hours…

opensource benchmarking devtools ai