Anthropic Upgrades Claude AI Web Search Tools With 11% Accuracy Boost

Caroline Bishop
Feb 17, 2026 18:34

Claude’s new dynamic filtering feature cuts input tokens by 24% while improving search accuracy. Opus 4.6 hits 61.6% on the BrowseComp benchmark.



Anthropic has rolled out a significant upgrade to Claude’s web search capabilities, with the AI assistant now writing and executing code on the fly to filter search results before processing them. The improvement delivers an average accuracy gain of roughly 11 percentage points while consuming 24% fewer input tokens, according to the company’s own benchmark runs.

The update, released alongside Claude Opus 4.6 and Sonnet 4.6, addresses a persistent challenge in AI-powered web search: context window bloat. Traditional search tools pull entire HTML pages into the model’s context, and much of that content is irrelevant noise that degrades response quality and burns through tokens.

How Dynamic Filtering Works

Rather than reasoning over raw HTML dumps, Claude now dynamically generates code to post-process query results. The system keeps relevant data and discards the rest before anything hits the context window. Think of it as the AI building its own custom search scraper in real time.
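To make the idea concrete, here is a minimal sketch of the kind of filtering script a model might write for itself. It is not Anthropic's implementation: the class `_TextExtractor`, the function `filter_page`, the keyword-matching rule, and the sample page are all hypothetical and chosen only for illustration. It strips markup, drops script and style content, and keeps only the text blocks that mention a keyword.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text blocks, skipping script/style content."""

    SKIP = {"script", "style", "noscript"}
    BLOCK = {"p", "li", "h1", "h2", "h3", "td", "div"}

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._flush()  # a new block starts, close the previous one

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self._flush()

    def handle_data(self, data):
        if not self._skip_depth:
            self._buf.append(data)

    def _flush(self):
        # Collapse whitespace and store the block if anything is left.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []

    def close(self):
        super().close()
        self._flush()


def filter_page(html, keywords):
    """Return only the text blocks that mention at least one keyword."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    wanted = [k.lower() for k in keywords]
    return [b for b in parser.blocks if any(k in b.lower() for k in wanted)]


# Hypothetical page: one relevant sentence surrounded by typical noise.
raw = """
<html><head><style>body{color:red}</style></head><body>
<div>Cookie banner: accept all</div>
<p>Opus 4.6 scored 61.6% on BrowseComp.</p>
<p>Subscribe to our newsletter!</p>
<script>track()</script>
</body></html>
"""
kept = filter_page(raw, ["BrowseComp"])
print(kept)  # ['Opus 4.6 scored 61.6% on BrowseComp.']
```

Only the single matching sentence would reach the context window; the banner, newsletter prompt, stylesheet, and script are discarded first. A real filter would likely be tailored to each query rather than rely on plain keyword matching.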

Anthropic tested the approach on two industry benchmarks. On BrowseComp—which measures an agent’s ability to hunt down deliberately hard-to-find information across multiple websites—Opus 4.6 jumped from 45.3% to 61.6% accuracy. Sonnet 4.6 climbed from 33.3% to 46.6%.

DeepsearchQA, which tests systematic multi-step research with many correct answers, showed smaller but consistent gains. Opus 4.6’s F1 score rose from 69.8% to 77.3%, while Sonnet 4.6 improved from 52.6% to 59.4%.
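The article does not say how the headline 11% figure was computed. One plausible reading, and it is only an assumption, is that it is the plain average of the four percentage-point gains quoted above. The short check below uses only the scores reported in this article.

```python
# Before/after scores quoted in the article, in percent.
scores = {
    ("BrowseComp", "Opus 4.6"): (45.3, 61.6),
    ("BrowseComp", "Sonnet 4.6"): (33.3, 46.6),
    ("DeepsearchQA", "Opus 4.6"): (69.8, 77.3),
    ("DeepsearchQA", "Sonnet 4.6"): (52.6, 59.4),
}

# Absolute gain in percentage points for each model/benchmark pair.
gains = {key: round(after - before, 1) for key, (before, after) in scores.items()}
mean_gain = sum(gains.values()) / len(gains)

for (bench, model), gain in gains.items():
    print(f"{bench:13s} {model:10s} +{gain:.1f} points")
print(f"mean gain: {mean_gain:.1f} points")  # mean gain: 11.0 points
```

The individual gains are 16.3, 13.3, 7.5, and 6.8 points, which average to just under 11. That fits the headline number if it is read as percentage points rather than a relative improvement.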

Real-World Validation

Quora’s Poe platform, which serves millions of users across 200+ AI models, has already tested the upgrade internally. “The model behaves like an actual researcher, writing Python to parse, filter, and cross-reference results rather than reasoning over raw HTML in context,” said Gareth Jones, the company’s Product and Research Lead. Quora found that Opus 4.6 with dynamic filtering achieved the highest accuracy among the frontier models in its internal evaluations.

Token Economics Get Complicated

Cost implications vary by use case. Price-weighted tokens decreased for Sonnet 4.6 across both benchmarks, but actually increased for Opus 4.6—the more powerful model sometimes writes more complex filtering code. Anthropic recommends developers benchmark against their specific query patterns before deployment.
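A rough sketch shows how fewer input tokens can still mean a higher bill. Everything here is hypothetical: the per-million-token prices, the token counts, and the function `price_weighted_cost` are made up for illustration. The sketch also assumes that "price-weighted tokens" means input and output tokens each multiplied by their own price, and that generated filtering code is billed as output.

```python
def price_weighted_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in dollars, given prices per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical prices in dollars per million tokens, not real list prices.
IN_PRICE, OUT_PRICE = 3.0, 15.0

# Hypothetical query without filtering.
baseline = price_weighted_cost(100_000, 2_000, IN_PRICE, OUT_PRICE)

# 24% fewer input tokens, with a short filtering script as extra output.
light_filter = price_weighted_cost(76_000, 4_000, IN_PRICE, OUT_PRICE)

# Same input saving, but a longer, more complex filtering script.
heavy_filter = price_weighted_cost(76_000, 8_000, IN_PRICE, OUT_PRICE)

print(f"baseline:     ${baseline:.3f}")
print(f"light filter: ${light_filter:.3f}")
print(f"heavy filter: ${heavy_filter:.3f}")
```

With these made-up numbers, the short script lowers the cost while the longer one raises it above the baseline, because output tokens are priced higher than input tokens. Running the same comparison on logged token counts from real queries is one way to do the benchmarking Anthropic recommends.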

Dynamic filtering ships enabled by default for the new web search and web fetch tools on the Claude API. The company also graduated several related tools to general availability: code execution sandboxes, persistent memory across conversations, programmatic tool calling, and dynamic tool discovery.

For developers building search-heavy applications—think research assistants, citation verification tools, or competitive intelligence bots—the upgrade could meaningfully cut operational costs while improving output quality. The API documentation is live now on Claude’s developer platform.
