Claude deep research vs Scholar Feed

Scholar Feed vs Claude Code’s built-in deep research

We gave both tools the same prompt on three CS/AI/ML questions. A blind judge preferred Scholar Feed’s corpus report 3 out of 3 times. In our runs it cost about $0.63 vs $26.60 per question and cited 2.6× as many distinct real papers, weighted toward the 2025–2026 frontier. None of this means the built-in deep research is weak. It fans out to ~100 agents, runs an adversarial fact-check that Scholar Feed lacks, and works on any topic. So pick by job: the built-in for breadth and claim verification on any subject, Scholar Feed for cheap, current, correctly cited CS/AI/ML literature research. Most days I want both.

What we compared, and how we kept it fair

Claude Code ships a real built-in deep-research command. It fans out to roughly 100 sub-agents, runs parallel web searches, fetches sources, and adversarially verifies claims before it writes anything. That built-in web pipeline, not some other paper website, is what people actually weigh Scholar Feed against. So we ran a head-to-head where the research tool is the only thing that changes.

Same prompt on both sides. Each question asks the agent to name the established approach, say how the method evolved, and point to the 2025–2026 work that supersedes it, with every claim cited to an arXiv ID.
One tool each. The built-in got web only, no Scholar Feed; Scholar Feed got the corpus only, no web. Same model (Claude Sonnet 4.6), same clean-room setup, fresh run every time.
Blind judging. A separate Claude Opus judge scored both reports per question with the tool labels stripped and the order shuffled, so it never knew which tool wrote which.
We checked the citations. Every cited arXiv ID was looked up in the live arXiv API and in the corpus, so "grounding" is a count, not a vibe.
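The blinding step can be sketched in a few lines of Python. This is a minimal illustration, not the published harness: `blind_pair`, `unblind`, and the "Report A"/"Report B" labels are hypothetical names, and it assumes each report is a plain string keyed by tool name. It gives the reports neutral labels in a random order and keeps a private key, so the judge's verdict is mapped back to a tool only after judging.

```python
from __future__ import annotations

import random


def blind_pair(reports: dict[str, str], seed: int | None = None):
    """Hide which tool wrote which report before judging (hypothetical helper).

    `reports` maps a tool name to its report text. Returns the reports under
    neutral labels in a random order, plus a private key mapping each label
    back to its tool so the verdict can be unblinded after judging.
    """
    rng = random.Random(seed)
    tools = list(reports)
    rng.shuffle(tools)  # presentation order no longer depends on the tool
    labels = [f"Report {chr(ord('A') + i)}" for i in range(len(tools))]
    blinded = [(label, reports[tool]) for label, tool in zip(labels, tools)]
    key = dict(zip(labels, tools))
    return blinded, key


def unblind(preferred_label: str, key: dict[str, str]) -> str:
    """Map the judge's preferred label back to the tool that wrote it."""
    return key[preferred_label]


if __name__ == "__main__":
    reports = {"web": "web report text", "corpus": "corpus report text"}
    blinded, key = blind_pair(reports, seed=7)
    for label, text in blinded:
        print(f"{label}: {text}")  # all the judge sees: a label and the text
    # Suppose the judge prefers "Report A"; only now is the tool revealed.
    print("preferred tool:", unblind("Report A", key))
```

A fixed `seed` makes a run reproducible; `None` draws a fresh order each time. This sketch does not scrub tool names that appear inside the report text itself; a real harness might need to.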

We picked three questions that move fast: the best optimizer for small-LM pretraining, long-context attention, and KV-cache compression. Fast-moving is where recency and a citation graph earn their keep.

The results (3 CS/AI/ML questions, blind-judged)

Comparison axis	Claude’s deep research	Scholar Feed
Blind-judge preference	Won 0 of 3	Won 3 of 3 (two high-confidence, one medium)
Cost per question	~$26.60	~$0.63 (about 1/40th)
Time per question	~15 min (~100-agent fan-out)	~3.5 min (1 agent, ~15 graph-aware calls)
Distinct real papers cited	38 across the 3	100 across the 3 (~2.6×)
Fabricated arXiv IDs	0 (none)	0 (none)
Citation binding	Occasionally misdates / misbinds an ID	Correct by construction (copied from the tool)
Claim verification	Adversarial 3-vote panel (real strength)	None; repeats a paper’s stated numbers
Topic scope	Any subject (web)	CS/AI/ML papers only (600,000+)
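The headline ratios follow directly from the figures in the table. A quick check in Python, using only the numbers reported above:

```python
# Per-question averages and totals as reported in the table above.
cost_builtin, cost_corpus = 26.60, 0.63      # USD per question
minutes_builtin, minutes_corpus = 15, 3.5    # approximate minutes per question
papers_builtin, papers_corpus = 38, 100      # distinct real papers, 3 questions


def ratio(a: float, b: float) -> float:
    """How many times larger a is than b, rounded to one decimal place."""
    return round(a / b, 1)


print("cost:", ratio(cost_builtin, cost_corpus))        # cost: 42.2
print("time:", ratio(minutes_builtin, minutes_corpus))  # time: 4.3
print("papers:", ratio(papers_corpus, papers_builtin))  # papers: 2.6
```

The cost ratio lands near 42, which the text rounds to "about 40×"; the time ratio is roughly 4×.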

Why the corpus reached a fresher frontier

The recency gap was the clearest pattern. The corpus reports cited about twice as many genuinely recent (2025–2026) papers per question, and the reason is mechanical, not luck. Scholar Feed walks the citation graph, from a canonical paper out to the newer work that cites it. That is how you reach 2026 results a model can’t recall and a keyword search buries. On long-context attention, the blind judge noted the corpus report "identifies the actual frontier shift to natively-trained sparse attention and hybrid linear+softmax architectures," while the web report "omits the hybrid track entirely."
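The forward walk described above behaves like a breadth-first search over "cited-by" edges that keeps only recent papers. The sketch below is an illustration under stated assumptions, not Scholar Feed's implementation: `forward_frontier`, the toy graph, and its placeholder paper names are hypothetical, and a real index would query a database rather than an in-memory dict.

```python
from collections import deque


def forward_frontier(cited_by, year, seed, min_year=2025, max_depth=2):
    """Walk forward from `seed` to newer papers that cite it, breadth-first.

    cited_by: paper id -> ids of the papers that cite it
    year:     paper id -> publication year
    Returns papers from `min_year` on within `max_depth` citation hops,
    ordered by hop distance, then newest first, then id.
    """
    seen = {seed}
    queue = deque([(seed, 0)])
    hits = []
    while queue:
        paper, depth = queue.popleft()
        if depth == max_depth:
            continue  # hop limit reached: do not expand further
        for citer in cited_by.get(paper, []):
            if citer in seen:
                continue
            seen.add(citer)
            if year.get(citer, 0) >= min_year:
                hits.append((depth + 1, -year[citer], citer))
            queue.append((citer, depth + 1))  # expand older citers too
    return [paper_id for _, _, paper_id in sorted(hits)]


# Hypothetical toy graph with placeholder names, not real papers.
TOY_CITED_BY = {
    "canonical-2017": ["followup-2021", "variant-2025"],
    "followup-2021": ["variant-2025", "newer-2026a"],
    "variant-2025": ["newer-2026b"],
}
TOY_YEAR = {
    "canonical-2017": 2017,
    "followup-2021": 2021,
    "variant-2025": 2025,
    "newer-2026a": 2026,
    "newer-2026b": 2026,
}

print(forward_frontier(TOY_CITED_BY, TOY_YEAR, "canonical-2017"))
# ['variant-2025', 'newer-2026a', 'newer-2026b']
```

The 2021 follow-up is not returned but is still expanded; that intermediate hop is what lets the walk reach 2026 work that never cites the canonical paper directly.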

Citations that are right by construction

Neither tool invented fake arXiv IDs. Both came in at zero fabricated across everything we checked. The difference is binding: does the ID match the title and date attached to it? Scholar Feed copies the ID, title, and year straight out of a tool result, so it can’t misbind unless it mistypes. Across 100+ citations it had zero binding errors, and it even volunteered notes like "this paper isn’t indexed in the corpus" instead of bluffing past the gap. The web pipeline rebuilds citations from memory and snippets, and it slipped a few times: it cited one arXiv ID under the wrong paper’s title and dated a December-2025 paper "late 2024." A corpus tool largely avoids that class of mistake because it never reconstructs the citation in the first place.
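Both checks in this section, whether an ID exists and whether it binds to the right title and date, fit in one small function. This is a sketch, not the published harness: `check_citation`, `Record`, and the in-memory lookup are hypothetical stand-ins for a call to the arXiv API's `query` endpoint (with its `id_list` parameter) or to the corpus, titles are compared by a simple normalized exact match, and only new-style IDs are handled. It leans on one real property of arXiv IDs: the `YYMM` prefix records the year and month of submission, so the ID alone can expose a misdated citation.

```python
import re
from dataclasses import dataclass

# New-style arXiv ids: YYMM.NNNN (2007-2014) or YYMM.NNNNN (2015 on), with an
# optional version suffix. Old-style ids like "hep-th/9901001" are treated as
# malformed here purely to keep the sketch short.
ARXIV_ID = re.compile(r"^(\d{2})(\d{2})\.\d{4,5}(v\d+)?$")


@dataclass
class Record:
    """What a lookup returns for a real paper (hypothetical shape)."""
    title: str


def _norm(title):
    """Case- and whitespace-insensitive form of a title for comparison."""
    return " ".join(title.lower().split())


def check_citation(arxiv_id, cited_title, cited_year, lookup):
    """Classify one citation: malformed, fabricated, misbound, misdated, or ok.

    `lookup` takes a version-less id and returns a Record, or None if no such
    paper exists (a stand-in for an arXiv API or corpus query).
    """
    match = ARXIV_ID.match(arxiv_id)
    if match is None:
        return "malformed"
    record = lookup(arxiv_id.split("v")[0])  # drop any version suffix
    if record is None:
        return "fabricated"  # well-formed id, but no such paper
    if _norm(record.title) != _norm(cited_title):
        return "misbound"  # real id attached to the wrong paper's title
    if 2000 + int(match.group(1)) != cited_year:
        return "misdated"  # the YYMM prefix encodes the submission year
    return "ok"


if __name__ == "__main__":
    # Toy index with a placeholder id and title, not a real paper.
    index = {"2512.00001": Record("Toy Paper One")}
    print(check_citation("2512.00001v2", "toy  paper one", 2025, index.get))
    print(check_citation("2512.00001", "Toy Paper Two", 2025, index.get))
    print(check_citation("2512.00001", "Toy Paper One", 2024, index.get))
    print(check_citation("2512.99999", "Anything", 2025, index.get))
```

Titles often differ in punctuation or casing between sources, so a production checker would likely compare them more loosely than this exact match.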

Where the built-in deep research wins

We’re not going to pretend the built-in is bad, because it isn’t, and two things it does are flat-out better. One is adversarial verification: every built-in report ships a ledger of confirmed versus killed claims, and it correctly flagged overstated speedup numbers that the corpus report repeated without checking. Scholar Feed has no equivalent fact-check and will echo a paper’s headline figure as fact. The other is reach: the built-in works on any topic, while the corpus only knows CS/AI/ML papers, so for anything outside CS the web tool is the right call and Scholar Feed is the wrong one. The honest read isn’t that ours is better and theirs is bad. The built-in spends ~40× more to brute-force breadth and verify magnitudes; the corpus reaches a fresher, correctly cited frontier for far less, because a citation graph maps a research field better than a web index does.
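A confirmed-versus-killed ledger can be pictured as a majority vote per claim. This is a generic illustration only: it assumes three independent verifiers each return confirm or refute and a strict majority decides, which may not match how Claude Code's panel actually works; `build_ledger` and the vote format are hypothetical.

```python
def build_ledger(votes: dict[str, list[bool]]) -> dict[str, list[str]]:
    """Split claims into confirmed and killed by majority vote.

    votes maps each claim to its verifiers' ballots (True = confirmed).
    A claim survives only if a strict majority of its ballots confirm it,
    so with three verifiers it needs at least two confirmations.
    """
    ledger: dict[str, list[str]] = {"confirmed": [], "killed": []}
    for claim, ballots in votes.items():
        confirmations = sum(ballots)  # True counts as 1, False as 0
        outcome = "confirmed" if confirmations > len(ballots) / 2 else "killed"
        ledger[outcome].append(claim)
    return ledger


if __name__ == "__main__":
    # Placeholder claims, not taken from any real report.
    votes = {
        "Method X is 10x faster": [True, False, False],
        "Method X uses less memory": [True, True, False],
    }
    print(build_ledger(votes))
```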

How to read this (the limitations)

This is a small study, not a sweeping benchmark: three questions, one run each, a single model, and a single blind judge. We picked the questions in fast-moving CS/AI/ML areas where a citation graph helps most, so on a less paper-dense topic we’d expect the quality gap to narrow. The cost and citation-accuracy advantages shouldn’t, though; those come from how the tool works, not from which questions we asked. The cost and coverage numbers are mechanical counts; the quality result is one judge’s preference. We trust the takeaway because it doesn’t rest on that quality call alone.

Don’t take our word for it: the harness, every per-question report, the blind-judge verdicts, and the arXiv ID checks are public at github.com/YGao2005/scholar-feed-vs-deep-research. Swap in your own questions and rerun it.

When NOT to use Scholar Feed

Your question isn’t about CS/AI/ML papers. The corpus is 600k+ CS/AI/ML papers and nothing else, so for any other subject, Claude Code’s built-in deep-research (web, any topic) is the right tool and Scholar Feed is the wrong one.
You need every quantitative claim adversarially verified. The built-in’s 3-vote panel is a real strength Scholar Feed doesn’t replicate. If you need the magnitudes pressure-tested, reach for it, or verify the corpus’s numbers yourself.
You want a one-call finished report on a broad topic and cost isn’t a concern. The built-in’s ~100-agent fan-out is built for exactly that.

Frequently asked questions

Is Scholar Feed better than Claude Code’s built-in deep research?

For CS/AI/ML literature research, in our benchmark it was. A blind judge preferred the Scholar Feed corpus report on all three questions, and it cost about 1/40th as much per question while citing about twice as many recent papers. But "better" only holds inside that scope. The built-in deep-research works on any topic and runs an adversarial fact-check that Scholar Feed does not, so for non-CS subjects, or when you need every magnitude verified, the built-in is the right tool. The two work best together.

How much did each cost per research question in the benchmark?

In our runs, Scholar Feed averaged about $0.63 per question and the built-in deep-research averaged about $26.60, roughly a 40× difference. The built-in costs more because it fans out to around 100 sub-agents with parallel web search, fetching, and a verification panel. Scholar Feed reaches a comparable or deeper answer with a single agent making about 15 citation-graph-aware tool calls.

Why does Scholar Feed find more recent papers?

Because it walks the citation graph. Scholar Feed starts from a canonical paper and follows it forward to the newer work that cites it, which surfaces 2025–2026 papers a language model cannot recall from training and a keyword web search buries. In the benchmark it cited about twice as many genuinely recent papers per question as the web pipeline.

Does this mean Claude Code’s deep research is bad?

No. The built-in deep-research is strong. It searches any topic on the open web and runs a 3-vote adversarial verification that caught overstated claims the corpus report repeated without checking. The benchmark is narrow on purpose: three CS/AI/ML questions, chosen where a curated citation graph has the most to offer. The honest conclusion is that the two tools complement each other, not that one is bad.

Do I need an account or API key to try Scholar Feed?

No. The search and read tools work anonymously at 100 calls per day, which is plenty for a typical research session. A free API key raises that to 1,000 calls per day, and Pro raises it to 10,000. Install with npx scholar-feed-mcp init.

Try it

npx scholar-feed-mcp init

Free anonymous access is 100 calls/day (no account); a free key raises it to 1,000/day. Open source (MIT): scholar-feed-mcp on GitHub.

More setup options on the developers page.

Related comparisons

Scholar Feed vs Semantic Scholar
Scholar Feed vs Connected Papers
Scholar Feed vs alphaXiv