Deep Research Digest

Project · 2025

Academic search is split across three separate databases with three different APIs and three different schemas, and there is no unified way to query them from an AI assistant. Deep Research Digest federates arXiv, PubMed, and Semantic Scholar behind a single MCP interface. Every search adds to a local corpus that makes the next search better.

The project is published on PyPI as research-papers-mcp. Install it with pip install research-papers-mcp, add it to any MCP client config, and start querying papers from within Claude, Cursor, or any other MCP-compatible tool. It covers arXiv (2.4M+ papers), PubMed (36M+ citations), and Semantic Scholar (200M+ records).
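As a rough illustration of the "add to any MCP client config" step, the sketch below builds the kind of JSON entry that many MCP clients read under an mcpServers key. The server label research-papers and the assumption that the package installs a console command named research-papers-mcp are both guesses made for illustration; check the package's own documentation for the actual command.

```python
import json


def build_client_config(command: str = "research-papers-mcp") -> dict:
    """Return an MCP client config entry for a stdio server.

    Assumptions: the client uses the common "mcpServers" layout, and the
    package exposes a console command with the same name as the PyPI
    distribution. Neither is confirmed by the project description.
    """
    return {
        "mcpServers": {
            "research-papers": {  # hypothetical label shown in the client
                "command": command,
                "args": [],
            }
        }
    }


# Print the fragment so it can be pasted into the client's config file.
print(json.dumps(build_client_config(), indent=2))
```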

The core design principle is a self-growing corpus. Every search fetches results from external APIs and caches them in a local SQLite database. Over time, the cache accumulates papers, enabling richer trend detection, similarity analysis, and literature reviews without re-fetching. No database servers, no API keys required, no background workers. Everything runs in a single process.
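The self-growing cache can be pictured with a minimal sketch like the one below. It assumes a single papers table keyed by a source-qualified ID; the table layout, column names, and function names are hypothetical and are not taken from the project's actual schema.

```python
import sqlite3

# Hypothetical schema: one row per paper, keyed by a source-qualified ID.
SCHEMA = """
CREATE TABLE IF NOT EXISTS papers (
    paper_id  TEXT PRIMARY KEY,
    source    TEXT NOT NULL,
    title     TEXT NOT NULL,
    abstract  TEXT,
    published TEXT
)
"""


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local cache; no server process is involved."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def cache_papers(conn: sqlite3.Connection, papers: list[dict]) -> int:
    """Upsert search results and return the total number of cached papers.

    The ON CONFLICT clause needs SQLite 3.24 or newer. Papers seen in an
    earlier search are refreshed in place rather than duplicated.
    """
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            """
            INSERT INTO papers (paper_id, source, title, abstract, published)
            VALUES (:paper_id, :source, :title, :abstract, :published)
            ON CONFLICT(paper_id) DO UPDATE SET
                title = excluded.title,
                abstract = excluded.abstract,
                published = excluded.published
            """,
            papers,
        )
    return conn.execute("SELECT COUNT(*) FROM papers").fetchone()[0]
```

Calling cache_papers after every search is what makes the corpus grow: overlapping results leave the row count unchanged, while new papers extend it.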

The server provides 10 research tools across three categories: discovery (federated search, cached search, paper details), analysis (citation graphs from Semantic Scholar, SPECTER2 semantic similarity with TF-IDF fallback, author profiles), and intelligence (trending topic detection, structured literature review generation, BibTeX export, cache statistics). It supports stdio, SSE, and streamable HTTP transports.
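To make the TF-IDF fallback concrete, here is a small pure-Python sketch of ranking cached papers by cosine similarity over TF-IDF vectors. It is only an illustration of the general technique: the tokenizer, the smoothed IDF formula, and the function names are assumptions, and the SPECTER2 embedding path is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build sparse TF-IDF vectors using a smoothed IDF (an assumed variant)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(query_index: int, docs: list[str]) -> list[tuple[int, float]]:
    """Rank every other document by similarity to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    query = vectors[query_index]
    scores = [
        (i, cosine(query, v)) for i, v in enumerate(vectors) if i != query_index
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

A lexical fallback like this needs no model download, which fits the single-process design, but it only matches shared vocabulary and will miss papers that describe the same idea in different words.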

Architecture

Federated search queries three academic APIs in parallel, normalizes results into a shared schema, and stores everything in a local SQLite cache. Downstream tools operate on the cache for trend analysis, similarity computation, and citation traversal.
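The fan-out and normalization described above might look roughly like the following sketch. The three fetchers are stubs that stand in for real HTTP calls, and the Paper fields, record shapes, and function names are invented for illustration rather than taken from the project.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    """Shared schema (hypothetical fields) that every source is mapped onto."""
    source: str
    paper_id: str
    title: str


# Stub fetchers: each returns records in a different, made-up shape to mimic
# the fact that the three APIs do not share a schema.
async def fetch_arxiv(query: str) -> list[dict]:
    await asyncio.sleep(0)
    return [{"id": "arxiv-stub-1", "title": f"{query} (arXiv stub)"}]


async def fetch_pubmed(query: str) -> list[dict]:
    await asyncio.sleep(0)
    return [{"pmid": "pubmed-stub-1", "article_title": f"{query} (PubMed stub)"}]


async def fetch_semantic_scholar(query: str) -> list[dict]:
    await asyncio.sleep(0)
    return [{"paperId": "s2-stub-1", "title": f"{query} (S2 stub)"}]


# Each source pairs a fetcher with a normalizer into the shared schema.
SOURCES = {
    "arxiv": (fetch_arxiv, lambda r: Paper("arxiv", r["id"], r["title"])),
    "pubmed": (fetch_pubmed, lambda r: Paper("pubmed", r["pmid"], r["article_title"])),
    "semantic_scholar": (
        fetch_semantic_scholar,
        lambda r: Paper("semantic_scholar", r["paperId"], r["title"]),
    ),
}


async def federated_search(query: str, sources: dict = SOURCES) -> list[Paper]:
    """Query all sources concurrently and merge the normalized results."""
    names = list(sources)
    results = await asyncio.gather(
        *(sources[name][0](query) for name in names),
        return_exceptions=True,  # one failing API should not sink the search
    )
    papers: list[Paper] = []
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            continue  # skip the failed source, keep the others
        normalize = sources[name][1]
        papers.extend(normalize(record) for record in result)
    return papers
```

Running asyncio.run(federated_search("graph neural networks")) returns one stub Paper per source. In a real server the merged list would then be written to the SQLite cache so the downstream tools can work on it.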

Tools

AI assistants can call ten MCP tools, grouped into three categories, to search, analyze, and synthesize research papers.

Tool	Description
Discovery
search_papers	Federated search across arXiv, PubMed, and Semantic Scholar
search_cached_papers	Fast local search with date, source, and citation filters
get_paper_details	Full metadata for any cached paper
Analysis
get_paper_citations	Real citation and reference data from Semantic Scholar
find_similar_papers	SPECTER2 semantic similarity with TF-IDF fallback
get_author_profile	Publication frequency, top topics, and collaborators
Intelligence
get_trending_topics	Detect emerging and declining research topics over time
generate_literature_review	Structured review with subtopics and consensus analysis
export_bibtex	Export papers as BibTeX for LaTeX documents
get_cache_stats	Cache size, source breakdown, and date range
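As an illustration of what a BibTeX export step involves, the sketch below turns one cached record into a BibTeX entry. The input field names, the citation-key scheme, and the choice of the @misc entry type are assumptions for this example, not the behavior of export_bibtex itself, and the sample record in the usage note is a placeholder rather than a real paper.

```python
import re


def make_key(paper: dict) -> str:
    """Build a citation key from the first author's last name and the year."""
    authors = paper.get("authors") or []
    last_name = authors[0].split()[-1] if authors else "anon"
    return re.sub(r"[^A-Za-z0-9]", "", last_name).lower() + str(paper["year"])


def to_bibtex(paper: dict) -> str:
    """Render a paper record (hypothetical field names) as a @misc entry."""
    fields = {
        "title": paper["title"],
        "author": " and ".join(paper.get("authors") or []),
        "year": str(paper["year"]),
    }
    if paper.get("url"):
        fields["url"] = paper["url"]
    # Doubled braces in the f-strings produce literal { and } in the output.
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@misc{{{make_key(paper)},\n{body}\n}}"
```

For a placeholder record such as {"title": "An Example Title", "authors": ["Ada Lovelace", "Alan Turing"], "year": 2025}, the entry is keyed lovelace2025 and joins the authors with " and ", which is the separator BibTeX expects. A production exporter would also need to escape special LaTeX characters and disambiguate duplicate keys.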

Sources

Source	Coverage	API Key
arXiv	2.4M+ papers, physics/CS/math/biology	Not required
PubMed	36M+ biomedical citations	Not required
Semantic Scholar	200M+ papers, all fields	Optional (higher rate limits)