Deep Research Digest
Academic search is split across three separate databases, each with its own API and schema, and there is no unified way to query them from an AI assistant. Deep Research Digest federates arXiv, PubMed, and Semantic Scholar behind a single MCP interface. Every search adds to a local corpus, which makes the next search better.
Published on PyPI as `research-papers-mcp`. Install with `pip install research-papers-mcp`, add the server to any MCP client config, and start querying papers from within Claude, Cursor, or any MCP-compatible tool. Covers arXiv (2.4M+ papers), PubMed (36M+ citations), and Semantic Scholar (200M+ papers).
The core design principle is a self-growing corpus. Every search fetches results from external APIs and caches them in a local SQLite database. Over time, the cache accumulates papers, enabling richer trend detection, similarity analysis, and literature reviews without re-fetching. No database servers, no API keys required, no background workers. Everything runs in a single process.
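As a rough illustration of the self-growing corpus idea, the sketch below upserts search results into a SQLite table so that repeated searches enlarge the cache without duplicating rows. It is a minimal sketch using only Python's standard `sqlite3` module; the table name, column names, and helper functions (`open_cache`, `cache_results`, `cache_size`) are hypothetical and are not taken from the package itself.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS papers (
    paper_id TEXT PRIMARY KEY,
    source   TEXT NOT NULL,
    title    TEXT NOT NULL,
    year     INTEGER
)
"""


def open_cache(path=":memory:"):
    """Open (or create) the cache; a file path would persist it between runs."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def cache_results(conn, papers):
    """Upsert search results keyed on paper_id, so re-fetched papers overwrite themselves."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO papers (paper_id, source, title, year) "
            "VALUES (:paper_id, :source, :title, :year)",
            papers,
        )


def cache_size(conn):
    """Return the number of distinct papers currently cached."""
    return conn.execute("SELECT COUNT(*) FROM papers").fetchone()[0]


conn = open_cache()

# First search: two placeholder results.
cache_results(conn, [
    {"paper_id": "arxiv:0001", "source": "arxiv", "title": "Example A", "year": 2023},
    {"paper_id": "pubmed:0002", "source": "pubmed", "title": "Example B", "year": 2024},
])

# Second search: one repeat and one new paper, so the corpus grows by one.
cache_results(conn, [
    {"paper_id": "arxiv:0001", "source": "arxiv", "title": "Example A", "year": 2023},
    {"paper_id": "s2:0003", "source": "semantic_scholar", "title": "Example C", "year": 2022},
])

print(cache_size(conn))  # 3
```

Keying the table on a stable paper identifier is what lets the cache accumulate across searches: a repeated result replaces its own row instead of adding a new one.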
The server provides 10 research tools across three categories: discovery (federated search, cached search, paper details), analysis (citation graphs from Semantic Scholar, SPECTER2 semantic similarity with TF-IDF fallback, author profiles), and intelligence (trending topic detection, structured literature review generation, BibTeX export, cache statistics). The server supports stdio, SSE, and streamable HTTP transports.
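To give a feel for what a TF-IDF fallback for paper similarity can look like, here is a small standard-library sketch that builds TF-IDF vectors over abstracts and ranks them by cosine similarity. It is an assumption-laden illustration, not the package's implementation: the whitespace tokenizer, the smoothed IDF formula, the function names, and the sample abstracts are all invented for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive lowercase whitespace tokenizer (an assumption; real systems do more)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF keeps weights positive even for terms found in every document.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (cnt / len(tokens)) * idf[t] for t, cnt in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(query_index, docs):
    """Rank every other document by similarity to docs[query_index], best first."""
    vecs = tfidf_vectors(docs)
    scores = [
        (i, cosine(vecs[query_index], v))
        for i, v in enumerate(vecs)
        if i != query_index
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Placeholder abstracts, invented for the example.
abstracts = [
    "graph neural networks for molecule property prediction",
    "message passing graph neural networks for molecule generation",
    "randomized trial of statin therapy in elderly patients",
]

ranked = rank_similar(0, abstracts)
print(ranked[0][0])  # index of the most similar abstract
```

A lexical method like this needs no model download, which is presumably why it works as a fallback when an embedding model such as SPECTER2 is unavailable; the trade-off is that it only matches shared vocabulary, not meaning.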
Architecture
Federated search queries three academic APIs in parallel, normalizes results into a shared schema, and stores everything in a local SQLite cache. Downstream tools operate on the cache for trend analysis, similarity computation, and citation traversal.
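The fan-out-and-normalize flow described above can be sketched with `asyncio`. Everything source-specific here is an assumption: the three fetchers are stubs standing in for real HTTP calls, the raw payload shapes are invented, and the `Paper` dataclass is a hypothetical shared schema rather than the one the package uses. Skipping a failed source instead of failing the whole search is also a design assumption of this sketch.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    """Hypothetical shared schema that all sources are normalized into."""
    source: str
    paper_id: str
    title: str


# Stub fetchers: each returns an invented, source-specific payload shape.
async def fetch_arxiv(query):
    await asyncio.sleep(0)  # stands in for network latency
    return [{"id": "arxiv-1", "title": f"{query} (arXiv stub)"}]


async def fetch_pubmed(query):
    await asyncio.sleep(0)
    return [{"pmid": "pm-1", "article_title": f"{query} (PubMed stub)"}]


async def fetch_semantic_scholar(query):
    await asyncio.sleep(0)
    return [{"paperId": "s2-1", "title": f"{query} (S2 stub)"}]


# Each source pairs a fetcher with a function mapping its raw records to Paper.
SOURCES = {
    "arxiv": (fetch_arxiv, lambda r: Paper("arxiv", r["id"], r["title"])),
    "pubmed": (fetch_pubmed, lambda r: Paper("pubmed", r["pmid"], r["article_title"])),
    "semantic_scholar": (
        fetch_semantic_scholar,
        lambda r: Paper("semantic_scholar", r["paperId"], r["title"]),
    ),
}


async def federated_search(query):
    """Query all sources concurrently and return results in the shared schema."""
    names = list(SOURCES)
    batches = await asyncio.gather(
        *(SOURCES[name][0](query) for name in names),
        return_exceptions=True,  # one failing source should not sink the search
    )
    papers = []
    for name, batch in zip(names, batches):
        if isinstance(batch, Exception):
            continue  # assumption: silently skip a source that errored
        normalize = SOURCES[name][1]
        papers.extend(normalize(raw) for raw in batch)
    return papers


papers = asyncio.run(federated_search("protein folding"))
for paper in papers:
    print(paper.source, paper.paper_id, paper.title)
```

Once every record has the same shape, downstream steps such as caching, trend analysis, and similarity ranking no longer need to know which API a paper came from.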
Tools
Ten MCP tools across three categories that AI assistants can call to search, analyze, and synthesize research papers.
| Tool | Description |
|---|---|
| Discovery | |
| search_papers | Federated search across arXiv, PubMed, and Semantic Scholar |
| search_cached_papers | Fast local search with date, source, and citation filters |
| get_paper_details | Full metadata for any cached paper |
| Analysis | |
| get_paper_citations | Real citation and reference data from Semantic Scholar |
| find_similar_papers | SPECTER2 semantic similarity with TF-IDF fallback |
| get_author_profile | Publication frequency, top topics, and collaborators |
| Intelligence | |
| get_trending_topics | Detect emerging and declining research topics over time |
| generate_literature_review | Structured review with subtopics and consensus analysis |
| export_bibtex | Export papers as BibTeX for LaTeX documents |
| get_cache_stats | Cache size, source breakdown, and date range |
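For a sense of what a BibTeX export involves, the sketch below turns one cached paper record into an `@article` entry. It is illustrative only: the record fields, the citation-key scheme (first author's surname plus year), and the `to_bibtex` helper are assumptions, the sample paper is a placeholder, and the sketch does not escape special LaTeX characters.

```python
def to_bibtex(paper):
    """Format a paper record as a BibTeX @article entry (hypothetical helper)."""
    # Citation key: lowercase surname of the first author followed by the year.
    surname = paper["authors"][0].split()[-1].lower()
    key = f"{surname}{paper['year']}"
    fields = {
        "title": paper["title"],
        "author": " and ".join(paper["authors"]),  # BibTeX separates authors with "and"
        "year": str(paper["year"]),
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@article{{{key},\n{body}\n}}"


# Placeholder record, invented for the example.
paper = {
    "title": "A Placeholder Title",
    "authors": ["Ada Example", "Ben Sample"],
    "year": 2024,
}

entry = to_bibtex(paper)
print(entry)
```

The printed entry can be pasted into a `.bib` file and cited from LaTeX by its key, here `example2024`.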
Sources
| Source | Coverage | API Key |
|---|---|---|
| arXiv | 2.4M+ papers, physics/CS/math/biology | Not required |
| PubMed | 36M+ biomedical citations | Not required |
| Semantic Scholar | 200M+ papers, all fields | Optional (higher rate limits) |