
Deep Research Digest

Project · 2025

Academic search is three separate databases with three different APIs, three different schemas, and no unified way to search them from an AI assistant. Deep Research Digest federates arXiv, PubMed, and Semantic Scholar behind a single MCP interface. Every search grows a local corpus that makes the next search better.


Published on PyPI as research-papers-mcp. Install with pip install research-papers-mcp, add to any MCP client config, and start querying papers from within Claude, Cursor, or any MCP-compatible tool. Covers arXiv (2.4M+ papers), PubMed (36M+ citations), and Semantic Scholar (200M+ records).
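As a rough illustration of the "add to any MCP client config" step, the sketch below builds a server entry in the "mcpServers" layout that several MCP clients use and prints it as JSON. The console command name and the server label are assumptions made for illustration; check the package documentation for the actual entry point.

```python
import json

# Assumption: the package installs a console command with this name.
# It is hypothetical here; confirm the real entry point in the package docs.
SERVER_COMMAND = "research-papers-mcp"


def build_client_config(server_name: str = "research-papers") -> dict:
    """Return a client config entry in the common "mcpServers" layout."""
    return {
        "mcpServers": {
            server_name: {
                "command": SERVER_COMMAND,
                "args": [],  # stdio transport: the client launches the process
            }
        }
    }


if __name__ == "__main__":
    # Paste the printed JSON into the client's MCP configuration file.
    print(json.dumps(build_client_config(), indent=2))
```

The exact file location and any extra keys depend on the client, so treat the output as a starting point rather than a finished config.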

The core design principle is a self-growing corpus. Every search fetches results from external APIs and caches them in a local SQLite database. Over time, the cache accumulates papers, enabling richer trend detection, similarity analysis, and literature reviews without re-fetching. No database servers, no API keys required, no background workers. Everything runs in a single process.
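The self-growing cache can be sketched with the standard library alone. This is a minimal illustration, not the project's actual schema: the table layout, the column names, and the helper functions are all assumptions. The idea it shows is that each search upserts its results, so repeated searches add only papers the cache has not seen.

```python
import sqlite3

# Hypothetical schema for illustration; the real cache layout may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS papers (
    paper_id  TEXT PRIMARY KEY,
    source    TEXT NOT NULL,
    title     TEXT NOT NULL,
    abstract  TEXT,
    published TEXT
)
"""


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the cache; a file path makes it persist across runs."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def count_papers(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM papers").fetchone()[0]


def cache_results(conn: sqlite3.Connection, papers: list) -> int:
    """Store fetched papers, skipping IDs already cached. Returns new rows added."""
    before = count_papers(conn)
    conn.executemany(
        "INSERT OR IGNORE INTO papers (paper_id, source, title, abstract, published) "
        "VALUES (:paper_id, :source, :title, :abstract, :published)",
        papers,
    )
    conn.commit()
    return count_papers(conn) - before


def search_cached(conn: sqlite3.Connection, keyword: str) -> list:
    """Search the local corpus without touching any external API."""
    pattern = f"%{keyword}%"
    rows = conn.execute(
        "SELECT paper_id, title FROM papers "
        "WHERE title LIKE ? OR abstract LIKE ? ORDER BY published DESC",
        (pattern, pattern),
    )
    return rows.fetchall()
```

Because SQLite is an embedded library, the whole cache lives in one file inside the server process, which is what lets the design avoid a database server or background workers.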

The server provides 10 research tools across three categories: discovery (federated search, cached search, paper details), analysis (citation graphs from Semantic Scholar, SPECTER2 semantic similarity with TF-IDF fallback, author profiles), and intelligence (trending topic detection, structured literature review generation, BibTeX export, cache statistics). It supports the stdio, SSE, and streamable HTTP transports.
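To make the "TF-IDF fallback" concrete, here is a small pure-Python sketch of TF-IDF cosine similarity over paper abstracts. It illustrates only the fallback idea; it does not load SPECTER2, and the tokenizer, the smoothing formula, and the function names are assumptions rather than the server's actual implementation.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list) -> list:
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({
            # Smoothed IDF keeps weights positive even for very common terms.
            t: (c / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, c in tf.items()
        })
    return vectors


def cosine(a: dict, b: dict) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_similar(query_index: int, docs: list, top_k: int = 3) -> list:
    """Rank the other documents by similarity to docs[query_index]."""
    vecs = tfidf_vectors(docs)
    scores = [
        (i, cosine(vecs[query_index], v))
        for i, v in enumerate(vecs)
        if i != query_index
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]
```

A lexical method like this only matches shared vocabulary, which is presumably why it serves as the fallback and an embedding model is preferred when available.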


Architecture

Federated search queries three academic APIs in parallel, normalizes results into a shared schema, and stores everything in a local SQLite cache. Downstream tools operate on the cache for trend analysis, similarity computation, and citation traversal.
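The fan-out and normalization steps can be sketched with asyncio. The three fetchers below are stubs that return canned records instead of calling the real APIs, and the raw field names, the Paper dataclass, and the function names are illustrative assumptions, not the project's code.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Paper:
    """Shared schema every source is normalized into (hypothetical fields)."""
    source: str
    paper_id: str
    title: str
    year: Optional[int]


# Stub fetchers standing in for real API clients; payload shapes are made up.
async def fetch_arxiv(query: str) -> list:
    await asyncio.sleep(0)
    return [{"id": "0000.00001", "title": f"{query} (arXiv)", "published": "2024-01-02"}]


async def fetch_pubmed(query: str) -> list:
    await asyncio.sleep(0)
    return [{"pmid": "1", "article_title": f"{query} (PubMed)", "pub_year": "2023"}]


async def fetch_semantic_scholar(query: str) -> list:
    await asyncio.sleep(0)
    return [{"paperId": "s2-1", "title": f"{query} (S2)", "year": 2022}]


def normalize(source: str, raw: dict) -> Paper:
    """Map each source's own field names onto the shared schema."""
    if source == "arxiv":
        return Paper(source, raw["id"], raw["title"], int(raw["published"][:4]))
    if source == "pubmed":
        return Paper(source, raw["pmid"], raw["article_title"], int(raw["pub_year"]))
    return Paper(source, raw["paperId"], raw["title"], raw.get("year"))


FETCHERS = {
    "arxiv": fetch_arxiv,
    "pubmed": fetch_pubmed,
    "semantic_scholar": fetch_semantic_scholar,
}


async def search_papers(query: str) -> list:
    # Query all sources concurrently; collect exceptions instead of raising.
    results = await asyncio.gather(
        *(fetch(query) for fetch in FETCHERS.values()), return_exceptions=True
    )
    papers = []
    for source, result in zip(FETCHERS, results):
        if isinstance(result, Exception):
            continue  # one failing source should not sink the whole search
        papers.extend(normalize(source, raw) for raw in result)
    return papers
```

Calling asyncio.run(search_papers("protein folding")) returns one normalized Paper per stub source; in a real server the combined list would then be written to the cache before being returned.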

Diagram: search_papers (entry) → sources (arXiv 2.4M+, PubMed 36M+, Semantic Scholar) → SQLite cache (grows over time) → intelligence (trend detection, SPECTER2 similarity, S2 citations)

Tools

Ten MCP tools across three categories that AI assistants can call to search, analyze, and synthesize research papers.

Tool                          Description
Discovery
  search_papers               Federated search across arXiv, PubMed, and Semantic Scholar
  search_cached_papers        Fast local search with date, source, and citation filters
  get_paper_details           Full metadata for any cached paper
Analysis
  get_paper_citations         Real citation and reference data from Semantic Scholar
  find_similar_papers         SPECTER2 semantic similarity with TF-IDF fallback
  get_author_profile          Publication frequency, top topics, and collaborators
Intelligence
  get_trending_topics         Detect emerging and declining research topics over time
  generate_literature_review  Structured review with subtopics and consensus analysis
  export_bibtex               Export papers as BibTeX for LaTeX documents
  get_cache_stats             Cache size, source breakdown, and date range
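As one example of what a tool like export_bibtex has to do, the sketch below turns a cached paper record into a BibTeX entry. The record fields, the citation-key scheme, and the choice of the @article entry type are assumptions for illustration; the actual tool may format entries differently.

```python
import re


def bibtex_key(paper: dict) -> str:
    """Build a key like 'vaswani2017attention' from author, year, and title."""
    authors = paper.get("authors") or ["anon"]
    surname = re.sub(r"[^A-Za-z]", "", authors[0].split()[-1]).lower() or "anon"
    first_word = (paper["title"].split() or ["untitled"])[0]
    first_word = re.sub(r"[^A-Za-z0-9]", "", first_word).lower()
    return f"{surname}{paper.get('year', '')}{first_word}"


def to_bibtex(paper: dict) -> str:
    """Render one paper as a BibTeX entry, omitting empty fields."""
    fields = {
        "title": paper["title"],
        "author": " and ".join(paper.get("authors", [])),
        "year": str(paper.get("year", "")),
        "journal": paper.get("venue", ""),
    }
    lines = [f"@article{{{bibtex_key(paper)},"]
    lines += [f"  {name} = {{{value}}}," for name, value in fields.items() if value]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_bibtex({
        "title": "Attention Is All You Need",
        "authors": ["Ashish Vaswani", "Noam Shazeer"],
        "year": 2017,
    }))
```

A production exporter would also need to escape LaTeX special characters in titles and pick entry types per source, which this sketch leaves out.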

Sources

Source            Coverage                                  API Key
arXiv             2.4M+ papers, physics/CS/math/biology     Not required
PubMed            36M+ biomedical citations                 Not required
Semantic Scholar  200M+ papers, all fields                  Optional (higher rate limits)
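The optional Semantic Scholar key could be handled along these lines: send the x-api-key header only when a key is configured, and fall back to anonymous access otherwise. The environment variable name and the helper function are hypothetical; how the server actually reads its key is not documented here.

```python
import os
from typing import Mapping, Optional

# Assumption: a hypothetical environment variable name chosen for this sketch.
API_KEY_ENV = "SEMANTIC_SCHOLAR_API_KEY"


def semantic_scholar_headers(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build request headers, adding the API key only if one is set."""
    env = os.environ if env is None else env
    headers = {"Accept": "application/json"}
    key = env.get(API_KEY_ENV, "").strip()
    if key:
        # Semantic Scholar accepts the key in the x-api-key request header.
        headers["x-api-key"] = key
    return headers
```

With no key set, requests go out unauthenticated and share the public rate limit, which matches the "Optional (higher rate limits)" entry in the table.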