This document outlines several Model Context Protocol (MCP) servers designed to interact with PDF files and integrate with Cursor IDE to create or query a knowledge base from PDF content. Each server is detailed with its capabilities, setup instructions, and use cases.
- Description: Integrates with PyPDF2 for efficient text extraction and information retrieval from PDF documents, suitable for knowledge base applications. Supports both local and URL-based PDFs with standardized JSON output for seamless Cursor integration.
- Capabilities:
- Extracts text from various PDF formats.
- Handles both local and remote PDFs.
- Provides structured output for AI-driven queries.
- Setup in Cursor:
- Create or edit `~/.cursor/mcp.json` for global access, or `.cursor/mcp.json` in your project directory.
- Add the following configuration:

```json
{
  "mcpServers": {
    "pdf-reader": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "-v", "/path/to/pdfs:/pdfs", "mcp/pdf-reader"],
      "disabled": false,
      "autoApprove": []
    }
  }
}
```

- Replace `/path/to/pdfs` with the actual path to your PDF files directory.
- Restart Cursor or refresh MCP settings (Settings > MCP > Refresh).
- Use commands like `read_local_pdf` to extract text for knowledge base queries.
- Use Case: Ideal for extracting text from documentation or research papers to build a searchable knowledge base.
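Editing `mcp.json` by hand makes it easy to overwrite servers that are already registered. The following is a minimal sketch of merging a single entry into the file while keeping the others. The helper name `add_mcp_server` is hypothetical and not part of Cursor or any MCP server; it only assumes the `mcpServers` layout shown in this document.

```python
import json
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, entry: dict) -> dict:
    """Insert or update one server under "mcpServers", keeping the others.

    Hypothetical helper: it only manipulates the JSON layout used in this
    document and does not talk to Cursor itself.
    """
    config = {}
    if config_path.exists():
        # Start from the existing file so other servers are preserved.
        config = json.loads(config_path.read_text(encoding="utf-8"))
    servers = config.setdefault("mcpServers", {})
    servers[name] = entry
    # Create the .cursor directory if this is the first config written there.
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For a project-level setup, one might call `add_mcp_server(Path(".cursor") / "mcp.json", "pdf-reader", {...})` with the entry from the configuration above, then refresh the MCP settings in Cursor.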
- Description: A powerful document knowledge base system leveraging PDF processing, vector storage, and semantic search (available on GitHub: hyson666/pdf-rag-mcp-server). Supports uploading, processing, and querying PDFs, with a modern web interface.
- Capabilities:
- Uploads and processes PDFs, extracting and chunking content for vectorization.
- Supports semantic search using a FAISS index.
- Provides a React/Chakra UI web interface and WebSocket updates.
- Setup in Cursor:
- Clone the repository: `git clone https://github.com/hyson666/pdf-rag-mcp-server`.
- Install dependencies: `uv pip install -r requirements.txt`.
- Configure environment variables for FAISS index or knowledge base directory.
- Add to `~/.cursor/mcp.json`:

```json
{
  "mcpServers": {
    "pdf-rag": {
      "command": "python",
      "args": ["/path/to/pdf-rag-mcp-server/main.py"],
      "env": {
        "KNOWLEDGE_BASES_ROOT_DIR": "/path/to/knowledge_bases",
        "FAISS_INDEX_PATH": "/path/to/knowledge_bases/.faiss"
      }
    }
  }
}
```

- Replace paths with your local directories.
- Start the server and refresh Cursor’s MCP settings.
- Use tools like `retrieve_knowledge` for semantic search.
- Use Case: Perfect for advanced knowledge bases requiring semantic search across large PDF collections, such as technical manuals or academic papers.
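The chunking step mentioned under Capabilities can be illustrated with a small sketch. This is not the server's actual implementation: the function name `chunk_text`, the character-based splitting, and the default sizes are all assumptions chosen for illustration. Overlapping windows are a common way to keep a sentence that straddles a boundary retrievable from at least one chunk.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Illustrative only: real pipelines often split on sentences or tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
# → ['abcd', 'defg', 'ghij']
```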
- Description: Focuses on text extraction and OCR for PDFs, designed for document analysis and content indexing.
- Capabilities:
- Extracts text and supports OCR for scanned documents.
- Provides tools for content indexing to build a structured knowledge base.
- Setup in Cursor:
- Install: `uv pip install pdf_extraction`.
- Configure in `~/.cursor/mcp.json`:

```json
{
  "mcpServers": {
    "pdf_extraction": {
      "command": "uvx",
      "args": ["pdf_extraction"]
    }
  }
}
```

- Restart Cursor or refresh MCP settings.
- Use tools like `extract_pdf_content` to process PDFs for knowledge base integration.
- Use Case: Useful for indexing and querying PDF content, especially scanned documents requiring OCR.
- Description: Uses PyMuPDF to extract and visualize form field information from PDFs, suitable for structured data knowledge bases.
- Capabilities:
- Locates and extracts form field data.
- Visualizes form fields for structured data retrieval.
- Setup in Cursor:
- Add to `~/.cursor/mcp.json`:

```json
{
  "mcpServers": {
    "pdf-forms": {
      "command": "python",
      "args": ["/path/to/pdf-forms-mcp-server/main.py"]
    }
  }
}
```

- Replace the path with the actual server script location.
- Refresh Cursor’s MCP settings.
- Use tools like `extract_form_fields` for structured data integration.
- Use Case: Best for knowledge bases with structured data from PDF forms, such as legal or application forms.
- Description: Provides semantic search and knowledge graph capabilities for structured repositories, including PDF-derived content (powered by txtai).
- Capabilities:
- Semantic search and knowledge graph creation.
- Processes text extracted from PDFs for advanced querying.
- Setup in Cursor:
- Install: `uv pip install kb-mcp-server`.
- Configure in `~/.cursor/mcp.json`:

```json
{
  "mcpServers": {
    "kb-server": {
      "command": "kb-mcp-server",
      "args": ["--embeddings", "/path/to/knowledge_base.tar.gz"],
      "cwd": "/path/to/working/directory"
    }
  }
}
```

- Replace paths with your directories or archives.
- Start the server and refresh Cursor.
- Use tools like `retrieve_knowledge` to query the knowledge base.
- Use Case: Enhances PDF-based knowledge bases with semantic search and knowledge graphs.
Combine PDF RAG MCP Server with Knowledge Base MCP Server for a robust PDF-based knowledge base:
- Why PDF RAG? Excels at processing and vectorizing PDFs for semantic search.
- Why Knowledge Base? Adds semantic search and knowledge graph capabilities.
- Workflow:
- Use PDF RAG to process PDFs into a FAISS index.
- Integrate Knowledge Base MCP Server for semantic searches and knowledge graphs.
- Configure both in Cursor’s `mcp.json` for AI-driven queries.
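To make the retrieval step of this workflow concrete, here is a toy sketch of ranking text chunks against a query by cosine similarity. It deliberately swaps in plain word counts for a real embedding model and a brute-force scan for a FAISS index, so it shows the shape of the computation rather than how either server works internally. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query: str, chunks: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    """Score every chunk against the query and return the best top_k."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

A real deployment would replace `embed` with a dense embedding model and the linear scan with an index lookup. Word counts only match exact terms, which is the gap semantic search is meant to close.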
- Install Dependencies:
- Ensure Python 3.10+ and `uv`: `pip install -U uv`.
- Install PDF RAG: `uv pip install -r requirements.txt` (from cloned repository).
- Install Knowledge Base: `uv pip install kb-mcp-server`.
- Configure in Cursor:

```json
{
  "mcpServers": {
    "pdf-rag": {
      "command": "python",
      "args": ["/path/to/pdf-rag-mcp-server/main.py"],
      "env": {
        "KNOWLEDGE_BASES_ROOT_DIR": "/path/to/knowledge_bases",
        "FAISS_INDEX_PATH": "/path/to/knowledge_bases/.faiss"
      }
    },
    "kb-server": {
      "command": "kb-mcp-server",
      "args": ["--embeddings", "/path/to/knowledge_base.tar.gz"],
      "cwd": "/path/to/working/directory"
    }
  }
}
```

- Start Servers:
- Start both servers and refresh Cursor’s MCP settings.
- Query the knowledge base in Cursor’s Agent mode with prompts like: “Search my PDF knowledge base for [topic].”
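A common reason a server fails to appear after a refresh is a malformed config or a `/path/to/...` placeholder that was never replaced. The sketch below checks for both before Cursor is restarted. The function `check_mcp_config` is hypothetical, and it assumes stdio-style entries with the `command`, `args`, `env`, and `cwd` keys used in this document; it is not an official validator.

```python
import json


def check_mcp_config(raw: str) -> list[str]:
    """Return human-readable problems found in an mcp.json document."""
    config = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    servers = config.get("mcpServers") if isinstance(config, dict) else None
    if not isinstance(servers, dict) or not servers:
        return ['missing or empty "mcpServers" object']

    problems = []
    for name, entry in servers.items():
        if not isinstance(entry, dict):
            problems.append(f"{name}: entry is not an object")
            continue
        command = entry.get("command")
        if not isinstance(command, str) or not command:
            problems.append(f'{name}: "command" must be a non-empty string')
        args = entry.get("args", [])
        if not isinstance(args, list):
            problems.append(f'{name}: "args" must be a list')
            args = []
        env = entry.get("env", {})
        env_values = list(env.values()) if isinstance(env, dict) else []
        # Look for template paths copied verbatim from the examples.
        for value in [*args, *env_values, entry.get("cwd", "")]:
            if isinstance(value, str) and "/path/to/" in value:
                problems.append(f"{name}: placeholder path not replaced: {value}")
    return problems
```

Running it on the combined configuration above, unmodified, would report every `/path/to/...` value for both `pdf-rag` and `kb-server`.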
- API Keys: Use environment variables for sensitive data.
- Local Hosting: Run servers locally with stdio transport.
- Review Tool Calls: Verify tool calls in Cursor before execution.
- Isolated Environments: Use Docker for isolation.
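As a minimal sketch of the first point, a server script can read its secrets from the environment and fail early when one is missing, rather than embedding the value in source code or a shared config file. The helper `require_env` and the variable name `EXAMPLE_API_KEY` are hypothetical; none of the servers described here is known to require that variable.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"environment variable {name} is not set")
    return value


# Hypothetical usage inside a server's startup code:
# api_key = require_env("EXAMPLE_API_KEY")
```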
- PDF Complexity: Some servers may struggle with complex PDFs (e.g., scanned documents). Use PDF Extraction for OCR.
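- Performance: The FAISS index in PDF RAG is typically held in memory, so its footprint grows with the number of indexed chunks and the embedding dimension; large collections can require significant RAM.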
- Cursor Integration: Manual refreshes may be needed after configuration changes.
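The memory concern can be estimated with simple arithmetic. The sketch below assumes an uncompressed (flat) index that stores each vector as 32-bit floats, so storage is roughly `vectors × dimension × 4` bytes. The collection size and the embedding dimension of 384 are illustrative assumptions, not figures taken from PDF RAG.

```python
def flat_index_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Approximate raw vector storage for an uncompressed float32 index."""
    return num_vectors * dim * bytes_per_value


# Assumed example: one million chunks embedded into 384 dimensions.
size = flat_index_bytes(1_000_000, 384)
print(f"{size / 1024**3:.2f} GiB")
# → 1.43 GiB
```

This counts only the vectors themselves; the chunk text, metadata, and the embedding model add to the total.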
The PDF RAG MCP Server and Knowledge Base MCP Server combination offers a comprehensive solution for a PDF-based knowledge base in Cursor, with robust text extraction, semantic search, and knowledge graph capabilities. For simpler needs, use PDF Reader or PDF Extraction. For PDF forms, consider PDF Forms MCP Server.