@theabbie
Created October 13, 2025 08:18
MCP server enabling AI assistants to access VPN-protected internal documentation through an authenticated browser session.

MCP Authenticated Browser

A Model Context Protocol (MCP) server that allows AI assistants to access VPN-protected or authentication-required documentation through a persistent browser session.

Problem

Companies have internal documentation (Confluence, wikis, internal sites) behind VPNs and authentication that AI tools cannot access. This prevents AI assistants from leveraging your company's domain knowledge and internal documentation when helping with work tasks.

Traditional approaches fail because:

  • VPN connections require network-level access that AI tools don't have
  • Authentication flows (SSO, SAML, OAuth) need manual intervention
  • Session cookies and tokens must persist across requests
  • Many internal sites detect and block headless browsers

Solution

This MCP server runs a visible Chrome instance that you authenticate once. The browser session persists, and AI tools can fetch content through it using standard MCP tools. You can intervene in the browser at any time if re-authentication is needed.

How It Works

  1. MCP server launches a visible Chrome browser with a persistent profile
  2. You manually authenticate (VPN, SSO, login forms, etc.)
  3. AI tools call MCP tools to fetch content through your authenticated session
  4. Browser session persists across requests and server restarts
  5. Content is automatically cleaned (removes nav, headers, footers)

Installation

Clone or download this repository, then install dependencies:

cd /path/to/this/directory
npm install

Configuration

Claude Desktop

Edit the Claude Desktop configuration file:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json

Add this server configuration:

{
  "mcpServers": {
    "authenticated-browser": {
      "command": "node",
      "args": ["/absolute/path/to/this/directory/index.js"]
    }
  }
}

Replace /absolute/path/to/this/directory/ with the actual full path where you cloned this repo.

Example:

{
  "mcpServers": {
    "authenticated-browser": {
      "command": "node",
      "args": ["/Users/username/Desktop/Puppeteer/index.js"]
    }
  }
}

After editing the config:

  1. Save the file
  2. Restart Claude Desktop completely (quit and reopen)
  3. The server will be available for Claude to use

Windsurf

In Windsurf, go to Settings and search for "MCP" or "MCP Servers".

Add this configuration:

{
  "authenticated-browser": {
    "command": "node",
    "args": ["/Users/username/Desktop/Puppeteer/index.js"]
  }
}

Replace with your actual path. After adding, restart Windsurf completely.

Cursor

Add to Cursor's MCP configuration. Location varies by version:

  • Settings > Features > MCP Servers
  • Or edit ~/.cursor/mcp.json (the .cursor directory in your home directory)

The configuration format is the same as for Claude Desktop, including the top-level mcpServers key.

Other MCP Clients

Any MCP-compatible client can use this server. Add it using the stdio transport with the command:

node /absolute/path/to/index.js
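
For clients that speak the protocol directly, a tool invocation is a JSON-RPC 2.0 request with the method tools/call, written to the server's stdin as one line of JSON. The sketch below only builds and encodes such a message; buildToolCall and encodeMessage are hypothetical helper names, not part of this repository, and a real client must complete the MCP initialize handshake first (the SDK's client classes handle that for you).

```javascript
// Hypothetical helper: builds the JSON-RPC 2.0 request an MCP client sends
// to invoke a tool. Not part of this repository.
function buildToolCall(id, name, args) {
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name, arguments: args },
  };
}

// With the stdio transport, each message is a single line of JSON.
function encodeMessage(message) {
  return JSON.stringify(message) + '\n';
}

const request = buildToolCall(1, 'fetch_page_content', {
  url: 'https://internal-docs.company.com/api-guide',
  format: 'markdown',
});
const line = encodeMessage(request);
console.log(line);
```

The tool name and argument names match the tools documented in the next section; the URL is only a placeholder.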

Available Tools

The server provides three tools that AI assistants can call:

fetch_page_content

Fetches and extracts clean content from a webpage.

Parameters:

  • url (string, required): URL to fetch
  • format (string, optional): Output format, either "text" or "markdown". Default is "text"

Behavior:

  • Automatically removes navigation, headers, footers, sidebars
  • Intelligently finds main content area
  • Returns clean text or markdown

get_page_links

Extracts links from a webpage, focused on content area.

Parameters:

  • url (string, required): URL to extract links from
  • filter (string, optional): Filter links containing this text in URL or link text

Behavior:

  • Only extracts links from main content area (ignores nav/sidebar)
  • Resolves relative URLs to absolute URLs
  • Returns array of {text, href, title}
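
Relative-URL resolution can be illustrated with the standard URL constructor, which accepts a base URL as its second argument. This is a minimal sketch, assuming a hypothetical helper name resolveHref and a placeholder page URL; it is not a copy of the server's code.

```javascript
// Resolve an href found on a page against that page's URL.
// Returns the href unchanged if it cannot be parsed.
function resolveHref(href, pageUrl) {
  try {
    return new URL(href, pageUrl).href;
  } catch {
    return href;
  }
}

const base = 'https://confluence.company.com/docs/setup/index.html';
console.log(resolveHref('/api/guide', base));
// → https://confluence.company.com/api/guide
console.log(resolveHref('../install', base));
// → https://confluence.company.com/docs/install
console.log(resolveHref('//cdn.company.com/a.pdf', base));
// → https://cdn.company.com/a.pdf
```

Letting the URL constructor do the work covers root-relative, path-relative, and protocol-relative links with one code path.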

search_page

Searches for text within a webpage.

Parameters:

  • url (string, required): URL to search
  • query (string, required): Text to search for
  • caseSensitive (boolean, optional): Case-sensitive search. Default is false

Behavior:

  • Searches page content (excluding nav/headers)
  • Returns up to 10 matches with surrounding context
  • Each match includes its position and up to 100 characters of context on each side
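
The matching behavior can be sketched as a pure function over already-extracted text. The names findMatches, radius, and limit are assumptions for illustration; the defaults follow the behavior listed above (a window of 100 characters on each side, about 200 in total, and at most 10 matches).

```javascript
// Find occurrences of `query` in `text`, returning each match position with
// up to `radius` characters of context on either side.
function findMatches(text, query, { caseSensitive = false, radius = 100, limit = 10 } = {}) {
  if (!query) return []; // an empty query would never advance the search
  const haystack = caseSensitive ? text : text.toLowerCase();
  const needle = caseSensitive ? query : query.toLowerCase();
  const matches = [];
  let index = 0;
  while (matches.length < limit && (index = haystack.indexOf(needle, index)) !== -1) {
    const start = Math.max(0, index - radius);
    const end = Math.min(text.length, index + query.length + radius);
    matches.push({ position: index, context: text.substring(start, end).trim() });
    index += query.length;
  }
  return matches;
}

const sample = 'Authentication uses SSO. Re-authentication is required after expiry.';
const found = findMatches(sample, 'authentication', { radius: 10 });
console.log(found.map((m) => m.position)); // → [ 0, 28 ]
```

With caseSensitive set to true, the same call finds only the second, lowercase occurrence.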

Usage

Once configured, use natural language with your AI assistant:

"Fetch content from https://internal-docs.company.com/api-guide"

"Get all links from https://confluence.company.com/page that contain 'setup'"

"Search for 'authentication' in https://internal-wiki.company.com/security"

The tool descriptions tell the AI assistant to use these tools for internal documentation URLs, so it will usually pick the right one on its own. If it does not, mention "internal docs" or "company documentation" explicitly to trigger tool usage.

First Run

On the first request to fetch internal documentation:

  1. The server logs "Launching Chrome browser..." and asks you to authenticate in the browser window (these messages go to stderr, so they appear in your client's MCP logs)
  2. Chrome will open in a visible window (maximized)
  3. The browser will navigate to the URL you requested
  4. Complete any authentication required (VPN, SSO, login forms)
  5. The session is automatically saved to the chrome-session/ directory
  6. Content is fetched as soon as the page finishes loading and returned to the AI assistant; if that was a login page, repeat the request after authenticating

On subsequent requests:

  • Browser reuses the saved session (no re-authentication needed)
  • Session persists even if you restart the MCP server
  • You can manually intervene in the browser window anytime
  • If session expires, just re-authenticate in the same browser window
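
When a session has expired, the browser typically ends up on a login page rather than the requested document. A rough way to notice this is to compare the requested URL with the URL the browser finally landed on. The sketch below is a heuristic under stated assumptions: looksLikeLoginRedirect and its keyword pattern are hypothetical and not part of this server.

```javascript
// Hypothetical heuristic: did navigation end on a login page instead of the
// requested document? The keyword pattern is an illustrative assumption.
function looksLikeLoginRedirect(requestedUrl, finalUrl) {
  const requested = new URL(requestedUrl);
  const final = new URL(finalUrl);
  const loginPattern = /(login|signin|sso|saml|oauth)/i;
  const hostChanged = requested.host !== final.host;
  const pathLooksLikeLogin =
    loginPattern.test(final.pathname) && !loginPattern.test(requested.pathname);
  return hostChanged || pathLooksLikeLogin;
}

console.log(looksLikeLoginRedirect(
  'https://internal-wiki.company.com/security',
  'https://sso.company.com/login?next=%2Fsecurity'
)); // → true
console.log(looksLikeLoginRedirect(
  'https://internal-wiki.company.com/security',
  'https://internal-wiki.company.com/security'
)); // → false
```

Sites that redirect between hosts for legitimate reasons would trigger a false positive, so treat the result as a hint to check the browser window, not as proof.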

Session Management

Browser data is stored in chrome-session/ directory:

  • Cookies
  • Local storage
  • Authentication tokens
  • Cache

To reset the session (force re-authentication):

rm -rf chrome-session/

Troubleshooting

Browser doesn't launch

  • Check that the path in your MCP config is correct and absolute
  • Ensure Node.js is installed and in your PATH
  • Check Claude Desktop logs for errors

Authentication not persisting

  • The chrome-session/ directory must be writable
  • Some sites expire sessions quickly, so you may need to re-authenticate
  • Check if the site is clearing cookies on logout

Content not extracted properly

  • The tool tries common selectors for main content
  • Some sites may need custom handling
  • You can manually navigate in the browser to verify the page loads

MCP server not showing up

  • Restart your AI tool completely after config changes
  • Verify the JSON config syntax is valid
  • On Windows, write the path in the JSON config with forward slashes or escaped backslashes (\\)
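
The config checks above can be automated with a few lines of JavaScript. This is a minimal sketch, assuming the Claude Desktop layout with a top-level mcpServers key; checkMcpConfig is a hypothetical helper, and its notion of an absolute path (a leading / or a Windows drive prefix) is a simplification.

```javascript
// Hypothetical checker for a Claude Desktop style config string.
// Returns a list of human-readable problems; an empty list means none found.
function checkMcpConfig(jsonText) {
  let config;
  try {
    config = JSON.parse(jsonText);
  } catch (err) {
    return [`Invalid JSON: ${err.message}`];
  }
  const servers = config && config.mcpServers;
  if (!servers || typeof servers !== 'object') {
    return ['Missing top-level "mcpServers" object'];
  }
  const problems = [];
  // Treat POSIX paths (/...) and Windows drive paths (C:/ or C:\) as absolute.
  const isAbsolute = (p) => /^(\/|[A-Za-z]:[\\/])/.test(p);
  for (const [name, entry] of Object.entries(servers)) {
    if (!entry || typeof entry !== 'object') {
      problems.push(`${name}: entry must be an object`);
      continue;
    }
    if (typeof entry.command !== 'string') {
      problems.push(`${name}: "command" must be a string`);
    }
    const script = Array.isArray(entry.args) ? entry.args[0] : undefined;
    if (typeof script !== 'string' || !isAbsolute(script)) {
      problems.push(`${name}: first arg should be an absolute path to index.js`);
    }
  }
  return problems;
}

const good = JSON.stringify({
  mcpServers: {
    'authenticated-browser': {
      command: 'node',
      args: ['/Users/username/Desktop/Puppeteer/index.js'],
    },
  },
});
console.log(checkMcpConfig(good)); // → []
console.log(checkMcpConfig('{"mcpServers": {').length); // → 1
```

To use it on a real file, read the config with fs.readFileSync and pass the text in.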

Technical Details

  • Built with Puppeteer for browser automation
  • Uses Cheerio for HTML parsing and content extraction
  • Turndown for HTML to Markdown conversion
  • MCP SDK for protocol implementation
  • Runs on stdio transport (no network ports)

Limitations

  • Browser must remain open while in use
  • Only one browser instance per server
  • No support for concurrent requests (all tools share a single browser tab, so requests must run one at a time)
  • Session expires based on site's session timeout
  • Requires manual authentication intervention
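
Because every tool drives the same browser tab, requests need to run one at a time. One common way to enforce that is a promise chain that starts each task only after the previous one has settled. The sketch below is an assumption about how this could be done; createSerialQueue is a hypothetical helper and does not appear in index.js.

```javascript
// Hypothetical helper: runs async tasks strictly one after another.
function createSerialQueue() {
  let tail = Promise.resolve();
  return function enqueue(task) {
    // Start the task only after every previously queued task has settled.
    const result = tail.then(() => task());
    // Keep the chain alive even if this task rejects.
    tail = result.catch(() => {});
    return result;
  };
}

const enqueue = createSerialQueue();
const order = [];
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

const done = Promise.all([
  enqueue(async () => { await sleep(20); order.push('first'); }),
  enqueue(async () => { order.push('second'); }),
]).then(() => console.log(order)); // → [ 'first', 'second' ]
```

In a server like this one, each tool handler would wrap its page navigation in enqueue so that two overlapping calls cannot navigate the shared tab at the same time.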
#!/usr/bin/env node
import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
} from '@modelcontextprotocol/sdk/types.js';
import puppeteer from 'puppeteer';
import * as cheerio from 'cheerio';
import TurndownService from 'turndown';
import path from 'path';
import { fileURLToPath } from 'url';

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

// Single shared browser and tab, created lazily on the first tool call.
let browser = null;
let page = null;

const turndownService = new TurndownService({
  headingStyle: 'atx',
  codeBlockStyle: 'fenced'
});

async function launchBrowser() {
  if (browser) {
    return;
  }
  console.error('\n=== MCP Authenticated Browser ===');
  console.error('Launching Chrome browser...');
  console.error('IMPORTANT: A visible Chrome window will open.');
  console.error('Please authenticate (VPN, SSO, login) in the browser window.');
  console.error('The session will be saved and reused for future requests.\n');
  browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null,
    userDataDir: path.join(__dirname, 'chrome-session'),
    args: [
      '--start-maximized',
      '--disable-blink-features=AutomationControlled',
      '--no-sandbox'
    ]
  });
  const pages = await browser.pages();
  page = pages[0] || await browser.newPage();
  await page.setUserAgent('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36');
  console.error('Browser launched successfully.');
  console.error('You can now authenticate in the browser window.\n');
}

async function getPageContent(url, format = 'text') {
  if (!browser || !page) {
    await launchBrowser();
  }
  if (url) {
    await page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 });
  }
  const html = await page.content();
  const $ = cheerio.load(html);
  // Strip page chrome before looking for the main content area.
  $('script, style, nav, header, footer, .sidebar, #sidebar, .navigation, .menu').remove();
  let content;
  let mainHtml;
  const mainSelectors = [
    'main',
    '[role="main"]',
    '#main-content',
    '.main-content',
    '#content',
    '.content',
    'article',
    '.article',
    '#wiki-content',
    '.wiki-content',
    '.page-content'
  ];
  // Use the first selector that matches anything.
  for (const sel of mainSelectors) {
    if ($(sel).length > 0) {
      mainHtml = $(sel).html();
      content = $(sel).text().trim();
      break;
    }
  }
  // Fall back to the whole body when no main container is found.
  if (!content) {
    mainHtml = $('body').html();
    content = $('body').text().trim();
  }
  const title = await page.title();
  const currentUrl = page.url();
  let markdown = null;
  if (format === 'markdown' && mainHtml) {
    markdown = turndownService.turndown(mainHtml);
  }
  return {
    url: currentUrl,
    title: title,
    content: content,
    markdown: markdown
  };
}

async function getPageLinks(url, filter = null) {
  if (!browser || !page) {
    await launchBrowser();
  }
  if (url) {
    await page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 });
  }
  const html = await page.content();
  const $ = cheerio.load(html);
  const currentUrl = page.url();
  // Default to the document root; the bare `$` function has no .find(),
  // so the fallback must be a Cheerio selection.
  let $scope = $.root();
  const contentSelectors = ['main', '[role="main"]', '#main-content', '.main-content', 'article', '.page-content'];
  for (const sel of contentSelectors) {
    if ($(sel).length > 0) {
      $scope = $(sel);
      break;
    }
  }
  const links = [];
  $scope.find('a').each((i, elem) => {
    const $a = $(elem);
    const href = $a.attr('href');
    const text = $a.text().trim();
    if (href && text) {
      // Resolve every href against the current page URL. This handles
      // root-relative, path-relative, and protocol-relative (//host/...) links.
      let fullUrl = href;
      try {
        fullUrl = new URL(href, currentUrl).href;
      } catch (e) {
        // Leave unparseable hrefs exactly as they appear in the page.
      }
      links.push({
        text: text,
        href: fullUrl,
        title: $a.attr('title') || ''
      });
    }
  });
  let filteredLinks = links;
  if (filter) {
    const filterLower = filter.toLowerCase();
    filteredLinks = links.filter(link =>
      link.text.toLowerCase().includes(filterLower) ||
      link.href.toLowerCase().includes(filterLower)
    );
  }
  return {
    url: currentUrl,
    links: filteredLinks,
    totalLinks: links.length
  };
}

async function searchPage(url, query, caseSensitive = false) {
  // An empty query would make the search loop below spin forever,
  // because indexOf('') always matches and the index never advances.
  if (!query) {
    throw new Error('query must be a non-empty string');
  }
  if (!browser || !page) {
    await launchBrowser();
  }
  if (url) {
    await page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 });
  }
  const html = await page.content();
  const $ = cheerio.load(html);
  $('script, style, nav, header, footer, .sidebar, #sidebar').remove();
  const text = $('body').text();
  const searchQuery = caseSensitive ? query : query.toLowerCase();
  const searchText = caseSensitive ? text : text.toLowerCase();
  const matches = [];
  let index = 0;
  while ((index = searchText.indexOf(searchQuery, index)) !== -1) {
    // Keep up to 100 characters of context on each side of the match.
    const start = Math.max(0, index - 100);
    const end = Math.min(text.length, index + query.length + 100);
    const context = text.substring(start, end);
    matches.push({
      position: index,
      context: context.trim()
    });
    index += query.length;
  }
  return {
    url: page.url(),
    query: query,
    found: matches.length > 0,
    matchCount: matches.length,
    matches: matches.slice(0, 10)
  };
}

const server = new Server(
  {
    name: 'mcp-authenticated-browser',
    version: '1.0.0',
  },
  {
    capabilities: {
      tools: {},
    },
  }
);

server.setRequestHandler(ListToolsRequestSchema, async () => {
  return {
    tools: [
      {
        name: 'fetch_page_content',
        description: 'Fetch and extract clean content from internal company documentation, VPN-protected pages, or authenticated websites (Confluence, internal wikis, etc.). Use this when the user mentions internal docs, company documentation, or any URL behind authentication. Automatically removes navigation, headers, footers. Returns text or markdown format.',
        inputSchema: {
          type: 'object',
          properties: {
            url: {
              type: 'string',
              description: 'The URL to fetch content from',
            },
            format: {
              type: 'string',
              enum: ['text', 'markdown'],
              description: 'Output format (default: text)',
              default: 'text'
            },
          },
          required: ['url'],
        },
      },
      {
        name: 'get_page_links',
        description: 'Extract all links from internal documentation pages or authenticated websites. Use this to discover related internal docs or navigate company wikis. Focused on main content area, can filter links by text or URL.',
        inputSchema: {
          type: 'object',
          properties: {
            url: {
              type: 'string',
              description: 'The URL to extract links from',
            },
            filter: {
              type: 'string',
              description: 'Optional: filter links by text or URL containing this string',
            },
          },
          required: ['url'],
        },
      },
      {
        name: 'search_page',
        description: 'Search for specific text within internal documentation or authenticated pages. Use this to find information within company docs, Confluence pages, or internal wikis. Returns matches with surrounding context.',
        inputSchema: {
          type: 'object',
          properties: {
            url: {
              type: 'string',
              description: 'The URL to search within',
            },
            query: {
              type: 'string',
              description: 'The text to search for',
            },
            caseSensitive: {
              type: 'boolean',
              description: 'Whether search should be case-sensitive (default: false)',
              default: false
            },
          },
          required: ['url', 'query'],
        },
      },
    ],
  };
});

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  try {
    const { name, arguments: args } = request.params;
    switch (name) {
      case 'fetch_page_content': {
        const result = await getPageContent(args.url, args.format || 'text');
        return {
          content: [
            {
              type: 'text',
              text: args.format === 'markdown' && result.markdown
                ? result.markdown
                : result.content,
            },
          ],
        };
      }
      case 'get_page_links': {
        const result = await getPageLinks(args.url, args.filter);
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify(result, null, 2),
            },
          ],
        };
      }
      case 'search_page': {
        const result = await searchPage(args.url, args.query, args.caseSensitive || false);
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify(result, null, 2),
            },
          ],
        };
      }
      default:
        throw new Error(`Unknown tool: ${name}`);
    }
  } catch (error) {
    // Report failures to the client as a tool error instead of crashing.
    return {
      content: [
        {
          type: 'text',
          text: `Error: ${error.message}`,
        },
      ],
      isError: true,
    };
  }
});

async function runServer() {
  const transport = new StdioServerTransport();
  await server.connect(transport);
  console.error('MCP VPN Browser Server running on stdio');
  console.error('Browser will launch on first request');
}

runServer().catch((error) => {
  console.error('Server error:', error);
  process.exit(1);
});

// Close the browser cleanly on Ctrl+C.
process.on('SIGINT', async () => {
  if (browser) {
    await browser.close();
  }
  process.exit(0);
});
{
  "name": "mcp-authenticated-browser",
  "version": "1.0.0",
  "description": "MCP server for accessing VPN-protected and authenticated documentation",
  "type": "module",
  "bin": {
    "mcp-authenticated-browser": "./index.js"
  },
  "scripts": {
    "start": "node index.js"
  },
  "dependencies": {
    "@modelcontextprotocol/sdk": "^1.0.4",
    "puppeteer": "^24.15.0",
    "cheerio": "^1.0.0-rc.12",
    "turndown": "^7.1.2"
  }
}