@afawcett
Last active November 4, 2025 09:05
Cursor Command to help Cursor Read Salesforce Docs

Background: I find Cursor struggles to retrieve content from Salesforce documentation links due to the various techniques those pages use, such as shadow DOM and dynamic loading. This Cursor command leverages Cursor's Browser Automation tool (which needs to be explicitly enabled) to give it a few more clues on how to deal with these pages. I initially built a shell script for it to call, but then I discovered Cursor commands. So I built this command using Cursor itself once we had both figured this out - effectively it wrote its own instructions. Use at your own risk.

Usage: In Cursor, type /, select Create Command, enter sfdoc, and press Enter. Then paste the content below into the file. To use it, write prompts like Read this https://developer.salesforce.com/docs... using /sfdoc. Make sure you have the chromium npm library installed globally, as the Node.js scripts below require it. Also make sure you have given Cursor permission to use your browser - there is an icon at the bottom right of the chat window to enable this in the latest version - it may also need enabling in the Cursor MCP settings.


Extract Salesforce documentation content from [URL] using browser automation. The Salesforce docs use heavy JavaScript rendering with custom web components and shadow DOM. Follow this comprehensive approach, based on the successful sfdoc.sh implementation:

Initial Setup:

  1. Launch browser in headless mode to avoid popup windows
  2. Navigate to the Salesforce documentation URL
  3. Wait for network idle state: await page.waitForLoadState('networkidle')
  4. Accept cookies if prompted (click "Accept All Cookies" button)
  5. Wait for the page to fully load (5+ seconds for dynamic content)
  6. CRITICAL: Operate in SILENT MODE - do NOT provide step-by-step feedback, status updates, or intermediate messages. Only show the final extracted documentation content.

Content Extraction Strategy (4-Tier Approach):

  1. Strategy 1 - Shadow DOM Access (Primary):

    const docXmlContent = document.querySelector('doc-xml-content');
    if (docXmlContent && docXmlContent.shadowRoot) {
        const shadowRoot = docXmlContent.shadowRoot;
        const docContent = shadowRoot.querySelector('doc-content');
        if (docContent && docContent.shadowRoot) {
            const deeperShadowRoot = docContent.shadowRoot;
            const mainContent = deeperShadowRoot.querySelector('.main-container') || 
                              deeperShadowRoot.querySelector('main') ||
                              deeperShadowRoot.querySelector('[class*="main"]') ||
                              deeperShadowRoot;
            return mainContent.innerHTML;
        }
    }
  2. Strategy 2 - Direct Content Selectors:

    const contentSelectors = [
        'main', '[role="main"]', '.content', '.article-content', 
        '.help-content', 'article', '.body', '[class*="content"]'
    ];
    
    for (const selector of contentSelectors) {
        const element = document.querySelector(selector);
        if (element && element.textContent && element.textContent.length > 500) {
            // Skip cookie banners and other non-content elements
            const text = element.textContent.toLowerCase();
            if (text.includes('cookie') && text.includes('privacy')) continue;
            return element.innerHTML;
        }
    }
  3. Strategy 3 - Fallback Element Search:

    const allElements = document.querySelectorAll('*');
    for (const el of allElements) {
        if (el.textContent && el.textContent.length > 1000 && 
            !el.querySelector('script') && !el.querySelector('style') &&
            el.tagName !== 'SCRIPT' && el.tagName !== 'STYLE') {
            
            const text = el.textContent.toLowerCase();
            if (text.includes('cookie') && text.includes('privacy')) continue;
            
            // Look for Salesforce-specific content patterns
            if (text.includes('external client app') || 
                text.includes('metadata api') ||
                text.includes('scratch org') ||
                text.includes('salesforce help') ||
                text.includes('tooling api') ||
                text.includes('contact center')) {
                return el.innerHTML;
            }
        }
    }
  4. Strategy 4 - Body Content (Last Resort):

    const body = document.body;
    if (body && body.textContent && body.textContent.length > 500) {
        return body.innerHTML;
    }
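
Strategies 2 and 3 above share the same two filters: require substantial text and skip cookie/privacy banners. As a standalone sketch (the helper name looksLikeContent is illustrative, not part of the gist):

```javascript
// Hypothetical helper combining the filters used in Strategies 2 and 3:
// require substantial text and skip cookie/privacy banners.
function looksLikeContent(text, minLength = 500) {
    if (!text || text.length <= minLength) return false;
    const lower = text.toLowerCase();
    // Skip cookie banners and other non-content elements
    if (lower.includes('cookie') && lower.includes('privacy')) return false;
    return true;
}
```

Pass minLength = 1000 to get the stricter threshold used by Strategy 3.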

Implementation Pattern:

// Inside an async function. Launch browser in headless mode.
const browser = await chromium.launch({ headless: true });
try {
    const page = await browser.newPage();

    // Navigate and wait for network idle
    await page.goto(url);
    await page.waitForLoadState('networkidle');

    // Accept cookies if present
    try {
        await page.click('button:has-text("Accept All Cookies")');
    } catch (e) {
        // No cookie banner present
    }

    // Wait for content to load
    await page.waitForTimeout(5000);

    // The strategies read the DOM, so run them in the page context
    const result = await page.evaluate(() => {
        const strategies = [
            () => { /* Strategy 1 - Shadow DOM */ },
            () => { /* Strategy 2 - Direct Selectors */ },
            () => { /* Strategy 3 - Fallback Search */ },
            () => { /* Strategy 4 - Body Content */ }
        ];

        for (const strategy of strategies) {
            try {
                const result = strategy();
                if (result && result.success) {
                    return result;
                }
            } catch (e) {
                continue;
            }
        }
        return { success: false };
    });

    return result;
} finally {
    // Always close the browser, even on failure
    await browser.close();
}

Key Technical Details:

  • Headless Mode: Always use chromium.launch({ headless: true }) to avoid popup windows
  • Use page.waitForLoadState('networkidle') for proper loading
  • Skip cookie banners: text.includes('cookie') && text.includes('privacy')
  • Look for substantial content: textContent.length > 500 (Strategy 2) or > 1000 (Strategy 3)
  • Exclude scripts and styles: !el.querySelector('script') && !el.querySelector('style')
  • Target Salesforce-specific patterns: "metadata api", "tooling api", "contact center", etc.
  • SILENT MODE: Do NOT show step-by-step progress, status updates, or intermediate messages. Only display the final extracted documentation content.
  • Clean Browser: Always close browser with await browser.close()
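
The Salesforce-specific pattern check from Strategy 3 can be factored into a small predicate (the names SALESFORCE_PATTERNS and hasSalesforcePattern are illustrative, not part of the gist):

```javascript
// Illustrative: the content patterns Strategy 3 looks for, as a reusable list.
const SALESFORCE_PATTERNS = [
    'external client app', 'metadata api', 'scratch org',
    'salesforce help', 'tooling api', 'contact center'
];

// True if the text mentions any known Salesforce documentation topic.
function hasSalesforcePattern(text) {
    const lower = text.toLowerCase();
    return SALESFORCE_PATTERNS.some(pattern => lower.includes(pattern));
}
```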

Error Handling:

  • Try each strategy sequentially
  • Catch and continue on strategy failures
  • Always return structured result with success/failure status
  • Include method used, content length, and error details
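
The error-handling bullets above can be sketched as a pure strategy runner, independent of the browser (the function name runStrategies is mine, not from the gist):

```javascript
// Sketch of the sequential try-each-strategy loop with structured results.
// Each strategy is a function that returns { success, content, method, ... }
// on success, returns a falsy value, or throws.
function runStrategies(strategies) {
    const errors = [];
    for (const strategy of strategies) {
        try {
            const result = strategy();
            if (result && result.success) return result;
        } catch (e) {
            errors.push(e.message);  // record the failure and keep going
            continue;
        }
    }
    // No strategy succeeded: return a structured failure with error details
    return { success: false, errors };
}
```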

Output Structure:

{
    success: true/false,
    title: document.title,
    content: extractedHTML,
    textContent: extractedText,
    url: window.location.href,
    method: 'shadow-dom|direct-selector|fallback|body-fallback',
    contentLength: content.length,
    textLength: textContent.length
}
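
A hypothetical builder for the output structure above. In the real extraction this would run inside page.evaluate(), where document.title and window.location.href are available; here they are passed in explicitly so the sketch stands alone:

```javascript
// Hypothetical: assemble the structured result described above.
function buildResult({ title, url, content, textContent, method }) {
    return {
        success: Boolean(content),
        title,
        content,
        textContent,
        url,
        method,  // e.g. 'shadow-dom', 'direct-selector', 'fallback', 'body-fallback'
        contentLength: content ? content.length : 0,
        textLength: textContent ? textContent.length : 0
    };
}
```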

CRITICAL REQUIREMENT:

  • DO NOT use sfdoc.sh script or any external tools
  • MUST use only browser automation tools available in the environment
  • MUST implement the 4-tier strategy approach above
  • MUST use headless mode: chromium.launch({ headless: true })
  • MUST use page.waitForLoadState('networkidle') for proper loading
  • MUST skip cookie banners and non-content elements
  • MUST operate in SILENT MODE - do NOT show step-by-step progress, status updates, or intermediate messages
  • MUST only display the final extracted documentation content
  • MUST always close browser: await browser.close()
  • NEVER fall back to external scripts or tools - persist with browser automation until successful
@rtmalone

I'm assuming the implementation is running as a Node script. Is chromium a node package installed globally? I think this is a great idea. My one and only use of this command so far got results, but I think outside the use of the command. I saw the LLM try to use a Playwright MCP tool that I don't have and then resort to web searches via the Brave web MCP tool. Like I said, it did produce a result about the info on the page I pointed it to, but not certain it was because of the command. Thoughts or advice?

@afawcett
Author

afawcett commented Nov 4, 2025

> I'm assuming the implementation is running as a Node script. Is chromium a node package installed globally? I think this is a great idea. My one and only use of this command so far got results, but I think outside the use of the command. I saw the LLM try to use a Playwright MCP tool that I don't have and then resort to web searches via the Brave web MCP tool. Like I said, it did produce a result about the info on the page I pointed it to, but not certain it was because of the command. Thoughts or advice?

Yeah, I have chromium installed globally - I'll update the notes above. If you didn't, it may be that your agent decided to take a different approach, which is why you got the results you did.

Update: I just ran it myself in a fresh project and it decided to use Playwright - but critically, it did use the information in the command to generate a small Node.js utility that embodied the instructions in the command. I guess in a way it did follow the command even though it did not literally use the code provided. It was also very fast compared to 2 weeks ago - I am now using the latest v2 release with their new Composer 1 model.
