@sithu
Created May 28, 2025 21:34
"Grab the top 5 rows from the Tableau dashboard on our analytics page."

To use a browser with Google Agent Developer Kit (ADK) to scrape a Tableau report, you’ll typically need to combine:

  • Google ADK (to receive and interpret the natural language intent),
  • A headless browser (like Puppeteer or Playwright),
  • And possibly an MCP server or tool plugin to expose scraping functionality to the agent.

✅ Example Workflow: Google ADK + Puppeteer + MCP Server

🎯 Goal:

Natural language command like:

"Grab the top 5 rows from the Tableau dashboard on our analytics page."


🧱 Architecture Overview

[User NL Input]
     ↓
[Google ADK Agent]
     ↓
[MCP Tool: scrape_tableau]
     ↓
[Browser Automation (e.g., Puppeteer)]
     ↓
[Extracted Data → Returned as JSON or text]
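To make the flow concrete, the stages above can be sketched as plain Python functions. Every name here is a hypothetical stand-in: the intent step would really be the ADK agent's LLM call, and the scraping stage is stubbed with canned rows instead of driving a browser.

```python
def interpret_intent(text: str) -> dict:
    """Stand-in for the ADK agent's LLM step; returns a fixed tool call."""
    return {
        "tool": "scrape_tableau",
        "parameters": {"url": "https://example.com/analytics", "table_selector": "table"},
    }

def scrape_tableau(url, table_selector=None):
    """Stub for the browser-automation stage; returns canned rows."""
    return {"result": [["Region", "Sales"], ["North", "$50,000"]]}

# Tool registry: maps tool names from the agent's call to implementations
TOOLS = {"scrape_tableau": scrape_tableau}

def handle(text: str) -> dict:
    """End-to-end pipeline: NL input -> tool call -> extracted data."""
    call = interpret_intent(text)
    return TOOLS[call["tool"]](**call["parameters"])
```

Each stub gets replaced by a real component in the steps below, but the shapes of the dicts flowing between them stay the same.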

🧪 Step-by-Step Implementation

1. Set Up Your MCP Tool Endpoint (scrape_tableau.py)

from typing import Optional

from fastapi import FastAPI
from pydantic import BaseModel
import subprocess
import json

app = FastAPI()

class ScrapeRequest(BaseModel):
    url: str
    table_selector: Optional[str] = None  # optional CSS selector or ID

@app.post("/tool/scrape_tableau")
async def scrape_tableau(req: ScrapeRequest):
    # Shell out to the Node scraper, which prints JSON to stdout
    result = subprocess.run(
        ["node", "puppeteer_scraper.js", req.url, req.table_selector or ""],
        capture_output=True, text=True, timeout=60,
    )
    if result.returncode != 0:
        return {"error": result.stderr.strip()}
    return json.loads(result.stdout)
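It helps to pin down the JSON contract between this endpoint and the Node scraper before wiring them together. A minimal sketch of parsing the scraper's stdout defensively (the helper name is hypothetical; the endpoint above calls `json.loads` directly):

```python
import json

def parse_scraper_output(stdout: str) -> dict:
    """Parse the JSON the Node scraper prints to stdout.

    Hypothetical helper illustrating the contract: the scraper must emit a
    single JSON object with a "result" key holding a list of row lists.
    """
    try:
        payload = json.loads(stdout)
    except json.JSONDecodeError:
        return {"error": "scraper did not return valid JSON", "raw": stdout}
    if "result" not in payload:
        return {"error": "scraper output missing 'result' key", "raw": stdout}
    return payload

sample = '{"result": [["Region", "Sales"], ["North", "$50,000"]]}'
parsed = parse_scraper_output(sample)
```

Anything the scraper prints besides that one JSON object (stray `console.log` debugging, Node warnings) will break the contract, so keep the scraper's stdout clean.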

2. Write Puppeteer Scraper (puppeteer_scraper.js)

const puppeteer = require("puppeteer");

// argv: node puppeteer_scraper.js <url> [selector]
const [,, url, selector] = process.argv;

(async () => {
  const browser = await puppeteer.launch({ headless: "new" });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "networkidle2" });

    // Extract up to the first five rows of the target table
    const data = await page.evaluate((sel) => {
      const table = document.querySelector(sel || "table");
      const rows = Array.from(table?.rows || []);
      return rows.slice(0, 5).map(row =>
        Array.from(row.cells).map(cell => cell.innerText.trim())
      );
    }, selector);

    console.log(JSON.stringify({ result: data }));
  } catch (err) {
    // Surface failures as JSON so the Python side can parse them too
    console.log(JSON.stringify({ error: String(err) }));
    process.exitCode = 1;
  } finally {
    await browser.close();
  }
})();

3. Call This Tool from Google ADK

You’ll create an action schema for the agent to invoke the scrape_tableau tool:

{
  "name": "scrape_tableau",
  "description": "Scrapes a Tableau report from a public dashboard URL",
  "parameters": {
    "url": {
      "type": "string",
      "description": "The URL of the Tableau dashboard"
    },
    "table_selector": {
      "type": "string",
      "description": "Optional CSS selector of the table to scrape"
    }
  }
}
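An agent framework typically validates tool arguments against the declared schema before invoking the tool. Purely to illustrate what that check does, here is a hand-rolled sketch; the `required` flag is an assumption not present in the schema above.

```python
# Mirror of the action schema, with an assumed "required" flag per parameter
SCHEMA = {
    "name": "scrape_tableau",
    "parameters": {
        "url": {"type": "string", "required": True},
        "table_selector": {"type": "string", "required": False},
    },
}

def validate_parameters(params: dict) -> list:
    """Return a list of validation errors (empty list means the call is valid)."""
    errors = []
    for name, spec in SCHEMA["parameters"].items():
        if spec.get("required") and name not in params:
            errors.append(f"missing required parameter: {name}")
    for name, value in params.items():
        spec = SCHEMA["parameters"].get(name)
        if spec is None:
            errors.append(f"unknown parameter: {name}")
        elif spec["type"] == "string" and not isinstance(value, str):
            errors.append(f"{name} must be a string")
    return errors
```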

In your Google ADK agent configuration, register this as a callable tool.


🧪 Example Prompt to Google ADK Agent:

"Scrape the Tableau dashboard from https://mycompany.com/analytics/sales. Get the first table."

→ Agent invokes:

{
  "tool": "scrape_tableau",
  "parameters": {
    "url": "https://mycompany.com/analytics/sales",
    "table_selector": "#vizContainer table"
  }
}
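The glue between that invocation and the FastAPI endpoint is a simple mapping from tool name to route. A sketch of that translation (hypothetical helper; a real integration would follow it with an HTTP POST):

```python
import json

def to_http_request(invocation: dict):
    """Translate an agent tool invocation into the endpoint path and JSON body
    expected by the FastAPI route registered at /tool/scrape_tableau."""
    path = f"/tool/{invocation['tool']}"
    body = json.dumps(invocation["parameters"]).encode()
    return path, body

invocation = {
    "tool": "scrape_tableau",
    "parameters": {
        "url": "https://mycompany.com/analytics/sales",
        "table_selector": "#vizContainer table",
    },
}
```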

✅ Output:

The endpoint returns:

{
  "result": [
    ["Region", "Sales", "Growth"],
    ["North", "$50,000", "+12%"],
    ["South", "$45,000", "+8%"],
    ...
  ]
}
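Since the first row is the header, the agent (or a small post-processing step) usually wants header and data rows separated. A minimal sketch of that split, using the payload shape above:

```python
def top_rows(payload: dict, n: int = 5):
    """Split a scraper payload into (header, first n data rows)."""
    header, *rows = payload["result"]
    return header, rows[:n]

payload = {
    "result": [
        ["Region", "Sales", "Growth"],
        ["North", "$50,000", "+12%"],
        ["South", "$45,000", "+8%"],
    ]
}
```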

🧠 Pro Tips:

  • Tableau often renders visualizations to <canvas>, so a plain table selector may find nothing; target an embedded HTML table, a "View Data" export, or the Tableau JavaScript API instead.
  • For secure or private dashboards, drive the login flow in the browser or inject auth cookies before navigating.
  • You can swap Puppeteer for Playwright for more stable cross-browser automation.
  • Use MCP tool routing via cline, langserve, or custom ADK server integration.
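For the auth-cookie approach, the Python side could accept a cookie dict and serialize it into a `Cookie` header for the browser to send. A sketch (a hypothetical `cookies` field on `ScrapeRequest` would carry this, and the scraper could forward it via Puppeteer's `page.setExtraHTTPHeaders`):

```python
def cookie_header(cookies: dict) -> str:
    """Serialize session cookies into a single Cookie header value.

    Hypothetical helper: the resulting string would be passed to the Node
    scraper and set on the page before navigation.
    """
    return "; ".join(f"{name}={value}" for name, value in cookies.items())
```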

Let me know if you want this wrapped into a working repo or integrated with Claude or Copilot agents!
