Deepcrawl: Open-Source Firecrawl Alternative

Overview

Repository: github.com/lumpinif/deepcrawl Documentation: deepcrawl.dev/docs License: MIT (100% open source) Author: Felix Lu (@felixLu) Stars: 318+ | Forks: 35

Deepcrawl is a 100% free, open-source alternative to Firecrawl - an agents-oriented website data context extraction platform designed to make any website data AI-ready.

What It Does

Deepcrawl extracts:

Cleaned markdown of page content
- Hierarchical links tree (agent-favored navigation structure)
- Page metadata that LLMs can digest with minimal token cost

The goal is to reduce context switching and hallucination when feeding web content to AI agents.

Technical Architecture

Core Stack

Component	Technology	Purpose
API Workers	Cloudflare Workers	Edge-first fetching, normalization, link graph generation
Dashboard	Next.js 16 App Router	UI for crawls, API playground, logs, key management
Routing/Middleware	Hono	Lightweight edge-first framework
RPC Layer	oRPC	Contract-first RPC generating OpenAPI + SDK from single source
Auth	better-auth	Enterprise-grade session and API key management
Schemas	Zod v4	Type safety across workers, SDKs, and docs
Database	Drizzle ORM	Supports both Neon PostgreSQL and Cloudflare D1 SQLite
Caching	Upstash Redis + Cloudflare KV	Layered caching for performance/cost balance
UI	shadcn/ui + Tailwind CSS v4	Accessible component system
State Management	nuqs	URL state serialization for shareable configurations
Data Fetching	React Query	Server-side prefetching and cache sync
Build	Turborepo (monorepo)	Shared packages, database modules, scripts
Bundler	tsdown (Rolldown-based)	Rust-powered bundling
Linting	Biome	Unified formatter and linter

Language Breakdown

TypeScript: 94.5%
- MDX: 4.8%
- Other: 0.7%

Core API Endpoints

1. `getMarkdown` (GET /read)

Fast endpoint returning prompt-ready markdown. Ideal for caching snippets or feeding LLM prompts.

curl \
  -H "Authorization: Bearer $DEEPCRAWL_API_KEY" \
    "https://api.deepcrawl.dev/read?url=https://example.com"
    ```
    
    ### 2. `readUrl` (POST /read)
    Full operation with metadata, cleaned HTML, markdown, robots, sitemap data, and metrics in one payload.
    
    ### 3. `extractLinks` (POST /links)
    Configurable crawl endpoint that builds agent-navigable links tree, filters external/media links.
    
    ### 4. `getLinks` (GET /links)
    Quick fetch of extracted links for a page.
    
    ---
    
    ## SDK Usage
    
    ```typescript
    import { DeepcrawlApp } from 'deepcrawl';
    
    const deepcrawl = new DeepcrawlApp({
      apiKey: process.env.DEEPCRAWL_API_KEY as string,
      });
      
      // Simple markdown extraction
      const markdown = await deepcrawl.getMarkdown('https://example.com');
      
      // With options
      const markdown = await deepcrawl.getMarkdown('https://example.com', {
        cacheOptions: { expirationTtl: 3600 },
          cleaningProcessor: 'cheerio-reader', // or 'html-rewriter'
          });
          ```
          
          ---
          
          ## Total Cost of Ownership (TCO) Analysis
          
          ### Deepcrawl Self-Hosted Costs
          
          | Component | Free Tier | Paid Tier |
          |-----------|-----------|-----------|
          | **Software** | $0 (MIT license) | $0 |
          | **Cloudflare Workers** | $0/mo (100k req/day) | $5/mo + $0.30/M requests |
          | **Cloudflare KV** | Included | Included |
          | **Cloudflare R2** | Zero egress | Zero egress |
          | **Vercel (Dashboard)** | $0/mo (Hobby) | $20/mo (Pro) |
          | **Database (D1/Neon)** | Free tier available | Variable |
          
          **Typical TCO: $0/month** for most use cases
          
          ### Firecrawl Comparison (Commercial)
          
          | Plan | Monthly Cost | Credits | Cost/1K pages |
          |------|-------------|---------|---------------|
          | Free | $0 (one-time) | 500 | N/A |
          | Hobby | $16/mo | 3,000 | $5.33 |
          | Standard | $83/mo | 100,000 | $0.83 |
          | Growth | $333/mo | 500,000 | $0.67 |
          
          ### Cost Savings Summary
          
          | Usage Level | Firecrawl Cost | Deepcrawl Cost | Savings |
          |-------------|---------------|----------------|---------|
          | 3,000 pages/mo | $16/mo | $0 | $192/year |
          | 100,000 pages/mo | $83/mo | $0-5/mo | $936-996/year |
          | 500,000 pages/mo | $333/mo | $5-20/mo | $3,756-3,936/year |
          
          ---
          
          ## Key Features & Advantages
          
          ### Performance
          - **5-10x faster** than alternatives on standard HTML parsing
          - No headless browser overhead for simple content
          - Edge-native V8 Workers return responses in milliseconds
          - Dynamic cache controls reduce redundant crawls
          
          ### AI Optimization
          - **Markdown-first output** removes ads, scripts, boilerplate
          - **Token-efficient formatting** cuts prompt costs
          - **Link tree intelligence** maps site topology for agent navigation
          
          ### Developer Experience
          - Fully typed TypeScript SDK
          - Zod schemas plug directly into AI frameworks (ai-sdk, LangChain)
          - OpenAPI and oRPC endpoints work across any runtime
          - Consistent REST API across curl, Python, serverless functions
          
          ### Infrastructure
          - Cloudflare Workers run globally, minimizing latency
          - CDN-backed responses for consistent worldwide performance
          - Automatic retries for flaky sites
          
          ---
          
          ## Deployment Options
          
          ### Option 1: Use Hosted API (Free)
          - Sign up at deepcrawl.dev
          - Get API key from dashboard
          - Use SDK or REST API
          
          ### Option 2: Self-Host (Full Control)
          ```bash
          # Clone repo
          git clone https://github.com/lumpinif/deepcrawl.git
          
          # Deploy dashboard to Vercel
          # Deploy workers to Cloudflare
          # Configure database (D1 or Neon PostgreSQL)
          ```
          
          Full self-hosting guide: [deepcrawl.dev/docs/reference/self-hosting](https://deepcrawl.dev/docs/reference/self-hosting)
          
          ---
          
          ## Integration Examples
          
          ### With Vercel AI SDK
          ```typescript
          import { DeepcrawlApp } from 'deepcrawl';
          import { generateText } from 'ai';
          
          const deepcrawl = new DeepcrawlApp({ apiKey: process.env.DEEPCRAWL_API_KEY });
          
          const markdown = await deepcrawl.getMarkdown('https://example.com');
          const result = await generateText({
            model: yourModel,
              prompt: `Summarize this page:\n\n${markdown}`,
              });
              ```
              
              ### With LangChain
              ```typescript
              const markdown = await deepcrawl.getMarkdown(url);
              const docs = [new Document({ pageContent: markdown })];
              // Use with your LangChain pipeline
              ```
              
              ---
              
              ## Limitations & Considerations
              
               **Active Development Warning:**
               > DO NOT USE DEEPCRAWL IN PRODUCTION RIGHT NOW AS IT IS SUBJECT TO CHANGE AND STILL UNDER RAPID DEVELOPMENT. USE WITH YOUR OWN RISK!
               
               ### Current Limitations
               - APIs and SDKs subject to change
               - Many planned features still in development
               - Early-stage stability
               - Community support (no enterprise SLA)
               
               ### When to Choose Firecrawl Instead
               - Need production SLA guarantees
               - Require enterprise support
               - Don't want to manage infrastructure
               - Need features not yet in Deepcrawl
               
               ---
               
               ## Quick Start
               
               ```bash
               # Install SDK
               npm install deepcrawl
               
               # Set API key
               export DEEPCRAWL_API_KEY="dc_sk_live_your_key"
               ```
               
               ```typescript
               import { DeepcrawlApp } from 'deepcrawl';
               
               const dc = new DeepcrawlApp({
                 apiKey: process.env.DEEPCRAWL_API_KEY
                 });
                 
                 // Get markdown from any URL
                 const content = await dc.getMarkdown('https://news.ycombinator.com');
                 console.log(content);
                 ```
                 
                 ---
                 
                 ## Resources
                 
                 - **GitHub:** https://github.com/lumpinif/deepcrawl
                 - **Documentation:** https://deepcrawl.dev/docs
                 - **API Reference:** https://api.deepcrawl.dev/docs
                 - **Online Playground:** https://deepcrawl.dev/app
                 - **Author Twitter:** [@felixlu1018](https://twitter.com/felixlu1018)
                 
                 ---
                 
                 ## Conclusion
                 
                 Deepcrawl offers a compelling open-source alternative to Firecrawl with:
                 
                  **$0 TCO** for most use cases (vs $16-333/mo for Firecrawl)
                   **Full source code access** and self-hosting capability
                    **Modern tech stack** (Next.js 16, Cloudflare Workers, TypeScript)
                     **AI-optimized output** designed for LLM consumption
                      **Active development** with 318+ stars and growing community
                      
                      The main trade-off is stability - it's explicitly marked as not production-ready. For side projects, development, or cost-sensitive applications, it's an excellent choice. For mission-critical production workloads requiring SLA guarantees, evaluate carefully or wait for v1.0.
                      
                      ---
                      
                      *Research compiled: December 2024*

dwillitzer/deepcrawl-research.md

Select an option

No results found

Select an option

No results found

Deepcrawl: Open-Source Firecrawl Alternative

Overview

What It Does

Technical Architecture

Core Stack

Language Breakdown

Core API Endpoints

1. `getMarkdown` (GET /read)

dwillitzer/deepcrawl-research.md

Deepcrawl: Open-Source Firecrawl Alternative

Overview

What It Does

Technical Architecture

Core Stack

Language Breakdown

Core API Endpoints

1. getMarkdown (GET /read)

1. `getMarkdown` (GET /read)