@decagondev
Created June 23, 2026 21:35
LinkedIn Applicant Qualification — Costing Comparison

Scenario: Acquire and qualify 10,000 applicant LinkedIn profiles. Two paths compared: (A) a B2B enrichment API vs (B) the current Cowork + Claude-in-Chrome live-scrape pipeline. Pricing basis: public list prices and reported market figures gathered June 2026 (sources at the end). All figures are estimates to validate against live quotes; the ratio between the two paths is the durable finding, not the exact dollar amounts.


0. Bottom line

| | Path A (Enrichment API) | Path B (Cowork + Claude-in-Chrome scrape) |
|---|---|---|
| Cost, 10k profiles | ~$50–500 (one-off) | ~$5,500–11,800 fully loaded; ~$550–1,800 out of pocket, the rest hidden labour |
| Wall-clock | Hours | ~3–6 weeks of throttled, baby-sat runs |
| Completion confidence | High | Low (ban-bounded, historically non-completing) |
| LinkedIn ToS exposure | None (vendor-sourced) for B2B enrichment APIs | Direct violation; account bans are the enforcement |
| Ongoing weekly (~100/wk) | ~$1–10/wk, unattended cron | Not viable at that cadence (cooldown overhead dominates) |

The headline: the API path is roughly 10–30× cheaper in hard cost and ~100× faster in wall-clock, and it removes the account-ban failure mode entirely. The scrape path's costs are real but mostly invisible — they show up as people's time, churned accounts, and runs that don't finish, rather than as an invoice. That's exactly why it felt free during the Cohort 5 run while actually being the expensive option.


1. Path A — B2B Enrichment API

1.1 What this is (and the important distinction)

There are three categories of "LinkedIn data API," and they are not interchangeable:

  1. Official LinkedIn API (api.linkedin.com, partner-gated). Does not offer cold profile lookup at all. Reserved for ad/recruiting/CRM partners. Approval takes weeks to months, many use cases are routinely denied, and partnerships run $10k–50k+/yr (some enterprise agreements $50k–300k+/yr). Not a fit for "walk an arbitrary list of 10k applicant profiles." Ruled out.
  2. Unofficial scraping APIs — vendors that run scraping farms under the hood and expose a REST endpoint. Cheap, broad, but the buyer inherits the LinkedIn ToS violation as the data controller. Same legal exposure as doing it yourself, just outsourced. Not recommended.
  3. B2B enrichment APIs — vendors that maintain their own people/company database, refreshed from public sources, opt-in databases, business registrations, and data partnerships, and expose it as a REST API. You send a LinkedIn URL (or name+company), you get back structured JSON: headline, current company, role, location, work history, education, company firmographics. The vendor is not calling LinkedIn, so there's no ToS exposure for you, and there are no account bans. This is the recommended category.

For this use case (input = a LinkedIn URL from your Firebase records, output = structured profile for the rubric engine), category 3 is the match.

1.2 Pricing landscape (June 2026, list prices)

| Provider type / example | Entry price | Effective per-profile | Notes |
|---|---|---|---|
| Low-cost enrichment (e.g. Derrick) | ~$20/mo for 10,000 credits | ~$0.002–0.02 | Profile + email/phone/company finder; shared credit pool across API/extension/MCP |
| Mid enrichment (e.g. Netrows) | ~€49/mo for 10,000 credits (~€0.005/call) | ~$0.005–0.01 | 48+ LinkedIn endpoints; per-call billing; profile monitoring add-on |
| Premium / proprietary fields | | up to ~$0.50 | Heavier contracts, proprietary data; only needed if you want enriched email/phone at high hit-rates |
| General market range (2026) | | ~$0.02 (heavy-tier) to $0.50 (premium) | Per-record cost band reported across the 2026 landscape |

One profile = one credit/call in the typical model. A full role-by-role work history (which is what your YOE + company-tiering rubric needs) comes back in that single call for the profile endpoint, so you generally don't pay per-role. Company firmographics (follower-count proxy) may be a separate endpoint call per unique company — and because applicants share employers, deduping companies keeps that count well below 10k.
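The company dedup described above can be sketched as follows. This is a minimal illustration, not a vendor integration: the `roles` and `company_url` field names are assumptions about how enriched profiles might be stored, not any provider's actual response shape.

```python
def unique_company_keys(profiles):
    """Collect one normalised key per distinct employer across all work histories."""
    keys = set()
    for profile in profiles:
        for role in profile.get("roles", []):  # "roles" is an assumed field name
            url = role.get("company_url")      # "company_url" is an assumed field name
            if url:
                # Normalise case and trailing slashes so one company is not counted twice.
                keys.add(url.strip().lower().rstrip("/"))
    return keys


sample_profiles = [
    {"roles": [{"company_url": "https://linkedin.com/company/acme"},
               {"company_url": "https://linkedin.com/company/globex/"}]},
    {"roles": [{"company_url": "https://LinkedIn.com/company/Acme"}]},
]

companies = unique_company_keys(sample_profiles)
print(len(companies))  # 2 firmographics lookups cover 3 roles
```

The number of firmographics calls is then `len(companies)` rather than the total number of roles, which is why the company line in the cost table stays well under 10k calls.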

1.3 Cost for 10,000 profiles

| Line | Low estimate | High estimate |
|---|---|---|
| Profile enrichment (10,000 calls) | 10,000 × $0.005 = $50 | 10,000 × $0.05 = $500 |
| Company firmographics (say ~2,500 unique cos × 1 call) | included / ~$12 | ~$125 |
| Subscription floor (if not pure PAYG) | $20–49/mo | $20–49/mo |
| Engineering integration (one-off) | small (REST calls, map JSON → schema) | small |
| Total, one 10k backfill | ~$50–110 | ~$300–625 |

Call it ~$50–500 all-in for 10,000 profiles, depending on provider tier and how much email/phone enrichment you want layered on. At the volume-discounted heavy tier it trends toward the low end.
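The arithmetic behind these estimates can be reproduced with a few lines. This is a sketch under stated assumptions: the function name and parameters are illustrative, and the company lookup is assumed to cost the same per call as a profile lookup (which is what the ~$12 and ~$125 firmographics figures imply).

```python
def path_a_cost(profiles, per_profile, unique_companies=0, per_company=0.0, subscription=0.0):
    """One-off backfill cost in dollars: profile calls + company calls + plan fee."""
    return profiles * per_profile + unique_companies * per_company + subscription


# Low case, floor: firmographics included in the profile call, pure pay-as-you-go.
low_floor = path_a_cost(10_000, 0.005)
# Low case, ceiling: separate company calls plus one month of a $49 plan.
low_ceiling = path_a_cost(10_000, 0.005, 2_500, 0.005, subscription=49)
# High case: $0.05 per call for both profiles and companies (assumed equal rates).
high = path_a_cost(10_000, 0.05, 2_500, 0.05)

print(f"low: ${low_floor:,.2f} to ${low_ceiling:,.2f}; high: ${high:,.2f}")
```

Under these inputs the low case spans roughly $50 to $110 and the high case tops out at $625, in line with the totals above. Swapping in a live vendor quote only requires changing the per-call rates.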

1.4 Throughput & reliability

  • Rate limits are generous (thousands/hour); 10k profiles complete in hours, not days, often a single unattended run.
  • No accounts to warm or burn; no jitter tuning; no Cowork reinstalls.
  • Trade-offs to weigh: (a) data freshness can lag live LinkedIn by weeks/months — fine for pre-CCAT triage where you're sorting leads, not making hires; (b) match/coverage typically 70–95%, so a minority of profiles return partial or no data and fall back to manual or a live check; (c) profile-level fields (name, company, headline) are 90%+ accurate, but email/phone hit-rates are lower (40–70% email, 15–40% phone) if you also want contact enrichment.
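Trade-off (b) implies a routing step: complete results go to the rubric, everything else falls back to a manual or live check. A minimal sketch follows, assuming the enrichment result is a plain dict; the required field names are hypothetical placeholders for whatever the rubric actually needs.

```python
from typing import Optional

# Assumed minimum fields the rubric needs; adjust to the real schema.
REQUIRED_FIELDS = ("headline", "current_company", "work_history")


def route(result: Optional[dict]) -> str:
    """Send complete API results to the rubric and everything else to a manual/live check."""
    if not result:
        return "manual_review"  # no match at all
    missing = [name for name in REQUIRED_FIELDS if not result.get(name)]
    return "manual_review" if missing else "rubric"


def fallback_volume(total_profiles: int, match_rate_pct: int) -> int:
    """Profiles expected to need a fallback at a given match rate (integer percent)."""
    return total_profiles * (100 - match_rate_pct) // 100


print(fallback_volume(10_000, 95), fallback_volume(10_000, 70))  # 500 3000
```

At the quoted 70–95% coverage, a 10,000-profile backfill leaves somewhere between 500 and 3,000 profiles for the fallback path.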

1.5 Recurring weekly run (~100/wk)

  • ~100 calls/wk against a $20–49/mo plan you already hold → effectively $1–10/wk, fully cron-able and unattended. This is the steady-state mode the Slack thread wanted ("if we just run it weekly there'll only be ~100 apps").

2. Path B — Cowork + Claude-in-Chrome Live Scrape

2.1 Cost structure (this is where it gets expensive quietly)

Almost none of the cost is a per-profile fee. It's subscriptions, account churn, and — dominantly — human time. Anchored to the Cohort 5 operational log: ~3–5 min/profile, plus thousands of extra company-page loads, batches of ~50, 10-min cooldowns, account restrictions at ~144 profiles/session, multi-day runs, repeated bans, and a multi-hour Cowork reinstall incident.

2.2 Throughput maths for 10,000 profiles

  • Profile reads: 10,000 × ~4 min = ~667 agent-hours of active processing.
  • Company-page loads: the rubric clicks through to each role's company for follower count. At even ~3 roles/profile that's ~30,000 extra page loads. Cohort 5 estimated 3,500–5,800 loads for ~1,162 profiles, i.e. 3–5 per profile, which is consistent. Caching unique companies cuts this materially (section 1.3 assumes ~2,500 unique companies), but those loads still come on top of the 10,000 profile reads.
  • Throughput ceiling per account: ~12–20 profiles/hour before throttling, and sessions break around ~144 profiles. So each account yields a partial day's work before a cooldown or ban.
  • Realistic wall-clock for 10k: even across multiple accounts/machines with cooldowns and ban recovery, this is a multi-week effort. Cohort 5's ~1,162 first-pass took roughly a week heavily baby-sat; 10k scales that to ~3–6 weeks, and that's assuming the bans don't escalate (they tend to, on reused machines/IPs).
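The throughput figures above follow from a handful of inputs. The sketch below recomputes them; the constant names are illustrative, the 4-minute figure is the midpoint of the observed ~3–5 min/profile, and the session count is a derived number (profiles divided by the ~144-profile restriction point), not something reported in the Cohort 5 log.

```python
import math

PROFILES = 10_000
MINUTES_PER_PROFILE = 4   # midpoint of the ~3–5 min observed in Cohort 5
ROLES_PER_PROFILE = 3     # conservative; Cohort 5 implies 3–5
SESSION_LIMIT = 144       # profiles before an account gets restricted

agent_hours = PROFILES * MINUTES_PER_PROFILE / 60
company_loads = PROFILES * ROLES_PER_PROFILE
sessions = math.ceil(PROFILES / SESSION_LIMIT)  # derived: restricted sessions to get through

# Cross-check against Cohort 5: ~1,162 profiles at 3–5 company loads each.
cohort5_loads = (1_162 * 3, 1_162 * 5)

print(round(agent_hours), company_loads, sessions, cohort5_loads)
# 667 30000 70 (3486, 5810)
```

Roughly 70 restricted sessions, each followed by a cooldown or an account swap, is what turns ~667 agent-hours into a multi-week calendar effort.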

2.3 Cost lines for a 10,000-profile backfill

| Line | Basis | Estimate |
|---|---|---|
| Claude Max seats | $200/mo (Max 20×) per concurrent worker; 2–4 to parallelise; spanning ~1–1.5 months | $400–1,200 |
| LinkedIn accounts | Churned through bans; warming/replacement; possible Premium (~$40/mo) on some | $100–400 (plus the real cost: account-creation/warming time) |
| Compute / machines | Own PCs: electricity + opportunity cost of tying up 2–4 boxes for weeks | $50–200 (direct), higher in opportunity cost |
| Human babysitting | Bans, re-batching, ban recovery, Cowork bugs/reinstalls, monitoring. Cohort 5 implies ~10–20 hr per ~1,000 profiles; for 10k, ~100–200 hr at a modest $50/hr internal cost | $5,000–10,000 |
| Total, one 10k backfill | | ~$5,500–11,800, and not reliably completing |

Even if you value the human time at near-zero (treat it as "free" internal effort, which it isn't), the hard, out-of-pocket lines alone — Max seats + accounts + machines — land at ~$550–1,800 for a single 10k run that takes weeks and may stall. Once you price the labour realistically, it's ~$5.5k–12k.
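The two totals quoted here are simple sums of the cost lines. A short sketch, using illustrative key names for the four lines, makes the split between out-of-pocket and fully-loaded cost explicit:

```python
# (low, high) in dollars, taken from the Path B cost lines.
COST_LINES = {
    "claude_max_seats": (400, 1_200),
    "linkedin_accounts": (100, 400),
    "machines": (50, 200),
    "human_babysitting": (5_000, 10_000),  # 100–200 hr at $50/hr
}


def total(lines, exclude=()):
    """Sum the low and high ends of every cost line not listed in `exclude`."""
    low = sum(lo for name, (lo, hi) in lines.items() if name not in exclude)
    high = sum(hi for name, (lo, hi) in lines.items() if name not in exclude)
    return low, high


fully_loaded = total(COST_LINES)
out_of_pocket = total(COST_LINES, exclude=("human_babysitting",))
print(fully_loaded, out_of_pocket)  # (5550, 11800) (550, 1800)
```

Labour accounts for $5,000–10,000 of the fully-loaded figure, which is why the cost is easy to miss when only invoices are counted.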

2.4 Why "more computers / more accounts / more jitter" doesn't fix the cost

The Cohort 5 instinct was to scale horizontally to beat the throttling. But adding workers scales the thing that's expensive (accounts to burn, machines to babysit, human coordination) while only delaying the bans — LinkedIn throttles on per-session/account volume and flags clustered fingerprints/IPs, so a server farm gets detected faster, not slower. The cost curve for Path B bends the wrong way: more scale = more spend, not cheaper unit cost. Path A is the opposite — unit cost falls with volume discounts.

2.5 The hidden line items that don't appear on any invoice

  • Non-completion risk: a run that dies at profile 7,000 of 10,000 has spent most of the cost and delivered a partial result. Schedulability is ~nil.
  • Account loss: banned LinkedIn accounts may belong to real team members, with collateral impact on their actual professional use.
  • Tooling fragility: the Cowork/Claude workspace bug in the log cost hours and recurred; that's an ongoing tax, not a one-off.
  • Compliance/legal exposure: ToS violation + processing scraped personal data. Hard to price, but it's a real tail risk that Path A removes.

3. Side-by-Side at 10,000 Profiles

| Dimension | Path A (Enrichment API) | Path B (Live Scrape) |
|---|---|---|
| Hard out-of-pocket cost | $50–500 | $550–1,800 (subs/accounts/machines) |
| Fully-loaded cost (incl. labour) | $50–600 | $5,500–11,800 |
| Wall-clock to complete | Hours | 3–6 weeks |
| Human attention | ~1 hr setup, then unattended | 100–200 hrs |
| Completion confidence | High | Low (ban-bounded) |
| Data freshness | Days–months stale | Live |
| Coverage / match rate | 70–95% | High if it finishes |
| LinkedIn ToS exposure | None (vendor-sourced) | Direct violation |
| Account-ban risk | None | High, escalating |
| Weekly steady-state (~100/wk) | $1–10/wk, cron | Not viable |
| Unit cost as volume grows | Falls (discounts) | Rises (more accounts/babysitting) |

4. Cost per qualified candidate (the metric that actually matters)

Raw per-profile cost understates Path A's advantage, because what you care about is cost per usable qualified lead, and Path A finishes while Path B may not.

Assume ~10–15% of profiles end up QUALIFIED (Cohort 5 was tracking in that range): ~1,000–1,500 qualified out of 10,000. The figures below use the midpoint, 1,250.

  • Path A: $50–500 ÷ 1,250 qualified = **$0.04–0.40 per qualified candidate.**
  • Path B: $5,500–11,800 ÷ 1,250 qualified = **$4.40–9.44 per qualified candidate** — and only if the run completes.

That's a ~10–200× difference per qualified lead, before counting the weeks of elapsed time, which itself has a cost (slower funnel, leads going cold).
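The per-qualified-candidate figures and the resulting ratio can be checked directly. This sketch assumes 1,250 qualified candidates (the midpoint of 1,000–1,500) and uses illustrative names throughout.

```python
QUALIFIED = 1_250  # midpoint of the assumed 1,000–1,500 qualified candidates


def per_qualified(cost_low, cost_high, qualified=QUALIFIED):
    """Cost per qualified candidate at the low and high ends of a cost range."""
    return cost_low / qualified, cost_high / qualified


path_a = per_qualified(50, 500)
path_b = per_qualified(5_500, 11_800)

# Narrowest ratio compares B's cheapest case with A's dearest; widest does the reverse.
ratio_low = path_b[0] / path_a[1]
ratio_high = path_b[1] / path_a[0]

print(f"A: ${path_a[0]:.2f}-{path_a[1]:.2f}  B: ${path_b[0]:.2f}-{path_b[1]:.2f}  "
      f"ratio: {ratio_low:.0f}x-{ratio_high:.0f}x")
```

The ratio works out to about 11× at the narrow end and about 236× at the wide end, which is where the rounded ~10–200× comes from. Changing `QUALIFIED` moves both paths' unit costs but leaves the ratio unchanged, since it cancels out.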


5. Recommendation

For a 10,000-profile backfill, Path A (B2B enrichment API) is the clear choice on every axis except live-freshness: ~10–30× cheaper in hard cost, ~100× faster, no bans, no ToS exposure, and cron-able for the weekly steady state. Reserve live reading (a human, or a small Claude-in-Chrome check) only for the minority of profiles the API can't match (~5–30%), where a person eyeballs a handful rather than the agent grinding through thousands.

Concretely:

  1. Pilot first. Most enrichment vendors offer 100 free credits. Run 100 of your real applicant URLs through 2–3 providers, map the JSON to the §5.2 NormalisedProfile schema, and measure match rate + field completeness against what your rubric needs. Pick on real coverage, not marketing.
  2. Buy the smallest paid tier (~$20–49/mo, ~10k credits) for the first full backfill — that single tier likely covers the whole 10k.
  3. Keep the Acquire stage pluggable (per the systems design) so a provider's legal status or coverage changing is a config swap, not a rebuild — this market does churn, and LinkedIn litigates against scrapers/resellers, so vendor due-diligence + a DPA is a one-time legal task worth doing.
  4. Wire the weekly cron on the same provider for ~$1–10/wk steady state.
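Steps 1 and 3 can be sketched together: a small provider interface keeps the Acquire stage pluggable, and the same interface lets a pilot score each vendor on match rate and field completeness. Everything here is hypothetical: `ProfileProvider`, `StubProvider`, and `pilot` are illustrative names, and the `NormalisedProfile` fields are an assumed subset, since the §5.2 schema is not reproduced in this document. A real vendor client would replace the in-memory stub.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol


@dataclass
class NormalisedProfile:
    # Assumed subset of fields; the real schema lives in the systems design.
    linkedin_url: str
    headline: Optional[str] = None
    current_company: Optional[str] = None
    work_history: list = field(default_factory=list)


class ProfileProvider(Protocol):
    """Anything that can turn a LinkedIn URL into a profile (or None if unmatched)."""
    name: str

    def fetch(self, linkedin_url: str) -> Optional[NormalisedProfile]: ...


class StubProvider:
    """In-memory stand-in for a vendor client so the sketch runs offline."""

    def __init__(self, name: str, records: dict):
        self.name = name
        self._records = records

    def fetch(self, linkedin_url: str) -> Optional[NormalisedProfile]:
        return self._records.get(linkedin_url)


def pilot(provider: ProfileProvider, urls: list) -> dict:
    """Measure match rate and rubric-ready completeness over a sample of URLs."""
    matched = complete = 0
    for url in urls:
        profile = provider.fetch(url)
        if profile is None:
            continue
        matched += 1
        if profile.headline and profile.current_company and profile.work_history:
            complete += 1
    n = len(urls) or 1  # avoid division by zero on an empty sample
    return {"provider": provider.name, "match_rate": matched / n, "complete_rate": complete / n}


urls = ["u1", "u2", "u3", "u4"]
provider = StubProvider("vendor-x", {
    "u1": NormalisedProfile("u1", "Engineer", "Acme", [{"title": "Engineer"}]),
    "u2": NormalisedProfile("u2", "Analyst", "Globex", [{"title": "Analyst"}]),
    "u3": NormalisedProfile("u3", headline="Designer"),  # matched but incomplete
})
report = pilot(provider, urls)
print(report)
```

Running `pilot` once per candidate vendor over the same 100 applicant URLs gives directly comparable numbers, and switching vendors later means supplying a different `ProfileProvider` rather than changing the rubric code.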

Caveat I won't gloss: "use an enrichment API" is not "use any API blindly." Vet the provider's data-sourcing and compliance posture (a real DPA, defensible sourcing), and confirm current legal standing before committing — providers in this space have been sued or shut down. I can pull a current, vetted shortlist with present-day legal status when you want to commit to a provider.


Sources & basis

  • LinkedIn official API access/pricing (partner-gated, $10k–50k+/yr; no cold profile lookup): getphyllo.com, connectsafely.ai, sociavault.com, nubela.co, derrick-app.com (LinkedIn data-extraction category breakdown), all 2026.
  • B2B enrichment per-credit pricing (~$20/mo–€49/mo for 10k credits; ~$0.005–0.50/record band): netrows.com, derrick-app.com, derrick "LinkedIn data extraction guide" (2026 per-record $0.02–0.50; 90%+ profile accuracy, 40–70% email, 15–40% phone hit-rates).
  • Scraping account-ban behaviour (flags within days at ~500 views/hr from one account; constant maintenance): derrick-app.com 2026 guide.
  • Claude Max pricing ($100 Max 5× / $200 Max 20×, monthly-only): claude.com/pricing, support.claude.com, multiple 2026 pricing roundups.
  • Cohort 5 operational data (throughput, bans, batch sizes, reinstall incident): your internal Slack log.

All third-party figures are list prices / reported market estimates as of June 2026 and should be confirmed with live vendor quotes before procurement. Pricing in this space changes frequently.
