Merges duplicate HubSpot companies into a designated primary record while:
- Keeping all existing data on the primary intact
- Backfilling only empty fields on the primary with data from the duplicate
- Transferring all associated contacts, deals, and activities via HubSpot's merge API
- Writing a full audit trail to a results CSV and log file
Contents:
- Step 1 — Install Python
- Step 2 — Set up your project folder
- Step 3 — Install the required library
- Step 4 — Set up your HubSpot token
- Step 5 — Prepare your CSV file
- Step 6 — Run the script
- How the merge works
- Three modes in detail
- Common flags
- Output files
- Recommended workflow
- Troubleshooting
Python is the programming language the script is written in. You need version 3.9 or newer.
Open a Terminal window and run:
python3 --version
How to open Terminal:
- Mac: Press `Command + Space`, type `Terminal`, press Enter
- Windows: Press the `Windows key`, type `cmd` or `PowerShell`, press Enter
If you see something like Python 3.11.4, you're good — skip to Step 2.
If you see command not found or a version older than 3.9, follow the install steps below.
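If several Python versions are installed, the same check can be made from inside Python itself. This is a minimal sketch; the helper name `is_supported` is illustrative and not part of the dedup script.

```python
import sys

MINIMUM = (3, 9)  # minimum version stated in this guide


def is_supported(version=None):
    """Return True if a (major, minor, ...) tuple meets the minimum version."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MINIMUM


if __name__ == "__main__":
    print("OK" if is_supported() else "Python 3.9 or newer is required")
```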
Mac:
- Go to python.org/downloads
- Click the big yellow Download Python button
- Open the downloaded `.pkg` file and follow the installer prompts
- When it finishes, close and reopen Terminal, then run `python3 --version` to confirm
Windows:
- Go to python.org/downloads
- Click the big yellow Download Python button
- Open the downloaded `.exe` file
- Important: On the first screen, check the box that says "Add Python to PATH" (labeled "Add python.exe to PATH" in recent installers) before clicking Install
- Click Install Now and follow the prompts
- When it finishes, close and reopen your command prompt, then run `python3 --version` to confirm. If `python3` is not recognized, try `python --version` or `py --version`; the python.org installer for Windows typically provides those commands instead.
You need a folder on your computer to hold the script and your CSV files. Here's how to create one and navigate to it in Terminal.
You can create the folder the normal way (right-click on your Desktop → New Folder) and name it something like hubspot-dedup. Or do it directly in Terminal:
Mac:
mkdir ~/Desktop/hubspot-dedup
Windows (Command Prompt):
mkdir %USERPROFILE%\Desktop\hubspot-dedup
Save the file dedup_companies.py into that folder. If you received it as an attachment, move it there now. If you're copying it from somewhere, make sure it saves with the .py extension (not .py.txt).
Terminal always has a "current location" — you need to be inside your project folder before running the script.
Mac:
cd ~/Desktop/hubspot-dedup
Windows:
cd %USERPROFILE%\Desktop\hubspot-dedup
What is `cd`? It stands for "change directory." Think of it like double-clicking a folder to open it, but in the terminal.
To confirm you're in the right place, run:
Mac:
ls
Windows:
dir
You should see dedup_companies.py listed.
Tip: On Mac, you can also type `cd` followed by a space and then drag the folder into Terminal; it will fill in the path automatically.
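The script uses a Python library called `requests` to communicate with HubSpot's API. The install command in this step also includes `python-dotenv`, a library for loading settings from a `.env` file such as the one created in Step 4. You only need to do this once.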
It's best practice to use a virtual environment — a self-contained Python installation just for this project that won't interfere with anything else on your computer.
Mac:
python3 -m venv venv
source venv/bin/activate
Windows:
python3 -m venv venv
venv\Scripts\activate
After activation your terminal prompt will change to show (venv) at the start — that means it worked.
Every time you open a new Terminal window to run this script, you'll need to `cd` into the folder and run the `activate` command again. The `(venv)` prefix reminds you when it's active.
With the virtual environment active, run:
pip install requests python-dotenv
You should see output ending with Successfully installed .... You only need to do this once.
The script needs permission to access your HubSpot account. You grant that by creating a Private App in HubSpot and copying its token.
- Log into HubSpot and click the Settings gear icon (top right)
- In the left sidebar, go to Integrations → Private Apps
- Click Create a private app
- Give it a name like `Company Dedup Script`
- Click the Scopes tab, then search for and enable these three scopes:
| Scope | Why it's needed |
|---|---|
| `crm.objects.companies.read` | Read company data |
| `crm.objects.companies.write` | Merge and update companies |
| `crm.schemas.companies.read` | Read property definitions |
- Click Create app → Continue creating
- On the next screen, click Show token and copy the full token; it starts with `pat-`
Keep this token private. It grants access to your HubSpot data. Don't paste it into emails or shared documents.
In your hubspot-dedup folder you'll find a file called .env.example. Make a copy of it named .env:
Mac:
cp .env.example .env
Windows:
copy .env.example .env
Open .env in any text editor (Notepad, TextEdit, VS Code) and replace the placeholder with your actual token:
HUBSPOT_API_KEY=pat-na1-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Save the file. The script reads this automatically every time it runs — you don't need to do anything else.
Why a .env file? It keeps your token in one place on your machine and out of your terminal history. The `.gitignore` in this folder ensures the file is never accidentally shared if you ever put this folder in version control.
The script has three modes, each accepting a different CSV format. Choose the one that matches what you have.
Place your CSV file in the same hubspot-dedup folder as the script.
Mode 1: pairs
Use this when you have a spreadsheet where you've already identified which company ID is the primary (to keep) and which is the duplicate (to merge in).
CSV format — two columns, one pair per row:
primary_company_id,duplicate_company_id
12345678,98765432
11111111,22222222
33333333,44444444
Save this file as something like pairs.csv in your project folder.
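Before the first dry run, it can help to sanity-check the pairs file. The sketch below is not how the script itself parses the file; `read_pairs` is a hypothetical helper that checks that both default columns exist, that every ID is numeric, and that no row pairs a company with itself.

```python
import csv
import io


def read_pairs(csv_text, primary_col="primary_company_id",
               duplicate_col="duplicate_company_id"):
    """Parse pairs CSV text into a list of (primary_id, duplicate_id) tuples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {primary_col, duplicate_col} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Column(s) not found: {sorted(missing)}")
    pairs = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        primary = (row[primary_col] or "").strip()
        duplicate = (row[duplicate_col] or "").strip()
        if not (primary.isdigit() and duplicate.isdigit()):
            raise ValueError(f"Line {line_no}: IDs must be numeric")
        if primary == duplicate:
            raise ValueError(f"Line {line_no}: primary and duplicate are the same")
        pairs.append((primary, duplicate))
    return pairs


sample = """primary_company_id,duplicate_company_id
12345678,98765432
11111111,22222222
33333333,44444444
"""
print(read_pairs(sample))
```

To check a real file, pass its contents instead of `sample`, for example `read_pairs(open("pairs.csv", newline="").read())`.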
Mode 2: export
Use this when you've exported your companies from HubSpot and want the script to automatically find duplicates by matching company names. It will keep the oldest record for each name.
How to export from HubSpot:
- Go to CRM → Companies
- Optionally filter to just the affected records
- Click Actions or the export icon (top right of the table) → Export
- Choose CSV format
- Make sure these columns are included: Record ID, Company name, Create Date
- Download and move the file into your project folder
The script expects HubSpot's default column names. If your export uses different headers, see the flags section below.
Mode 3: search
Use this when you have a list of the specific company IDs you want to keep, and you want the script to search HubSpot for any other companies with the same name.
CSV format — record ID and company name:
record_id,name
12345678,Acme Corp
11111111,Globex Industries
Save this as something like primaries.csv in your project folder.
Make sure you have:
- Terminal open and navigated to your project folder (`cd ~/Desktop/hubspot-dedup`)
- Virtual environment active (you see `(venv)` in your prompt)
- Your `.env` file exists in the folder and contains your `HUBSPOT_API_KEY`
- Your CSV file in the same folder
A dry run shows you exactly what the script would do without making any changes to HubSpot. Always run this first and review the output before running live.
Mode 1 — pairs:
python3 dedup_companies.py pairs pairs.csv --dry-run
Mode 2 — export:
python3 dedup_companies.py export hs_companies_export.csv --dry-run
Mode 3 — search:
python3 dedup_companies.py search primaries.csv --dry-run
Read through the output carefully. For each pair you should see which company is primary, which is the duplicate, and what fields (if any) would be backfilled.
Once you're happy with the dry run output, remove --dry-run to execute the merges:
python3 dedup_companies.py pairs pairs.csv
The script will print progress to the screen and write a full log to dedup_companies.log in the same folder.
If you'd rather not use a .env file, you can pass the token directly on any command with --token:
python3 dedup_companies.py pairs pairs.csv --token pat-na1-xxxxxxxx --dry-run
When both are present, --token takes precedence over .env.
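The precedence rule can be pictured as a small lookup. This is only a sketch of the documented behavior, not the script's actual code; `resolve_token` is a hypothetical name, and it assumes the `.env` value ends up in an environment-style mapping under `HUBSPOT_API_KEY`.

```python
import os


def resolve_token(cli_token=None, env=None):
    """Return the token to use: --token first, then HUBSPOT_API_KEY."""
    env = os.environ if env is None else env
    token = cli_token or env.get("HUBSPOT_API_KEY")
    if not token:
        raise SystemExit("No HubSpot API token found")
    return token


# The flag value wins when both sources are present.
print(resolve_token("pat-from-flag", {"HUBSPOT_API_KEY": "pat-from-env"}))
```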
For every primary/duplicate pair the script:
1. Fetches all properties from both the primary and duplicate company
2. Identifies backfill fields: properties that are blank on the primary but have a value on the duplicate
3. Calls the merge API, which carries all contacts, deals, and activities to the primary; the primary's own non-empty values win on any conflict
4. Updates the primary with the backfill values from step 2, filling gaps without overwriting anything
The script uses HubSpot's official merge endpoint and intentionally avoids HubSpot's built-in duplicate tool.
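The backfill rule and the merge request can be sketched as follows. This is an illustration under stated assumptions rather than the script's implementation: the helper names are hypothetical, "blank" is assumed to mean `None` or an empty or whitespace-only string, and the request body follows HubSpot's CRM v3 company merge endpoint (`POST /crm/v3/objects/companies/merge`) as I understand it, so verify it against the current API reference. A real run would also need to skip read-only properties.

```python
def is_blank(value):
    """Treat None and empty or whitespace-only strings as blank (assumption)."""
    return value is None or str(value).strip() == ""


def find_backfill_fields(primary_props, duplicate_props):
    """Properties blank on the primary that have a value on the duplicate."""
    return {
        name: value
        for name, value in duplicate_props.items()
        if not is_blank(value) and is_blank(primary_props.get(name))
    }


def merge_payload(primary_id, duplicate_id):
    """JSON body for POST /crm/v3/objects/companies/merge."""
    return {"primaryObjectId": str(primary_id),
            "objectIdToMerge": str(duplicate_id)}


# Made-up property values for illustration only.
primary = {"name": "Acme Corp", "phone": "", "city": None, "domain": "acme.com"}
duplicate = {"name": "ACME", "phone": "555-0100", "city": "Boston", "domain": ""}

print(find_backfill_fields(primary, duplicate))  # {'phone': '555-0100', 'city': 'Boston'}
print(merge_payload(12345678, 98765432))
```

Note that `name` is not backfilled because the primary already has a value, which matches the rule that existing data on the primary is never overwritten.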
Mode 1: pairs
You control exactly which ID is primary and which is the duplicate. No name matching happens.
python3 dedup_companies.py pairs pairs.csv
If your CSV uses different column names than the defaults:
python3 dedup_companies.py pairs pairs.csv \
--primary-col "Keep ID" \
--duplicate-col "Merge ID"
Mode 2: export
Groups all rows by company name (case-insensitive). For every group with 2 or more records, the oldest record by Create Date becomes the primary and all others are merged into it.
python3 dedup_companies.py export hs_companies_export.csv
If your export file uses different column headers:
python3 dedup_companies.py export export.csv \
--id-col "Record ID" \
--name-col "Name" \
--date-col "Create date"
Why oldest = primary: The oldest record is most likely to have the most complete history and the most existing associations. If you need a different rule (e.g., keep the record with the most contacts), use `pairs` mode with a pre-processed CSV instead.
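The grouping rule for export mode can be sketched in a few lines. This is an illustration, not the script's code: `plan_merges` is a hypothetical helper, the column names are the defaults listed in Step 5, and it assumes the Create Date values parse with `datetime.fromisoformat` (for example `2021-03-01 09:15`), which may not match every export.

```python
from collections import defaultdict
from datetime import datetime


def plan_merges(rows, id_col="Record ID", name_col="Company name",
                date_col="Create Date"):
    """Return (primary_id, duplicate_id) pairs, with the oldest record as primary."""
    groups = defaultdict(list)
    for row in rows:
        key = row[name_col].lower()  # case-insensitive grouping
        if key:
            groups[key].append(row)
    pairs = []
    for records in groups.values():
        if len(records) < 2:
            continue  # a name that appears once has no duplicates
        records.sort(key=lambda r: datetime.fromisoformat(r[date_col]))
        primary = records[0]
        pairs.extend((primary[id_col], dup[id_col]) for dup in records[1:])
    return pairs


# Made-up rows for illustration only.
rows = [
    {"Record ID": "101", "Company name": "Acme Corp", "Create Date": "2022-05-10 14:00"},
    {"Record ID": "102", "Company name": "acme corp", "Create Date": "2021-03-01 09:15"},
    {"Record ID": "103", "Company name": "Globex", "Create Date": "2020-01-01 08:00"},
    {"Record ID": "104", "Company name": "ACME CORP", "Create Date": "2023-07-22 16:30"},
]
print(plan_merges(rows))  # [('102', '101'), ('102', '104')]
```

The output of a helper like this has the same shape as a `pairs` CSV, which is one way to apply a custom primary-selection rule: change the sort key, write the pairs out, and run `pairs` mode.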
Mode 3: search
For each row in your CSV the script searches HubSpot for all companies with the exact same name. Any match whose ID differs from your specified primary is treated as a duplicate.
python3 dedup_companies.py search primaries.csv
If your CSV uses different column names:
python3 dedup_companies.py search primaries.csv \
--id-col "Company ID" \
--name-col "Company Name"
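An exact-name lookup like the one described for search mode could be expressed as a request body for HubSpot's CRM search endpoint (`POST /crm/v3/objects/companies/search`). The sketch below only builds the body and filters a result list; it sends nothing. The helper names are hypothetical, the `EQ` filter on the `name` property is my assumption about how "exact same name" maps onto the API, and the response items are made up.

```python
def build_search_body(company_name, limit=100):
    """Request body for an exact-name company search."""
    return {
        "filterGroups": [{
            "filters": [{
                "propertyName": "name",
                "operator": "EQ",
                "value": company_name,
            }]
        }],
        "properties": ["name", "createdate"],
        "limit": limit,
    }


def duplicates_from_results(results, primary_id):
    """Keep only the IDs of matches that are not the specified primary."""
    return [r["id"] for r in results if str(r["id"]) != str(primary_id)]


# Hypothetical items shaped like the "results" list of a search response.
results = [{"id": "12345678"}, {"id": "55555555"}, {"id": "66666666"}]

print(build_search_body("Acme Corp")["filterGroups"][0]["filters"][0])
print(duplicates_from_results(results, "12345678"))  # ['55555555', '66666666']
```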
Rate limit note: HubSpot's Search API allows ~4 requests per second, which is a separate, stricter limit from general API calls. For a list of 100 companies, the search phase alone takes at least 25 seconds. The script shows an estimated wait time when it starts.
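The 25-second figure is the company count divided by the request rate. As a quick sketch, assuming one search request per company (names with many matches may need extra requests for paging, so treat the result as a lower bound):

```python
import math


def min_search_seconds(company_count, requests_per_second=4):
    """Lower bound on search time at one request per company."""
    return math.ceil(company_count / requests_per_second)


print(min_search_seconds(100))  # 25
```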
| Flag | Default | Description |
|---|---|---|
| `--token PAT` | value from `.env` | Your HubSpot private app token |
| `--dry-run` | off | Preview every action without making any changes |
| `--results-csv FILE` | `dedup_results.csv` | Name of the output results file |
After running, the script creates two files in your project folder:
dedup_results.csv
A spreadsheet with one row per pair processed:
| Column | Description |
|---|---|
| `primary_id` | The company ID that was kept |
| `duplicate_id` | The company ID that was merged in |
| `status` | `success`, `dry_run`, `http_error`, or `skipped_*` |
| `fields_filled` | Properties that were backfilled onto the primary |
| `error` | Error details if something went wrong (blank on success) |
Open this in Excel or Google Sheets after running to confirm everything went as expected.
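For long runs, a short script can summarize the results file instead of scanning it by eye. This is a sketch that assumes the column names in the table above; the sample rows, including the `;` separator in `fields_filled` and the error text, are made up for illustration.

```python
import csv
import io
from collections import Counter


def summarize(csv_text):
    """Count rows per status and collect rows that need attention."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    counts = Counter(row["status"] for row in rows)
    problems = [r for r in rows if r["status"] not in ("success", "dry_run")]
    return counts, problems


# Made-up sample in the documented column layout.
sample = """primary_id,duplicate_id,status,fields_filled,error
12345678,98765432,success,phone;city,
11111111,22222222,http_error,,HTTP 403
33333333,44444444,skipped_duplicate_missing,,
"""
counts, problems = summarize(sample)
print(dict(counts))
for row in problems:
    print(row["duplicate_id"], row["status"], row["error"])
```

To run it on a real file, pass the file contents instead of `sample`, for example `summarize(open("dedup_results.csv", newline="").read())`.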
dedup_companies.log
A detailed timestamped log of every step the script took. Useful for auditing the run or diagnosing problems.
1. Follow Steps 1–4 to get set up (one-time).
2. Prepare your CSV file (Step 5).
3. Run with --dry-run and read the output.
4. Spot-check 2–3 pairs manually in HubSpot to confirm they look right.
5. Run live on just the first 5 rows to test end-to-end.
6. Open dedup_results.csv and confirm all rows show "success".
7. Run the full list.
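One way to do the five-row test in step 5 is to copy the header and the first five rows of your CSV into a separate file and run the script on that. A minimal sketch; `first_rows` and the file name `pairs_test.csv` are illustrative, not part of the script.

```python
import csv
import io


def first_rows(csv_text, n=5):
    """Return CSV text containing the header plus the first n data rows."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for i, row in enumerate(reader):
        if i > n:  # row 0 is the header, rows 1..n are data
            break
        writer.writerow(row)
    return out.getvalue()


# Made-up IDs for illustration only.
sample = "primary_company_id,duplicate_company_id\n" + "".join(
    f"{i},{i + 100}\n" for i in range(1, 8)
)
print(first_rows(sample))
```

Write the result to a file such as `pairs_test.csv`, then run `python3 dedup_companies.py pairs pairs_test.csv --dry-run` followed by the live command on the same file.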
command not found: python3
Python is not installed or not on your PATH. Go back to Step 1. On Windows, make sure you checked "Add Python to PATH" during installation.
ModuleNotFoundError: No module named 'requests'
The requests library isn't installed in your current environment. Make sure your virtual environment is active (you should see (venv) in your prompt) and run `pip install requests python-dotenv` again.
No HubSpot API token found
The script can't find your token. Make sure .env exists in the same folder as the script and contains HUBSPOT_API_KEY=pat-.... Alternatively, pass it directly with --token pat-....
Column 'X' not found
The column names in your CSV don't match what the script expects. Check the headers in your file (open it in a text editor or Excel) and use --id-col, --name-col, --date-col, --primary-col, or --duplicate-col to point to the correct ones.
Company 12345 not found (already merged or deleted)
The duplicate was already merged in a previous run or was deleted. The script logs it as skipped_duplicate_missing and moves on — this is safe to ignore.
HTTP 403 / insufficient scopes
Your private app token is missing one of the required scopes. Go back to Step 4 and make sure all three scopes are enabled.
HTTP 429 / rate limit
Another process is hitting HubSpot's API at the same time from the same account. Wait a minute and re-run — pairs that already succeeded will not be re-processed (the duplicate no longer exists, so they are skipped cleanly).
Search mode returns unexpected companies
The search uses exact name matching, but company names with extra spaces or special characters may behave unexpectedly. Always review the dry-run output before running live.