jwmcgettigan/0bf7cd39947764896735997056ca74d7
# Copyright © 2023 Justin McGettigan
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
# THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
# This script will pull all of your vault 'items' using the 'bw list items' command and then compare
# all properties that are not inherently unique from the returned JSON to determine if they are duplicates.
# Note: It deletes the older duplicates, so the newest copy of each 'item' will be the only one to remain.
# You can simply flip the '>' sign to '<' if you want to preserve the oldest 'item' instead.
#
# Setup Steps
# 1. You must install the Bitwarden CLI first: https://bitwarden.com/help/cli/#download-and-install
# 2. Log in to the CLI with the 'bw login' command. You need your session key set up before continuing: https://bitwarden.com/help/cli/#using-a-session-key
# 3. Make sure to back up your 'items'. You can use the 'bw export' command to do so: https://bitwarden.com/help/cli/#export
# 4. Run this Python script and your duplicate 'items' will start being deleted. https://bitwarden.com/help/cli/#delete
# Note: I am NOT using the '--permanent' flag. This means you can restore anything this script deletes within 30 days.
# Note2: The deletion process is pretty slow (1-2 items per second), so you'll likely need to let it run for a while.
import json
import hashlib
import subprocess

item_dict = {}

# Get the JSON data for each item in the vault
output = subprocess.check_output(['bw', 'list', 'items'])
items = json.loads(output)

for item in items:
    # Remove unique fields from the item data
    item_data = item.copy()
    del item_data['id']
    del item_data['folderId']
    del item_data['revisionDate']
    del item_data['creationDate']
    del item_data['deletedDate']

    # Calculate a hash of the item data
    item_hash = hashlib.sha256(str(item_data).encode('utf-8')).hexdigest()

    # Check if we've seen this item before
    if item_hash in item_dict:
        # Compare the revisionDate to see which item is newer
        if item['revisionDate'] > item_dict[item_hash]['revisionDate']:
            print(f'Duplicate item found: {item["name"]}')
            subprocess.run(['bw', 'delete', 'item', item_dict[item_hash]['id']])
            print(f'Deleted older item "{item_dict[item_hash]["name"]}".')
            item_dict[item_hash] = item
        else:
            print(f'Duplicate item found: {item["name"]}')
            subprocess.run(['bw', 'delete', 'item', item['id']])
            print(f'Deleted older item "{item["name"]}".')
    else:
        item_dict[item_hash] = item
Is there a report at the end of the process that lets you know which items have been removed and which have not?
Not in its current form, no, but the script does print out the 'name' of items as it deletes them.
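The script could be extended to produce such a report. Below is a minimal sketch, assuming the same duplicate key as the script (the item's data minus id, folderId, revisionDate, creationDate and deletedDate, hashed via str()). The helpers plan_deletions and print_report are hypothetical and only plan the work: they never call 'bw delete', so the list can be reviewed before anything is actually removed.

```python
import hashlib
from typing import Dict, List, Tuple

# Fields the original script treats as inherently unique per item.
UNIQUE_FIELDS = ("id", "folderId", "revisionDate", "creationDate", "deletedDate")


def plan_deletions(items: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Return (kept, to_delete), keeping the newest copy of each duplicate group.

    Hypothetical helper: it only plans; it never calls 'bw delete'.
    """
    newest: Dict[str, dict] = {}
    to_delete: List[dict] = []
    for item in items:
        data = {k: v for k, v in item.items() if k not in UNIQUE_FIELDS}
        key = hashlib.sha256(str(data).encode("utf-8")).hexdigest()
        current = newest.get(key)
        if current is None:
            newest[key] = item
        elif item["revisionDate"] > current["revisionDate"]:
            # The new item is more recent, so the stored one becomes the duplicate.
            to_delete.append(current)
            newest[key] = item
        else:
            to_delete.append(item)
    return list(newest.values()), to_delete


def print_report(kept: List[dict], to_delete: List[dict]) -> None:
    """Print which items are duplicates to delete and how many remain."""
    print(f"Duplicate item(s) to delete: {len(to_delete)}")
    for item in to_delete:
        print(f'  - "{item["name"]}" ({item["id"]})')
    print(f"Item(s) kept: {len(kept)}")


if __name__ == "__main__":
    sample = [
        {"id": "a", "name": "Example", "revisionDate": "2023-01-01T00:00:00.000Z"},
        {"id": "b", "name": "Example", "revisionDate": "2023-03-01T00:00:00.000Z"},
        {"id": "c", "name": "Other", "revisionDate": "2023-02-01T00:00:00.000Z"},
    ]
    print_report(*plan_deletions(sample))
```

Separating the plan from the deletion also makes it easy to print the report before or after looping over to_delete with 'bw delete item <id>'.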
Does it also compare the additional information stored in the password tab, like TOTP, notes, custom fields, etc.?
In addition, does it compare and remove only passwords or also other items, like cards and secure notes?
It should be yes to both. Since we use the 'bw list items' command, the script should check for duplicates in anything that Bitwarden considers an 'item'. https://bitwarden.com/help/managing-items/
Do note that I only created this script for my specific use case, so it hasn't been thoroughly tested to determine whether it works for all of the scenarios you mentioned. Theoretically, though, it should work for all of them.
Here is an example of a Bitwarden item:
{
  "object": "item",
  "id": "1e113c10-881f-4f01-b88d-afd20162981b",
  "organizationId": null,
  "folderId": "6ab751e1-74ad-4c38-912a-afd20162981c",
  "type": 1,
  "reprompt": 0,
  "name": "",
  "notes": null,
  "favorite": false,
  "login": {
    "uris": [
      {
        "match": null,
        "uri": ""
      }
    ],
    "username": "",
    "password": "",
    "totp": null,
    "passwordRevisionDate": null
  },
  "collectionIds": [],
  "revisionDate": "2023-03-27T21:31:02.240Z",
  "creationDate": "2023-03-27T21:31:02.240Z",
  "deletedDate": null
}
Before checking for duplicates, these lines in the script remove the fields from the 'item' objects that are inherently unique regardless of item content.
del item_data['id']
del item_data['folderId']
del item_data['revisionDate']
del item_data['creationDate']
del item_data['deletedDate']
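To illustrate what survives those deletions, here is a small sketch using a trimmed copy of the example item above (placeholder values only). The function item_hash is a hypothetical name, but it repeats the script's steps: copy the item, delete the five unique fields, then hash str() of the rest. A copy that differs only in those fields hashes the same, while a change in nested data such as login.totp produces a different hash, which is why TOTP and other content fields take part in the comparison.

```python
import hashlib
from copy import deepcopy

# Trimmed copy of the example item above (values are placeholders).
example = {
    "object": "item",
    "id": "1e113c10-881f-4f01-b88d-afd20162981b",
    "organizationId": None,
    "folderId": "6ab751e1-74ad-4c38-912a-afd20162981c",
    "type": 1,
    "name": "",
    "login": {"username": "", "password": "", "totp": None},
    "revisionDate": "2023-03-27T21:31:02.240Z",
    "creationDate": "2023-03-27T21:31:02.240Z",
    "deletedDate": None,
}


def item_hash(item: dict) -> str:
    """Same steps as the script: drop the unique fields, then hash str(dict)."""
    item_data = item.copy()  # shallow copy, so the original keeps its fields
    for field in ("id", "folderId", "revisionDate", "creationDate", "deletedDate"):
        del item_data[field]
    return hashlib.sha256(str(item_data).encode("utf-8")).hexdigest()


# A copy that differs only in the "unique" fields...
moved_copy = deepcopy(example)
moved_copy["id"] = "another-id"
moved_copy["folderId"] = None
moved_copy["revisionDate"] = "2023-04-01T00:00:00.000Z"

# ...and a copy whose TOTP differs, which is part of the compared content.
different_totp = deepcopy(example)
different_totp["login"]["totp"] = "otpauth://totp/example"

print(item_hash(example) == item_hash(moved_copy))  # True
print(item_hash(example) == item_hash(different_totp))  # False
```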
It would be cool if it let you choose what to keep and what to remove, but it doesn't seem to let you do that.
That is correct, as I just threw this script together quickly for my specific use case. Feel free to copy this script and add that functionality if you would like; I personally don't have a need for it and don't plan to add such functionality to this script.
I'm asking just to understand how it works and how to use it better.
Thank you for asking! I'm happy to help explain what I can. I didn't have much luck with the other solutions online so I created this one and hope that it might help others as well.
Awesome - works a treat and I can see what it does! Thanks, mate.
I had almost the same idea, but not using the CLI app.
- export old.json from bitwarden
- purge vault
- run script to make new.json
- import
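import json, sys

# Read the exported vault (Bitwarden exports are UTF-8)
with open(sys.argv[1], encoding='utf-8') as f:
    d = json.load(f)

# Items that are identical apart from 'id' map to the same key
dd = {}
for item in d['items']:
    dd[repr({**item, 'id': 0})] = item

d['items'] = list(dd.values())

with open(sys.argv[2], 'w', encoding='utf-8') as f:
    json.dump(d, f, indent=2)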
Usage:
python3 dedup.py old.json new.json
Ignoring the other unique fields as well (inspired by your script):
import json, sys

with open(sys.argv[1], encoding='utf-8') as f:
    d = json.load(f)

# Overwrite the inherently unique fields so they don't affect the comparison
remove = {'id': 0, 'folderId': 0, 'revisionDate': 0, 'creationDate': 0, 'deletedDate': 0}

dd = {}
for item in d['items']:
    dd[repr({**item, **remove})] = item

d['items'] = list(dd.values())

with open(sys.argv[2], 'w', encoding='utf-8') as f:
    json.dump(d, f, indent=2)
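In both export-based scripts above, dd[...] = item overwrites earlier entries, so the copy that survives is whichever one appears last in the export, not necessarily the newest. A minimal sketch of a variant that keeps the copy with the latest revisionDate instead, assuming each exported item carries an ISO 8601 revisionDate (as the field list in the second script suggests), so that plain string comparison orders them chronologically. The function name dedup_keep_newest is hypothetical.

```python
import json
import sys

# Fields ignored when comparing items (same list as the scripts above).
IGNORED = ("id", "folderId", "revisionDate", "creationDate", "deletedDate")


def dedup_keep_newest(items: list) -> list:
    """Keep one item per duplicate group, preferring the latest revisionDate.

    Items whose revisionDate is missing or null are treated as the oldest.
    """
    best = {}
    for item in items:
        key = repr({k: v for k, v in item.items() if k not in IGNORED})
        current = best.get(key)
        newer = (item.get("revisionDate") or "") > (
            (current.get("revisionDate") or "") if current else ""
        )
        if current is None or newer:
            best[key] = item
    return list(best.values())


def main(src: str, dst: str) -> None:
    with open(src, encoding="utf-8") as f:
        data = json.load(f)
    data["items"] = dedup_keep_newest(data["items"])
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: python3 dedup.py old.json new.json")
    else:
        main(sys.argv[1], sys.argv[2])
```

Usage stays the same as above (python3 dedup.py old.json new.json); only the choice of which duplicate to keep changes.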
It keeps asking for the master password every time it finds a duplicate item.
That shouldn't happen if you've set up the session key properly: https://bitwarden.com/help/cli/#using-a-session-key
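The linked docs describe unlocking once and reusing the resulting key through the BW_SESSION environment variable; 'bw unlock --raw' prints only the session key. A minimal sketch of doing that from Python so every bw call shares one session, assuming bw is on PATH, you are already logged in, and the password prompt goes to the terminal rather than to stdout. The helper names env_with_session and unlock_vault are hypothetical.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional


def env_with_session(session_key: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with BW_SESSION set; the base is not modified."""
    env = dict(os.environ if base is None else base)
    env["BW_SESSION"] = session_key
    return env


def unlock_vault() -> str:
    """Prompt for the master password once via 'bw unlock --raw' and return the session key."""
    # --raw makes bw print only the session key; only stdout is captured here.
    result = subprocess.run(
        ["bw", "unlock", "--raw"], stdout=subprocess.PIPE, text=True, check=True
    )
    return result.stdout.strip()
```

A script would call env = env_with_session(unlock_vault()) once and then pass env=env to each subprocess.run or check_output call, so bw does not prompt again for every deletion.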
This is awesome. I'd like to use it and extend it a little bit. Is that okay? Do you want to assign a specific license to it?
@ewa I've added MIT license text to the top of the gist. Feel free to use the script however you'd like! 👍
Since many people have found this gist useful and I was feeling motivated, I've finished a pretty major overhaul of this script.
To any future readers, all comments preceding this one were referring to this revision.
Changelog
- The script now also deletes duplicate folders, whereas the previous version only deleted duplicate items.
- The script output is now much fancier, with colors and loading bars, but it now requires installing the colorama and tqdm packages.
- The script now automatically syncs your Bitwarden vault to ensure the data is up to date. This can be disabled with the --no-sync flag.
- Session setup can now be taken care of by the script itself. If the user wants to opt out of it, they can use the --no-auth flag.
- Added the --dry-run flag for running the script without actually deleting anything.
- Added the --ignore-history flag for ignoring password history when determining duplicates.
- Added the --oldest flag for keeping the older duplicates instead of the newer ones.
- Added the --empty-folders flag for including empty folders when determining duplicate folders.
- The script now has a VERSION associated with it that can be checked with the --version flag.
- Added a summary to the end of a successful run of the script.
- All functions are well documented and the code should be very readable. Unfortunately, the script is far larger as a result.
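For readers adapting the original script, the flags in this changelog could be declared with argparse roughly as follows. This is a sketch, not the overhauled script's code: the help text, the placeholder VERSION, and the helpers select_keeper and comparison_view are assumptions, and the "passwordHistory" key name for password history is assumed as well. select_keeper shows how --oldest amounts to the same '>' to '<' flip described in the original script's comments.

```python
import argparse

VERSION = "0.0.0-example"  # placeholder, not the script's real version


def build_parser() -> argparse.ArgumentParser:
    """Declare the changelog's flags (help text is hypothetical)."""
    parser = argparse.ArgumentParser(description="Delete duplicate Bitwarden items and folders.")
    parser.add_argument("--no-sync", action="store_true", help="skip syncing the vault first")
    parser.add_argument("--no-auth", action="store_true", help="skip automatic session setup")
    parser.add_argument("--dry-run", action="store_true", help="report duplicates without deleting")
    parser.add_argument("--ignore-history", action="store_true", help="ignore password history when comparing")
    parser.add_argument("--oldest", action="store_true", help="keep the oldest duplicate instead of the newest")
    parser.add_argument("--empty-folders", action="store_true", help="include empty folders when comparing folders")
    parser.add_argument("--version", action="version", version=VERSION)
    return parser


def select_keeper(a: dict, b: dict, keep_oldest: bool) -> dict:
    """Return whichever of two duplicates should survive, by revisionDate."""
    if keep_oldest:
        return a if a["revisionDate"] <= b["revisionDate"] else b
    return a if a["revisionDate"] >= b["revisionDate"] else b


def comparison_view(item: dict, ignore_history: bool) -> dict:
    """Drop fields that should not affect duplicate detection.

    Assumes password history is stored under the top-level "passwordHistory" key.
    """
    skip = {"id", "folderId", "revisionDate", "creationDate", "deletedDate"}
    if ignore_history:
        skip.add("passwordHistory")
    return {k: v for k, v in item.items() if k not in skip}
```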
Example Usage (Preview Image)
@jwmcgettigan thanks! Here's my forked version (merged up to before your overhaul) that I used. It's arguably cruftier than the original, but it includes a couple of command-line options and automates the login/session-key process, at least in simple situations. Having used it successfully to solve my problem, I don't really expect to develop it any further, but if there's anything in there that you want to incorporate back into the real version, feel free to!
https://gist.github.com/ewa/f5e115628b955bf8cd1e0540116b135a
May I suggest adding a shebang: #! /usr/bin/env python3
That way the script could be run using just its filename, rather than python <filename>.
Using the script, I get the following error:
File ".....bitwarden_duplicate_cleaner.py", line 317, in
def identify_duplicate_item(item: dict, item_dict: dict) -> dict | None:
TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'
I use Python 3.9.1
regards
Ralf
@shvchk Thanks! I've added it as you've suggested.
@RalfZi Using '|' in type hints (as in dict | None) was added in Python 3.10, so any Python version earlier than that will raise that error. Unfortunately, the script would have to be refactored to support earlier versions of Python if you have that need.
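For anyone who needs that refactor, there are two standard options for Python 3.7 to 3.9: write the annotation as Optional[dict] from typing, which means the same as dict | None, or put from __future__ import annotations at the top of the file so annotations are not evaluated at definition time (PEP 563). A sketch of the first option; the function body here is a hypothetical placeholder, since only the signature appears in this thread.

```python
from typing import Optional


# Optional[dict] is equivalent to "dict | None" but also works before Python 3.10.
def identify_duplicate_item(item: dict, item_dict: dict) -> Optional[dict]:
    """Placeholder body (hypothetical): look up a previously seen item by name."""
    return item_dict.get(item["name"])


print(identify_duplicate_item.__annotations__["return"])
```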
OK, thanks. So I had to update my Python version.
Can you please make this a proper repository? I think this has huge potential. However, I only stumbled upon it deep into a session of following links, and it doesn't show up in GitHub's default search. Additionally, a repository offers issue tracking, discussions, GitHub Pages for documentation, etc., instead of a long string of comments.
I am having problems with the lockfile:
Error: Lock file is already being held
at /usr/local/Cellar/bitwarden-cli/2023.12.0/libexec/lib/node_modules/@bitwarden/cli/node_modules/proper-lockfile/lib/lockfile.js:53:43
at FSReqCallback.oncomplete (node:fs:191:23) {
code: 'ELOCKED',
file: '/Users/ian.gallina/Library/Application Support/Bitwarden CLI/data.json'
Even when I run 'bw list items' manually, it does not work.
I even tried to close Chrome and Bitwarden to make sure there are no other processes using the file.
Unfortunately, I can't find any information in the Bitwarden docs on how to fix this.
Any help?
Guys, I just found that this bug was fixed in Bitwarden a few hours ago:
bitwarden/clients#7126
It seems that the currently released packages don't fix this issue. I was able to get it to work after downgrading bw from v2023.12.0 to v2023.10.0:
npm install -g @bitwarden/cli@2023.10.0
@JacobCarrell Thank you for the suggestion. Now that I have some time and there's sufficient interest, I'll spend some of it transitioning this gist to a repo.
@IvanLi-CN @iGallina Thank you for sharing the issue. It appears that the hotfix for bitwarden/clients#7126 was finally released, so you should be able to use the latest version. I can run it without issue with version 2023.12.1.
Hi, I'm having problems running the script.
==================================================
Bitwarden Duplicate Cleaner - Version 1.1.0
A script that deletes duplicate items and folders.
==================================================
Traceback (most recent call last):
File "c:\Users\jakub\Downloads\bitwarden-dedup_python_script\bitwarden_duplicate_cleaner.py", line 641, in <module>
check_bw_installed()
File "c:\Users\jakub\Downloads\bitwarden-dedup_python_script\bitwarden_duplicate_cleaner.py", line 165, in check_bw_installed
subprocess.check_output(['bw', '--version'])
File "C:\Program Files\Python312\Lib\subprocess.py", line 466, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python312\Lib\subprocess.py", line 548, in run
with Popen(*popenargs, **kwargs) as process:
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python312\Lib\subprocess.py", line 1026, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Program Files\Python312\Lib\subprocess.py", line 1538, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [WinError 2] The system cannot find the file specified
I tried running it in Python 3.10, but with the same result.
I installed the Bitwarden CLI via npm, version 2024.2.1.
Do I need to explicitly add bw to PATH? When I use it from any location, it is accessible, so I'm a bit confused.
Or perhaps I did the installation wrong?
I have a bit of experience with Python and npm (from when I was learning React at school), and it's my first time posting on a gist.
Thank you for any help,
Regards, Jakub
So, a little update.
I did a bit of digging and tinkering.
I found out that if I add the shell=True argument to every use of the subprocess library, it fixes the error.
Although I'm not exactly sure why it works, I'm happy that I was able to get it working with my little knowledge of Python and cmd.
You don't have to do that if you put the "bw" CLI somewhere in your PATH. It works for me without modifying the subprocess calls. I did get the same error until I moved bw (the Linux version, on Debian) to my ~/bin directory and re-sourced my .profile, which has ~/bin in my PATH.
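On Windows, a likely cause (an assumption worth checking with "where bw") is that an npm global install provides bw as a bw.cmd shim rather than a bw.exe. cmd.exe finds such shims through PATHEXT, which would explain why shell=True helped, whereas subprocess without a shell only looks for bw.exe. A sketch that avoids shell=True by resolving the full path with shutil.which, which does honour PATHEXT on Windows; the helper names bw_command and bw_version are hypothetical.

```python
import shutil
import subprocess
from typing import List


def bw_command(*args: str, executable: str = "bw") -> List[str]:
    """Build an argument list whose first element is the resolved path to bw.

    shutil.which honours PATHEXT on Windows, so it can find shims such as
    bw.cmd that a plain subprocess call (without shell=True) would miss.
    """
    path = shutil.which(executable)
    if path is None:
        raise FileNotFoundError(f"{executable!r} was not found on PATH")
    return [path, *args]


def bw_version() -> str:
    """Same check as the script's check_bw_installed call, but without shell=True."""
    return subprocess.check_output(bw_command("--version"), text=True).strip()
```

Every ['bw', ...] list in the script could then be replaced with bw_command(...), keeping argument handling free of shell quoting.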
Great script, but it does not work for all items. I have about 5500 items and it found only 33 duplicates. That's impossible; there are surely about 2000-2500 duplicates ;/
Great script; it removed 562 items, but I can see that I still have more duplicates in the list. A few duplicates were not deleted.
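One possible explanation is that the script compares every field except id, folderId and the three date fields, so copies that look identical in the app can still differ in, for example, password history, notes, URI match settings or custom fields. A minimal diagnostic sketch that lists the paths where two supposed duplicates differ, assuming each item was saved as JSON (for instance with 'bw get item <id>'). The function differing_paths and the file names are hypothetical.

```python
import json
import sys
from typing import Any, List

# Top-level fields the script already ignores.
IGNORED = {"id", "folderId", "revisionDate", "creationDate", "deletedDate"}


def differing_paths(a: Any, b: Any, path: str = "") -> List[str]:
    """Return dotted paths where two JSON values differ (unique fields skipped at top level)."""
    if isinstance(a, dict) and isinstance(b, dict):
        found: List[str] = []
        for key in sorted(set(a) | set(b)):
            if not path and key in IGNORED:
                continue
            child = f"{path}.{key}" if path else key
            if key not in a or key not in b:
                found.append(child)
            else:
                found.extend(differing_paths(a[key], b[key], child))
        return found
    if isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        found_in_list: List[str] = []
        for i, (x, y) in enumerate(zip(a, b)):
            found_in_list.extend(differing_paths(x, y, f"{path}[{i}]"))
        return found_in_list
    # Scalars, type mismatches, or lists of different lengths.
    return [] if a == b else [path or "<root>"]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 diff_items.py first.json second.json
    with open(sys.argv[1], encoding="utf-8") as f1, open(sys.argv[2], encoding="utf-8") as f2:
        print("\n".join(differing_paths(json.load(f1), json.load(f2))) or "no differences")
```

If the output lists only fields you don't care about, those fields could be added to the set of deleted keys before hashing.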
Thanks