Last active
November 11, 2024 21:37
Get the most up-to-date list of IP addresses for crawler bots, belonging to Google and Bing.
import ipaddress
import requests
import pandas as pd


def bot_ip_addresses():
    """Return a DataFrame with one row per IPv4 address published for Googlebot and Bingbot."""
    bots_urls = {
        'google': 'https://developers.google.com/search/apis/ipranges/googlebot.json',
        'bing': 'https://www.bing.com/toolbox/bingbot.json'
    }
    ip_addresses = []
    for bot, url in bots_urls.items():
        bot_resp = requests.get(url, timeout=30)
        bot_resp.raise_for_status()  # stop early on an HTTP error instead of parsing an error page
        for iprange in bot_resp.json()['prefixes']:
            # Only IPv4 prefixes are expanded; entries without 'ipv4Prefix' are skipped.
            network = iprange.get('ipv4Prefix')
            if network:
                ip_list = [(bot, str(ip)) for ip in ipaddress.IPv4Network(network)]
                ip_addresses.extend(ip_list)
    return pd.DataFrame(ip_addresses, columns=['bot_name', 'ip_address'])
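If the goal is only to check whether a given IP belongs to a bot, it may be lighter to test membership against the published networks rather than expanding every address. The following is a minimal sketch of that idea, not part of the original gist. The helper names `parse_prefixes` and `is_bot_ip` are hypothetical, the sample prefixes are documentation ranges rather than real bot ranges, and it assumes each entry carries either an `ipv4Prefix` or an `ipv6Prefix` key.

```python
import ipaddress


def parse_prefixes(prefix_entries):
    """Turn entries like {'ipv4Prefix': '192.0.2.0/24'} into network objects."""
    networks = []
    for entry in prefix_entries:
        # Assumption: each entry holds either 'ipv4Prefix' or 'ipv6Prefix'.
        cidr = entry.get('ipv4Prefix') or entry.get('ipv6Prefix')
        if cidr:
            networks.append(ipaddress.ip_network(cidr))
    return networks


def is_bot_ip(ip, networks):
    """Return True if the address falls inside any of the given networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)


# Made-up sample data using documentation ranges, not real crawler ranges.
sample_prefixes = [
    {'ipv4Prefix': '192.0.2.0/24'},
    {'ipv6Prefix': '2001:db8::/32'},
]
sample_networks = parse_prefixes(sample_prefixes)
print(is_bot_ip('192.0.2.10', sample_networks))
print(is_bot_ip('198.51.100.1', sample_networks))
```

In practice the `prefixes` list from one of the JSON responses above could be passed to `parse_prefixes` in place of the sample data.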
That's cool.
The code I shared can be used to programmatically grab the content from the page they are listed on (or any equivalent in another language).
Nice! I'll consider scraping that page as well.
You might also find this repo useful: https://github.com/AnTheMaker/GoodBots
@johnmurch
Interesting. Keeping and updating a static list of IPs can be another useful approach.
Thanks for sharing!
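For anyone who wants to try the static-list approach, here is a rough sketch of saving a snapshot to disk and reading it back. It is only an illustration: the function names, the CSV column names, and the idea of stamping each row with a fetch date are assumptions, not something taken from the gist or the linked repo.

```python
import csv
from datetime import date


def save_snapshot(rows, path, fetched_on=None):
    """Write (bot_name, network) rows to a CSV file, stamped with a fetch date."""
    fetched_on = fetched_on or date.today().isoformat()
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['bot_name', 'network', 'fetched_on'])
        for bot_name, network in rows:
            writer.writerow([bot_name, network, fetched_on])


def load_snapshot(path):
    """Read the snapshot back as a list of dicts keyed by column name."""
    with open(path, newline='') as f:
        return list(csv.DictReader(f))
```

Storing the fetch date next to each row makes it easier to tell when the static list is stale and should be refreshed.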
I've created a DuckDuckGo prefixes file here:
https://jsoneditoronline.org/#left=cloud.511273c830ca42a488778345c096f6a5
Unfortunately, I do not see a way to grab this content programmatically from this site, but you can at least download it and use it locally.
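As a sketch of using such a file locally: once the JSON has been saved by hand, something like the following could read it. The function name `load_local_prefixes` is hypothetical, and the code assumes the file follows the same `prefixes` layout as the Google and Bing files (a list of entries with `ipv4Prefix` or `ipv6Prefix` keys), which I have not verified for the DuckDuckGo file.

```python
import json


def load_local_prefixes(path):
    """Return the CIDR strings found in a locally saved prefixes file."""
    with open(path) as f:
        data = json.load(f)
    cidrs = []
    # Assumption: same schema as the Google/Bing files, i.e. a top-level 'prefixes' list.
    for entry in data.get('prefixes', []):
        cidr = entry.get('ipv4Prefix') or entry.get('ipv6Prefix')
        if cidr:
            cidrs.append(cidr)
    return cidrs
```

The returned strings can then be passed to `ipaddress.ip_network` in the same way as the ranges fetched over HTTP.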