@eliasdabbas
Last active November 11, 2024 21:37
Get the most up-to-date list of IP addresses for crawler bots belonging to Google and Bing.
import ipaddress
import requests
import pandas as pd


def bot_ip_addresses():
    bots_urls = {
        'google': 'https://developers.google.com/search/apis/ipranges/googlebot.json',
        'bing': 'https://www.bing.com/toolbox/bingbot.json'
    }
    ip_addresses = []
    for bot, url in bots_urls.items():
        bot_resp = requests.get(url)
        for iprange in bot_resp.json()['prefixes']:
            # Only IPv4 ranges are expanded; entries with an 'ipv6Prefix' key are skipped
            network = iprange.get('ipv4Prefix')
            if network:
                # Expand the CIDR block into its individual IP addresses
                ip_list = [(bot, str(ip)) for ip in ipaddress.IPv4Network(network)]
                ip_addresses.extend(ip_list)
    return pd.DataFrame(ip_addresses, columns=['bot_name', 'ip_address'])
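
A minimal usage sketch, assuming both JSON endpoints are reachable; the visitor IP below is just a placeholder for illustration:

bot_ips = bot_ip_addresses()
print(bot_ips['bot_name'].value_counts())

# Check whether a given visitor IP belongs to a known crawler (placeholder address):
visitor_ip = '66.249.66.1'
match = bot_ips[bot_ips['ip_address'] == visitor_ip]
print(match if not match.empty else f'{visitor_ip} is not a known Google/Bing crawler IP')
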
@kstubs

kstubs commented Apr 26, 2023

I've created a DuckDuckGo prefixes file here:
https://jsoneditoronline.org/#left=cloud.511273c830ca42a488778345c096f6a5
Unfortunately I do not see a way to grab this content programmatically from this site, but you can at least consume it and use it locally.

@eliasdabbas
Author

That's cool.

The code I shared can be used to programmatically grab the content from the page where they are listed (or you could use equivalent code in another language).
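
If it helps, here's a rough sketch of that idea for a page that lists the IPs as plain text in HTML rather than JSON. The DuckDuckGo help-page URL and the assumption that the addresses appear directly in the page source are mine, not verified here:

import re

import pandas as pd
import requests


def duckduckbot_ip_addresses(url='https://duckduckgo.com/duckduckbot'):
    # Pull every IPv4 address found in the page source.
    # The URL and page layout are assumptions; point this at whichever
    # page actually lists the DuckDuckBot addresses.
    resp = requests.get(url)
    ips = sorted(set(re.findall(r'\b(?:\d{1,3}\.){3}\d{1,3}\b', resp.text)))
    return pd.DataFrame({
        'bot_name': ['duckduckgo'] * len(ips),
        'ip_address': ips,
    })
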

@kstubs

kstubs commented Jul 1, 2023

Nice! I'll consider scraping that page as well.

@johnmurch

You might also find this repo of use: https://github.com/AnTheMaker/GoodBots

@eliasdabbas
Author

@johnmurch
Interesting. Keeping and updating a static list of IPs can be another useful approach.
Thanks for sharing!
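
If anyone wants to go that route, here's a rough sketch of consuming such a static list; the raw-file path is an assumption about the GoodBots repo layout, so check the repo for the actual file names and format:

import pandas as pd
import requests


def goodbots_ip_list(url='https://raw.githubusercontent.com/AnTheMaker/GoodBots/main/all.ips'):
    # Assumes the file contains one IP or CIDR range per line (verify in the repo).
    resp = requests.get(url)
    lines = [line.strip() for line in resp.text.splitlines() if line.strip()]
    return pd.DataFrame({'ip_or_range': lines})
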
