
@eliasdabbas
Last active November 11, 2024 21:37
Get the most up-to-date list of IPv4 addresses used by Google's and Bing's crawler bots (Googlebot and Bingbot).
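import ipaddress

import requests
import pandas as pd


def bot_ip_addresses():
    """Return a DataFrame of every IPv4 address published for Googlebot and Bingbot."""
    bots_urls = {
        'google': 'https://developers.google.com/search/apis/ipranges/googlebot.json',
        'bing': 'https://www.bing.com/toolbox/bingbot.json',
    }
    ip_addresses = []
    for bot, url in bots_urls.items():
        # A timeout avoids hanging forever; raise_for_status() fails loudly
        # instead of trying to parse an error page as JSON
        bot_resp = requests.get(url, timeout=30)
        bot_resp.raise_for_status()
        for iprange in bot_resp.json()['prefixes']:
            # Only IPv4 ranges are expanded into individual addresses;
            # entries without an 'ipv4Prefix' key are skipped
            network = iprange.get('ipv4Prefix')
            if network:
                ip_list = [(bot, str(ip)) for ip in ipaddress.IPv4Network(network)]
                ip_addresses.extend(ip_list)
    return pd.DataFrame(ip_addresses, columns=['bot_name', 'ip_address'])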
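Expanding every prefix into individual addresses works, but an alternative is to keep the published ranges as network objects and test a visitor's IP for membership directly; this also covers IPv6 ranges, which `bot_ip_addresses()` skips. The following is a minimal sketch under stated assumptions, not the gist's method: it uses the standard-library `urllib.request` instead of `requests`, the helper names `parse_networks`, `fetch_bot_networks` and `is_bot_ip` are hypothetical, and it assumes that entries in `prefixes` carry either an `ipv4Prefix` or an `ipv6Prefix` key.

```python
import ipaddress
import json
import urllib.request

# Same sources as bot_ip_addresses() above
BOT_URLS = {
    'google': 'https://developers.google.com/search/apis/ipranges/googlebot.json',
    'bing': 'https://www.bing.com/toolbox/bingbot.json',
}


def parse_networks(prefixes):
    """Convert 'prefixes' entries into ipaddress network objects (IPv4 and IPv6)."""
    networks = []
    for entry in prefixes:
        # Assumption: each entry has an 'ipv4Prefix' or an 'ipv6Prefix' key
        prefix = entry.get('ipv4Prefix') or entry.get('ipv6Prefix')
        if prefix:
            networks.append(ipaddress.ip_network(prefix))
    return networks


def fetch_bot_networks(urls=BOT_URLS, timeout=30):
    """Download each JSON file and return {bot_name: [network, ...]}."""
    result = {}
    for bot, url in urls.items():
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = json.load(resp)
        result[bot] = parse_networks(data['prefixes'])
    return result


def is_bot_ip(ip, networks):
    """Return True if `ip` falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    # `addr in net` returns False (rather than raising) when IP versions differ
    return any(addr in net for net in networks)
```

For example, `is_bot_ip(visitor_ip, fetch_bot_networks()['google'])` checks one address against Googlebot's ranges without materialising every individual IP. A linear scan over the networks is simple and should be adequate for lists of a few hundred prefixes.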
@eliasdabbas (Author)

@johnmurch
Interesting. Keeping and updating a static list of IPs can be another useful approach.
Thanks for sharing!
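The static-list approach mentioned in the comment can be combined with the live download: keep the list in a local file and re-fetch it only when it is older than some cutoff. A minimal sketch, assuming a CSV cache file and a hypothetical `load_cached_bot_ips` helper; the seven-day default is an arbitrary illustrative choice, and `fetch` is any zero-argument callable returning `(bot_name, ip_address)` pairs.

```python
import csv
import os
import time


def load_cached_bot_ips(path, fetch, max_age_seconds=7 * 24 * 3600):
    """Return (bot_name, ip_address) rows, re-fetching only when the cache is stale.

    `fetch` is any zero-argument callable yielding (bot_name, ip_address) pairs.
    """
    fresh = (
        os.path.exists(path)
        and time.time() - os.path.getmtime(path) < max_age_seconds
    )
    if fresh:
        with open(path, newline='') as f:
            reader = csv.reader(f)
            next(reader)  # skip the header row
            return [tuple(row) for row in reader]

    rows = [tuple(row) for row in fetch()]
    # Write to a temporary file first so an interrupted write never
    # leaves a truncated cache behind
    tmp_path = path + '.tmp'
    with open(tmp_path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['bot_name', 'ip_address'])
        writer.writerows(rows)
    os.replace(tmp_path, path)
    return rows
```

For example, `load_cached_bot_ips('bot_ips.csv', lambda: bot_ip_addresses().itertuples(index=False))` would reuse the gist's function as the data source. The file's modification time serves as the "last updated" timestamp, so no extra metadata needs to be stored.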
