@lorey
Last active November 5, 2024 15:20
Access Chrome's network tab (e.g. XHR requests) with Selenium
```python
#
# This small example shows you how to access JS-based requests via Selenium.
# This way, you can get at the raw data for scraping,
# for example on many JS-intensive/React-based websites.
#
import json
from time import sleep

from selenium import webdriver
from selenium.webdriver import DesiredCapabilities

# make Chrome log requests
# note: desired_capabilities and executable_path were removed in Selenium 4.10;
# on newer versions use ChromeOptions.set_capability (see the comments below)
capabilities = DesiredCapabilities.CHROME.copy()
capabilities["goog:loggingPrefs"] = {"performance": "ALL"}  # older ChromeDriver: "loggingPrefs"
driver = webdriver.Chrome(
    desired_capabilities=capabilities, executable_path="./chromedriver"
)

# fetch a site that makes XHR requests
driver.get("https://sitewithajaxorsomething.com")
sleep(5)  # wait for the requests to take place

# extract requests from logs
logs_raw = driver.get_log("performance")
logs = [json.loads(lr["message"])["message"] for lr in logs_raw]


def log_filter(log_):
    return (
        # is an actual response
        log_["method"] == "Network.responseReceived"
        # and JSON
        and "json" in log_["params"]["response"]["mimeType"]
    )


for log in filter(log_filter, logs):
    request_id = log["params"]["requestId"]
    resp_url = log["params"]["response"]["url"]
    print(f"Caught {resp_url}")
    print(driver.execute_cdp_cmd("Network.getResponseBody", {"requestId": request_id}))
```
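You can try the log-parsing half of the script without a browser. The sketch below assumes the entry shape ChromeDriver's performance log uses: each entry's `message` field is a JSON string that wraps the DevTools event under a second `message` key. `parse_performance_logs` and `json_responses` are hypothetical helper names, not Selenium APIs. The sample entry is hand-written and was not captured from a real site.

```python
import json
from typing import Any, Dict, Iterable, List, Tuple


def parse_performance_logs(logs_raw: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Unwrap each performance-log entry into the DevTools event it carries."""
    # entry["message"] is a JSON string: {"message": {"method": ..., "params": ...}, "webview": ...}
    return [json.loads(entry["message"])["message"] for entry in logs_raw]


def json_responses(events: Iterable[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Return (requestId, url) for each Network.responseReceived event with a JSON MIME type."""
    found = []
    for event in events:
        if event.get("method") != "Network.responseReceived":
            continue
        response = event["params"]["response"]
        if "json" in response.get("mimeType", ""):
            found.append((event["params"]["requestId"], response["url"]))
    return found


# A hand-written entry in the shape returned by driver.get_log("performance")
sample = [
    {
        "level": "INFO",
        "timestamp": 0,
        "message": json.dumps(
            {
                "message": {
                    "method": "Network.responseReceived",
                    "params": {
                        "requestId": "1000.1",
                        "type": "XHR",
                        "response": {
                            "url": "https://example.com/api/items",
                            "mimeType": "application/json",
                        },
                    },
                },
                "webview": "ABC",
            }
        ),
    }
]

print(json_responses(parse_performance_logs(sample)))  # [('1000.1', 'https://example.com/api/items')]
```

If you pass `driver.get_log("performance")` instead of `sample`, you get the same `(requestId, url)` pairs for a live page. You can then pass each `requestId` to `Network.getResponseBody`, as the gist does.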
@hamzaadad

hamzaadad commented May 23, 2022

Hello, thanks for sharing this gist; your code works fine. I've run into one issue I can't get my head around: the XHR call I'm trying to log is made by a web worker, so the performance log from the main thread doesn't list the request I want.
In Chrome, I only get the request I want when I select the worker in the Console tab and run `performance.getEntries()`.
Any idea how to do that with Selenium?

@milanbog92

milanbog92 commented Nov 22, 2022

I used this method for a while, but after some time during a script run, and for no clear reason, `driver.execute_cdp_cmd` started throwing this error:
'WebDriver' object has no attribute 'execute_cdp_cmd'

I'm looking for an alternative solution; feel free to suggest what could be done...

@lorey
Author

lorey commented Nov 22, 2022

Hey @milanbog92, how about:

@milanbog92

@lorey Thanks for the fast response!

I run `python3 script.py` from an external script, and it seems my system loaded the wrong Python version. With Python 3.6 the error appears consistently, while Python 3.9 works as expected. Hopefully this helps someone...

I've gone through every solution I could find and I believe there is no better one. Selenium can't load a Chrome extension that uses the chrome.debugger API, and so far I've had no luck with hotkeys in my complex environment.

@FaMousNoob

@lorey, thanks for your fantastic work. Just one more thing: is there a way to get only the response data for a specific URL?

@milanbog92

milanbog92 commented Dec 5, 2022

Hi,

I use `performance_logs` as the variable name instead of `logs_raw`. I also skip `chrome://favicon2/` URLs and search for `image_name`:

```python
import json

# image_name: the URL fragment being searched for (defined elsewhere)
performance_logs = driver.get_log("performance")
for performance_log in performance_logs:
    performance_log_json = json.loads(performance_log["message"])
    if performance_log_json["message"]["method"] == "Network.responseReceived":
        url = performance_log_json["message"]["params"]["response"]["url"]
        # skip Chrome's internal favicon requests
        if "chrome://favicon2/" in url:
            continue
        if image_name in url:
            print(url)
            print(performance_log_json["message"]["params"]["requestId"])
            print(performance_log_json["message"]["params"]["type"])
```
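To answer the "specific URL" question in one place, here is a minimal sketch that combines the gist's body lookup with the URL filtering above. It assumes `driver` is a local `webdriver.Chrome` started with `goog:loggingPrefs` set to `{"performance": "ALL"}`. `bodies_for_url` is a hypothetical helper name, and the `chrome://favicon2/` skip comes from the comment above.

```python
import json
from typing import Any, Dict, List

# internal Chrome URLs to ignore (from the comment above)
SKIP_PREFIXES = ("chrome://favicon2/",)


def bodies_for_url(driver: Any, url_fragment: str) -> List[Dict[str, Any]]:
    """Fetch response bodies for responses whose URL contains url_fragment.

    `driver` is assumed to provide Selenium's get_log() and execute_cdp_cmd().
    """
    results = []
    for entry in driver.get_log("performance"):
        event = json.loads(entry["message"])["message"]
        if event.get("method") != "Network.responseReceived":
            continue
        url = event["params"]["response"]["url"]
        if url.startswith(SKIP_PREFIXES) or url_fragment not in url:
            continue
        # CDP returns {"body": str, "base64Encoded": bool}
        body = driver.execute_cdp_cmd(
            "Network.getResponseBody", {"requestId": event["params"]["requestId"]}
        )
        results.append({"url": url, "body": body["body"], "base64": body["base64Encoded"]})
    return results
```

For example, `bodies_for_url(driver, image_name)` returns a list of `{"url": ..., "body": ..., "base64": ...}` dicts. Binary responses such as images come back base64-encoded, so decode them with `base64.b64decode` when `base64` is `True`.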

@LiamKrenn

`DesiredCapabilities` is deprecated and can no longer be used. How can I achieve this without it?


@danbailo

danbailo commented Oct 7, 2023

Hello everyone, for Selenium 4.x use:

```python
driver.options.set_capability('goog:loggingPrefs', {'performance': 'ALL'})
driver.get(url)
```

then just follow the steps from line 24 onwards.

Works with Selenium 4.13.0.

@tinyhare

For Selenium 4.15, set the option:

```python
from selenium import webdriver

options = webdriver.ChromeOptions()
options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
driver = webdriver.Chrome(options=options)
```
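Here is the whole gist rewritten for newer Selenium releases, using `ChromeOptions` instead of `DesiredCapabilities`. This sketch assumes Selenium 4.10 or later and a local Chrome. `capture_json_responses` is a hypothetical name, and the fixed sleep is the same crude wait the gist uses. Selenium is imported inside the function so the helper can be defined without Selenium installed.

```python
import json
import time


def capture_json_responses(url, wait_seconds=5.0):
    """Open `url` in Chrome and return (response_url, body) for every JSON response."""
    # imported here so defining this function does not require Selenium
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    # ask ChromeDriver to record DevTools events in the "performance" log
    options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
    # no executable_path: recent Selenium 4 releases can locate chromedriver themselves
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        time.sleep(wait_seconds)  # crude wait for XHR/fetch requests to finish
        results = []
        for entry in driver.get_log("performance"):
            event = json.loads(entry["message"])["message"]
            if event.get("method") != "Network.responseReceived":
                continue
            params = event["params"]
            if "json" not in params["response"].get("mimeType", ""):
                continue
            body = driver.execute_cdp_cmd(
                "Network.getResponseBody", {"requestId": params["requestId"]}
            )
            results.append((params["response"]["url"], body["body"]))
        return results
    finally:
        driver.quit()  # always close the browser, even if a CDP call fails
```

Usage mirrors the original script: iterate over `capture_json_responses("https://sitewithajaxorsomething.com")` and print each `resp_url`.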

@nathan-fiscaletti

nathan-fiscaletti commented Jan 25, 2024

Something I noticed is that you need to filter out preflight requests:

```python
if event['params']['type'] != 'Preflight':
    ...
```

Otherwise, you might get this error:

```
{"code":-32000,"message":"No resource with given identifier found"}
```
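You could fold that check into the gist's `log_filter`, as in the sketch below. It assumes the event dicts produced by the gist's `logs` list comprehension. Skipping `type == "Preflight"` is the workaround from this comment. The `.get()` defaults are there only so events without a `type` or `mimeType` field don't raise `KeyError`.

```python
from typing import Any, Dict


def log_filter(event: Dict[str, Any]) -> bool:
    """True for JSON responses that are not CORS preflights."""
    if event.get("method") != "Network.responseReceived":
        return False
    params = event["params"]
    # per the comment above, fetching bodies for preflight responses fails with
    # {"code":-32000,"message":"No resource with given identifier found"}
    if params.get("type") == "Preflight":
        return False
    return "json" in params["response"].get("mimeType", "")
```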

@ofostier

ofostier commented Jul 9, 2024

Hello,
Is there a way to use this with Selenium Grid (remote)?

With Selenium Grid I can catch the request but never the response:

```python
import json

from selenium import webdriver
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_argument("--ignore-ssl-errors=yes")
options.add_argument("--ignore-certificate-errors")
options.add_experimental_option("w3c", True)
# Try to catch XHR responses
options.set_capability("goog:loggingPrefs", {"performance": "ALL"})

driver = webdriver.Remote(
    command_executor="http://" + GRID_HOST + "/wd/hub",  # GRID_HOST: the Grid address
    options=options,
)

driver.get("https://www.mywebsite.com/")

searchbox = driver.find_element(By.ID, "searchbox")
searchbox.send_keys("type something in the searchbox")

logs_raw = driver.get_log("performance")
logs = [json.loads(lr["message"])["message"] for lr in logs_raw]

for log in filter(log_filter, logs):  # log_filter as defined in the gist
    request_id = log["params"]["requestId"]
    resp_url = log["params"]["response"]["url"]
    if "aj_recherche" in resp_url:
        response = driver.execute_cdp_cmd("Network.getResponseBody", {"requestId": request_id})
```

This way I can log every request associated with the search box (AJAX), but never the JSON response, even though I can see it in the Chrome DevTools console.

Any ideas?

@mrcsmchll

mrcsmchll commented Sep 12, 2024


@ofostier Yes! It took me some time to figure out, but starting from Selenium 4.16.0 you should replace `response = driver.execute_cdp_cmd("Network.getResponseBody", {"requestId": request_id})` with:

```python
response = driver.execute(
    driver_command="executeCdpCommand",
    params={
        "cmd": "Network.getResponseBody",
        "params": {"requestId": request_id},
    },
)
body = response["value"]["body"]
```

Source: the `get_browser_request_body(driver: WebDriver, request_id: str)` answer from Borys Oliinyk.
Also keep in mind what @nathan-fiscaletti said about avoiding error -32000. In my case, it happens for responses captured in previous sessions: I had cleared session and local storage but was still using the same `webdriver.Remote()` instance.
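If the same code has to run against both a local `webdriver.Chrome` and a Grid `webdriver.Remote`, you can wrap the two approaches from this thread in one helper. This sketch assumes a Chrome node on the Grid and a Selenium release where `driver.execute("executeCdpCommand", ...)` works for `Remote` (4.16.0 per the comment above). `get_response_body` is a hypothetical name. It checks for the attribute rather than the driver class, so it doesn't depend on how the driver was constructed.

```python
from typing import Any, Dict


def get_response_body(driver: Any, request_id: str) -> Dict[str, Any]:
    """Return the Network.getResponseBody result: {"body": ..., "base64Encoded": ...}."""
    params = {"requestId": request_id}
    if hasattr(driver, "execute_cdp_cmd"):
        # local Chrome/Chromium drivers expose this helper directly
        return driver.execute_cdp_cmd("Network.getResponseBody", params)
    # webdriver.Remote has no execute_cdp_cmd, so send the raw command instead,
    # as in the comment above (assumes a Chrome node behind the Grid)
    response = driver.execute(
        driver_command="executeCdpCommand",
        params={"cmd": "Network.getResponseBody", "params": params},
    )
    return response["value"]
```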

@lhfmartin

It's better to wait for a request's `Network.loadingFinished` event before calling `driver.execute_cdp_cmd("Network.getResponseBody", {"requestId": request_id})`. That avoids the -32000 "No resource with given identifier found" error.
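One way to apply this with the performance log alone is to collect the `requestId`s that already have a `Network.loadingFinished` event and request bodies only for those. This sketch assumes the same log format as the gist; `finished_json_request_ids` is a hypothetical name. If you poll `get_log` repeatedly, keep the events from earlier calls, because a response and its `loadingFinished` event can arrive in different batches.

```python
import json
from typing import Any, Dict, Iterable, List


def finished_json_request_ids(logs_raw: Iterable[Dict[str, Any]]) -> List[str]:
    """requestIds of JSON responses whose Network.loadingFinished event has been logged."""
    events = [json.loads(entry["message"])["message"] for entry in logs_raw]
    # requests whose body has fully arrived
    finished = {
        e["params"]["requestId"]
        for e in events
        if e.get("method") == "Network.loadingFinished"
    }
    ids = []
    for e in events:
        if e.get("method") != "Network.responseReceived":
            continue
        params = e["params"]
        is_json = "json" in params["response"].get("mimeType", "")
        if is_json and params["requestId"] in finished:
            ids.append(params["requestId"])
    return ids
```

Each returned id can then go to `Network.getResponseBody` as in the gist.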
