@lorey
Last active November 5, 2024 15:20
Access Chrome's network tab (e.g. XHR requests) with Selenium
#
# This small example shows how to access JS-based requests via Selenium.
# This gives you access to the raw data for scraping,
# for example on many JS-intensive/React-based websites.
#
import json
from time import sleep

from selenium import webdriver
from selenium.webdriver import DesiredCapabilities

# make chrome log requests
# (desired_capabilities/executable_path are Selenium 3-style arguments)
capabilities = DesiredCapabilities.CHROME
capabilities["loggingPrefs"] = {"performance": "ALL"}  # newer: goog:loggingPrefs
driver = webdriver.Chrome(
    desired_capabilities=capabilities, executable_path="./chromedriver"
)

# fetch a site that does xhr requests
driver.get("https://sitewithajaxorsomething.com")
sleep(5)  # wait for the requests to take place

# extract requests from logs
logs_raw = driver.get_log("performance")
logs = [json.loads(lr["message"])["message"] for lr in logs_raw]


def log_filter(log_):
    return (
        # is an actual response
        log_["method"] == "Network.responseReceived"
        # and json
        and "json" in log_["params"]["response"]["mimeType"]
    )


for log in filter(log_filter, logs):
    request_id = log["params"]["requestId"]
    resp_url = log["params"]["response"]["url"]
    print(f"Caught {resp_url}")
    print(driver.execute_cdp_cmd("Network.getResponseBody", {"requestId": request_id}))

@JaeEon-Ryu

To : lee-hodg

I think it's an error that came from accessing a place without resources.
It works well with try-except syntax.

This is really great, however at the final step of getting the response body using the requestId I get

self.driver.execute_cdp_cmd("Network.getResponseBody", {"requestId": request_id})
2021-05-06 14:04:12 jim-ThinkPad-S5-S540 selenium.webdriver.remote.remote_connection[36958] DEBUG POST http://127.0.0.1:42437/session/b29c0918324a3defb5d6d11100dd3bec/goog/cdp/execute {"cmd": "Network.getResponseBody", "params": {"requestId": "37056.284"}}
2021-05-06 14:04:12 jim-ThinkPad-S5-S540 urllib3.connectionpool[36958] DEBUG http://127.0.0.1:42437 "POST /session/b29c0918324a3defb5d6d11100dd3bec/goog/cdp/execute HTTP/1.1" 500 253
2021-05-06 14:04:12 jim-ThinkPad-S5-S540 selenium.webdriver.remote.remote_connection[36958] DEBUG Finished Request
*** selenium.common.exceptions.WebDriverException: Message: unknown error: unhandled inspector error: {"code":-32000,"message":"No resource with given identifier found"}
  (Session info: chrome=89.0.4389.114)
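
The try-except approach mentioned above could look roughly like the following. This is only a minimal sketch: `get_body_or_none` is a hypothetical helper name (not part of Selenium), and it assumes a Chromium-based driver that exposes `execute_cdp_cmd`. It swallows only the "No resource with given identifier found" error and re-raises everything else.

```python
try:
    from selenium.common.exceptions import WebDriverException
except ImportError:  # fallback so the sketch can be read without Selenium installed
    WebDriverException = Exception


def get_body_or_none(driver, request_id):
    """Return the CDP response-body dict, or None if Chrome no longer has it.

    Hypothetical helper for illustration, not a Selenium API.
    """
    try:
        return driver.execute_cdp_cmd(
            "Network.getResponseBody", {"requestId": request_id}
        )
    except WebDriverException as exc:
        # Only ignore the specific "resource is gone" error; re-raise the rest.
        if "No resource with given identifier found" in str(exc):
            return None
        raise
```

In the loop of the gist, one would then skip entries for which the helper returns `None`.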

@thswen

thswen commented Sep 15, 2021

I was working on a way to do this for a week or two before I found your post. Works beautifully for what I needed, thanks a bunch.

@megapegabot

It works! Thanks :) I was looking for a solution for a long time, and you helped! 👍

@lorey
Author

lorey commented Nov 8, 2021

Thanks for the kindness everyone. Glad I could help you out. Please feel free to check out my profile with similar tools and libraries at https://github.com/lorey <3

@billy8407

Awesome!!

@BlondinkaQ

How do I get XHR requests from a real browser online?

@lorey
Author

lorey commented Jan 10, 2022

Selenium is using a real browser. If you want to do it manually yourself, check out developer tools (e.g. F12 in Chrome, tab "Network").

@nikolaysm

@lorey, thanks for sharing.

For Chrome >= 75 we have to make a small change.

As specified in the release notes for ChromeDriver 75.0.3770.8, the capability loggingPrefs has been renamed to goog:loggingPrefs.

@skndrvoip

I'm looking to print the response after clicking a button, so that I know whether the click succeeded or failed. Right now the only way to see the status is to open the dev tools, go to the Network tab, and check the response manually.
So I need a method to print this status in the log.

@hamzaadad

hamzaadad commented May 23, 2022

Hello, thanks for sharing this gist. Your code is working fine, I just have one little issue that I can't get my head around.
What I'm trying to log is an XHR call made by a web worker.
Getting the performance log on the main thread doesn't list the request I want.
In Chrome, only when I select the worker in the console tab and execute "performance.getEntries()" do I get the request I want.
Any idea how to do that in Selenium?

@milanbog92

milanbog92 commented Nov 22, 2022

I used this method for a while. After some time, during a script run and without a clear reason, the "driver.execute_cdp_cmd" function throws this error:
'WebDriver' object has no attribute 'execute_cdp_cmd'

I'm looking for an alternative solution, feel free to suggest what could be done...

@lorey
Author

lorey commented Nov 22, 2022

Hey @milanbog92, how about:

@milanbog92

@lorey Thanks for the fast response!

Since I am executing my "python3 script.py" from an external script, it seems that my system loaded the wrong Python version. I have seen that python3.6 shows the error consistently, while python3.9 works as expected. Hopefully this will help someone...

I went through all the solutions available, and I believe there is no better one. Selenium can't load a Chrome extension that uses the chrome.debugger API, and I have had no luck with hotkeys so far in my complex environment.

@FaMousNoob

@lorey, thanks for your fantastic work, just one more thing.
Is there a way to get only the response data from a specific URL?

@milanbog92

milanbog92 commented Dec 5, 2022

Hi,

I use performance_logs instead of the logs_raw variable name, skip "chrome://favicon2" and search for image_name:

performance_logs = driver.get_log("performance")
for performance_log in performance_logs:
    performance_log_json = json.loads(performance_log["message"])
    if performance_log_json["message"]["method"] == 'Network.responseReceived':
        # skip Chrome's internal favicon requests
        if performance_log_json["message"]["params"]["response"]["url"].find('chrome://favicon2/') != -1:
            continue
        # print only the responses whose URL contains image_name
        if performance_log_json["message"]["params"]["response"]["url"].find(image_name) != -1:
            print(performance_log_json["message"]["params"]["response"]["url"])
            print(performance_log_json["message"]["params"]["requestId"])
            print(performance_log_json["message"]["params"]["type"])
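
For the question about a specific URL, the same idea can be wrapped in a small generator. This is a sketch under assumptions: `responses_for_url` and `url_part` are hypothetical names, and the entries are expected in the shape returned by `driver.get_log("performance")` (a dict whose `"message"` value is a JSON string).

```python
import json


def responses_for_url(performance_logs, url_part):
    """Yield (request_id, url) for responseReceived events whose URL contains url_part.

    Hypothetical helper; performance_logs is the raw list from
    driver.get_log("performance").
    """
    for entry in performance_logs:
        message = json.loads(entry["message"])["message"]
        if message.get("method") != "Network.responseReceived":
            continue
        url = message["params"]["response"]["url"]
        if url_part in url:
            yield message["params"]["requestId"], url
```

Each yielded request ID can then be passed to `Network.getResponseBody` as in the gist.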

@LiamKrenn

DesiredCapabilities is deprecated and can no longer be used. How can I achieve this without it?

@Newtoniano

@danbailo

danbailo commented Oct 7, 2023

Hello guys, for Selenium 4.x use this:

driver.options.set_capability('goog:loggingPrefs', {'performance': 'ALL'})
driver.get(url)

then just follow the steps from line 24+

Works for selenium 4.13.0

@tinyhare

For Selenium 4.15, set the option:

options = webdriver.ChromeOptions()
options.set_capability(
    "goog:loggingPrefs", {"performance": "ALL"}
)
driver = webdriver.Chrome(options=options)

@nathan-fiscaletti

nathan-fiscaletti commented Jan 25, 2024

Something I noticed is that you need to filter out Preflight requests.

if event['params']['type'] != 'Preflight':
    . . .

Otherwise, you might get this error:

{"code":-32000,"message":"No resource with given identifier found"}
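
Combined with the gist's original filter, the Preflight check might look like this. It is a sketch only: `is_json_response` is a hypothetical name, and `event` is assumed to be one parsed log message (an element of `logs` in the gist).

```python
def is_json_response(event):
    """Return True for JSON responseReceived events that are not CORS preflights.

    Hypothetical variant of the gist's log_filter; event is one parsed
    performance-log message.
    """
    if event.get("method") != "Network.responseReceived":
        return False
    params = event["params"]
    # Preflight (OPTIONS) requests are skipped, as suggested above.
    if params.get("type") == "Preflight":
        return False
    return "json" in params["response"].get("mimeType", "")
```

It can be dropped in wherever `log_filter` is used, e.g. `filter(is_json_response, logs)`.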

@ofostier

ofostier commented Jul 9, 2024

Hello,
Is there a way to use this with Selenium Grid (remote)?

With Selenium Grid I can catch the request but never the response.

options = webdriver.ChromeOptions()
options.add_argument('--ignore-ssl-errors=yes')
options.add_argument('--ignore-certificate-errors')
options.add_experimental_option('w3c', True)
# Try to catch XHR response
options.set_capability(
    "goog:loggingPrefs", {"performance": "ALL"}
)

driver = webdriver.Remote(
    command_executor='http://'+GRID_HOST+'/wd/hub',
    options=options,
)

driver.get("https://www.mywebsite.com/")

searchbox = driver.find_element(By.ID, "searchbox")
searchbox.send_keys("type something in the searchbox")

logs_raw = driver.get_log("performance")
# parse the raw log entries, as in the gist
requests = [json.loads(lr["message"])["message"] for lr in logs_raw]

for log in filter(log_filter, requests):
    request_id = log["params"]["requestId"]
    resp_url = log["params"]["response"]["url"]
    if 'aj_recherche' in resp_url:
        response = driver.execute_cdp_cmd("Network.getResponseBody", {"requestId": request_id})

I can then log every request associated with the searchbox (AJAX),
but never the JSON response (I can find it in the Chrome dev console).

Any idea?

@mrcsmchll

mrcsmchll commented Sep 12, 2024

@ofostier Yes! Took me some time to figure out but starting from Selenium 4.16.0 you should replace response = driver.execute_cdp_cmd("Network.getResponseBody", {"requestId": request_id}) with:

response = driver.execute(
    driver_command="executeCdpCommand",
    params={
        "cmd": "Network.getResponseBody",
        "params": {"requestId": request_id},
    },
)
body = response["value"]["body"]

source: get_browser_request_body(driver: WebDriver, request_id: str) answer from Borys Oliinyk.
Also consider what @nathan-fiscaletti mentioned about avoiding error -32000. In my case it happens for responses captured from previous sessions, where I cleared session and local storage but was still using the same webdriver Remote() instance.

@lhfmartin

It's better to listen for the Network.loadingFinished event before calling driver.execute_cdp_cmd("Network.getResponseBody", {"requestId": request_id}) to prevent this issue
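
One way to apply this to the performance log is to keep only the responses whose request also logged a `Network.loadingFinished` event. The following is a sketch under assumptions: `finished_request_ids` and `ready_responses` are hypothetical names, and `events` is the list of parsed log messages (`logs` in the gist).

```python
def finished_request_ids(events):
    """Return the set of requestIds for which Network.loadingFinished was logged."""
    return {
        event["params"]["requestId"]
        for event in events
        if event.get("method") == "Network.loadingFinished"
    }


def ready_responses(events):
    """Return responseReceived events whose request has finished loading, in log order.

    Hypothetical helper; events are parsed performance-log messages.
    """
    finished = finished_request_ids(events)
    return [
        event
        for event in events
        if event.get("method") == "Network.responseReceived"
        and event["params"]["requestId"] in finished
    ]
```

Only the events returned by `ready_responses(logs)` would then be passed on to `Network.getResponseBody`.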
