#
# This small example shows you how to access JS-based requests via Selenium.
# This way, one can access raw data for scraping,
# for example on many JS-intensive/React-based websites.
#
import json
from time import sleep

from selenium import webdriver
from selenium.webdriver import DesiredCapabilities

# make chrome log requests
capabilities = DesiredCapabilities.CHROME
capabilities["loggingPrefs"] = {"performance": "ALL"}  # newer: goog:loggingPrefs
driver = webdriver.Chrome(
    desired_capabilities=capabilities, executable_path="./chromedriver"
)

# fetch a site that does xhr requests
driver.get("https://sitewithajaxorsomething.com")
sleep(5)  # wait for the requests to take place

# extract requests from logs
logs_raw = driver.get_log("performance")
logs = [json.loads(lr["message"])["message"] for lr in logs_raw]


def log_filter(log_):
    return (
        # is an actual response
        log_["method"] == "Network.responseReceived"
        # and json
        and "json" in log_["params"]["response"]["mimeType"]
    )


for log in filter(log_filter, logs):
    request_id = log["params"]["requestId"]
    resp_url = log["params"]["response"]["url"]
    print(f"Caught {resp_url}")
    print(driver.execute_cdp_cmd("Network.getResponseBody", {"requestId": request_id}))
Hello, thanks for sharing this gist. Your code works fine; I just have one small issue I can't get my head around.
What I'm trying to log is an XHR call made by a web worker,
so the performance log of the main thread doesn't list the request I want.
In Chrome, when I select the worker in the Console tab, I can execute "performance.getEntries()", and only then do I get the request I want.
Any idea how to do that with Selenium?
I used this method for a while, but after some time during a script run, and without a clear reason, the "driver.execute_cdp_cmd" function throws this error:
'WebDriver' object has no attribute 'execute_cdp_cmd'
I'm looking for an alternative solution; feel free to suggest what could be done...
Hey @milanbog92, how about:
- https://pypi.org/project/mitmproxy/ to catch requests
- a regular browser (e.g. driven by hotkeys) or maybe Playwright with some adaptations to be undetectable
@lorey Thanks for the fast response!
Since I am executing my "python3 script.py" from an external script, it seems that my system loaded the wrong Python version. python3.6 shows the error consistently, while python3.9 works as expected. Hopefully this helps someone...
I went through all the available solutions, and I believe there is no better one: Selenium can't load a Chrome extension that uses the chrome.debugger API, and I have had no luck with hotkeys so far in my complex environment.
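When a method such as execute_cdp_cmd exists under one interpreter but not under another, the two interpreters likely see different Selenium installations. A minimal sketch for checking this at startup; describe_environment is a hypothetical helper, not part of Selenium, and it avoids importlib.metadata so that it also runs on Python 3.6:

```python
import sys


def describe_environment():
    """Report the running interpreter and the Selenium version it imports."""
    try:
        import selenium  # resolved from this interpreter's site-packages

        selenium_version = getattr(selenium, "__version__", "unknown")
    except ImportError:
        selenium_version = None  # selenium is not installed for this interpreter
    return {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "executable": sys.executable,
        "selenium": selenium_version,
    }


print(describe_environment())
```

Running the script both from the external launcher and directly from a shell should show whether the two invocations resolve to the same interpreter and Selenium version.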
@lorey, thanks for your fantastic work. Just one more thing:
is there a way I could get only the response data from a specific URL?
Hi,
I use performance_logs as the variable name instead of logs_raw, skip "chrome://favicon2" URLs, and search for image_name:
performance_logs = driver.get_log("performance")
for performance_log in performance_logs:
    performance_log_json = json.loads(performance_log["message"])
    if performance_log_json["message"]["method"] == 'Network.responseReceived':
        if performance_log_json["message"]["params"]["response"]["url"].find('chrome://favicon2/') != -1:
            continue
        if performance_log_json["message"]["params"]["response"]["url"].find(image_name) != -1:
            print(performance_log_json["message"]["params"]["response"]["url"])
            print(performance_log_json["message"]["params"]["requestId"])
            print(performance_log_json["message"]["params"]["type"])
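The same idea can be wrapped in a reusable function that returns the matches instead of printing them. A minimal sketch, assuming the raw entries come from driver.get_log("performance"); find_responses and its skip_prefixes parameter are hypothetical names. Unlike the loop above, it skips URLs that start with (rather than merely contain) the given prefixes:

```python
import json


def find_responses(performance_logs, needle, skip_prefixes=("chrome://favicon2/",)):
    """Return (url, requestId, type) for every response whose URL contains `needle`.

    `performance_logs` is the list returned by driver.get_log("performance");
    each entry has a "message" key holding a JSON string.
    """
    matches = []
    for entry in performance_logs:
        message = json.loads(entry["message"])["message"]
        if message["method"] != "Network.responseReceived":
            continue
        params = message["params"]
        url = params["response"]["url"]
        # Skip browser-internal URLs such as Chrome's favicon service.
        if url.startswith(skip_prefixes):
            continue
        if needle in url:
            matches.append((url, params["requestId"], params["type"]))
    return matches
```

The returned requestId values can then be passed to Network.getResponseBody as in the gist.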
DesiredCapabilities is deprecated and can't be used anymore. How can I achieve this without it?
@LiamKrenn take a look at this; I haven't tried it, but hopefully it works: https://stackoverflow.com/questions/76622916/converting-desired-capabilities-to-options-in-selenium-python
Hello guys, for Selenium 4.x use:
options = webdriver.ChromeOptions()
options.set_capability('goog:loggingPrefs', {'performance': 'ALL'})
driver = webdriver.Chrome(options=options)
driver.get(url)
then just follow the steps from line 24 onward.
Works for Selenium 4.13.0.
For Selenium 4.15, set the option:
options = webdriver.ChromeOptions()
options.set_capability(
"goog:loggingPrefs", {"performance": "ALL"}
)
driver = webdriver.Chrome(options=options)
Something I noticed is that you need to filter out Preflight requests:
if event['params']['type'] != 'Preflight':
    ...
Otherwise, you might get this error:
{"code":-32000,"message":"No resource with given identifier found"}
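The gist's log_filter can be extended with that check. A minimal sketch, assuming the messages are already parsed as in the gist (json.loads(lr["message"])["message"]); is_json_response is a hypothetical drop-in replacement for log_filter:

```python
def is_json_response(message):
    """True for a non-preflight Network.responseReceived event with a JSON MIME type.

    Per the comment above, calling Network.getResponseBody for Preflight
    responses can fail with error -32000, so they are filtered out here.
    """
    if message.get("method") != "Network.responseReceived":
        return False
    params = message["params"]
    is_preflight = params.get("type") == "Preflight"
    is_json = "json" in params["response"].get("mimeType", "")
    return is_json and not is_preflight
```

In the gist's loop it would be used as for log in filter(is_json_response, logs):.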
Hello,
is there a way to use this with Selenium Grid (remote)?
With Selenium Grid I can catch the request but never the response:
import json

from selenium import webdriver
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_argument('--ignore-ssl-errors=yes')
options.add_argument('--ignore-certificate-errors')
options.add_experimental_option('w3c', True)
# Try to catch XHR response
options.set_capability(
    "goog:loggingPrefs", {"performance": "ALL"}
)
driver = webdriver.Remote(
    command_executor='http://'+GRID_HOST+'/wd/hub',
    options=options,
)
driver.get("https://www.mywebsite.com/")
searchbox = driver.find_element(By.ID, "searchbox")
searchbox.send_keys("type something in the searchbox")
logs_raw = driver.get_log("performance")
logs = [json.loads(lr["message"])["message"] for lr in logs_raw]
for log in filter(log_filter, logs):
    request_id = log["params"]["requestId"]
    resp_url = log["params"]["response"]["url"]
    if 'aj_recherche' in resp_url:
        response = driver.execute_cdp_cmd("Network.getResponseBody", {"requestId": request_id})
I can then log every request associated with the searchbox (AJAX),
but never the JSON response (which I can see in the Chrome dev console).
Any idea?
@ofostier Yes! It took me some time to figure out, but starting from Selenium 4.16.0 you should replace response = driver.execute_cdp_cmd("Network.getResponseBody", {"requestId": request_id})
with:
response = driver.execute(
driver_command="executeCdpCommand",
params={
"cmd": "Network.getResponseBody",
"params": {"requestId": request_id},
},
)
body = response["value"]["body"]
source: get_browser_request_body(driver: WebDriver, request_id: str) answer from Borys Oliinyk.
Also consider what @nathan-fiscaletti mentioned about avoiding error -32000. In my case it happens for responses captured in previous sessions, where I had cleared session and local storage but was still using the same webdriver Remote() instance.
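Both variants can be combined in one helper so the same code works with a local webdriver.Chrome and with webdriver.Remote on a Grid. As far as I know, execute_cdp_cmd is defined on Selenium's Chromium driver classes but not on webdriver.Remote, which is why the raw "executeCdpCommand" call is needed there. A minimal sketch; get_response_body is a hypothetical name, and the base64 handling follows the body/base64Encoded fields that Network.getResponseBody returns:

```python
import base64


def get_response_body(driver, request_id):
    """Fetch a response body via the CDP command Network.getResponseBody.

    Uses driver.execute_cdp_cmd when the driver class provides it; otherwise
    sends the raw "executeCdpCommand" command, as in the Selenium 4.16+ answer.
    """
    params = {"requestId": request_id}
    if hasattr(driver, "execute_cdp_cmd"):
        result = driver.execute_cdp_cmd("Network.getResponseBody", params)
    else:
        response = driver.execute(
            driver_command="executeCdpCommand",
            params={"cmd": "Network.getResponseBody", "params": params},
        )
        result = response["value"]
    body = result["body"]
    # Binary bodies (images etc.) are returned base64-encoded.
    if result.get("base64Encoded"):
        return base64.b64decode(body)
    return body
```

It returns str for text bodies and bytes for base64-encoded ones, so a JSON body can be passed straight to json.loads.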
It's better to wait for the Network.loadingFinished event before calling driver.execute_cdp_cmd("Network.getResponseBody", {"requestId": request_id}) to prevent this issue.
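The performance log is a snapshot rather than a live event stream, so one way to approximate that advice with the gist's approach is to fetch bodies only for requests that already have a Network.loadingFinished entry in the log. A minimal sketch of this swapped-in approximation (not true event listening), assuming messages were parsed as in the gist; finished_responses is a hypothetical name:

```python
def finished_responses(messages):
    """Return responseReceived events whose request also reached loadingFinished.

    `messages` are parsed performance-log messages (dicts with "method" and
    "params"). Preflight responses are skipped as well, since fetching their
    bodies has been reported to fail with error -32000.
    """
    finished_ids = {
        m["params"]["requestId"]
        for m in messages
        if m.get("method") == "Network.loadingFinished"
    }
    return [
        m
        for m in messages
        if m.get("method") == "Network.responseReceived"
        and m["params"].get("type") != "Preflight"
        and m["params"]["requestId"] in finished_ids
    ]
```

Requests that are still loading when get_log is called are simply left out; calling get_log again later may pick them up.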
I'm looking to print the response after clicking a button, to know whether the click succeeded or failed. Currently the only way to see that status is to open DevTools, go to the Network tab, and check the response manually,
so I need a way to print this status to the log.
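The HTTP status code is part of each Network.responseReceived event (params.response.status), so it can be read from the same performance log without opening DevTools. A minimal sketch, assuming the raw entries from driver.get_log("performance") are fetched after the click and after giving the request time to complete; response_statuses is a hypothetical name and "/api/submit" in the usage comment is a placeholder URL fragment:

```python
import json


def response_statuses(performance_logs, url_part):
    """Return (url, status) for each response whose URL contains `url_part`.

    `performance_logs` is the raw list from driver.get_log("performance").
    """
    statuses = []
    for entry in performance_logs:
        message = json.loads(entry["message"])["message"]
        if message["method"] == "Network.responseReceived":
            response = message["params"]["response"]
            if url_part in response["url"]:
                statuses.append((response["url"], response["status"]))
    return statuses


# Usage after the click (driver and "/api/submit" are placeholders):
# for url, status in response_statuses(driver.get_log("performance"), "/api/submit"):
#     print(url, status)
```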