@clintval
Last active November 18, 2017 06:43
A Selenium web driver and a Python virtual display
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-warning\">\n",
" <h1>Parsing a JS-rendered webpage with Python and a virtual web driver</h1>\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<bound method Tag.prettify of <html><head>\n",
"<title>Example Domain</title>\n",
"<meta charset=\"utf-8\"/>\n",
"<meta content=\"text/html; charset=utf-8\" http-equiv=\"Content-type\"/>\n",
"<meta content=\"width=device-width, initial-scale=1\" name=\"viewport\"/>\n",
"<style type=\"text/css\">\n",
" body {\n",
" background-color: #f0f0f2;\n",
" margin: 0;\n",
" padding: 0;\n",
" font-family: \"Open Sans\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;\n",
" \n",
" }\n",
" div {\n",
" width: 600px;\n",
" margin: 5em auto;\n",
" padding: 50px;\n",
" background-color: #fff;\n",
" border-radius: 1em;\n",
" }\n",
" a:link, a:visited {\n",
" color: #38488f;\n",
" text-decoration: none;\n",
" }\n",
" @media (max-width: 700px) {\n",
" body {\n",
" background-color: #fff;\n",
" }\n",
" div {\n",
" width: auto;\n",
" margin: 0 auto;\n",
" border-radius: 0;\n",
" padding: 1em;\n",
" }\n",
" }\n",
" </style>\n",
"</head>\n",
"<body>\n",
"<div>\n",
"<h1>Example Domain</h1>\n",
"<p>This domain is established to be used for illustrative examples in documents. You may use this\n",
" domain in examples without prior coordination or asking for permission.</p>\n",
"<p><a href=\"http://www.iana.org/domains/example\">More information...</a></p>\n",
"</div>\n",
"</body></html>>"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
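"from bs4 import BeautifulSoup\n",
"from pyvirtualdisplay import Display\n",
"from selenium import webdriver\n",
"\n",
"URL = 'http://www.example.com/'\n",
"\n",
"# Run Firefox inside an invisible virtual display so no window opens.\n",
"with Display(visible=0, size=(800, 600)):\n",
"    driver = webdriver.Firefox()\n",
"    try:\n",
"        driver.get(URL)\n",
"        # Source of the page as currently loaded in the browser.\n",
"        source = driver.page_source\n",
"    finally:\n",
"        # Always close the browser, even if the page load fails.\n",
"        driver.quit()\n",
"\n",
"soup = BeautifulSoup(source, 'html.parser')\n",
"# Call prettify(); the bare attribute only displays the bound method.\n",
"print(soup.prettify())"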
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
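The notebook's fetch step can also be factored into small reusable helpers. The following is a minimal sketch, not part of the original gist: the names `quitting`, `fetch_rendered_source`, and `fetch_pretty` are hypothetical, and `fetch_pretty` assumes Firefox, its driver, and a virtual display backend are installed. The point is that the browser is closed even when the page load raises, and that `prettify()` is actually called, since evaluating the bare `soup.prettify` attribute only displays a bound method.

```python
from contextlib import contextmanager


@contextmanager
def quitting(driver):
    """Yield the driver and call its quit() method on exit, even on error."""
    try:
        yield driver
    finally:
        driver.quit()


def fetch_rendered_source(url, make_driver):
    """Load url with a driver from make_driver() and return its page source."""
    with quitting(make_driver()) as driver:
        driver.get(url)
        return driver.page_source


def fetch_pretty(url):
    """Hypothetical end-to-end helper; needs a working Firefox setup."""
    # Third-party imports are kept local so the helpers above stay importable
    # on machines without selenium, pyvirtualdisplay, or bs4 installed.
    from bs4 import BeautifulSoup
    from pyvirtualdisplay import Display
    from selenium import webdriver

    with Display(visible=0, size=(800, 600)):
        source = fetch_rendered_source(url, webdriver.Firefox)
    # prettify() must be called; the bare attribute is only a bound method.
    return BeautifulSoup(source, 'html.parser').prettify()
```

Passing the driver factory as an argument keeps `fetch_rendered_source` independent of any particular browser, so it can be exercised with a stand-in object; `print(fetch_pretty('http://www.example.com/'))` would be the equivalent of the notebook cell.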