Skip to content

Instantly share code, notes, and snippets.

@andymithamclarke
Last active February 2, 2022 15:05
Show Gist options
  • Save andymithamclarke/3cf67b67197c41053469488dbbd16824 to your computer and use it in GitHub Desktop.
Save andymithamclarke/3cf67b67197c41053469488dbbd16824 to your computer and use it in GitHub Desktop.
Reads a list of search queries as generated by Google Adwords Keyword Planner and performs a search for each query, returning the top 10 URLs for each query.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"source": "## 📟 Returning Top Ranking URLs from a List of Google Search Queries\n\nThis file reads a list of search queries as generated by <a href=\"https://ads.google.com/intl/en_uk/home/tools/keyword-planner/\" target=\"_blank\">Google Adwords Keyword Planner</a> and performs a search for each query, returning the top 10 URLs for each query. \n\nThe result is a dataset where each row represents a single URL.\n\n**Please duplicate this file before running it**\n ",
"metadata": {
"cell_id": "45c7b125-798d-46b4-85a0-164f447a6b8a",
"tags": [],
"deepnote_cell_type": "markdown"
}
},
{
"cell_type": "markdown",
"source": "### ⤴️ Module Imports ",
"metadata": {
"cell_id": "d409db02-c6ea-46c7-8fe3-57abf05b5441",
"tags": [],
"deepnote_cell_type": "markdown"
}
},
{
"cell_type": "code",
"metadata": {
"cell_id": "97038e2b-a6c4-437a-a3c9-1943b174a428",
"tags": [],
"deepnote_to_be_reexecuted": false,
"source_hash": "14851504",
"execution_start": 1643811678442,
"execution_millis": 8966,
"output_cleared": false,
"deepnote_cell_type": "code"
},
"source": "!pip install googlesearch-python",
"outputs": [
{
"name": "stdout",
"text": "",
"output_type": "stream"
}
],
"execution_count": null
},
{
"cell_type": "code",
"metadata": {
"tags": [],
"cell_id": "1f6667b6-c753-4367-b40e-7a170eb9419d",
"allow_embed": false,
"deepnote_to_be_reexecuted": false,
"source_hash": "7230bdf1",
"execution_start": 1643811687425,
"execution_millis": 172,
"deepnote_output_heights": [
194
],
"deepnote_cell_type": "code"
},
"source": "# Module Imports\n\nfrom googlesearch import search\nimport pandas as pd\nimport numpy as np\nfrom helpers import explode\nimport time\nfrom tqdm import tqdm\ntqdm.pandas()",
"outputs": [],
"execution_count": null
},
{
"cell_type": "code",
"metadata": {
"cell_id": "53bd93cc-9488-4040-802d-c1427b8a90ff",
"tags": [],
"deepnote_to_be_reexecuted": false,
"source_hash": "4ed0057b",
"execution_start": 1643811687645,
"execution_millis": 45,
"deepnote_table_state": {
"pageSize": 10,
"pageIndex": 0,
"filters": [],
"sortBy": []
},
"deepnote_table_loading": false,
"allow_embed": "code_output",
"output_cleared": false,
"deepnote_cell_type": "code"
},
"source": "# Load & Inspect Google Adwords Keyword Planner Dataset\n\ndataset_name = \"KeywordPlannerDataset\"\nqueries = pd.read_csv(dataset_name + '.csv')\nqueries.head()",
"outputs": [
{
"output_type": "execute_result",
"execution_count": 3,
"data": {
"application/vnd.deepnote.dataframe.v3+json": {
"column_count": 6,
"row_count": 5,
"columns": [
{
"name": "Keyword",
"dtype": "object",
"stats": {
"unique_count": 5,
"nan_count": 0,
"categories": [
{
"name": "keyword optimization",
"count": 1
},
{
"name": "moz seo guide",
"count": 1
},
{
"name": "3 others",
"count": 3
}
]
}
},
{
"name": "Currency",
"dtype": "object",
"stats": {
"unique_count": 1,
"nan_count": 0,
"categories": [
{
"name": "GBP",
"count": 5
}
]
}
},
{
"name": "Avg. monthly searches",
"dtype": "float64",
"stats": {
"unique_count": 2,
"nan_count": 0,
"min": "50.0",
"max": "500.0",
"histogram": [
{
"bin_start": 50,
"bin_end": 95,
"count": 3
},
{
"bin_start": 95,
"bin_end": 140,
"count": 0
},
{
"bin_start": 140,
"bin_end": 185,
"count": 0
},
{
"bin_start": 185,
"bin_end": 230,
"count": 0
},
{
"bin_start": 230,
"bin_end": 275,
"count": 0
},
{
"bin_start": 275,
"bin_end": 320,
"count": 0
},
{
"bin_start": 320,
"bin_end": 365,
"count": 0
},
{
"bin_start": 365,
"bin_end": 410,
"count": 0
},
{
"bin_start": 410,
"bin_end": 455,
"count": 0
},
{
"bin_start": 455,
"bin_end": 500,
"count": 2
}
]
}
},
{
"name": "Three month change",
"dtype": "object",
"stats": {
"unique_count": 2,
"nan_count": 0,
"categories": [
{
"name": "0%",
"count": 4
},
{
"name": "-90%",
"count": 1
}
]
}
},
{
"name": "YoY change",
"dtype": "object",
"stats": {
"unique_count": 3,
"nan_count": 0,
"categories": [
{
"name": "0%",
"count": 3
},
{
"name": "900%",
"count": 1
},
{
"name": "-90%",
"count": 1
}
]
}
},
{
"name": "Competition",
"dtype": "object",
"stats": {
"unique_count": 1,
"nan_count": 0,
"categories": [
{
"name": "Low",
"count": 5
}
]
}
},
{
"name": "_deepnote_index_column",
"dtype": "int64"
}
],
"rows": [
{
"Keyword": "keyword optimization",
"Currency": "GBP",
"Avg. monthly searches": 500,
"Three month change": "0%",
"YoY change": "900%",
"Competition": "Low",
"_deepnote_index_column": 0
},
{
"Keyword": "moz seo guide",
"Currency": "GBP",
"Avg. monthly searches": 500,
"Three month change": "-90%",
"YoY change": "-90%",
"Competition": "Low",
"_deepnote_index_column": 1
},
{
"Keyword": "keyword optimization tool",
"Currency": "GBP",
"Avg. monthly searches": 50,
"Three month change": "0%",
"YoY change": "0%",
"Competition": "Low",
"_deepnote_index_column": 2
},
{
"Keyword": "best seo tips",
"Currency": "GBP",
"Avg. monthly searches": 50,
"Three month change": "0%",
"YoY change": "0%",
"Competition": "Low",
"_deepnote_index_column": 3
},
{
"Keyword": "local seo keyword research",
"Currency": "GBP",
"Avg. monthly searches": 50,
"Three month change": "0%",
"YoY change": "0%",
"Competition": "Low",
"_deepnote_index_column": 4
}
]
},
"text/plain": " Keyword Currency Avg. monthly searches \\\n0 keyword optimization GBP 500.0 \n1 moz seo guide GBP 500.0 \n2 keyword optimization tool GBP 50.0 \n3 best seo tips GBP 50.0 \n4 local seo keyword research GBP 50.0 \n\n Three month change YoY change Competition \n0 0% 900% Low \n1 -90% -90% Low \n2 0% 0% Low \n3 0% 0% Low \n4 0% 0% Low ",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Keyword</th>\n <th>Currency</th>\n <th>Avg. monthly searches</th>\n <th>Three month change</th>\n <th>YoY change</th>\n <th>Competition</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>keyword optimization</td>\n <td>GBP</td>\n <td>500.0</td>\n <td>0%</td>\n <td>900%</td>\n <td>Low</td>\n </tr>\n <tr>\n <th>1</th>\n <td>moz seo guide</td>\n <td>GBP</td>\n <td>500.0</td>\n <td>-90%</td>\n <td>-90%</td>\n <td>Low</td>\n </tr>\n <tr>\n <th>2</th>\n <td>keyword optimization tool</td>\n <td>GBP</td>\n <td>50.0</td>\n <td>0%</td>\n <td>0%</td>\n <td>Low</td>\n </tr>\n <tr>\n <th>3</th>\n <td>best seo tips</td>\n <td>GBP</td>\n <td>50.0</td>\n <td>0%</td>\n <td>0%</td>\n <td>Low</td>\n </tr>\n <tr>\n <th>4</th>\n <td>local seo keyword research</td>\n <td>GBP</td>\n <td>50.0</td>\n <td>0%</td>\n <td>0%</td>\n <td>Low</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {}
}
],
"execution_count": null
},
{
"cell_type": "markdown",
"source": "### 🌐 Perform Search",
"metadata": {
"cell_id": "2f42c9f6-c8da-4be9-a8df-c8531ad03b82",
"tags": [],
"deepnote_cell_type": "markdown"
}
},
{
"cell_type": "code",
"metadata": {
"cell_id": "f3e263c3-1e9c-4ba1-8164-594ffc130dcd",
"tags": [],
"deepnote_to_be_reexecuted": false,
"source_hash": "4c996b0",
"execution_start": 1643811687691,
"execution_millis": 106089,
"allow_embed": "code_output",
"output_cleared": false,
"deepnote_cell_type": "code"
},
"source": "# Define function to perform Google search and return 10 top-ranking URLs\n\ndef perform_search(x):\n result = [j for j in search(x, num_results=10)]\n return result\n\n\n# Perform the search\n\nqueries['URLs'] = queries['Keyword'].progress_apply(perform_search)",
"outputs": [
{
"name": "stderr",
"text": "100%|██████████| 137/137 [01:46<00:00, 1.29it/s]\n",
"output_type": "stream"
}
],
"execution_count": null
},
{
"cell_type": "markdown",
"source": "### 🧮 Transforming the Dataset",
"metadata": {
"cell_id": "204f596c-28da-4be5-9b85-a1892d1f5930",
"tags": [],
"deepnote_cell_type": "markdown"
}
},
{
"cell_type": "code",
"metadata": {
"cell_id": "65a168a3-b31a-4438-bcf2-ceaee0f3e970",
"tags": [],
"deepnote_to_be_reexecuted": false,
"source_hash": "1834e46c",
"execution_start": 1643811793741,
"execution_millis": 4,
"deepnote_output_heights": [
79
],
"deepnote_cell_type": "code"
},
"source": "# Add column of lists [1 ... 10] to track URL ranking index\n\nqueries['Ranking'] = queries['URLs'].apply(lambda x: list(range(1, len(x)+1)))",
"outputs": [],
"execution_count": null
},
{
"cell_type": "code",
"metadata": {
"cell_id": "7afa0bc2-669b-4f79-b4dd-4f687e7f7ed4",
"tags": [],
"deepnote_to_be_reexecuted": false,
"source_hash": "f45082e8",
"execution_start": 1643811793750,
"execution_millis": 89,
"allow_embed": "code_output",
"output_cleared": false,
"deepnote_cell_type": "code"
},
"source": "# Explode Queries Dataframe to Create Domain Level Dataframe (One URL per row)\n\ndomain_df = explode(queries, ['URLs', 'Ranking'], fill_value='', preserve_index=True)\n\n# Remove & reorder columns for clarity\ndomain_df = domain_df[['URLs', 'Ranking', 'Keyword', 'Avg. monthly searches', 'Competition']]\n\n# Inspect exploded dataset\n\ndomain_df.head()",
"outputs": [
{
"output_type": "execute_result",
"execution_count": 6,
"data": {
"application/vnd.deepnote.dataframe.v3+json": {
"column_count": 5,
"row_count": 5,
"columns": [
{
"name": "URLs",
"dtype": "object",
"stats": {
"unique_count": 5,
"nan_count": 0,
"categories": [
{
"name": "https://www.wordstream.com/blog/ws/2010/04/14/keyword-optimization",
"count": 1
},
{
"name": "https://www.semrush.com/blog/keyword-optimization/",
"count": 1
},
{
"name": "3 others",
"count": 3
}
]
}
},
{
"name": "Ranking",
"dtype": "int64",
"stats": {
"unique_count": 5,
"nan_count": 0,
"min": "1",
"max": "5",
"histogram": [
{
"bin_start": 1,
"bin_end": 1.4,
"count": 1
},
{
"bin_start": 1.4,
"bin_end": 1.8,
"count": 0
},
{
"bin_start": 1.8,
"bin_end": 2.2,
"count": 1
},
{
"bin_start": 2.2,
"bin_end": 2.6,
"count": 0
},
{
"bin_start": 2.6,
"bin_end": 3,
"count": 0
},
{
"bin_start": 3,
"bin_end": 3.4000000000000004,
"count": 1
},
{
"bin_start": 3.4000000000000004,
"bin_end": 3.8000000000000003,
"count": 0
},
{
"bin_start": 3.8000000000000003,
"bin_end": 4.2,
"count": 1
},
{
"bin_start": 4.2,
"bin_end": 4.6,
"count": 0
},
{
"bin_start": 4.6,
"bin_end": 5,
"count": 1
}
]
}
},
{
"name": "Keyword",
"dtype": "object",
"stats": {
"unique_count": 1,
"nan_count": 0,
"categories": [
{
"name": "keyword optimization",
"count": 5
}
]
}
},
{
"name": "Avg. monthly searches",
"dtype": "float64",
"stats": {
"unique_count": 1,
"nan_count": 0,
"min": "500.0",
"max": "500.0",
"histogram": [
{
"bin_start": 499.5,
"bin_end": 499.6,
"count": 0
},
{
"bin_start": 499.6,
"bin_end": 499.7,
"count": 0
},
{
"bin_start": 499.7,
"bin_end": 499.8,
"count": 0
},
{
"bin_start": 499.8,
"bin_end": 499.9,
"count": 0
},
{
"bin_start": 499.9,
"bin_end": 500,
"count": 0
},
{
"bin_start": 500,
"bin_end": 500.1,
"count": 5
},
{
"bin_start": 500.1,
"bin_end": 500.2,
"count": 0
},
{
"bin_start": 500.2,
"bin_end": 500.3,
"count": 0
},
{
"bin_start": 500.3,
"bin_end": 500.4,
"count": 0
},
{
"bin_start": 500.4,
"bin_end": 500.5,
"count": 0
}
]
}
},
{
"name": "Competition",
"dtype": "object",
"stats": {
"unique_count": 1,
"nan_count": 0,
"categories": [
{
"name": "Low",
"count": 5
}
]
}
},
{
"name": "_deepnote_index_column",
"dtype": "int64"
}
],
"rows": [
{
"URLs": "https://www.wordstream.com/blog/ws/2010/04/14/keyword-optimization",
"Ranking": 1,
"Keyword": "keyword optimization",
"Avg. monthly searches": 500,
"Competition": "Low",
"_deepnote_index_column": 0
},
{
"URLs": "https://www.semrush.com/blog/keyword-optimization/",
"Ranking": 2,
"Keyword": "keyword optimization",
"Avg. monthly searches": 500,
"Competition": "Low",
"_deepnote_index_column": 0
},
{
"URLs": "https://blog.alexa.com/keyword-optimization/",
"Ranking": 3,
"Keyword": "keyword optimization",
"Avg. monthly searches": 500,
"Competition": "Low",
"_deepnote_index_column": 0
},
{
"URLs": "https://developers.google.com/search/docs/beginner/seo-starter-guide",
"Ranking": 4,
"Keyword": "keyword optimization",
"Avg. monthly searches": 500,
"Competition": "Low",
"_deepnote_index_column": 0
},
{
"URLs": "/search?q=keyword+optimization&num=11&hl=en&tbm=isch&source=iu&ictx=1&vet=1&fir=LMyQKVcestFYKM%252C…",
"Ranking": 5,
"Keyword": "keyword optimization",
"Avg. monthly searches": 500,
"Competition": "Low",
"_deepnote_index_column": 0
}
]
},
"text/plain": " URLs Ranking \\\n0 https://www.wordstream.com/blog/ws/2010/04/14/... 1 \n0 https://www.semrush.com/blog/keyword-optimizat... 2 \n0 https://blog.alexa.com/keyword-optimization/ 3 \n0 https://developers.google.com/search/docs/begi... 4 \n0 /search?q=keyword+optimization&num=11&hl=en&tb... 5 \n\n Keyword Avg. monthly searches Competition \n0 keyword optimization 500.0 Low \n0 keyword optimization 500.0 Low \n0 keyword optimization 500.0 Low \n0 keyword optimization 500.0 Low \n0 keyword optimization 500.0 Low ",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>URLs</th>\n <th>Ranking</th>\n <th>Keyword</th>\n <th>Avg. monthly searches</th>\n <th>Competition</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>https://www.wordstream.com/blog/ws/2010/04/14/...</td>\n <td>1</td>\n <td>keyword optimization</td>\n <td>500.0</td>\n <td>Low</td>\n </tr>\n <tr>\n <th>0</th>\n <td>https://www.semrush.com/blog/keyword-optimizat...</td>\n <td>2</td>\n <td>keyword optimization</td>\n <td>500.0</td>\n <td>Low</td>\n </tr>\n <tr>\n <th>0</th>\n <td>https://blog.alexa.com/keyword-optimization/</td>\n <td>3</td>\n <td>keyword optimization</td>\n <td>500.0</td>\n <td>Low</td>\n </tr>\n <tr>\n <th>0</th>\n <td>https://developers.google.com/search/docs/begi...</td>\n <td>4</td>\n <td>keyword optimization</td>\n <td>500.0</td>\n <td>Low</td>\n </tr>\n <tr>\n <th>0</th>\n <td>/search?q=keyword+optimization&amp;num=11&amp;hl=en&amp;tb...</td>\n <td>5</td>\n <td>keyword optimization</td>\n <td>500.0</td>\n <td>Low</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {}
}
],
"execution_count": null
},
{
"cell_type": "markdown",
"source": "### ⤵️ Export the Data",
"metadata": {
"cell_id": "64b920eb-fa12-42e2-9c68-316a7451e4a2",
"tags": [],
"deepnote_cell_type": "markdown"
}
},
{
"cell_type": "code",
"metadata": {
"cell_id": "c6d37cc9-ed3b-4c69-a173-7b338d5f5f25",
"tags": [],
"deepnote_to_be_reexecuted": false,
"source_hash": "7f2c2d0e",
"execution_start": 1643811793841,
"execution_millis": 13,
"deepnote_cell_type": "code"
},
"source": "# Export the final dataframe\n\ndomain_df.to_csv(dataset_name + '-Export' + '.csv', index=None)",
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"source": "",
"metadata": {
"tags": [],
"created_in_deepnote_cell": true,
"deepnote_cell_type": "markdown"
}
}
],
"nbformat": 4,
"nbformat_minor": 2,
"metadata": {
"orig_nbformat": 2,
"deepnote": {
"is_reactive": false
},
"deepnote_notebook_id": "78a50467-1c24-4cec-b383-11002f330f7c",
"deepnote_execution_queue": []
}
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment