Last active: September 21, 2022 06:02
Collecting Tweets from the Twitter API using tweepy.
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"source": "# Twitter Search API 🔎💬", | |
"metadata": { | |
"id": "D8dyDeluNACR", | |
"cell_id": "00001-41dd5298-8cc4-498c-948f-58809891a2b0", | |
"deepnote_cell_type": "markdown" | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": "This is a Graphext notebook made in conjunction with <a href=\"https://www.graphext.com/docs/collecting-twitter-data\" target=\"_blank\">our guide on collecting data from Twitter</a>.\n\nThe notebook uses <a href=\"http://docs.tweepy.org/en/v3.5.0/getting_started.html\" target=\"_blank\">Tweepy</a> to get data from the Twitter API.\n\nWithin the notebook, you can set a search query using the <a href=\"https://gist.github.com/andyclarkemedia/3b4e062a45323138bd28ec52d80eb7b1\" target=\"_blank\">Twitter query language</a> to return specific tweets. The results will be tweets matching your query that were posted within the last week.", | |
"metadata": { | |
"id": "ryqNyAz_NACW", | |
"cell_id": "00002-43d00c08-e941-436b-bf2b-b0f2f24f6629", | |
"deepnote_cell_type": "markdown" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "WHW_T20-NACY", | |
"cell_id": "00009-7c45fad0-3fba-49c6-b03d-8d19f2c7c34a", | |
"deepnote_to_be_reexecuted": false, | |
"source_hash": "b623e53d", | |
"execution_millis": 3, | |
"output_cleared": false, | |
"execution_start": 1615294745705, | |
"deepnote_cell_type": "code" | |
}, | |
"source": "# Install 'tweepy' package\n!pip install tweepy\n# Import 'tweepy' package\nimport tweepy", | |
"execution_count": 12, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": "## Setup ⚙️", | |
"metadata": { | |
"tags": [], | |
"cell_id": "00004-b47e35ec-92c2-49aa-88da-612638ded490", | |
"deepnote_cell_type": "markdown" | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": "The following steps configure `tweepy` to make API calls to Twitter.\n\nYou need to provide your own Twitter API keys: `api_key` and `api_secret`. Since we are using _App Authentication_, you **don't need** the `access_token` and `access_token_secret` that are usually required for retrieving data from the Twitter API with _User Authentication_.\n\nYou must sign up for a <a href=\"https://developer.twitter.com/en/apply-for-access\" target=\"_blank\">Twitter developer account</a> in order to get an `api_key` and an `api_secret`.\n\nFor details on how to do this, <a href=\"https://www.graphext.com/help-center-articles/collecting-twitter-data\" target=\"_blank\">follow our guide</a>.", | |
"metadata": { | |
"tags": [], | |
"cell_id": "00005-89269c3e-6309-4019-8118-0e238f28cd19", | |
"deepnote_cell_type": "markdown" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "nB_94U7YNACY", | |
"cell_id": "00011-4d117785-1691-41c2-ba13-b6bf56519ff1", | |
"deepnote_to_be_reexecuted": false, | |
"source_hash": "6e4c8aea", | |
"execution_millis": 0, | |
"execution_start": 1615294277266, | |
"deepnote_cell_type": "code" | |
}, | |
"source": "# Set your API keys - this information is yours alone and should be kept private\napi_key = ''\napi_secret = ''", | |
"execution_count": 2, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "P6WYJbZYNACZ", | |
"cell_id": "00012-fe0e4989-9f36-43e3-879b-176c347bdb5e", | |
"deepnote_to_be_reexecuted": false, | |
"source_hash": "4fca230f", | |
"execution_millis": 109, | |
"execution_start": 1615294279858, | |
"deepnote_cell_type": "code" | |
}, | |
"source": "# Authorize your api key and api secret \nauth = tweepy.AppAuthHandler(api_key, api_secret)\n# Call the API \napi = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True) ", | |
"execution_count": 3, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": "## Getting Tweets Using a Search Query 🐦", | |
"metadata": { | |
"id": "Xd-LuzRNNACZ", | |
"cell_id": "00016-081150c6-422b-4ff1-b619-dfa9326931eb", | |
"deepnote_cell_type": "markdown" | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": "Next, you will search the Twitter API for tweets using the <a href=\"https://gist.github.com/andyclarkemedia/3b4e062a45323138bd28ec52d80eb7b1\" target=\"_blank\">Twitter query language</a>.\n\nThe tweets matching your query will be saved in a dictionary using the ID of the tweet as the key.\n\nThe information returned by your query is as follows. The text inside `< ... >` tells Graphext the variable type of the values in each field.\n\n\n>`id<gx:number>`: The ID of the tweet, used as the key for each dictionary entry.\n\n>`text<gx:text>`: The text of the tweet.\n\n>`created_at<gx:date>`: The date the tweet was posted.\n\n>`author_id<gx:number>`: The ID of the user that posted the tweet.\n\n>`author_name<gx:category>`: The name of the user holding the account that posted the tweet.\n\n>`author_handler<gx:category>`: The handle of the account that posted the tweet.\n\n>`author_user_agent<gx:category>`: The type of device used to post the tweet.\n\n>`user_description<gx:text>`: The description of the account that posted the tweet.\n\n>`user_location<gx:text>`: The location, if provided, of the user that posted the tweet.\n\n>`author_avatar<gx:url>`: The image link of the profile picture used by the account that posted the tweet.\n\n>`user_followers_count<gx:number>`: The number of followers the account that posted the tweet has.\n\n>`user_created_at<gx:date>`: The creation date of the account that posted the tweet.\n\n>`user_following_count<gx:number>`: The number of accounts that the account posting the tweet follows.\n\n>`user_verified<gx:boolean>`: A `True` or `False` value denoting whether the account posting the tweet is verified.\n\n>`lang<gx:category>`: The language of the tweet.\n\n>`tweet_hashtags<gx:list[category]>`: The hashtags used inside the tweet.\n\n>`tweet_symbols<gx:list[category]>`: The cashtag symbols (e.g. `$TSLA`) used inside the tweet.\n\n>`mention_names<gx:list[category]>`: The handles of the accounts mentioned inside the tweet.\n\n>`mention_ids<gx:list[number]>`: The IDs of the accounts mentioned inside the tweet.\n\n>`n_retweets<gx:number>`: The number of retweets received by the tweet.\n\n>`n_favorites<gx:number>`: The number of times the tweet was favorited by other users.\n\n>`is_retweet<gx:boolean>`: A `True` or `False` value denoting whether the tweet is a retweet.\n\n>`original_tweet_user_handle<gx:category>`: If `is_retweet` is `True`, the handle of the user that originally posted the tweet.\n\n>`original_tweet_id<gx:number>`: If `is_retweet` is `True`, the ID of the original tweet.\n", | |
"metadata": { | |
"id": "xU6eHoUVNACa", | |
"cell_id": "00018-500919c2-eede-4269-a0bf-fa26c79558ef", | |
"deepnote_cell_type": "markdown" | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": "##### Check out the <a href=\"https://gist.github.com/andyclarkemedia/3b4e062a45323138bd28ec52d80eb7b1\" target=\"_blank\">Twitter query language</a> to learn how to format a search query using Twitter's advanced search operators.", | |
"metadata": { | |
"tags": [], | |
"cell_id": "00010-93106e85-d0d5-47d5-aa6a-705fd134af91", | |
"deepnote_cell_type": "markdown" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "QD-UAaUwNACa", | |
"cell_id": "00022-d43469b3-9d38-47d7-b4b6-45254fbe12a6", | |
"deepnote_to_be_reexecuted": false, | |
"source_hash": "cd1292af", | |
"execution_millis": 0, | |
"execution_start": 1615294331058, | |
"deepnote_cell_type": "code" | |
}, | |
"source": "# Declare your search query 🔎👇\nsearch_query = \"#Euro2020\"", | |
"execution_count": 8, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "iONOtMzTBqSA", | |
"cell_id": "00023-cb490c00-2136-4788-bcf4-97e8f0852670", | |
"deepnote_to_be_reexecuted": false, | |
"source_hash": "61b0bd2e", | |
"execution_millis": 2, | |
"execution_start": 1615294332723, | |
"deepnote_cell_type": "code" | |
}, | |
"source": "# Max tweet ID - lets you pause and resume collection from the tweet you last collected\n# Each time you start a NEW collection with a different query, run this cell to reset the ID of the tweet you want to start collecting from\nmax_id = None", | |
"execution_count": 9, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "HSMeNcb6NACb", | |
"cell_id": "00024-5429b43e-0ca7-4f09-a0b2-4851f33ee803", | |
"deepnote_to_be_reexecuted": false, | |
"source_hash": "572c10a4", | |
"execution_millis": 472, | |
"execution_start": 1615294334550, | |
"deepnote_cell_type": "code" | |
}, | |
"source": "# Make the Twitter API request\n# The printed count at the end will indicate how many tweets you have collected\n\n# Note that this process might take some time.\n# The loop will pause each time you exceed Twitter's rate limit for collecting tweets,\n# and will resume automatically once enough time has passed to begin collecting again.\n\n# Declare a dictionary to save incoming tweets\ntweets = {}\n\ntry:\n\n    # Use the 'lang' parameter to set a specific language for your query, e.g. 'es' for Spanish\n    parameters = {\n        \"q\": search_query,\n        \"count\": 100,\n        \"lang\": \"\",\n        \"max_id\": max_id\n    }\n\n    # Enter a number inside the '.items()' brackets to limit the size of your results.\n    # Rate limits: 100 tweets max per request; 180 requests per 15 min with user auth,\n    # 450 per 15 min with app auth (used here), i.e. up to 45K tweets per 15 min.\n    # Remove the argument to '.items()' if you want to collect everything available.\n    # https://developer.twitter.com/en/docs/twitter-api/v1/tweets/search/api-reference/get-search-tweets\n\n    tweet_requests = tweepy.Cursor(api.search, **parameters).items(10)\n\n    print(tweet_requests)\n\n    for tweet in tweet_requests:\n\n        # Add new tweet to dictionary\n        tweets[tweet.id] = {\n            'text<gx:text>': tweet.text,\n            'created_at<gx:date>': tweet.created_at,\n            'author_id<gx:number>': tweet.user.id,\n            'author_name<gx:category>': tweet.user.name,\n            'author_handler<gx:category>': str(tweet.user.screen_name),\n            'author_user_agent<gx:category>': tweet.source,\n            'user_description<gx:text>': tweet.user.description,\n            'user_location<gx:text>': tweet.user.location,\n            'author_avatar<gx:url>': tweet.user.profile_image_url,\n            'user_followers_count<gx:number>': tweet.user.followers_count,\n            'user_created_at<gx:date>': tweet.user.created_at,\n            'user_following_count<gx:number>': tweet.user.friends_count,\n            'user_verified<gx:boolean>': tweet.user.verified,\n            'lang<gx:category>': tweet.lang,\n            'tweet_hashtags<gx:list[category]>': tweet.entities['hashtags'],\n            'tweet_symbols<gx:list[category]>': tweet.entities['symbols'],\n            'mention_names<gx:list[category]>': [\"@\" + d['screen_name'] for d in tweet.entities['user_mentions'] if 'screen_name' in d],\n            'mention_ids<gx:list[number]>': [d['id'] for d in tweet.entities['user_mentions'] if 'id' in d],\n            'n_retweets<gx:number>': tweet.retweet_count,\n            'n_favorites<gx:number>': tweet.favorite_count,\n            'is_retweet<gx:boolean>': hasattr(tweet, 'retweeted_status')\n        }\n\n        if tweets[tweet.id]['is_retweet<gx:boolean>']:\n            tweets[tweet.id]['original_tweet_user_handle<gx:category>'] = tweet.retweeted_status.user.screen_name\n            tweets[tweet.id]['original_tweet_id<gx:number>'] = str(tweet.retweeted_status.id)\n\n        # Set the latest tweet ID as the tweet to resume from\n        max_id = tweet.id\n\n# Catch any error\nexcept Exception as e:\n    print(\"Something went wrong. Run the command again to continue from the ID of the tweet you last collected. You can also download the {} tweets you've collected so far by running the rest of the notebook. \\n \\n The error message:\".format(len(tweets)), e)\n\n# When the process completes\nelse:\n    print(\"\\nHurray! You've collected all tweets matching your query. You have {} tweets ready to export. Now run the rest of the notebook to download your data.\".format(len(tweets)))", | |
"execution_count": 10, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"text": "<tweepy.cursor.ItemIterator object at 0x7ff0263467d0>\n\nHurray! You've collected all tweets matching your query. You have 10 tweets ready to export. Now run the rest of the notebook to download your data.\n", | |
"output_type": "stream" | |
} | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": "## Exporting the Data ⬇️", | |
"metadata": { | |
"id": "KwQD2DkvNACb", | |
"cell_id": "00026-5a6d79eb-308e-463a-b414-bce8525ecca1", | |
"deepnote_cell_type": "markdown" | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": "After making the request, you can export the data inside your dictionary to a CSV file. First, load the dictionary into a dataframe; then export the dataframe as a CSV file.", | |
"metadata": { | |
"id": "sQNIF8F-NACb", | |
"cell_id": "00027-1f1c8ea3-491b-44e2-afa4-ed33a6e8745a", | |
"deepnote_cell_type": "markdown" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "GTAuc9UiNACc", | |
"cell_id": "00028-32b52cca-4910-42b7-90c3-69c108fdea47", | |
"deepnote_to_be_reexecuted": false, | |
"source_hash": "c4c9c7d3", | |
"execution_millis": 0, | |
"execution_start": 1615294736142, | |
"deepnote_cell_type": "code" | |
}, | |
"source": "# Import pandas\nimport pandas as pd\n\n# Save the dictionary as a dataframe\ndf = pd.DataFrame.from_dict(tweets, orient='index')\n\n# Convert the ID index to an 'id' column\ndf['id<gx:number>'] = df.index\n\n# Export as csv\ndf.to_csv(search_query + ' tweets.csv', index=False)\n\n# Inspect your dataset\ndf", | |
"execution_count": 12, | |
"outputs": [] | |
} | |
], | |
"nbformat": 4, | |
"nbformat_minor": 0, | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.7.3" | |
}, | |
"colab": { | |
"name": "Collecting-Twitter-Data.ipynb", | |
"provenance": [], | |
"collapsed_sections": [] | |
}, | |
"deepnote_notebook_id": "5b9519ee-f1f8-494f-b3fd-dc66a6e975f5", | |
"deepnote": {}, | |
"deepnote_execution_queue": [] | |
} | |
} |
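The notebook's collection cell relies on a `max_id` variable to pause and resume: each iteration records the ID of the last tweet stored, and Twitter's `max_id` parameter restricts the next search to tweets with IDs at or below that value. The pattern can be illustrated without any network access. This is a minimal pure-Python sketch under stated assumptions: `FAKE_FEED` and `collect()` are invented stand-ins for the search API, which returns tweets newest-first.

```python
# Sketch of the max_id resume pattern used in the collection cell.
# FAKE_FEED and collect() are hypothetical stand-ins for the Twitter search
# API; max_id means "only return tweets with ID <= max_id".

FAKE_FEED = [90, 80, 70, 60, 50, 40, 30, 20, 10]  # tweet IDs, newest first

def collect(max_id=None, limit=None):
    """Return feed items with id <= max_id (all items when max_id is None)."""
    items = [t for t in FAKE_FEED if max_id is None or t <= max_id]
    return items[:limit] if limit is not None else items

tweets = {}
max_id = None

# First run: interrupted after 4 tweets (limit simulates a rate-limit stop)
for tweet_id in collect(max_id, limit=4):
    tweets[tweet_id] = {"text": f"tweet {tweet_id}"}
    max_id = tweet_id  # remember where we got to

# Second run: do NOT reset max_id, so collection resumes from the last ID
# (the last tweet is fetched again, but the dictionary de-duplicates by key)
for tweet_id in collect(max_id):
    tweets[tweet_id] = {"text": f"tweet {tweet_id}"}
    max_id = tweet_id

print(len(tweets))  # → 9
```

This is also why the notebook says to run the `max_id = None` cell only when starting a new query: resetting it mid-collection would re-fetch everything from the top of the feed.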
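The export cell hinges on `pandas.DataFrame.from_dict` with `orient='index'`, which turns each top-level dictionary key (the tweet ID) into a row label. A small self-contained sketch with two invented tweets, assuming pandas is installed; column names follow the notebook's `<gx:type>` convention:

```python
import pandas as pd

# Two invented tweets keyed by ID, mirroring the notebook's structure
tweets = {
    101: {"text<gx:text>": "Hello #Euro2020", "n_retweets<gx:number>": 3},
    102: {"text<gx:text>": "Kick-off!", "n_retweets<gx:number>": 7},
}

# orient='index' makes each top-level key a row label
df = pd.DataFrame.from_dict(tweets, orient="index")

# Copy the row labels (tweet IDs) into a regular column, as the notebook does
df["id<gx:number>"] = df.index

# to_csv with no path returns the CSV text instead of writing a file
csv_text = df.to_csv(index=False)
print(df.shape)  # → (2, 3)
```

Because the IDs live in the index, duplicates are impossible; copying the index into an `id<gx:number>` column before exporting with `index=False` keeps the ID in the CSV while avoiding an unnamed index column.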