Last active
November 24, 2022 18:59
Translation + Sentiment Analysis
{ | |
"cells": [ | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Import Libraries\n", | |
"import deepl\n", | |
"import pandas as pd" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# import the csv file\n", | |
"df = pd.read_csv('translate-testdata.csv')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Load the API key (DeepL Free account)\n", | |
"auth_key = \"x-x-x-x-x:fx\"  # placeholder; replace with your own key\n", | |
"translator = deepl.Translator(auth_key)\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Translate a single string to American English; applied row by row to the \"Text\" column below\n", | |
"def translate_text(text):\n", | |
"    result = translator.translate_text(text, target_lang=\"EN-US\")\n", | |
"    return result.text" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Translate the text" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Apply the translate_text function to the df \n", | |
"df['English'] = df['Text'].apply(translate_text)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# export the df to a csv file\n", | |
"df.to_csv('translate-testdata.csv', index=False)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Clean up so we only have the original text and the English translation" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Reduce the dataframe to only the \"English\" column and \"Text\" column\n", | |
"df = df[['Text', 'English']]" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Sentiment Analysis \n", | |
"Run sentiment polarity analysis \n", | |
"Run sentiment emotion analysis" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# import a library for sentiment analysis \n", | |
"from textblob import TextBlob\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# apply the sentiment analysis to the \"English\" column\n", | |
"df['Sentiment'] = df['English'].apply(lambda x: TextBlob(x).sentiment.polarity)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Add one row to the \"Text\" column; pd.concat replaces the deprecated DataFrame.append\n", | |
"new_row = pd.DataFrame([{'Text': 'Finding the right product was difficult'}])\n", | |
"df = pd.concat([df, new_row], ignore_index=True)\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Now to try sentiment analysis on the pre-labeled dataset \n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 17, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"df_test = pd.read_csv('kaggle-test.csv', encoding='unicode_escape')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 18, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# make the \"text\" column a string \n", | |
"df_test['text'] = df_test['text'].astype(str)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### TextBlob: Polarity & Subjectivity\n", | |
"\n", | |
"TextBlob returns two scores for each text: polarity and subjectivity.\n", | |
"\n", | |
"The *polarity* score lies in the range [-1, 1], where:\n", | |
"* -1 marks the most negative wording, such as ‘disgusting’, ‘awful’, or ‘pathetic’\n", | |
"* 1 marks the most positive wording, such as ‘excellent’ or ‘best’\n", | |
"\n", | |
"The *subjectivity* score lies in the range [0, 1] and indicates how much personal opinion the text contains:\n", | |
"* A score close to 1 means the text contains more personal opinion than factual information\n", | |
"* Conversely, a score of 0 indicates a purely factual statement" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 19, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Apply sentiment analysis to text\n", | |
"df_test['polarity_textblob'] = df_test['text'].apply(lambda x: TextBlob(x).sentiment.polarity)\n", | |
"df_test['subjectivity_textblob'] = df_test['text'].apply(lambda x: TextBlob(x).sentiment.subjectivity)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 20, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Keep only the \"text\", \"sentiment\", \"polarity_textblob\", and \"subjectivity_textblob\" columns\n", | |
"df_test = df_test[['text', 'sentiment', 'polarity_textblob', 'subjectivity_textblob']]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3.10.1 64-bit", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.10.1" | |
}, | |
"orig_nbformat": 4, | |
"vscode": { | |
"interpreter": { | |
"hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49" | |
} | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |
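
The notebook stops after storing the TextBlob scores next to the pre-labeled `sentiment` column. One possible next step is to turn each polarity score into a coarse label and measure how often it agrees with the existing label. The sketch below is only an illustration: the helper names `polarity_to_label` and `agreement`, the 0.05 threshold, and the label strings "positive", "neutral", and "negative" are all assumptions, not part of the notebook or of TextBlob, and the example values are hand-written rather than taken from the data.

```python
def polarity_to_label(polarity: float, threshold: float = 0.05) -> str:
    """Map a polarity score in [-1, 1] to a coarse sentiment label.

    The threshold is an arbitrary illustrative cut-off, not a TextBlob setting.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def agreement(labels, polarities, threshold=0.05):
    """Return the fraction of rows whose mapped polarity label equals the given label."""
    if len(labels) != len(polarities):
        raise ValueError("labels and polarities must have the same length")
    if not labels:
        return 0.0
    hits = sum(
        label == polarity_to_label(score, threshold)
        for label, score in zip(labels, polarities)
    )
    return hits / len(labels)


# Hand-written example values (hypothetical), not results from the notebook's data
example_labels = ["positive", "negative", "neutral", "positive"]
example_polarities = [0.8, -0.6, 0.0, -0.1]
print(agreement(example_labels, example_polarities))
```

With the notebook's dataframe, the same helper could be called as `agreement(df_test['sentiment'].tolist(), df_test['polarity_textblob'].tolist())`, provided the values in the `sentiment` column really use these three label strings; that should be checked against the dataset first.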