Last active
December 2, 2020 01:32
This notebook shows how to get started analyzing text with the Text Analytics REST API and Python. It covers detecting language, analyzing sentiment, extracting key phrases, and identifying named entities.
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/Rishit-dagli/92e4526fd0c5c81c12e3fa12f5d3b2b4/text-analytics-azure.ipynb)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Text Analytics Azure\n", | |
"\n", | |
"This notebook shows how to get started analyzing text with the Text Analytics REST API and Python. It covers detecting language, analyzing sentiment, extracting key phrases, and identifying named entities. Feel free to star this gist if this was useful to you!" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Setup\n", | |
"\n", | |
"You need a key and an endpoint for a Text Analytics resource. Azure Cognitive Services are represented by Azure resources that you subscribe to, so first create a Text Analytics resource using the Azure portal or the Azure CLI on your local machine. You can look up the resource's key and endpoint in the Azure portal. Then create variables for the endpoint and subscription key." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import requests\n", | |
"# pprint is used to format the JSON response\n", | |
"from pprint import pprint" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import os\n", | |
"\n", | |
"subscription_key = \"<YOUR SUBSCRIPTION KEY>\"\n", | |
"endpoint = \"<YOUR ENDPOINT>\"" | |
] | |
}, | |
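{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Hard-coding a key in a notebook makes it easy to leak when you share the file. As a minimal sketch, you could read both values from environment variables instead. The names `TEXT_ANALYTICS_KEY` and `TEXT_ANALYTICS_ENDPOINT` are made up for this example, not something Azure defines; if they are not set, the values assigned above are kept." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import os\n", | |
"\n", | |
"# Hypothetical variable names; fall back to the values assigned above\n", | |
"subscription_key = os.environ.get(\"TEXT_ANALYTICS_KEY\", subscription_key)\n", | |
"# Drop any trailing slash so that appending \"/text/analytics/...\" does not produce \"//\"\n", | |
"endpoint = os.environ.get(\"TEXT_ANALYTICS_ENDPOINT\", endpoint).rstrip(\"/\")" | |
] | |
}, | |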
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Detect Language\n", | |
"\n", | |
"Append `/text/analytics/v3.0/languages` to the Text Analytics base endpoint to form the language detection URL. The payload to the API consists of a list of documents, each of which is a dictionary containing an `id` and a `text` key. The `text` value is the text to be analyzed, and the `id` can be any value that is unique within the request." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"language_api_url = endpoint + \"/text/analytics/v3.0/languages\"" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"documents = {\"documents\": [\n", | |
" {\"id\": \"1\", \"text\": \"An example sentence in English.\"},\n", | |
" {\"id\": \"2\", \"text\": \"une phrase d'exemple en français.\"},\n", | |
"]}" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We then use the `requests` library to send the documents to the API. Add your subscription key to the `Ocp-Apim-Subscription-Key` header, and send the request with `requests.post()`." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"{'documents': [{'detectedLanguage': {'confidenceScore': 1.0,\n", | |
" 'iso6391Name': 'en',\n", | |
" 'name': 'English'},\n", | |
" 'id': '1',\n", | |
" 'warnings': []},\n", | |
" {'detectedLanguage': {'confidenceScore': 1.0,\n", | |
" 'iso6391Name': 'fr',\n", | |
" 'name': 'French'},\n", | |
" 'id': '2',\n", | |
" 'warnings': []}],\n", | |
" 'errors': [],\n", | |
" 'modelVersion': '2020-09-01'}\n" | |
] | |
} | |
], | |
"source": [ | |
"headers = {\"Ocp-Apim-Subscription-Key\": subscription_key}\n", | |
"response = requests.post(language_api_url, headers=headers, json=documents)\n", | |
"languages = response.json()\n", | |
"pprint(languages)" | |
] | |
}, | |
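{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The parsed response is a plain dictionary, so you can pull out just the fields you need. Here is a minimal sketch, assuming `response` and `languages` still refer to the language detection request above and have the shape shown in its output. It first checks the HTTP status, then prints one line per document, and finally prints anything the service reported in the `errors` list instead of assuming that list is empty." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Raise an exception for HTTP errors (for example 401 for a bad key)\n", | |
"response.raise_for_status()\n", | |
"\n", | |
"for doc in languages.get(\"documents\", []):\n", | |
"    detected = doc[\"detectedLanguage\"]\n", | |
"    print(doc[\"id\"], detected[\"name\"], detected[\"iso6391Name\"], detected[\"confidenceScore\"])\n", | |
"\n", | |
"# Documents the service could not process are reported separately\n", | |
"for error in languages.get(\"errors\", []):\n", | |
"    print(\"Could not analyze document:\", error)" | |
] | |
}, | |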
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Analyze sentiment\n", | |
"\n", | |
"To detect the sentiment of a set of documents (ranging from negative to positive), append `/text/analytics/v3.0/sentiment` to the Text Analytics base endpoint to form the sentiment analysis URL." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"sentiment_url = endpoint + \"/text/analytics/v3.0/sentiment\"" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"As with the language detection example, create a dictionary with a `documents` key that holds a list of documents. Each document is a dictionary consisting of the `id`, the `text` to be analyzed, and the `language` of the text. This demo uses two real Play Store reviews." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"documents = {\"documents\": [\n", | |
" {\"id\": \"1\", \"language\": \"en\",\n", | |
" \"text\": \"This app is getting worse!!!! No notifications for calls, voicenotes rarely work for the person receiving them from me!!\"},\n", | |
" {\"id\": \"2\", \"language\": \"en\",\n", | |
" \"text\": \"The app is very great. Beyond expectation. This is my main messaging app. And so do my friends.\"}\n", | |
"]}" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"{'documents': [{'confidenceScores': {'negative': 0.85,\n", | |
" 'neutral': 0.14,\n", | |
" 'positive': 0.01},\n", | |
" 'id': '1',\n", | |
" 'sentences': [{'confidenceScores': {'negative': 1.0,\n", | |
" 'neutral': 0.0,\n", | |
" 'positive': 0.0},\n", | |
" 'length': 29,\n", | |
" 'offset': 0,\n", | |
" 'sentiment': 'negative',\n", | |
" 'text': 'This app is getting worse!!!!'},\n", | |
" {'confidenceScores': {'negative': 0.7,\n", | |
" 'neutral': 0.29,\n", | |
" 'positive': 0.01},\n", | |
" 'length': 89,\n", | |
" 'offset': 30,\n", | |
" 'sentiment': 'negative',\n", | |
" 'text': 'No notifications for calls, voicenotes '\n", | |
" 'rarely work for the person receiving '\n", | |
" 'them from me!'},\n", | |
" {'confidenceScores': {'negative': 0.03,\n", | |
" 'neutral': 0.91,\n", | |
" 'positive': 0.06},\n", | |
" 'length': 1,\n", | |
" 'offset': 119,\n", | |
" 'sentiment': 'neutral',\n", | |
" 'text': '!'}],\n", | |
" 'sentiment': 'negative',\n", | |
" 'warnings': []},\n", | |
" {'confidenceScores': {'negative': 0.0,\n", | |
" 'neutral': 0.0,\n", | |
" 'positive': 1.0},\n", | |
" 'id': '2',\n", | |
" 'sentences': [{'confidenceScores': {'negative': 0.0,\n", | |
" 'neutral': 0.0,\n", | |
" 'positive': 1.0},\n", | |
" 'length': 22,\n", | |
" 'offset': 0,\n", | |
" 'sentiment': 'positive',\n", | |
" 'text': 'The app is very great.'},\n", | |
" {'confidenceScores': {'negative': 0.1,\n", | |
" 'neutral': 0.55,\n", | |
" 'positive': 0.35},\n", | |
" 'length': 19,\n", | |
" 'offset': 23,\n", | |
" 'sentiment': 'neutral',\n", | |
" 'text': 'Beyond expectation.'},\n", | |
" {'confidenceScores': {'negative': 0.05,\n", | |
" 'neutral': 0.93,\n", | |
" 'positive': 0.02},\n", | |
" 'length': 30,\n", | |
" 'offset': 43,\n", | |
" 'sentiment': 'neutral',\n", | |
" 'text': 'This is my main messaging app.'},\n", | |
" {'confidenceScores': {'negative': 0.01,\n", | |
" 'neutral': 0.92,\n", | |
" 'positive': 0.07},\n", | |
" 'length': 21,\n", | |
" 'offset': 74,\n", | |
" 'sentiment': 'neutral',\n", | |
" 'text': 'And so do my friends.'}],\n", | |
" 'sentiment': 'positive',\n", | |
" 'warnings': []}],\n", | |
" 'errors': [],\n", | |
" 'modelVersion': '2020-04-01'}\n" | |
] | |
} | |
], | |
"source": [ | |
"headers = {\"Ocp-Apim-Subscription-Key\": subscription_key}\n", | |
"response = requests.post(sentiment_url, headers=headers, json=documents)\n", | |
"sentiments = response.json()\n", | |
"pprint(sentiments)" | |
] | |
}, | |
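{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The full response is fairly verbose. As a rough sketch, assuming `sentiments` has the shape shown in the output above, you can print the overall label for each document followed by the label the service assigned to each sentence it found." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"for doc in sentiments.get(\"documents\", []):\n", | |
"    # Document-level label\n", | |
"    print(f\"Document {doc['id']}: {doc['sentiment']}\")\n", | |
"    # Sentence-level labels, in the order the sentences appear\n", | |
"    for sentence in doc[\"sentences\"]:\n", | |
"        print(f\"  [{sentence['sentiment']}] {sentence['text']}\")" | |
] | |
}, | |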
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Extract key phrases\n", | |
"\n", | |
"To extract the key phrases from a set of documents, append `/text/analytics/v3.0/keyPhrases` to the Text Analytics base endpoint to form the key phrase extraction URL." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"keyphrase_url = endpoint + \"/text/analytics/v3.0/keyPhrases\"" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Create a collection of documents to analyze. Each document specifies its `language`; this example uses one English and one Spanish document." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"documents = {\"documents\": [\n", | |
" {\"id\": \"1\", \"language\": \"en\",\n", | |
" \"text\": \"I really enjoy the new XBox One S. It has a clean look, it has 4K/HDR resolution and it is affordable.\"},\n", | |
" {\"id\": \"2\", \"language\": \"es\",\n", | |
" \"text\": \"Si usted quiere comunicarse con Carlos, usted debe de llamarlo a su telefono movil. Carlos es muy responsable, pero necesita recibir una notificacion si hay algun problema.\"},\n", | |
"]}" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"{'documents': [{'id': '1',\n", | |
" 'keyPhrases': ['HDR resolution', 'new XBox', 'clean look'],\n", | |
" 'warnings': []},\n", | |
" {'id': '2',\n", | |
" 'keyPhrases': ['Carlos',\n", | |
" 'notificacion',\n", | |
" 'algun problema',\n", | |
" 'telefono movil'],\n", | |
" 'warnings': []}],\n", | |
" 'errors': [],\n", | |
" 'modelVersion': '2020-07-01'}\n" | |
] | |
} | |
], | |
"source": [ | |
"headers = {\"Ocp-Apim-Subscription-Key\": subscription_key}\n", | |
"response = requests.post(keyphrase_url, headers=headers, json=documents)\n", | |
"key_phrases = response.json()\n", | |
"pprint(key_phrases)" | |
] | |
}, | |
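{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The `id` you send with each document comes back in the response, which is how you match results to inputs. Here is a small sketch, assuming `documents` still holds the request payload from above and `key_phrases` has the shape shown in the output. The `texts` lookup is just a helper introduced for this example." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Map each id to the text that was sent with it\n", | |
"texts = {doc[\"id\"]: doc[\"text\"] for doc in documents[\"documents\"]}\n", | |
"\n", | |
"for doc in key_phrases.get(\"documents\", []):\n", | |
"    print(texts[doc[\"id\"]])\n", | |
"    print(\"  key phrases:\", \", \".join(doc[\"keyPhrases\"]))" | |
] | |
}, | |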
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Identify Entities\n", | |
"\n", | |
"To identify well-known entities (people, places, and things) in text documents, append `/text/analytics/v3.0/entities/recognition/general` to the Text Analytics base endpoint to form the entity recognition URL." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"entities_url = endpoint + \"/text/analytics/v3.0/entities/recognition/general\"" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Create a collection of documents, as in the previous examples." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"documents = {\"documents\": [\n", | |
" {\"id\": \"1\", \"text\": \"The Westin has really good service. You can call them at 111-111-1111\"}\n", | |
"]}" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"{'documents': [{'entities': [{'category': 'Organization',\n", | |
" 'confidenceScore': 0.26,\n", | |
" 'length': 6,\n", | |
" 'offset': 4,\n", | |
" 'text': 'Westin'},\n", | |
" {'category': 'Phone Number',\n", | |
" 'confidenceScore': 0.8,\n", | |
" 'length': 12,\n", | |
" 'offset': 57,\n", | |
" 'text': '111-111-1111'}],\n", | |
" 'id': '1',\n", | |
" 'warnings': []}],\n", | |
" 'errors': [],\n", | |
" 'modelVersion': '2020-04-01'}\n" | |
] | |
} | |
], | |
"source": [ | |
"headers = {\"Ocp-Apim-Subscription-Key\": subscription_key}\n", | |
"response = requests.post(entities_url, headers=headers, json=documents)\n", | |
"entities = response.json()\n", | |
"pprint(entities)" | |
] | |
}, | |
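{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Each entity comes with a `confidenceScore`, so you may want to ignore low-confidence matches. Here is a small sketch, assuming `entities` has the shape shown in the output above. The threshold `min_confidence = 0.5` is an arbitrary value picked for illustration, not a recommendation from the service. With the response shown above, it would keep the phone number and skip `Westin`." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"min_confidence = 0.5  # arbitrary threshold, chosen only for this example\n", | |
"\n", | |
"for doc in entities.get(\"documents\", []):\n", | |
"    for entity in doc[\"entities\"]:\n", | |
"        # Keep only entities the service is reasonably sure about\n", | |
"        if entity[\"confidenceScore\"] >= min_confidence:\n", | |
"            print(entity[\"text\"], entity[\"category\"], entity[\"confidenceScore\"])" | |
] | |
}, | |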
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"That's it for this quick demo of Text Analytics with the Azure Cognitive Services REST API. Feel free to star this gist if this was useful to you!" | |
] | |
} | |
], | |
"metadata": { | |
"environment": { | |
"name": "common-cpu.m59", | |
"type": "gcloud", | |
"uri": "gcr.io/deeplearning-platform-release/base-cpu:m59" | |
}, | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.7.8" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 4 | |
} |