Last active
May 18, 2021 16:51
-
-
Save ivopbernardo/dcd9842a7c2fe41ea15a48d28fadbe4f to your computer and use it in GitHub Desktop.
Examples around NLTK stemming
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
from nltk.tokenize import word_tokenize | |
from nltk.stem import PorterStemmer, SnowballStemmer, LancasterStemmer | |
porter = PorterStemmer() | |
snowball = SnowballStemmer(language='english') | |
lanc = LancasterStemmer() | |
sentence_example = ( | |
'This is definitely a controversy as the attorney labeled the case "extremely controversial"' | |
) | |
# Porter Stemmed version of sentence example | |
stemmed_sentence = [ | |
porter.stem(word) for word in word_tokenize(sentence_example) | |
] | |
# Examples of single words used in the post: | |
porter.stem('cats') | |
porter.stem('amazing') | |
porter.stem('amazement') | |
porter.stem('amaze') | |
porter.stem('amazed') | |
porter.stem('amazon') | |
porter.stem('nation') | |
porter.stem('premonition') | |
# Comparison between Porter and Snowball | |
porter.stem('loudly') | |
snowball.stem('loudly') | |
# Comparison between Snowball and Lancaster | |
porter.stem('salty') | |
snowball.stem('salty') | |
# Example of EU_Definition: | |
eu_definition = ''' | |
The European Union (EU) is a political and economic union of 27 member states that are located primarily in Europe. | |
Its members have a combined area of 4,233,255.3 km2 (1,634,469.0 sq mi) and an estimated total population of about 447 million. | |
The EU has developed an internal single market through a standardised system of laws that apply in all member states in those matters, | |
and only those matters, where members have agreed to act as one. EU policies aim to ensure the free movement of people, goods, | |
services and capital within the internal market; enact legislation in justice and home affairs; and maintain common policies on trade, | |
agriculture, fisheries and regional development. Passport controls have been abolished for travel within the Schengen Area. | |
A monetary union was established in 1999, coming into full force in 2002, and is composed of 19 EU member states which use the euro | |
currency. The EU has often been described as a sui generis political entity (without precedent or comparison). | |
The EU and European citizenship were established when the Maastricht Treaty came into force in 1993. | |
The EU traces its origins to the European Coal and Steel Community (ECSC) and the European Economic Community (EEC), established, | |
respectively, by the 1951 Treaty of Paris and 1957 Treaty of Rome. The original members of what came to be known as the European | |
Communities were the Inner Six: Belgium, France, Italy, Luxembourg, the Netherlands, and West Germany. The Communities and their | |
successors have grown in size by the accession of new member states and in power by the addition of policy areas to their remit. | |
The United Kingdom became the first member state to leave the EU on 31 January 2020. Before this, three territories of member states | |
had left the EU or its forerunners. The latest major amendment to the constitutional basis of the EU, the Treaty of Lisbon, | |
came into force in 2009. | |
''' | |
# Tokenizing and Stemming the eu_definition | |
tokenized_eu = word_tokenize(eu_definition) | |
porter_eu = [porter.stem(word) for word in tokenized_eu] |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment