# Linear regression for ideal Zipf's line
linear = LinearRegression()
linear.fit(
    X = np.log(np.array(df['rank'])).reshape(-1, 1),
    y = np.log(df['zipf_freq'])
)
# Print slope and intercept
print('Intercept: {intercept}\nSlope: {slope}'.format(
    intercept = linear.intercept_,
    slope = linear.coef_[0]
))

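The regression above fits log(frequency) against log(rank). As a rough cross-check that needs no scikit-learn, the same least-squares fit can be sketched in plain Python. The helper name `fit_log_log` and the sample data (top frequency 1000, exponent 1) are assumptions made for illustration and are not part of the original snippets. For an ideal Zipf line the slope should come out close to -alpha and the intercept close to log(max_freq).

```python
import math

def fit_log_log(ranks, freqs):
    """Ordinary least squares of log(freq) on log(rank); returns (intercept, slope)."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squares of x and cross-products of x and y around their means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical ideal Zipf data: frequency = 1000 / rank
ranks = list(range(1, 51))
freqs = [1000 / r for r in ranks]
intercept, slope = fit_log_log(ranks, freqs)
print('Intercept: {:.4f}\nSlope: {:.4f}'.format(intercept, slope))
```

On real word counts the fitted slope will generally differ from -1, which is one way to gauge how closely a corpus follows Zipf's law.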
# Data viz
plotnine.options.figure_size = (10, 4.8)
(
    ggplot(
        data = df
    )+
    geom_line(
        aes(
            x = 'rank',
            y = 'zipf_freq',

# Data viz
plotnine.options.figure_size = (10, 4.8)
(
    ggplot(
        data = df[:20]
    )+
    geom_bar(
        aes(
            x = 'word',
            y = 'actual_freq'

# List of data
l_data = []
# Highest frequency
max_freq = top_words[0][1]
# Alpha
alpha = 1
# Loop

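The loop body is cut off in the snippet above. One plausible form, assuming the usual Zipf formula `zipf_freq = max_freq / rank ** alpha`, is sketched below. The helper name `build_zipf_rows` and the sample counts are hypothetical; the column names `rank`, `word`, `actual_freq` and `zipf_freq` are the ones used elsewhere in these snippets.

```python
def build_zipf_rows(top_words, alpha=1):
    """Pair each (word, count) with its rank and the ideal Zipf frequency."""
    # Highest frequency: top_words is assumed sorted by descending count
    max_freq = top_words[0][1]
    rows = []
    for rank, (word, freq) in enumerate(top_words, start=1):
        rows.append({
            'rank': rank,
            'word': word,
            'actual_freq': freq,
            # Assumed Zipf formula: the top frequency scaled by 1 / rank^alpha
            'zipf_freq': max_freq / rank ** alpha,
        })
    return rows

# Hypothetical counts, for illustration only
top_words = [('the', 1200), ('of', 610), ('and', 395), ('to', 310)]
l_data = build_zipf_rows(top_words, alpha=1)
for row in l_data:
    print(row)
```

A list of dictionaries like this can be passed directly to `pandas.DataFrame` to obtain a table with those four columns.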
# How to get a list of top words
def getTopWords(
    text: str
):
    # Split text by its whitespace
    list_words = text.split()
    # Count the word frequencies
    word_freq = collections.Counter(list_words)
    # Get all words sorted by descending frequency (most_common() without n returns every word)
    top_words = word_freq.most_common()

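To make the behaviour of `Counter.most_common` concrete, here is a small self-contained sketch on a toy sentence. The function name `get_top_words`, its optional `n` parameter and the sample text are assumptions for illustration; the original function is truncated before its return statement.

```python
import collections

def get_top_words(text, n=None):
    """Return (word, count) pairs sorted by descending count.

    With n=None every distinct word is returned; otherwise only the top n.
    """
    # Split on any whitespace, then count each distinct token
    words = text.split()
    return collections.Counter(words).most_common(n)

sample = 'the cat and the dog and the bird'
print(get_top_words(sample))
print(get_top_words(sample, 2))
```

The first element of the result holds the most frequent word and its count, which is why the highest frequency can be read as `top_words[0][1]`.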
# Lower text
def lowerCase(text):
    return text.lower()
# Numbers removal
def numberRemoval(text):
    # Raw string avoids an invalid-escape warning for \d
    return re.sub(
        pattern = r'\d',
        repl = ' ',
        string = text
    )

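The two cleaning steps can be chained into one pass. The sketch below assumes lowercasing runs before digit removal; the names `lower_case`, `number_removal` and `clean`, and the sample string, are hypothetical. Each digit is replaced by a space rather than deleted, so neighbouring words are never glued together, and a later `split()` absorbs the extra whitespace.

```python
import re

def lower_case(text):
    """Lowercase the whole text."""
    return text.lower()

def number_removal(text):
    """Replace every digit with a single space."""
    return re.sub(r'\d', ' ', text)

def clean(text):
    """Apply lowercasing, then digit removal."""
    return number_removal(lower_case(text))

cleaned = clean('Chapter 12: The Whale')
print(repr(cleaned))
print(cleaned.split())
```

Punctuation such as the colon survives these two steps, so a separate punctuation filter would be needed if tokens like `:` should not be counted as words.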
# List of URL
urls = [
    'https://www.gutenberg.org/files/1661/1661-0.txt',
    'https://www.gutenberg.org/files/2701/2701-0.txt',
    'https://www.gutenberg.org/files/11/11-0.txt',
    'https://www.gutenberg.org/files/98/98-0.txt',
    'https://www.gutenberg.org/files/74/74-0.txt'
]
# Text

# HTTP library for Python
import requests
# Regular expression
import re
# Container datatypes (Counter)
import collections
# Data manipulation