Vikash Singh vi3k6i5
@vi3k6i5
vi3k6i5 / flashtext_extract_example.py
Created September 15, 2017 18:38
FlashText extract keywords from sentence
# pip install flashtext
from flashtext.keyword import KeywordProcessor
keyword_processor = KeywordProcessor()
keyword_processor.add_keyword('Big Apple', 'New York')
keyword_processor.add_keyword('Bay Area')
keywords_found = keyword_processor.extract_keywords('I love Big Apple and Bay Area.')
keywords_found
# ['New York', 'Bay Area']
@vi3k6i5
vi3k6i5 / flashtext_regex_timing.ipynb
Last active September 27, 2017 18:07
Time FlashText and Regex for increasing number of keywords
@vi3k6i5
vi3k6i5 / comparison.md
Last active September 16, 2017 11:35
Comparison results for FlashText vs Regex
Text Length 319065 Keywords Count 47326
FlashText 156 ms per loop
Compiled Regex 19.5 s per loop
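A gap like the one above is usually attributed to how the two approaches scale with the number of keywords: a regex alternation may have to try many alternatives at each position, while a trie is walked one character at a time regardless of how many keywords it holds. The following is a minimal sketch of both strategies using only the standard library. It is not FlashText's implementation; the names `build_trie`, `extract_with_trie`, and `extract_with_regex` are hypothetical, and the word-boundary handling is simplified (it treats only alphanumeric characters as part of a word).

```python
import re
import time

_END = "_end_"  # hypothetical marker key stored at the end of a complete keyword


def build_trie(keywords):
    """Build a nested-dict character trie from an iterable of keywords."""
    root = {}
    for keyword in keywords:
        node = root
        for char in keyword:
            node = node.setdefault(char, {})
        node[_END] = keyword
    return root


def extract_with_trie(text, trie):
    """Scan text once, returning the longest keyword match at each word start."""
    found = []
    i, n = 0, len(text)
    while i < n:
        # Only start a match at the beginning of a word.
        if i > 0 and text[i - 1].isalnum():
            i += 1
            continue
        node, j, match, match_end = trie, i, None, i
        while j < n and text[j] in node:
            node = node[text[j]]
            j += 1
            # Accept a keyword only if it also ends on a word boundary.
            if _END in node and (j == n or not text[j].isalnum()):
                match, match_end = node[_END], j
        if match is not None:
            found.append(match)
            i = match_end
        else:
            i += 1
    return found


def extract_with_regex(text, keywords):
    """Match all keywords with a single compiled alternation."""
    # Longest keywords first, so the alternation prefers the longest match.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, ordered)) + r")\b")
    return pattern.findall(text)


if __name__ == "__main__":
    keywords = ["Big Apple", "Bay Area"]
    text = "I love Big Apple and Bay Area. " * 1000
    trie = build_trie(keywords)
    for name, func, arg in (
        ("trie", extract_with_trie, trie),
        ("regex", extract_with_regex, keywords),
    ):
        start = time.perf_counter()
        func(text, arg)
        print(name, time.perf_counter() - start, "seconds")
```

With only two keywords the regex will often be faster, since `re` runs in C; the difference reported above appears with tens of thousands of keywords.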
@vi3k6i5
vi3k6i5 / flashtext_regex_timing_find_and_replace.ipynb
Created October 3, 2017 07:36
Find and replace FlashText and regex comparison
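As a rough illustration of the regex side of a find-and-replace comparison, one common approach is to compile all keywords into a single alternation and look up each match in a dictionary. This is a sketch under that assumption, not the notebook's code; `replace_with_regex` and its `replacements` mapping are hypothetical names. The FlashText counterpart would be `KeywordProcessor.replace_keywords`.

```python
import re


def replace_with_regex(text, replacements):
    """Replace every key of `replacements` found in `text` with its value."""
    if not replacements:
        return text
    # Longest keys first, so the alternation prefers the longest match.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b")
    # The callback maps each matched keyword to its replacement.
    return pattern.sub(lambda m: replacements[m.group(0)], text)


if __name__ == "__main__":
    print(replace_with_regex(
        "I love Big Apple and Bay Area.",
        {"Big Apple": "New York"},
    ))
```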
@vi3k6i5
vi3k6i5 / flashtext_regex_timing_keyword_extraction.py
Last active May 28, 2023 21:05
Benchmarking timing performance Keyword Extraction between regex and flashtext
#!/bin/python
from flashtext.keyword import KeywordProcessor
import random
import string
import re
import time
def get_word_of_length(str_length):
    # generate a random word of given length
    return ''.join(random.choice(string.ascii_lowercase) for _ in range(str_length))
@vi3k6i5
vi3k6i5 / flashtext_regex_timing_keyword_replace.py
Last active May 28, 2023 19:54
Benchmarking timing performance Keyword Replace between regex and flashtext
#!/bin/python
from flashtext.keyword import KeywordProcessor
import random
import string
import re
import time
def get_word_of_length(str_length):
    # generate a random word of given length
@vi3k6i5
vi3k6i5 / guided_lda_example.py
Created October 7, 2017 07:57
guidedlda example code
import numpy as np
import guidedlda
X = guidedlda.datasets.load_data(guidedlda.datasets.NYT)
vocab = guidedlda.datasets.load_vocab(guidedlda.datasets.NYT)
word2id = dict((v, idx) for idx, v in enumerate(vocab))
print(X.shape)
print(X.sum())
@vi3k6i5
vi3k6i5 / flashtext_regex_timing_keyword_extraction.java
Created October 25, 2017 15:49
Benchmarking timing performance Keyword Extraction using regex in java
// compare the results with FlashText here https://gist.github.com/vi3k6i5/604eefd92866d081cfa19f862224e4a0
import java.util.regex.*;
import java.lang.StringBuilder;
import java.util.*;
public class RegexBenchmark {
    public static String getWordOfLength(int length) {
        String SALTCHARS = "abcdefghijklmnopqrstuvwxyz1234567890";
        StringBuilder salt = new StringBuilder();
@vi3k6i5
vi3k6i5 / flashtext_regex_timing_keyword_extraction_regex_module.py
Created October 25, 2017 16:23
Benchmarking timing performance Keyword Extraction between regex (regex module) and flashtext
#!/bin/python
from flashtext.keyword import KeywordProcessor
import random
import string
import regex
import time
def get_word_of_length(str_length):
    # generate a random word of given length
    return ''.join(random.choice(string.ascii_lowercase) for _ in range(str_length))
@vi3k6i5
vi3k6i5 / flashtext_vs_cython_automaton_benchmark.py
Created November 14, 2017 16:35
Comparing flashtext with a cython implementation of similar algo
#!/bin/python
from flashtext.keyword import KeywordProcessor
import random
import string
import re
from automaton import Automaton
import time
def get_word_of_length(str_length):
    # generate a random word of given length