🏗️
Building information retrieval systems...

Habeeb Shopeju HAKSOAT

"ace in the hole"
"across the board"
"against the grain"
"airs and graces"
"airy fairy"
"albatross around (his|her|their) neck"
"all bark and no bite"
"all ears"
"all hat and no cattle"
"all piss and wind"
"10 Downing Street"
"11 Downing Street"
"15 minutes of fame"
"1600 Pennsylvania Avenue"
"23 Skidoo Street"
"a (bridge too far|cold day in July|day late and a dollar short|good deal|great deal|hundred and ten percent|life of its own|little bird told me|little bit of bread and no cheese|little from column A, a little from column B|notch above|pound to a penny|Roland for an Oliver|short drop and a sudden stop|week from next Tuesday|week is a long time in politics|wild goose never laid a tame egg|woman without a man is like a fish without a bicycle)"
"above (one's bend|the curve|the law|the salt)"
"absence makes the heart grow fonder"
"abuse of distress"
"accident (of birth|waiting to happen)"
"10 Downing Street"
"11 Downing Street"
"15 minutes of fame"
"1600 Pennsylvania Avenue"
"23 Skidoo Street"
"a bridge too far"
"a cold day in July"
"a day late and a dollar short"
"a good deal"
"a great deal"
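The grouped entries above, such as "accident (of birth|waiting to happen)", appear to be a compact form of the one-idiom-per-line list. A minimal sketch of how such a grouped pattern could be expanded into plain phrases follows; `expand_pattern` and `GROUP` are hypothetical names for illustration, not something taken from the gists, and the sketch assumes patterns are passed without their surrounding quotes.

```python
import re

# Matches one parenthesised alternation group, e.g. "(of birth|waiting to happen)".
GROUP = re.compile(r"\(([^()]*)\)")


def expand_pattern(pattern):
    """Return every plain phrase covered by a pattern with (a|b|c) groups."""
    match = GROUP.search(pattern)
    if match is None:
        # No group left, so the pattern is already a plain phrase.
        return [pattern]
    expanded = []
    for option in match.group(1).split("|"):
        # Substitute one option, then recurse in case more groups remain.
        candidate = pattern[:match.start()] + option + pattern[match.end():]
        expanded.extend(expand_pattern(candidate))
    return expanded


print(expand_pattern("accident (of birth|waiting to happen)"))
print(expand_pattern("albatross around (his|her|their) neck"))
```

Going the other way (grouping phrases that share a prefix) is a separate problem and is not covered by this sketch.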
HAKSOAT / sample.py
Created April 12, 2020 18:29
AI6 - Functions - Class 1
def generate_multiplications(multiplicand):
    # Reject anything that is not a number.
    if not (isinstance(multiplicand, int) or isinstance(multiplicand, float)):
        return None
    multiplications = []
    # Multiply by 1 through 12, as in a times table.
    for multiplier in range(1, 13):
        answer = multiplicand * multiplier
        multiplications.append(answer)
    return multiplications
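A short usage sketch for the function above may help show what it returns. The function is restated here in condensed form only so the example runs on its own; the calls and the inputs chosen are illustrative assumptions, not part of the gist.

```python
def generate_multiplications(multiplicand):
    # Same behaviour as the gist's function, restated so the sketch runs alone.
    if not isinstance(multiplicand, (int, float)):
        return None
    return [multiplicand * multiplier for multiplier in range(1, 13)]


print(generate_multiplications(3))     # a number: list of 12 products
print(generate_multiplications("3"))   # a string: rejected with None
print(generate_multiplications(True))  # bool is a subclass of int, so it is accepted
```

Because `bool` is a subclass of `int` in Python, `True` passes the type check and is treated as 1. If that is unwanted, an explicit `isinstance(multiplicand, bool)` guard would be needed.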
{{about|the programming language||Python (disambiguation)}}
{{Short description|General-purpose, high-level programming language}}
{{Use dmy dates |date=August 2015}}
{{Infobox programming language
| logo = Python logo and wordmark.svg
| logo size = 250px
| paradigm = [[Multi-paradigm programming language|Multi-paradigm]]: [[functional programming|functional]], [[imperative programming|imperative]], [[object-oriented programming|object-oriented]], [[structured programming|structured]], [[reflective programming|reflective]]
| released = {{start date and age|1990}}<ref name=guttag />
| designer = [[Guido van Rossum]]
| developer = [[Python Software Foundation]]
def decorate_text(text):
    decoration = "\n\n**********{}**********\n\n"
    decorated_text = decoration.format(text)
    return decorated_text
# Unpacking of values
def generate_multiplications_1(multiplicand, start, stop):
    multiplications = []
    for multiplier in range(start, stop + 1):
{
  "tokens": [
    {
      "token": "As",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
HAKSOAT / Regexes
Created April 24, 2020 14:35
Regexes for the Tokenizer
Python's regex
(?P<comment_start><!--)|(?P<comment_end>-->)|(?P<url>((bitcoin|geo|magnet|mailto|news|sips?|tel|urn)\:|((|ftp|ftps|git|gopher|https?|ircs?|mms|nntp|redis|sftp|ssh|svn|telnet|worldwind|xmpp)\:)?\/\/)[^\s/$.?#].[^\s]*)|(?P<entity>&[a-z][a-z0-9]*;)|(?P<cjk>[\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FCC\u3400-\u4DFF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\uF900-\uFAFF\U0002F800-\U0002FA1F\u3041-\u3096\u30A0-\u30FF\u3400-\u4DB5\u4E00-\u9FCB\uF900-\uFA6A\u2E80-\u2FD5\uFF5F-\uFF9F\u31F0-\u31FF\u3220-\u3243\u3280-\u337F])|(?P<ref_open><ref\b[^>/]*>)|(?P<ref_close></ref\b[^>]*>)|(?P<ref_singleton><ref\b[^>/]*/>)|(?P<tag></?([a-z][a-z0-9]*)\b[^>]*>)|(?P<number>[\d]+)|(?P<japan_punct>[\u3000-\u303F])|(?P<danda>।|॥)|(?P<bold>''')|(?P<italic>'')|(?P<word>([^\W\d]|[\u0901-\u0963\u0601-\u061A\u061C-\u0669\u06D5-\u06EF\u0980-\u09FF])[\w\u0901-\u0963\u0601-\u061A\u061C-\u0669\u06D5-\u06EF\u0980-\u0
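A regex built from named alternatives like the one above is typically driven with `re.finditer`, reading `Match.lastgroup` to learn which alternative matched. The sketch below assumes that usage and keeps only a handful of the named groups; the full pattern is truncated in the source, so `TOKEN_RE` is a reduced stand-in, its `word` group is simplified, and `tokenize` is a hypothetical helper name.

```python
import re

# A reduced subset of the named groups from the regex above. The full
# pattern is truncated in the source, so this is NOT the complete lexicon,
# and the "word" alternative here is a simplified assumption.
TOKEN_RE = re.compile(
    r"(?P<comment_start><!--)"
    r"|(?P<comment_end>-->)"
    r"|(?P<entity>&[a-z][a-z0-9]*;)"
    r"|(?P<number>\d+)"
    r"|(?P<bold>''')"
    r"|(?P<italic>'')"
    r"|(?P<word>[^\W\d]\w*)"
)


def tokenize(text):
    """Yield (token_type, token_text) pairs in order of appearance."""
    for match in TOKEN_RE.finditer(text):
        # lastgroup is the name of the alternative that matched.
        yield match.lastgroup, match.group()


for token_type, token_text in tokenize("'''Alan''' was born in 1912 &amp; <!-- note -->"):
    print(token_type, repr(token_text))
```

The order of alternatives matters: `bold` is listed before `italic` so that three apostrophes are not consumed as an italic marker followed by a stray quote. Characters that match no alternative, such as spaces, are skipped by `finditer`.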
# Creates an index for the tokenizer
import requests
param = (('v', ''),)
data = r"""{
  "settings": {
    "index.analyze.max_token_count" : 1000000,
    "analysis": {
      "analyzer": {
# Extracts the text used for the performance test
import time
import mwapi
session = mwapi.Session("https://en.wikipedia.org")
doc = session.get(action='query', prop='revisions', rvprop='content', titles='Alan Turing', formatversion=2)
text = doc['query']['pages'][0]['revisions'][0]['content']
# Functions for tokenization
import json