Abraham Hmiel abehmiel

Former postdoc, current nanopunk. Interested in machine/deep learning, materials science, NLP, data science, python, distributed computing, and XVX.

23 followers · 23 following

Quartet Health
New York, NY
abehmiel.net
@unless_if

abehmiel / btm.py

Created March 5, 2018 22:16 — forked from amintos/btm.py

Bi-term Topic Model implementation in pure Python

	"""
	Bi-Term Topic Model (BTM) for very short texts.

	Literature Reference:
	Xiaohui Yan, Jiafeng Guo, Yanyan Lan, and Xueqi Cheng:
	"A biterm topic model for short texts"
	In Proceedings of WWW '13, Rio de Janeiro, Brazil, pp. 1445-1456.
	ACM, DOI: https://doi.org/10.1145/2488388.2488514

	This module requires pre-processing of textual data,
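As a rough illustration of the idea behind the model named above: in the cited paper, a biterm is an unordered pair of words that co-occur in the same short text. The sketch below only extracts and counts such pairs from documents that are already tokenized. The function names `extract_biterms` and `count_biterms` are hypothetical and are not taken from this gist, whose preview is cut off before any code.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair in one short, pre-tokenized document.

    Each pair is sorted so that ("b", "a") and ("a", "b") count as the same biterm.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(documents):
    """Count biterms over a whole corpus of tokenized documents."""
    counts = Counter()
    for tokens in documents:
        counts.update(extract_biterms(tokens))
    return counts
```

Topic inference (for example by Gibbs sampling, as in the paper) would run on top of these corpus-level counts; that part is not sketched here.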

abehmiel / resources.md

Created February 21, 2018 13:36

Some resources I've adapted to do work

http://sujitpal.blogspot.com/2014/10/clustering-section-titles-with.html

abehmiel / install_packages.R

Created January 4, 2018 18:37

Install useful R packages for data science

	install.packages(
	c(
	"dplyr", # data manipulation
	"tidyr", # data manipulation
	"rmarkdown", # data presentation
	"knitr", # data presentation
	"RODBC", # database tools
	"RMySQL", # database tools
	"RPostgreSQL", # database tools
	"RSQLite", # database tools

abehmiel / clarify_pos.py

Created December 19, 2017 18:26

Part-of-speech clarifier from nltk

	from nltk import pos_tag
	from nltk.tag import str2tuple

	"""
	Usage:
	dictionary_df['Pos'] = dictionary_df['Word'].apply(pos_maker)
	dictionary_df['Help Definition'] = dictionary_df['Pos'].apply(clarify_pos)
	"""

	def clarify_pos(pos):
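The preview stops before the body of `clarify_pos`, so what it actually does is not shown. As a guess at the general idea only, the sketch below maps a handful of Penn Treebank tags (the tagset `nltk.pos_tag` returns by default) to readable descriptions. The name `describe_pos_tag` and the choice of tags are assumptions for illustration, not the gist's code.

```python
# A small, deliberately incomplete subset of the Penn Treebank tagset.
PENN_TAG_DESCRIPTIONS = {
    "NN": "noun, singular or mass",
    "NNS": "noun, plural",
    "NNP": "proper noun, singular",
    "VB": "verb, base form",
    "VBD": "verb, past tense",
    "VBG": "verb, gerund or present participle",
    "JJ": "adjective",
    "RB": "adverb",
    "IN": "preposition or subordinating conjunction",
    "DT": "determiner",
}


def describe_pos_tag(tag):
    """Return a readable description for a Penn Treebank tag (hypothetical helper)."""
    return PENN_TAG_DESCRIPTIONS.get(tag, "unknown tag: " + tag)
```

A function like this could be applied to a column of tags in the same way the usage note above applies `clarify_pos`.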

abehmiel / gist:e5dd495ca6123fda20ee876d58a6cd8f

Created December 15, 2017 23:18 — forked from rohannog/gist:3861442

Decrypt pdf on command-line

	qpdf --password=passwd --decrypt orig.pdf decrypted.pdf

	# To prompt for the password instead of typing it into the command
	read -s -p "Password: " password && qpdf --password="$password" --decrypt orig.pdf decrypted.pdf
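The same two steps (prompt without echo, then call `qpdf`) can be sketched in Python. Passing the arguments as a list avoids shell quoting problems when the password contains spaces or special characters. The helper names `build_qpdf_decrypt_command` and `decrypt_pdf` are hypothetical; only the `--password` and `--decrypt` flags come from the commands above.

```python
import subprocess
from getpass import getpass


def build_qpdf_decrypt_command(password, src, dst):
    """Build the argument list for the qpdf call shown above."""
    return ["qpdf", f"--password={password}", "--decrypt", src, dst]


def decrypt_pdf(src, dst):
    """Prompt for the password without echoing it, then run qpdf.

    Assumes the qpdf executable is on PATH; raises CalledProcessError
    if qpdf exits with a non-zero status.
    """
    password = getpass("Password: ")
    subprocess.run(build_qpdf_decrypt_command(password, src, dst), check=True)
```

Calling `decrypt_pdf("orig.pdf", "decrypted.pdf")` would mirror the second shell command.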

abehmiel / regex.md

Created December 11, 2017 21:36 — forked from magicznyleszek/regex.md

RegEx Cheatsheet

Contents:

Special characters
Quantifiers
Special sequences
Useful examples
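To make the first three categories concrete, here is a minimal sketch using Python's built-in `re` module. The sample string and variable names are invented for illustration and do not come from the cheatsheet.

```python
import re

text = "Order 66 shipped on 2017-12-11 to user_42."

# Quantifiers: \d{4} means exactly four digits, \d{2} exactly two.
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)

# Special sequence: \w+ matches runs of letters, digits, and underscores.
words = re.findall(r"\w+", text)

# Special characters: a backslash makes "." literal, and "$" anchors the end.
ends_with_period = re.search(r"\.$", text) is not None
```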

abehmiel / understanding-word-vectors.ipynb

Created November 19, 2017 03:07 — forked from aparrish/understanding-word-vectors.ipynb

Understanding word vectors: A tutorial for "Reading and Writing Electronic Text," a class I teach at ITP. (Python 2.7) Code examples released under CC0 https://creativecommons.org/choose/zero/, other text released under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/

abehmiel / spacy_intro.ipynb

Created November 16, 2017 23:03 — forked from aparrish/spacy_intro.ipynb

NLP Concepts with spaCy. Code examples released under CC0 https://creativecommons.org/choose/zero/, other text released under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/

abehmiel / fix_exhibit_b.py

Created November 1, 2017 21:12

Convert tabular PDF data to a CSV and also read it as a pandas DataFrame

	# It's really stupid when the gov't releases PDFs of tabular data, so I made a quick, hacky script to
	# fix their mistakes for them. (I'm referring to https://t.co/oOyhHNVvjS )

	# requirements:
	# pandas
	# tabula-py

	import pandas as pd
	from tabula import read_pdf

abehmiel / figure_formatting.py

Created October 31, 2017 21:29 — forked from corbett/figure_formatting.py

Create beautiful square figures with big labels and the correct number of ticks

	def create_figure(size=3.6, nxticks=6):
	    # Import the submodule explicitly; a bare "import matplotlib" does not
	    # guarantee that matplotlib.pyplot is loaded.
	    import matplotlib.pyplot
	    from matplotlib.ticker import MaxNLocator
	    figure = matplotlib.pyplot.figure(figsize=(size, size))
	    ax = figure.add_subplot(1, 1, 1, position=[0.2, 0.15, 0.75, 0.75])
	    ax.xaxis.set_major_locator(MaxNLocator(nxticks))
	    return ax

	def format_axes(ax, xf='%d', yf='%d', nxticks=6, nyticks=6, labelsize=10):
	    import pylab
