Vibhu Jawa (VibhuJawa)
🏠 Working from home
  • Nvidia
  • Santa Clara
@VibhuJawa
VibhuJawa / chunksize_experiments.ipynb
Last active August 8, 2019 04:11
This notebook showcases how chunk sizes impact performance
@VibhuJawa
VibhuJawa / collab_py_37.sh
Created July 28, 2019 19:41
Install RAPIDS with Python 3.7 on Google Colab
#!/bin/bash
set -eu
wget -nc https://github.com/rapidsai/notebooks-extended/raw/master/utils/env-check.py
echo "Checking for GPU type:"
python env-check.py
if [ ! -f Miniconda3-4.5.4-Linux-x86_64.sh ]; then
  echo "Removing conflicting packages, will replace with RAPIDS compatible versions"
@VibhuJawa
VibhuJawa / normalized_count_array.py
Created July 25, 2019 23:59
normalized_count_array
import numpy as np
normalized_count_array = count_dary / np.sum(count_dary, axis=1)[:, None]
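For context, a small self-contained sketch of the same row normalization; the toy matrix below is illustrative and not from the original gist:

import numpy as np

# Toy count matrix: 2 documents x 3 tokens (illustrative values).
count_dary = np.array([[2.0, 1.0, 1.0],
                       [0.0, 3.0, 1.0]])

# Divide each row by its row sum so every row sums to 1.
normalized_count_array = count_dary / np.sum(count_dary, axis=1)[:, None]
print(normalized_count_array)  # [[0.5 0.25 0.25] [0. 0.75 0.25]]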
@VibhuJawa
VibhuJawa / gutenburg_read_tokenize_gv100_run.ipynb
Last active August 26, 2019 18:27
Gutenberg read and tokenize, GV100 run
## chunksize = 1.9M, 10% of dataset
small_df = df.head(1_900_000).copy(deep=True)
%time output_df = preprocess_text_df(small_df, filter_regex=filters_regex)
## chunksize = 950K, 5% of dataset
small_df = df.head(950_000).copy(deep=True)
%time output_df = preprocess_text_df(small_df, filter_regex=filters_regex)
## chunksize = 190K, 1% of dataset
small_df = df.head(190_000).copy(deep=True)
%time output_df = preprocess_text_df(small_df, filter_regex=filters_regex)
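preprocess_text_df itself is not shown in this snippet. A minimal sketch, assuming it simply applies preprocess_text (defined further down) to a text column of the chunk; the column name "text" is an assumption:

# Hypothetical wrapper: run preprocess_text over one chunk of the dataframe.
def preprocess_text_df(df, filter_regex):
    # "text" is an assumed column name; the real gist may use a different one.
    df['text'] = preprocess_text(df['text'], filter_regex)
    return df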
author_id = author_name_ls.index('Charles Dickens')
for index in output_indices_umap[author_id]:
    print(author_name_ls[int(index)])
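output_indices_umap is not built in this snippet. One plausible way to get per-author neighbor indices from UMAP embeddings, using umap-learn and scikit-learn; the embedding input and the n_neighbors value are assumptions:

# Hypothetical sketch: nearest authors in UMAP space.
import umap
from sklearn.neighbors import NearestNeighbors

embedding = umap.UMAP(n_components=2).fit_transform(normalized_count_array)
nn = NearestNeighbors(n_neighbors=5).fit(embedding)
_, output_indices_umap = nn.kneighbors(embedding)  # row i -> indices of author i's nearest authors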
def preprocess_text(input_strs, filter_regex, stop_words=nltk.corpus.stopwords.words('english')):
    """
    * filter punctuation
    * to_lower
    * remove stop words (from nltk corpus) (taking the most time)
    * replace multiple spaces with one
    * remove leading and trailing spaces
    """