Vibhu Jawa (VibhuJawa), Nvidia, Santa Clara
VibhuJawa / non_persisted_df_xgboost.ipynb
Last active September 23, 2019 18:40
This gist shows the error I get when trying to train a non-persisted df with Dask.
VibhuJawa / persisted_df_xgboost_error.ipynb
Created September 23, 2019 18:33
This gist shows the error I get with a persisted DF.
VibhuJawa / dask_run_non_persisited.ipynb
Created August 15, 2019 00:58
I can't complete the Dask persisted run, as I get a CommClosedError.
VibhuJawa / dask_run_persisited.ipynb
Last active August 15, 2019 00:51
Here I am trying to train a persisted dataframe using XGBoost on the `cudf-interoperabilty` (cudf-interop) branch, but I can't send persisted dataframes for training.
print("Length of df = {:,}".format(len(gdf)))
%time cleaned_df = extract_columns_without_regex(gdf)
import pandas as pd

def extract_columns_small_regex_pd(df):
    # Extract the first four fields plus a catch-all "suffix" from each log line.
    p1 = r"\[haproxy@([0-9.]*)\]\s\S*([A-Z][\S\s]*) ([\S]*)\[([0-9]*)\]:([\S\s]*)"
    df1 = df['logline'].str.extract(p1)
    temp_cols = cols[:4]  # `cols` holds the target column names (defined elsewhere in the notebook)
    temp_cols.append("suffix")
    df1.columns = temp_cols
    # Split the suffix into client IP, port, frontend name, and the remainder.
    p2 = r"\s([0-9.]*):([0-9]*)\s\[([\S]*)\]([\S\s]*)"
    extract_p2_df = df1['suffix'].str.extract(p2)
    df2 = pd.concat([df1, extract_p2_df], axis=1)
    return df2
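To make it easier to see what the two patterns capture, here is a quick check with Python's stdlib `re` on a single haproxy-style log line (the sample line itself is invented for illustration; the real log format may differ slightly):

```python
import re

p1 = r"\[haproxy@([0-9.]*)\]\s\S*([A-Z][\S\s]*) ([\S]*)\[([0-9]*)\]:([\S\s]*)"
p2 = r"\s([0-9.]*):([0-9]*)\s\[([\S]*)\]([\S\s]*)"

# Hypothetical log line, made up to exercise both patterns.
line = "[haproxy@10.0.0.1] Jan 10 10:00:00 haproxy[1234]: 192.168.0.1:5000 [frontend] message"

# p1 captures: log IP, timestamp, process name, PID, and the unparsed suffix.
m1 = re.search(p1, line)
print(m1.groups())
# ('10.0.0.1', 'Jan 10 10:00:00', 'haproxy', '1234', ' 192.168.0.1:5000 [frontend] message')

# p2 then splits the suffix into client IP, port, frontend name, and the rest.
m2 = re.search(p2, m1.group(5))
print(m2.groups())
# ('192.168.0.1', '5000', 'frontend', ' message')
```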
def extract_columns_custom_pd(df):
    ### added expand=True for split
    ### changed drop_column to drop
    clean_df = df['logline'].str.split(' ', expand=True)
    # log_ip column
    clean_df['log_ip'] = clean_df[0].str.lstrip('[haproxy@').str.rstrip(']')
    clean_df.drop(columns=[0], inplace=True)
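One thing worth noting about the cleanup above: `lstrip('[haproxy@')` treats its argument as a set of characters to strip, not as a literal prefix. It happens to work here only because the IP address starts with a digit, which is outside that set. A quick stdlib check (the sample token is invented):

```python
token = "[haproxy@10.0.0.1]"

# lstrip strips any leading character in the set {'[', 'h', 'a', 'p', 'r', 'o', 'x', 'y', '@'},
# stopping at the first character outside the set (the digit '1').
ip = token.lstrip('[haproxy@').rstrip(']')
print(ip)  # 10.0.0.1

# A prefix-exact alternative (Python 3.9+) that avoids the character-set pitfall:
ip2 = token.removeprefix('[haproxy@').removesuffix(']')
print(ip2)  # 10.0.0.1
```

If a field ever began with one of the stripped characters (e.g. a hostname starting with "p"), the `lstrip` version would silently eat it, while `removeprefix` would not.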