Félix Revert FelixChop

Data scientist & Product manager

FelixChop / element.html

Last active June 1, 2018 09:35

element.html

	<table class="table table-hover table-bordered" id="tableID" style="margin-bottom: 10px;">
	</table>

FelixChop / scrape.py

Created June 1, 2018 09:37

	import requests
	url = "http://bank-code.net/country/FRANCE-%28FR%29/"
	page = requests.get(url)

FelixChop / scrape.py

Created June 1, 2018 09:38

	import bs4
	soup = bs4.BeautifulSoup(page.content, 'lxml')
	table = soup.find(name='table', attrs={'id':'tableID'})

FelixChop / scrape.py

Created June 1, 2018 09:39

import pandas as pd
result = pd.DataFrame([[td.text for td in row.findAll('td')] for row in table.tbody.findAll('tr')])

FelixChop / element.html

Created June 1, 2018 09:39

<a href="//bank-code.net/country/FRANCE-%28FR%29/15" data-ci-pagination-page="2" rel="next">&gt;</a>

FelixChop / scrape.py

Created June 1, 2018 09:40

"http:" + soup.find('a', attrs={'rel':'next'}).get('href')

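The string concatenation above works as long as the `href` is protocol-relative, as it is in the anchor shown earlier. As a hedged alternative, the standard library's `urllib.parse.urljoin` resolves protocol-relative, root-relative, and absolute links against the current page URL. `next_url` is a hypothetical helper name used only for illustration; it is not part of the original gist.

```python
from urllib.parse import urljoin


def next_url(current_url, href):
    """Resolve the href of a rel="next" link against the current page URL."""
    return urljoin(current_url, href)


# Values copied from the snippets above.
base = "http://bank-code.net/country/FRANCE-%28FR%29/"
href = "//bank-code.net/country/FRANCE-%28FR%29/15"
print(next_url(base, href))
```

Because the scheme is inherited from `base`, the same call would keep working if the site were fetched over `https`.
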
FelixChop / scrape.py

Created June 1, 2018 09:41

	import os, bs4, requests
	import pandas as pd

	PATH = os.path.join("C:\\","Users","xxx","Documents","py") # you need to change to your local path
	res = pd.DataFrame()
	url = "http://bank-code.net/country/FRANCE-%28FR%29/"
	counter = 0

	def table_to_df(table):
		return pd.DataFrame([[td.text for td in row.findAll('td')] for row in table.tbody.findAll('tr')])
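
The preview above stops after the helper. What follows is only a sketch of how the earlier pieces (table lookup, `rel="next"` link, `table_to_df`) might be combined into a pagination loop; it is not the author's actual continuation. `scrape_all`, its `fetch` argument, and `max_pages` are assumptions introduced for illustration, and `html.parser` is used instead of `lxml` only to avoid an extra dependency.

```python
import bs4
import pandas as pd


def table_to_df(table):
    # One DataFrame row per <tr>, one column per <td>.
    return pd.DataFrame([[td.text for td in row.find_all('td')]
                         for row in table.tbody.find_all('tr')])


def scrape_all(url, fetch, max_pages=1000):
    """Follow rel="next" links starting at `url`, stacking each page's table.

    `fetch` is a hypothetical callable mapping a URL to HTML text,
    for example: lambda u: requests.get(u).text
    """
    frames = []
    for _ in range(max_pages):  # hard cap so a bad link cannot loop forever
        soup = bs4.BeautifulSoup(fetch(url), 'html.parser')
        table = soup.find(name='table', attrs={'id': 'tableID'})
        if table is not None:
            frames.append(table_to_df(table))
        next_link = soup.find('a', attrs={'rel': 'next'})
        if next_link is None:  # last page reached
            break
        url = "http:" + next_link.get('href')
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

Passing `fetch` in as an argument keeps the loop testable without network access; in real use a short pause between requests would be a reasonable courtesy to the site.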

FelixChop / feature_importance.py

Last active June 1, 2018 09:47

	import pandas as pd
	from sklearn.ensemble import RandomForestClassifier # from xgboost import XGBClassifier
	model = RandomForestClassifier() # XGBClassifier()
	model.fit(X, y) # X: DataFrame of features, y: target labels
	pd.DataFrame({'Variable':X.columns,
	              'Importance':model.feature_importances_}).sort_values('Importance', ascending=False)

FelixChop / first_10_predictions_class1.py

Created June 1, 2018 09:48

	df = X_test.copy()
	df['predictions'] = rf_model.predict_proba(X_test)[:, 1] # column 1 = probability of class 1 (binary target assumed)
	data_to_analyze = df.sort_values('predictions', ascending=False).head(10)

FelixChop / first_10_predictions_class1.py

Created June 1, 2018 10:04

	df = X_test.copy()
	df['predictions'] = rf_model.predict_proba(X_test)[:, 1] # column 1 = probability of class 1 (binary target assumed)
	data_to_analyze = df.sort_values('predictions', ascending=False).head(10)
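
`predict_proba` returns one column per class, ordered as in `model.classes_`, so a single column has to be selected before it can be stored in `df['predictions']`. The sketch below illustrates this on toy data invented purely for the example; the feature names, labels, and `n_estimators` value are assumptions, not taken from the original gist.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Toy data, invented for illustration only.
X = pd.DataFrame({'a': [0, 1, 2, 3, 4, 5, 6, 7],
                  'b': [1, 0, 1, 0, 1, 0, 1, 0]})
y = [0, 0, 0, 0, 1, 1, 1, 1]

rf_model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

proba = rf_model.predict_proba(X)               # shape (n_samples, n_classes)
class1_col = list(rf_model.classes_).index(1)   # column holding P(class == 1)

df = X.copy()
df['predictions'] = proba[:, class1_col]
top = df.sort_values('predictions', ascending=False).head(3)
```

Looking the column up through `classes_` rather than hard-coding `1` keeps the selection correct if the labels are not exactly `0` and `1`.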