Vytautas Bielinskas vb100

🎯

Focusing on A.I., ML and Deep Learning

Data Scientist at IBM. Favourite domains: Computer Vision, NLP, Predictive Analysis, Deep learning.

127 followers · 0 following

vb100 / automation_selenium_data_gathering_writing.py

Last active July 2, 2018 13:57

This Python code gathers numerical data from a statistical data website with Selenium, parses the tables, structures the data, and writes all the values directly to an Excel file, including cell formatting (OpenXlsX library).

	# Import libraries
	import os
	import re
	import requests
	import pandas as pd
	from bs4 import BeautifulSoup

	print("Starting sheet: Inflation rates")

	""" Prepare Home directory : start """
	os.chdir("C:\\Users\\Vytautas.Bielinskas\\Desktop\\PythonWorking\\Python\\")

vb100 / automation_data_structuring_writing.py

Last active July 2, 2018 13:53

This Python code reads a CSV file, automatically recognizes real estate properties, finds the cells in an Excel file where specific values should be written, applies all the Excel formatting, and builds all the necessary formulas with a single click. Half a day of work done in half a minute.

	# -*- coding: utf-8 -*-
	""" Project Jupyter - extract data from PDF by Vytautas"""

	""" Importing libraries """
	import pandas as pd
	import os

	""" Reading Dataset file """
	os.chdir("C:\\Users\\Vytautas.Bielinskas\\Desktop\\Python\\")
	DF = pd.read_csv("tabula-Statement (BULK) 02 July.csv", header=None)

vb100 / Comps_identifiactor_MachineLearning.py

Created June 27, 2018 13:22

This is one of my biggest machine learning projects, made 100% by me. This module reads a periodically updated training set, analyzes it by performing hyperparameter tuning for Decision Tree/Random Forest, and sets the best hyperparameters on the classifier. It then calculates the probability of a property being a comp, constructs a pandas DataFrame …

	# -*- coding: utf-8 -*-
	"""
	Created on Thu Jun 21 14:26:09 2018

	@author: Vytautas.Bielinskas

	Definitions:
	JN - Jupyter Notebook
	ML - Machine learning
	BOG - Bag Of Words

vb100 / XGBoost_example.py

Last active June 20, 2018 12:27

Simple readable XGBoost Example

	# -*- coding: utf-8 -*-
	# Full instructions at: https://cambridgespark.com/content/tutorials/getting-started-with-xgboost/index.html
	# Date: 20180620

	#------------------------------------------------------------------------------
	# Use Pandas to load the data in a dataframe
	import pandas as pd
	df = pd.read_excel('default of credit card clients.xls', header = 1, index_col = 0)

	print('The shape of dataframe is {}.'.format(df.shape))

vb100 / SelectingDataFromTable:_RawSQL.py

Created June 18, 2018 21:47

Selecting data from a Table: raw SQL

	# Build select statement for census table: stmt
	stmt = 'SELECT * FROM census'

	# Execute the statement and fetch the results: results
	results = connection.execute(stmt).fetchall()

	# Print Results
	print(results)
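The snippet above relies on a SQLAlchemy `connection` that the preview does not show. As a self-contained sketch of the same raw-SQL pattern, the example below swaps in Python's standard-library `sqlite3` module and an in-memory database. The `census` columns and rows are made up for illustration and are not the original dataset.

```python
import sqlite3

# In-memory SQLite database standing in for the real census database (assumption).
connection = sqlite3.connect(":memory:")

# Hypothetical schema and rows, used only so the query has something to return.
connection.execute(
    "CREATE TABLE census (state TEXT, sex TEXT, age INTEGER, pop2008 INTEGER)"
)
connection.executemany(
    "INSERT INTO census VALUES (?, ?, ?, ?)",
    [("Illinois", "M", 0, 89600), ("Illinois", "F", 0, 85910)],
)

# Build select statement for census table: stmt
stmt = "SELECT * FROM census"

# Execute the statement and fetch the results: results
results = connection.execute(stmt).fetchall()

# Each row comes back as a plain tuple
print(results)
connection.close()
```

With `sqlite3`, `fetchall()` returns a list of tuples, one per row, which is close to what the SQLAlchemy call in the gist yields.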

vb100 / ViewingTableDetails.py

Last active June 18, 2018 21:29

Viewing Table Details

	# Reflect the census table from the engine: census
	census = Table('census', metadata, autoload=True, autoload_with=engine)

	# Print the column names
	print(census.columns.keys())

	# Print full table metadata
	print(repr(metadata.tables['census']))
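Reflection needs the `engine` and `metadata` objects, which are not part of the preview. As a rough stand-in that runs on its own, the sketch below inspects a table's columns with the standard-library `sqlite3` module and SQLite's `PRAGMA table_info`. This is not SQLAlchemy reflection, only a similar way to look at column names and types, and the `census` schema here is hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical census schema (assumption).
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE census (state TEXT, sex TEXT, age INTEGER, pop2008 INTEGER)"
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
table_info = connection.execute("PRAGMA table_info(census)").fetchall()

# Column names, comparable to census.columns.keys() in the gist
column_names = [row[1] for row in table_info]

# Declared column types keyed by column name
column_types = {row[1]: row[2] for row in table_info}

print(column_names)
print(column_types)
connection.close()
```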

vb100 / AutoloadingTablesfromDatabase.py

Created June 18, 2018 21:11

Autoloading Tables from a Database

	# Import Table
	from sqlalchemy import Table

	# Reflect census table from the engine: census
	census = Table('census', metadata, autoload=True, autoload_with=engine)

	# Print census table metadata
	print(repr(census))

vb100 / BinomialPoisson.py

Created June 14, 2018 22:10

Relationship between Binomial and Poisson distributions. You just heard that the Poisson distribution is a limit of the Binomial distribution for rare events.

	import numpy as np

	# Draw 10,000 samples out of Poisson distribution: samples_poisson
	samples_poisson = np.random.poisson(10, size=10000)

	# Print the mean and standard deviation
	print('Poisson: ', np.mean(samples_poisson),
	      np.std(samples_poisson))

	# Specify values of n and p to consider for Binomial: n, p
	n = [20, 100, 1000]
	p = [0.5, 0.1, 0.01]
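The preview stops before the Binomial samples are drawn. As a complementary sketch rather than the gist's continuation, the comparison can also be made without sampling, using the standard formulas: a Binomial(n, p) has mean np and standard deviation sqrt(np(1 - p)), while a Poisson with mean 10 has standard deviation sqrt(10). Every (n, p) pair above gives np = 10, so only the standard deviation changes.

```python
import math

# Poisson with mean 10, as in the gist
poisson_mean = 10
poisson_std = math.sqrt(poisson_mean)

# Same n and p values as in the gist
n = [20, 100, 1000]
p = [0.5, 0.1, 0.01]

# Analytic mean and standard deviation for each Binomial(n, p)
binomial_stats = []
for n_i, p_i in zip(n, p):
    mean = n_i * p_i
    std = math.sqrt(n_i * p_i * (1 - p_i))
    binomial_stats.append((n_i, p_i, mean, std))
    print(f"n = {n_i}, p = {p_i}: mean = {mean:.2f}, std = {std:.4f}")

print(f"Poisson: mean = {poisson_mean:.2f}, std = {poisson_std:.4f}")
```

As p shrinks and n grows with np held at 10, the Binomial standard deviation moves toward the Poisson value, which is the rare-event limit the description refers to.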

vb100 / pearson_corr.py

Created June 13, 2018 21:05

Pearson Correlation Coefficient R

	import numpy as np

	def pearson_r(x, y):
	    """Compute Pearson correlation coefficient between two arrays."""
	    # Compute correlation matrix: corr_mat
	    corr_mat = np.corrcoef(x, y)

	    # Return entry [0,1]
	    return corr_mat[0, 1]

	# Compute Pearson correlation coefficient for I. versicolor: r
	r = pearson_r(versicolor_petal_length, versicolor_petal_width)
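To make explicit what entry `[0, 1]` of the correlation matrix holds, here is a minimal standard-library sketch that computes the coefficient from its definition: the covariance of the two variables divided by the product of their standard deviations. The function name `pearson_r_manual` and the sample values are hypothetical and are not the iris measurements used in the gist.

```python
import math


def pearson_r_manual(x, y):
    """Compute the Pearson correlation coefficient from its definition."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)

    # Sums of cross-products and squared deviations; the 1/n factors cancel
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)

    return cov / math.sqrt(var_x * var_y)


# Made-up measurements for illustration only
lengths = [4.0, 4.5, 5.0, 5.5]
widths = [1.2, 1.4, 1.6, 1.8]

r_linear = pearson_r_manual(lengths, widths)          # perfectly linear, increasing
r_inverse = pearson_r_manual(lengths, widths[::-1])   # perfectly linear, decreasing
print(r_linear, r_inverse)
```

Note that this sketch divides by zero if either input is constant.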

vb100 / compareECDFtoPercentiles.py

Created June 12, 2018 10:48

Comparing percentiles to ECDF. To see how the percentiles relate to the ECDF, you will plot the percentiles of Iris versicolor petal lengths you calculated in the last exercise on the ECDF plot you generated in chapter 1. The percentile variables from the previous exercise are available in the workspace as ptiles_vers and percentiles. Note that t…

	# Plot the ECDF
	_ = plt.plot(x_vers, y_vers, '.')
	plt.margins(0.02)
	_ = plt.xlabel('petal length (cm)')
	_ = plt.ylabel('ECDF')

	# Overlay percentiles as red diamonds.
	_ = plt.plot(ptiles_vers, percentiles/100, marker='D', color='red',
	             linestyle='none')
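The plot above uses `x_vers` and `y_vers`, which come from an ECDF function that the preview does not include. A minimal standard-library sketch of such a function, assuming the usual definition (sorted values on the x-axis, cumulative fraction of points on the y-axis), might look like this. The function body and the sample petal lengths are assumptions, not the original exercise code or data.

```python
def ecdf(data):
    """Return sorted values x and cumulative fractions y for an ECDF."""
    x = sorted(data)
    n = len(x)
    # The i-th smallest value has i/n of the data at or below it
    y = [i / n for i in range(1, n + 1)]
    return x, y


# Made-up petal lengths (cm) for illustration only
petal_lengths = [4.7, 4.5, 4.9, 4.0, 4.6]

x_vers, y_vers = ecdf(petal_lengths)
print(list(zip(x_vers, y_vers)))
```

Dividing the percentiles by 100, as the gist does with `percentiles/100`, puts them on the same 0 to 1 scale as `y_vers`, so the red diamonds land on the ECDF curve.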
