Vytautas Bielinskas (vb100)
🎯 Focusing on A.I., ML and Deep Learning
vb100 / zoopla_rents.py
Last active August 4, 2022 23:58
Zoopla.com scraper for getting rents data
""" Zoopla scraping project """
# Import libraries
import requests, re, os
import pandas as pd
from bs4 import BeautifulSoup
""" Generate the list of URLs : Start"""
def generateURLs(pages):
listURLs = []
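The listing cuts this gist off after the first lines of generateURLs. A minimal sketch of how such a paginated URL builder might continue, assuming a page-number query parameter (the URL pattern below is hypothetical, not Zoopla's real one):

```python
def generate_urls(pages, base_url="https://example.com/to-rent/london/?pn={}"):
    """Return one search URL per results page; the pattern is illustrative."""
    return [base_url.format(page) for page in range(1, pages + 1)]

urls = generate_urls(3)
print(urls)
```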
vb100 / ECDF.py
Created June 12, 2018 09:52
Empirical distribution function example with Real Estate data
# 01 Calculate ECDF for Zoopla distribution model
import numpy as np

def ecdf(data):
    """Compute ECDF for a one-dimensional array of measurements."""
    # Number of data points: n
    n = len(data)
    # x-data for the ECDF: x
    x = np.sort(data)
    # y-data for the ECDF: evenly spaced fractions from 1/n to 1
    y = np.arange(1, n + 1) / n
    return x, y
vb100 / compareECDFtoPercentiles.py
Created June 12, 2018 10:48
Comparing percentiles to ECDF. To see how the percentiles relate to the ECDF, you will plot the percentiles of Iris versicolor petal lengths you calculated in the last exercise on the ECDF plot you generated in chapter 1. The percentile variables from the previous exercise are available in the workspace as ptiles_vers and percentiles. Note that t…
import matplotlib.pyplot as plt

# Plot the ECDF
_ = plt.plot(x_vers, y_vers, '.')
plt.margins(0.02)
_ = plt.xlabel('petal length (cm)')
_ = plt.ylabel('ECDF')

# Overlay percentiles as red diamonds
_ = plt.plot(ptiles_vers, percentiles / 100, marker='D', color='red',
             linestyle='none')
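The same percentile-vs-ECDF relationship can be checked numerically, without plotting. A hedged sketch on synthetic data (the normal distribution below is a toy stand-in for the versicolor petal lengths, and the ecdf helper is redefined so the sketch is self-contained):

```python
import numpy as np

def ecdf(data):
    """ECDF: sorted values paired with cumulative fractions."""
    x = np.sort(data)
    y = np.arange(1, len(data) + 1) / len(data)
    return x, y

rng = np.random.default_rng(0)
petal_lengths = rng.normal(loc=4.26, scale=0.47, size=1000)

percentiles = np.array([2.5, 25, 50, 75, 97.5])
ptiles = np.percentile(petal_lengths, percentiles)

# Each percentile value should sit at (roughly) the matching height on the ECDF
for p, v in zip(percentiles, ptiles):
    frac = np.mean(petal_lengths <= v)
    assert abs(frac - p / 100) < 0.01
```

The assertion passes because, by definition, the p-th percentile is the value below which about p percent of the observations fall, which is exactly the height the ECDF assigns to it.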
vb100 / pearson_corr.py
Created June 13, 2018 21:05
Pearson Correlation Coefficient R
import numpy as np

def pearson_r(x, y):
    """Compute Pearson correlation coefficient between two arrays."""
    # Compute correlation matrix: corr_mat
    corr_mat = np.corrcoef(x, y)
    # Return entry [0, 1]
    return corr_mat[0, 1]

# Compute Pearson correlation coefficient for I. versicolor: r
r = pearson_r(versicolor_petal_length, versicolor_petal_width)
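A quick self-check of the helper on synthetic correlated data (the helper is redefined so the sketch is self-contained; the slope and noise scale are arbitrary):

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation coefficient: off-diagonal entry of the 2x2 correlation matrix."""
    return np.corrcoef(a, b)[0, 1]

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 0.8 * x + rng.normal(scale=0.5, size=200)  # linearly related plus noise

r = pearson_r(x, y)
print(r)
```

Any array is perfectly correlated with itself, so pearson_r(x, x) should return 1 to within floating-point error.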
vb100 / BinomialPoisson.py
Created June 14, 2018 22:10
Relationship between Binomial and Poisson distributions. You just heard that the Poisson distribution is a limit of the Binomial distribution for rare events.
import numpy as np

# Draw 10,000 samples out of Poisson distribution: samples_poisson
samples_poisson = np.random.poisson(10, size=10000)

# Print the mean and standard deviation
print('Poisson: ', np.mean(samples_poisson),
      np.std(samples_poisson))

# Specify values of n and p to consider for Binomial: n, p
n = [20, 100, 1000]
p = [0.5, 0.1, 0.01]
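The listing stops before the comparison itself. A sketch of the intended check, holding n*p fixed at 10 while p shrinks toward a "rare event" (the seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

# Poisson with lambda = 10
samples_poisson = rng.poisson(10, size=10000)
print('Poisson:', samples_poisson.mean(), samples_poisson.std())

# Binomial draws with n*p = 10; as p shrinks, the std approaches sqrt(10)
for n, p in [(20, 0.5), (100, 0.1), (1000, 0.01)]:
    samples_binomial = rng.binomial(n, p, size=10000)
    print(f'n={n}, p={p}:', samples_binomial.mean(), samples_binomial.std())
```

All four means should land near 10, while the Binomial standard deviation sqrt(n*p*(1-p)) converges to the Poisson's sqrt(10) as p gets small.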
vb100 / AutoloadingTablesfromDatabase.py
Created June 18, 2018 21:11
Autoloading Tables from a Database
# Import Table
from sqlalchemy import Table
# Reflect census table from the engine: census
census = Table('census', metadata, autoload=True, autoload_with=engine)
# Print census table metadata
print(repr(census))
vb100 / ViewingTableDetails.py
Last active June 18, 2018 21:29
Viewing Table Details
# Reflect the census table from the engine: census
census = Table('census', metadata, autoload=True, autoload_with=engine)
# Print the column names
print(census.columns.keys())
# Print full table metadata
print(repr(metadata.tables['census']))
vb100 / SelectingDataFromTable:_RawSQL.py
Created June 18, 2018 21:47
Selecting data from a Table: raw SQL
# Build select statement for census table: stmt
stmt = 'SELECT * FROM census'
# Execute the statement and fetch the results: results
results = connection.execute(stmt).fetchall()
# Print Results
print(results)
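The snippet above assumes an SQLAlchemy connection from the exercise workspace. The same raw-SQL select pattern can be sketched with the standard-library sqlite3 module against a throwaway in-memory table (table contents below are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE census (state TEXT, pop2000 INTEGER)")
conn.executemany("INSERT INTO census VALUES (?, ?)",
                 [("New York", 18976457), ("Texas", 20851820)])

# Build and execute the select statement, then fetch all rows
stmt = "SELECT * FROM census"
results = conn.execute(stmt).fetchall()
print(results)
conn.close()
```

fetchall returns a list of row tuples, one per record, just as the SQLAlchemy version does.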
vb100 / XGBoost_example.py
Last active June 20, 2018 12:27
Simple readable XGBoost Example
# -*- coding: utf-8 -*-
# Full instructions at: https://cambridgespark.com/content/tutorials/getting-started-with-xgboost/index.html
# Date: 20180620
#------------------------------------------------------------------------------
# Use Pandas to load the data in a dataframe
import pandas as pd
df = pd.read_excel('default of credit card clients.xls', header = 1, index_col = 0)
print('The shape of dataframe is {}.'.format(df.shape))
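The linked tutorial continues by separating the features from the default label. A minimal sketch with a toy frame (the column name `default payment next month` matches the UCI credit-card dataset this gist loads; the values here are made up):

```python
import pandas as pd

# Toy stand-in for the credit-card default data
df = pd.DataFrame({
    "LIMIT_BAL": [20000, 120000, 90000, 50000],
    "AGE": [24, 26, 34, 37],
    "default payment next month": [1, 0, 0, 1],
})

# Target is the default flag; everything else is a feature
y = df["default payment next month"]
X = df.drop(columns="default payment next month")
print(X.shape, y.shape)
```

From here the tutorial wraps X and y in an XGBoost DMatrix and trains a booster, which is omitted in this truncated listing.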
vb100 / Comps_identifiactor_MachineLearning.py
Created June 27, 2018 13:22
This is one of my biggest machine learning projects, built entirely by me. This module reads a periodically updated training set, analyzes it by performing hyperparameter tuning for a Decision Tree/Random Forest, and sets the best selected hyperparameters on the classifier. It then calculates the probability that a property is a comp and constructs a pandas dataframe …
# -*- coding: utf-8 -*-
"""
Created on Thu Jun 21 14:26:09 2018
@author: Vytautas.Bielinskas
Definitions:
JN - Jupyter Notebook
ML - Machine learning
BOG - Bag Of Words