Vytautas Bielinskas (vb100)
🎯 Focusing on A.I., ML and Deep Learning
vb100 / zoopla_rents.py
Last active August 4, 2022 23:58
Zoopla.com scraper for getting rents data
""" Zoopla scraping project """
# Import libraries
import requests, re, os
import pandas as pd
from bs4 import BeautifulSoup
""" Generate the list of URLs : Start"""
def generateURLs(pages):
listURLs = []
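The listing cuts this gist off after the first lines of generateURLs. A minimal sketch of how such a paginated URL builder might continue, assuming a page-number query parameter (the URL pattern below is hypothetical, not Zoopla's real one):

```python
def generate_urls(pages, base_url="https://example.com/to-rent/london/?pn={}"):
    """Return one search URL per results page; the pattern is illustrative."""
    return [base_url.format(page) for page in range(1, pages + 1)]

urls = generate_urls(3)
print(urls)
```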
vb100 / ECDF.py
Created June 12, 2018 09:52
Empirical distribution function example with Real Estate data
# 01 Calculate ECDF for Zoopla distribution model
import numpy as np

def ecdf(data):
    """Compute ECDF for a one-dimensional array of measurements."""
    # Number of data points: n
    n = len(data)
    # x-data for the ECDF: x
    x = np.sort(data)
    # y-data for the ECDF: evenly spaced fractions from 1/n to 1
    y = np.arange(1, n + 1) / n
    return x, y
vb100 / compareECDFtoPercentiles.py
Created June 12, 2018 10:48
Comparing percentiles to ECDF. To see how the percentiles relate to the ECDF, you will plot the percentiles of Iris versicolor petal lengths you calculated in the last exercise on the ECDF plot you generated in chapter 1. The percentile variables from the previous exercise are available in the workspace as ptiles_vers and percentiles. Note that t…
import matplotlib.pyplot as plt

# Plot the ECDF
_ = plt.plot(x_vers, y_vers, '.')
plt.margins(0.02)
_ = plt.xlabel('petal length (cm)')
_ = plt.ylabel('ECDF')

# Overlay percentiles as red diamonds
_ = plt.plot(ptiles_vers, percentiles / 100, marker='D', color='red',
             linestyle='none')
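The same percentile-vs-ECDF relationship can be checked numerically, without plotting. A hedged sketch on synthetic data (the normal distribution below is a toy stand-in for the versicolor petal lengths, and the ecdf helper is redefined so the sketch is self-contained):

```python
import numpy as np

def ecdf(data):
    """ECDF: sorted values paired with cumulative fractions."""
    x = np.sort(data)
    y = np.arange(1, len(data) + 1) / len(data)
    return x, y

rng = np.random.default_rng(0)
petal_lengths = rng.normal(loc=4.26, scale=0.47, size=1000)

percentiles = np.array([2.5, 25, 50, 75, 97.5])
ptiles = np.percentile(petal_lengths, percentiles)

# Each percentile value should sit at (roughly) the matching height on the ECDF
for p, v in zip(percentiles, ptiles):
    frac = np.mean(petal_lengths <= v)
    assert abs(frac - p / 100) < 0.01
```

The assertion passes because, by definition, the p-th percentile is the value below which about p percent of the observations fall, which is exactly the height the ECDF assigns to it.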
vb100 / pearson_corr.py
Created June 13, 2018 21:05
Pearson Correlation Coefficient R
import numpy as np

def pearson_r(x, y):
    """Compute Pearson correlation coefficient between two arrays."""
    # Compute correlation matrix: corr_mat
    corr_mat = np.corrcoef(x, y)
    # Return entry [0, 1]
    return corr_mat[0, 1]

# Compute Pearson correlation coefficient for I. versicolor: r
r = pearson_r(versicolor_petal_length, versicolor_petal_width)
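A quick self-check of the helper on synthetic correlated data (the helper is redefined so the sketch is self-contained; the slope and noise scale are arbitrary):

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation coefficient: off-diagonal entry of the 2x2 correlation matrix."""
    return np.corrcoef(a, b)[0, 1]

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 0.8 * x + rng.normal(scale=0.5, size=200)  # linearly related plus noise

r = pearson_r(x, y)
print(r)
```

Any array is perfectly correlated with itself, so pearson_r(x, x) should return 1 to within floating-point error.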
vb100 / BinomialPoisson.py
Created June 14, 2018 22:10
Relationship between Binomial and Poisson distributions. You just heard that the Poisson distribution is a limit of the Binomial distribution for rare events.
import numpy as np

# Draw 10,000 samples out of Poisson distribution: samples_poisson
samples_poisson = np.random.poisson(10, size=10000)

# Print the mean and standard deviation
print('Poisson: ', np.mean(samples_poisson),
      np.std(samples_poisson))

# Specify values of n and p to consider for Binomial: n, p
n = [20, 100, 1000]
p = [0.5, 0.1, 0.01]
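The listing stops before the comparison itself. A sketch of the intended check, holding n*p fixed at 10 while p shrinks toward a "rare event" (the seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

# Poisson with lambda = 10
samples_poisson = rng.poisson(10, size=10000)
print('Poisson:', samples_poisson.mean(), samples_poisson.std())

# Binomial draws with n*p = 10; as p shrinks, the std approaches sqrt(10)
for n, p in [(20, 0.5), (100, 0.1), (1000, 0.01)]:
    samples_binomial = rng.binomial(n, p, size=10000)
    print(f'n={n}, p={p}:', samples_binomial.mean(), samples_binomial.std())
```

All four means should land near 10, while the Binomial standard deviation sqrt(n*p*(1-p)) converges to the Poisson's sqrt(10) as p gets small.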
vb100 / AutoloadingTablesfromDatabase.py
Created June 18, 2018 21:11
Autoloading Tables from a Database
# Import Table
from sqlalchemy import Table
# Reflect census table from the engine: census
census = Table('census', metadata, autoload=True, autoload_with=engine)
# Print census table metadata
print(repr(census))
vb100 / ViewingTableDetails.py
Last active June 18, 2018 21:29
Viewing Table Details
# Reflect the census table from the engine: census
census = Table('census', metadata, autoload=True, autoload_with=engine)
# Print the column names
print(census.columns.keys())
# Print full table metadata
print(repr(metadata.tables['census']))
vb100 / SelectingDataFromTable:_RawSQL.py
Created June 18, 2018 21:47
Selecting data from a Table: raw SQL
# Build select statement for census table: stmt
stmt = 'SELECT * FROM census'
# Execute the statement and fetch the results: results
results = connection.execute(stmt).fetchall()
# Print Results
print(results)
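The snippet above assumes an SQLAlchemy connection from the exercise workspace. The same raw-SQL select pattern can be sketched with the standard-library sqlite3 module against a throwaway in-memory table (table contents below are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE census (state TEXT, pop2000 INTEGER)")
conn.executemany("INSERT INTO census VALUES (?, ?)",
                 [("New York", 18976457), ("Texas", 20851820)])

# Build and execute the select statement, then fetch all rows
stmt = "SELECT * FROM census"
results = conn.execute(stmt).fetchall()
print(results)
conn.close()
```

fetchall returns a list of row tuples, one per record, just as the SQLAlchemy version does.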
vb100 / XGBoost_example.py
Last active June 20, 2018 12:27
Simple readable XGBoost Example
# -*- coding: utf-8 -*-
# Full instructions at: https://cambridgespark.com/content/tutorials/getting-started-with-xgboost/index.html
# Date: 20180620
#------------------------------------------------------------------------------
# Use Pandas to load the data in a dataframe
import pandas as pd
df = pd.read_excel('default of credit card clients.xls', header = 1, index_col = 0)
print('The shape of dataframe is {}.'.format(df.shape))
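The linked tutorial continues by separating the features from the default label. A minimal sketch with a toy frame (the column name `default payment next month` matches the UCI credit-card dataset this gist loads; the values here are made up):

```python
import pandas as pd

# Toy stand-in for the credit-card default data
df = pd.DataFrame({
    "LIMIT_BAL": [20000, 120000, 90000, 50000],
    "AGE": [24, 26, 34, 37],
    "default payment next month": [1, 0, 0, 1],
})

# Target is the default flag; everything else is a feature
y = df["default payment next month"]
X = df.drop(columns="default payment next month")
print(X.shape, y.shape)
```

From here the tutorial wraps X and y in an XGBoost DMatrix and trains a booster, which is omitted in this truncated listing.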
vb100 / Comps_identifiactor_MachineLearning.py
Created June 27, 2018 13:22
This is one of my biggest machine learning projects, built entirely by me. This module reads a periodically updated training set, analyzes it by performing hyperparameter tuning for a Decision Tree/Random Forest, and sets the best selected hyperparameters on the classifier. It then calculates the probability that a property is a comp and constructs a pandas dataframe …
# -*- coding: utf-8 -*-
"""
Created on Thu Jun 21 14:26:09 2018
@author: Vytautas.Bielinskas
Definitions:
JN - Jupyter Notebook
ML - Machine learning
BOG - Bag Of Words