ivopbernardo

Partner @DareData, teaching on @udemy and Nova University. Also on https://thedatajourney.substack.com/

ivopbernardo / nltk_intro.py

Last active November 2, 2022 09:29

Introduction to NLTK Library

	# Getting started with NLTK scripts - used in blog post:
	# https://towardsdatascience.com/getting-started-with-nltk-eb4ed6eb7a37

	from nltk import tokenize

	python_wiki = '''
	Python is a high-level, interpreted, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation.
	Python is dynamically-typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented and functional programming. It is often described as a "batteries included" language due to its comprehensive standard library.
	Guido van Rossum began working on Python in the late 1980s as a successor to the ABC programming language and first released it in 1991 as Python 0.9.0.[33] Python 2.0 was released in 2000 and introduced new features such as list comprehensions, cycle-detecting garbage collection, reference counting, and Unicode support. Python 3.0, released in 2008, was a major revision that is not completely

ivopbernardo / decisiontree.R

Last active November 2, 2022 09:30

Data Science Tutorials Blog Post Series: Training a Decision Tree using R

	# Training a decision tree in R - used in blog post:
	# https://medium.com/codex/data-science-tutorials-training-a-decision-tree-using-r-d6266936d86

	library(dplyr)
	library(rpart)
	library(rpart.plot)
	library(caret)
	library(Metrics)
	library(ggplot2)

ivopbernardo / geoprocess_dd_post.py

Last active March 11, 2022 14:00

Locate your Data and Boost it with Geo-Processing Post

	# Getting Latitude and Longitude from Nominatim

	from geopy.geocoders import Nominatim
	from geopy.extra.rate_limiter import RateLimiter

	geocoder = Nominatim(user_agent="FindAddress")
	geocode = RateLimiter(
	    geocoder.geocode,
	    min_delay_seconds=1,
	    return_value_on_exception=None
	)
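
The rate-limited wrapper in this preview spaces out calls to Nominatim and returns a fallback value when a lookup fails. A minimal standard-library sketch of that idea follows. It is not geopy's implementation (which, among other things, can retry failed calls), and `rate_limited` and `fake_lookup` are hypothetical names used only for illustration.

```python
import time
from typing import Any, Callable


def rate_limited(func: Callable[..., Any],
                 min_delay_seconds: float = 1.0,
                 return_value_on_exception: Any = None) -> Callable[..., Any]:
    """Wrap func so consecutive calls start at least min_delay_seconds apart."""
    last_call = [None]  # start time of the previous call, if any

    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            wait = min_delay_seconds - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)
        last_call[0] = time.monotonic()
        try:
            return func(*args, **kwargs)
        except Exception:
            # Simplification: swallow the error and hand back the fallback.
            return return_value_on_exception

    return wrapper


def fake_lookup(address: str) -> tuple:
    """Hypothetical stand-in for a geocoding request (no network access)."""
    if not address:
        raise ValueError("empty address")
    return (38.72, -9.14)


lookup = rate_limited(fake_lookup, min_delay_seconds=0.05)
first = lookup("Lisbon")   # returns the coordinates tuple
failed = lookup("")        # the error is swallowed, so the fallback is returned
```

Wrapping the function once and reusing the wrapper keeps the delay bookkeeping in one place, which is convenient when applying the lookup to every row of a table.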

ivopbernardo / xgboostr.r

Last active November 2, 2022 09:31

xgboostr.r

	# Training an XGBoost in R - used in blog post:
	# https://towardsdatascience.com/data-science-tutorials-training-an-xgboost-using-r-cf3c00b1425

	library(dplyr)
	library(xgboost)
	library(Metrics)
	library(ggplot2)

	# Load the London bike CSV
	london_bike <- read.csv('./london_merged.csv')

ivopbernardo / randomforests.r

Last active November 2, 2022 09:31

	# Training a Random Forest in R - used in blog post:
	# https://towardsdatascience.com/data-science-tutorials-training-a-random-forest-in-r-a883cc1bacd1

	library(dplyr)
	library(randomForest)
	library(ranger)
	library(Metrics)

	# Load the London bike CSV
	london_bike <- read.csv('./london_merged.csv')

ivopbernardo / rf_demo.R

Created February 4, 2022 18:18

Random Forests vs. Decision Trees

	# Don't forget to download the train.csv file
	# to make this gist work.

	# Download it at: https://www.kaggle.com/c/titanic/data?select=train.csv

	# You also need to install the ROCR and rpart packages

	# Reading the titanic train dataset
	titanic <- read.csv('./train.csv')

ivopbernardo / cooccurrence_example.py

Created August 16, 2021 12:49

word_vectors_cooccurrence

	import wikipedia
	import pandas as pd
	import numpy as np
	import string
	from nltk.tokenize import word_tokenize
	from sklearn.metrics.pairwise import cosine_similarity

	def retrieve_page(page_name: str) -> list:
	    '''
	    Retrieves page data from Wikipedia
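
The imports in this preview suggest word vectors built from co-occurrence counts and compared with cosine similarity. A minimal standard-library sketch of that technique is below. The whitespace tokenization, the window size, and the toy sentence are assumptions for illustration; the gist itself imports `word_tokenize` and scikit-learn's `cosine_similarity` and may differ in every detail.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(tokens, window=2):
    """Count, for each word, the words appearing within `window` positions."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy text (hypothetical); a real run would use tokens from a retrieved page.
text = "the cat drinks milk the dog drinks water"
tokens = text.split()
vectors = cooccurrence_vectors(tokens, window=1)
similarity = cosine(vectors["cat"], vectors["dog"])
```

In the toy text, "cat" and "dog" share exactly the same neighbours, so their vectors point in the same direction, while "cat" and "water" overlap only partially.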

ivopbernardo / stemming_example.py

Last active May 18, 2021 16:51

Examples around NLTK stemming

	from nltk.tokenize import word_tokenize
	from nltk.stem import PorterStemmer, SnowballStemmer, LancasterStemmer

	porter = PorterStemmer()
	snowball = SnowballStemmer(language='english')
	lanc = LancasterStemmer()

	sentence_example = (
	    'This is definitely a controversy as the attorney labeled the case "extremely controversial"'
	)

ivopbernardo / text_representation.py

Created April 23, 2021 16:10

Python Text Representation

	# Import sklearn vectorizers and pandas
	import pandas as pd
	from sklearn.feature_extraction.text import (
	    CountVectorizer,
	    TfidfVectorizer
	)


	# Defining our sentence examples
	sentence_list = [
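
The two vectorizers imported in this preview turn sentences into count and TF-IDF matrices. A standard-library sketch of both representations is below, under stated assumptions: the sentences are hypothetical (the gist's own `sentence_list` is not shown in full), tokenization is a plain lowercase split, and the weighting uses a smoothed idf, `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalization. Results from the real vectorizers can differ, for example because of their tokenization rules.

```python
import math
from collections import Counter

# Hypothetical sentences, used only for illustration.
sentences = [
    "the cat sat",
    "the dog sat",
    "the dog barked",
]

tokenized = [s.lower().split() for s in sentences]
vocabulary = sorted({tok for doc in tokenized for tok in doc})

# Bag of words: one row per sentence, one column per vocabulary term.
counts = [
    [doc_counts[term] for term in vocabulary]
    for doc_counts in map(Counter, tokenized)
]

# Smoothed inverse document frequency: rarer terms get larger weights.
n_docs = len(tokenized)
doc_freq = {term: sum(term in doc for doc in tokenized) for term in vocabulary}
idf = {
    term: math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
    for term in vocabulary
}


def l2_normalize(row):
    """Scale a row to unit Euclidean length (all-zero rows are left alone)."""
    norm = math.sqrt(sum(v * v for v in row))
    return [v / norm for v in row] if norm else row


tfidf = [
    l2_normalize([count * idf[term] for count, term in zip(row, vocabulary)])
    for row in counts
]
```

Each row of `counts` treats every word equally, while the matching row of `tfidf` gives more weight to words that appear in fewer sentences.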

ivopbernardo / cleaning_data.R

Last active January 3, 2021 13:41

Cleaning FBI crime data

	# Loading readxl library
	library(readxl)

	clean_crime_data <- function(path) {
	  # Load the data
	  crime_data <- read_xls(path)

	  # Assign column names from the third row
	  colnames(crime_data) <- crime_data[3,]