@joskid
joskid / ALS_implementation.py
Created February 28, 2021 05:29 — forked from himanshk96/ALS_implementation.py
Recommendation using ALS for implicit data. Code for Medium Blog
# -*- coding: utf-8 -*-
"""
Created on Sun Jun 23 22:20:58 2019
@author: himansh
"""
#import libraries
import sys
import pandas as pd
import numpy as np
@joskid
joskid / matrix_factorization.py
Created February 28, 2021 05:30 — forked from kastnerkyle/matrix_factorization.py
Matrix factorization code related to matrix completion
# (C) Kyle Kastner, June 2014
# License: BSD 3 clause
import numpy as np
from scipy import sparse
def minibatch_indices(X, minibatch_size):
    minibatch_indices = np.arange(0, len(X), minibatch_size)
    minibatch_indices = np.asarray(list(minibatch_indices) + [len(X)])
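The preview stops once the boundary array has been built. As a rough illustration only, here is one way such boundaries could be paired into (start, end) ranges. It is a pure-Python sketch under stated assumptions: the helper name `minibatch_slices` and the use of plain lists instead of NumPy arrays are hypothetical and not taken from the gist.

```python
def minibatch_slices(n_items, minibatch_size):
    """Return (start, end) index pairs covering range(n_items).

    Hypothetical helper: mirrors the boundary construction in the preview
    (every minibatch_size-th index, plus the final length) using plain lists.
    """
    boundaries = list(range(0, n_items, minibatch_size)) + [n_items]
    # Pair each boundary with the next one to get half-open [start, end) ranges.
    return list(zip(boundaries[:-1], boundaries[1:]))


data = list(range(10))
slices = minibatch_slices(len(data), 4)
batches = [data[start:end] for start, end in slices]
```

The last batch is simply shorter when the length is not a multiple of the batch size, which is why the final length is appended to the boundaries.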
@joskid
joskid / Data Mining Books.md
Created March 3, 2021 02:25 — forked from dweinstein/Data Mining Books.md
Free Data Mining books

Source: http://christonard.com/12-free-data-mining-books/

  • An Introduction to Statistical Learning with Applications in R by James, Witten, Hastie & Tibshirani – This book is fantastic and has helped me quite a bit. It provides an overview of several methods, along with the R code for how to complete them. 426 Pages.
  • The Elements of Statistical Learning by Hastie, Tibshirani & Friedman – This is an in-depth overview of methods, complete with theory, derivations & code. I’d definitely consider this a graduate level text. I’d also consider it one of the best books available on the topic of data mining. 745 Pages.
  • A Programmer’s Guide to Data Mining by Ron Zacharski – This one is an online book, each chapter downloadable as a PDF. It’s also still in progress, with chapters being added a few times each year.
  • Probabilistic Programming & Bayesian Methods for Hackers by Cam Davidson-Pilon – This book is absolutely fantastic. The author explains Bayesian statistics, provides several diverse examples of how to a
@joskid
joskid / web-scraping-java-jsoup-htmlunit-jaunt-uij-selenium-phantomjs.md
Created March 6, 2021 04:04
Web Scraping with Java: JSoup - HtmlUnit - Jaunt - ui4j - Selenium - PhantomJS

JSoup

JSoup is an HTML parser: it cannot control a web page, only parse its content. It supports only CSS selectors, letting you select elements with jQuery-like selector syntax, and it provides a slick API for traversing the HTML DOM tree to reach the elements of interest. That DOM traversal is JSoup's major strength. It can be used in web applications.

HtmlUnit

HtmlUnit is a "GUI-Less browser for Java programs". It can simulate Chrome, Firefox or Internet Explorer behaviour, and it is a lightweight solution without many dependencies. It generally supports JavaScript and cookies, though in some cases it may fail. HtmlUnit is used for testing and web scraping, and it is the basis for other tools. You can simulate pretty much anything a browser can do, such as click and submit events. It is much more than an HTML parser and is ideal for automated unit testing of web applications. It supports XPath, but the problem starts when you try to extrac

A complete list of books, articles, blog posts, videos and neat pages that support Data Fundamentals (H), organised by Unit.

Formatting

If the resource is available online (legally) I have included a link to it. Each entry has symbols following it.

  • ⨕⨕⨕ indicates difficulty/depth, from ⨕ (easy to pick up intro, no background required) through ⨕⨕⨕⨕⨕ (graduate level textbook, maths heavy, expect equations)
  • ⭐ indicates a particularly recommended resource; 🌟 is a very strongly recommended resource and you should look at it.
@joskid
joskid / short_version.py
Created May 5, 2021 14:00 — forked from miloharper/short_version.py
A neural network in 9 lines of Python code.
from numpy import exp, array, random, dot
training_set_inputs = array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
training_set_outputs = array([[0, 1, 1, 0]]).T
random.seed(1)
synaptic_weights = 2 * random.random((3, 1)) - 1  # initial weights in [-1, 1)
for iteration in range(10000):
    output = 1 / (1 + exp(-(dot(training_set_inputs, synaptic_weights))))  # sigmoid activation
    synaptic_weights += dot(training_set_inputs.T, (training_set_outputs - output) * output * (1 - output))  # error scaled by sigmoid gradient
print(1 / (1 + exp(-(dot(array([1, 0, 0]), synaptic_weights)))))
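The nine-line script is dense, so here is a longer sketch of the same idea with each step named: a single sigmoid neuron trained by repeatedly adding the error, scaled by the sigmoid gradient, back onto the weights. It is a pure-Python rewrite that assumes no NumPy. The function names (`sigmoid`, `train_single_neuron`, `predict`) and the standard-library random generator are illustrative choices, not part of the gist, so the printed value will not match the NumPy version digit for digit.

```python
import math
import random


def sigmoid(x):
    """Logistic function, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict(weights, row):
    """Output of the neuron for one input row."""
    return sigmoid(sum(w * x for w, x in zip(weights, row)))


def train_single_neuron(inputs, targets, iterations=10000, seed=1):
    """Train one sigmoid neuron with full-batch updates (no learning rate)."""
    rng = random.Random(seed)
    # Start with small random weights in [-1, 1], one per input column.
    weights = [rng.uniform(-1.0, 1.0) for _ in range(len(inputs[0]))]
    for _ in range(iterations):
        outputs = [predict(weights, row) for row in inputs]
        # Error times the sigmoid gradient o * (1 - o), one value per example.
        deltas = [(t - o) * o * (1.0 - o) for t, o in zip(targets, outputs)]
        # Each weight moves by the sum of (its input * delta) over all examples.
        for j in range(len(weights)):
            weights[j] += sum(row[j] * d for row, d in zip(inputs, deltas))
    return weights


training_inputs = [[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]]
training_targets = [0, 1, 1, 0]  # equals the first input column

weights = train_single_neuron(training_inputs, training_targets)
prediction = predict(weights, [1, 0, 0])
print(prediction)
```

Because the target always equals the first input, the first weight grows large and positive during training, so the neuron's output for the unseen input `[1, 0, 0]` ends up close to 1.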
@joskid
joskid / pytorch_tips_yt_follow.ipynb
Created May 13, 2021 09:30 — forked from ejmejm/pytorch_tips_yt_follow.ipynb
pytorch_tips_yt_follow.ipynb
@rem Created by Erik van Oost for version PC/LR 12.55
@if not exist C:\temp mkdir C:\temp
@echo %~f0
@set scriptpath=%~f0\..\
@echo Extracting setup files with 7Zip
%scriptpath%7z.exe x %scriptpath%SetupVugen.exe -oC:\temp\SetupVugen -y
%scriptpath%7z.exe x %scriptpath%SetupAnalysis.exe -oC:\temp\SetupAnalysis -y
@echo Installing prerequisites for Vugen
@joskid
joskid / google-drive-api.js
Created January 2, 2022 04:36 — forked from trulymittal/google-drive-api.js
Gist to demonstrate Google Drive API using NodeJs
/*
Google Drive API:
Demonstration to:
1. upload
2. delete
3. create public URL of a file.
required npm package: googleapis
*/
const { google } = require('googleapis');