- Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
- Models and Issues in Data Stream Systems
- Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that visits some of the hostorical perspectives and how it all began with Flajolet
- Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
- [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#!/usr/bin/python2 | |
# Copyright (C) 2016 Sixten Bergman | |
# License WTFPL | |
# | |
# This program is free software. It comes without any warranty, to the extent | |
# permitted by applicable law. | |
# You can redistribute it and/or modify it under the terms of the Do What The | |
# Fuck You Want To Public License, Version 2, as published by Sam Hocevar. See |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import numpy as np | |
import scipy.stats | |
class ChineseRestaurantProcess(object): | |
def __init__(self, alpha): | |
self.alpha = alpha | |
self.customers = [] | |
def sample(self, n_samples=1): | |
samples = [] |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
""" | |
Non-Negative Garotte implementation with the scikit-learn | |
""" | |
# Author: Alexandre Gramfort <[email protected]> | |
# Jaques Grobler (__main__ script) <[email protected]> | |
# | |
# License: BSD Style. | |
import numpy as np |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Latency Comparison Numbers (~2012) | |
---------------------------------- | |
L1 cache reference 0.5 ns | |
Branch mispredict 5 ns | |
L2 cache reference 7 ns 14x L1 cache | |
Mutex lock/unlock 25 ns | |
Main memory reference 100 ns 20x L2 cache, 200x L1 cache | |
Compress 1K bytes with Zippy 3,000 ns 3 us | |
Send 1K bytes over 1 Gbps network 10,000 ns 10 us | |
Read 4K randomly from SSD* 150,000 ns 150 us ~1GB/sec SSD |
Note: this is a summary of different git workflows putting together to a small git bible. references are in between the text
try to keep your hacking out of the master and create feature branches. the [feature-branch workflow][4] is a good median between noobs (i have no idea how to branch) and git veterans (let's do some rocket sience with git branches!). everybody get the idea!
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
""" Non-negative matrix factorization for I divergence | |
This code was implements Lee and Seung's multiplicative updates algorithm | |
for NMF with I divergence cost. | |
Lee D. D., Seung H. S., Learning the parts of objects by non-negative | |
matrix factorization. Nature, 1999 | |
""" | |
# Author: Olivier Mangin <[email protected]> |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Wordlist ver 0.732 - EXPECT INCOMPATIBLE CHANGES; | |
acrobat africa alaska albert albino album | |
alcohol alex alpha amadeus amanda amazon | |
america analog animal antenna antonio apollo | |
april aroma artist aspirin athlete atlas | |
banana bandit banjo bikini bingo bonus | |
camera canada carbon casino catalog cinema | |
citizen cobra comet compact complex context | |
credit critic crystal culture david delta | |
dialog diploma doctor domino dragon drama |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
""" | |
Multiclass SVMs (Crammer-Singer formulation). | |
A pure Python re-implementation of: | |
Large-scale Multiclass Support Vector Machine Training via Euclidean Projection onto the Simplex. | |
Mathieu Blondel, Akinori Fujino, and Naonori Ueda. | |
ICPR 2014. | |
http://www.mblondel.org/publications/mblondel-icpr2014.pdf | |
""" |
This is just a quick list of resourses on TDA that I put together for @rickasaurus after he was asking for links to papers, books, etc on Twitter and is by no means an exhaustive list.
Both Carlsson's and Ghrist's survey papers offer a very good introduction to the subject
- Topology and Data by Gunnar Carlsson
- Barcodes: The Persistent Topology of Data by Robert Ghrist
- Extracting insights from the shape of complex data using topology A good introductory paper in Nature on the
Mapper
algorithm.
OlderNewer