Ali Hürriyetoğlu ahurriyetoglu

@sixtenbe
sixtenbe / analytic_wfm.py
Last active May 27, 2024 01:24 — forked from endolith/peakdet.m
Peak detection in Python
#!/usr/bin/python2
# Copyright (C) 2016 Sixten Bergman
# License WTFPL
#
# This program is free software. It comes without any warranty, to the extent
# permitted by applicable law.
# You can redistribute it and/or modify it under the terms of the Do What The
# Fuck You Want To Public License, Version 2, as published by Sam Hocevar. See
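
The preview shows only the license header. The gist is forked from endolith's peakdet.m port; a minimal sketch of the classic delta-threshold approach that peakdet implements (not the gist's exact code) looks roughly like this:

import numpy as np

def peakdet(v, delta):
    # Alternate between hunting for a maximum and a minimum; a candidate
    # extremum is confirmed once the signal moves away from it by `delta`.
    maxima, minima = [], []
    mn, mx = float('inf'), float('-inf')
    mn_pos = mx_pos = None
    look_for_max = True
    for i, this in enumerate(np.asarray(v, dtype=float)):
        if this > mx:
            mx, mx_pos = this, i
        if this < mn:
            mn, mn_pos = this, i
        if look_for_max and this < mx - delta:
            maxima.append((mx_pos, mx))
            mn, mn_pos = this, i
            look_for_max = False
        elif not look_for_max and this > mn + delta:
            minima.append((mn_pos, mn))
            mx, mx_pos = this, i
            look_for_max = True
    return maxima, minima

# peakdet([0, 1, 3, 1, 0, 2, 5, 2], delta=2) -> ([(2, 3.0), (6, 5.0)], [(4, 0.0)])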
@ses4j
ses4j / incrementalmr.py
Created March 29, 2012 03:42
Periodically-updating pymongo/MongoDB incremental MapReduce example
def incremental_map_reduce(
        map_f,
        reduce_f,
        db,
        source_table_name,
        target_table_name,
        source_queued_date_field_name,
        counter_table_name="IncrementalMRCounters",
        counter_key=None,
        max_datetime=None,
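
The preview stops partway through the signature. A hypothetical call might look like the following; the database, collection, and field names are made up for illustration, only the parameter names come from the signature above:

from pymongo import MongoClient

db = MongoClient()["analytics"]  # hypothetical database

map_f = "function () { emit(this.user_id, 1); }"
reduce_f = "function (key, values) { return Array.sum(values); }"

incremental_map_reduce(
    map_f, reduce_f, db,
    source_table_name="events",                 # collection to read from
    target_table_name="event_counts_by_user",   # results are merged into this
    source_queued_date_field_name="queued_at",  # high-water-mark timestamp field
)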
@bwhite
bwhite / rank_metrics.py
Created September 15, 2012 03:23
Ranking Metrics
"""Information Retrieval metrics
Useful Resources:
http://www.cs.utexas.edu/~mooney/ir-course/slides/Evaluation.ppt
http://www.nii.ac.jp/TechReports/05-014E.pdf
http://www.stanford.edu/class/cs276/handouts/EvaluationNew-handout-6-per.pdf
http://hal.archives-ouvertes.fr/docs/00/72/67/60/PDF/07-busa-fekete.pdf
Learning to Rank for Information Retrieval (Tie-Yan Liu)
"""
import numpy as np
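
The preview ends at the import. As a rough illustration of the kind of metric the gist collects (not its exact code, reusing the numpy import above), a DCG/NDCG pair over graded relevance judgments might look like:

def dcg_at_k(relevances, k):
    # Discounted cumulative gain: graded relevance discounted by log2 of rank.
    r = np.asarray(relevances, dtype=float)[:k]
    if r.size == 0:
        return 0.0
    return float(np.sum(r / np.log2(np.arange(2, r.size + 2))))

def ndcg_at_k(relevances, k):
    # Normalise by the DCG of the ideal (descending) ordering.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

print(ndcg_at_k([3, 2, 3, 0, 1, 2], k=6))  # ~0.96 for this toy ranking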
@larsmans
larsmans / gist:3745866
Created September 18, 2012 21:00
Inspecting scikit-learn CountVectorizer output with a Pandas DataFrame
>>> from pandas import DataFrame
>>> from sklearn.feature_extraction.text import CountVectorizer
>>> docs = ["You can catch more flies with honey than you can with vinegar.",
... "You can lead a horse to water, but you can't make him drink."]
>>> vect = CountVectorizer(min_df=0., max_df=1.0)
>>> X = vect.fit_transform(docs)
>>> print(DataFrame(X.A, columns=vect.get_feature_names()).to_string())
   but  can  catch  drink  flies  him  honey  horse  lead  make  more  than  to  vinegar  water  with  you
0    0    2      1      0      1    0      1      0     0     0     1     1   0        1      0     2    2
1    1    2      0      1      0    1      0      1     1     1     0     0   1        0      1     0    2
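
Note: newer scikit-learn releases rename this accessor (get_feature_names() was removed in 1.2), so the same inspection reads:

>>> print(DataFrame(X.A, columns=vect.get_feature_names_out()).to_string())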
@larsmans
larsmans / kmeans.py
Created February 14, 2013 13:38
k-means clustering in pure Python
#!/usr/bin/python
#
# K-means clustering using Lloyd's algorithm in pure Python.
# Written by Lars Buitinck. This code is in the public domain.
#
# The main program runs the clustering algorithm on a bunch of text documents
# specified as command-line arguments. These documents are first converted to
# sparse vectors, represented as lists of (index, value) pairs.
from collections import defaultdict
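
The preview stops after the imports. The gist clusters sparse (index, value) document vectors; as a simplified, dense-vector illustration of Lloyd's algorithm (not Lars's code):

import random

def lloyd_kmeans(points, k, iterations=20, seed=0):
    # Assign each point to its nearest centroid, then move each centroid
    # to the mean of its cluster; repeat for a fixed number of iterations.
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d.index(min(d))].append(p)
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if its cluster emptied out
                centroids[i] = [sum(dim) / float(len(cluster))
                                for dim in zip(*cluster)]
    return centroids, clusters

centroids, clusters = lloyd_kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)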
@miguelgrinberg
miguelgrinberg / rest-server.py
Last active August 4, 2024 14:41
The code from my article on building RESTful web services with Python and the Flask microframework. See the article here: http://blog.miguelgrinberg.com/post/designing-a-restful-api-with-python-and-flask
#!flask/bin/python
from flask import Flask, jsonify, abort, request, make_response, url_for
from flask_httpauth import HTTPBasicAuth
app = Flask(__name__, static_url_path="")
auth = HTTPBasicAuth()


@auth.get_password
def get_password(username):
    if username == 'miguel':
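
The preview cuts off inside get_password. Below is a sketch of the kind of route and error handler the linked article builds; the task data and URI are illustrative, see the article for the full API:

tasks = [{'id': 1, 'title': u'Buy groceries', 'done': False}]

@app.route('/todo/api/v1.0/tasks', methods=['GET'])
def get_tasks():
    return jsonify({'tasks': tasks})

@auth.error_handler
def unauthorized():
    # 403 rather than 401 keeps browsers from popping up a basic-auth dialog.
    return make_response(jsonify({'error': 'Unauthorized access'}), 403)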
@debasishg
debasishg / gist:8172796
Last active November 11, 2024 07:10
A collection of links for streaming algorithms and data structures

General Background and Overview

  1. Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
  2. Models and Issues in Data Stream Systems
  3. Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that visits some of the historical perspectives and how it all began with Flajolet
  4. Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject (see the lossy counting sketch after this list).
  5. [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t
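
For item 4 above, a rough sketch of the lossy counting idea, simplified and with illustrative names, not lifted from the paper:

class LossyCounter(object):
    """Approximate frequency counts: each count is low by at most epsilon * n."""

    def __init__(self, epsilon=0.001):
        self.epsilon = epsilon
        self.width = int(1.0 / epsilon)  # bucket width
        self.n = 0                       # items seen so far
        self.entries = {}                # item -> (count, max_undercount)

    def add(self, item):
        self.n += 1
        bucket = (self.n - 1) // self.width + 1   # current bucket id
        count, err = self.entries.get(item, (0, bucket - 1))
        self.entries[item] = (count + 1, err)
        if self.n % self.width == 0:              # bucket boundary: prune
            self.entries = {k: (c, e) for k, (c, e) in self.entries.items()
                            if c + e > bucket}

    def frequent(self, support):
        # Items whose true frequency can exceed support * n.
        threshold = (support - self.epsilon) * self.n
        return [k for k, (c, _) in self.entries.items() if c >= threshold]
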
import numpy as np
import marisa_trie
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.externals import six


class MarisaCountVectorizer(CountVectorizer):
    # ``CountVectorizer.fit`` method calls ``fit_transform``, so
    # ``fit`` is not provided.
    def fit_transform(self, raw_documents, y=None):
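        # Assumption: the preview ends at the signature above.  A common
        # completion (not necessarily the gist's) fits normally, then swaps
        # the dict vocabulary for a memory-efficient marisa_trie.Trie and
        # reorders the matrix columns to match the trie's key ids.
        X = super(MarisaCountVectorizer, self).fit_transform(raw_documents)
        frozen = marisa_trie.Trie(six.iterkeys(self.vocabulary_))
        mapping = np.empty(len(self.vocabulary_), dtype=np.intp)
        for term, old_idx in six.iteritems(self.vocabulary_):
            mapping[old_idx] = frozen[term]   # new column id assigned by the trie
        self.vocabulary_ = frozen
        # argsort of a permutation is its inverse, so column `new` <- old column.
        return X[:, np.argsort(mapping)]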

import matplotlib.pyplot as plt


def plot_correlogram(df, figsize=(20, 20)):
    '''Create an n x n matrix of scatter plots for every
    combination of numeric columns in a dataframe.'''
    cols = list(df.columns[df.dtypes == 'float64'])
    n = len(cols)
    fig, ax = plt.subplots(n, n, figsize=figsize)
    for i, y in enumerate(cols):
        for j, x in enumerate(cols):
            if i != n - 1:
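                # Assumption: the preview stops at the line above; a plausible
                # completion (not the gist's own code) hides inner tick labels
                # and scatters each pair of numeric columns.
                ax[i, j].set_xticklabels([])
            if j != 0:
                ax[i, j].set_yticklabels([])
            ax[i, j].scatter(df[x], df[y], s=2)
            if i == n - 1:
                ax[i, j].set_xlabel(x)
            if j == 0:
                ax[i, j].set_ylabel(y)
    return fig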
"Ecrin"
"Eymen"
"Ceylin"
"Ebrar"
"Tuana"
"Esila"
"Esra"
"Enes"
"Talha"
"Ömer"