Mikhail Korobov kmike

kmike / robots_mw.py

Last active February 21, 2017 10:47

RobotsCrawlDelayMiddleware

	# -*- coding: utf-8 -*-
	from __future__ import absolute_import
	import logging
	import urlparse

	from reppy.parser import Rules

	from scrapy import log
	from scrapy.exceptions import NotConfigured
	from scrapy.http import Request

kmike / Abbr.ipynb

Created August 20, 2014 15:37

kmike / idealspider.py

Last active August 29, 2015 14:05 — forked from dangra/idealspider.py

	import scrapy
	from scrapy.http import safeurl


	class Spider(scrapy.Spider):

	    name = 'loremipsum'
	    start_urls = ('https://www.lipsum.com',)

	    def parse(self, response):

kmike / Ms-f.ipynb

Created August 19, 2014 22:56

kmike / BytesIO-Copy0.ipynb

Created July 15, 2014 12:11

kmike / BytesIO.ipynb

Created July 15, 2014 11:59

kmike / matplotlib-gc.ipynb

Last active August 29, 2015 14:01

kmike / README.txt

Last active December 21, 2017 05:17

prototypes for https://github.com/scikit-learn/scikit-learn/issues/2639

	Folder structure should be the following:

	vectorizers/
	vec/
	stored/
	string_dict/
	string_dict.pyx
	setup.py
	marisa_vectorizers.py
	memusage_fit.py

kmike / hattrie_vectorizer.py

Created March 27, 2014 11:21

	import numpy as np
	import scipy.sparse as sp
	import hat_trie
	from sklearn.feature_extraction.text import CountVectorizer, _make_int_array


	class HatTrieCountVectorizer(CountVectorizer):

	    def _count_vocab(self, raw_documents, fixed_vocab):
	        """Create sparse feature matrix, and vocabulary where fixed_vocab=False

kmike / marisa_count_vectorizer.py

Last active June 28, 2021 02:39

	import numpy as np
	import marisa_trie
	from sklearn.feature_extraction.text import CountVectorizer
	from sklearn.externals import six

	class MarisaCountVectorizer(CountVectorizer):

	    # ``CountVectorizer.fit`` method calls ``fit_transform`` so
	    # ``fit`` is not provided
	    def fit_transform(self, raw_documents, y=None):
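
The comment in the preview above relies on a delegation pattern: when a base class implements `fit` by calling `fit_transform`, a subclass only has to override `fit_transform`. The sketch below illustrates that pattern with toy classes. `BaseVectorizer` and `SortedVocabVectorizer` are hypothetical names invented for this example; they are not part of scikit-learn or of the gist, and the counting logic is a simplified stand-in rather than the real implementation.

```python
class BaseVectorizer:
    """Hypothetical stand-in for a vectorizer whose fit() delegates."""

    def fit(self, raw_documents, y=None):
        # fit() just runs fit_transform() and discards the result.
        self.fit_transform(raw_documents)
        return self

    def fit_transform(self, raw_documents, y=None):
        # Vocabulary indices are assigned in order of first appearance.
        self.vocabulary_ = {}
        rows = []
        for doc in raw_documents:
            counts = {}
            for token in doc.split():
                idx = self.vocabulary_.setdefault(token, len(self.vocabulary_))
                counts[idx] = counts.get(idx, 0) + 1
            rows.append(counts)
        return rows


class SortedVocabVectorizer(BaseVectorizer):
    """Overrides only fit_transform(); the inherited fit() picks it up."""

    def fit_transform(self, raw_documents, y=None):
        # Vocabulary indices follow sorted token order instead.
        tokens = sorted({t for doc in raw_documents for t in doc.split()})
        self.vocabulary_ = {t: i for i, t in enumerate(tokens)}
        rows = []
        for doc in raw_documents:
            counts = {}
            for token in doc.split():
                idx = self.vocabulary_[token]
                counts[idx] = counts.get(idx, 0) + 1
            rows.append(counts)
        return rows


# fit() was never redefined in the subclass, yet it uses the new logic.
vec = SortedVocabVectorizer().fit(["b a", "c a"])
print(vec.vocabulary_)
```

Because `fit` is inherited unchanged, calling it on the subclass still dispatches to the overridden `fit_transform`, which is why a separate `fit` override would be redundant in this arrangement.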