// Contents from the last week; `page` is currently unused (see the pagination notes below).
def findWeeklyContents(page: Int = 1, amount: Int = 8): List[LightContent] =
  toLightContents(coll.find("date" $gt get_last_week()).limit(amount))

def findMonthlyContents(page: Int = 1, amount: Int = 8): List[LightContent] =
  toLightContents(coll.find("date" $gt get_last_month()).limit(amount))

// Most-viewed contents of the last week, capped at `amount` results.
def findWeeklyPopularContents(page: Int = 1, amount: Int = 8): List[LightContent] =
  toLightContents(coll.find("date" $gt get_last_week()).sort(MongoDBObject("views" -> -1)).limit(amount))
kyu999 / mongo_query
Created November 21, 2014 06:17
Problem: when pulling 12 items per page out of a category holding a lot of data, issuing a fresh find query on every page load is far too slow, and the site will die quickly once traffic grows.
Possible approaches:
1. If the iterator is fast enough to pull even ~1000 documents, fetch one iterator and hand it from page to page.
   => But what if a user visits the first page and then jumps straight to the last one? => the iterator is out?
2. Slice-index by page (see the sketch below): with 12 items per page, page p slices from (p - 1) * 12 to p * 12, so page 3 slices 24 to 36.
   page 1 -> 0 * 12 to 1 * 12 = items 1 to 12
   page 2 -> 1 * 12 to 2 * 12 = items 13 to 24
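A minimal sketch of approach 2 with pymongo, assuming a collection coll; the function name find_page and the page size of 12 are illustrative, not from the gists:

from pymongo import MongoClient, DESCENDING

PAGE_SIZE = 12  # items per page, as in the notes above

def find_page(coll, page, page_size=PAGE_SIZE):
    # page is 1-based: page 1 covers items 1..12, page 2 covers 13..24, etc.
    start = (page - 1) * page_size
    # Slicing a pymongo cursor applies skip/limit on the server side,
    # so only one page of documents crosses the wire.
    return list(coll.find().sort("views", DESCENDING)[start:start + page_size])

# Illustrative usage (connection details are assumptions):
# coll = MongoClient()["manga"]["contents"]
# page_three = find_page(coll, 3)  # items 25..36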
kyu999 / background_color
Created November 20, 2014 18:20
background
#e9e6de
kyu999 / backup
Created November 20, 2014 04:06
backup manga mining
/*
// Backup: rebuild the weekly-popular list from cache entries keyed by rank.
// The parameter is renamed to `amount` to match its use in the range, and the
// map variable to `n` so it no longer shadows the parameter.
def constructContents(amount: Int): List[Content] =
  (1 to amount).map { n =>
    val id       = cache.get("week_pop_id_" + n)
    val title    = cache.get("week_pop_title_" + n)
    val views    = cache.get("week_pop_views_" + n)
    val category = cache.get("week_pop_category_" + n)
    val date     = cache.get("week_pop_date_" + n)
    // `content` comes from an enclosing scope; push each storage link onto a cache list.
    val storages = content.storages.map(link => cache.rpush("week_pop_storages_" + n, link))
    // ... (truncated in the original backup before the Content value is built)
  }.toList
*/
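The crawler side of the same cache layout can be sketched in Python with redis-py; the function name cache_weekly_popular and the dict shape of each content are assumptions, only the week_pop_* key names come from the backup above:

import redis

cache = redis.Redis()  # connection details are an assumption

def cache_weekly_popular(contents):
    # One set of keys per rank n, mirroring the week_pop_* reads above.
    for n, content in enumerate(contents, start=1):
        cache.set("week_pop_id_" + str(n), content["id"])
        cache.set("week_pop_title_" + str(n), content["title"])
        cache.set("week_pop_views_" + str(n), content["views"])
        cache.set("week_pop_category_" + str(n), content["category"])
        cache.set("week_pop_date_" + str(n), str(content["date"]))
        for link in content["storages"]:
            cache.rpush("week_pop_storages_" + str(n), link)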
kyu999 / backup
Created November 20, 2014 03:32
"""
def swap_iterate(self, cursor):
if(not cursor.alive):
return "Finish"
data = cursor.next()
swapped_date = self.swap_date(data.get("date"))
data["date"] = swapped_date
self.coll.save(data)
kyu999 / mangarian_design
Created November 17, 2014 12:08
mangarian design
The background should feature a big splash image from one of the works; the container should use calm, white-based colors.
kyu999 / pymongo
Created November 17, 2014 11:04
found = coll.find()[0:x]  # slice the cursor: server-side skip/limit down to the first x documents
found.alive   # True if the cursor could potentially return more data, else False
found.next()  # get the next document; raises StopIteration when no more data is in the coll
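Because next() raises once the data runs out, iterating the cursor with a for loop is the safer idiom; a short sketch (database and collection names are assumptions):

from pymongo import MongoClient

coll = MongoClient()["manga"]["contents"]  # assumed names

# The for loop stops cleanly when the cursor is exhausted, with no alive/next() bookkeeping:
for doc in coll.find()[0:12]:
    print(doc.get("title"))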
kyu999 / save_content
Created November 17, 2014 05:00
save content
'''
for each_content in whole_dom.cssselect("div.base"):
    # If the div contains extra h1 elements, it is very likely not a target content block.
    if len(each_content.cssselect("h1")) > 1:
        continue
    for each in each_content.iter():
        self.get_title(each)
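For context, whole_dom here would be an lxml document tree; a minimal sketch of building one (the URL is a placeholder, and cssselect() needs the cssselect package installed):

import lxml.html
import requests

html = requests.get("http://example.com/some-manga-page").text  # placeholder URL
whole_dom = lxml.html.fromstring(html)

for block in whole_dom.cssselect("div.base"):
    print(block.tag, len(block.cssselect("h1")))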
kyu999 / scrapy
Created November 16, 2014 13:48
scrapy genspider example example.com
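That command writes a spider skeleton roughly like the following (the exact template varies by Scrapy version):

import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ["http://example.com/"]

    def parse(self, response):
        # Extraction logic goes here.
        pass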
kyu999 / agenda
Created November 15, 2014 08:50
mangarina agenda
Development steps:
 Work on excnn only first => then apply the same steps to the other sites
 Scrape each manga page => store the extracted data => design the site-wide crawling step => implement it

Tech stack:
 Python, MySQL (Cloud SQL served by Google), MongoDB (MongoLab), Scrapy, Scala, Play, Slick

The crawler is basically built in Python, and the web server with Play.
MongoDB is there to record every DOM fetched while crawling.
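A minimal sketch of that DOM-recording step with pymongo; the database/collection names and document shape are assumptions, not from the agenda:

from datetime import datetime, timezone
from pymongo import MongoClient

raw_pages = MongoClient()["crawler"]["raw_dom"]  # assumed names

def archive_dom(url, html):
    # Store one fetched page so it can be re-parsed later without re-crawling.
    raw_pages.insert_one({
        "url": url,
        "html": html,
        "fetched_at": datetime.now(timezone.utc),
    })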