// `page` is 1-based; each finder returns at most `amount` items for that page.
def findWeeklyContents(page: Int = 1, amount: Int = 8): List[LightContent] =
  toLightContents(coll.find("date" $gt get_last_week()).drop((page - 1) * amount).take(amount))

def findMonthlyContents(page: Int = 1, amount: Int = 8): List[LightContent] =
  toLightContents(coll.find("date" $gt get_last_month()).drop((page - 1) * amount).take(amount))

// Most-viewed first.
def findWeeklyPopularContents(page: Int = 1, amount: Int = 8): List[LightContent] =
  toLightContents(coll.find("date" $gt get_last_week()).sort(DBObject("views" -> -1)).drop((page - 1) * amount).take(amount))
Problem: when each page pulls 12 items from a category that holds a lot of data, issuing a find query on every page view is too slow, and the service will fall over quickly once traffic grows.
Approach:
1. If the iterator can extract even 1000 items quickly, get the iterator once and pass it along to the next page.
=> What if the user visits the first page and then jumps to the last page? => Is this approach out?
2. Slice by page index => for page n, the slice runs from (n - 1) * 12 to n * 12 (end exclusive).
page : 1 -> (1 - 1) * 12 to 1 * 12 = 0 to 12
page : 2 -> (2 - 1) * 12 to 2 * 12 = 12 to 24
page : 3 -> (3 - 1) * 12 to 3 * 12 = 24 to 36
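The slice arithmetic in approach 2 can be sketched in Python. This is only a minimal sketch, assuming 1-based page numbers and half-open ranges; `page_bounds` and `PAGE_SIZE` are hypothetical names, and a plain list stands in for the database cursor.

```python
PAGE_SIZE = 12  # items per page, as in the notes above


def page_bounds(page, page_size=PAGE_SIZE):
    """Return the half-open (start, end) slice for a 1-based page number."""
    if page < 1:
        raise ValueError("page must be >= 1")
    start = (page - 1) * page_size
    return start, start + page_size


# A plain list stands in for the query result here (assumption).
items = list(range(100))
start, end = page_bounds(3)
third_page = items[start:end]
```

With a pymongo cursor, the same bounds could probably be applied as `coll.find()[start:end]`, which pymongo turns into a skip and a limit on the server side.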
#e9e6de
/*
def constructContents(i: Int): List[Content]
= (1 to amount).map{ i =>
  val id = cache.get("week_pop_id_" + i)
  val title = cache.get("week_pop_title_" + i)
  val views = cache.get("week_pop_views_" + i)
  val category = cache.get("week_pop_category_" + i)
  val date = cache.get("week_pop_date_" + i)
  val storages = content.storages.map( link => cache.rpush("week_pop_storages_" + i, link))
}
"""
def swap_iterate(self, cursor):
    if not cursor.alive:
        return "Finish"
    data = cursor.next()
    swapped_date = self.swap_date(data.get("date"))
    data["date"] = swapped_date
    self.coll.save(data)
back: a big, bold image taken from some work; container: a calm color, in the white family.
found = coll.find()[0:x]
found.alive   # True if the cursor may still return more data, otherwise False
found.next()  # fetch the next document; raises StopIteration when no data is left in the coll
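One way to avoid the error on exhaustion is to request the next item with a default instead of checking `alive` first. The following is a minimal sketch, assuming only that the cursor follows Python's iterator protocol (pymongo cursors do); `drain` is a hypothetical helper, and a plain list iterator stands in for a real cursor.

```python
def drain(cursor, limit):
    """Collect up to `limit` items, stopping quietly when the cursor runs out."""
    results = []
    sentinel = object()  # unique marker, so None values in the data are still kept
    while len(results) < limit:
        item = next(cursor, sentinel)  # returns the sentinel instead of raising
        if item is sentinel:
            break
        results.append(item)
    return results


# Stand-in for coll.find() (assumption): any iterator behaves the same way.
fake_cursor = iter([{"n": 1}, {"n": 2}, {"n": 3}])
first_two = drain(fake_cursor, 2)
rest = drain(fake_cursor, 10)
```

Because the same iterator is reused, the second call picks up where the first one stopped, which mirrors how a live cursor would behave across calls.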
'''
for each_content in whole_dom.cssselect("div.base"):
    # If a div contains extra h1 elements, it is probably not a target.
    if len(each_content.cssselect("h1")) > 1:
        continue
    for each in each_content.iter():
        self.get_title(each)
scrapy genspider example example.com
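Development steps:
Start with excnn only => then follow the same steps for the other sites
Scrape each manga page => save the retrieved data => work out the crawling steps for the whole site => implement them
Technology:
python, mysql(cloud sql served by Google), mongoDB(mongoLab), scrapy, scala, play, slick
In general, the crawler is written in python and the web server is built with play.
mongodb is there to record every DOM retrieved during crawling.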