Last active August 29, 2015 14:05
kyu999/5bb18b231494c3ded7cc
Line-break rules (nearly final)
separate rules:
1. Break after a particle or an adverb.
2. Break after a symbol.
3. Break when a segment gets too long.
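The separate rules above can be sketched as a single pass over a tokenized title. This is a minimal illustrative sketch, not the gist's own implementation. It assumes each token is a (surface, pos) pair with IPAdic-style POS names (助詞 particle, 副詞 adverb, 記号 symbol), as a morphological analyzer such as MeCab might produce; the tokens are supplied by hand so the example runs without MeCab. The function name `separate` and the `max_len` threshold used for rule 3 are hypothetical.

```python
# Illustrative sketch of the "separate" rules (not the gist's own code).
# Assumption: tokens are (surface, pos) pairs with IPAdic-style POS names,
# as a morphological analyzer such as MeCab might produce; here they are
# supplied by hand so the example runs without MeCab.
from typing import List, Tuple

Token = Tuple[str, str]

BREAK_AFTER_POS = ("助詞", "副詞", "記号")  # particle, adverb, symbol
MAX_LEN = 10  # hypothetical length threshold for rule 3


def separate(tokens: List[Token], max_len: int = MAX_LEN) -> List[str]:
    """Split a token sequence into chunks following the separate rules."""
    chunks: List[str] = []
    current = ""
    for surface, pos in tokens:
        # Rule 3 (assumed reading): start a new chunk when adding this
        # token would push the current chunk past max_len characters.
        if current and len(current) + len(surface) > max_len:
            chunks.append(current)
            current = ""
        current += surface
        # Rules 1 and 2: break right after a particle, adverb, or symbol.
        if pos in BREAK_AFTER_POS:
            chunks.append(current)
            current = ""
    if current:
        chunks.append(current)
    return chunks
```

For example, `separate([("機械", "名詞"), ("学習", "名詞"), ("で", "助詞"), ("未来", "名詞"), ("を", "助詞"), ("変える", "動詞")])` returns `["機械学習で", "未来を", "変える"]`: each particle closes a chunk, and the trailing verb forms the last one.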
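regulate rules:
1. If a break falls just before a left bracket, join the pieces back together.
2. Also join across a run of consecutive symbols or consecutive particles.
3. Join before any symbol other than a right bracket.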
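The regulate rules can likewise be sketched as a pass that decides, at each break, whether to merge the neighboring chunks. Again this is an illustrative sketch under stated assumptions, not the gist's implementation: a chunk is a non-empty list of (surface, pos) tokens with IPAdic-style POS names, brackets are detected through Unicode categories with `unicodedata.category` ("Ps" for opening and "Pe" for closing punctuation), and the helper names are hypothetical. The rules are implemented literally as written, so rule 1 overlaps with rule 3 whenever a left bracket is tagged 記号.

```python
# Illustrative sketch of the "regulate" rules (not the gist's own code).
# Assumption: a chunk is a non-empty list of (surface, pos) tokens with
# IPAdic-style POS names such as "助詞" (particle) and "記号" (symbol).
import unicodedata
from typing import List, Tuple

Token = Tuple[str, str]
Chunk = List[Token]


def is_left_bracket(token: Token) -> bool:
    # Unicode category "Ps" = opening punctuation, e.g. "(" or "「".
    return unicodedata.category(token[0][0]) == "Ps"


def is_right_bracket(token: Token) -> bool:
    # Unicode category "Pe" = closing punctuation, e.g. ")" or "」".
    return unicodedata.category(token[0][-1]) == "Pe"


def should_join(prev: Chunk, nxt: Chunk) -> bool:
    """Return True if the break between prev and nxt should be removed."""
    last, first = prev[-1], nxt[0]
    # Rule 1: the break falls just before a left bracket.
    if is_left_bracket(first):
        return True
    # Rule 2: a run of symbols or a run of particles spans the break.
    if last[1] == first[1] and last[1] in ("記号", "助詞"):
        return True
    # Rule 3: the break falls before a symbol other than a right bracket.
    if first[1] == "記号" and not is_right_bracket(first):
        return True
    return False


def regulate(chunks: List[Chunk]) -> List[str]:
    """Merge chunks across breaks that the regulate rules forbid."""
    merged: List[Chunk] = []
    for chunk in chunks:
        if merged and should_join(merged[-1], chunk):
            merged[-1] = merged[-1] + chunk
        else:
            merged.append(list(chunk))
    return ["".join(surface for surface, _ in chunk) for chunk in merged]
```

For example, `regulate([[("私", "名詞"), ("の", "助詞")], [("「", "記号")], [("夢", "名詞"), ("」", "記号")]])` returns `["私の「", "夢」"]`: the break before "「" is removed by rule 1, while the break before "夢" is kept.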
# -*- coding: utf-8 -*-
import re
import unicodedata
import MeCab
import os
import title_cleaner
class SpeechTitleSplitter(object):
    """Split a given title and produce split points so it renders nicely, using a simple rule-based approach."""