@crhan
Created May 4, 2012 15:41
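A script, written out of boredom, that scrapes novel updates from Baidu Tieba. It relies on the fact that a Tieba pins each newly posted chapter, so it only has to check whether a pinned post matches the "chapter N / section N" ('第几章节') pattern and then push the result through whatever channel you like. Fetion and Gtalk are used here.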
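A minimal sketch of that title check, assuming the post titles are already available as UTF-8 strings. The helper name `is_chapter_title` and the sample titles are made up for illustration, and the pattern is a simplified form of the one the script applies with perl.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the gist): succeeds when a title
# contains "第" followed later by "章" (chapter) or "节" (section).
chapter_pattern='第.+(章|节)'

is_chapter_title() {
  [[ $1 =~ $chapter_pattern ]]
}

# Made-up sample titles: one chapter update, one ordinary pinned notice.
for title in "第一千零一章 归来" "本吧吧规"; do
  if is_chapter_title "$title"; then
    echo "update: $title"
  else
    echo "skip: $title"
  fi
done
```

Keeping the pattern in a variable avoids quoting surprises inside `[[ ... =~ ... ]]`.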

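A brute-force script I wrote last year to scrape novel updates from Baidu Tieba; it has been running for a long time. Back then I was painfully working my way through man bash, so I forced myself to use a lot of rarely seen variable expansions, such as ${!URL_*}. That gives it at least a little learning value.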
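For readers unfamiliar with these expansions, here is a small sketch of `${!URL_*}` and `${!name}`, using two entries copied from source.conf. The loop itself is illustrative only and is not taken from the script.

```bash
#!/usr/bin/env bash
# Two entries in the same shape as source.conf (values copied from it).
zhetian="遮天"
URL_zhetian="%D5%DA%CC%EC"
yongsheng="永生"
URL_yongsheng="%D3%C0%C9%FA"

# ${!URL_*} expands to the NAMES of all variables that start with URL_.
for var in ${!URL_*}; do
  key=${var#URL_}   # strip the prefix: URL_zhetian -> zhetian
  # ${!key} and ${!var} read the variables whose names are stored in key and var.
  echo "$key: ${!key} -> ${!var}"
done
```

This is why a new book only needs a `name` / `URL_name` pair: the script can discover and look up both without a hard-coded list.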

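Features

  • Supports multiple users: just write one conf file per user.
  • Thorough logging: everything goes into the log directory.
  • Sending to Gtalk was added just today; it needs Ruby and gem install blather.
  • Results can be checked directly with the -t switch.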

Usage

Test

bash $0 -t

Run

bash $0 -r

Configuration

All configuration files live in conf.d. To add a new novel, put it in source.conf.

cliofetion comes from ofetion.
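The URL_ values in source.conf look like the Tieba name percent-encoded as GBK bytes (the script also converts the fetched pages from gb2312). Assuming that reading is right, a hypothetical helper such as `urlencode_gbk` below could produce the value for a new novel. It is not part of the gist and needs `iconv` and `od`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the gist): percent-encode every byte
# of a UTF-8 string after converting it to GBK.
urlencode_gbk() {
  printf '%s' "$1" \
    | iconv -f utf-8 -t gbk \
    | od -An -tx1 \
    | tr -d ' \n' \
    | sed 's/../%&/g' \
    | tr 'a-f' 'A-F'
}

# Compare the result with the existing URL_zhetian entry in source.conf.
echo "URL_zhetian=\"$(urlencode_gbk "遮天")\""
```

ASCII bytes are encoded as well (a space becomes %20), which is longer than necessary but still a valid query string.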

require 'optparse'
options = {}
OptionParser.new do |opts|
opts.banner = "Usage: #{$0} -r JID -m content"
opts.on('-m', '--message MESSAGE','Message to be sent') do |m|
options[:message] = m
end
opts.on('-r', '--rec RECEIVER', 'Receiver') do |rec|
options[:receiver] = rec+"@gmail.com"
end
opts.on_tail("-h", '--help', 'Print this help') do
puts opts
# Stop here instead of going on to connect and send a message
exit
end
end.parse!
p options
require 'blather/client'
# Replace the two placeholders with the sending bot's account and password
setup 'BOT_ACCOUNT', 'BOT_PASSWORD', 'talk.google.com', '5222'
when_ready do
write_to_stream Blather::Stanza::Message.new( options[:receiver], options[:message])
shutdown
end
# Make sure the Fetion contact has been added first!
TEL_NUM="YOUR TEL NUMBER"
NAME="crhan"
JINGPIN_FLAG=false
# Gtalk account: only the account name, i.e. the part before @gmail.com
GTALK=""
# Names of the novels to watch; they must match the entries in source.conf
WATCH_LIST="
tunshixingkong
tianzhubian
xiuzhenshijie
zhetian
"
*/5 * * * * /absolute/path/to/xiaoshuo -r
# Entries must follow the pattern below; it should be easy to spot!
# The message that finally gets sent is "book title chapter title, URL"
tunshixingkong="吞噬星空"
URL_tunshixingkong="%CD%CC%CA%C9%D0%C7%BF%D5"
tianzhubian="天珠变"
URL_tianzhubian="%CC%EC%D6%E9%B1%E4"
zhetian="遮天"
URL_zhetian="%D5%DA%CC%EC"
xiuzhenshijie="修真世界"
URL_xiuzhenshijie="%D0%DE%D5%E6%CA%C0%BD%E7"
quanzhigaoshou="全职高手"
URL_quanzhigaoshou="%C8%AB%D6%B0%B8%DF%CA%D6"
wudongqiankun="武动乾坤"
URL_wudongqiankun="%CE%E4%B6%AF%C7%AC%C0%A4"
dazhouhuangzu="大周皇族"
URL_dazhouhuangzu="%B4%F3%D6%DC%BB%CA%D7%E5"
xianni="仙逆"
URL_xianni="%CF%C9%C4%E6"
lieguo="猎国"
URL_lieguo="%C1%D4%B9%FA"
fanrenxiuxianzhuan="凡人修仙传"
URL_fanrenxiuxianzhuan="%B7%B2%C8%CB%D0%DE%CF%C9%B4%AB"
tongtianzhilu="通天之路"
URL_tongtianzhilu="%CD%A8%CC%EC%D6%AE%C2%B7"
laozishilaihama="老子是癞蛤蟆"
URL_laozishilaihama="%C0%CF%D7%D3%CA%C7%F1%AE%B8%F2%F3%A1"
yongsheng="永生"
URL_yongsheng="%D3%C0%C9%FA"
chaojiyisheng="超级医生"
URL_chaojiyisheng="%B3%AC%BC%B6%D2%BD%C9%FA"
zaoshen="造神"
URL_zaoshen="%D4%EC%C9%F1%20%B1%B1%BE%A9%B0%AE%CA%E9"
#!/bin/bash
XIAOSHUO_CWD="$( cd -P "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
RUN_FLAG=false
TEST_FLAG=false
JINGPIN_FLAG=false
CLIOFETION_LOG=$XIAOSHUO_CWD/log/cliofetion.log
REGEX='s#.*href="/p/(\d{10})".*(第.*(章|节).*?)<.*#$2, http://wapp.baidu.com/m?kz=$1\n#'
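[ -d "$XIAOSHUO_CWD/log" ] || mkdir "$XIAOSHUO_CWD/log"
# Load source.conf from the script's own directory so that cron runs work too
source "$XIAOSHUO_CWD/source.conf"
# Process the fetched page with the rule in $REGEX
# Normally a post only has to be pinned ('置顶'); with JINGPIN_FLAG on, it must also be featured ('精品')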
parse_zhiding(){
eval curl -s 'http://tieba.baidu.com/f?kw=${!URL_QUERY}' | iconv -f gb2312 -t utf-8 -c | grep -e '>\[\?置顶\]\?<\|alt="置顶"' $( $JINGPIN_FLAG && echo '-e >\[\?精品\]\?<' ) -B2 -A2 | eval "perl -ne 'if (/第.{1,18}[章节]/) { print if $REGEX};'"
}
test_source(){
for SELECTED in ${!URL_*}; do
URL_QUERY="${SELECTED}"
name=${SELECTED#URL_}
zhiding=$(parse_zhiding)
echo ${!name}
echo "$zhiding" |sed '/^$/d'
echo
done
}
main(){
for i in ./conf.d/*.conf; do
# Reset every configuration variable, then read the next conf file
unset TMP_FILE ID_FILE LOG_FILE NAME TEL_NUM WATCH_LIST JINGPIN_FLAG DISABLE GTALK
JINGPIN_FLAG=false
DISABLE=false
source $i
$DISABLE && continue
[ -z "$TEL_NUM" ] && echo 'Cannot find $TEL_NUM' >&2 && exit 1
[ -z "$NAME" ] && echo 'Cannot find $NAME' >&2 && exit 1
[ -z "$WATCH_LIST" ] && echo 'Cannot find $WATCH_LIST' >&2 && exit 1
TMP_FILE="$NAME.tmp"
ID_FILE="log/$NAME.id"
LOG_FILE="log/$NAME.log"
[ -f $ID_FILE ] || touch $ID_FILE
{
for SELECTED in $WATCH_LIST; do
URL_QUERY="URL_${SELECTED}"
zhiding=$(parse_zhiding)
ids=$( echo "$zhiding" | perl -ne 'print if s/.*?kz=(\d{10}).*/$1/g' )
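# Use the post id to check whether this post has already been sent
# (loop variable renamed so it no longer shadows the outer conf-file loop's $i)
for id in $ids; do
if ! grep -q "$id" "$ID_FILE"; then
echo ${!SELECTED}
echo "$zhiding" | grep "$id"
echo "$(date): $id" >> $ID_FILE
fi
done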
done
} >> $TMP_FILE # Content about to be sent is stored in each user's own $TMP_FILE
if [ 0 -ne $(du $TMP_FILE | cut -f1 ) ];then
[ -f ${LOG_FILE} ] && mv ${LOG_FILE} ${LOG_FILE}.old
{
echo "-----------$(date), Start-------------"
echo "######################## Contents #########################"
cat $TMP_FILE
echo "###################### Contents Over ######################"
echo "############## Sent Contents to $TEL_NUM ###############"
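# Fetion client
/usr/local/bin/cliofetion -t $TEL_NUM -d "$(cat $TMP_FILE)" 2>&1 1>> $CLIOFETION_LOG
SEND_STATUS=$?
# If GTALK is configured, also send to the Gtalk account
# (quoted, because an unquoted empty $GTALK makes [ -n ] succeed)
if [ -n "$GTALK" ]
then
ruby bot.rb -r"$GTALK" -m"$(cat $TMP_FILE)" || SEND_STATUS=$?
fi
# If sending failed, keep the unsent content so it is sent on the next run
if [ $SEND_STATUS -eq 0 ]; then
rm $TMP_FILE
fi
echo "-----------$(date), Stop--------------"
echo
} >> $LOG_FILE
# This fiddly line only exists to put the newest log entries at the top of the file
[ -f ${LOG_FILE}.old ] && cat ${LOG_FILE}.old >> ${LOG_FILE} && rm ${LOG_FILE}.old
fi
# Remove the temp file only when it is empty, so unsent content survives until the next run
[ -s $TMP_FILE ] || rm -f $TMP_FILE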
done
}
# Do nothing unless the script is invoked directly (i.e. not sourced)
if [[ "$BASH_SOURCE" == "$0" ]]
then
cd $XIAOSHUO_CWD
while getopts rtj OPTS
do
case $OPTS in
r)RUN_FLAG=true
;;
t)TEST_FLAG=true
;;
j)JINGPIN_FLAG=true
;;
?)echo "what do you mean?"
exit 2
;;
esac
done
shift $(( $OPTIND -1 ))
$RUN_FLAG && main
$TEST_FLAG && test_source
fi