@YoshihitoAso
Created May 16, 2013 06:40
[Treasure Data][Fluentd] Notes from sending Apache's access_log to Treasure Data
Prerequisite: create a Treasure Data (TD) account.
▼ Install fluentd (td-agent) on the server to be monitored
$ curl -OL http://toolbelt.treasure-data.com/sh/install-redhat.sh
$ chmod 755 install-redhat.sh
$ ./install-redhat.sh
$ rm -f install-redhat.sh
$ service td-agent start
$ chkconfig td-agent on
▼ Change group and permissions on the log paths td-agent reads (chgrp, chmod)
$ sudo chgrp td-agent /var/log/httpd/
$ sudo chgrp td-agent /var/log/messages
$ sudo chgrp td-agent /var/log/secure
$ sudo chgrp td-agent /var/log/cron
$ sudo chmod g+rx /var/log/httpd/
$ sudo chmod g+rx /var/log/messages
$ sudo chmod g+rx /var/log/secure
$ sudo chmod g+rx /var/log/cron
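The chgrp/chmod pairs above all do the same thing per path: hand the path to the td-agent group and grant that group read and execute. A minimal Python sketch of that step, demonstrated on a temporary directory with the current gid (run it against the /var/log paths as root with the td-agent group's gid instead):

```python
import os
import stat
import tempfile

def grant_group_rx(path, gid):
    """Equivalent of "chgrp GROUP path; chmod g+rx path"."""
    os.chown(path, -1, gid)  # -1 leaves the file's owner unchanged
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IRGRP | stat.S_IXGRP)  # g+rx

# Demonstrate on a temp directory so the sketch runs without root.
demo = tempfile.mkdtemp()
grant_group_rx(demo, os.stat(demo).st_gid)
print(bool(os.stat(demo).st_mode & stat.S_IRGRP))
```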
▼ Install the Treasure Data Toolbelt on the client (a Windows PC in my case)
Get the Windows installer from:
http://toolbelt.treasure-data.com/win
Set up your account:
$ td account -f
Enter your Treasure Data credentials.
Email:
Password (typing will be hidden):
Check the API key:
$ td apikey:show
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
▼ Edit the td-agent.conf settings
# tail apache access_log
<source>
  type tail
  format apache
  path /var/log/httpd/access_log
  tag td.testdb.www_access
</source>

<match td.*.*>
  type tdlog
  apikey XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  auto_create_table
  buffer_type file
  buffer_path /var/log/td-agent/buffer/td
  use_ssl true
</match>
apikey is the key obtained above.
Create the database beforehand (e.g. td db:create testdb).
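The `format apache` setting tells fluentd how to split each access_log line into fields (host, user, method, path, code, size, referer, agent), which is why the queries later can select `v['agent']` and `v['path']`. A rough sketch of that parsing in Python, using a regex approximating the combined log format (the exact built-in pattern may differ):

```python
import re

# Approximation of the combined-log pattern behind fluentd's "format apache";
# group names mirror the field keys that end up in Treasure Data.
APACHE_RE = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<code>\d+) (?P<size>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

line = ('127.0.0.1 - - [16/May/2013:06:24:30 +0000] '
        '"GET /index.html HTTP/1.1" 200 2326 '
        '"http://example.com/" "Mozilla/5.0"')

record = APACHE_RE.match(line).groupdict()
print(record['path'], record['code'], record['agent'])
```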
▼ Restart the service
Once the configuration is done, restart the service:
$ service td-agent restart
Generate some actual access; after about five minutes the data should be stored:
$ td tables
+----------+------------+------+-------+--------+
| Database | Table | Type | Count | Schema |
+----------+------------+------+-------+--------+
| testdb | www_access | log | 175 | |
+----------+------------+------+-------+--------+
1 row in set
▼ Sample queries
Try issuing queries like the following.
○ Aggregate by user agent:
$ td query -w -d testdb "SELECT v['agent'] AS agent, COUNT(1) AS cnt FROM www_access GROUP BY v['agent'] ORDER BY cnt DESC LIMIT 3"
---- Hive log output like the following appears -----
Job 2909409 is queued.
Use 'td job:show 2909409' to show the status.
queued...
started at 2013-05-16T06:24:30Z
Hive history file=/mnt/hive/tmp/932/hive_job_log__1147327208.txt
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Defaulting to jobconf value of: 4
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201305140230_3827, Tracking URL = http://ip-10-143-152-77.ec2.internal:50030/jobdetails.jsp?jobid=job_201305140230_3827
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201305140230_3827
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 4
2013-05-16 06:24:48,177 Stage-1 map = 0%, reduce = 0%
2013-05-16 06:24:53,259 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.59 sec
2013-05-16 06:24:54,289 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.59 sec
2013-05-16 06:24:55,344 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.59 sec
2013-05-16 06:24:56,363 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.59 sec
2013-05-16 06:24:57,383 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.59 sec
2013-05-16 06:24:58,404 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.59 sec
2013-05-16 06:24:59,422 Stage-1 map = 100%, reduce = 25%, Cumulative CPU 5.09 sec
2013-05-16 06:25:00,441 Stage-1 map = 100%, reduce = 25%, Cumulative CPU 5.09 sec
2013-05-16 06:25:01,461 Stage-1 map = 100%, reduce = 25%, Cumulative CPU 5.09 sec
2013-05-16 06:25:02,480 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 7.95 sec
2013-05-16 06:25:03,500 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 7.95 sec
2013-05-16 06:25:04,519 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 7.95 sec
2013-05-16 06:25:05,528 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 7.95 sec
2013-05-16 06:25:06,538 Stage-1 map = 100%, reduce = 75%, Cumulative CPU 10.45 sec
2013-05-16 06:25:07,547 Stage-1 map = 100%, reduce = 75%, Cumulative CPU 10.45 sec
2013-05-16 06:25:08,559 Stage-1 map = 100%, reduce = 75%, Cumulative CPU 10.45 sec
2013-05-16 06:25:09,576 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 13.02 sec
2013-05-16 06:25:10,585 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 13.02 sec
2013-05-16 06:25:11,595 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 13.02 sec
2013-05-16 06:25:12,604 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 13.02 sec
MapReduce Total cumulative CPU time: 13 seconds 20 msec
Ended Job = job_201305140230_3827
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201305140230_3828, Tracking URL = http://ip-10-143-152-77.ec2.internal:50030/jobdetails.jsp?jobid=job_201305140230_3828
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201305140230_3828
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2013-05-16 06:25:20,184 Stage-2 map = 0%, reduce = 0%
2013-05-16 06:25:25,239 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 0.83 sec
2013-05-16 06:25:26,257 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 0.83 sec
2013-05-16 06:25:27,267 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 0.83 sec
2013-05-16 06:25:28,287 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 0.83 sec
2013-05-16 06:25:29,301 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 3.4 sec
2013-05-16 06:25:30,325 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 3.4 sec
2013-05-16 06:25:31,334 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 3.4 sec
MapReduce Total cumulative CPU time: 3 seconds 400 msec
Ended Job = job_201305140230_3828
finished at 2013-05-16T06:25:32Z
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 4 Cumulative CPU: 13.02 sec HDFS Read: 537 HDFS Write: 1070 SUCCESS
Job 1: Map: 1 Reduce: 1 Cumulative CPU: 3.4 sec HDFS Read: 2179 HDFS Write: 357 SUCCESS
Total MapReduce CPU Time Spent: 16 seconds 420 msec
OK
MapReduce time taken: 52.696 seconds
Time taken: 52.916 seconds
Status : success
Result :
+--------------------------------------------------------------------------------------------------------------+-----+
| agent | cnt |
+--------------------------------------------------------------------------------------------------------------+-----+
| Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31 | 167 |
| check_http/v1.4.15 (nagios-plugins 1.4.15) | 5 |
| facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php) | 2 |
+--------------------------------------------------------------------------------------------------------------+-----+
…so Facebook's crawler has been visiting (・_・;)
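The GROUP BY / ORDER BY cnt DESC / LIMIT 3 pattern above is just a count-and-rank aggregation. A minimal in-memory sketch of the same user-agent ranking, with made-up records standing in for rows of www_access:

```python
from collections import Counter

# Made-up records; only the 'agent' field matters for this query.
records = (
    [{'agent': 'Mozilla/5.0'}] * 167
    + [{'agent': 'check_http/v1.4.15'}] * 5
    + [{'agent': 'facebookexternalhit/1.1'}] * 2
)

# SELECT v['agent'], COUNT(1) AS cnt ... GROUP BY ... ORDER BY cnt DESC LIMIT 3
top3 = Counter(r['agent'] for r in records).most_common(3)
for agent, cnt in top3:
    print(agent, cnt)
```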
○ Top paths:
$ td query -w -d testdb \
"SELECT v['path'] AS path, COUNT(1) AS cnt \
FROM www_access \
GROUP BY v['path'] ORDER BY cnt DESC LIMIT 3"
○ Access ranking for a given day:
$ td query -w -d testdb \
"SELECT v['referer'] AS referer, COUNT(1) AS cnt \
FROM www_access \
WHERE \
TD_TIME_RANGE(time, '2013-05-16', '2013-05-17', 'PDT') \
GROUP BY v['referer'] ORDER BY cnt DESC LIMIT 3"
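TD_TIME_RANGE filters on the record's unix `time` column, with the start bound inclusive and the end bound exclusive. A sketch of the equivalent check for one day, assuming a fixed UTC-7 offset for 'PDT' (ignoring DST transitions):

```python
from datetime import datetime, timedelta, timezone

PDT = timezone(timedelta(hours=-7))  # assumption: fixed UTC-7, no DST handling
start = datetime(2013, 5, 16, tzinfo=PDT).timestamp()
end = datetime(2013, 5, 17, tzinfo=PDT).timestamp()

def in_day(t):
    # Mirrors TD_TIME_RANGE(time, start, end, tz): start inclusive, end exclusive
    return start <= t < end

print(in_day(start), in_day(end))
```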
For other samples, see the official docs:
http://docs.treasure-data.com/articles/analyzing-apache-logs