[Treasure Data][Fluentd] Notes from sending Apache access_log to Treasure Data
Prerequisite: create a Treasure Data (TD) account.
▼ Install fluentd (td-agent) on the server to be monitored
$ curl -OL http://toolbelt.treasure-data.com/sh/install-redhat.sh
$ chmod 755 install-redhat.sh
$ ./install-redhat.sh
$ rm -f install-redhat.sh
$ service td-agent start
$ chkconfig td-agent on
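To confirm the agent actually came up, the init script's status subcommand should work too (output varies by distribution):
$ service td-agent status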
▼ Change permissions on the directories and files td-agent reads (chmod, chgrp)
$ sudo chgrp td-agent /var/log/httpd/
$ sudo chgrp td-agent /var/log/messages
$ sudo chgrp td-agent /var/log/secure
$ sudo chgrp td-agent /var/log/cron
$ sudo chmod g+rx /var/log/httpd/
$ sudo chmod g+rx /var/log/messages
$ sudo chmod g+rx /var/log/secure
$ sudo chmod g+rx /var/log/cron
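A quick way to verify the group changes took effect (the exact mode bits shown will vary by system):
$ ls -ld /var/log/httpd/
$ ls -l /var/log/messages /var/log/secure /var/log/cron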
▼ Install the Treasure Data Toolbelt on the client (a Windows PC, in my case)
The Windows installer is available at:
http://toolbelt.treasure-data.com/win
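Once installed, a reasonable sanity check is confirming the td command is on the PATH (this should print the CLI's version):
$ td --version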
Set up the account:
$ td account -f
Enter your Treasure Data credentials.
Email:
Password (typing will be hidden):
Check the API key:
$ td apikey:show
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
▼ Edit the td-agent.conf settings (typically /etc/td-agent/td-agent.conf)
# tail apache access_log
<source>
  type tail
  format apache
  path /var/log/httpd/access_log
  tag td.testdb.www_access
</source>

<match td.*.*>
  type tdlog
  apikey XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  auto_create_table
  buffer_type file
  buffer_path /var/log/td-agent/buffer/td
  use_ssl true
</match>
The apikey is the one obtained above.
Create the database beforehand (e.g. td db:create testdb).
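For example, creating the database and confirming it exists (db:create and db:list are standard td subcommands):
$ td db:create testdb
$ td db:list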
▼ Restart the service
Once the configuration is done, restart the service:
$ service td-agent restart
Generate some real traffic; after about five minutes the data should be stored:
$ td tables
+----------+------------+------+-------+--------+
| Database | Table      | Type | Count | Schema |
+----------+------------+------+-------+--------+
| testdb   | www_access | log  | 175   |        |
+----------+------------+------+-------+--------+
1 row in set
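If the table stays empty, the agent's own log is the first place to look (the default log location for td-agent installs):
$ tail -f /var/log/td-agent/td-agent.log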
▼ Sample queries
Let's try issuing a few queries like the following.
○ Aggregation by user agent
$ td query -w -d testdb "SELECT v['agent'] AS agent, COUNT(1) AS cnt FROM www_access GROUP BY v['agent'] ORDER BY cnt DESC LIMIT 3"
---- Hive log output like the following appears ----
Job 2909409 is queued.
Use 'td job:show 2909409' to show the status.
queued...
started at 2013-05-16T06:24:30Z
Hive history file=/mnt/hive/tmp/932/hive_job_log__1147327208.txt
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Defaulting to jobconf value of: 4
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201305140230_3827, Tracking URL = http://ip-10-143-152-77.ec2.internal:50030/jobdetails.jsp?jobid=job_201305140230_3827
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201305140230_3827
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 4
2013-05-16 06:24:48,177 Stage-1 map = 0%, reduce = 0%
2013-05-16 06:24:53,259 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.59 sec
2013-05-16 06:24:54,289 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.59 sec
2013-05-16 06:24:55,344 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.59 sec
2013-05-16 06:24:56,363 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.59 sec
2013-05-16 06:24:57,383 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.59 sec
2013-05-16 06:24:58,404 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.59 sec
2013-05-16 06:24:59,422 Stage-1 map = 100%, reduce = 25%, Cumulative CPU 5.09 sec
2013-05-16 06:25:00,441 Stage-1 map = 100%, reduce = 25%, Cumulative CPU 5.09 sec
2013-05-16 06:25:01,461 Stage-1 map = 100%, reduce = 25%, Cumulative CPU 5.09 sec
2013-05-16 06:25:02,480 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 7.95 sec
2013-05-16 06:25:03,500 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 7.95 sec
2013-05-16 06:25:04,519 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 7.95 sec
2013-05-16 06:25:05,528 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 7.95 sec
2013-05-16 06:25:06,538 Stage-1 map = 100%, reduce = 75%, Cumulative CPU 10.45 sec
2013-05-16 06:25:07,547 Stage-1 map = 100%, reduce = 75%, Cumulative CPU 10.45 sec
2013-05-16 06:25:08,559 Stage-1 map = 100%, reduce = 75%, Cumulative CPU 10.45 sec
2013-05-16 06:25:09,576 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 13.02 sec
2013-05-16 06:25:10,585 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 13.02 sec
2013-05-16 06:25:11,595 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 13.02 sec
2013-05-16 06:25:12,604 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 13.02 sec
MapReduce Total cumulative CPU time: 13 seconds 20 msec
Ended Job = job_201305140230_3827
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201305140230_3828, Tracking URL = http://ip-10-143-152-77.ec2.internal:50030/jobdetails.jsp?jobid=job_201305140230_3828
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201305140230_3828
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2013-05-16 06:25:20,184 Stage-2 map = 0%, reduce = 0%
2013-05-16 06:25:25,239 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 0.83 sec
2013-05-16 06:25:26,257 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 0.83 sec
2013-05-16 06:25:27,267 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 0.83 sec
2013-05-16 06:25:28,287 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 0.83 sec
2013-05-16 06:25:29,301 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 3.4 sec
2013-05-16 06:25:30,325 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 3.4 sec
2013-05-16 06:25:31,334 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 3.4 sec
MapReduce Total cumulative CPU time: 3 seconds 400 msec
Ended Job = job_201305140230_3828
finished at 2013-05-16T06:25:32Z
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 4  Cumulative CPU: 13.02 sec  HDFS Read: 537  HDFS Write: 1070  SUCCESS
Job 1: Map: 1  Reduce: 1  Cumulative CPU: 3.4 sec  HDFS Read: 2179  HDFS Write: 357  SUCCESS
Total MapReduce CPU Time Spent: 16 seconds 420 msec
OK
MapReduce time taken: 52.696 seconds
Time taken: 52.916 seconds
Status : success
Result :
+--------------------------------------------------------------------------------------------------------------+-----+
| agent                                                                                                          | cnt |
+--------------------------------------------------------------------------------------------------------------+-----+
| Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31  | 167 |
| check_http/v1.4.15 (nagios-plugins 1.4.15)                                                                    | 5   |
| facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)                                     | 2   |
+--------------------------------------------------------------------------------------------------------------+-----+
…Facebook's crawler has been visiting (・_・;)
○ Top paths
$ td query -w -d testdb \
  "SELECT v['path'] AS path, COUNT(1) AS cnt \
   FROM www_access \
   GROUP BY v['path'] ORDER BY cnt DESC LIMIT 3"
○ Access ranking (top referers) for a given day
$ td query -w -d testdb \
  "SELECT v['referer'] AS referer, COUNT(1) AS cnt \
   FROM www_access \
   WHERE \
     TD_TIME_RANGE(time, '2013-05-16', '2013-05-17', 'PDT') \
   GROUP BY v['referer'] ORDER BY cnt DESC LIMIT 3"
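○ One more sketch along the same lines: a breakdown of requests by HTTP status code (assuming the standard fields the apache format parser extracts, which include the status as v['code']).
$ td query -w -d testdb \
  "SELECT v['code'] AS code, COUNT(1) AS cnt \
   FROM www_access \
   GROUP BY v['code'] ORDER BY cnt DESC"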
For other samples, see the official docs:
http://docs.treasure-data.com/articles/analyzing-apache-logs