| #!/usr/bin/env python | |
| import sys | |
| # input comes from STDIN (standard input) | |
| for line in sys.stdin: | |
|     # clean the line and split it into words | |
|     linechars = [c for c in line.lower() if c.isalpha() or c == ' '] | |
|     words = ''.join(linechars).strip().split() | |
|     # emit the key-value pairs |
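The mapper above is cut off before its emit step. A minimal sketch of how such a mapper could be completed, assuming the common Hadoop Streaming convention of writing tab-separated `word<TAB>1` records to standard output. The output format and the names `clean_words`, `map_lines`, and `main` are assumptions for illustration, not taken from the original script:

```python
import sys


def clean_words(line):
    """Lowercase the line, keep only letters and spaces, then split into words."""
    linechars = [c for c in line.lower() if c.isalpha() or c == ' ']
    return ''.join(linechars).strip().split()


def map_lines(lines):
    """Yield one 'word<TAB>1' record per word (assumed key-value format)."""
    for line in lines:
        for word in clean_words(line):
            yield '%s\t%d' % (word, 1)


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read lines from stdin and write one key-value record per line to stdout."""
    for record in map_lines(stdin):
        stdout.write(record + '\n')
```

A streaming script built this way would end with a call to `main()`. Keeping the cleaning and emitting steps in separate functions makes them easy to check without a Hadoop cluster.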
| class ScrubFunction extends BaseOperation implements Function | |
| { | |
| public ScrubFunction( Fields fieldDeclaration ) | |
| { | |
| super( 1, fieldDeclaration ); | |
| } | |
| public void operate( FlowProcess flowProcess, FunctionCall functionCall ) | |
| { | |
| TupleEntry argument = functionCall.getArguments(); |
| #!/bin/sh | |
| #check if the directory exists on hdfs | |
| $HADOOP_HOME/bin/hadoop fs -ls wordcount-input | |
| if [ $? -ne 0 ] | |
| then $HADOOP_HOME/bin/hadoop fs -mkdir wordcount-input/ | |
| fi | |
| #check if the lorem.txt exists on hdfs | |
| $HADOOP_HOME/bin/hadoop fs -ls wordcount-input/lorem.txt |
| A = load 'wordcount-input/lorem.txt' as (line:chararray); | |
| B = foreach A generate FLATTEN(TOKENIZE(line)) as word; | |
| C = foreach B generate LOWER(REPLACE(word,'\\W+','')) as word; | |
| D = group C by word; | |
| E = foreach D generate group, COUNT(C); | |
| store E into 'wordcount-pig-output'; |
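For readers less familiar with Pig Latin, the dataflow of the script above can be traced step by step in plain Python. This is only an in-memory sketch of the same steps (tokenize, strip non-word characters, lowercase, group, count), not a description of how Pig executes them. The function name `pig_style_wordcount` is hypothetical, and splitting on whitespace is a simplification of Pig's `TOKENIZE`, which also splits on some punctuation:

```python
import re
from collections import Counter


def pig_style_wordcount(lines):
    """Count cleaned, lowercased tokens, mirroring relations A to E of the script."""
    counts = Counter()
    for line in lines:                       # A: one record per input line
        for token in line.split():           # B: FLATTEN(TOKENIZE(line)), simplified
            # C: LOWER(REPLACE(word, '\\W+', ''))
            word = re.sub(r'\W+', '', token).lower()
            counts[word] += 1                # D and E: group by word, then COUNT
    return dict(counts)
```

As in the script, a token made up entirely of non-word characters collapses to the empty string and is counted under that key. Filtering such tokens out would be a small extension to either version.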
| class WordCount(args : Args) extends Job(args) { | |
| TextLine(args("input")) | |
| .read | |
| .flatMap('line -> 'word){ line : String => line.split("\\s")} | |
| .groupBy('word){group => group.size} | |
| .write(Tsv(args("output"))) | |
| } |
| class AnalyticsActor extends Actor { | |
| def actorRefFactory = context | |
| val dataActor = actorRefFactory.actorOf(Props[NoSqlActor], "cassandra-client") | |
| val statActor = actorRefFactory.actorOf(Props[StatActor], "statistical-engine") | |
| def receive = { | |
| case (a: String, c: String, ctx: RequestContext) => | |
| val f:Future[Result] = |
| @app.route('/word/<keyword>') | |
| def fetch_word(keyword): | |
|     db = get_cassandra() | |
|     pages = [] | |
|     results = db.fetchWordResults(keyword) | |
|     for hit in results: | |
|         pages.append(db.fetchPageDetails(hit["url"])) | |
|     return Response(json.dumps(pages), status=200, |
| var all = { | |
| //screen | |
| 'screen.width' : screen.width, | |
| 'screen.height' : screen.height, | |
| 'screen.availWidth' : screen.availWidth, | |
| 'screen.availHeight' : screen.availHeight, | |
| 'screen.colorDepth' : screen.colorDepth, | |
| 'screen.pixelDepth' : screen.pixelDepth, | |
| //location | |
| 'location.href' : location.href, |
| $> cat data.txt | grep "streaming is awesome" > results.txt |
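The shell pipeline above handles its input one line at a time, which is what makes it a streaming example. A rough Python counterpart using a generator, so that only one line is held in memory at once. The helper names `grep_stream` and `grep_file` are hypothetical, and the match is a plain substring test, which is a simplification of grep's regular-expression matching:

```python
def grep_stream(lines, pattern):
    """Lazily yield the lines that contain pattern as a plain substring."""
    for line in lines:
        if pattern in line:
            yield line


def grep_file(src_path, dst_path, pattern):
    """Stream src_path into dst_path, keeping only the matching lines."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        # File objects iterate line by line, so the input is never fully loaded.
        dst.writelines(grep_stream(src, pattern))
```

Under these assumptions, `grep_file('data.txt', 'results.txt', pattern)` would play the role of the command line, with `pattern` set to the quoted search string.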
| from httpMethods import * | |
| # Create the graph (profiling tags) | |
| # get (as a http client) every 10 seconds json and emit it on | |
| post('/api/actors', | |
| { | |
| "type":"httpclient", | |
| "trigger": null, # can also be omitted altogether | |
| "collect":null, # can also be omitted altogether |