Created
June 20, 2016 12:56
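# Assumes `sc` is an existing SparkContext (for example, the one provided by the pyspark shell).
# Read the file at the given URL as Unicode text.
textFile = sc.textFile("file_url", use_unicode=True)
# Encode each line as UTF-8, then count lines by their first tab-separated field.
# Note: this targets Python 2; on Python 3, str() of a bytes object yields a "b'...'" repr, so drop the encode/str wrapper there.
counts = textFile.flatMap(lambda line: str(line.encode('utf-8')).split("\n"))\
    .map(lambda line: (line.split("\t")[0], 1))\
    .reduceByKey(lambda a, b: a + b)
# Save the result to the "result" directory on HDFS.
counts.saveAsTextFile("result")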
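
For checking the logic without a cluster, the counting step can be reproduced in plain Python. This is only a sketch of what the map/reduceByKey pair computes, not part of the original snippet: the helper name `count_first_fields` and the sample lines are hypothetical, and it assumes the input is already split into lines, as `sc.textFile` does.

```python
from collections import Counter


def count_first_fields(lines):
    """Count how often each first tab-separated field occurs.

    Mirrors map(line -> (first_field, 1)) followed by reduceByKey(a + b).
    """
    return Counter(line.split("\t")[0] for line in lines)


# Hypothetical sample data: one record per line, fields separated by tabs.
sample = ["apple\t1", "banana\t2", "apple\t3"]
print(sorted(count_first_fields(sample).items()))
# → [('apple', 2), ('banana', 1)]
```

A line with no tab is counted under the whole line, since `split("\t")[0]` returns the full string in that case.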
Thank you. It works!