Yu Ishikawa yu-iskw

@yu-iskw
yu-iskw / gist:7a663dbea295ee767849
Created June 20, 2015 00:53
The result of `lint-r`: 2015-06-19 17:50
inst/profile/shell.R:29:3: style: Variable and function names should be all lowercase.
  sqlContext <- SparkR::sparkRSQL.init(sc)
  ^~~~~~~~~~
inst/profile/shell.R:30:24: style: Variable and function names should be all lowercase.
  assign("sqlContext", sqlContext, envir=.GlobalEnv)
                       ^~~~~~~~~~
inst/profile/shell.R:32:1: style: lines should not be more than 80 characters.
  cat("\n Spark context is available as sc, SQL context is available as sqlContext\n")
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
yu-iskw / gist:12e92c2d718ca41dea90
Last active August 29, 2015 14:23
How are the int and long int types treated between python2.6/python3.4 and Java?

I surveyed how the int and long int types of python2.6/python3.4 are treated in Java. I created some simple methods that handle int and long int in Python and Java, and then ran unit tests against them.
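
As a rough illustration of why the distinction matters: Java's `int` is a 32-bit signed value and its `long` a 64-bit signed value, whereas Python 3 has a single arbitrary-precision `int` (Python 2 splits this into `int` and `long`). The sketch below is not the code from this gist; `java_integer_type` is a hypothetical helper, written only to show which Java type a given Python integer would need.

```python
# Java's fixed-width signed integer ranges.
JAVA_INT_MIN, JAVA_INT_MAX = -2**31, 2**31 - 1
JAVA_LONG_MIN, JAVA_LONG_MAX = -2**63, 2**63 - 1


def java_integer_type(value):
    """Return the narrowest Java integer type that can hold `value`.

    Hypothetical helper for illustration; it is not part of the gist.
    """
    if JAVA_INT_MIN <= value <= JAVA_INT_MAX:
        return "int"
    if JAVA_LONG_MIN <= value <= JAVA_LONG_MAX:
        return "long"
    # Wider than any Java primitive; java.math.BigInteger would be needed.
    return "BigInteger"


if __name__ == "__main__":
    for v in (1, 2**31 - 1, 2**31, 2**63 - 1, 2**63):
        print(v, "->", java_integer_type(v))
```

The boundary values in the loop are the interesting cases: the first integer that no longer fits a Java `int`, and the first that no longer fits a Java `long`.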

Diff

yu-iskw / gist:f086c889097c615e1f07
Created June 22, 2015 21:29
Command to run the SparkR unit tests in Apache Spark
./R/install-dev.sh && ./R/run-tests.sh
yu-iskw / gist:f4c20dd78ab274cb99ec
Created June 22, 2015 21:55
The result of `lint-r` in Spark at b1f3a489efc6f4f9d172344c3345b9b38ae235e0
inst/tests/test_binary_function.R:33:1: style: Trailing whitespace is superfluous.
  
^~
inst/tests/test_binary_function.R:43:6: style: Put spaces around all infix operators.
  rdd<- map(text.rdd, function(x) {x})
    ~^
inst/tests/test_binary_function.R:55:57: style: Trailing whitespace is superfluous.
  cogroup.rdd <- cogroup(rdd1, rdd2, numPartitions = 2L) 
                                                        ^
yu-iskw / gist:0019b37a2c1167f33986
Created June 23, 2015 00:00
The result of `lintr` after removing the trailing whitespace for SPARK-8548

Grep

> grep 'style: Trailing whitespace' ./dev/lint-r-report.log  | wc -l
0
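
The `grep | wc -l` check above counts a single message type. A minimal Python sketch of the same idea, tallying every message in a report, could look like the following. It assumes the report lines follow the `path:line:col: kind: message` layout seen in the lint output on this page; `count_lint_messages` and `LINT_LINE` are hypothetical names, not part of `lintr` or Spark.

```python
import re
from collections import Counter

# Matches report lines such as
#   inst/tests/test_binary_function.R:33:1: style: Trailing whitespace is superfluous.
# Source excerpts and caret markers in the report do not match and are skipped.
LINT_LINE = re.compile(
    r"^(?P<path>[^:\s]+):(?P<line>\d+):(?P<col>\d+): (?P<kind>\w+): (?P<message>.+)$"
)


def count_lint_messages(lines):
    """Count how often each lint message occurs in an iterable of report lines."""
    counts = Counter()
    for line in lines:
        match = LINT_LINE.match(line.rstrip("\n"))
        if match:
            counts[match.group("message")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "inst/tests/test_binary_function.R:33:1: style: Trailing whitespace is superfluous.",
        "  ",
        "^~",
        "inst/tests/test_binary_function.R:43:6: style: Put spaces around all infix operators.",
        "  rdd<- map(text.rdd, function(x) {x})",
        "    ~^",
        "inst/tests/test_binary_function.R:55:57: style: Trailing whitespace is superfluous.",
    ]
    for message, n in count_lint_messages(sample).most_common():
        print(n, message)
```

To run it against a real report, pass an open file object, for example `count_lint_messages(open("./dev/lint-r-report.log"))`. A message that has been fully cleaned up simply does not appear in the result, which corresponds to the `0` printed by the grep.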

The full result

yu-iskw / gist:4d7ede75475ba9dc6f9a
Last active August 29, 2015 14:24
Removed HierarchicalClustering in Python

Scala Code

  /**
   * Java stub for Python mllib HierarchicalClustering.run()
   */
  def trainHierarchicalClusteringModel(
    data: JavaRDD[Vector],
    k: Int,
    maxIterations: Int,
yu-iskw / gist:d3f414f2c18b2abc5766
Created July 6, 2015 00:37
The result of `./dev/lint-r` at d9838196ff48faeac19756852a7f695129c08047
inst/tests/test_binary_function.R:43:6: style: Put spaces around all infix operators.
  rdd<- map(text.rdd, function(x) {x})
    ~^
inst/tests/test_binary_function.R:79:12: style: Use <-, not =, for assignment.
  mockFile = c("Spark is pretty.", "Spark is awesome.")
           ^
inst/tests/test_binaryFile.R:23:10: style: Use <-, not =, for assignment.
mockFile = c("Spark is pretty.", "Spark is awesome.")
         ^
yu-iskw / gist:822e84c08e0c26199ca4
Last active August 29, 2015 14:24
scipy.cluster.hierarchy.dendrogram
import os
import sys
from numpy import array
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram
merge_list = [
[0.0, 1.0, 0.866, 2],
yu-iskw / gist:2f1329ed5b0922460e80
Last active August 29, 2015 14:24
A command to launch a Spark cluster on EC2
_REGION='ap-northeast-1'
_ZONE='ap-northeast-1b'
_VERSION='1.4.0'
_MASTER_INSTANCE_TYPE='r3.large'
_SLAVE_INSTANCE_TYPE='r3.8xlarge'
_SLAVES=5
_PRICE=1.0
_CLUSTER_NAME="spark-cluster-v${_VERSION}-${_SLAVE_INSTANCE_TYPE}x${_SLAVES}"