Kevin Mader (kmader)

  • Zurich, Switzerland
@kmader
kmader / pom.xml
Created October 8, 2015 15:40
Compile spark-csv using Maven instead of SBT (which throws a few annoying error messages) by using this POM
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>net.imagej</groupId>
    <artifactId>pom-imagej</artifactId>
    <version>5.0</version>
    <relativePath />
  </parent>
  <groupId>fourquant</groupId>
@kmader
kmader / README.md
Last active January 11, 2016 18:34
Create markdown from nested latex files by following input and include commands using a simple python script

About

The script supplements pandoc's LaTeX-to-Markdown conversion by following simple commands like \input and \include, since these are common in larger documents, which are exactly the documents that benefit most from pandoc's automatic conversion.

Example Command

To convert cv-stylish.tex to cv.md without intermediate files:

python latexExpander.py cv-stylish.tex | pandoc --from=latex -o cv.md
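The script itself is not shown in this preview; as an illustration of the idea only, here is a minimal sketch in Scala (hypothetical, and not the gist's actual latexExpander.py) that recursively inlines \input and \include commands:

import scala.io.Source
import java.nio.file.Paths
import scala.util.matching.Regex

// Recursively inline \input{...} and \include{...} so that pandoc
// sees a single flat LaTeX document.
val inputCmd = raw"""\\(?:input|include)\{([^}]+)\}""".r

def expand(path: String): String = {
  val dir = Option(Paths.get(path).getParent).map(_.toString).getOrElse(".")
  val text = Source.fromFile(path).mkString
  inputCmd.replaceAllIn(text, m => {
    val child = m.group(1)
    val file = if (child.endsWith(".tex")) child else child + ".tex"
    Regex.quoteReplacement(expand(s"$dir/$file"))
  })
}

println(expand("cv-stylish.tex"))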
@kmader
kmader / CellWorkflow.zip
Last active August 29, 2015 14:20 — forked from kmader/batchimage.condor
The basic condor setup for a job processing a simple cell image using KNIME
@kmader
kmader / issueFile.md
Created April 15, 2015 13:33
If you let toDF and the implicit conversion in Spark SQL convert a case class with a tuple as one of its fields, you cannot access that field in the %sql environment. The test example I have uses a case class structured as follows:
case class ShapeInformation2D(label: (Long, Double), comX: Double, comY: Double, extentsX: Double, extentsY: Double, area: Long)

which in SparkSQL is represented as

org.apache.spark.sql.DataFrame = [label: struct<_1:bigint,_2:double>, comX: double, comY: double, extentsX: double, extentsY: double, area: bigint]
%sql
select comX, comY, label._1 from mapdata
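One workaround (a sketch; it assumes the original RDD of ShapeInformation2D is called shapes, and FlatShape is a hypothetical helper class) is to unpack the tuple into ordinary columns before calling toDF:

// hypothetical flattened version of ShapeInformation2D: the tuple field
// becomes two plain columns that %sql can address directly
case class FlatShape(labelId: Long, labelValue: Double,
                     comX: Double, comY: Double,
                     extentsX: Double, extentsY: Double, area: Long)

val flatData = shapes.map(s =>
    FlatShape(s.label._1, s.label._2, s.comX, s.comY,
              s.extentsX, s.extentsY, s.area)).toDF()
flatData.registerTempTable("mapdata") // select labelId from mapdata now works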
@kmader
kmader / README.md
Last active October 31, 2023 14:21
Beating Serialization in Spark

Serialization

As all objects must be Serializable to be used as part of RDD operations in Spark, it can be difficult to work with libraries that do not implement this interface.

Java Solutions

Simple Classes

For simple classes, it is easiest to make a wrapper interface that extends Serializable. This means that even though UnserializableObject cannot be serialized, we can pass around the following object without any issue:

public interface UnserializableWrapper extends Serializable {
  public UnserializableObject create(String parm1, String parm2);
}
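An instance can then be created where it is needed; a sketch in Scala (the same anonymous-class trick works from Java, and process here is a hypothetical method on UnserializableObject):

// the anonymous instance is Serializable, so Spark can ship it to the
// executors; the unserializable object is only built inside the task
val wrapper = new UnserializableWrapper {
  override def create(parm1: String, parm2: String) =
    new UnserializableObject(parm1, parm2)
}
rdd.mapPartitions { rows =>
  val obj = wrapper.create("host", "port") // created on the executor
  rows.map(row => obj.process(row))        // process is hypothetical
}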
@kmader
kmader / Example.md
Last active August 29, 2015 14:15
Reactive Beamline Programming

Futures and Promises

The Scala standard documentation has a good overview of the idea of futures and promises (http://docs.scala-lang.org/overviews/core/futures.html), there is another nice (but very web-oriented) introduction to reactive programming here (https://gist.github.com/staltz/868e7e9bc2a7b8c1f754), and finally there is the Reactive Manifesto (http://www.reactivemanifesto.org/). The idea is that, rather than having a function return a response immediately, you deal with futures, or potential responses.
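A minimal sketch of the difference, with readDetector standing in as a hypothetical slow beamline call:

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

def readDetector(): Double = { Thread.sleep(1000); 42.0 } // hypothetical slow call

// wrapping the call in a Future returns immediately; the callback runs
// whenever the value actually arrives
val reading: Future[Double] = Future { readDetector() }
reading.foreach(v => println(s"got reading: $v"))
println("this prints without waiting for the detector")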

Standard Beamline Code

Not my invention; taken from (https://bitbucket.org/psitomcat/beamline-scripts/src/61b78044fadf34563375fc55e763cc774b482897/scan/fasttomo_edge_Aerotech.mac?at=master):

myNtries = 0
busy="MOVING"
# Note: this loop is exited via a break.
while (busy=="MOVING") {
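For contrast, a hedged sketch of the same wait in the reactive style (motorState is a hypothetical stand-in for the hardware query): the busy-wait is wrapped once, and later steps compose on the Future instead of copying the while loop into every macro.

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

def motorState(): String = // hypothetical hardware query stub
  if (scala.util.Random.nextBoolean()) "READY" else "MOVING"

// wrap the polling loop once; callers chain on the returned Future
def motorStopped(): Future[Unit] = Future {
  while (motorState() == "MOVING") Thread.sleep(100)
}

motorStopped().foreach(_ => println("motor stopped, starting scan"))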
@kmader
kmader / README.md
Last active August 29, 2015 14:14
Manually Install spark-repl_2.10 for 1.2.0 without building

spark-repl 1.2 is missing

The REPL was not included in the Maven repositories, which breaks the build for a number of other projects. There are two easy workarounds until it is included in a future update.

Install precompiled version

Download the jar file from this gist (click the "Download gist" button) and run the following command in the terminal to install it into your local Maven repository (~/.m2):

mvn install:install-file -Dfile=spark-repl_2.10-1.2.0.jar -DgroupId=org.apache.spark -DartifactId=spark-repl_2.10 -Dversion=1.2.0 -Dpackaging=jar

Build just REPL

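A sketch, assuming the standard Spark 1.2.0 source tree (the module name and flags here are untested assumptions): check out the v1.2.0 tag and build only the repl module plus its dependencies, e.g.

mvn -pl repl -am -DskipTests install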
@kmader
kmader / dup_deleter.py
Created December 18, 2014 21:02
Delete duplicate entries from a BibTeX library, keeping only the first occurrence of each key
bibstr = open('library.bib').read()  # read the whole file (readlines + '\n'.join would double the newlines)
bibents = bibstr.split('@article')   # split into individual entries
def get_key(tstr):
    # the citation key sits between the opening '{' and the first ','
    (stpos, endpos) = (tstr.find('{'), tstr.find(','))
    if (stpos >= 0) and (endpos > stpos): return tstr[stpos + 1:endpos]
    else: return tstr  # no key found: keep everything
biblist = [(get_key(x), x) for x in bibents]
outlist = []
keyalready = {}
@kmader
kmader / foamAnimation.pvsm
Last active August 29, 2015 14:08
Viewing 3D animated data using Paraview
<ParaView>
  <ServerManagerState version="3.14.1">
    <Proxy group="animation" type="AnimationScene" id="261" servers="16">
      <Property name="AnimationTime" id="261.AnimationTime" number_of_elements="1">
        <Element index="0" value="1"/>
      </Property>
      <Property name="Caching" id="261.Caching" number_of_elements="1">
        <Element index="0" value="1"/>
        <Domain name="bool" id="261.Caching.bool"/>
      </Property>
@kmader
kmader / PCADemo
Created September 17, 2014 13:25
Simple PCA Demo in Spark
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.{Matrix, Matrices}
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix, RowMatrix}
val vecList = sc.parallelize(1 to 10).map(i => Vectors.dense(0, i * 5 + 5, 0, i, 0))
val rm = new RowMatrix(vecList)
// pca: top 3 principal components, returned as a local matrix
val prinComp = rm.computePrincipalComponents(3)
// calculate the projection of each row onto the principal components
val projs: RowMatrix = rm.multiply(prinComp)
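To inspect the result, the projected rows can be collected back to the driver (fine for this tiny demo matrix):

// each printed vector is a row of the original matrix expressed in the
// 3-dimensional principal-component basis
projs.rows.collect().foreach(println)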