Batman kjsingh

  • AWS
  • Vancouver, Canada
@kjsingh
kjsingh / Test.java
Created October 12, 2023 18:48
Avro All Data Types
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericData.Array;
import org.apache.avro.generic.GenericData.EnumSymbol;
import org.apache.avro.generic.GenericData.Fixed;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;
kjsingh / Split.scala
Created August 6, 2019 22:52
Split RDDs based on index
val rdd = sc.parallelize(1 to 100)
//rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:24
val max = 3
//max: Int = 3
// Index once and reuse it: zipWithIndex can trigger its own Spark job on a multi-partition RDD
val indexed = rdd.zipWithIndex
// multiRdds(i) holds the elements whose index satisfies index % max == i
val multiRdds: List[org.apache.spark.rdd.RDD[Int]] =
  (0 until max).toList.map(i => indexed.filter(_._2 % max == i).map(_._1))
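
The split above keeps an element in the i-th filtered RDD when its zipWithIndex position p satisfies p % max == i. A minimal sketch of the same rule on plain Scala collections, with no Spark needed, may make it easier to see which elements land where. The names `SplitByIndex`, `splitByIndex`, and `parts` are hypothetical and not part of the gist; note that `Seq.zipWithIndex` yields `Int` positions, whereas `RDD.zipWithIndex` yields `Long`.

```scala
// Plain-collection sketch of the index-modulo split; no Spark required.
object SplitByIndex {
  // Returns `parts` lists; list i holds the elements whose position p satisfies p % parts == i.
  def splitByIndex[A](xs: Seq[A], parts: Int): List[List[A]] = {
    require(parts > 0, "parts must be positive")
    val indexed = xs.zipWithIndex // (element, position) pairs, analogous to RDD.zipWithIndex
    (0 until parts).toList.map { i =>
      indexed.collect { case (x, p) if p % parts == i => x }.toList
    }
  }

  def main(args: Array[String]): Unit = {
    val splits = splitByIndex(1 to 10, 3)
    splits.zipWithIndex.foreach { case (s, i) =>
      println(s"split $i: ${s.mkString(", ")}")
    }
    // split 0: 1, 4, 7, 10
    // split 1: 2, 5, 8
    // split 2: 3, 6, 9
  }
}
```

The splits are interleaved (round-robin by position) rather than contiguous ranges, so each one gets roughly the same number of elements whatever the input size.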