Raider raidery

  • IBM
  • China
@ololobus
ololobus / Spark+ipython_on_MacOS.md
Last active September 26, 2024 08:50
Apache Spark installation + ipython/jupyter notebook integration guide for macOS

Tested with Apache Spark 2.1.0, Python 2.7.13, and Java 1.8.0_112.

For older versions of Spark and IPython, please see the previous revision of this text.

Install Java Development Kit
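The preview ends at the JDK step. Once Spark and Jupyter are installed, a quick way to smoke-test the integration from a notebook is the findspark package (an assumption here, not part of the original guide):

```python
# A minimal sketch, assuming `pip install findspark` and that SPARK_HOME points
# at the Spark install directory described in the guide.
import findspark
findspark.init()  # adds pyspark to sys.path using SPARK_HOME

from pyspark import SparkContext

sc = SparkContext("local[2]", "smoke-test")
print(sc.parallelize(range(100)).sum())  # should print 4950
sc.stop()
```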

@oza
oza / SparkOnYARN.md
Last active October 9, 2022 08:53
How to run Spark on YARN with dynamic resource allocation

YARN

  1. General resource management layer on top of HDFS
  2. A part of Hadoop

Spark

  1. In-memory processing framework

Spark on YARN
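The preview cuts off at this heading. For orientation, a minimal sketch of what dynamic resource allocation looks like from the PySpark side (the property keys are standard Spark configs; the values and app name are placeholders, and a YARN cluster with the external shuffle service is assumed):

```python
from pyspark import SparkConf, SparkContext

# Dynamic allocation requires the external shuffle service on the YARN nodes.
conf = (SparkConf()
        .setAppName("dynamic-allocation-demo")
        .setMaster("yarn")
        .set("spark.shuffle.service.enabled", "true")
        .set("spark.dynamicAllocation.enabled", "true")
        .set("spark.dynamicAllocation.minExecutors", "1")
        .set("spark.dynamicAllocation.maxExecutors", "10"))

sc = SparkContext(conf=conf)
```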

@sadikovi
sadikovi / CollectionUDAF.scala
Last active June 12, 2020 11:05
UDAF for generating a collection of values (up to a specified limit)

import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types.{ArrayType, LongType, DataType, StructType, StructField}

// A UDAF that aggregates Long values into an array, keeping at most `limit` of them
class CollectionFunction(private val limit: Int) extends UserDefinedAggregateFunction {
  def inputSchema: StructType =
    StructType(StructField("value", LongType, false) :: Nil)
  def bufferSchema: StructType =
    StructType(StructField("list", ArrayType(LongType, true), true) :: Nil)
@frgomes
frgomes / AnyToDouble.scala
Last active January 23, 2022 23:15
Scala - Converts Any to Double, to LocalDate and Date
// this flavour is pure magic...
def toDouble: (Any) => Double = {
  case i: Int    => i
  case f: Float  => f
  case d: Double => d
}

// ...while this flavour is longer, you are in full control...
object any2Double extends Function[Any, Double] {
  def apply(any: Any): Double =
    any match {
      case i: Int    => i
      case f: Float  => f
      case d: Double => d
    }
}

// ...useful, for example, when you invoke any2Double from another, similar conversion...
@alopresto
alopresto / Merging PR for 2 branches
Last active May 15, 2024 15:16
Instructions to merge pull requests for multiple branches (master, support, etc.)
# Steps to merge/close pull requests with two main branches

As NiFi now has a 1.0 (master) and a 0.x (support) branch, pull requests (PRs) must be applied to both. Here is a step-by-step guide for committers to ensure this occurs for all PRs.

1. Check out the latest master:

```
$ git checkout master
$ git pull upstream master
```

2. Check out the PR (example #327). This will be in `detached-HEAD` state. (Note: you may need to edit the `.git/config` file to add the `fetch` lines [below](#fetch).)
import java.security.SecureRandom;

public class RandomString {
    private static final String ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_";
    private static final SecureRandom RANDOM = new SecureRandom();

    /**
     * Generates a random string of the given length from a 64-character alphabet
     * (digits, lowercase letters, uppercase letters, '-' and '_').
     *
     * @param count length
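The preview stops inside the Javadoc. For comparison, a minimal Python sketch of the same idea, drawing characters from the same 64-character alphabet with a cryptographically secure RNG (the function name and structure are illustrative, not from the gist):

```python
import secrets
import string

# Same 64-character alphabet as the Java snippet above
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase + "-_"

def random_string(count):
    # secrets.choice uses a CSPRNG, analogous to Java's SecureRandom
    return "".join(secrets.choice(ALPHABET) for _ in range(count))

print(random_string(16))
```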
@tomysmile
tomysmile / mac-setup-redis.md
Last active May 5, 2025 13:06
Brew install Redis on Mac

Type the commands below:

brew update
brew install redis

To have launchd start redis now and restart at login:

brew services start redis
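Once the service is running, a quick way to verify the install from Python is the redis-py client (an assumption here, not part of the original note; `pip install redis` first):

```python
import redis

# Connect to the default local Redis started by `brew services start redis`
r = redis.Redis(host="localhost", port=6379)
r.set("greeting", "hello")
print(r.get("greeting"))  # b'hello' (values come back as bytes)
```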
@littlecodersh
littlecodersh / main.py
Last active April 28, 2025 05:39
Main script behind itchat robot
#coding=utf8
import itchat
# The Tuling chatbot plugin can be found here:
# https://github.com/littlecodersh/EasierLife/tree/master/Plugins/Tuling
from tuling import get_response

@itchat.msg_register('Text')
def text_reply(msg):
    # If the message mentions '作者' (author) or '主人' (owner), point to the author's profile
    if u'作者' in msg['Text'] or u'主人' in msg['Text']:
        return u'你可以在这里了解他:https://github.com/littlecodersh'  # "You can learn about him here: ..."
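The preview ends at the text handler. A typical itchat main script finishes by logging in and entering the message loop; a minimal sketch using the standard itchat API (not shown in the excerpt):

```python
if __name__ == '__main__':
    # hotReload caches the login session so the QR code is only scanned once
    itchat.auto_login(hotReload=True)
    itchat.run()
```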
@bernhardschaefer
bernhardschaefer / spark-submit-streaming-yarn.sh
Last active March 21, 2022 05:04
spark-submit template for running Spark Streaming on YARN (referenced in https://www.inovex.de/blog/247-spark-streaming-on-yarn-in-production/)
#!/bin/bash
# Minimum TODOs on a per job basis:
# 1. define name, application jar path, main class, queue and log4j-yarn.properties path
# 2. remove properties not applicable to your Spark version (Spark 1.x vs. Spark 2.x)
# 3. tweak num_executors, executor_memory (+ overhead), and backpressure settings
# the two most important settings:
num_executors=6
executor_memory=3g
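The template's backpressure settings are cut off in this preview. For orientation, a sketch of the kind of Spark Streaming properties such a template tweaks, expressed in PySpark (the keys are standard Spark configs; the values are placeholders):

```python
from pyspark import SparkConf

conf = (SparkConf()
        .set("spark.streaming.backpressure.enabled", "true")         # let Spark adapt the ingest rate
        .set("spark.streaming.backpressure.initialRate", "1000")     # records/sec per receiver at startup
        .set("spark.streaming.kafka.maxRatePerPartition", "10000"))  # hard cap for a Kafka direct stream
```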
import hashlib as hasher
import datetime as date

# Define what a Snakecoin block is
class Block:
    def __init__(self, index, timestamp, data, previous_hash):
        self.index = index
        self.timestamp = timestamp
        self.data = data
        self.previous_hash = previous_hash
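The preview cuts off after the constructor. A block like this usually also hashes its own contents; a minimal completion sketch (the `hash_block` name and the SHA-256 choice are assumptions, not confirmed by the excerpt):

```python
import hashlib

class Block:
    def __init__(self, index, timestamp, data, previous_hash):
        self.index = index
        self.timestamp = timestamp
        self.data = data
        self.previous_hash = previous_hash
        self.hash = self.hash_block()  # assumed: the block hashes itself on creation

    def hash_block(self):
        # Hash all fields together so any tampering changes the block's hash
        payload = (str(self.index) + str(self.timestamp) +
                   str(self.data) + str(self.previous_hash))
        return hashlib.sha256(payload.encode()).hexdigest()
```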