@MLnick
MLnick / HyperLogLogStoreUDAF.scala
Last active March 16, 2022 05:31
Experimenting with Spark SQL UDAF - HyperLogLog UDAF for distinct counts, that stores the actual HLL for each row to allow further aggregation
class HyperLogLogStoreUDAF extends UserDefinedAggregateFunction {
override def inputSchema = new StructType()
.add("stringInput", BinaryType)
override def update(buffer: MutableAggregationBuffer, input: Row) = {
// This input Row only has a single column storing the input value in String (or other Binary data).
// We only update the buffer when the input value is not null.
if (!input.isNullAt(0)) {
if (buffer.isNullAt(0)) {
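The description above says the UDAF keeps the HLL itself for each row so that results can be aggregated further. A minimal sketch of why that works, written in plain Python rather than against the Spark API: two HyperLogLog sketches merge by taking the element-wise maximum of their registers, so a merged sketch equals the one built from the union of the inputs. The `HyperLogLog` class, its `p` parameter, and the SHA-1 hash are assumptions chosen for illustration and are not taken from the gist.

```python
import hashlib
import math


class HyperLogLog:
    """Tiny HyperLogLog sketch with 2**p registers (illustrative only)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value: bytes) -> None:
        # 64-bit hash: the top p bits choose a register, the rest feed the rank.
        h = int.from_bytes(hashlib.sha1(value).digest()[:8], "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        # Merging is an element-wise max, so stored sketches can be re-aggregated.
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        estimate = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            estimate = self.m * math.log(self.m / zeros)
        return estimate
```

Because `merge` loses nothing relative to sketching the combined input, per-row sketches can be rolled up along any grouping later without rescanning the raw values.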
@chadrien
chadrien / README.md
Last active April 22, 2025 15:52
Debug PHP in Docker with PHPStorm and Xdebug

Debug your PHP in Docker with Intellij/PHPStorm and Xdebug

  1. For local development, create a Dockerfile based on your production image and install Xdebug into it. Example:
FROM php:5

RUN yes | pecl install xdebug \
&& echo "zend_extension=$(find /usr/local/lib/php/extensions/ -name xdebug.so)" > /usr/local/etc/php/conf.d/xdebug.ini \
@philipz
philipz / gist:67892bdc8a385ecb3b8c
Last active November 14, 2017 22:57
Rsync & Docker save

Rsync & Docker save

docker save $1 > $1.tar && rsync -ravP -e ssh $1.tar [email protected]:/home/philipz/tmp && rm $1.tar

## Add Gzip

docker save busybox | gzip -c - > busybox.tar.gz

gzip -d busybox.tar.gz && docker load < busybox.tar

docker save busybox | gzip | pv | ssh -i ~/.ssh/id_rsa USER@HOSTNAME sudo docker load

import org.apache.commons.lang.StringUtils;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

Use gcloud to create a machine from a snapshot and add it to a load balancer

--

Create a disk named MY_INSTANCE from the snapshot THE_SNAPSHOT_NAME

gcloud compute disks create MY_INSTANCE --source-snapshot THE_SNAPSHOT_NAME --zone asia-east1-c
### DML ###
# Keyspace Name
keyspace: stresscql
# The CQL for creating a keyspace (optional if it already exists)
keyspace_definition: |
CREATE KEYSPACE stresscql WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
# Table name
@staltz
staltz / introrx.md
Last active May 12, 2025 23:22
The introduction to Reactive Programming you've been missing
@jkreps
jkreps / benchmark-commands.txt
Last active May 13, 2025 07:28
Kafka Benchmark Commands
Producer
Setup
bin/kafka-topics.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --create --topic test-rep-one --partitions 6 --replication-factor 1
bin/kafka-topics.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --create --topic test --partitions 6 --replication-factor 3
Single thread, no replication
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196
@stanlemon
stanlemon / func_speed.php
Last active April 6, 2024 04:03
Test the speed of calling a function in various ways in PHP.
<?php
function fooBar($hello, $world) {
// Nothing to see here...
}
$results = array();
// TEST 1: call_user_func
@qrtt1
qrtt1 / gradle.tips.condition.md
Last active August 29, 2015 13:57
Gradle Notes

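In Gradle, when a particular task needs different settings at execution time, you can check the taskGraph to decide. The example below is a typical case: leaving configuration files out of the jar when publishing a new library: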

gradle.taskGraph.whenReady { graph ->
    if (graph.hasTask(uploadArchives)) {
        jar {
            exclude '**/*.properties'
        }
    }
}