@isaacmg
isaacmg / run_jar.py
Created January 29, 2017 08:56
A simple example of using a DAG to run a jar file.
from airflow import DAG
from airflow.operators import BashOperator
from datetime import datetime
import os
import sys

args = {
    'owner': 'airflow',
    'start_date': datetime(2017, 1, 27),
    'provide_context': True
}

# Completion sketch (not in the gist preview): the jar path below is a placeholder.
dag = DAG('run_jar', default_args=args, schedule_interval='@once')
run_jar = BashOperator(task_id='run_jar',
                       bash_command='java -jar /path/to/your.jar',
                       dag=dag)
@dusenberrymw
dusenberrymw / spark_tips_and_tricks.md
Last active January 10, 2025 07:36
Tips and tricks for Apache Spark.

Spark Tips & Tricks

Misc. Tips & Tricks

  • If values are integers in [0, 255], Parquet will automatically compress them to 1-byte unsigned integers, thus decreasing the size of the saved DataFrame by a factor of 8.
  • Partition DataFrames to have evenly-distributed, ~128MB partition sizes (empirical finding). Always err on the higher side w.r.t. number of partitions.
  • Pay particular attention to the number of partitions when using flatMap, especially if the following operation will result in high memory usage. The flatMap op usually results in a DataFrame with a [much] larger number of rows, yet the number of partitions will remain the same. Thus, if a subsequent op causes a large expansion of memory usage (e.g. converting a DataFrame of indices to a DataFrame of large Vectors), the memory usage per partition may become too high. In this case, it is beneficial to repartition the output of flatMap to a number of partitions that will safely allow for appropriate partition memory sizes, based upon the expected memory expansion (see the sketch below).
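
A hedged PySpark sketch of that flatMap-then-repartition pattern (the data, the expansion factor, and the target of 512 partitions are illustrative assumptions, not values from the gist):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("flatmap-repartition-sketch").getOrCreate()

# A DataFrame of indices; each row will expand into many larger rows.
indices_df = spark.range(0, 1000000)

# flatMap keeps the parent's partition count even though rows (and memory) explode.
expanded_rdd = indices_df.rdd.flatMap(lambda row: [(row.id, i) for i in range(100)])

# Repartition explicitly so each partition stays near the ~128MB guideline above.
expanded_df = expanded_rdd.toDF(["id", "value"]).repartition(512)
expanded_df.write.parquet("/tmp/expanded")  # illustrative output path
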
pragma solidity ^0.4.7;
contract Factory {
    bytes32[] Names;
    address[] newContracts;
    function createContract(bytes32 name) {
        address newContract = new Contract(name);
        newContracts.push(newContract);
    }
}
// Completion sketch (not in the preview): a minimal Contract so the factory compiles.
contract Contract {
    bytes32 public Name;
    function Contract(bytes32 name) { Name = name; }
}
@w0rd-driven
w0rd-driven / passwords.txt
Created November 18, 2016 20:19
BFG Repo-Cleaner --replace-text example
PASSWORD1 # Replace literal string 'PASSWORD1' with '***REMOVED***' (default)
PASSWORD2==>examplePass # replace with 'examplePass' instead
PASSWORD3==> # replace with the empty string
regex:password=\w+==>password= # Replace, using a regex
regex:\r(\n)==>$1 # Replace Windows newlines with Unix newlines
@laobubu
laobubu / ABOUT.md
Last active March 12, 2025 21:04
A very simple HTTP server in C, for Unix, using fork()

Pico HTTP Server in C

This is a very simple HTTP server for Unix, using fork(). It's very easy to use.

How to use

  1. Include the header httpd.h.
  2. Write your route method to handle requests.
  3. Call serve_forever("12913") to start serving on port 12913.
@MichMich
MichMich / Amazon S3 Client Side Upload
Created November 2, 2016 13:00
Example of an Amazon S3 upload.
<!DOCTYPE html>
<html>
<head>
<title>AWS S3 File Upload</title>
<script src="https://sdk.amazonaws.com/js/aws-sdk-2.1.12.min.js"></script>
</head>
<body>
<input type="file" id="file-chooser" />
@luckydonald
luckydonald / answer.md
Last active January 9, 2023 19:14
How to get the hostnames and information of other hosts in the same docker scale grouping. http://stackoverflow.com/a/39895650/3423324

The way I could do it was by using the Docker API. I used the docker-py package to access it.

The API exposes a labels dictionary for each container, and the keys com.docker.compose.container-number, com.docker.compose.project and com.docker.compose.service did what was needed to build the hostname.

The code below is a simplified version of the code I am now using. You can find my more advanced code, with caching and other fancy stuff, on GitHub at luckydonald/pbft/dockerus.ServiceInfos (backup at gist.github.com).
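
A minimal sketch of that approach, written against the modern Docker SDK for Python interface rather than the original docker-py calls; the service name and the <project>_<service>_<number> hostname format are assumptions based on the label keys above:

import docker

client = docker.from_env()

def compose_hostnames(service_name):
    # Collect <project>_<service>_<number> names for all running containers of one compose service.
    hostnames = []
    for container in client.containers.list():
        labels = container.labels  # the labels dictionary exposed by the API
        if labels.get("com.docker.compose.service") != service_name:
            continue
        project = labels.get("com.docker.compose.project")
        number = labels.get("com.docker.compose.container-number")
        hostnames.append("{}_{}_{}".format(project, service_name, number))
    return hostnames

print(compose_hostnames("web"))  # e.g. ['myproject_web_1', 'myproject_web_2']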

Host

gst-launch-1.0 -v v4l2src device=/dev/video0 \
  ! "image/jpeg,width=1280,height=720,framerate=30/1" \
  ! rtpjpegpay \
  ! udpsink host=$myip port=$myport

Client

gst-launch-1.0 -e -v udpsrc port=$myport \
  ! "application/x-rtp,encoding-name=JPEG,payload=26" \
  ! rtpjpegdepay ! jpegdec ! \

@eddies
eddies / setup-notes.md
Created July 29, 2016 08:00
Spark 2.0.0 and Hadoop 2.7 with s3a setup

Standalone Spark 2.0.0 with s3

Tested with:

  • Spark 2.0.0 pre-built for Hadoop 2.7
  • Mac OS X 10.11
  • Python 3.5.2

Goal

Use s3 within pyspark with minimal hassle.
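
As a hedged illustration of that goal, a minimal pyspark sketch assuming the hadoop-aws and matching aws-java-sdk jars are on the classpath; the bucket, path, and credentials are placeholders, not values from the gist:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s3a-sketch").getOrCreate()

# Point the s3a filesystem at your credentials (placeholders).
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")
hadoop_conf.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")

# Read a CSV straight from S3 via the s3a:// scheme.
df = spark.read.csv("s3a://your-bucket/path/to/data.csv", header=True)
df.show()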

@adamnew123456
adamnew123456 / diff.py
Last active February 1, 2025 03:18
An implementation of the Myers diff algorithm
# This is free and unencumbered software released into the public domain.
#
# Anyone is free to copy, modify, publish, use, compile, sell, or
# distribute this software, either in source code form or as a compiled
# binary, for any purpose, commercial or non-commercial, and by any
# means.
#
# In jurisdictions that recognize copyright laws, the author or authors
# of this software dedicate any and all copyright interest in the
# software to the public domain. We make this dedication for the benefit