Preston Pham prestonph
  • Viet Nam
@dusenberrymw
dusenberrymw / spark_tips_and_tricks.md
Last active January 10, 2025 07:36
Tips and tricks for Apache Spark.

Spark Tips & Tricks

Misc. Tips & Tricks

  • If values are integers in [0, 255], Parquet will automatically compress them to 1-byte unsigned integers, decreasing the size of the saved DataFrame by a factor of 8 compared with 8-byte (64-bit) values.
  • Partition DataFrames into evenly distributed partitions of roughly 128 MB each (an empirical finding). When in doubt, err on the side of more partitions.
  • Pay particular attention to the number of partitions when using flatMap, especially if the following operation will result in high memory usage. The flatMap op usually results in a DataFrame with a [much] larger number of rows, yet the number of partitions remains the same. Thus, if a subsequent op causes a large expansion of memory usage (e.g. converting a DataFrame of indices to a DataFrame of large Vectors), the memory usage per partition may become too high. In this case, it is beneficial to repartition the output of flatMap to a number of partitions that will safely allow for appropriate partition memory sizes, based upon the
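As a rough illustration of the ~128 MB guideline above, the partition count can be derived from an estimate of the total data size. This is only a sketch: the function name `suggest_num_partitions` and the 10 GiB estimate are hypothetical, and the estimate of post-flatMap size has to come from your own knowledge of the data.

```python
import math

# ~128 MB per partition, the empirical target mentioned in the tips above.
TARGET_PARTITION_BYTES = 128 * 1024 * 1024


def suggest_num_partitions(estimated_total_bytes: int,
                           target_bytes: int = TARGET_PARTITION_BYTES) -> int:
    """Return a partition count that keeps each partition at or below target_bytes.

    Hypothetical helper for illustration; not part of Spark.
    """
    if estimated_total_bytes <= 0:
        return 1
    # Round up: erring toward more partitions keeps per-partition memory lower.
    return max(1, math.ceil(estimated_total_bytes / target_bytes))


# Assumed estimate: 10 GiB of data expected after the flatMap expansion.
n = suggest_num_partitions(10 * 1024**3)
print(n)  # 80
# With a PySpark DataFrame `df`, the count could then be applied via df.repartition(n).
```

Rounding up matches the advice to err toward more partitions: a slightly smaller partition costs little, while an oversized one risks exhausting executor memory.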
@micahgodbolt
micahgodbolt / wsl_install_node.md
Last active December 22, 2022 09:37
WSL install Node

The apt-get version of node is incredibly old, and installing a new copy is a bit of a runaround.

So here's how you can use NVM to quickly get a fresh copy of Node on your new Bash on Windows install:

$ touch ~/.bashrc
$ curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.35.3/install.sh | bash
# restart bash
$ nvm install --lts
@SteefH
SteefH / 0 - blog.md
Last active May 11, 2019 05:24
Amorphous: Writing a Scala library for boilerplate-free object mapping

At Infi, we started our first Scala project (link in Dutch) in mid-2016. When it became clear that Scala might be one of the technologies used in the project, I jumped at the chance to be part of it, because I'm always eager to learn new tech, and doing a project in a functional programming language was already near the top of my professional wish list.

As always when learning new technology, I like to push the envelope to see where things start to break down. I think that's a nice way to get to know the limits of that technology. As it turns out, Scala is a powerful language, with a strong type system that lets you use many advanced concepts I won't detail here (e.g. type classes, high-level abstractions like the ones in the Typeclassopedia with the help of scalaz or [Cats](https://github.com

@non
non / seeds.md
Last active July 10, 2024 20:34
Simple example of using seeds with ScalaCheck for deterministic property-based testing.

introduction

ScalaCheck 1.14.0 was just released with support for deterministic testing using seeds. Some folks have asked for examples, so I wanted to produce a Gist to help people use this feature.

simple example

These examples will assume the following imports:

@gvolpe
gvolpe / di-in-fp.md
Last active September 16, 2024 07:18
Dependency Injection in Functional Programming

There are several DI frameworks and libraries in the Scala ecosystem. But the more functional code you write, the more you'll realize there's no need to use any of them.

The most commonly claimed benefits are the following:

  • Dependency Injection.
  • Life cycle management.
  • Dependency graph rewriting.