


Stas Batururimi sbatururimi

@sbatururimi
sbatururimi / spark_tips_and_tricks.md
Created January 18, 2019 13:52 — forked from dusenberrymw/spark_tips_and_tricks.md
Tips and tricks for Apache Spark.

Spark Tips & Tricks

Misc. Tips & Tricks

  • If values are integers in [0, 255], Parquet will automatically compress them to 1-byte unsigned integers, thus decreasing the size of the saved DataFrame by a factor of 8 (compared with 8-byte longs).
  • Partition DataFrames to have evenly distributed partitions of ~128MB each (an empirical finding). Always err on the higher side with respect to the number of partitions.
  • Pay particular attention to the number of partitions when using flatMap, especially if the following operation will result in high memory usage. The flatMap op usually results in a DataFrame with a [much] larger number of rows, yet the number of partitions will remain the same. Thus, if a subsequent op causes a large expansion of memory usage (e.g., converting a DataFrame of indices to a DataFrame of large Vectors), the memory usage per partition may become too high. In this case, it is beneficial to repartition the output of flatMap to a number of partitions that will safely allow for appropriate partition memory sizes, based upon the
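The ~128MB rule of thumb turns into simple arithmetic: divide the estimated data size by the target and round up, which also follows the "err on the higher side" advice. A minimal sketch; `num_partitions` and `estimated_bytes` are hypothetical names, and estimating the size (before or after a flatMap expansion) is left to you.

```python
import math

# ~128MB per partition, the empirical target from the tips above
TARGET_PARTITION_BYTES = 128 * 1024 * 1024


def num_partitions(estimated_bytes: int, target_bytes: int = TARGET_PARTITION_BYTES) -> int:
    """Smallest partition count that keeps each partition at or below target_bytes.

    Rounds up, i.e. errs on the side of more partitions."""
    if estimated_bytes <= 0:
        return 1
    return math.ceil(estimated_bytes / target_bytes)


if __name__ == "__main__":
    before = num_partitions(10 * 1024**3)        # 10 GiB of input
    after = num_partitions(10 * 1024**3 * 50)    # same data after a hypothetical 50x expansion
    print(before, after)
```

With PySpark, the resulting count would be passed to `df.repartition(n)` after the flatMap, so the partition count grows along with the row count instead of staying fixed.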
sbatururimi / sshd_config
Last active January 10, 2019 10:10 — forked from HacKanCuBa/sshd_config
Modern secure SSH daemon config
# Modern secure (OpenSSH Server 7+) SSHd config by HacKan
# Refer to the manual for more info: https://www.freebsd.org/cgi/man.cgi?sshd_config(5)
# Server fingerprint
# Regenerate with: ssh-keygen -f /etc/ssh/ssh_host_rsa_key -N '' -t rsa -b 4096
HostKey /etc/ssh/ssh_host_rsa_key
# Regenerate with: ssh-keygen -f /etc/ssh/ssh_host_ed25519_key -N '' -t ed25519
HostKey /etc/ssh/ssh_host_ed25519_key
# Log for auditing, including users' key fingerprints
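When auditing a config like this, it can help to list which `HostKey` lines are actually active. A simplified parser sketch, assuming one directive per line: blank lines and lines starting with `#` are skipped, keywords are lower-cased because sshd keywords are case-insensitive, and quoted arguments or `Keyword=value` forms are not handled. `parse_sshd_config` and `host_keys` are hypothetical helpers, not part of OpenSSH.

```python
from typing import List, Tuple


def parse_sshd_config(text: str) -> List[Tuple[str, List[str]]]:
    """Return (keyword, arguments) pairs in file order (simplified parsing)."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        # Comment and empty lines carry no directive
        if not line or line.startswith("#"):
            continue
        keyword, *args = line.split()
        entries.append((keyword.lower(), args))
    return entries


def host_keys(text: str) -> List[str]:
    """Paths of all HostKey directives, in file order."""
    return [args[0] for keyword, args in parse_sshd_config(text) if keyword == "hostkey" and args]


if __name__ == "__main__":
    sample = (
        "# Server fingerprint\n"
        "HostKey /etc/ssh/ssh_host_rsa_key\n"
        "HostKey /etc/ssh/ssh_host_ed25519_key\n"
    )
    print(host_keys(sample))
```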
sbatururimi / URL parsing Regex.js
Created September 4, 2018 15:03 — forked from metafeather/URL parsing Regex.js
URL parsing regex.js
/*
A single regex to parse and breakup a full URL including query parameters and anchors e.g.
https://www.google.com/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash
*/
Url.regex = /^((http[s]?|ftp):\/)?\/?([^:\/\s]+)((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(.*)?(#[\w\-]+)?$/;
url: RegExp['$&'],
protocol: RegExp.$2,
host: RegExp.$3,
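To see what each capture group holds, here is a sketch of the same pattern ported to Python's `re` (the `\/` escapes are dropped since `/` needs no escaping there). Group 0 and groups 2 and 3 correspond to `RegExp['$&']`, `$2` and `$3` above; the names `path`, `file`, `query` and `hash` for the remaining groups are hypothetical labels chosen for illustration.

```python
import re
from typing import Dict, Optional

# Same pattern as Url.regex, without the JS-specific `\/` escapes
URL_RE = re.compile(
    r"^((http[s]?|ftp):/)?/?([^:/\s]+)((/\w+)*/)([\w\-.]+[^#?\s]+)(.*)?(#[\w\-]+)?$"
)


def parse_url(url: str) -> Optional[Dict[str, Optional[str]]]:
    match = URL_RE.match(url)
    if match is None:
        return None
    return {
        "url": match.group(0),       # RegExp['$&'] in the JS version
        "protocol": match.group(2),  # RegExp.$2
        "host": match.group(3),      # RegExp.$3
        "path": match.group(4),      # hypothetical label: directory part
        "file": match.group(6),      # hypothetical label: last path segment
        "query": match.group(7),     # hypothetical label: everything after the file
        "hash": match.group(8),      # hypothetical label: fragment group
    }


if __name__ == "__main__":
    print(parse_url("https://www.google.com/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash"))
```

Because `(.*)?` is greedy, it also consumes the `#hash` fragment, so the last group comes back empty for the example URL. If that matters, the standard library's `urllib.parse.urlsplit` separates query and fragment reliably.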
sbatururimi / bash-colors.md
Created September 4, 2018 10:23 — forked from iamnewton/bash-colors.md
The entire table of ANSI color codes.

Regular Colors

Value Color
\e[0;30m Black
\e[0;31m Red
\e[0;32m Green
\e[0;33m Yellow
\e[0;34m Blue
\e[0;35m Purple
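The `\e` in the table is the ESC character (0x1B, written `\033` in Python). A small sketch that wraps text in one of the regular color codes and then restores the default with `\e[0m`, the standard reset sequence (not part of the excerpt above). `colorize` and `REGULAR_COLORS` are hypothetical names.

```python
RESET = "\033[0m"  # reset all attributes

# Foreground codes from the "Regular Colors" table (0 = normal style)
REGULAR_COLORS = {
    "black": "0;30",
    "red": "0;31",
    "green": "0;32",
    "yellow": "0;33",
    "blue": "0;34",
    "purple": "0;35",
}


def colorize(text: str, color: str) -> str:
    """Wrap text in the escape sequence for `color`, then reset."""
    return f"\033[{REGULAR_COLORS[color]}m{text}{RESET}"


if __name__ == "__main__":
    for name in REGULAR_COLORS:
        print(colorize(name, name))
```

Resetting after each string keeps the color from leaking into whatever the terminal prints next.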
sbatururimi / Install NVIDIA Driver and CUDA.md
Created August 31, 2018 15:19 — forked from wangruohui/Install NVIDIA Driver and CUDA.md
Install NVIDIA Driver and CUDA on Ubuntu / CentOS / Fedora Linux OS
sbatururimi / configure_cuda_p70.md
Last active October 1, 2018 00:02 — forked from alexlee-gk/configure_cuda_p70.md
Use integrated graphics for display and NVIDIA GPU for CUDA on Ubuntu 14.04

This was tested on a ThinkPad P70 laptop with Intel integrated graphics and an NVIDIA GPU:

lspci | egrep 'VGA|3D'
00:02.0 VGA compatible controller: Intel Corporation Device 191b (rev 06)
01:00.0 VGA compatible controller: NVIDIA Corporation GM204GLM [Quadro M3000M] (rev a1)
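The `lspci | egrep 'VGA|3D'` check can also be scripted, e.g. to confirm a machine has both an Intel and an NVIDIA display controller before following this setup. A sketch that works on captured `lspci` output; the `3D` alternative catches GPUs that lspci lists as a "3D controller" rather than VGA. The vendor-string matching and the function names are assumptions for illustration.

```python
import re
from typing import List


def display_controllers(lspci_output: str) -> List[str]:
    """Keep lines mentioning VGA or 3D, like `lspci | egrep 'VGA|3D'`."""
    return [line for line in lspci_output.splitlines() if re.search(r"VGA|3D", line)]


def has_intel_and_nvidia(lspci_output: str) -> bool:
    """True when both an Intel and an NVIDIA display controller are listed."""
    controllers = display_controllers(lspci_output)
    has_intel = any("Intel Corporation" in line for line in controllers)
    has_nvidia = any("NVIDIA Corporation" in line for line in controllers)
    return has_intel and has_nvidia


if __name__ == "__main__":
    import subprocess

    output = subprocess.run(["lspci"], capture_output=True, text=True).stdout
    print(has_intel_and_nvidia(output))
```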

A reason to use the integrated graphics for display is that installing the NVIDIA drivers can cause the display to stop working properly. In my case, Ubuntu would get stuck in a login loop after installing the NVIDIA drivers. This happened regardless of whether I installed the drivers from the "Additional Drivers" tab in "System Settings" or via ppa:graphics-drivers/ppa on the command line.

sbatururimi / vim
Last active August 12, 2019 05:15 — forked from while0pass/listchars.vim
show/hide hidden characters in Vim
" show hidden characters in Vim
:set list
" settings for hidden chars:
" which characters they are displayed with
:set lcs=tab:▒░,trail:▓
" or
:set listchars=tab:▒░,trail:▓
" uses \u2592\u2591 for tabs and \u2593 for trailing spaces.
" The Vim help suggests ">-" for tab and "-" for trail.
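To illustrate what `tab:▒░,trail:▓` produces: per Vim's help, the first tab character is always shown once and the second is repeated to fill the rest of the tab, while each trailing space gets the `trail` character. A rough Python emulation, assuming `tabstop=8` and one screen column per character (wide characters are ignored); `render_listchars` is a hypothetical helper.

```python
from typing import Tuple


def render_listchars(line: str, tab: Tuple[str, str] = ("▒", "░"), trail: str = "▓", tabstop: int = 8) -> str:
    """Approximate how Vim displays `line` with :set list and these listchars."""
    body = line.rstrip(" ")             # only spaces at the very end count as trailing
    n_trailing = len(line) - len(body)
    out = []
    col = 0
    for ch in body:
        if ch == "\t":
            width = tabstop - (col % tabstop)        # columns up to the next tab stop
            out.append(tab[0] + tab[1] * (width - 1))
            col += width
        else:
            out.append(ch)
            col += 1
    return "".join(out) + trail * n_trailing


if __name__ == "__main__":
    print(render_listchars("\tindented  "))
```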
sbatururimi / tmux-cheatsheet.markdown
Created June 29, 2018 11:27 — forked from MohamedAlaa/tmux-cheatsheet.markdown
tmux shortcuts & cheatsheet


start new:

tmux

start new with session name:

tmux new -s myname
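When scripting this, it is handy to create the named session only if it does not already exist. A sketch using `tmux has-session` (exit status 0 when the session exists) and `tmux new -d -s NAME` (like the command above, but detached); the `=` prefix asks tmux for an exact name match instead of a prefix match. `ensure_tmux_session` and its injectable `run` parameter are hypothetical, the latter so the logic can be tested without tmux installed.

```python
import subprocess
from typing import Callable, List


def ensure_tmux_session(name: str, run: Callable = subprocess.run) -> List[List[str]]:
    """Start a detached tmux session called `name` unless it already exists.

    Returns the commands that were run."""
    # '=' requests an exact session-name match rather than a prefix match
    check = ["tmux", "has-session", "-t", f"={name}"]
    commands = [check]
    # has-session exits non-zero when no such session exists
    if run(check, capture_output=True).returncode != 0:
        create = ["tmux", "new", "-d", "-s", name]
        commands.append(create)
        run(create, check=True)
    return commands
```

Afterwards, `tmux attach -t myname` attaches to the session whether it was just created or already running.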
sbatururimi / faster_toPandas.py
Created June 27, 2018 14:05 — forked from joshlk/faster_toPandas.py
PySpark faster toPandas using mapPartitions
import pandas as pd

def _map_to_pandas(rdds):
    """ Needs to be here due to pickling issues """
    # Called once per partition via mapPartitions: `rdds` iterates over that partition's rows
    return [pd.DataFrame(list(rdds))]

def toPandas(df, n_partitions=None):
    """
    Returns the contents of `df` as a local `pandas.DataFrame` in a speedy fashion. The DataFrame is
    repartitioned if `n_partitions` is passed.
sbatururimi / generate-ssh-key.sh
Created June 20, 2018 08:47 — forked from grenade/01-generate-ed25519-ssh-key.sh
Correct file permissions for ssh keys and config.
ssh-keygen -t rsa -b 4096 -N '' -C "[email protected]" -f ~/.ssh/id_rsa
ssh-keygen -t rsa -b 4096 -N '' -C "[email protected]" -f ~/.ssh/github_rsa
ssh-keygen -t rsa -b 4096 -N '' -C "[email protected]" -f ~/.ssh/mozilla_rsa
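The gist's description is about correcting permissions after generating keys like these. A sketch of the commonly recommended modes: `~/.ssh` at 700, public keys at 644, and every other regular file (private keys, `config`, `known_hosts`, ...) at 600. It assumes that every file without a `.pub` suffix should be owner-only; `fix_ssh_permissions` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Optional


def fix_ssh_permissions(ssh_dir: Optional[Path] = None) -> None:
    """Tighten permissions on an ssh directory and the files directly inside it."""
    ssh_dir = Path(ssh_dir) if ssh_dir is not None else Path.home() / ".ssh"
    os.chmod(ssh_dir, 0o700)  # only the owner may list or enter the directory
    for path in ssh_dir.iterdir():
        # Skip subdirectories and symlinks (chmod would follow the link)
        if path.is_symlink() or not path.is_file():
            continue
        # Public keys may be world-readable; everything else stays owner-only
        os.chmod(path, 0o644 if path.suffix == ".pub" else 0o600)
```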