- If values are integers in [0, 255], Parquet will automatically compress them to 1-byte unsigned integers, decreasing the size of the saved DataFrame by up to a factor of 8 relative to 64-bit (8-byte) integers.
- Partition DataFrames to have evenly-distributed, ~128MB partition sizes (empirical finding). Always err on the higher side w.r.t. number of partitions.
- Pay particular attention to the number of partitions when using `flatMap`, especially if the following operation will result in high memory usage. The `flatMap` op usually results in a DataFrame with a [much] larger number of rows, yet the number of partitions will remain the same. Thus, if a subsequent op causes a large expansion of memory usage (e.g. converting a DataFrame of indices to a DataFrame of large Vectors), the memory usage per partition may become too high. In this case, it is beneficial to repartition the output of `flatMap` to a number of partitions that will safely allow for appropriate partition memory sizes, based upon the
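The two sizing rules above come down to simple arithmetic, sketched here in plain Python. It assumes you can estimate the in-memory size of the data after `flatMap` (rows × bytes per row); `suggest_num_partitions` is a hypothetical helper, not a Spark API, and the row count and vector size are illustrative assumptions. It rounds up so that each partition stays at or below the ~128MB target, i.e. it errs toward more partitions. The last lines show where the "factor of 8" comes from: the width of a 64-bit integer versus a 1-byte unsigned integer.

```python
import math
from array import array

# ~128MB per partition, the empirical target from the notes above.
TARGET_PARTITION_BYTES = 128 * 1024 ** 2


def suggest_num_partitions(num_rows, bytes_per_row,
                           target_bytes=TARGET_PARTITION_BYTES):
    """Hypothetical helper: smallest partition count that keeps each partition
    at or below target_bytes, assuming rows are spread evenly."""
    total_bytes = num_rows * bytes_per_row
    # Round up so we err on the side of more (smaller) partitions.
    return max(1, math.ceil(total_bytes / target_bytes))


# Illustrative assumption: flatMap yields 50M rows, and each row is later
# turned into a dense vector of 1,000 float64 values (8 bytes each).
rows_after_flat_map = 50_000_000
bytes_per_row = 1_000 * 8
num_partitions = suggest_num_partitions(rows_after_flat_map, bytes_per_row)
print(num_partitions)  # 2981

# Width ratio behind the "factor of 8": 64-bit integers ('q') versus
# 1-byte unsigned integers ('B') holding the same values in [0, 255].
values = list(range(256))
int64_bytes = array("q", values).itemsize * len(values)
uint8_bytes = array("B", values).itemsize * len(values)
print(int64_bytes / uint8_bytes)  # 8.0
```

The resulting count could then be passed to `repartition(n)` (available on both RDDs and DataFrames) on the output of `flatMap`, before the memory-heavy step runs.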
% bin/nutch parsechecker -Dplugin.includes='protocol-selenium|parse-tika' \
-Dselenium.grid.binary=.../geckodriver \
-Dselenium.enable.headless=true \
-followRedirects \
-dumpText https://nutch.apache.org
# Terminal Cheat Sheet
pwd     # print working directory
ls      # list files in directory
cd      # change directory
cd ~    # home directory
cd ..   # up one directory
cd -    # previous working directory
help    # get help (shell builtins)
-h      # get help (option accepted by many commands)
To mount a partition at startup for all users, we need an entry in the fstab file. Currently, the HDD is mounted for the user who logs in, which gives access permissions only to that user. By adding an entry to fstab, the partition will be mounted by root with access for all users. This r/w access can be controlled later on.
sudo blkid lists all partitions available on your system. Note down the UUID of the NTFS partition that you want to mount at boot. In your case, it seems to be 00148BDE148BD4D6.
Now create a folder, for example with sudo mkdir /media/ExtHDD01. This is the folder where your external HDD partition will be mounted. This folder will be owned by root, so to give other users permission to read and write in it, set the permissions with sudo chmod -R 777 /media/ExtHDD01. Now edit your fstab file with the following command:
sudo nano /etc/fstab
Go to the bottom of the file and add the following line:
UUID=001
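The fstab steps above can be sketched in Python: pull the UUID out of one line of `blkid` output, then format the six whitespace-separated fstab fields (device, mount point, filesystem type, options, dump, fsck pass). `parse_blkid_line` and `fstab_entry` are hypothetical helpers; the sample `blkid` line and its UUID are made up, and the `ntfs-3g` type and `defaults` options are assumptions to adjust for your system. Only the mount point comes from the steps above.

```python
import shlex


def parse_blkid_line(line):
    """Hypothetical helper: split one line of blkid output into
    (device, {KEY: value}), e.g. '/dev/sdb1: UUID="..." TYPE="ntfs"'."""
    device, _, rest = line.partition(":")
    fields = {}
    for token in shlex.split(rest):  # shlex strips the double quotes
        key, _, value = token.partition("=")
        fields[key] = value
    return device.strip(), fields


def fstab_entry(uuid, mount_point, fs_type, options="defaults",
                dump=0, fsck_pass=0):
    """Hypothetical helper: format the six fstab fields on one line."""
    return f"UUID={uuid}\t{mount_point}\t{fs_type}\t{options}\t{dump}\t{fsck_pass}"


# Made-up blkid output line; the UUID is a placeholder, not a real one.
sample = '/dev/sdb1: LABEL="ExtHDD" UUID="1234ABCD5678EF90" TYPE="ntfs"'
device, info = parse_blkid_line(sample)

# Assumption: 'ntfs-3g' driver with 'defaults'; options such as
# uid=/gid=/umask= control who may read and write the mounted files.
line = fstab_entry(info["UUID"], "/media/ExtHDD01", "ntfs-3g")
print(device)
print(line)
```

Appending such a line to /etc/fstab still requires root, and running sudo mount -a afterwards is a common way to check the entry before rebooting.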
set nocompatible " required
filetype off " required
" set the runtime path to include Vundle and initialize
set rtp+=~/.vim/bundle/Vundle.vim
call vundle#begin()
" alternatively, pass a path where Vundle should install plugins
"call vundle#begin('~/some/path/here')
import sys

import numpy as np

def show_mem_usage():
    '''Displays memory usage from inspection
    of global variables in this notebook'''
    # globals of the caller (the notebook), one frame up
    gl = sys._getframe(1).f_globals
    vars = {}
    for k, v in list(gl.items()):
        # for pandas dataframes
        if hasattr(v, 'memory_usage'):
            mem = v.memory_usage(deep=True)
            if not np.isscalar(mem):
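The snippet above is cut off after the `np.isscalar` check. A minimal self-contained sketch of the same idea follows, assuming the goal is a per-variable size listing; it is not the original's remaining code, and `mem_usage_table` is a hypothetical name. It takes the namespace as an argument (pass `globals()` in a notebook) instead of using `sys._getframe`, sums pandas' `memory_usage(deep=True)` when it returns one value per column, and falls back to `sys.getsizeof`, which is shallow, for everything else.

```python
import sys


def mem_usage_table(namespace):
    """Hypothetical helper: return (name, bytes) pairs, largest first,
    for the public variables in `namespace` (e.g. a notebook's globals())."""
    sizes = {}
    for name, value in list(namespace.items()):
        if name.startswith("_"):
            continue  # skip private names and IPython's _, __, _i* history
        if isinstance(value, type):
            continue  # a class's memory_usage is unbound; skip classes
        if hasattr(value, "memory_usage"):
            # pandas: a DataFrame returns a Series (one entry per column),
            # a Series returns a single integer.
            mem = value.memory_usage(deep=True)
            sizes[name] = int(mem.sum()) if hasattr(mem, "sum") else int(mem)
        else:
            sizes[name] = sys.getsizeof(value)  # shallow size only
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    demo = {"numbers": list(range(1000)), "label": "hello"}
    for name, size in mem_usage_table(demo):
        print(f"{name:<10} {size} bytes")
```

Passing the namespace explicitly keeps the function usable outside a notebook and avoids relying on CPython's private `sys._getframe`.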
docker exec -it container_name /bin/bash -c "export COLUMNS=`tput cols`; export LINES=`tput lines`; exec bash"
{
  "snippets" : [
    {
      "name" : "timing-notifications",
      "code": [
        "%load_ext autoreload",
        "%load_ext jupyternotify",
        "%load_ext autotime"
      ]
    }
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Thu Mar 29 09:57:55 2018
@author: avsthiago
"""
from keras.preprocessing.image import ImageDataGenerator
import numpy as np