Sometimes, you’d like to look at delimited files on the command line:
cat test.csv
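If you would rather stay in Python than reach for shell tools, here is a minimal sketch that pads each column of a delimited file to a common width. The function name `format_delimited` and the sample data are hypothetical, invented for illustration; nothing here comes from the original post.

```python
import csv
import io


def format_delimited(text: str, delimiter: str = ",") -> str:
    """Return delimited text as left-aligned, space-padded columns."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ""
    n_cols = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of cells.
    rows = [row + [""] * (n_cols - len(row)) for row in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(n_cols)]
    lines = [
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    ]
    return "\n".join(lines)


# Hypothetical sample standing in for the contents of a small CSV file.
sample = "name,count\nalpha,1\nbeta,20\n"
print(format_delimited(sample))
```

Because it uses the `csv` module rather than splitting on commas, quoted fields that contain the delimiter stay in one cell.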
# To reproduce a random sample, we need a fixed seed.
"{:_}".format(np.random.randint(np.iinfo(np.uint32).max))
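The underscore-formatted number printed by the one-liner above is a valid Python integer literal, so it can be pasted straight into a script as a constant. Here is a minimal sketch of what that buys you. It uses the standard library `random` module rather than NumPy to stay dependency-free, and the seed value is a hypothetical example, not one from the post.

```python
import random

# Hypothetical seed value, standing in for the output of the one-liner above.
SEED = 3_141_592_653

population = list(range(1000))

# Two generators built from the same seed draw identical samples.
first = random.Random(SEED).sample(population, k=5)
second = random.Random(SEED).sample(population, k=5)

print(first == second)  # True
```

Using a dedicated `random.Random` instance instead of seeding the global generator keeps the sample reproducible even if other code draws random numbers in between.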
import numpy as np
import pandas as pd
def frequency_histogram(
    data: pd.DataFrame,
    n_bins=20,
    bins=None,
    log_bins=False,
    normalize=False,
import cytoolz.curried as tz
from pathlib import Path
def find_project_dir(here: Path = None) -> Path:
    """
    Get the path to the project directory
    “Project directory” means the nearest parent directory of the
    current directory that contains a `.git` directory. If there
    is no such directory, returns this directory.
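The lookup described in that docstring can be sketched with nothing but `pathlib`. This is one possible reading of it, not the post's actual implementation: the helper name `nearest_git_parent` is hypothetical, and I am assuming the starting directory itself counts as a candidate.

```python
import tempfile
from pathlib import Path
from typing import Optional


def nearest_git_parent(here: Optional[Path] = None) -> Path:
    """Hypothetical helper: walk upward until a directory contains `.git`."""
    start = (here or Path.cwd()).resolve()
    # Assumption: check the starting directory first, then each parent in turn.
    for candidate in (start, *start.parents):
        if (candidate / ".git").is_dir():
            return candidate
    # No repository found: fall back to the starting directory.
    return start


# Build a throwaway directory tree with a fake `.git` at its root.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / ".git").mkdir()
    nested = root / "src" / "pkg"
    nested.mkdir(parents=True)
    found = nearest_git_parent(nested)
    print(found == root)  # True
```

`Path.parents` already lists ancestors from nearest to farthest, so the first match is the nearest one.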
N.B. SQLAlchemy now incorporates all of this information in its documentation; I’m leaving this post here, but recommend referring to SQLAlchemy instead of these instructions.
pip or conda, for example:
"""
The schemas that Spark produces for DataFrames are typically
nested, and these nested schemas are quite difficult to work with
interactively. In many cases, it's possible to flatten a schema
into a single level of column names.
"""
import typing as T
import cytoolz.curried as tz
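To make the idea of flattening concrete without a running Spark session, here is a minimal sketch that works on a plain dictionary shaped like the output of a schema's `jsonValue()` method, as I understand that format. The function name `flatten_fields` and the sample schema are hypothetical, and this is not the post's implementation. The sketch only descends into nested structs; arrays and maps are treated as leaves.

```python
from typing import Any, Dict, Iterator, Tuple


def flatten_fields(
    schema: Dict[str, Any], prefix: str = ""
) -> Iterator[Tuple[str, str]]:
    """Yield (dotted_name, type_name) pairs for every leaf field."""
    for field in schema["fields"]:
        name = prefix + field["name"]
        field_type = field["type"]
        if isinstance(field_type, dict) and field_type.get("type") == "struct":
            # Nested struct: recurse, extending the dotted prefix.
            yield from flatten_fields(field_type, prefix=name + ".")
        else:
            # Leaf: simple types are strings; arrays and maps are dicts
            # whose "type" key names the container.
            type_name = (
                field_type if isinstance(field_type, str) else field_type["type"]
            )
            yield name, type_name


# Hypothetical schema, written by hand in the assumed JSON-like layout.
schema = {
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long", "nullable": False, "metadata": {}},
        {
            "name": "user",
            "type": {
                "type": "struct",
                "fields": [
                    {"name": "name", "type": "string", "nullable": True, "metadata": {}},
                    {
                        "name": "tags",
                        "type": {"type": "array", "elementType": "string", "containsNull": True},
                        "nullable": True,
                        "metadata": {},
                    },
                ],
            },
            "nullable": True,
            "metadata": {},
        },
    ],
}

flat = list(flatten_fields(schema))
print(flat)
```

The dotted names match the way nested columns are usually addressed when selecting them, which is what makes a single level of column names practical to work with.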
"""
Remove the input cells from an HTML document generated from a Jupyter notebook
Reads from either STDIN or the named file, and writes to STDOUT
"""
import fileinput
from bs4 import BeautifulSoup
text = "".join(fileinput.input())
One of the things you end up with when you spend too much time reading Hacker News is a folder of very slick monospaced fonts designed for code editors. Are any of these fonts measurably better than whatever’s already installed on your system? Nope! Here’s my list.
This one is kind of a gimmick, but an incredibly clever one. It translates sequences of characters like 123{30,60,90}456 into sparklines, using some fancy features of the OTF format. See also their source code repository for the project. I haven’t used this nearly enough to tell if it works well in practice, but I will now be on the constant lookout for use cases.