@martinsotir
martinsotir / pandas_conditionnal_merge.py
Last active June 7, 2019 22:00
Conditional merge in pandas
import pandas as pd
def join_part(A, B, cond, left_on, right_on):
    C = A.merge(B, left_on=left_on, right_on=right_on, how="inner", copy=False)
    return C[cond].copy()
def conditional_join(A, B, cond, left_on, right_on, batch_size=50000):
    indices = range(0, len(A) + batch_size, batch_size)
    batches = (A.iloc[b_start : b_start + batch_size] for b_start in indices)
    merges = (join_part(subset, B, cond, left_on, right_on) for subset in batches)
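The preview of `conditional_join` above is cut off before the per-batch results are combined. As a minimal sketch, assuming the batches are simply stacked with `pd.concat` and that `cond` is a callable returning a boolean mask over the merged frame (neither is confirmed by the truncated source), the idea might look like this. The name `conditional_join_sketch` and the `events` and `windows` frames are hypothetical, introduced only for illustration.

```python
import pandas as pd


def join_part(A, B, cond, left_on, right_on):
    # Equi-join one batch of A with B, then keep rows where cond holds.
    # Assumption: cond is a callable mapping the merged frame to a boolean mask.
    C = A.merge(B, left_on=left_on, right_on=right_on, how="inner")
    return C[cond(C)].copy()


def conditional_join_sketch(A, B, cond, left_on, right_on, batch_size=50000):
    # max(len(A), 1) guarantees at least one (possibly empty) batch,
    # so pd.concat always receives something to concatenate.
    starts = range(0, max(len(A), 1), batch_size)
    batches = (A.iloc[s : s + batch_size] for s in starts)
    parts = [join_part(b, B, cond, left_on, right_on) for b in batches]
    # Assumption: per-batch results are stacked row-wise.
    return pd.concat(parts, ignore_index=True)


# Hypothetical data: keep events that fall inside the window for their key.
events = pd.DataFrame({"key": ["a", "a", "b"], "t": [1, 5, 3]})
windows = pd.DataFrame({"key": ["a", "b"], "start": [0, 4], "end": [2, 6]})

result = conditional_join_sketch(
    events,
    windows,
    cond=lambda c: (c["t"] >= c["start"]) & (c["t"] <= c["end"]),
    left_on="key",
    right_on="key",
    batch_size=2,
)
print(result)
```

Filtering each batch before concatenating keeps the intermediate merged frame small, which is presumably the point of batching: the full equi-join of `A` and `B` never has to sit in memory at once.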
martinsotir / ssh-multi.sh
Created January 27, 2019 20:44 — forked from dmytro/ssh-multi.sh
Start multiple synchronized SSH connections with Tmux
#!/bin/bash
# ssh-multi
# D.Kovalov
# Based on http://linuxpixies.blogspot.jp/2011/06/tmux-copy-mode-and-how-to-control.html
# a script to ssh multiple servers over multiple tmux panes
starttmux() {
    if [ -z "$HOSTS" ]; then
martinsotir / conda_4.6_powershell.md
Last active March 5, 2025 22:05
Enable conda in powershell

Enabling conda in Windows Powershell

First, in an administrator command prompt, enable unrestricted Powershell script execution (see About Execution Policies):

set-executionpolicy unrestricted

Then make sure that the conda Scripts directory is in your Path.

martinsotir / imagezipdataset.py
Created January 21, 2019 16:24
ImageZipDataset
import torch
from torch.utils.data import Dataset, DataLoader
import tarfile
import zipfile
from pathlib import Path
from PIL import Image
from tqdm import tqdm
from torchvision import transforms
import mmap
import torch.multiprocessing as mp
martinsotir / geotiff_tiling_intro_with_gdal.ipynb
Created August 2, 2018 05:40
Introduction to GDAL raster tiling
import itertools
import pandas as pd
def flatten_df(df, list_col, elem_col_name="elem"):
    """Convert a series of lists into individual rows, within a dataframe.
    Adapted from https://stackoverflow.com/a/48532692
    This function can be used on a dask dataframe:
    ```python
    df.map_partitions(lambda x: flatten_df(x, "list_col", elem_col_name="elem")).clear_divisions()
martinsotir / Dockerfile
Last active March 14, 2018 18:48
patchwork_test_for_valera_1
# Instructions (requires docker):
# docker build -t patchwork .
# docker run -it --rm patchwork
FROM openjdk:8
RUN apt-get update
RUN apt-get install -y apt-transport-https
# Install sbt
"""
Utility script to visualize embeddings using the tensorboard projector module.
Usage
-----
Dependencies: numpy, pillow, pandas, tensorflow
Call `prepare_projection(embedding, metadata, image_paths, ...)`, where:
- `embedding` is a 2D numpy array (`n_sample` x `dim_embedding`)