@datavudeja
datavudeja / analysis.py
Created February 5, 2025 20:08 — forked from dwrodri/analysis.py
Video Analysis Tool
#!/usr/bin/env python3
import os
import itertools
import argparse
import json
import logging
import multiprocessing
import pathlib
import subprocess
import sys
datavudeja / pload.py
Created February 5, 2025 20:09 — forked from DocDilbert/pload.py
Plugin based file operations
# coding=utf-8
import datetime
import os
import yaml
import click
import concurrent.futures
from tqdm import tqdm
import fnmatch
import re
import pathlib
"""Object-oriented filesystem paths.
This module provides classes to represent abstract paths and concrete
paths with operations that have semantics appropriate for different
operating systems.
"""
import fnmatch
import functools
import io
import os
import hashlib
import fnmatch
from pathlib import Path
from collections import namedtuple
def calculate_sha1(path, chunksize=8192):
"""Calculate sha1 hexdigest of file
datavudeja / README.md
Created February 5, 2025 20:11 — forked from m-bartlett/README.md
git-like filesystem snapshotting utility which hashes file contents for uniqueness and hardlinks unchanged files to existing hashed files to avoid redundancy

SnapSHAt

This tool uses only the Python standard library as of 3.11; no external dependencies are needed.

Note

When I initially began writing this tool I was using SHA256 to compute file hashes, but I later found empirically that BLAKE2 was considerably faster. Since filesystems can easily contain millions of files to hash, changing the hashing algorithm was a significant speedup. However, I still feel this pun captures the essence of the tool, and I couldn't think of a better name.

Features

  • Stores unique file contents in a "blob cache" where the content files are renamed after the hash of their contents (this is similar to how git stores files).
  • Hardlinks files to these hashed blobs. The usefulness of this comes with subsequent snapshots, where presumably the majority of your filesystem is unchanged. Unchanged files will produce the same hash, and therefore can be hardlinked once again to the same blob which reuses the storage in the blob instead of creating a redundant copy o
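The blob cache and hardlinking described above can be sketched roughly as follows. This is a minimal illustration and not the tool's actual code: the function names `blake2_hexdigest` and `snapshot_file`, the directory layout, and the choice of `hashlib.blake2b` (the README only says "BLAKE2") are all assumptions made for the example.

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path


def blake2_hexdigest(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large files are never fully loaded into memory."""
    digest = hashlib.blake2b()  # assumption: the BLAKE2b variant
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot_file(source: Path, blob_dir: Path, snapshot_path: Path) -> Path:
    """Hypothetical helper: cache `source` by content hash, then hardlink it."""
    blob_dir.mkdir(parents=True, exist_ok=True)
    blob = blob_dir / blake2_hexdigest(source)
    if not blob.exists():
        # First time this content is seen: copy it into the cache under its hash.
        shutil.copyfile(source, blob)
    snapshot_path.parent.mkdir(parents=True, exist_ok=True)
    # Unchanged content maps to the same blob, so the link reuses its storage.
    os.link(blob, snapshot_path)
    return blob


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "notes.txt"
        src.write_text("unchanged between snapshots")
        first = snapshot_file(src, root / "blobs", root / "snap1" / "notes.txt")
        second = snapshot_file(src, root / "blobs", root / "snap2" / "notes.txt")
        print(first == second)  # both snapshots share a single blob
```

Hardlinks require the blob cache and the snapshots to live on the same filesystem, so a sketch like this would need a copy fallback to work across mount points.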
datavudeja / docker_utils.py
Created February 5, 2025 20:11 — forked from Roffild/docker_utils.py
Utilities for creating a docker image.
"""
Utilities for creating a docker image.
"""
import fnmatch
import json
import pathlib
import re
import stat
import subprocess
import tarfile
datavudeja / 05_mixed_sorting.py
Created February 9, 2025 21:49 — forked from bgoonz/05_mixed_sorting.py
python scripts
# Mixed sorting
"""
Given a list of integers nums, sort the array such that:
All even numbers are sorted in increasing order
All odd numbers are sorted in decreasing order
The relative positions of the even and odd numbers remain the same
Example 1
Input
datavudeja / install.sh
Created February 9, 2025 21:51 — forked from tranphuquy19/install.sh
jupyter_notebook_config.py
sudo apt update && sudo apt upgrade -y
sudo apt-get install python3 python3-pip python3-dev -y
python3 --version
pip3 install --upgrade pip
pip3 --version
pip3 install virtualenv
#!/usr/bin/env python
# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for license information.
# --------------------------------------------------------------------------------------------
#
# This script will install the CLI into a directory and create an executable
# at a specified file path that is the entry point into the CLI.
datavudeja / check_requirements.py
Created February 9, 2025 21:55 — forked from JoaoG250/check_requirements.py
Python script that checks for differences between virtualenv and requirements.txt
import subprocess


def make_reqs_dict(req_list):
    reqs = {}
    for req in req_list:
        if req != "" and "git+" not in req:
            req_name, req_version = req.split("==")
            reqs[req_name] = req_version
    return reqs
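The rest of the script is not shown in this preview. As a rough sketch of how two such dictionaries might be compared, assuming both inputs are lists of `name==version` lines (for example, the output of `pip freeze` and the lines of `requirements.txt`), something like the following could work. The `diff_reqs` helper is hypothetical and not part of the original gist; `make_reqs_dict` is repeated so the block runs on its own.

```python
def make_reqs_dict(req_list):
    """Map package name to pinned version, skipping blank and git+ lines."""
    reqs = {}
    for req in req_list:
        if req != "" and "git+" not in req:
            req_name, req_version = req.split("==")
            reqs[req_name] = req_version
    return reqs


def diff_reqs(installed_lines, required_lines):
    """Hypothetical helper: report missing, extra, and mismatched packages."""
    installed = make_reqs_dict(installed_lines)
    required = make_reqs_dict(required_lines)
    # Packages listed in the requirements but absent from the environment.
    missing = sorted(set(required) - set(installed))
    # Packages present in the environment but not listed in the requirements.
    extra = sorted(set(installed) - set(required))
    # Packages present in both with different pinned versions.
    mismatched = {
        name: (installed[name], required[name])
        for name in sorted(set(installed) & set(required))
        if installed[name] != required[name]
    }
    return missing, extra, mismatched


if __name__ == "__main__":
    env = ["requests==2.31.0", "click==8.1.7", ""]
    wanted = ["requests==2.32.0", "pyyaml==6.0.1"]
    print(diff_reqs(env, wanted))
```

Note that `req.split("==")` raises a `ValueError` on lines without an exact `==` pin (such as `package>=1.0`), so real input may need filtering first.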