Carlo Moro cnmoro

@cnmoro
cnmoro / sequence_mining_analysis.py
Created April 30, 2025 19:42
Sequence Mining Analysis
import pandas as pd
def sequence_mining_analysis(df, id_col, date_col, cat_col):
    if not pd.api.types.is_datetime64_any_dtype(df[date_col]):
        raise TypeError(f"Column '{date_col}' must be datetime.")
    df = (df[[id_col, date_col, cat_col]]
          .sort_values([id_col, date_col]))
    grp = df.groupby(id_col)[cat_col]
@cnmoro
cnmoro / log_exception.py
Created March 11, 2025 19:20
log_exception
import traceback, inspect, re, datetime
def log_exception(e, custom_msg=None):
    # Get the frame of the caller (the function where the exception was caught)
    frame = inspect.currentframe().f_back
    func_name = frame.f_code.co_name
    filename = frame.f_code.co_filename
    line_no = frame.f_lineno
@cnmoro
cnmoro / antelopev2-face-features.py
Created March 7, 2025 14:00
antelopev2-face-features.py
# pip install -U insightface mxnet onnx onnxruntime
# Initialize the FaceAnalysis app with the Antelope model
from insightface.app import FaceAnalysis
app = FaceAnalysis(name='antelopev2', root='./', providers=['CPUExecutionProvider'])
# Remove nested folder from download if needed
app.prepare(ctx_id=-1, det_size=(640, 640))
from PIL import Image
img = Image.open('82.png')
@cnmoro
cnmoro / SemanticDiffusionEncoder.py
Created January 8, 2025 04:01
SemanticDiffusionEncoder
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset
from torchtext.vocab import build_vocab_from_iterator
@cnmoro
cnmoro / SpatioTemporalGraphEncoding.py
Created January 8, 2025 03:59
SpatioTemporalGraphEncoding
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset
from collections import defaultdict
class WordGraph:
@cnmoro
cnmoro / GraphWalkEncoder.py
Created January 8, 2025 03:57
GraphWalkEncoder
import numpy as np
import random
from collections import defaultdict
class GraphWalkEncoder:
    def __init__(self, vocab_size, vector_size=64, walk_length=5):
@cnmoro
cnmoro / Docker-as-a-non-root-user.md
Created July 12, 2024 12:42 — forked from VictorNS69/Docker-as-a-non-root-user.md
Manage Docker as a non-root user

Manage Docker as a non-root user

The Docker daemon binds to a Unix socket instead of a TCP port. By default that Unix socket is owned by the user root and other users can only access it using sudo. The Docker daemon always runs as the root user.

If you don’t want to preface the docker command with sudo, create a Unix group called docker and add users to it. When the Docker daemon starts, it creates a Unix socket accessible by members of the docker group.

Warning: The docker group grants privileges equivalent to the root user. For details on how this impacts security in your system, see Docker Daemon Attack Surface.

Note: To run Docker without root privileges, see Run the Docker daemon as a non-root user (Rootless mode).
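The group-based access described above can be checked from a script. The following is a minimal sketch, not part of the original guide: the function name `docker_socket_report` is hypothetical, and the socket path `/var/run/docker.sock` and group name `docker` are assumed defaults that may differ on a given system. It only reads state and changes nothing.

```python
import grp
import os
import stat


def docker_socket_report(sock_path="/var/run/docker.sock", group_name="docker"):
    """Report whether the group exists, whether this process is in it,
    and whether the socket grants read/write access to its group.

    Both default arguments are assumptions about a typical install.
    """
    report = {
        "group_exists": False,
        "user_in_group": False,
        "socket_exists": False,
        "socket_group": None,
        "group_can_rw": False,
    }

    # Look up the group; a missing group raises KeyError.
    try:
        group = grp.getgrnam(group_name)
    except KeyError:
        group = None
    if group is not None:
        report["group_exists"] = True
        # Membership is checked for the current process only.
        gids = set(os.getgroups()) | {os.getegid()}
        report["user_in_group"] = group.gr_gid in gids

    # Inspect the socket file, if it is present.
    try:
        st = os.stat(sock_path)
    except OSError:
        return report
    report["socket_exists"] = stat.S_ISSOCK(st.st_mode)
    try:
        report["socket_group"] = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        report["socket_group"] = str(st.st_gid)
    rw_bits = stat.S_IRGRP | stat.S_IWGRP
    report["group_can_rw"] = (st.st_mode & rw_bits) == rw_bits
    return report


if __name__ == "__main__":
    for key, value in docker_socket_report().items():
        print(f"{key}: {value}")
```

If `user_in_group` is still false right after a user has been added to the group, a fresh login session is usually needed before the new membership applies to running processes.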

@cnmoro
cnmoro / semantic_util.py
Created June 21, 2024 18:51
Semantic Chunking & Compressing
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from minivectordb.embedding_model import EmbeddingModel
from sklearn.metrics.pairwise import cosine_similarity
import tiktoken, nltk, numpy as np, fasttext, base64
from nltk.tokenize import sent_tokenize
from nltk.corpus import stopwords
nltk.download('punkt')
nltk.download('stopwords')
@cnmoro
cnmoro / tiktoken_chunkenizer_with_overlap.py
Created March 13, 2024 23:35
tiktoken_chunkenizer_with_overlap.py
import tiktoken
gpt_encoding = tiktoken.encoding_for_model("gpt-3.5-turbo-16k")
def chunk_text(full_text, tokens_per_chunk=300, chunk_overlap=20):
    chunks = []
    current_chunk = []
    current_chunk_length = 0
    tokens = gpt_encoding.encode(full_text)
    for i, token in enumerate(tokens):
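The preview above is cut off, so the author's loop body is not shown here. As an illustration of the general idea of chunking with overlap, here is a minimal sketch that is not the gist's implementation: the helper `chunk_tokens` is hypothetical and works on any list of token ids, so it runs without tiktoken. It borrows only the parameter names `tokens_per_chunk` and `chunk_overlap` from the signature above.

```python
def chunk_tokens(tokens, tokens_per_chunk=300, chunk_overlap=20):
    """Split a token list into windows that share `chunk_overlap` tokens.

    Hypothetical helper for illustration; a sliding-window approach,
    which may differ from the truncated original.
    """
    if tokens_per_chunk <= 0:
        raise ValueError("tokens_per_chunk must be positive")
    if not 0 <= chunk_overlap < tokens_per_chunk:
        raise ValueError("chunk_overlap must be in [0, tokens_per_chunk)")

    # Each new window starts this many tokens after the previous one.
    step = tokens_per_chunk - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + tokens_per_chunk])
        # Stop once a window has reached the end of the input,
        # so no trailing chunk made only of overlap is emitted.
        if start + tokens_per_chunk >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    demo = list(range(10))
    for chunk in chunk_tokens(demo, tokens_per_chunk=4, chunk_overlap=1):
        print(chunk)
```

To get text chunks, each window of token ids would then be decoded with the same encoding that produced the ids.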
@cnmoro
cnmoro / zram Arch.md
Created January 17, 2024 19:07 — forked from zax4r0/zram Arch.md
Zram On Arch

zRam is a virtual-memory compression feature that uses block devices named /dev/zram. It compresses the least recently used (LRU), or inactive, pages in memory with a fast compression algorithm (LZ4), which lets the GNU/Linux kernel free up more memory with less of a performance hit.

zRam greatly increases the amount of available memory by compressing memory, with no swap disk or partition required. Using zRam is recommended over disabling swap entirely, because it helps keep the out-of-memory (OOM) killer from being triggered.

Create a zRam block device

Load the zRam module into the kernel using modprobe:

sudo modprobe zram

Set the zRam compression algorithm to lz4, which is extremely fast: