yada trojblue

  • Toronto
@trojblue
trojblue / llm_lecturer_prompt.md
Last active January 15, 2025 20:52
Let ChatGPT-o1 explain subject matters and complex topics, in an engaging and easily digestible way.

Lecturer Prompt for LLMs


To use the prompt, you can either:

  1. Save this in the "Custom instructions" section of ChatGPT / Claude,

Or in a new chat:

  1. Paste the prompt at the start of your message, using a structure similar to this:
trojblue / argilla_custom-field_s3_access.md
Created October 10, 2024 12:37
Label S3 images with Argilla [WIP]
  1. Edit the Docker Compose file to allow host gateway access:
version: '3'
services:
  argilla:
    image: argilla/argilla-server:latest
    ports:
      - "6900:6900"  # Existing Argilla port
    extra_hosts:

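Several ways to reduce the memory a DataFrame uses:

  1. Drop columns that carry duplicate information
  2. Drop unneeded rows early
  3. Downcast numbers to the smallest sufficient precision (-50%)
  4. Convert Python strings (object dtype) to pyarrow strings (-30%)
  5. Convert date strings to pandas datetime (-85%)
  6. Convert strings with many repeated values to category (-95%)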
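
A minimal sketch of the dtype conversions in the list above (items 3, 5 and 6), assuming pandas and NumPy are installed. The helper name `shrink_dataframe`, its parameters, and the sample data are hypothetical and not taken from the original gist. The pyarrow string conversion is left out so the example does not depend on pyarrow, and dropping rows or columns is dataset-specific. Casting floats to `float32` loses precision, so it only suits columns where that is acceptable.

```python
import numpy as np
import pandas as pd


def shrink_dataframe(df, date_cols=(), category_cols=()):
    """Return a copy of df with smaller dtypes (hypothetical helper)."""
    out = df.copy()
    for col in out.columns:
        if col in date_cols:
            # Date strings -> datetime64
            out[col] = pd.to_datetime(out[col])
        elif col in category_cols:
            # Heavily repeated strings -> category codes
            out[col] = out[col].astype("category")
        elif pd.api.types.is_integer_dtype(out[col]):
            # Smallest signed integer type that holds every value
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_float_dtype(out[col]):
            # Assumption: float32 precision is enough for this column
            out[col] = out[col].astype(np.float32)
    return out


# Made-up sample data, for illustration only
df = pd.DataFrame({
    "count": np.arange(1000, dtype="int64"),
    "score": np.linspace(0.0, 1.0, 1000),
    "day": ["2024-01-%02d" % (i % 28 + 1) for i in range(1000)],
    "label": ["cat", "dog"] * 500,
})
small = shrink_dataframe(df, date_cols=["day"], category_cols=["label"])

before = df.memory_usage(deep=True).sum()
after = small.memory_usage(deep=True).sum()
print(f"{before} bytes -> {after} bytes")
```

The savings on real data depend on value ranges and on how often strings repeat, so the percentages in the list should be read as the author's own measurements rather than something this sketch reproduces.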

Using functools.partial to pass real arguments into Kedro nodes:

from functools import partial, update_wrapper
from kedro.pipeline import Pipeline, node

from .nodes import process_todo, DemoMerger


def create_wrapped_partial(func, *args, **kwargs):
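
The preview cuts off at the helper's signature. A common shape for such a helper, assuming it simply combines the two imported `functools` calls, is sketched below. The body and the `scale` function are assumptions for illustration, not the author's code. A bare `partial` object has no `__name__`, and `update_wrapper` copies it (along with the docstring) from the original function.

```python
from functools import partial, update_wrapper


def create_wrapped_partial(func, *args, **kwargs):
    """Bind arguments to func and copy func's metadata onto the partial."""
    wrapped = partial(func, *args, **kwargs)
    update_wrapper(wrapped, func)  # sets __name__, __doc__, __wrapped__, ...
    return wrapped


def scale(values, factor):
    """Multiply every value by factor (hypothetical node function)."""
    return [v * factor for v in values]


double = create_wrapped_partial(scale, factor=2)
print(double.__name__, double([1, 2, 3]))
```

The wrapped partial can then be passed wherever a plain function with a name is expected.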
import numpy as np
import matplotlib.pyplot as plt

def plot_quadratic_coefficients(coefficients):
    """
    Plots y = ax^2 + bx + c for each set of coefficients within specified x and y ranges.

    Parameters:
    - coefficients: dict, a dictionary of coefficient sets with 'a', 'b', and 'c' for each key.
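
The preview ends inside the docstring, so the plotting code itself is not shown. As a sketch of the computation that sits behind such a plot, the snippet below evaluates y = ax^2 + bx + c for each coefficient set using plain Python. The function name `evaluate_quadratics`, the `xs` parameter, and the sample coefficients are hypothetical; the dictionary layout follows the docstring.

```python
def evaluate_quadratics(coefficients, xs):
    """Return {name: [y values]} for y = a*x^2 + b*x + c at each x in xs."""
    curves = {}
    for name, coef in coefficients.items():
        a, b, c = coef["a"], coef["b"], coef["c"]
        curves[name] = [a * x**2 + b * x + c for x in xs]
    return curves


# Made-up coefficient sets in the documented {'a': ..., 'b': ..., 'c': ...} layout
coefficients = {
    "parabola": {"a": 1, "b": 0, "c": 0},
    "shifted": {"a": 1, "b": -2, "c": 1},
}
curves = evaluate_quadratics(coefficients, xs=[-1, 0, 1, 2])
print(curves)
```

Each list of y values could then be handed to a plotting call, one line per key.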

Code that counts tag occurrences in a DataFrame column and then visualizes them:

def safe_split_tag_str(tag_str, separator=","):
    """
    Splits a tag string into a list of non-empty, whitespace-stripped tag strings.
    """
    if not tag_str:
        return []
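
The preview stops after the empty-input check. One way the rest could go, following the docstring, is sketched below with the standard library only. The split-and-strip line and the `count_tags` helper are assumptions, not the original code, and the visualization step is omitted.

```python
from collections import Counter


def safe_split_tag_str(tag_str, separator=","):
    """Split a tag string into non-empty, whitespace-stripped tags."""
    if not tag_str:
        return []
    return [tag.strip() for tag in tag_str.split(separator) if tag.strip()]


def count_tags(tag_strings, separator=","):
    """Count how often each tag appears across many tag strings (hypothetical helper)."""
    counts = Counter()
    for tag_str in tag_strings:
        counts.update(safe_split_tag_str(tag_str, separator))
    return counts


# Made-up column values, including an empty string and a missing value
tag_counts = count_tags(["a, b", "b,, c ", "", None])
print(tag_counts.most_common())
```

With a DataFrame, the values of the tag column can be passed in as `tag_strings`, and the resulting counts fed to a bar chart.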

(pixiv-data-process/yada/13_pixiv_streamlined.ipynb)

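Given a path (local or an S3 address), return a list of all the files it contains, and upload the image-to-metadata mapping to S3:

(When there isn't much data, it can be used directly like this:)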

# https://github.com/troph-team/build-it/blob/f996fe55a6fd2beda9e62a6624be0f0fe2a05848/buildit/sagemaker/parquet_splitter.py#L13
import os
from dataproc3.sagemaker import ParquetSplitter

nd setup, works on Lambda H100 PCIe:

conda:

cd ~/ && mkdir -p miniconda3 && wget https://repo.anaconda.com/miniconda/Miniconda3-py310_23.5.2-0-Linux-x86_64.sh -O ./miniconda3/miniconda.sh --no-check-certificate && bash ./miniconda3/miniconda.sh -b -u -p ./miniconda3 && rm ./miniconda3/miniconda.sh && ./miniconda3/bin/conda init bash && source ~/.bashrc  && python -m pip install unibox ipykernel jupyter poetry && python -m ipykernel install --user --name=conda310 

nd:

trojblue / extract_url_from_artstation_json.py
Created November 16, 2023 01:10
Extract all links to twitter.com or x.com from the JSON file fevercell_projects.json:
import json


# Function to extract handles from a given domain in a nested dictionary
def extract_handles(data, domain):
    def find_handles(d):
        handles = []
        for k, v in d.items():
            if isinstance(v, dict):
                handles.extend(find_handles(v))
            elif isinstance(v, list):
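
The preview is cut off inside the list branch. As a self-contained sketch of the same idea, the function below walks nested dictionaries and lists and collects every URL that points at twitter.com or x.com. The name `extract_links`, the regular expression, and the sample data are assumptions for illustration and do not reproduce the original `extract_handles`.

```python
import re


def extract_links(data, domains=("twitter.com", "x.com")):
    """Collect URLs for the given domains from nested dicts, lists and strings."""
    domain_alt = "|".join(re.escape(d) for d in domains)
    pattern = re.compile(r"https?://(?:www\.)?(?:%s)/\S+" % domain_alt)
    found = []

    def walk(node):
        if isinstance(node, dict):
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)
        elif isinstance(node, str):
            found.extend(pattern.findall(node))

    walk(data)
    return found


# Made-up structure standing in for the parsed JSON file
sample = {
    "user": {
        "links": ["https://twitter.com/example", "https://www.artstation.com/x"],
        "bio": "see https://x.com/example2 for more",
    },
    "items": [{"url": "https://x.com/abc"}, {"url": "https://box.com/nope"}],
}
links = extract_links(sample)
print(links)
```

For a real file, the structure would come from `json.load` on the opened file instead of the inline `sample`.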