BigsnarfDude bigsnarfdude

# All model tests
bundle exec rspec spec/models/
# All controller tests
bundle exec rspec spec/controllers/
# All request tests
bundle exec rspec spec/requests/
# All feature tests
pip install outline-wiki-api
import os
os.environ["OUTLINE_API_KEY"] = "your_api_key"
os.environ["OUTLINE_INSTANCE_URL"] = "your_instance_url"
from outline_wiki_api import OutlineClient
client = OutlineClient(api_key=os.environ["OUTLINE_API_KEY"], server_url=os.environ["OUTLINE_INSTANCE_URL"])
collection_id = "your_collection_id"
new_document = client.documents.create(title="My Document", collection_id=collection_id, text="Content of my document")
bigsnarfdude / mac_multi.py
Last active May 19, 2025 18:01
mac_multi.py
#!/usr/bin/env python3
"""
MP4 to MP3 Converter with MLX Parakeet Transcription (Threaded)
--------------------------------------------------
This script processes MP4 files from a CSV file using multiple threads with a staged pipeline approach:
1. Reads a CSV file with video information
2. Downloads MP4 files if needed
3. Converts MP4 files to MP3
4. Generates transcripts using MLX Parakeet
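The staged, threaded pipeline described in this docstring can be sketched with standard-library queues. This is a minimal illustration, not the gist's actual code: `run_stage`, `run_pipeline`, and the stand-in stage functions in the usage note are hypothetical names, and each stage here is just a plain callable.

```python
import queue
import threading


def run_stage(worker, inbox, outbox, n_threads=2):
    """Start n_threads threads that apply `worker` to items from inbox."""
    def loop():
        while True:
            item = inbox.get()
            if item is None:  # sentinel: this thread should stop
                break
            outbox.put(worker(item))

    threads = [threading.Thread(target=loop) for _ in range(n_threads)]
    for t in threads:
        t.start()
    return threads


def run_pipeline(items, stages, n_threads=2):
    """Push items through each stage in order; return the final outputs."""
    # One queue in front of every stage, plus one for the final results.
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    stage_threads = [
        run_stage(fn, queues[i], queues[i + 1], n_threads)
        for i, fn in enumerate(stages)
    ]
    for item in items:
        queues[0].put(item)
    # Shut the stages down front to back: once a stage has been joined,
    # everything it will ever produce is already on the next queue.
    for i, threads in enumerate(stage_threads):
        for _ in threads:
            queues[i].put(None)
        for t in threads:
            t.join()
    results = []
    while not queues[-1].empty():
        results.append(queues[-1].get())
    return results  # order is not guaranteed when n_threads > 1
```

A usage example with stand-in stages: `run_pipeline(["a", "b"], [lambda s: s + ".mp4", lambda s: s.replace(".mp4", ".mp3")])` returns the two `.mp3` names in some order. Real stages would download, convert, and transcribe instead of renaming strings.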
bigsnarfdude / download_service.py
Created May 19, 2025 16:24
download_service.py
#!/usr/bin/env python3
"""
MP4 Download Service
-------------------
This script handles only the downloading aspect of the MP4 processing pipeline:
1. Reads a CSV file with video information
2. Downloads MP4 files up to a configurable disk space limit
3. Maintains a queue of pending downloads
4. Provides robust resumption capability
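One way to read "downloads up to a configurable disk space limit" with "a queue of pending downloads" is a budget check against the head of the queue. The sketch below assumes file sizes are known in advance; `select_downloads` and the `(name, size_bytes)` tuple layout are hypothetical and not taken from the gist.

```python
from collections import deque


def select_downloads(pending, budget_bytes):
    """Pop entries off the front of `pending` while they fit in the budget.

    `pending` is a deque of (name, size_bytes) tuples. Entries that do not
    fit stay queued, so a later run can pick up where this one stopped.
    """
    selected = []
    remaining = budget_bytes
    while pending and pending[0][1] <= remaining:
        name, size = pending.popleft()
        selected.append(name)
        remaining -= size
    return selected, remaining
```

In practice the budget could be derived from `shutil.disk_usage(path).free` minus a safety margin, and the queue could be persisted to disk to support resumption.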
bigsnarfdude / mp4_processor.py
Last active May 19, 2025 20:09
mp4_processor.py
#!/usr/bin/env python3
"""
Multi-Process MP4 to MP3 Converter with Transcription
-----------------------------------------------------
This script processes MP4 files from a CSV file using multiple parallel processes:
1. Reads a CSV file with video information
2. Divides the work among multiple processes
3. Each process handles downloading, converting, and transcribing its assigned videos
4. Maintains robust resumption capability for each process
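Step 2, dividing the work among processes, could be done with a simple round-robin split of the CSV rows. This is a sketch under that assumption. The gist may partition differently, and `partition` is a hypothetical helper name.

```python
def partition(rows, n_workers):
    """Split rows into n_workers lists, assigning rows round-robin."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return [rows[i::n_workers] for i in range(n_workers)]
```

Each resulting slice could then be handed to its own `multiprocessing.Process`. A round-robin split keeps the slices within one row of each other in length, which helps when rows take roughly similar time to process.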
from nemo.collections.asr import models as nemo_asr
import numpy as np
import librosa
import soundfile as sf
import os
def transcribe_with_chunking(audio_path, asr_model, chunk_duration=30, overlap_duration=2):
    """
    Transcribe audio file by breaking it into overlapping chunks
bigsnarfdude / archive.birs.ca-scraper.py
Last active May 19, 2025 02:39
archive.birs.ca-scraper.py
import requests
from bs4 import BeautifulSoup
import csv
import time
import re
from urllib.parse import urljoin
def fetch_page(url, max_retries=3, retry_delay=2):
    """Fetch a page with retry logic and return BeautifulSoup object"""
    headers = {
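The preview above is cut off before the retry loop. A generic version of the retry logic its docstring mentions might look like the following. The `fetch` callable and the `sleep` parameter are assumptions introduced here so the sketch runs without network access, and the `max_retries` and `retry_delay` defaults mirror the signature shown above.

```python
import time


def fetch_with_retry(fetch, url, max_retries=3, retry_delay=2, sleep=time.sleep):
    """Call fetch(url), retrying up to max_retries times on any exception."""
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return fetch(url)
        except Exception as exc:  # a real scraper would catch narrower errors
            last_error = exc
            if attempt < max_retries:
                sleep(retry_delay)  # wait before the next attempt
    raise RuntimeError(
        f"giving up on {url} after {max_retries} attempts"
    ) from last_error
```

With `requests` and `bs4`, `fetch` could be a small function that calls `requests.get(url, headers=headers, timeout=...)`, then `raise_for_status()`, and returns `BeautifulSoup(response.text, "html.parser")`.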
bigsnarfdude / transcribe.py
Last active May 19, 2025 00:45
transcribe.py
#!/usr/bin/env python3
"""
Video Processing Script
This script processes a list of video files from a CSV file:
1. Downloads the MP4 videos from URLs
2. Converts the videos to MP3 format
3. Generates transcripts from the MP3 files
4. Stores files in appropriate folders
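For step 2, a common approach is to shell out to `ffmpeg`. The preview does not show which tool the script actually uses, so treat this as an assumption. The helper name `build_ffmpeg_command` and the folder layout are hypothetical.

```python
from pathlib import Path


def build_ffmpeg_command(mp4_path, mp3_dir):
    """Return (argv, output_path) for extracting MP3 audio from an MP4."""
    src = Path(mp4_path)
    dst = Path(mp3_dir) / (src.stem + ".mp3")
    cmd = [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", str(src),           # input video
        "-vn",                    # drop the video stream
        "-codec:a", "libmp3lame", # encode audio as MP3
        "-q:a", "2",              # VBR quality level
        str(dst),
    ]
    return cmd, dst
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Building the command separately from running it makes it easy to log or dry-run.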

Steps to Set Up BIRS Workshops with Docker

  1. Clone the repository (if you haven't already):

    git clone https://github.com/birs-math/workshops.git
    cd workshops
  2. Copy the example configuration files to create your actual configuration files:

bigsnarfdude / parakeet.py
Last active May 16, 2025 03:33
parakeet.py
from parakeet_mlx import from_pretrained
import numpy as np
import librosa
import soundfile as sf
import os
def transcribe_with_chunking(audio_path, model, chunk_duration=30, overlap_duration=2):
    """
    Transcribe audio file by breaking it into overlapping chunks
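The function body is not shown in this preview. As an illustration of what "overlapping chunks" implies, the sketch below computes the sample ranges for each chunk. `chunk_bounds` is a hypothetical helper, and the defaults mirror the `chunk_duration=30` and `overlap_duration=2` parameters in the signature above.

```python
def chunk_bounds(n_samples, sample_rate, chunk_duration=30, overlap_duration=2):
    """Return (start, end) sample indices for overlapping chunks."""
    chunk = int(chunk_duration * sample_rate)
    # Each new chunk starts `overlap_duration` seconds before the previous
    # one ends, so consecutive chunks share that much audio.
    step = int((chunk_duration - overlap_duration) * sample_rate)
    if step <= 0:
        raise ValueError("overlap_duration must be smaller than chunk_duration")
    bounds = []
    start = 0
    while start < n_samples:
        end = min(start + chunk, n_samples)
        bounds.append((start, end))
        if end == n_samples:
            break  # the last chunk reached the end of the audio
        start += step
    return bounds
```

Each `(start, end)` pair can be used to slice the loaded audio array, for example `audio[start:end]`, before passing the slice to the model. Merging the per-chunk transcripts across the overlap region is a separate step not covered here.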