- Clone the repository (if you haven't already):
  git clone https://github.com/birs-math/workshops.git
  cd workshops
- Copy the example configuration files to create your actual configuration files:
# All model tests
bundle exec rspec spec/models/
# All controller tests
bundle exec rspec spec/controllers/
# All request tests
bundle exec rspec spec/requests/
# All feature tests
# Install the client first (shell command): pip install outline-wiki-api
import os
os.environ["OUTLINE_API_KEY"] = "your_api_key"
os.environ["OUTLINE_INSTANCE_URL"] = "your_instance_url"
from outline_wiki_api import OutlineClient
client = OutlineClient(api_key=os.environ["OUTLINE_API_KEY"], server_url=os.environ["OUTLINE_INSTANCE_URL"])
collection_id = "your_collection_id"
new_document = client.documents.create(title="My Document", collection_id=collection_id, text="Content of my document")
#!/usr/bin/env python3
"""
MP4 to MP3 Converter with MLX Parakeet Transcription (Threaded)
--------------------------------------------------
This script processes MP4 files from a CSV file using multiple threads with a staged pipeline approach:
1. Reads a CSV file with video information
2. Downloads MP4 files if needed
3. Converts MP4 files to MP3
4. Generates transcripts using MLX Parakeet
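The staged pipeline described in this docstring can be sketched with one worker thread per stage, connected by queues. The stage functions below (`download`, `convert`, `transcribe`) are hypothetical placeholders that only transform strings; the real script's function names and its conversion and MLX Parakeet calls are not shown in this preview.

```python
import queue
import threading

_STOP = object()  # sentinel that tells a stage to shut down


def run_stage(work, inbox, outbox):
    """Apply `work` to each item from inbox and pass the result downstream."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            break
        outbox.put(work(item))


# Hypothetical placeholder stages; real ones would perform I/O.
def download(url):
    return url.rsplit("/", 1)[-1]            # keep only the file name


def convert(mp4_name):
    return mp4_name.replace(".mp4", ".mp3")  # stands in for the audio conversion


def transcribe(mp3_name):
    return f"transcript of {mp3_name}"       # stands in for the ASR model


def run_pipeline(urls):
    stages = [download, convert, transcribe]
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for url in urls:
        queues[0].put(url)
    queues[0].put(_STOP)

    # Drain the final queue until the sentinel arrives.
    results = []
    while (item := queues[-1].get()) is not _STOP:
        results.append(item)
    for t in threads:
        t.join()
    return results
```

Because each stage has a single thread and the queues are FIFO, results come out in input order. Running several threads per stage would increase throughput but would need one sentinel per worker and would no longer preserve order.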
#!/usr/bin/env python3
"""
MP4 Download Service
-------------------
This script handles only the downloading aspect of the MP4 processing pipeline:
1. Reads a CSV file with video information
2. Downloads MP4 files up to a configurable disk space limit
3. Maintains a queue of pending downloads
4. Provides robust resumption capability
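One way to enforce a disk space limit is to admit downloads from the pending queue only while their combined size fits within a byte budget. The function name `select_downloads`, the `(url, size_bytes)` pair layout, and the explicit `used_bytes` argument are assumptions for illustration; the script's actual accounting is not shown in this preview.

```python
from collections import deque


def select_downloads(pending, used_bytes, limit_bytes):
    """Pop items off the front of `pending` while they fit under the limit.

    `pending` is a deque of (url, size_bytes) pairs. Items that do not fit
    stay queued, in order, for a later pass once space has been freed.
    """
    admitted = []
    while pending and used_bytes + pending[0][1] <= limit_bytes:
        url, size = pending.popleft()
        used_bytes += size
        admitted.append(url)
    return admitted, used_bytes
```

In practice `used_bytes` might come from summing the sizes of files already in the download folder, or from `shutil.disk_usage`. Stopping at the first item that does not fit keeps the queue order stable, at the cost of not filling small gaps with later, smaller files.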
#!/usr/bin/env python3
"""
Multi-Process MP4 to MP3 Converter with Transcription
-----------------------------------------------------
This script processes MP4 files from a CSV file using multiple parallel processes:
1. Reads a CSV file with video information
2. Divides the work among multiple processes
3. Each process handles downloading, converting, and transcribing its assigned videos
4. Maintains robust resumption capability for each process
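A minimal sketch of the work division in step 2, assuming a round-robin split and a set of already-finished items for resumption. Both the split strategy and the `done` set are assumptions; the preview does not show how the script assigns videos or records progress.

```python
def partition(items, num_workers, done=frozenset()):
    """Split unfinished items into `num_workers` round-robin shards."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    # Skip anything a previous run already completed.
    remaining = [item for item in items if item not in done]
    # Worker i takes items i, i + num_workers, i + 2 * num_workers, ...
    return [remaining[i::num_workers] for i in range(num_workers)]
```

Each shard could then be handed to its own `multiprocessing.Process`. Round-robin keeps shard sizes within one item of each other, though it does not balance by video length.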
from nemo.collections.asr import models as nemo_asr
import numpy as np
import librosa
import soundfile as sf
import os


def transcribe_with_chunking(audio_path, asr_model, chunk_duration=30, overlap_duration=2):
    """
    Transcribe audio file by breaking it into overlapping chunks
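The overlapping chunks can be described as sample ranges whose start advances by `chunk_duration - overlap_duration` seconds. This sketch only computes the ranges; the helper name `chunk_spans` and the 16 kHz default sample rate are assumptions, and how the original function handles the final chunk is not visible in this preview.

```python
def chunk_spans(num_samples, sample_rate=16000, chunk_duration=30, overlap_duration=2):
    """Return (start, end) sample indices of overlapping chunks."""
    if overlap_duration >= chunk_duration:
        raise ValueError("overlap must be shorter than the chunk")
    chunk = int(chunk_duration * sample_rate)
    step = int((chunk_duration - overlap_duration) * sample_rate)
    spans = []
    start = 0
    while start < num_samples:
        end = min(start + chunk, num_samples)  # last chunk may be shorter
        spans.append((start, end))
        if end == num_samples:
            break
        start += step
    return spans
```

With an audio array `audio` loaded at `sample_rate`, each chunk would be `audio[start:end]`.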
import requests
from bs4 import BeautifulSoup
import csv
import time
import re
from urllib.parse import urljoin


def fetch_page(url, max_retries=3, retry_delay=2):
    """Fetch a page with retry logic and return BeautifulSoup object"""
    headers = {
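The retry behaviour named in the docstring can be sketched independently of `requests`. Here `fetch` is any callable that raises on failure; the helper name `with_retries` and the injectable `sleep` parameter are assumptions, added so the loop can be exercised without network access. The body of the original `fetch_page` is truncated in this preview and may differ.

```python
import time


def with_retries(fetch, max_retries=3, retry_delay=2, sleep=time.sleep):
    """Call `fetch()` up to `max_retries` times, pausing between failures."""
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return fetch()
        except Exception as exc:  # a real caller would catch narrower errors
            last_error = exc
            if attempt < max_retries:
                sleep(retry_delay)  # no pause after the final attempt
    raise RuntimeError(f"gave up after {max_retries} attempts") from last_error
```

With `requests`, the callable could be something like `lambda: requests.get(url, headers=headers, timeout=10)`, with the response then passed to `BeautifulSoup`.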
#!/usr/bin/env python3
"""
Video Processing Script
This script processes a list of video files from a CSV file:
1. Downloads the MP4 videos from URLs
2. Converts the videos to MP3 format
3. Generates transcripts from the MP3 files
4. Stores files in appropriate folders
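Step 4 implies a fixed folder per file type. A minimal sketch, assuming folder names `videos`, `audio`, and `transcripts` under a common root; these names and the helper `output_paths` are hypothetical, since the script's real layout is not shown in this preview.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def output_paths(url, root="output"):
    """Map a video URL to the paths for its MP4, MP3, and transcript."""
    # Take the file name from the URL path, ignoring any query string.
    stem = PurePosixPath(urlparse(url).path).stem
    base = PurePosixPath(root)
    return {
        "mp4": base / "videos" / f"{stem}.mp4",
        "mp3": base / "audio" / f"{stem}.mp3",
        "txt": base / "transcripts" / f"{stem}.txt",
    }
```

`PurePosixPath` is used here so the result is the same on every platform; a script that touches the file system would more likely use `pathlib.Path` and create the folders with `mkdir(parents=True, exist_ok=True)`.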
from parakeet_mlx import from_pretrained
import numpy as np
import librosa
import soundfile as sf
import os


def transcribe_with_chunking(audio_path, model, chunk_duration=30, overlap_duration=2):
    """
    Transcribe audio file by breaking it into overlapping chunks
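When chunks overlap, the same words are typically transcribed twice at each seam. The preview does not show how this function stitches its results, so the following is one possible approach rather than the author's: find the longest run of words that ends one chunk and starts the next, and keep it once. The name `merge_overlapping` and the `max_overlap` parameter are hypothetical.

```python
def merge_overlapping(prev_words, next_words, max_overlap=20):
    """Join two word lists, dropping words repeated at the seam."""
    limit = min(max_overlap, len(prev_words), len(next_words))
    # Try the longest candidate overlap first, then shrink.
    for size in range(limit, 0, -1):
        if prev_words[-size:] == next_words[:size]:
            return prev_words + next_words[size:]
    # No shared run found: concatenate as-is.
    return prev_words + next_words
```

Exact matching is a simplification: a speech model may transcribe the overlapping audio slightly differently in each chunk, in which case a timestamp-based or fuzzy merge would be more robust.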